[2024-06-15 11:31:05,740][1648985] Saving configuration to train_dir/atari_2B_atari_pooyan_1111/config.json... [2024-06-15 11:31:05,780][1648985] Rollout worker 0 uses device cpu [2024-06-15 11:31:05,784][1648985] Rollout worker 1 uses device cpu [2024-06-15 11:31:05,787][1648985] Rollout worker 2 uses device cpu [2024-06-15 11:31:05,790][1648985] Rollout worker 3 uses device cpu [2024-06-15 11:31:09,210][1648985] Using GPUs [0] for process 0 (actually maps to GPUs [1]) [2024-06-15 11:31:09,210][1648985] InferenceWorker_p0-w0: min num requests: 1 [2024-06-15 11:31:09,225][1648985] Starting all processes... [2024-06-15 11:31:09,225][1648985] Starting process learner_proc0 [2024-06-15 11:31:12,024][1648985] Starting all processes... [2024-06-15 11:31:12,026][1651469] Using GPUs [0] for process 0 (actually maps to GPUs [1]) [2024-06-15 11:31:12,026][1651469] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [0]) for learning process 0 [2024-06-15 11:31:12,027][1648985] Starting process inference_proc0-0 [2024-06-15 11:31:12,027][1648985] Starting process rollout_proc0 [2024-06-15 11:31:12,027][1648985] Starting process rollout_proc1 [2024-06-15 11:31:12,027][1648985] Starting process rollout_proc2 [2024-06-15 11:31:12,027][1648985] Starting process rollout_proc3 [2024-06-15 11:31:12,167][1651469] Num visible devices: 1 [2024-06-15 11:31:12,219][1651469] Setting fixed seed 1111 [2024-06-15 11:31:12,221][1651469] Using GPUs [0] for process 0 (actually maps to GPUs [1]) [2024-06-15 11:31:12,221][1651469] Initializing actor-critic model on device cuda:0 [2024-06-15 11:31:12,221][1651469] RunningMeanStd input shape: (4, 84, 84) [2024-06-15 11:31:12,222][1651469] RunningMeanStd input shape: (1,) [2024-06-15 11:31:12,238][1651469] ConvEncoder: input_channels=4 [2024-06-15 11:31:12,360][1651469] Conv encoder output size: 512 [2024-06-15 11:31:12,363][1651469] Created Actor Critic model with architecture: [2024-06-15 11:31:12,363][1651469] ActorCriticSharedWeights( 
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): ConvEncoder(
        (enc): RecursiveScriptModule(
          original_name=ConvEncoderImpl
          (conv_head): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Conv2d)
            (1): RecursiveScriptModule(original_name=ReLU)
            (2): RecursiveScriptModule(original_name=Conv2d)
            (3): RecursiveScriptModule(original_name=ReLU)
            (4): RecursiveScriptModule(original_name=Conv2d)
            (5): RecursiveScriptModule(original_name=ReLU)
          )
          (mlp_layers): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Linear)
            (1): RecursiveScriptModule(original_name=ReLU)
          )
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=6, bias=True)
  )
)
[2024-06-15 11:31:13,102][1651469] Using optimizer
[2024-06-15 11:31:13,982][1651469] No checkpoints found
[2024-06-15 11:31:13,982][1651469] Did not load from checkpoint, starting from scratch!
[2024-06-15 11:31:13,983][1651469] Initialized policy 0 weights for model version 0
[2024-06-15 11:31:13,993][1651469] LearnerWorker_p0 finished initialization!
[2024-06-15 11:31:13,993][1651469] Using GPUs [0] for process 0 (actually maps to GPUs [1]) [2024-06-15 11:31:14,120][1652491] Using GPUs [0] for process 0 (actually maps to GPUs [1]) [2024-06-15 11:31:14,120][1652491] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [0]) for inference process 0 [2024-06-15 11:31:14,179][1652491] Num visible devices: 1 [2024-06-15 11:31:14,251][1652490] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23] [2024-06-15 11:31:14,549][1652491] RunningMeanStd input shape: (4, 84, 84) [2024-06-15 11:31:14,551][1652491] RunningMeanStd input shape: (1,) [2024-06-15 11:31:14,566][1652491] ConvEncoder: input_channels=4 [2024-06-15 11:31:14,574][1652492] Worker 2 uses CPU cores [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71] [2024-06-15 11:31:14,683][1652489] Worker 1 uses CPU cores [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47] [2024-06-15 11:31:14,708][1648985] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:14,711][1652493] Worker 3 uses CPU cores [72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95] [2024-06-15 11:31:14,712][1652491] Conv encoder output size: 512 [2024-06-15 11:31:14,724][1648985] Inference worker 0-0 is ready! [2024-06-15 11:31:14,724][1648985] All inference workers are ready! Signal rollout workers to start! [2024-06-15 11:31:14,725][1652490] EnvRunner 0-0 uses policy 0 [2024-06-15 11:31:14,726][1652492] EnvRunner 2-0 uses policy 0 [2024-06-15 11:31:14,760][1652489] EnvRunner 1-0 uses policy 0 [2024-06-15 11:31:14,782][1652493] EnvRunner 3-0 uses policy 0 [2024-06-15 11:31:15,955][1648985] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:20,955][1648985] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:25,955][1648985] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:29,203][1648985] Heartbeat connected on Batcher_0 [2024-06-15 11:31:29,206][1648985] Heartbeat connected on LearnerWorker_p0 [2024-06-15 11:31:29,256][1648985] Heartbeat connected on InferenceWorker_p0-w0 [2024-06-15 11:31:30,955][1648985] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:35,955][1648985] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:40,957][1648985] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:42,069][1648985] Heartbeat connected on RolloutWorker_w2 [2024-06-15 11:31:45,161][1648985] Heartbeat connected on RolloutWorker_w1 [2024-06-15 11:31:45,955][1648985] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 147.5. Samples: 4608. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:47,812][1648985] Heartbeat connected on RolloutWorker_w0 [2024-06-15 11:31:48,503][1648985] Heartbeat connected on RolloutWorker_w3 [2024-06-15 11:31:50,955][1648985] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1582.0. Samples: 57344. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:53,693][1652492] Worker 2, sleep for 0.500 sec to decorrelate experience collection [2024-06-15 11:31:54,146][1651469] Signal inference workers to stop experience collection... [2024-06-15 11:31:54,198][1652492] Worker 2 awakens! [2024-06-15 11:31:54,204][1652491] InferenceWorker_p0-w0: stopping experience collection [2024-06-15 11:31:55,955][1648985] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2420.5. Samples: 99840. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:56,822][1651469] Signal inference workers to resume experience collection... [2024-06-15 11:31:56,824][1652491] InferenceWorker_p0-w0: resuming experience collection [2024-06-15 11:31:57,970][1652491] Updated weights for policy 0, policy_version 64 (0.0011) [2024-06-15 11:31:58,006][1652489] Worker 1, sleep for 0.250 sec to decorrelate experience collection [2024-06-15 11:31:58,257][1652489] Worker 1 awakens! [2024-06-15 11:31:59,121][1652491] Updated weights for policy 0, policy_version 125 (0.0016) [2024-06-15 11:32:00,617][1652493] Worker 3, sleep for 0.750 sec to decorrelate experience collection [2024-06-15 11:32:00,955][1648985] Fps is (10 sec: 36045.4, 60 sec: 7794.0, 300 sec: 7794.0). Total num frames: 360448. Throughput: 0: 3310.9. Samples: 148992. Policy #0 lag: (min: 53.0, avg: 110.0, max: 113.0) [2024-06-15 11:32:01,208][1652491] Updated weights for policy 0, policy_version 192 (0.0011) [2024-06-15 11:32:01,370][1652493] Worker 3 awakens! [2024-06-15 11:32:05,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 10230.6, 300 sec: 10230.6). Total num frames: 524288. Throughput: 0: 4755.9. Samples: 214016. Policy #0 lag: (min: 47.0, avg: 168.9, max: 207.0) [2024-06-15 11:32:05,956][1648985] Avg episode reward: [(0, '3.065')] [2024-06-15 11:32:05,957][1651469] Saving new best policy, reward=3.065! 
[2024-06-15 11:32:07,646][1652491] Updated weights for policy 0, policy_version 258 (0.0013) [2024-06-15 11:32:08,960][1652491] Updated weights for policy 0, policy_version 319 (0.0011) [2024-06-15 11:32:10,385][1652491] Updated weights for policy 0, policy_version 368 (0.0013) [2024-06-15 11:32:10,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 13981.8, 300 sec: 13981.8). Total num frames: 786432. Throughput: 0: 5518.2. Samples: 248320. Policy #0 lag: (min: 47.0, avg: 145.0, max: 287.0) [2024-06-15 11:32:10,956][1648985] Avg episode reward: [(0, '4.310')] [2024-06-15 11:32:11,373][1652491] Updated weights for policy 0, policy_version 405 (0.0011) [2024-06-15 11:32:11,588][1651469] Saving new best policy, reward=4.310! [2024-06-15 11:32:13,055][1652491] Updated weights for policy 0, policy_version 449 (0.0011) [2024-06-15 11:32:14,453][1652491] Updated weights for policy 0, policy_version 503 (0.0010) [2024-06-15 11:32:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 17476.2, 300 sec: 17120.5). Total num frames: 1048576. Throughput: 0: 7088.3. Samples: 318976. Policy #0 lag: (min: 11.0, avg: 161.6, max: 263.0) [2024-06-15 11:32:15,956][1648985] Avg episode reward: [(0, '5.130')] [2024-06-15 11:32:15,956][1651469] Saving new best policy, reward=5.130! [2024-06-15 11:32:18,315][1652491] Updated weights for policy 0, policy_version 546 (0.0010) [2024-06-15 11:32:20,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 20206.9, 300 sec: 18301.5). Total num frames: 1212416. Throughput: 0: 8772.2. Samples: 394752. Policy #0 lag: (min: 5.0, avg: 99.8, max: 261.0) [2024-06-15 11:32:20,956][1648985] Avg episode reward: [(0, '6.190')] [2024-06-15 11:32:21,232][1651469] Saving new best policy, reward=6.190! 
[2024-06-15 11:32:21,429][1652491] Updated weights for policy 0, policy_version 609 (0.0012) [2024-06-15 11:32:22,515][1652491] Updated weights for policy 0, policy_version 672 (0.0012) [2024-06-15 11:32:24,481][1652491] Updated weights for policy 0, policy_version 737 (0.0012) [2024-06-15 11:32:25,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 26214.3, 300 sec: 22076.2). Total num frames: 1572864. Throughput: 0: 9580.5. Samples: 431104. Policy #0 lag: (min: 15.0, avg: 164.6, max: 271.0) [2024-06-15 11:32:25,957][1648985] Avg episode reward: [(0, '7.250')] [2024-06-15 11:32:25,975][1651469] Saving new best policy, reward=7.250! [2024-06-15 11:32:28,735][1652491] Updated weights for policy 0, policy_version 800 (0.0013) [2024-06-15 11:32:30,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 28398.8, 300 sec: 22347.6). Total num frames: 1703936. Throughput: 0: 11093.3. Samples: 503808. Policy #0 lag: (min: 15.0, avg: 114.9, max: 271.0) [2024-06-15 11:32:30,956][1648985] Avg episode reward: [(0, '7.340')] [2024-06-15 11:32:30,957][1651469] Saving new best policy, reward=7.340! [2024-06-15 11:32:32,493][1652491] Updated weights for policy 0, policy_version 881 (0.0041) [2024-06-15 11:32:33,566][1652491] Updated weights for policy 0, policy_version 932 (0.0013) [2024-06-15 11:32:35,398][1652491] Updated weights for policy 0, policy_version 976 (0.0012) [2024-06-15 11:32:35,485][1651469] Signal inference workers to stop experience collection... (50 times) [2024-06-15 11:32:35,562][1652491] InferenceWorker_p0-w0: stopping experience collection (50 times) [2024-06-15 11:32:35,781][1651469] Signal inference workers to resume experience collection... (50 times) [2024-06-15 11:32:35,806][1652491] InferenceWorker_p0-w0: resuming experience collection (50 times) [2024-06-15 11:32:35,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 33860.1, 300 sec: 25005.4). Total num frames: 2031616. Throughput: 0: 11411.9. Samples: 570880. 
Policy #0 lag: (min: 10.0, avg: 150.7, max: 266.0) [2024-06-15 11:32:35,956][1648985] Avg episode reward: [(0, '7.100')] [2024-06-15 11:32:39,100][1652491] Updated weights for policy 0, policy_version 1028 (0.0012) [2024-06-15 11:32:40,462][1652491] Updated weights for policy 0, policy_version 1081 (0.0012) [2024-06-15 11:32:40,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 37138.3, 300 sec: 25835.4). Total num frames: 2228224. Throughput: 0: 11446.0. Samples: 614912. Policy #0 lag: (min: 15.0, avg: 116.7, max: 271.0) [2024-06-15 11:32:40,956][1648985] Avg episode reward: [(0, '7.870')] [2024-06-15 11:32:40,959][1651469] Saving new best policy, reward=7.870! [2024-06-15 11:32:42,718][1652491] Updated weights for policy 0, policy_version 1140 (0.0095) [2024-06-15 11:32:44,446][1652491] Updated weights for policy 0, policy_version 1186 (0.0012) [2024-06-15 11:32:45,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 41506.0, 300 sec: 27292.6). Total num frames: 2490368. Throughput: 0: 11844.2. Samples: 681984. Policy #0 lag: (min: 31.0, avg: 129.6, max: 287.0) [2024-06-15 11:32:45,956][1648985] Avg episode reward: [(0, '9.070')] [2024-06-15 11:32:45,957][1651469] Saving new best policy, reward=9.070! [2024-06-15 11:32:46,444][1652491] Updated weights for policy 0, policy_version 1232 (0.0013) [2024-06-15 11:32:47,558][1652491] Updated weights for policy 0, policy_version 1278 (0.0012) [2024-06-15 11:32:50,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 44236.9, 300 sec: 27577.1). Total num frames: 2654208. Throughput: 0: 12083.2. Samples: 757760. Policy #0 lag: (min: 15.0, avg: 111.0, max: 271.0) [2024-06-15 11:32:50,956][1648985] Avg episode reward: [(0, '9.130')] [2024-06-15 11:32:51,585][1651469] Saving new best policy, reward=9.130! 
[2024-06-15 11:32:51,789][1652491] Updated weights for policy 0, policy_version 1336 (0.0013) [2024-06-15 11:32:53,215][1652491] Updated weights for policy 0, policy_version 1392 (0.0013) [2024-06-15 11:32:55,282][1652491] Updated weights for policy 0, policy_version 1456 (0.0019) [2024-06-15 11:32:55,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 29775.3). Total num frames: 3014656. Throughput: 0: 12083.2. Samples: 792064. Policy #0 lag: (min: 15.0, avg: 146.2, max: 271.0) [2024-06-15 11:32:55,956][1648985] Avg episode reward: [(0, '8.890')] [2024-06-15 11:32:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000001472_3014656.pth... [2024-06-15 11:32:57,762][1652491] Updated weights for policy 0, policy_version 1520 (0.0154) [2024-06-15 11:33:00,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 46421.4, 300 sec: 29607.8). Total num frames: 3145728. Throughput: 0: 12037.7. Samples: 860672. Policy #0 lag: (min: 37.0, avg: 173.9, max: 293.0) [2024-06-15 11:33:00,955][1648985] Avg episode reward: [(0, '7.820')] [2024-06-15 11:33:03,220][1652491] Updated weights for policy 0, policy_version 1568 (0.0012) [2024-06-15 11:33:04,438][1652491] Updated weights for policy 0, policy_version 1616 (0.0013) [2024-06-15 11:33:05,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 48059.6, 300 sec: 30633.4). Total num frames: 3407872. Throughput: 0: 11867.0. Samples: 928768. Policy #0 lag: (min: 15.0, avg: 92.8, max: 271.0) [2024-06-15 11:33:05,956][1648985] Avg episode reward: [(0, '8.280')] [2024-06-15 11:33:06,044][1652491] Updated weights for policy 0, policy_version 1668 (0.0020) [2024-06-15 11:33:07,178][1652491] Updated weights for policy 0, policy_version 1719 (0.0014) [2024-06-15 11:33:08,358][1652491] Updated weights for policy 0, policy_version 1749 (0.0014) [2024-06-15 11:33:10,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.9, 300 sec: 31570.9). Total num frames: 3670016. Throughput: 0: 11730.6. Samples: 958976. 
Policy #0 lag: (min: 17.0, avg: 153.2, max: 271.0) [2024-06-15 11:33:10,955][1648985] Avg episode reward: [(0, '8.960')] [2024-06-15 11:33:15,171][1652491] Updated weights for policy 0, policy_version 1811 (0.0014) [2024-06-15 11:33:15,979][1648985] Fps is (10 sec: 35957.6, 60 sec: 45310.7, 300 sec: 31073.5). Total num frames: 3768320. Throughput: 0: 11758.3. Samples: 1033216. Policy #0 lag: (min: 15.0, avg: 71.8, max: 239.0) [2024-06-15 11:33:15,980][1648985] Avg episode reward: [(0, '8.510')] [2024-06-15 11:33:16,644][1652491] Updated weights for policy 0, policy_version 1872 (0.0012) [2024-06-15 11:33:18,376][1652491] Updated weights for policy 0, policy_version 1936 (0.0014) [2024-06-15 11:33:18,914][1651469] Signal inference workers to stop experience collection... (100 times) [2024-06-15 11:33:18,988][1652491] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-06-15 11:33:19,133][1651469] Signal inference workers to resume experience collection... (100 times) [2024-06-15 11:33:19,134][1652491] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-06-15 11:33:19,482][1652491] Updated weights for policy 0, policy_version 1983 (0.0096) [2024-06-15 11:33:20,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 48605.8, 300 sec: 32703.9). Total num frames: 4128768. Throughput: 0: 11457.4. Samples: 1086464. Policy #0 lag: (min: 51.0, avg: 158.6, max: 276.0) [2024-06-15 11:33:20,956][1648985] Avg episode reward: [(0, '9.390')] [2024-06-15 11:33:21,356][1652491] Updated weights for policy 0, policy_version 2042 (0.0013) [2024-06-15 11:33:21,415][1651469] Saving new best policy, reward=9.390! [2024-06-15 11:33:25,956][1648985] Fps is (10 sec: 42700.1, 60 sec: 43690.5, 300 sec: 31957.2). Total num frames: 4194304. Throughput: 0: 11320.8. Samples: 1124352. 
Policy #0 lag: (min: 51.0, avg: 158.6, max: 276.0) [2024-06-15 11:33:25,957][1648985] Avg episode reward: [(0, '8.620')] [2024-06-15 11:33:28,926][1652491] Updated weights for policy 0, policy_version 2112 (0.0013) [2024-06-15 11:33:30,468][1652491] Updated weights for policy 0, policy_version 2165 (0.0013) [2024-06-15 11:33:30,955][1648985] Fps is (10 sec: 32767.6, 60 sec: 45875.2, 300 sec: 32708.6). Total num frames: 4456448. Throughput: 0: 11207.1. Samples: 1186304. Policy #0 lag: (min: 15.0, avg: 74.9, max: 271.0) [2024-06-15 11:33:30,956][1648985] Avg episode reward: [(0, '11.370')] [2024-06-15 11:33:31,356][1651469] Saving new best policy, reward=11.370! [2024-06-15 11:33:32,601][1652491] Updated weights for policy 0, policy_version 2256 (0.0036) [2024-06-15 11:33:33,558][1652491] Updated weights for policy 0, policy_version 2302 (0.0013) [2024-06-15 11:33:35,955][1648985] Fps is (10 sec: 52431.3, 60 sec: 44783.0, 300 sec: 33406.7). Total num frames: 4718592. Throughput: 0: 10968.2. Samples: 1251328. Policy #0 lag: (min: 154.0, avg: 219.9, max: 410.0) [2024-06-15 11:33:35,956][1648985] Avg episode reward: [(0, '11.100')] [2024-06-15 11:33:40,955][1648985] Fps is (10 sec: 32768.6, 60 sec: 42598.4, 300 sec: 32712.7). Total num frames: 4784128. Throughput: 0: 10979.6. Samples: 1286144. Policy #0 lag: (min: 12.0, avg: 71.4, max: 268.0) [2024-06-15 11:33:40,956][1648985] Avg episode reward: [(0, '9.470')] [2024-06-15 11:33:41,405][1652491] Updated weights for policy 0, policy_version 2368 (0.0015) [2024-06-15 11:33:43,654][1652491] Updated weights for policy 0, policy_version 2468 (0.0085) [2024-06-15 11:33:45,174][1652491] Updated weights for policy 0, policy_version 2536 (0.0014) [2024-06-15 11:33:45,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 45875.1, 300 sec: 34664.3). Total num frames: 5242880. Throughput: 0: 10706.4. Samples: 1342464. 
Policy #0 lag: (min: 175.0, avg: 220.4, max: 383.0) [2024-06-15 11:33:45,956][1648985] Avg episode reward: [(0, '10.370')] [2024-06-15 11:33:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 43144.5, 300 sec: 33555.1). Total num frames: 5242880. Throughput: 0: 10968.2. Samples: 1422336. Policy #0 lag: (min: 175.0, avg: 220.4, max: 383.0) [2024-06-15 11:33:50,956][1648985] Avg episode reward: [(0, '11.790')] [2024-06-15 11:33:50,957][1651469] Saving new best policy, reward=11.790! [2024-06-15 11:33:52,079][1652491] Updated weights for policy 0, policy_version 2576 (0.0019) [2024-06-15 11:33:54,439][1652491] Updated weights for policy 0, policy_version 2673 (0.0012) [2024-06-15 11:33:55,964][1648985] Fps is (10 sec: 39289.0, 60 sec: 43684.5, 300 sec: 34951.4). Total num frames: 5636096. Throughput: 0: 10852.3. Samples: 1447424. Policy #0 lag: (min: 185.0, avg: 224.2, max: 409.0) [2024-06-15 11:33:55,965][1648985] Avg episode reward: [(0, '13.660')] [2024-06-15 11:33:56,363][1651469] Saving new best policy, reward=13.660! [2024-06-15 11:33:56,367][1652491] Updated weights for policy 0, policy_version 2768 (0.0012) [2024-06-15 11:34:00,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.6, 300 sec: 34690.4). Total num frames: 5767168. Throughput: 0: 10644.0. Samples: 1511936. Policy #0 lag: (min: 185.0, avg: 224.2, max: 409.0) [2024-06-15 11:34:00,956][1648985] Avg episode reward: [(0, '12.100')] [2024-06-15 11:34:04,186][1651469] Signal inference workers to stop experience collection... (150 times) [2024-06-15 11:34:04,222][1652491] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-06-15 11:34:04,430][1651469] Signal inference workers to resume experience collection... 
(150 times) [2024-06-15 11:34:04,431][1652491] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-06-15 11:34:04,711][1652491] Updated weights for policy 0, policy_version 2880 (0.0014) [2024-06-15 11:34:05,955][1648985] Fps is (10 sec: 29516.2, 60 sec: 42052.3, 300 sec: 34634.2). Total num frames: 5931008. Throughput: 0: 10934.0. Samples: 1578496. Policy #0 lag: (min: 15.0, avg: 61.2, max: 271.0) [2024-06-15 11:34:05,956][1648985] Avg episode reward: [(0, '13.020')] [2024-06-15 11:34:07,310][1652491] Updated weights for policy 0, policy_version 2976 (0.0029) [2024-06-15 11:34:08,862][1652491] Updated weights for policy 0, policy_version 3042 (0.0013) [2024-06-15 11:34:10,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 43690.5, 300 sec: 35696.8). Total num frames: 6291456. Throughput: 0: 10649.7. Samples: 1603584. Policy #0 lag: (min: 110.0, avg: 196.0, max: 399.0) [2024-06-15 11:34:10,956][1648985] Avg episode reward: [(0, '14.830')] [2024-06-15 11:34:10,980][1651469] Saving new best policy, reward=14.830! [2024-06-15 11:34:15,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 43162.1, 300 sec: 35073.7). Total num frames: 6356992. Throughput: 0: 11059.3. Samples: 1683968. Policy #0 lag: (min: 15.0, avg: 76.1, max: 271.0) [2024-06-15 11:34:15,955][1648985] Avg episode reward: [(0, '13.990')] [2024-06-15 11:34:16,199][1652491] Updated weights for policy 0, policy_version 3120 (0.0013) [2024-06-15 11:34:18,253][1652491] Updated weights for policy 0, policy_version 3171 (0.0015) [2024-06-15 11:34:19,906][1652491] Updated weights for policy 0, policy_version 3252 (0.0014) [2024-06-15 11:34:20,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 36243.3). Total num frames: 6750208. Throughput: 0: 10854.4. Samples: 1739776. Policy #0 lag: (min: 111.0, avg: 179.5, max: 303.0) [2024-06-15 11:34:20,956][1648985] Avg episode reward: [(0, '14.860')] [2024-06-15 11:34:21,295][1651469] Saving new best policy, reward=14.860! 
[2024-06-15 11:34:21,599][1652491] Updated weights for policy 0, policy_version 3323 (0.0024) [2024-06-15 11:34:25,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 43691.0, 300 sec: 35638.4). Total num frames: 6815744. Throughput: 0: 10934.0. Samples: 1778176. Policy #0 lag: (min: 111.0, avg: 179.5, max: 303.0) [2024-06-15 11:34:25,956][1648985] Avg episode reward: [(0, '15.090')] [2024-06-15 11:34:25,961][1651469] Saving new best policy, reward=15.090! [2024-06-15 11:34:28,117][1652491] Updated weights for policy 0, policy_version 3381 (0.0014) [2024-06-15 11:34:29,337][1652491] Updated weights for policy 0, policy_version 3426 (0.0030) [2024-06-15 11:34:30,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45329.2, 300 sec: 36567.2). Total num frames: 7176192. Throughput: 0: 11264.0. Samples: 1849344. Policy #0 lag: (min: 15.0, avg: 79.4, max: 271.0) [2024-06-15 11:34:30,956][1648985] Avg episode reward: [(0, '19.100')] [2024-06-15 11:34:31,366][1652491] Updated weights for policy 0, policy_version 3536 (0.0220) [2024-06-15 11:34:31,387][1651469] Saving new best policy, reward=19.100! [2024-06-15 11:34:35,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 36472.8). Total num frames: 7340032. Throughput: 0: 11070.6. Samples: 1920512. Policy #0 lag: (min: 154.0, avg: 237.3, max: 394.0) [2024-06-15 11:34:35,956][1648985] Avg episode reward: [(0, '16.410')] [2024-06-15 11:34:38,432][1652491] Updated weights for policy 0, policy_version 3601 (0.0013) [2024-06-15 11:34:39,889][1652491] Updated weights for policy 0, policy_version 3664 (0.0138) [2024-06-15 11:34:40,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46421.3, 300 sec: 36700.7). Total num frames: 7569408. Throughput: 0: 11471.0. Samples: 1963520. 
Policy #0 lag: (min: 15.0, avg: 71.1, max: 271.0) [2024-06-15 11:34:40,956][1648985] Avg episode reward: [(0, '17.190')] [2024-06-15 11:34:41,825][1652491] Updated weights for policy 0, policy_version 3750 (0.0017) [2024-06-15 11:34:42,050][1651469] Signal inference workers to stop experience collection... (200 times) [2024-06-15 11:34:42,144][1652491] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-06-15 11:34:42,279][1651469] Signal inference workers to resume experience collection... (200 times) [2024-06-15 11:34:42,280][1652491] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-06-15 11:34:43,288][1652491] Updated weights for policy 0, policy_version 3812 (0.0014) [2024-06-15 11:34:45,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.8, 300 sec: 37228.1). Total num frames: 7864320. Throughput: 0: 11491.5. Samples: 2029056. Policy #0 lag: (min: 159.0, avg: 237.2, max: 399.0) [2024-06-15 11:34:45,956][1648985] Avg episode reward: [(0, '19.500')] [2024-06-15 11:34:45,957][1651469] Saving new best policy, reward=19.500! [2024-06-15 11:34:48,695][1652491] Updated weights for policy 0, policy_version 3842 (0.0031) [2024-06-15 11:34:50,320][1652491] Updated weights for policy 0, policy_version 3907 (0.0011) [2024-06-15 11:34:50,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 46967.3, 300 sec: 37276.5). Total num frames: 8060928. Throughput: 0: 11628.1. Samples: 2101760. Policy #0 lag: (min: 10.0, avg: 75.0, max: 266.0) [2024-06-15 11:34:50,956][1648985] Avg episode reward: [(0, '19.850')] [2024-06-15 11:34:51,283][1652491] Updated weights for policy 0, policy_version 3957 (0.0105) [2024-06-15 11:34:51,446][1651469] Saving new best policy, reward=19.850! 
[2024-06-15 11:34:52,643][1652491] Updated weights for policy 0, policy_version 4020 (0.0016) [2024-06-15 11:34:53,557][1652491] Updated weights for policy 0, policy_version 4070 (0.0016) [2024-06-15 11:34:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45881.7, 300 sec: 37915.1). Total num frames: 8388608. Throughput: 0: 11787.4. Samples: 2134016. Policy #0 lag: (min: 125.0, avg: 255.3, max: 415.0) [2024-06-15 11:34:55,956][1648985] Avg episode reward: [(0, '20.460')] [2024-06-15 11:34:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000004096_8388608.pth... [2024-06-15 11:34:56,074][1651469] Saving new best policy, reward=20.460! [2024-06-15 11:35:00,007][1652491] Updated weights for policy 0, policy_version 4116 (0.0011) [2024-06-15 11:35:00,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45329.0, 300 sec: 37511.7). Total num frames: 8486912. Throughput: 0: 11810.1. Samples: 2215424. Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 11:35:00,956][1648985] Avg episode reward: [(0, '21.430')] [2024-06-15 11:35:01,297][1652491] Updated weights for policy 0, policy_version 4163 (0.0011) [2024-06-15 11:35:01,499][1651469] Saving new best policy, reward=21.430! [2024-06-15 11:35:02,647][1652491] Updated weights for policy 0, policy_version 4229 (0.0013) [2024-06-15 11:35:03,588][1652491] Updated weights for policy 0, policy_version 4282 (0.0014) [2024-06-15 11:35:04,754][1652491] Updated weights for policy 0, policy_version 4336 (0.0014) [2024-06-15 11:35:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 49698.2, 300 sec: 38542.8). Total num frames: 8912896. Throughput: 0: 11980.8. Samples: 2278912. Policy #0 lag: (min: 111.0, avg: 238.3, max: 375.0) [2024-06-15 11:35:05,956][1648985] Avg episode reward: [(0, '20.430')] [2024-06-15 11:35:10,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 44782.9, 300 sec: 38004.4). Total num frames: 8978432. Throughput: 0: 12083.2. Samples: 2321920. 
Policy #0 lag: (min: 3.0, avg: 64.3, max: 259.0) [2024-06-15 11:35:10,956][1648985] Avg episode reward: [(0, '21.970')] [2024-06-15 11:35:11,047][1652491] Updated weights for policy 0, policy_version 4384 (0.0125) [2024-06-15 11:35:11,362][1651469] Saving new best policy, reward=21.970! [2024-06-15 11:35:12,119][1652491] Updated weights for policy 0, policy_version 4418 (0.0086) [2024-06-15 11:35:13,443][1652491] Updated weights for policy 0, policy_version 4478 (0.0013) [2024-06-15 11:35:14,698][1652491] Updated weights for policy 0, policy_version 4528 (0.0014) [2024-06-15 11:35:15,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 50790.3, 300 sec: 38982.5). Total num frames: 9404416. Throughput: 0: 11935.3. Samples: 2386432. Policy #0 lag: (min: 79.0, avg: 205.2, max: 335.0) [2024-06-15 11:35:15,956][1648985] Avg episode reward: [(0, '19.990')] [2024-06-15 11:35:16,293][1652491] Updated weights for policy 0, policy_version 4608 (0.0035) [2024-06-15 11:35:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 44782.8, 300 sec: 38324.1). Total num frames: 9437184. Throughput: 0: 11958.0. Samples: 2458624. Policy #0 lag: (min: 79.0, avg: 205.2, max: 335.0) [2024-06-15 11:35:20,956][1648985] Avg episode reward: [(0, '21.530')] [2024-06-15 11:35:22,131][1651469] Signal inference workers to stop experience collection... (250 times) [2024-06-15 11:35:22,173][1652491] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-06-15 11:35:22,435][1651469] Signal inference workers to resume experience collection... (250 times) [2024-06-15 11:35:22,436][1652491] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-06-15 11:35:22,697][1652491] Updated weights for policy 0, policy_version 4666 (0.0013) [2024-06-15 11:35:24,193][1652491] Updated weights for policy 0, policy_version 4730 (0.0012) [2024-06-15 11:35:25,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 49151.9, 300 sec: 38865.6). Total num frames: 9764864. Throughput: 0: 11707.7. 
Samples: 2490368. Policy #0 lag: (min: 0.0, avg: 72.0, max: 256.0) [2024-06-15 11:35:25,956][1648985] Avg episode reward: [(0, '24.760')] [2024-06-15 11:35:26,181][1651469] Saving new best policy, reward=24.760! [2024-06-15 11:35:26,654][1652491] Updated weights for policy 0, policy_version 4800 (0.0013) [2024-06-15 11:35:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46421.2, 300 sec: 38874.5). Total num frames: 9961472. Throughput: 0: 11571.2. Samples: 2549760. Policy #0 lag: (min: 63.0, avg: 176.9, max: 319.0) [2024-06-15 11:35:30,956][1648985] Avg episode reward: [(0, '24.300')] [2024-06-15 11:35:33,420][1652491] Updated weights for policy 0, policy_version 4868 (0.0014) [2024-06-15 11:35:35,114][1652491] Updated weights for policy 0, policy_version 4932 (0.0013) [2024-06-15 11:35:35,960][1648985] Fps is (10 sec: 39302.7, 60 sec: 46963.5, 300 sec: 38882.3). Total num frames: 10158080. Throughput: 0: 11626.8. Samples: 2625024. Policy #0 lag: (min: 9.0, avg: 78.0, max: 265.0) [2024-06-15 11:35:35,961][1648985] Avg episode reward: [(0, '22.220')] [2024-06-15 11:35:36,249][1652491] Updated weights for policy 0, policy_version 4991 (0.0046) [2024-06-15 11:35:38,589][1652491] Updated weights for policy 0, policy_version 5072 (0.0012) [2024-06-15 11:35:40,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48605.8, 300 sec: 39383.6). Total num frames: 10485760. Throughput: 0: 11468.8. Samples: 2650112. Policy #0 lag: (min: 94.0, avg: 209.1, max: 335.0) [2024-06-15 11:35:40,956][1648985] Avg episode reward: [(0, '27.460')] [2024-06-15 11:35:40,996][1651469] Saving new best policy, reward=27.460! [2024-06-15 11:35:45,955][1648985] Fps is (10 sec: 36063.0, 60 sec: 44236.8, 300 sec: 38778.4). Total num frames: 10518528. Throughput: 0: 11423.3. Samples: 2729472. 
Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 11:35:45,956][1648985] Avg episode reward: [(0, '25.140')] [2024-06-15 11:35:46,790][1652491] Updated weights for policy 0, policy_version 5175 (0.0093) [2024-06-15 11:35:47,856][1652491] Updated weights for policy 0, policy_version 5217 (0.0013) [2024-06-15 11:35:49,535][1652491] Updated weights for policy 0, policy_version 5280 (0.0130) [2024-06-15 11:35:50,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 47513.6, 300 sec: 39500.0). Total num frames: 10911744. Throughput: 0: 11173.0. Samples: 2781696. Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 11:35:50,956][1648985] Avg episode reward: [(0, '26.080')] [2024-06-15 11:35:51,377][1652491] Updated weights for policy 0, policy_version 5360 (0.0013) [2024-06-15 11:35:55,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 43690.7, 300 sec: 39147.3). Total num frames: 11010048. Throughput: 0: 11036.5. Samples: 2818560. Policy #0 lag: (min: 111.0, avg: 229.6, max: 351.0) [2024-06-15 11:35:55,956][1648985] Avg episode reward: [(0, '26.660')] [2024-06-15 11:35:58,722][1652491] Updated weights for policy 0, policy_version 5411 (0.0015) [2024-06-15 11:36:00,955][1648985] Fps is (10 sec: 32768.2, 60 sec: 45875.2, 300 sec: 39264.8). Total num frames: 11239424. Throughput: 0: 11173.0. Samples: 2889216. Policy #0 lag: (min: 6.0, avg: 58.3, max: 262.0) [2024-06-15 11:36:00,956][1648985] Avg episode reward: [(0, '24.300')] [2024-06-15 11:36:01,613][1652491] Updated weights for policy 0, policy_version 5520 (0.0014) [2024-06-15 11:36:02,563][1651469] Signal inference workers to stop experience collection... (300 times) [2024-06-15 11:36:02,622][1652491] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-06-15 11:36:02,854][1651469] Signal inference workers to resume experience collection... 
(300 times) [2024-06-15 11:36:02,855][1652491] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-06-15 11:36:03,026][1652491] Updated weights for policy 0, policy_version 5569 (0.0012) [2024-06-15 11:36:05,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 43690.5, 300 sec: 39603.3). Total num frames: 11534336. Throughput: 0: 10740.6. Samples: 2941952. Policy #0 lag: (min: 141.0, avg: 228.2, max: 335.0) [2024-06-15 11:36:05,956][1648985] Avg episode reward: [(0, '25.970')] [2024-06-15 11:36:10,679][1652491] Updated weights for policy 0, policy_version 5634 (0.0012) [2024-06-15 11:36:10,955][1648985] Fps is (10 sec: 32768.1, 60 sec: 43144.6, 300 sec: 39210.5). Total num frames: 11567104. Throughput: 0: 10888.6. Samples: 2980352. Policy #0 lag: (min: 14.0, avg: 66.1, max: 270.0) [2024-06-15 11:36:10,956][1648985] Avg episode reward: [(0, '27.100')] [2024-06-15 11:36:12,702][1652491] Updated weights for policy 0, policy_version 5714 (0.0115) [2024-06-15 11:36:14,617][1652491] Updated weights for policy 0, policy_version 5792 (0.0016) [2024-06-15 11:36:15,903][1652491] Updated weights for policy 0, policy_version 5842 (0.0024) [2024-06-15 11:36:15,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 42598.4, 300 sec: 40543.4). Total num frames: 11960320. Throughput: 0: 10911.3. Samples: 3040768. Policy #0 lag: (min: 14.0, avg: 66.1, max: 270.0) [2024-06-15 11:36:15,956][1648985] Avg episode reward: [(0, '26.970')] [2024-06-15 11:36:20,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 40876.7). Total num frames: 12058624. Throughput: 0: 10776.0. Samples: 3109888. Policy #0 lag: (min: 191.0, avg: 271.4, max: 415.0) [2024-06-15 11:36:20,956][1648985] Avg episode reward: [(0, '28.860')] [2024-06-15 11:36:20,956][1651469] Saving new best policy, reward=28.860! 
[2024-06-15 11:36:22,937][1652491] Updated weights for policy 0, policy_version 5889 (0.0014) [2024-06-15 11:36:24,431][1652491] Updated weights for policy 0, policy_version 5952 (0.0093) [2024-06-15 11:36:25,955][1648985] Fps is (10 sec: 32768.6, 60 sec: 42052.4, 300 sec: 41654.2). Total num frames: 12288000. Throughput: 0: 11082.0. Samples: 3148800. Policy #0 lag: (min: 1.0, avg: 56.8, max: 257.0) [2024-06-15 11:36:25,955][1648985] Avg episode reward: [(0, '27.950')] [2024-06-15 11:36:26,524][1652491] Updated weights for policy 0, policy_version 6032 (0.0012) [2024-06-15 11:36:28,055][1652491] Updated weights for policy 0, policy_version 6084 (0.0012) [2024-06-15 11:36:29,144][1652491] Updated weights for policy 0, policy_version 6131 (0.0014) [2024-06-15 11:36:30,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 43690.6, 300 sec: 42653.9). Total num frames: 12582912. Throughput: 0: 10501.6. Samples: 3202048. Policy #0 lag: (min: 207.0, avg: 270.4, max: 431.0) [2024-06-15 11:36:30,956][1648985] Avg episode reward: [(0, '29.790')] [2024-06-15 11:36:30,957][1651469] Saving new best policy, reward=29.790! [2024-06-15 11:36:35,060][1652491] Updated weights for policy 0, policy_version 6160 (0.0011) [2024-06-15 11:36:35,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 42055.8, 300 sec: 42987.5). Total num frames: 12681216. Throughput: 0: 11070.6. Samples: 3279872. Policy #0 lag: (min: 13.0, avg: 72.4, max: 269.0) [2024-06-15 11:36:35,956][1648985] Avg episode reward: [(0, '29.880')] [2024-06-15 11:36:36,566][1651469] Saving new best policy, reward=29.880! [2024-06-15 11:36:37,312][1652491] Updated weights for policy 0, policy_version 6241 (0.0028) [2024-06-15 11:36:38,968][1652491] Updated weights for policy 0, policy_version 6305 (0.0011) [2024-06-15 11:36:40,921][1652491] Updated weights for policy 0, policy_version 6373 (0.0018) [2024-06-15 11:36:40,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 42598.4, 300 sec: 44209.0). Total num frames: 13041664. 
Throughput: 0: 10717.9. Samples: 3300864. Policy #0 lag: (min: 13.0, avg: 72.4, max: 269.0) [2024-06-15 11:36:40,956][1648985] Avg episode reward: [(0, '27.710')] [2024-06-15 11:36:45,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 43144.4, 300 sec: 44431.2). Total num frames: 13107200. Throughput: 0: 10615.4. Samples: 3366912. Policy #0 lag: (min: 143.0, avg: 258.4, max: 429.0) [2024-06-15 11:36:45,956][1648985] Avg episode reward: [(0, '28.310')] [2024-06-15 11:36:48,026][1651469] Signal inference workers to stop experience collection... (350 times) [2024-06-15 11:36:48,124][1652491] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-06-15 11:36:48,324][1651469] Signal inference workers to resume experience collection... (350 times) [2024-06-15 11:36:48,325][1652491] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-06-15 11:36:48,327][1652491] Updated weights for policy 0, policy_version 6432 (0.0016) [2024-06-15 11:36:50,615][1652491] Updated weights for policy 0, policy_version 6514 (0.0081) [2024-06-15 11:36:50,955][1648985] Fps is (10 sec: 32768.1, 60 sec: 40960.1, 300 sec: 45319.8). Total num frames: 13369344. Throughput: 0: 10797.6. Samples: 3427840. Policy #0 lag: (min: 15.0, avg: 60.2, max: 239.0) [2024-06-15 11:36:50,956][1648985] Avg episode reward: [(0, '30.770')] [2024-06-15 11:36:51,251][1651469] Saving new best policy, reward=30.770! [2024-06-15 11:36:52,082][1652491] Updated weights for policy 0, policy_version 6576 (0.0012) [2024-06-15 11:36:53,531][1652491] Updated weights for policy 0, policy_version 6643 (0.0012) [2024-06-15 11:36:55,956][1648985] Fps is (10 sec: 52427.5, 60 sec: 43690.4, 300 sec: 44986.5). Total num frames: 13631488. Throughput: 0: 10513.0. Samples: 3453440. 
Policy #0 lag: (min: 15.0, avg: 60.2, max: 239.0) [2024-06-15 11:36:55,957][1648985] Avg episode reward: [(0, '31.580')] [2024-06-15 11:36:55,965][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000006656_13631488.pth... [2024-06-15 11:36:56,045][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000001472_3014656.pth [2024-06-15 11:36:56,055][1651469] Saving new best policy, reward=31.580! [2024-06-15 11:37:00,523][1652491] Updated weights for policy 0, policy_version 6706 (0.0013) [2024-06-15 11:37:00,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 42052.3, 300 sec: 44875.5). Total num frames: 13762560. Throughput: 0: 11002.3. Samples: 3535872. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 11:37:00,956][1648985] Avg episode reward: [(0, '32.700')] [2024-06-15 11:37:01,597][1651469] Saving new best policy, reward=32.700! [2024-06-15 11:37:03,184][1652491] Updated weights for policy 0, policy_version 6816 (0.0013) [2024-06-15 11:37:05,107][1652491] Updated weights for policy 0, policy_version 6882 (0.0015) [2024-06-15 11:37:05,955][1648985] Fps is (10 sec: 52431.0, 60 sec: 43690.8, 300 sec: 45319.8). Total num frames: 14155776. Throughput: 0: 10365.2. Samples: 3576320. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 11:37:05,956][1648985] Avg episode reward: [(0, '31.800')] [2024-06-15 11:37:10,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 43144.4, 300 sec: 44431.2). Total num frames: 14155776. Throughput: 0: 10410.6. Samples: 3617280. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 11:37:10,956][1648985] Avg episode reward: [(0, '29.940')] [2024-06-15 11:37:12,692][1652491] Updated weights for policy 0, policy_version 6930 (0.0071) [2024-06-15 11:37:15,213][1652491] Updated weights for policy 0, policy_version 7024 (0.0012) [2024-06-15 11:37:15,955][1648985] Fps is (10 sec: 26214.6, 60 sec: 40960.2, 300 sec: 44764.4). Total num frames: 14417920. 
Throughput: 0: 10649.7. Samples: 3681280. Policy #0 lag: (min: 14.0, avg: 55.3, max: 270.0) [2024-06-15 11:37:15,955][1648985] Avg episode reward: [(0, '28.940')] [2024-06-15 11:37:17,495][1652491] Updated weights for policy 0, policy_version 7104 (0.0102) [2024-06-15 11:37:20,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 14680064. Throughput: 0: 10274.1. Samples: 3742208. Policy #0 lag: (min: 14.0, avg: 55.3, max: 270.0) [2024-06-15 11:37:20,955][1648985] Avg episode reward: [(0, '27.950')] [2024-06-15 11:37:24,767][1652491] Updated weights for policy 0, policy_version 7170 (0.0013) [2024-06-15 11:37:25,955][1648985] Fps is (10 sec: 36043.5, 60 sec: 41505.9, 300 sec: 44320.1). Total num frames: 14778368. Throughput: 0: 10638.2. Samples: 3779584. Policy #0 lag: (min: 5.0, avg: 52.8, max: 261.0) [2024-06-15 11:37:25,956][1648985] Avg episode reward: [(0, '30.730')] [2024-06-15 11:37:26,429][1652491] Updated weights for policy 0, policy_version 7235 (0.0018) [2024-06-15 11:37:27,351][1651469] Signal inference workers to stop experience collection... (400 times) [2024-06-15 11:37:27,413][1652491] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-06-15 11:37:27,650][1651469] Signal inference workers to resume experience collection... (400 times) [2024-06-15 11:37:27,650][1652491] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-06-15 11:37:28,627][1652491] Updated weights for policy 0, policy_version 7315 (0.0154) [2024-06-15 11:37:29,662][1652491] Updated weights for policy 0, policy_version 7369 (0.0056) [2024-06-15 11:37:30,724][1652491] Updated weights for policy 0, policy_version 7422 (0.0015) [2024-06-15 11:37:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.9, 300 sec: 44653.4). Total num frames: 15204352. Throughput: 0: 10353.8. Samples: 3832832. 
Policy #0 lag: (min: 5.0, avg: 52.8, max: 261.0) [2024-06-15 11:37:30,956][1648985] Avg episode reward: [(0, '28.960')] [2024-06-15 11:37:35,956][1648985] Fps is (10 sec: 42599.1, 60 sec: 42052.2, 300 sec: 43986.9). Total num frames: 15204352. Throughput: 0: 10740.6. Samples: 3911168. Policy #0 lag: (min: 5.0, avg: 52.8, max: 261.0) [2024-06-15 11:37:35,957][1648985] Avg episode reward: [(0, '31.650')] [2024-06-15 11:37:38,231][1652491] Updated weights for policy 0, policy_version 7473 (0.0034) [2024-06-15 11:37:39,481][1652491] Updated weights for policy 0, policy_version 7521 (0.0018) [2024-06-15 11:37:40,955][1648985] Fps is (10 sec: 32767.9, 60 sec: 41506.1, 300 sec: 44209.0). Total num frames: 15532032. Throughput: 0: 10877.3. Samples: 3942912. Policy #0 lag: (min: 15.0, avg: 69.7, max: 271.0) [2024-06-15 11:37:40,956][1648985] Avg episode reward: [(0, '32.630')] [2024-06-15 11:37:41,268][1652491] Updated weights for policy 0, policy_version 7600 (0.0028) [2024-06-15 11:37:42,963][1652491] Updated weights for policy 0, policy_version 7673 (0.0013) [2024-06-15 11:37:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 43690.6, 300 sec: 44320.1). Total num frames: 15728640. Throughput: 0: 10331.0. Samples: 4000768. Policy #0 lag: (min: 15.0, avg: 69.7, max: 271.0) [2024-06-15 11:37:45,956][1648985] Avg episode reward: [(0, '30.920')] [2024-06-15 11:37:50,264][1652491] Updated weights for policy 0, policy_version 7729 (0.0028) [2024-06-15 11:37:50,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 42052.3, 300 sec: 43653.6). Total num frames: 15892480. Throughput: 0: 10990.9. Samples: 4070912. 
Policy #0 lag: (min: 15.0, avg: 70.6, max: 271.0) [2024-06-15 11:37:50,955][1648985] Avg episode reward: [(0, '27.290')] [2024-06-15 11:37:52,719][1652491] Updated weights for policy 0, policy_version 7840 (0.0014) [2024-06-15 11:37:54,903][1652491] Updated weights for policy 0, policy_version 7929 (0.0013) [2024-06-15 11:37:55,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 43691.0, 300 sec: 44431.2). Total num frames: 16252928. Throughput: 0: 10535.9. Samples: 4091392. Policy #0 lag: (min: 15.0, avg: 70.6, max: 271.0) [2024-06-15 11:37:55,956][1648985] Avg episode reward: [(0, '26.520')] [2024-06-15 11:38:00,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 41506.1, 300 sec: 43542.6). Total num frames: 16252928. Throughput: 0: 10786.1. Samples: 4166656. Policy #0 lag: (min: 15.0, avg: 70.6, max: 271.0) [2024-06-15 11:38:00,956][1648985] Avg episode reward: [(0, '30.890')] [2024-06-15 11:38:02,141][1652491] Updated weights for policy 0, policy_version 7984 (0.0018) [2024-06-15 11:38:04,608][1652491] Updated weights for policy 0, policy_version 8064 (0.0235) [2024-06-15 11:38:05,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 41506.2, 300 sec: 43986.9). Total num frames: 16646144. Throughput: 0: 10638.2. Samples: 4220928. Policy #0 lag: (min: 15.0, avg: 65.7, max: 271.0) [2024-06-15 11:38:05,956][1648985] Avg episode reward: [(0, '33.200')] [2024-06-15 11:38:06,015][1651469] Signal inference workers to stop experience collection... (450 times) [2024-06-15 11:38:06,053][1652491] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-06-15 11:38:06,315][1651469] Signal inference workers to resume experience collection... (450 times) [2024-06-15 11:38:06,316][1652491] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-06-15 11:38:06,316][1651469] Saving new best policy, reward=33.200! 
[2024-06-15 11:38:06,727][1652491] Updated weights for policy 0, policy_version 8160 (0.0015) [2024-06-15 11:38:10,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.8, 300 sec: 44101.6). Total num frames: 16777216. Throughput: 0: 10501.8. Samples: 4252160. Policy #0 lag: (min: 15.0, avg: 65.7, max: 271.0) [2024-06-15 11:38:10,956][1648985] Avg episode reward: [(0, '32.130')] [2024-06-15 11:38:13,259][1652491] Updated weights for policy 0, policy_version 8194 (0.0022) [2024-06-15 11:38:14,644][1652491] Updated weights for policy 0, policy_version 8244 (0.0013) [2024-06-15 11:38:15,955][1648985] Fps is (10 sec: 29491.0, 60 sec: 42052.2, 300 sec: 43431.5). Total num frames: 16941056. Throughput: 0: 10968.2. Samples: 4326400. Policy #0 lag: (min: 15.0, avg: 72.2, max: 271.0) [2024-06-15 11:38:15,957][1648985] Avg episode reward: [(0, '27.930')] [2024-06-15 11:38:16,018][1652491] Updated weights for policy 0, policy_version 8288 (0.0016) [2024-06-15 11:38:17,907][1652491] Updated weights for policy 0, policy_version 8369 (0.0014) [2024-06-15 11:38:18,863][1652491] Updated weights for policy 0, policy_version 8421 (0.0014) [2024-06-15 11:38:20,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 44431.3). Total num frames: 17301504. Throughput: 0: 10661.0. Samples: 4390912. Policy #0 lag: (min: 15.0, avg: 72.2, max: 271.0) [2024-06-15 11:38:20,956][1648985] Avg episode reward: [(0, '30.410')] [2024-06-15 11:38:25,767][1652491] Updated weights for policy 0, policy_version 8496 (0.0013) [2024-06-15 11:38:25,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 43690.8, 300 sec: 43875.8). Total num frames: 17399808. Throughput: 0: 10854.4. Samples: 4431360. 
Policy #0 lag: (min: 15.0, avg: 80.0, max: 271.0) [2024-06-15 11:38:25,956][1648985] Avg episode reward: [(0, '30.310')] [2024-06-15 11:38:27,520][1652491] Updated weights for policy 0, policy_version 8544 (0.0012) [2024-06-15 11:38:29,475][1652491] Updated weights for policy 0, policy_version 8627 (0.0236) [2024-06-15 11:38:30,962][1648985] Fps is (10 sec: 49117.2, 60 sec: 43139.4, 300 sec: 44319.0). Total num frames: 17793024. Throughput: 0: 10682.1. Samples: 4481536. Policy #0 lag: (min: 15.0, avg: 80.0, max: 271.0) [2024-06-15 11:38:30,963][1648985] Avg episode reward: [(0, '28.870')] [2024-06-15 11:38:35,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 17825792. Throughput: 0: 10763.4. Samples: 4555264. Policy #0 lag: (min: 15.0, avg: 80.0, max: 271.0) [2024-06-15 11:38:35,956][1648985] Avg episode reward: [(0, '31.250')] [2024-06-15 11:38:37,466][1652491] Updated weights for policy 0, policy_version 8720 (0.0015) [2024-06-15 11:38:39,364][1652491] Updated weights for policy 0, policy_version 8769 (0.0082) [2024-06-15 11:38:40,955][1648985] Fps is (10 sec: 32791.1, 60 sec: 43144.5, 300 sec: 43653.7). Total num frames: 18120704. Throughput: 0: 11036.4. Samples: 4588032. Policy #0 lag: (min: 15.0, avg: 74.2, max: 271.0) [2024-06-15 11:38:40,956][1648985] Avg episode reward: [(0, '30.380')] [2024-06-15 11:38:41,085][1652491] Updated weights for policy 0, policy_version 8852 (0.0153) [2024-06-15 11:38:42,179][1652491] Updated weights for policy 0, policy_version 8912 (0.0017) [2024-06-15 11:38:45,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.8, 300 sec: 44431.2). Total num frames: 18350080. Throughput: 0: 10808.9. Samples: 4653056. 
Policy #0 lag: (min: 15.0, avg: 74.2, max: 271.0) [2024-06-15 11:38:45,956][1648985] Avg episode reward: [(0, '32.750')] [2024-06-15 11:38:49,665][1652491] Updated weights for policy 0, policy_version 8992 (0.0015) [2024-06-15 11:38:50,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 43144.5, 300 sec: 43543.8). Total num frames: 18481152. Throughput: 0: 11093.3. Samples: 4720128. Policy #0 lag: (min: 11.0, avg: 83.0, max: 267.0) [2024-06-15 11:38:50,956][1648985] Avg episode reward: [(0, '28.060')] [2024-06-15 11:38:51,174][1652491] Updated weights for policy 0, policy_version 9040 (0.0015) [2024-06-15 11:38:51,731][1651469] Signal inference workers to stop experience collection... (500 times) [2024-06-15 11:38:51,786][1652491] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-06-15 11:38:52,035][1651469] Signal inference workers to resume experience collection... (500 times) [2024-06-15 11:38:52,036][1652491] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-06-15 11:38:53,129][1652491] Updated weights for policy 0, policy_version 9125 (0.0013) [2024-06-15 11:38:54,318][1652491] Updated weights for policy 0, policy_version 9184 (0.0016) [2024-06-15 11:38:55,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 18874368. Throughput: 0: 10979.5. Samples: 4746240. Policy #0 lag: (min: 11.0, avg: 83.0, max: 267.0) [2024-06-15 11:38:55,956][1648985] Avg episode reward: [(0, '30.260')] [2024-06-15 11:38:56,012][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000009216_18874368.pth... [2024-06-15 11:38:56,066][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000004096_8388608.pth [2024-06-15 11:39:00,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 43875.8). Total num frames: 18874368. Throughput: 0: 10934.0. Samples: 4818432. 
Policy #0 lag: (min: 11.0, avg: 83.0, max: 267.0) [2024-06-15 11:39:00,956][1648985] Avg episode reward: [(0, '30.810')] [2024-06-15 11:39:02,024][1652491] Updated weights for policy 0, policy_version 9264 (0.0031) [2024-06-15 11:39:03,575][1652491] Updated weights for policy 0, policy_version 9312 (0.0013) [2024-06-15 11:39:05,457][1652491] Updated weights for policy 0, policy_version 9392 (0.0013) [2024-06-15 11:39:05,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 43690.5, 300 sec: 43986.9). Total num frames: 19267584. Throughput: 0: 10797.5. Samples: 4876800. Policy #0 lag: (min: 15.0, avg: 75.3, max: 271.0) [2024-06-15 11:39:05,956][1648985] Avg episode reward: [(0, '32.300')] [2024-06-15 11:39:07,221][1652491] Updated weights for policy 0, policy_version 9468 (0.0037) [2024-06-15 11:39:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 44209.0). Total num frames: 19398656. Throughput: 0: 10638.2. Samples: 4910080. Policy #0 lag: (min: 15.0, avg: 75.3, max: 271.0) [2024-06-15 11:39:10,956][1648985] Avg episode reward: [(0, '36.500')] [2024-06-15 11:39:10,960][1651469] Saving new best policy, reward=36.500! [2024-06-15 11:39:14,447][1652491] Updated weights for policy 0, policy_version 9535 (0.0041) [2024-06-15 11:39:15,955][1648985] Fps is (10 sec: 29491.6, 60 sec: 43690.7, 300 sec: 43431.5). Total num frames: 19562496. Throughput: 0: 11060.9. Samples: 4979200. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 11:39:15,956][1648985] Avg episode reward: [(0, '36.040')] [2024-06-15 11:39:16,688][1652491] Updated weights for policy 0, policy_version 9600 (0.0013) [2024-06-15 11:39:18,627][1652491] Updated weights for policy 0, policy_version 9696 (0.0081) [2024-06-15 11:39:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 19922944. Throughput: 0: 10717.9. Samples: 5037568. 
Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 11:39:20,956][1648985] Avg episode reward: [(0, '35.990')] [2024-06-15 11:39:25,777][1652491] Updated weights for policy 0, policy_version 9746 (0.0014) [2024-06-15 11:39:25,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 42598.4, 300 sec: 43320.4). Total num frames: 19955712. Throughput: 0: 10934.0. Samples: 5080064. Policy #0 lag: (min: 2.0, avg: 62.7, max: 258.0) [2024-06-15 11:39:25,956][1648985] Avg episode reward: [(0, '32.500')] [2024-06-15 11:39:27,786][1652491] Updated weights for policy 0, policy_version 9825 (0.0012) [2024-06-15 11:39:29,351][1652491] Updated weights for policy 0, policy_version 9889 (0.0013) [2024-06-15 11:39:30,398][1651469] Signal inference workers to stop experience collection... (550 times) [2024-06-15 11:39:30,438][1652491] Updated weights for policy 0, policy_version 9941 (0.0013) [2024-06-15 11:39:30,458][1652491] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-06-15 11:39:30,550][1651469] Signal inference workers to resume experience collection... (550 times) [2024-06-15 11:39:30,551][1652491] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-06-15 11:39:30,971][1648985] Fps is (10 sec: 49074.6, 60 sec: 43684.3, 300 sec: 44317.7). Total num frames: 20414464. Throughput: 0: 10759.6. Samples: 5137408. Policy #0 lag: (min: 2.0, avg: 62.7, max: 258.0) [2024-06-15 11:39:30,972][1648985] Avg episode reward: [(0, '32.550')] [2024-06-15 11:39:35,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 43690.7, 300 sec: 43653.7). Total num frames: 20447232. Throughput: 0: 10956.8. Samples: 5213184. 
Policy #0 lag: (min: 2.0, avg: 62.7, max: 258.0) [2024-06-15 11:39:35,956][1648985] Avg episode reward: [(0, '34.790')] [2024-06-15 11:39:37,773][1652491] Updated weights for policy 0, policy_version 10002 (0.0015) [2024-06-15 11:39:39,869][1652491] Updated weights for policy 0, policy_version 10080 (0.0109) [2024-06-15 11:39:40,955][1648985] Fps is (10 sec: 29537.8, 60 sec: 43144.6, 300 sec: 43542.6). Total num frames: 20709376. Throughput: 0: 11036.4. Samples: 5242880. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 11:39:40,956][1648985] Avg episode reward: [(0, '34.450')] [2024-06-15 11:39:41,239][1652491] Updated weights for policy 0, policy_version 10141 (0.0017) [2024-06-15 11:39:42,696][1652491] Updated weights for policy 0, policy_version 10208 (0.0013) [2024-06-15 11:39:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 20971520. Throughput: 0: 10763.4. Samples: 5302784. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 11:39:45,957][1648985] Avg episode reward: [(0, '33.970')] [2024-06-15 11:39:49,747][1652491] Updated weights for policy 0, policy_version 10258 (0.0016) [2024-06-15 11:39:50,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 43690.6, 300 sec: 43098.2). Total num frames: 21102592. Throughput: 0: 11138.8. Samples: 5378048. Policy #0 lag: (min: 2.0, avg: 60.0, max: 258.0) [2024-06-15 11:39:50,956][1648985] Avg episode reward: [(0, '34.750')] [2024-06-15 11:39:50,977][1652491] Updated weights for policy 0, policy_version 10308 (0.0010) [2024-06-15 11:39:52,345][1652491] Updated weights for policy 0, policy_version 10373 (0.0013) [2024-06-15 11:39:53,595][1652491] Updated weights for policy 0, policy_version 10435 (0.0133) [2024-06-15 11:39:54,537][1652491] Updated weights for policy 0, policy_version 10488 (0.0014) [2024-06-15 11:39:55,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 43690.5, 300 sec: 44097.9). Total num frames: 21495808. Throughput: 0: 11059.1. 
Samples: 5407744. Policy #0 lag: (min: 2.0, avg: 60.0, max: 258.0) [2024-06-15 11:39:55,956][1648985] Avg episode reward: [(0, '34.170')] [2024-06-15 11:40:00,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 44236.8, 300 sec: 42765.0). Total num frames: 21528576. Throughput: 0: 11309.5. Samples: 5488128. Policy #0 lag: (min: 15.0, avg: 75.7, max: 271.0) [2024-06-15 11:40:00,956][1648985] Avg episode reward: [(0, '36.450')] [2024-06-15 11:40:01,880][1652491] Updated weights for policy 0, policy_version 10561 (0.0013) [2024-06-15 11:40:03,361][1652491] Updated weights for policy 0, policy_version 10624 (0.0012) [2024-06-15 11:40:05,036][1652491] Updated weights for policy 0, policy_version 10704 (0.0104) [2024-06-15 11:40:05,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 45329.1, 300 sec: 44098.0). Total num frames: 21987328. Throughput: 0: 11275.4. Samples: 5544960. Policy #0 lag: (min: 15.0, avg: 75.7, max: 271.0) [2024-06-15 11:40:05,956][1648985] Avg episode reward: [(0, '34.920')] [2024-06-15 11:40:06,073][1652491] Updated weights for policy 0, policy_version 10748 (0.0014) [2024-06-15 11:40:10,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 43690.6, 300 sec: 42765.0). Total num frames: 22020096. Throughput: 0: 11241.2. Samples: 5585920. Policy #0 lag: (min: 15.0, avg: 75.7, max: 271.0) [2024-06-15 11:40:10,956][1648985] Avg episode reward: [(0, '32.900')] [2024-06-15 11:40:12,808][1652491] Updated weights for policy 0, policy_version 10789 (0.0015) [2024-06-15 11:40:13,175][1651469] Signal inference workers to stop experience collection... (600 times) [2024-06-15 11:40:13,227][1652491] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-06-15 11:40:13,406][1651469] Signal inference workers to resume experience collection... 
(600 times) [2024-06-15 11:40:13,407][1652491] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-06-15 11:40:14,462][1652491] Updated weights for policy 0, policy_version 10856 (0.0094) [2024-06-15 11:40:15,763][1652491] Updated weights for policy 0, policy_version 10928 (0.0046) [2024-06-15 11:40:15,955][1648985] Fps is (10 sec: 39322.8, 60 sec: 46967.6, 300 sec: 43875.8). Total num frames: 22380544. Throughput: 0: 11370.4. Samples: 5648896. Policy #0 lag: (min: 15.0, avg: 79.2, max: 271.0) [2024-06-15 11:40:15,956][1648985] Avg episode reward: [(0, '36.290')] [2024-06-15 11:40:17,310][1652491] Updated weights for policy 0, policy_version 11001 (0.0012) [2024-06-15 11:40:20,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 43320.4). Total num frames: 22544384. Throughput: 0: 11252.6. Samples: 5719552. Policy #0 lag: (min: 15.0, avg: 79.2, max: 271.0) [2024-06-15 11:40:20,956][1648985] Avg episode reward: [(0, '35.980')] [2024-06-15 11:40:24,502][1652491] Updated weights for policy 0, policy_version 11061 (0.0027) [2024-06-15 11:40:25,461][1652491] Updated weights for policy 0, policy_version 11104 (0.0013) [2024-06-15 11:40:25,957][1648985] Fps is (10 sec: 39313.2, 60 sec: 46966.0, 300 sec: 43431.2). Total num frames: 22773760. Throughput: 0: 11400.1. Samples: 5755904. Policy #0 lag: (min: 12.0, avg: 71.1, max: 268.0) [2024-06-15 11:40:25,958][1648985] Avg episode reward: [(0, '37.630')] [2024-06-15 11:40:26,181][1651469] Saving new best policy, reward=37.630! [2024-06-15 11:40:27,267][1652491] Updated weights for policy 0, policy_version 11184 (0.0066) [2024-06-15 11:40:28,956][1652491] Updated weights for policy 0, policy_version 11253 (0.0014) [2024-06-15 11:40:30,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 44248.4, 300 sec: 43765.4). Total num frames: 23068672. Throughput: 0: 11411.9. Samples: 5816320. 
Policy #0 lag: (min: 12.0, avg: 71.1, max: 268.0) [2024-06-15 11:40:30,956][1648985] Avg episode reward: [(0, '41.000')] [2024-06-15 11:40:30,957][1651469] Saving new best policy, reward=41.000! [2024-06-15 11:40:35,049][1652491] Updated weights for policy 0, policy_version 11280 (0.0011) [2024-06-15 11:40:35,955][1648985] Fps is (10 sec: 39329.0, 60 sec: 45329.0, 300 sec: 42987.2). Total num frames: 23166976. Throughput: 0: 11411.9. Samples: 5891584. Policy #0 lag: (min: 10.0, avg: 69.9, max: 266.0) [2024-06-15 11:40:35,956][1648985] Avg episode reward: [(0, '36.090')] [2024-06-15 11:40:36,404][1652491] Updated weights for policy 0, policy_version 11331 (0.0020) [2024-06-15 11:40:38,092][1652491] Updated weights for policy 0, policy_version 11408 (0.0024) [2024-06-15 11:40:39,415][1652491] Updated weights for policy 0, policy_version 11457 (0.0019) [2024-06-15 11:40:40,747][1652491] Updated weights for policy 0, policy_version 11520 (0.0012) [2024-06-15 11:40:40,956][1648985] Fps is (10 sec: 52428.2, 60 sec: 48059.6, 300 sec: 44320.1). Total num frames: 23592960. Throughput: 0: 11377.8. Samples: 5919744. Policy #0 lag: (min: 10.0, avg: 69.9, max: 266.0) [2024-06-15 11:40:40,957][1648985] Avg episode reward: [(0, '36.050')] [2024-06-15 11:40:45,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 42987.2). Total num frames: 23592960. Throughput: 0: 11320.9. Samples: 5997568. Policy #0 lag: (min: 10.0, avg: 69.9, max: 266.0) [2024-06-15 11:40:45,956][1648985] Avg episode reward: [(0, '39.890')] [2024-06-15 11:40:48,547][1652491] Updated weights for policy 0, policy_version 11600 (0.0016) [2024-06-15 11:40:49,989][1652491] Updated weights for policy 0, policy_version 11664 (0.0016) [2024-06-15 11:40:50,432][1651469] Signal inference workers to stop experience collection... 
(650 times) [2024-06-15 11:40:50,460][1652491] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-06-15 11:40:50,685][1651469] Signal inference workers to resume experience collection... (650 times) [2024-06-15 11:40:50,686][1652491] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-06-15 11:40:50,955][1648985] Fps is (10 sec: 36045.9, 60 sec: 47513.8, 300 sec: 43875.8). Total num frames: 23953408. Throughput: 0: 11332.3. Samples: 6054912. Policy #0 lag: (min: 15.0, avg: 72.8, max: 271.0) [2024-06-15 11:40:50,956][1648985] Avg episode reward: [(0, '39.470')] [2024-06-15 11:40:51,889][1652491] Updated weights for policy 0, policy_version 11746 (0.0033) [2024-06-15 11:40:52,341][1652491] Updated weights for policy 0, policy_version 11773 (0.0024) [2024-06-15 11:40:55,956][1648985] Fps is (10 sec: 52426.8, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 24117248. Throughput: 0: 11218.4. Samples: 6090752. Policy #0 lag: (min: 15.0, avg: 72.8, max: 271.0) [2024-06-15 11:40:55,957][1648985] Avg episode reward: [(0, '39.330')] [2024-06-15 11:40:55,979][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000011776_24117248.pth... [2024-06-15 11:40:56,046][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000006656_13631488.pth [2024-06-15 11:40:59,140][1652491] Updated weights for policy 0, policy_version 11840 (0.0012) [2024-06-15 11:41:00,772][1652491] Updated weights for policy 0, policy_version 11905 (0.0014) [2024-06-15 11:41:00,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 48059.8, 300 sec: 43653.7). Total num frames: 24412160. Throughput: 0: 11548.4. Samples: 6168576. 
Policy #0 lag: (min: 7.0, avg: 68.0, max: 263.0) [2024-06-15 11:41:00,956][1648985] Avg episode reward: [(0, '37.520')] [2024-06-15 11:41:02,555][1652491] Updated weights for policy 0, policy_version 11971 (0.0011) [2024-06-15 11:41:03,729][1652491] Updated weights for policy 0, policy_version 12025 (0.0012) [2024-06-15 11:41:05,955][1648985] Fps is (10 sec: 52431.2, 60 sec: 44236.9, 300 sec: 44320.1). Total num frames: 24641536. Throughput: 0: 11559.8. Samples: 6239744. Policy #0 lag: (min: 7.0, avg: 68.0, max: 263.0) [2024-06-15 11:41:05,956][1648985] Avg episode reward: [(0, '38.370')] [2024-06-15 11:41:09,985][1652491] Updated weights for policy 0, policy_version 12065 (0.0012) [2024-06-15 11:41:10,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 46421.3, 300 sec: 43542.6). Total num frames: 24805376. Throughput: 0: 11617.2. Samples: 6278656. Policy #0 lag: (min: 14.0, avg: 70.0, max: 270.0) [2024-06-15 11:41:10,956][1648985] Avg episode reward: [(0, '41.860')] [2024-06-15 11:41:11,407][1651469] Saving new best policy, reward=41.860! [2024-06-15 11:41:11,954][1652491] Updated weights for policy 0, policy_version 12163 (0.0012) [2024-06-15 11:41:13,352][1652491] Updated weights for policy 0, policy_version 12224 (0.0011) [2024-06-15 11:41:14,992][1652491] Updated weights for policy 0, policy_version 12287 (0.0014) [2024-06-15 11:41:15,956][1648985] Fps is (10 sec: 52426.3, 60 sec: 46420.8, 300 sec: 44431.1). Total num frames: 25165824. Throughput: 0: 11377.7. Samples: 6328320. Policy #0 lag: (min: 14.0, avg: 70.0, max: 270.0) [2024-06-15 11:41:15,956][1648985] Avg episode reward: [(0, '46.600')] [2024-06-15 11:41:15,957][1651469] Saving new best policy, reward=46.600! [2024-06-15 11:41:20,955][1648985] Fps is (10 sec: 36045.3, 60 sec: 43690.6, 300 sec: 43653.6). Total num frames: 25165824. Throughput: 0: 11639.5. Samples: 6415360. 
Policy #0 lag: (min: 14.0, avg: 70.0, max: 270.0) [2024-06-15 11:41:20,956][1648985] Avg episode reward: [(0, '45.200')] [2024-06-15 11:41:22,158][1652491] Updated weights for policy 0, policy_version 12344 (0.0014) [2024-06-15 11:41:23,445][1652491] Updated weights for policy 0, policy_version 12402 (0.0127) [2024-06-15 11:41:25,042][1652491] Updated weights for policy 0, policy_version 12466 (0.0012) [2024-06-15 11:41:25,955][1648985] Fps is (10 sec: 42599.7, 60 sec: 46968.9, 300 sec: 44098.0). Total num frames: 25591808. Throughput: 0: 11537.1. Samples: 6438912. Policy #0 lag: (min: 15.0, avg: 70.0, max: 271.0) [2024-06-15 11:41:25,956][1648985] Avg episode reward: [(0, '43.020')] [2024-06-15 11:41:26,666][1652491] Updated weights for policy 0, policy_version 12541 (0.0011) [2024-06-15 11:41:30,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 43690.7, 300 sec: 44097.9). Total num frames: 25690112. Throughput: 0: 11411.9. Samples: 6511104. Policy #0 lag: (min: 15.0, avg: 70.0, max: 271.0) [2024-06-15 11:41:30,956][1648985] Avg episode reward: [(0, '42.530')] [2024-06-15 11:41:32,935][1651469] Signal inference workers to stop experience collection... (700 times) [2024-06-15 11:41:32,999][1652491] InferenceWorker_p0-w0: stopping experience collection (700 times) [2024-06-15 11:41:33,096][1651469] Signal inference workers to resume experience collection... (700 times) [2024-06-15 11:41:33,097][1652491] InferenceWorker_p0-w0: resuming experience collection (700 times) [2024-06-15 11:41:33,302][1652491] Updated weights for policy 0, policy_version 12579 (0.0013) [2024-06-15 11:41:34,503][1652491] Updated weights for policy 0, policy_version 12640 (0.0012) [2024-06-15 11:41:35,955][1648985] Fps is (10 sec: 39322.9, 60 sec: 46967.7, 300 sec: 43875.8). Total num frames: 25985024. Throughput: 0: 11639.5. Samples: 6578688. 
Policy #0 lag: (min: 7.0, avg: 60.7, max: 263.0) [2024-06-15 11:41:35,955][1648985] Avg episode reward: [(0, '46.740')] [2024-06-15 11:41:36,140][1652491] Updated weights for policy 0, policy_version 12704 (0.0012) [2024-06-15 11:41:36,489][1651469] Saving new best policy, reward=46.740! [2024-06-15 11:41:38,068][1652491] Updated weights for policy 0, policy_version 12784 (0.0013) [2024-06-15 11:41:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 26214400. Throughput: 0: 11468.9. Samples: 6606848. Policy #0 lag: (min: 7.0, avg: 60.7, max: 263.0) [2024-06-15 11:41:40,956][1648985] Avg episode reward: [(0, '44.920')] [2024-06-15 11:41:43,840][1652491] Updated weights for policy 0, policy_version 12806 (0.0045) [2024-06-15 11:41:45,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 46967.5, 300 sec: 44209.0). Total num frames: 26411008. Throughput: 0: 11491.5. Samples: 6685696. Policy #0 lag: (min: 15.0, avg: 77.9, max: 271.0) [2024-06-15 11:41:45,956][1648985] Avg episode reward: [(0, '49.500')] [2024-06-15 11:41:46,129][1652491] Updated weights for policy 0, policy_version 12912 (0.0014) [2024-06-15 11:41:46,510][1651469] Saving new best policy, reward=49.500! [2024-06-15 11:41:48,089][1652491] Updated weights for policy 0, policy_version 12980 (0.0012) [2024-06-15 11:41:49,785][1652491] Updated weights for policy 0, policy_version 13046 (0.0012) [2024-06-15 11:41:50,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 46421.2, 300 sec: 44431.2). Total num frames: 26738688. Throughput: 0: 11036.4. Samples: 6736384. Policy #0 lag: (min: 15.0, avg: 77.9, max: 271.0) [2024-06-15 11:41:50,956][1648985] Avg episode reward: [(0, '51.620')] [2024-06-15 11:41:50,956][1651469] Saving new best policy, reward=51.620! [2024-06-15 11:41:55,956][1648985] Fps is (10 sec: 32766.5, 60 sec: 43690.7, 300 sec: 43986.8). Total num frames: 26738688. Throughput: 0: 11127.4. Samples: 6779392. 
Policy #0 lag: (min: 15.0, avg: 77.9, max: 271.0) [2024-06-15 11:41:55,957][1648985] Avg episode reward: [(0, '55.020')] [2024-06-15 11:41:55,968][1651469] Saving new best policy, reward=55.020! [2024-06-15 11:41:57,102][1652491] Updated weights for policy 0, policy_version 13089 (0.0171) [2024-06-15 11:41:59,250][1652491] Updated weights for policy 0, policy_version 13168 (0.0108) [2024-06-15 11:42:00,703][1652491] Updated weights for policy 0, policy_version 13219 (0.0011) [2024-06-15 11:42:00,966][1648985] Fps is (10 sec: 36045.0, 60 sec: 44782.9, 300 sec: 43875.8). Total num frames: 27099136. Throughput: 0: 11332.4. Samples: 6838272. Policy #0 lag: (min: 15.0, avg: 61.0, max: 271.0) [2024-06-15 11:42:00,966][1648985] Avg episode reward: [(0, '56.440')] [2024-06-15 11:42:01,383][1651469] Saving new best policy, reward=56.440! [2024-06-15 11:42:02,750][1652491] Updated weights for policy 0, policy_version 13308 (0.0015) [2024-06-15 11:42:05,955][1648985] Fps is (10 sec: 52431.0, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 27262976. Throughput: 0: 10763.4. Samples: 6899712. Policy #0 lag: (min: 15.0, avg: 61.0, max: 271.0) [2024-06-15 11:42:05,956][1648985] Avg episode reward: [(0, '54.060')] [2024-06-15 11:42:09,496][1652491] Updated weights for policy 0, policy_version 13376 (0.0012) [2024-06-15 11:42:10,955][1648985] Fps is (10 sec: 32767.2, 60 sec: 43690.6, 300 sec: 44097.9). Total num frames: 27426816. Throughput: 0: 11081.9. Samples: 6937600. Policy #0 lag: (min: 1.0, avg: 61.6, max: 257.0) [2024-06-15 11:42:10,956][1648985] Avg episode reward: [(0, '49.620')] [2024-06-15 11:42:11,499][1652491] Updated weights for policy 0, policy_version 13426 (0.0014) [2024-06-15 11:42:11,501][1651469] Signal inference workers to stop experience collection... (750 times) [2024-06-15 11:42:11,549][1652491] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-06-15 11:42:11,710][1651469] Signal inference workers to resume experience collection... 
(750 times) [2024-06-15 11:42:11,711][1652491] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-06-15 11:42:12,779][1652491] Updated weights for policy 0, policy_version 13473 (0.0011) [2024-06-15 11:42:14,495][1652491] Updated weights for policy 0, policy_version 13541 (0.0096) [2024-06-15 11:42:15,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.9, 300 sec: 44431.2). Total num frames: 27787264. Throughput: 0: 10740.6. Samples: 6994432. Policy #0 lag: (min: 1.0, avg: 61.6, max: 257.0) [2024-06-15 11:42:15,956][1648985] Avg episode reward: [(0, '51.180')] [2024-06-15 11:42:20,668][1652491] Updated weights for policy 0, policy_version 13586 (0.0012) [2024-06-15 11:42:20,955][1648985] Fps is (10 sec: 42599.6, 60 sec: 44783.0, 300 sec: 44320.2). Total num frames: 27852800. Throughput: 0: 10934.0. Samples: 7070720. Policy #0 lag: (min: 11.0, avg: 80.7, max: 267.0) [2024-06-15 11:42:20,956][1648985] Avg episode reward: [(0, '53.400')] [2024-06-15 11:42:21,595][1652491] Updated weights for policy 0, policy_version 13632 (0.0012) [2024-06-15 11:42:23,632][1652491] Updated weights for policy 0, policy_version 13701 (0.0011) [2024-06-15 11:42:25,394][1652491] Updated weights for policy 0, policy_version 13776 (0.0012) [2024-06-15 11:42:25,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 44236.9, 300 sec: 44209.0). Total num frames: 28246016. Throughput: 0: 10865.8. Samples: 7095808. Policy #0 lag: (min: 11.0, avg: 80.7, max: 267.0) [2024-06-15 11:42:25,956][1648985] Avg episode reward: [(0, '54.680')] [2024-06-15 11:42:30,956][1648985] Fps is (10 sec: 45873.5, 60 sec: 43690.5, 300 sec: 44431.2). Total num frames: 28311552. Throughput: 0: 10729.2. Samples: 7168512. 
Policy #0 lag: (min: 11.0, avg: 80.7, max: 267.0) [2024-06-15 11:42:30,957][1648985] Avg episode reward: [(0, '53.590')] [2024-06-15 11:42:31,879][1652491] Updated weights for policy 0, policy_version 13825 (0.0012) [2024-06-15 11:42:33,098][1652491] Updated weights for policy 0, policy_version 13888 (0.0014) [2024-06-15 11:42:35,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 43690.5, 300 sec: 44320.1). Total num frames: 28606464. Throughput: 0: 10854.4. Samples: 7224832. Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 11:42:35,956][1648985] Avg episode reward: [(0, '59.320')] [2024-06-15 11:42:36,035][1652491] Updated weights for policy 0, policy_version 13971 (0.0012) [2024-06-15 11:42:36,790][1651469] Saving new best policy, reward=59.320! [2024-06-15 11:42:38,205][1652491] Updated weights for policy 0, policy_version 14053 (0.0012) [2024-06-15 11:42:40,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 43690.8, 300 sec: 44431.2). Total num frames: 28835840. Throughput: 0: 10490.4. Samples: 7251456. Policy #0 lag: (min: 15.0, avg: 78.0, max: 271.0) [2024-06-15 11:42:40,956][1648985] Avg episode reward: [(0, '60.500')] [2024-06-15 11:42:40,959][1651469] Saving new best policy, reward=60.500! [2024-06-15 11:42:44,583][1652491] Updated weights for policy 0, policy_version 14097 (0.0014) [2024-06-15 11:42:45,615][1652491] Updated weights for policy 0, policy_version 14144 (0.0087) [2024-06-15 11:42:45,955][1648985] Fps is (10 sec: 36044.2, 60 sec: 42598.3, 300 sec: 44320.1). Total num frames: 28966912. Throughput: 0: 10922.6. Samples: 7329792. Policy #0 lag: (min: 2.0, avg: 69.3, max: 258.0) [2024-06-15 11:42:45,956][1648985] Avg episode reward: [(0, '60.440')] [2024-06-15 11:42:48,456][1652491] Updated weights for policy 0, policy_version 14241 (0.0012) [2024-06-15 11:42:50,088][1652491] Updated weights for policy 0, policy_version 14320 (0.0013) [2024-06-15 11:42:50,958][1648985] Fps is (10 sec: 52412.5, 60 sec: 43688.4, 300 sec: 44430.7). 
Total num frames: 29360128. Throughput: 0: 10739.9. Samples: 7383040. Policy #0 lag: (min: 2.0, avg: 69.3, max: 258.0) [2024-06-15 11:42:50,959][1648985] Avg episode reward: [(0, '64.620')] [2024-06-15 11:42:50,960][1651469] Saving new best policy, reward=64.620! [2024-06-15 11:42:55,550][1651469] Signal inference workers to stop experience collection... (800 times) [2024-06-15 11:42:55,624][1652491] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-06-15 11:42:55,772][1651469] Signal inference workers to resume experience collection... (800 times) [2024-06-15 11:42:55,773][1652491] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-06-15 11:42:55,952][1652491] Updated weights for policy 0, policy_version 14356 (0.0012) [2024-06-15 11:42:55,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 44237.0, 300 sec: 44542.2). Total num frames: 29392896. Throughput: 0: 10808.9. Samples: 7424000. Policy #0 lag: (min: 10.0, avg: 80.4, max: 266.0) [2024-06-15 11:42:55,956][1648985] Avg episode reward: [(0, '58.730')] [2024-06-15 11:42:56,428][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000014384_29458432.pth... [2024-06-15 11:42:56,487][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000009216_18874368.pth [2024-06-15 11:42:58,273][1652491] Updated weights for policy 0, policy_version 14416 (0.0116) [2024-06-15 11:42:59,760][1652491] Updated weights for policy 0, policy_version 14466 (0.0014) [2024-06-15 11:43:00,955][1648985] Fps is (10 sec: 36055.8, 60 sec: 43690.6, 300 sec: 44320.1). Total num frames: 29720576. Throughput: 0: 11082.0. Samples: 7493120. 
Policy #0 lag: (min: 10.0, avg: 80.4, max: 266.0) [2024-06-15 11:43:00,956][1648985] Avg episode reward: [(0, '54.930')] [2024-06-15 11:43:00,975][1652491] Updated weights for policy 0, policy_version 14514 (0.0012) [2024-06-15 11:43:02,500][1652491] Updated weights for policy 0, policy_version 14586 (0.0016) [2024-06-15 11:43:05,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 29884416. Throughput: 0: 11002.3. Samples: 7565824. Policy #0 lag: (min: 10.0, avg: 80.4, max: 266.0) [2024-06-15 11:43:05,956][1648985] Avg episode reward: [(0, '55.190')] [2024-06-15 11:43:06,876][1652491] Updated weights for policy 0, policy_version 14626 (0.0012) [2024-06-15 11:43:10,099][1652491] Updated weights for policy 0, policy_version 14678 (0.0013) [2024-06-15 11:43:10,955][1648985] Fps is (10 sec: 39322.7, 60 sec: 44783.2, 300 sec: 44653.4). Total num frames: 30113792. Throughput: 0: 11207.2. Samples: 7600128. Policy #0 lag: (min: 18.0, avg: 114.5, max: 274.0) [2024-06-15 11:43:10,955][1648985] Avg episode reward: [(0, '61.280')] [2024-06-15 11:43:12,162][1652491] Updated weights for policy 0, policy_version 14753 (0.0014) [2024-06-15 11:43:13,246][1652491] Updated weights for policy 0, policy_version 14816 (0.0012) [2024-06-15 11:43:15,985][1648985] Fps is (10 sec: 52274.9, 60 sec: 43669.3, 300 sec: 44426.8). Total num frames: 30408704. Throughput: 0: 10881.5. Samples: 7658496. Policy #0 lag: (min: 18.0, avg: 114.5, max: 274.0) [2024-06-15 11:43:15,986][1648985] Avg episode reward: [(0, '56.060')] [2024-06-15 11:43:17,357][1652491] Updated weights for policy 0, policy_version 14849 (0.0016) [2024-06-15 11:43:18,726][1652491] Updated weights for policy 0, policy_version 14903 (0.0013) [2024-06-15 11:43:20,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 44782.9, 300 sec: 44542.3). Total num frames: 30539776. Throughput: 0: 11286.8. Samples: 7732736. 
Policy #0 lag: (min: 14.0, avg: 114.2, max: 270.0) [2024-06-15 11:43:20,956][1648985] Avg episode reward: [(0, '57.920')] [2024-06-15 11:43:22,493][1652491] Updated weights for policy 0, policy_version 14945 (0.0015) [2024-06-15 11:43:24,156][1652491] Updated weights for policy 0, policy_version 15009 (0.0012) [2024-06-15 11:43:25,658][1652491] Updated weights for policy 0, policy_version 15076 (0.0012) [2024-06-15 11:43:25,955][1648985] Fps is (10 sec: 49297.0, 60 sec: 44236.8, 300 sec: 44432.2). Total num frames: 30900224. Throughput: 0: 11298.1. Samples: 7759872. Policy #0 lag: (min: 14.0, avg: 114.2, max: 270.0) [2024-06-15 11:43:25,956][1648985] Avg episode reward: [(0, '56.480')] [2024-06-15 11:43:30,018][1652491] Updated weights for policy 0, policy_version 15136 (0.0013) [2024-06-15 11:43:30,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.5, 300 sec: 44875.5). Total num frames: 31064064. Throughput: 0: 11025.1. Samples: 7825920. Policy #0 lag: (min: 14.0, avg: 131.1, max: 270.0) [2024-06-15 11:43:30,955][1648985] Avg episode reward: [(0, '62.340')] [2024-06-15 11:43:33,779][1652491] Updated weights for policy 0, policy_version 15171 (0.0014) [2024-06-15 11:43:35,677][1651469] Signal inference workers to stop experience collection... (850 times) [2024-06-15 11:43:35,725][1652491] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-06-15 11:43:35,875][1651469] Signal inference workers to resume experience collection... (850 times) [2024-06-15 11:43:35,882][1652491] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-06-15 11:43:35,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 44236.7, 300 sec: 44542.3). Total num frames: 31260672. Throughput: 0: 11264.7. Samples: 7889920. 
Policy #0 lag: (min: 14.0, avg: 131.1, max: 270.0) [2024-06-15 11:43:35,956][1648985] Avg episode reward: [(0, '63.150')] [2024-06-15 11:43:36,025][1652491] Updated weights for policy 0, policy_version 15267 (0.0148) [2024-06-15 11:43:37,458][1652491] Updated weights for policy 0, policy_version 15330 (0.0069) [2024-06-15 11:43:40,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 31457280. Throughput: 0: 11047.9. Samples: 7921152. Policy #0 lag: (min: 14.0, avg: 131.1, max: 270.0) [2024-06-15 11:43:40,956][1648985] Avg episode reward: [(0, '60.290')] [2024-06-15 11:43:41,639][1652491] Updated weights for policy 0, policy_version 15363 (0.0011) [2024-06-15 11:43:42,892][1652491] Updated weights for policy 0, policy_version 15415 (0.0040) [2024-06-15 11:43:45,955][1648985] Fps is (10 sec: 36045.2, 60 sec: 44236.9, 300 sec: 44542.3). Total num frames: 31621120. Throughput: 0: 11161.6. Samples: 7995392. Policy #0 lag: (min: 13.0, avg: 120.8, max: 269.0) [2024-06-15 11:43:45,956][1648985] Avg episode reward: [(0, '66.030')] [2024-06-15 11:43:46,525][1651469] Saving new best policy, reward=66.030! [2024-06-15 11:43:46,527][1652491] Updated weights for policy 0, policy_version 15472 (0.0014) [2024-06-15 11:43:48,200][1652491] Updated weights for policy 0, policy_version 15552 (0.0012) [2024-06-15 11:43:49,702][1652491] Updated weights for policy 0, policy_version 15614 (0.0016) [2024-06-15 11:43:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43692.9, 300 sec: 44431.2). Total num frames: 31981568. Throughput: 0: 10854.4. Samples: 8054272. Policy #0 lag: (min: 13.0, avg: 120.8, max: 269.0) [2024-06-15 11:43:50,956][1648985] Avg episode reward: [(0, '60.820')] [2024-06-15 11:43:54,493][1652491] Updated weights for policy 0, policy_version 15674 (0.0012) [2024-06-15 11:43:55,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45329.2, 300 sec: 44875.5). Total num frames: 32112640. Throughput: 0: 10990.9. Samples: 8094720. 
Policy #0 lag: (min: 5.0, avg: 105.1, max: 261.0) [2024-06-15 11:43:55,956][1648985] Avg episode reward: [(0, '59.470')] [2024-06-15 11:43:58,434][1652491] Updated weights for policy 0, policy_version 15717 (0.0017) [2024-06-15 11:43:59,588][1652491] Updated weights for policy 0, policy_version 15777 (0.0012) [2024-06-15 11:44:00,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 44783.1, 300 sec: 44542.3). Total num frames: 32407552. Throughput: 0: 11203.1. Samples: 8162304. Policy #0 lag: (min: 5.0, avg: 105.1, max: 261.0) [2024-06-15 11:44:00,955][1648985] Avg episode reward: [(0, '63.950')] [2024-06-15 11:44:01,185][1652491] Updated weights for policy 0, policy_version 15840 (0.0011) [2024-06-15 11:44:05,803][1652491] Updated weights for policy 0, policy_version 15889 (0.0058) [2024-06-15 11:44:05,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 44542.3). Total num frames: 32538624. Throughput: 0: 11002.3. Samples: 8227840. Policy #0 lag: (min: 13.0, avg: 125.2, max: 269.0) [2024-06-15 11:44:05,956][1648985] Avg episode reward: [(0, '62.750')] [2024-06-15 11:44:09,107][1652491] Updated weights for policy 0, policy_version 15954 (0.0021) [2024-06-15 11:44:10,313][1652491] Updated weights for policy 0, policy_version 16008 (0.0012) [2024-06-15 11:44:10,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45328.9, 300 sec: 44986.6). Total num frames: 32833536. Throughput: 0: 11218.5. Samples: 8264704. Policy #0 lag: (min: 13.0, avg: 125.2, max: 269.0) [2024-06-15 11:44:10,956][1648985] Avg episode reward: [(0, '60.410')] [2024-06-15 11:44:11,787][1652491] Updated weights for policy 0, policy_version 16080 (0.0013) [2024-06-15 11:44:15,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 43712.2, 300 sec: 44431.2). Total num frames: 33030144. Throughput: 0: 11104.7. Samples: 8325632. 
Policy #0 lag: (min: 13.0, avg: 125.2, max: 269.0) [2024-06-15 11:44:15,956][1648985] Avg episode reward: [(0, '61.490')] [2024-06-15 11:44:16,889][1652491] Updated weights for policy 0, policy_version 16130 (0.0012) [2024-06-15 11:44:17,633][1651469] Signal inference workers to stop experience collection... (900 times) [2024-06-15 11:44:17,704][1652491] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-06-15 11:44:17,844][1651469] Signal inference workers to resume experience collection... (900 times) [2024-06-15 11:44:17,845][1652491] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-06-15 11:44:20,705][1652491] Updated weights for policy 0, policy_version 16194 (0.0014) [2024-06-15 11:44:20,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 44236.7, 300 sec: 44875.5). Total num frames: 33193984. Throughput: 0: 11423.3. Samples: 8403968. Policy #0 lag: (min: 19.0, avg: 110.3, max: 275.0) [2024-06-15 11:44:20,956][1648985] Avg episode reward: [(0, '64.850')] [2024-06-15 11:44:21,925][1652491] Updated weights for policy 0, policy_version 16256 (0.0012) [2024-06-15 11:44:23,683][1652491] Updated weights for policy 0, policy_version 16336 (0.0119) [2024-06-15 11:44:25,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 44236.8, 300 sec: 44544.6). Total num frames: 33554432. Throughput: 0: 11286.7. Samples: 8429056. Policy #0 lag: (min: 19.0, avg: 110.3, max: 275.0) [2024-06-15 11:44:25,956][1648985] Avg episode reward: [(0, '65.820')] [2024-06-15 11:44:29,449][1652491] Updated weights for policy 0, policy_version 16432 (0.0077) [2024-06-15 11:44:30,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 43690.6, 300 sec: 44875.5). Total num frames: 33685504. Throughput: 0: 11127.5. Samples: 8496128. Policy #0 lag: (min: 47.0, avg: 153.2, max: 303.0) [2024-06-15 11:44:30,956][1648985] Avg episode reward: [(0, '73.160')] [2024-06-15 11:44:30,957][1651469] Saving new best policy, reward=73.160! 
[2024-06-15 11:44:33,437][1652491] Updated weights for policy 0, policy_version 16485 (0.0017) [2024-06-15 11:44:34,966][1652491] Updated weights for policy 0, policy_version 16545 (0.0013) [2024-06-15 11:44:35,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45329.1, 300 sec: 44986.6). Total num frames: 33980416. Throughput: 0: 11275.3. Samples: 8561664. Policy #0 lag: (min: 47.0, avg: 153.2, max: 303.0) [2024-06-15 11:44:35,956][1648985] Avg episode reward: [(0, '74.710')] [2024-06-15 11:44:36,278][1651469] Saving new best policy, reward=74.710! [2024-06-15 11:44:36,929][1652491] Updated weights for policy 0, policy_version 16637 (0.0013) [2024-06-15 11:44:40,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 44783.0, 300 sec: 44653.3). Total num frames: 34144256. Throughput: 0: 11116.1. Samples: 8594944. Policy #0 lag: (min: 47.0, avg: 153.2, max: 303.0) [2024-06-15 11:44:40,956][1648985] Avg episode reward: [(0, '67.340')] [2024-06-15 11:44:41,181][1652491] Updated weights for policy 0, policy_version 16690 (0.0011) [2024-06-15 11:44:44,849][1652491] Updated weights for policy 0, policy_version 16736 (0.0013) [2024-06-15 11:44:45,955][1648985] Fps is (10 sec: 39322.8, 60 sec: 45875.3, 300 sec: 44986.6). Total num frames: 34373632. Throughput: 0: 11207.1. Samples: 8666624. Policy #0 lag: (min: 63.0, avg: 174.7, max: 319.0) [2024-06-15 11:44:45,955][1648985] Avg episode reward: [(0, '64.810')] [2024-06-15 11:44:47,256][1652491] Updated weights for policy 0, policy_version 16848 (0.0080) [2024-06-15 11:44:50,956][1648985] Fps is (10 sec: 45874.6, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 34603008. Throughput: 0: 11093.3. Samples: 8727040. Policy #0 lag: (min: 63.0, avg: 174.7, max: 319.0) [2024-06-15 11:44:50,957][1648985] Avg episode reward: [(0, '70.300')] [2024-06-15 11:44:52,220][1652491] Updated weights for policy 0, policy_version 16915 (0.0021) [2024-06-15 11:44:55,955][1648985] Fps is (10 sec: 36043.0, 60 sec: 43690.4, 300 sec: 44764.4). 
Total num frames: 34734080. Throughput: 0: 11036.4. Samples: 8761344. Policy #0 lag: (min: 63.0, avg: 174.7, max: 319.0) [2024-06-15 11:44:55,956][1648985] Avg episode reward: [(0, '74.820')] [2024-06-15 11:44:56,057][1652491] Updated weights for policy 0, policy_version 16976 (0.0012) [2024-06-15 11:44:56,527][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000016992_34799616.pth... [2024-06-15 11:44:56,686][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000011776_24117248.pth [2024-06-15 11:44:56,711][1651469] Saving new best policy, reward=74.820! [2024-06-15 11:44:57,936][1652491] Updated weights for policy 0, policy_version 17049 (0.0013) [2024-06-15 11:44:58,808][1651469] Signal inference workers to stop experience collection... (950 times) [2024-06-15 11:44:58,836][1652491] Updated weights for policy 0, policy_version 17089 (0.0016) [2024-06-15 11:44:58,892][1652491] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-06-15 11:44:59,078][1651469] Signal inference workers to resume experience collection... (950 times) [2024-06-15 11:44:59,079][1652491] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-06-15 11:45:00,012][1652491] Updated weights for policy 0, policy_version 17145 (0.0014) [2024-06-15 11:45:00,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45329.0, 300 sec: 44542.3). Total num frames: 35127296. Throughput: 0: 11138.8. Samples: 8826880. Policy #0 lag: (min: 12.0, avg: 84.0, max: 268.0) [2024-06-15 11:45:00,956][1648985] Avg episode reward: [(0, '69.790')] [2024-06-15 11:45:03,772][1652491] Updated weights for policy 0, policy_version 17188 (0.0124) [2024-06-15 11:45:05,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45329.0, 300 sec: 44875.5). Total num frames: 35258368. Throughput: 0: 11047.8. Samples: 8901120. 
Policy #0 lag: (min: 12.0, avg: 84.0, max: 268.0) [2024-06-15 11:45:05,956][1648985] Avg episode reward: [(0, '68.290')] [2024-06-15 11:45:07,849][1652491] Updated weights for policy 0, policy_version 17235 (0.0013) [2024-06-15 11:45:09,344][1652491] Updated weights for policy 0, policy_version 17312 (0.0018) [2024-06-15 11:45:10,760][1652491] Updated weights for policy 0, policy_version 17376 (0.0013) [2024-06-15 11:45:10,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 45875.1, 300 sec: 44764.4). Total num frames: 35586048. Throughput: 0: 11241.2. Samples: 8934912. Policy #0 lag: (min: 47.0, avg: 122.6, max: 303.0) [2024-06-15 11:45:10,956][1648985] Avg episode reward: [(0, '76.510')] [2024-06-15 11:45:11,548][1651469] Saving new best policy, reward=76.510! [2024-06-15 11:45:14,244][1652491] Updated weights for policy 0, policy_version 17424 (0.0013) [2024-06-15 11:45:15,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 45875.2, 300 sec: 44875.5). Total num frames: 35782656. Throughput: 0: 11218.5. Samples: 9000960. Policy #0 lag: (min: 47.0, avg: 122.6, max: 303.0) [2024-06-15 11:45:15,956][1648985] Avg episode reward: [(0, '73.110')] [2024-06-15 11:45:19,165][1652491] Updated weights for policy 0, policy_version 17488 (0.0013) [2024-06-15 11:45:20,955][1648985] Fps is (10 sec: 36045.6, 60 sec: 45875.3, 300 sec: 44653.6). Total num frames: 35946496. Throughput: 0: 11150.3. Samples: 9063424. Policy #0 lag: (min: 47.0, avg: 122.6, max: 303.0) [2024-06-15 11:45:20,956][1648985] Avg episode reward: [(0, '64.830')] [2024-06-15 11:45:21,613][1652491] Updated weights for policy 0, policy_version 17584 (0.0013) [2024-06-15 11:45:22,616][1652491] Updated weights for policy 0, policy_version 17632 (0.0011) [2024-06-15 11:45:23,401][1652491] Updated weights for policy 0, policy_version 17663 (0.0016) [2024-06-15 11:45:25,956][1648985] Fps is (10 sec: 39318.6, 60 sec: 43690.2, 300 sec: 44431.1). Total num frames: 36175872. Throughput: 0: 11138.7. Samples: 9096192. 
Policy #0 lag: (min: 57.0, avg: 143.5, max: 297.0) [2024-06-15 11:45:25,957][1648985] Avg episode reward: [(0, '75.920')] [2024-06-15 11:45:26,627][1652491] Updated weights for policy 0, policy_version 17703 (0.0015) [2024-06-15 11:45:30,073][1652491] Updated weights for policy 0, policy_version 17744 (0.0014) [2024-06-15 11:45:30,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45329.1, 300 sec: 44875.5). Total num frames: 36405248. Throughput: 0: 11502.9. Samples: 9184256. Policy #0 lag: (min: 57.0, avg: 143.5, max: 297.0) [2024-06-15 11:45:30,956][1648985] Avg episode reward: [(0, '81.410')] [2024-06-15 11:45:31,458][1651469] Saving new best policy, reward=81.410! [2024-06-15 11:45:32,242][1652491] Updated weights for policy 0, policy_version 17840 (0.0014) [2024-06-15 11:45:32,676][1652491] Updated weights for policy 0, policy_version 17854 (0.0011) [2024-06-15 11:45:34,191][1652491] Updated weights for policy 0, policy_version 17918 (0.0014) [2024-06-15 11:45:35,955][1648985] Fps is (10 sec: 52432.9, 60 sec: 45329.2, 300 sec: 44431.2). Total num frames: 36700160. Throughput: 0: 11491.6. Samples: 9244160. Policy #0 lag: (min: 48.0, avg: 187.1, max: 294.0) [2024-06-15 11:45:35,956][1648985] Avg episode reward: [(0, '73.530')] [2024-06-15 11:45:38,222][1652491] Updated weights for policy 0, policy_version 17976 (0.0021) [2024-06-15 11:45:40,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 44782.8, 300 sec: 44875.5). Total num frames: 36831232. Throughput: 0: 11571.2. Samples: 9282048. Policy #0 lag: (min: 48.0, avg: 187.1, max: 294.0) [2024-06-15 11:45:40,956][1648985] Avg episode reward: [(0, '73.640')] [2024-06-15 11:45:41,742][1652491] Updated weights for policy 0, policy_version 18032 (0.0015) [2024-06-15 11:45:42,829][1651469] Signal inference workers to stop experience collection... 
(1000 times) [2024-06-15 11:45:42,881][1652491] InferenceWorker_p0-w0: stopping experience collection (1000 times) [2024-06-15 11:45:43,031][1651469] Signal inference workers to resume experience collection... (1000 times) [2024-06-15 11:45:43,042][1652491] InferenceWorker_p0-w0: resuming experience collection (1000 times) [2024-06-15 11:45:43,935][1652491] Updated weights for policy 0, policy_version 18096 (0.0013) [2024-06-15 11:45:45,265][1652491] Updated weights for policy 0, policy_version 18144 (0.0012) [2024-06-15 11:45:45,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 46967.3, 300 sec: 44875.5). Total num frames: 37191680. Throughput: 0: 11571.2. Samples: 9347584. Policy #0 lag: (min: 48.0, avg: 187.1, max: 294.0) [2024-06-15 11:45:45,956][1648985] Avg episode reward: [(0, '76.680')] [2024-06-15 11:45:48,273][1652491] Updated weights for policy 0, policy_version 18179 (0.0027) [2024-06-15 11:45:50,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 45875.3, 300 sec: 44875.6). Total num frames: 37355520. Throughput: 0: 11457.5. Samples: 9416704. Policy #0 lag: (min: 15.0, avg: 137.3, max: 271.0) [2024-06-15 11:45:50,956][1648985] Avg episode reward: [(0, '75.760')] [2024-06-15 11:45:52,720][1652491] Updated weights for policy 0, policy_version 18256 (0.0013) [2024-06-15 11:45:53,708][1652491] Updated weights for policy 0, policy_version 18304 (0.0020) [2024-06-15 11:45:55,892][1652491] Updated weights for policy 0, policy_version 18360 (0.0013) [2024-06-15 11:45:55,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 47514.0, 300 sec: 44653.4). Total num frames: 37584896. Throughput: 0: 11491.6. Samples: 9452032. 
Policy #0 lag: (min: 15.0, avg: 137.3, max: 271.0) [2024-06-15 11:45:55,955][1648985] Avg episode reward: [(0, '75.640')] [2024-06-15 11:45:57,371][1652491] Updated weights for policy 0, policy_version 18432 (0.0013) [2024-06-15 11:46:00,843][1652491] Updated weights for policy 0, policy_version 18488 (0.0011) [2024-06-15 11:46:00,956][1648985] Fps is (10 sec: 49150.9, 60 sec: 45329.0, 300 sec: 44764.4). Total num frames: 37847040. Throughput: 0: 11480.1. Samples: 9517568. Policy #0 lag: (min: 15.0, avg: 137.3, max: 271.0) [2024-06-15 11:46:00,957][1648985] Avg episode reward: [(0, '72.410')] [2024-06-15 11:46:04,704][1652491] Updated weights for policy 0, policy_version 18522 (0.0031) [2024-06-15 11:46:05,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 45875.3, 300 sec: 44764.4). Total num frames: 38010880. Throughput: 0: 11753.2. Samples: 9592320. Policy #0 lag: (min: 15.0, avg: 93.6, max: 271.0) [2024-06-15 11:46:05,956][1648985] Avg episode reward: [(0, '69.260')] [2024-06-15 11:46:06,070][1652491] Updated weights for policy 0, policy_version 18564 (0.0034) [2024-06-15 11:46:07,912][1652491] Updated weights for policy 0, policy_version 18640 (0.0012) [2024-06-15 11:46:10,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 44783.1, 300 sec: 44431.3). Total num frames: 38273024. Throughput: 0: 11537.3. Samples: 9615360. Policy #0 lag: (min: 15.0, avg: 93.6, max: 271.0) [2024-06-15 11:46:10,956][1648985] Avg episode reward: [(0, '75.700')] [2024-06-15 11:46:11,799][1652491] Updated weights for policy 0, policy_version 18721 (0.0039) [2024-06-15 11:46:12,263][1652491] Updated weights for policy 0, policy_version 18746 (0.0014) [2024-06-15 11:46:15,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 44875.5). Total num frames: 38404096. Throughput: 0: 11355.0. Samples: 9695232. 
Policy #0 lag: (min: 15.0, avg: 93.6, max: 271.0) [2024-06-15 11:46:15,956][1648985] Avg episode reward: [(0, '78.760')] [2024-06-15 11:46:16,946][1652491] Updated weights for policy 0, policy_version 18800 (0.0013) [2024-06-15 11:46:18,721][1652491] Updated weights for policy 0, policy_version 18864 (0.0076) [2024-06-15 11:46:20,530][1652491] Updated weights for policy 0, policy_version 18941 (0.0130) [2024-06-15 11:46:20,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 47513.5, 300 sec: 44764.4). Total num frames: 38797312. Throughput: 0: 11252.6. Samples: 9750528. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 11:46:20,956][1648985] Avg episode reward: [(0, '80.080')] [2024-06-15 11:46:23,886][1652491] Updated weights for policy 0, policy_version 19000 (0.0012) [2024-06-15 11:46:25,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.7, 300 sec: 44875.5). Total num frames: 38928384. Throughput: 0: 11207.2. Samples: 9786368. Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 11:46:25,956][1648985] Avg episode reward: [(0, '84.160')] [2024-06-15 11:46:25,959][1651469] Saving new best policy, reward=84.160! [2024-06-15 11:46:28,329][1651469] Signal inference workers to stop experience collection... (1050 times) [2024-06-15 11:46:28,384][1652491] InferenceWorker_p0-w0: stopping experience collection (1050 times) [2024-06-15 11:46:28,397][1652491] Updated weights for policy 0, policy_version 19044 (0.0015) [2024-06-15 11:46:28,547][1651469] Signal inference workers to resume experience collection... (1050 times) [2024-06-15 11:46:28,553][1652491] InferenceWorker_p0-w0: resuming experience collection (1050 times) [2024-06-15 11:46:29,815][1652491] Updated weights for policy 0, policy_version 19108 (0.0018) [2024-06-15 11:46:30,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46967.4, 300 sec: 44875.5). Total num frames: 39223296. Throughput: 0: 11355.0. Samples: 9858560. 
Policy #0 lag: (min: 15.0, avg: 123.7, max: 271.0) [2024-06-15 11:46:30,956][1648985] Avg episode reward: [(0, '90.290')] [2024-06-15 11:46:31,151][1652491] Updated weights for policy 0, policy_version 19168 (0.0012) [2024-06-15 11:46:31,625][1651469] Saving new best policy, reward=90.290! [2024-06-15 11:46:32,027][1652491] Updated weights for policy 0, policy_version 19196 (0.0011) [2024-06-15 11:46:35,420][1652491] Updated weights for policy 0, policy_version 19264 (0.0013) [2024-06-15 11:46:35,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 44875.5). Total num frames: 39452672. Throughput: 0: 11286.8. Samples: 9924608. Policy #0 lag: (min: 68.0, avg: 196.8, max: 330.0) [2024-06-15 11:46:35,956][1648985] Avg episode reward: [(0, '83.950')] [2024-06-15 11:46:40,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 45875.4, 300 sec: 44653.3). Total num frames: 39583744. Throughput: 0: 11446.0. Samples: 9967104. Policy #0 lag: (min: 68.0, avg: 196.8, max: 330.0) [2024-06-15 11:46:40,956][1648985] Avg episode reward: [(0, '80.330')] [2024-06-15 11:46:42,039][1652491] Updated weights for policy 0, policy_version 19386 (0.0013) [2024-06-15 11:46:43,850][1652491] Updated weights for policy 0, policy_version 19440 (0.0016) [2024-06-15 11:46:45,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 39845888. Throughput: 0: 11150.3. Samples: 10019328. Policy #0 lag: (min: 68.0, avg: 196.8, max: 330.0) [2024-06-15 11:46:45,956][1648985] Avg episode reward: [(0, '80.350')] [2024-06-15 11:46:46,731][1652491] Updated weights for policy 0, policy_version 19489 (0.0011) [2024-06-15 11:46:50,928][1652491] Updated weights for policy 0, policy_version 19536 (0.0027) [2024-06-15 11:46:50,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 44986.6). Total num frames: 40009728. Throughput: 0: 11320.9. Samples: 10101760. 
Policy #0 lag: (min: 15.0, avg: 144.9, max: 271.0) [2024-06-15 11:46:50,956][1648985] Avg episode reward: [(0, '80.180')] [2024-06-15 11:46:52,246][1652491] Updated weights for policy 0, policy_version 19587 (0.0032) [2024-06-15 11:46:54,640][1652491] Updated weights for policy 0, policy_version 19669 (0.0016) [2024-06-15 11:46:55,598][1652491] Updated weights for policy 0, policy_version 19712 (0.0010) [2024-06-15 11:46:55,955][1648985] Fps is (10 sec: 52427.2, 60 sec: 46421.0, 300 sec: 44986.5). Total num frames: 40370176. Throughput: 0: 11377.7. Samples: 10127360. Policy #0 lag: (min: 15.0, avg: 144.9, max: 271.0) [2024-06-15 11:46:55,956][1648985] Avg episode reward: [(0, '79.220')] [2024-06-15 11:46:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000019712_40370176.pth... [2024-06-15 11:46:56,049][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000014384_29458432.pth [2024-06-15 11:46:57,848][1652491] Updated weights for policy 0, policy_version 19760 (0.0015) [2024-06-15 11:47:00,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 44237.0, 300 sec: 44875.5). Total num frames: 40501248. Throughput: 0: 11389.2. Samples: 10207744. Policy #0 lag: (min: 15.0, avg: 144.9, max: 271.0) [2024-06-15 11:47:00,956][1648985] Avg episode reward: [(0, '79.790')] [2024-06-15 11:47:02,753][1652491] Updated weights for policy 0, policy_version 19813 (0.0024) [2024-06-15 11:47:04,685][1652491] Updated weights for policy 0, policy_version 19902 (0.0047) [2024-06-15 11:47:05,955][1648985] Fps is (10 sec: 42599.8, 60 sec: 46421.4, 300 sec: 45319.9). Total num frames: 40796160. Throughput: 0: 11480.2. Samples: 10267136. 
Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 11:47:05,956][1648985] Avg episode reward: [(0, '76.690')] [2024-06-15 11:47:06,745][1652491] Updated weights for policy 0, policy_version 19960 (0.0013) [2024-06-15 11:47:08,800][1651469] Signal inference workers to stop experience collection... (1100 times) [2024-06-15 11:47:08,882][1652491] InferenceWorker_p0-w0: stopping experience collection (1100 times) [2024-06-15 11:47:09,069][1651469] Signal inference workers to resume experience collection... (1100 times) [2024-06-15 11:47:09,070][1652491] InferenceWorker_p0-w0: resuming experience collection (1100 times) [2024-06-15 11:47:09,072][1652491] Updated weights for policy 0, policy_version 20016 (0.0013) [2024-06-15 11:47:10,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 45875.0, 300 sec: 44875.5). Total num frames: 41025536. Throughput: 0: 11468.7. Samples: 10302464. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 11:47:10,956][1648985] Avg episode reward: [(0, '72.310')] [2024-06-15 11:47:14,447][1652491] Updated weights for policy 0, policy_version 20087 (0.0013) [2024-06-15 11:47:15,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46967.5, 300 sec: 45319.8). Total num frames: 41222144. Throughput: 0: 11457.4. Samples: 10374144. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 11:47:15,956][1648985] Avg episode reward: [(0, '78.610')] [2024-06-15 11:47:16,054][1652491] Updated weights for policy 0, policy_version 20132 (0.0012) [2024-06-15 11:47:17,656][1652491] Updated weights for policy 0, policy_version 20196 (0.0012) [2024-06-15 11:47:19,637][1652491] Updated weights for policy 0, policy_version 20256 (0.0091) [2024-06-15 11:47:20,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 45875.2, 300 sec: 45097.7). Total num frames: 41549824. Throughput: 0: 11571.2. Samples: 10445312. 
Policy #0 lag: (min: 44.0, avg: 146.6, max: 316.0) [2024-06-15 11:47:20,956][1648985] Avg episode reward: [(0, '82.760')] [2024-06-15 11:47:24,420][1652491] Updated weights for policy 0, policy_version 20305 (0.0011) [2024-06-15 11:47:25,663][1652491] Updated weights for policy 0, policy_version 20352 (0.0016) [2024-06-15 11:47:25,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.2, 300 sec: 45319.9). Total num frames: 41680896. Throughput: 0: 11685.0. Samples: 10492928. Policy #0 lag: (min: 44.0, avg: 146.6, max: 316.0) [2024-06-15 11:47:25,956][1648985] Avg episode reward: [(0, '76.050')] [2024-06-15 11:47:27,334][1652491] Updated weights for policy 0, policy_version 20413 (0.0021) [2024-06-15 11:47:28,597][1652491] Updated weights for policy 0, policy_version 20464 (0.0021) [2024-06-15 11:47:30,675][1652491] Updated weights for policy 0, policy_version 20512 (0.0011) [2024-06-15 11:47:30,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 46421.2, 300 sec: 45430.9). Total num frames: 42008576. Throughput: 0: 11810.1. Samples: 10550784. Policy #0 lag: (min: 31.0, avg: 182.3, max: 287.0) [2024-06-15 11:47:30,956][1648985] Avg episode reward: [(0, '79.550')] [2024-06-15 11:47:35,305][1652491] Updated weights for policy 0, policy_version 20545 (0.0012) [2024-06-15 11:47:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 44986.6). Total num frames: 42106880. Throughput: 0: 11685.0. Samples: 10627584. Policy #0 lag: (min: 31.0, avg: 182.3, max: 287.0) [2024-06-15 11:47:35,956][1648985] Avg episode reward: [(0, '82.900')] [2024-06-15 11:47:36,943][1652491] Updated weights for policy 0, policy_version 20607 (0.0027) [2024-06-15 11:47:39,708][1652491] Updated weights for policy 0, policy_version 20688 (0.0013) [2024-06-15 11:47:40,640][1652491] Updated weights for policy 0, policy_version 20734 (0.0013) [2024-06-15 11:47:40,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 48059.6, 300 sec: 45764.1). Total num frames: 42467328. Throughput: 0: 11741.9. 
Samples: 10655744. Policy #0 lag: (min: 31.0, avg: 182.3, max: 287.0) [2024-06-15 11:47:40,956][1648985] Avg episode reward: [(0, '86.500')] [2024-06-15 11:47:42,346][1652491] Updated weights for policy 0, policy_version 20784 (0.0012) [2024-06-15 11:47:45,961][1648985] Fps is (10 sec: 49123.2, 60 sec: 45870.7, 300 sec: 44875.1). Total num frames: 42598400. Throughput: 0: 11615.2. Samples: 10730496. Policy #0 lag: (min: 31.0, avg: 182.3, max: 287.0) [2024-06-15 11:47:45,979][1648985] Avg episode reward: [(0, '79.350')] [2024-06-15 11:47:46,608][1652491] Updated weights for policy 0, policy_version 20833 (0.0026) [2024-06-15 11:47:49,843][1652491] Updated weights for policy 0, policy_version 20902 (0.0032) [2024-06-15 11:47:50,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 48059.8, 300 sec: 45764.2). Total num frames: 42893312. Throughput: 0: 11821.5. Samples: 10799104. Policy #0 lag: (min: 7.0, avg: 95.9, max: 263.0) [2024-06-15 11:47:50,955][1648985] Avg episode reward: [(0, '81.210')] [2024-06-15 11:47:51,663][1652491] Updated weights for policy 0, policy_version 20991 (0.0093) [2024-06-15 11:47:52,597][1651469] Signal inference workers to stop experience collection... (1150 times) [2024-06-15 11:47:52,640][1652491] InferenceWorker_p0-w0: stopping experience collection (1150 times) [2024-06-15 11:47:52,771][1651469] Signal inference workers to resume experience collection... (1150 times) [2024-06-15 11:47:52,772][1652491] InferenceWorker_p0-w0: resuming experience collection (1150 times) [2024-06-15 11:47:53,189][1652491] Updated weights for policy 0, policy_version 21050 (0.0014) [2024-06-15 11:47:55,955][1648985] Fps is (10 sec: 52459.4, 60 sec: 45875.4, 300 sec: 45430.9). Total num frames: 43122688. Throughput: 0: 11776.1. Samples: 10832384. Policy #0 lag: (min: 7.0, avg: 95.9, max: 263.0) [2024-06-15 11:47:55,956][1648985] Avg episode reward: [(0, '90.550')] [2024-06-15 11:47:55,964][1651469] Saving new best policy, reward=90.550! 
[2024-06-15 11:47:58,004][1652491] Updated weights for policy 0, policy_version 21093 (0.0015) [2024-06-15 11:48:00,447][1652491] Updated weights for policy 0, policy_version 21122 (0.0015) [2024-06-15 11:48:00,956][1648985] Fps is (10 sec: 39320.7, 60 sec: 46421.2, 300 sec: 45430.9). Total num frames: 43286528. Throughput: 0: 11923.9. Samples: 10910720. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 11:48:00,957][1648985] Avg episode reward: [(0, '90.300')] [2024-06-15 11:48:01,980][1652491] Updated weights for policy 0, policy_version 21171 (0.0012) [2024-06-15 11:48:03,734][1652491] Updated weights for policy 0, policy_version 21251 (0.0092) [2024-06-15 11:48:04,616][1652491] Updated weights for policy 0, policy_version 21308 (0.0011) [2024-06-15 11:48:05,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 47513.5, 300 sec: 45875.2). Total num frames: 43646976. Throughput: 0: 11696.3. Samples: 10971648. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 11:48:05,956][1648985] Avg episode reward: [(0, '86.840')] [2024-06-15 11:48:09,605][1652491] Updated weights for policy 0, policy_version 21360 (0.0016) [2024-06-15 11:48:10,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45875.2, 300 sec: 45324.3). Total num frames: 43778048. Throughput: 0: 11571.1. Samples: 11013632. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 11:48:10,956][1648985] Avg episode reward: [(0, '85.870')] [2024-06-15 11:48:12,090][1652491] Updated weights for policy 0, policy_version 21394 (0.0020) [2024-06-15 11:48:13,835][1652491] Updated weights for policy 0, policy_version 21472 (0.0013) [2024-06-15 11:48:15,000][1652491] Updated weights for policy 0, policy_version 21520 (0.0018) [2024-06-15 11:48:15,962][1648985] Fps is (10 sec: 52392.6, 60 sec: 49146.3, 300 sec: 46207.3). Total num frames: 44171264. Throughput: 0: 11706.0. Samples: 11077632. 
Policy #0 lag: (min: 47.0, avg: 205.3, max: 351.0) [2024-06-15 11:48:15,963][1648985] Avg episode reward: [(0, '77.270')] [2024-06-15 11:48:19,110][1652491] Updated weights for policy 0, policy_version 21571 (0.0019) [2024-06-15 11:48:20,956][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.0, 300 sec: 45430.9). Total num frames: 44302336. Throughput: 0: 11696.3. Samples: 11153920. Policy #0 lag: (min: 47.0, avg: 205.3, max: 351.0) [2024-06-15 11:48:20,958][1648985] Avg episode reward: [(0, '79.800')] [2024-06-15 11:48:23,057][1652491] Updated weights for policy 0, policy_version 21649 (0.0116) [2024-06-15 11:48:24,736][1652491] Updated weights for policy 0, policy_version 21716 (0.0013) [2024-06-15 11:48:25,956][1648985] Fps is (10 sec: 39347.5, 60 sec: 48059.4, 300 sec: 45764.0). Total num frames: 44564480. Throughput: 0: 11821.5. Samples: 11187712. Policy #0 lag: (min: 47.0, avg: 205.3, max: 351.0) [2024-06-15 11:48:25,957][1648985] Avg episode reward: [(0, '83.410')] [2024-06-15 11:48:26,413][1652491] Updated weights for policy 0, policy_version 21792 (0.0013) [2024-06-15 11:48:30,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 44783.0, 300 sec: 45542.0). Total num frames: 44695552. Throughput: 0: 11754.7. Samples: 11259392. Policy #0 lag: (min: 47.0, avg: 205.3, max: 351.0) [2024-06-15 11:48:30,956][1648985] Avg episode reward: [(0, '88.690')] [2024-06-15 11:48:31,776][1652491] Updated weights for policy 0, policy_version 21856 (0.0014) [2024-06-15 11:48:34,314][1652491] Updated weights for policy 0, policy_version 21904 (0.0013) [2024-06-15 11:48:35,956][1648985] Fps is (10 sec: 42598.3, 60 sec: 48059.4, 300 sec: 45875.1). Total num frames: 44990464. Throughput: 0: 11639.3. Samples: 11322880. Policy #0 lag: (min: 15.0, avg: 110.7, max: 271.0) [2024-06-15 11:48:35,957][1648985] Avg episode reward: [(0, '86.340')] [2024-06-15 11:48:36,040][1651469] Signal inference workers to stop experience collection... 
(1200 times) [2024-06-15 11:48:36,082][1652491] InferenceWorker_p0-w0: stopping experience collection (1200 times) [2024-06-15 11:48:36,285][1651469] Signal inference workers to resume experience collection... (1200 times) [2024-06-15 11:48:36,286][1652491] InferenceWorker_p0-w0: resuming experience collection (1200 times) [2024-06-15 11:48:36,601][1652491] Updated weights for policy 0, policy_version 22000 (0.0069) [2024-06-15 11:48:38,245][1652491] Updated weights for policy 0, policy_version 22064 (0.0024) [2024-06-15 11:48:40,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.4, 300 sec: 46097.4). Total num frames: 45219840. Throughput: 0: 11548.5. Samples: 11352064. Policy #0 lag: (min: 15.0, avg: 110.7, max: 271.0) [2024-06-15 11:48:40,956][1648985] Avg episode reward: [(0, '88.670')] [2024-06-15 11:48:43,438][1652491] Updated weights for policy 0, policy_version 22128 (0.0118) [2024-06-15 11:48:45,955][1648985] Fps is (10 sec: 45876.9, 60 sec: 47518.2, 300 sec: 45653.0). Total num frames: 45449216. Throughput: 0: 11594.0. Samples: 11432448. Policy #0 lag: (min: 15.0, avg: 110.7, max: 271.0) [2024-06-15 11:48:45,956][1648985] Avg episode reward: [(0, '85.270')] [2024-06-15 11:48:46,045][1652491] Updated weights for policy 0, policy_version 22201 (0.0115) [2024-06-15 11:48:47,841][1652491] Updated weights for policy 0, policy_version 22264 (0.0020) [2024-06-15 11:48:49,591][1652491] Updated weights for policy 0, policy_version 22320 (0.0128) [2024-06-15 11:48:50,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47513.6, 300 sec: 46208.5). Total num frames: 45744128. Throughput: 0: 11628.1. Samples: 11494912. Policy #0 lag: (min: 15.0, avg: 152.3, max: 335.0) [2024-06-15 11:48:50,956][1648985] Avg episode reward: [(0, '82.110')] [2024-06-15 11:48:54,428][1652491] Updated weights for policy 0, policy_version 22357 (0.0034) [2024-06-15 11:48:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45875.1, 300 sec: 45653.0). Total num frames: 45875200. 
Throughput: 0: 11685.0. Samples: 11539456. Policy #0 lag: (min: 15.0, avg: 152.3, max: 335.0) [2024-06-15 11:48:55,956][1648985] Avg episode reward: [(0, '74.930')] [2024-06-15 11:48:56,296][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000022416_45907968.pth... [2024-06-15 11:48:56,461][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000016992_34799616.pth [2024-06-15 11:48:57,162][1652491] Updated weights for policy 0, policy_version 22448 (0.0018) [2024-06-15 11:48:58,439][1652491] Updated weights for policy 0, policy_version 22484 (0.0124) [2024-06-15 11:48:59,499][1652491] Updated weights for policy 0, policy_version 22525 (0.0013) [2024-06-15 11:49:00,883][1652491] Updated weights for policy 0, policy_version 22589 (0.0016) [2024-06-15 11:49:00,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 49698.2, 300 sec: 46541.7). Total num frames: 46268416. Throughput: 0: 11629.9. Samples: 11600896. Policy #0 lag: (min: 15.0, avg: 152.3, max: 335.0) [2024-06-15 11:49:00,956][1648985] Avg episode reward: [(0, '81.250')] [2024-06-15 11:49:05,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 44783.0, 300 sec: 45764.1). Total num frames: 46333952. Throughput: 0: 11594.0. Samples: 11675648. Policy #0 lag: (min: 15.0, avg: 91.8, max: 271.0) [2024-06-15 11:49:05,956][1648985] Avg episode reward: [(0, '82.860')] [2024-06-15 11:49:06,048][1652491] Updated weights for policy 0, policy_version 22631 (0.0013) [2024-06-15 11:49:07,825][1652491] Updated weights for policy 0, policy_version 22672 (0.0015) [2024-06-15 11:49:10,231][1652491] Updated weights for policy 0, policy_version 22768 (0.0013) [2024-06-15 11:49:10,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 48060.0, 300 sec: 46208.4). Total num frames: 46661632. Throughput: 0: 11548.6. Samples: 11707392. 
Policy #0 lag: (min: 15.0, avg: 91.8, max: 271.0) [2024-06-15 11:49:10,955][1648985] Avg episode reward: [(0, '81.570')] [2024-06-15 11:49:11,466][1652491] Updated weights for policy 0, policy_version 22804 (0.0014) [2024-06-15 11:49:15,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 43695.7, 300 sec: 46097.4). Total num frames: 46792704. Throughput: 0: 11514.3. Samples: 11777536. Policy #0 lag: (min: 15.0, avg: 91.8, max: 271.0) [2024-06-15 11:49:15,956][1648985] Avg episode reward: [(0, '87.150')] [2024-06-15 11:49:17,252][1652491] Updated weights for policy 0, policy_version 22864 (0.0012) [2024-06-15 11:49:18,345][1652491] Updated weights for policy 0, policy_version 22912 (0.0058) [2024-06-15 11:49:20,628][1652491] Updated weights for policy 0, policy_version 22984 (0.0013) [2024-06-15 11:49:20,844][1651469] Signal inference workers to stop experience collection... (1250 times) [2024-06-15 11:49:20,882][1652491] InferenceWorker_p0-w0: stopping experience collection (1250 times) [2024-06-15 11:49:20,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 47087616. Throughput: 0: 11639.5. Samples: 11846656. Policy #0 lag: (min: 15.0, avg: 91.8, max: 271.0) [2024-06-15 11:49:20,956][1648985] Avg episode reward: [(0, '84.240')] [2024-06-15 11:49:21,110][1651469] Signal inference workers to resume experience collection... (1250 times) [2024-06-15 11:49:21,111][1652491] InferenceWorker_p0-w0: resuming experience collection (1250 times) [2024-06-15 11:49:22,536][1652491] Updated weights for policy 0, policy_version 23058 (0.0012) [2024-06-15 11:49:23,426][1652491] Updated weights for policy 0, policy_version 23099 (0.0013) [2024-06-15 11:49:25,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.4, 300 sec: 46208.4). Total num frames: 47316992. Throughput: 0: 11662.2. Samples: 11876864. 
Policy #0 lag: (min: 26.0, avg: 150.7, max: 298.0) [2024-06-15 11:49:25,956][1648985] Avg episode reward: [(0, '82.980')] [2024-06-15 11:49:29,537][1652491] Updated weights for policy 0, policy_version 23143 (0.0021) [2024-06-15 11:49:30,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 47513600. Throughput: 0: 11605.3. Samples: 11954688. Policy #0 lag: (min: 26.0, avg: 150.7, max: 298.0) [2024-06-15 11:49:30,956][1648985] Avg episode reward: [(0, '80.360')] [2024-06-15 11:49:31,267][1652491] Updated weights for policy 0, policy_version 23229 (0.0029) [2024-06-15 11:49:32,771][1652491] Updated weights for policy 0, policy_version 23280 (0.0147) [2024-06-15 11:49:34,386][1652491] Updated weights for policy 0, policy_version 23344 (0.0100) [2024-06-15 11:49:35,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 47513.9, 300 sec: 46430.6). Total num frames: 47841280. Throughput: 0: 11525.7. Samples: 12013568. Policy #0 lag: (min: 26.0, avg: 150.7, max: 298.0) [2024-06-15 11:49:35,956][1648985] Avg episode reward: [(0, '81.210')] [2024-06-15 11:49:40,750][1652491] Updated weights for policy 0, policy_version 23392 (0.0013) [2024-06-15 11:49:40,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 44782.9, 300 sec: 45875.2). Total num frames: 47906816. Throughput: 0: 11480.2. Samples: 12056064. Policy #0 lag: (min: 2.0, avg: 62.1, max: 258.0) [2024-06-15 11:49:40,956][1648985] Avg episode reward: [(0, '81.240')] [2024-06-15 11:49:42,507][1652491] Updated weights for policy 0, policy_version 23472 (0.0013) [2024-06-15 11:49:44,707][1652491] Updated weights for policy 0, policy_version 23536 (0.0044) [2024-06-15 11:49:45,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46967.5, 300 sec: 46319.5). Total num frames: 48267264. Throughput: 0: 11502.9. Samples: 12118528. 
Policy #0 lag: (min: 2.0, avg: 62.1, max: 258.0) [2024-06-15 11:49:45,956][1648985] Avg episode reward: [(0, '85.510')] [2024-06-15 11:49:46,640][1652491] Updated weights for policy 0, policy_version 23600 (0.0015) [2024-06-15 11:49:50,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 46208.5). Total num frames: 48365568. Throughput: 0: 11514.3. Samples: 12193792. Policy #0 lag: (min: 2.0, avg: 62.1, max: 258.0) [2024-06-15 11:49:50,956][1648985] Avg episode reward: [(0, '80.850')] [2024-06-15 11:49:52,152][1652491] Updated weights for policy 0, policy_version 23654 (0.0014) [2024-06-15 11:49:53,909][1652491] Updated weights for policy 0, policy_version 23738 (0.0014) [2024-06-15 11:49:55,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46421.5, 300 sec: 45875.2). Total num frames: 48660480. Throughput: 0: 11389.2. Samples: 12219904. Policy #0 lag: (min: 2.0, avg: 62.1, max: 258.0) [2024-06-15 11:49:55,955][1648985] Avg episode reward: [(0, '90.780')] [2024-06-15 11:49:56,266][1651469] Saving new best policy, reward=90.780! [2024-06-15 11:49:57,124][1652491] Updated weights for policy 0, policy_version 23808 (0.0012) [2024-06-15 11:49:58,264][1652491] Updated weights for policy 0, policy_version 23863 (0.0011) [2024-06-15 11:50:00,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 48889856. Throughput: 0: 11411.9. Samples: 12291072. Policy #0 lag: (min: 31.0, avg: 147.7, max: 287.0) [2024-06-15 11:50:00,956][1648985] Avg episode reward: [(0, '95.530')] [2024-06-15 11:50:00,966][1651469] Saving new best policy, reward=95.530! [2024-06-15 11:50:03,150][1651469] Signal inference workers to stop experience collection... 
(1300 times) [2024-06-15 11:50:03,182][1652491] InferenceWorker_p0-w0: stopping experience collection (1300 times) [2024-06-15 11:50:03,228][1652491] Updated weights for policy 0, policy_version 23907 (0.0035) [2024-06-15 11:50:03,376][1651469] Signal inference workers to resume experience collection... (1300 times) [2024-06-15 11:50:03,377][1652491] InferenceWorker_p0-w0: resuming experience collection (1300 times) [2024-06-15 11:50:04,682][1652491] Updated weights for policy 0, policy_version 23984 (0.0014) [2024-06-15 11:50:05,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 46967.4, 300 sec: 45986.3). Total num frames: 49152000. Throughput: 0: 11628.1. Samples: 12369920. Policy #0 lag: (min: 31.0, avg: 147.7, max: 287.0) [2024-06-15 11:50:05,956][1648985] Avg episode reward: [(0, '85.940')] [2024-06-15 11:50:07,833][1652491] Updated weights for policy 0, policy_version 24049 (0.0013) [2024-06-15 11:50:09,575][1652491] Updated weights for policy 0, policy_version 24116 (0.0013) [2024-06-15 11:50:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 49414144. Throughput: 0: 11582.6. Samples: 12398080. Policy #0 lag: (min: 31.0, avg: 147.7, max: 287.0) [2024-06-15 11:50:10,955][1648985] Avg episode reward: [(0, '87.410')] [2024-06-15 11:50:14,007][1652491] Updated weights for policy 0, policy_version 24165 (0.0116) [2024-06-15 11:50:15,527][1652491] Updated weights for policy 0, policy_version 24248 (0.0015) [2024-06-15 11:50:15,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 46541.6). Total num frames: 49676288. Throughput: 0: 11559.8. Samples: 12474880. Policy #0 lag: (min: 0.0, avg: 83.0, max: 256.0) [2024-06-15 11:50:15,956][1648985] Avg episode reward: [(0, '96.390')] [2024-06-15 11:50:15,980][1651469] Saving new best policy, reward=96.390! 
[2024-06-15 11:50:18,728][1652491] Updated weights for policy 0, policy_version 24304 (0.0021) [2024-06-15 11:50:20,791][1652491] Updated weights for policy 0, policy_version 24375 (0.0093) [2024-06-15 11:50:20,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46967.6, 300 sec: 46541.8). Total num frames: 49905664. Throughput: 0: 11559.8. Samples: 12533760. Policy #0 lag: (min: 0.0, avg: 83.0, max: 256.0) [2024-06-15 11:50:20,956][1648985] Avg episode reward: [(0, '104.040')] [2024-06-15 11:50:20,970][1651469] Saving new best policy, reward=104.040! [2024-06-15 11:50:25,686][1652491] Updated weights for policy 0, policy_version 24448 (0.0012) [2024-06-15 11:50:25,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 50069504. Throughput: 0: 11673.6. Samples: 12581376. Policy #0 lag: (min: 0.0, avg: 83.0, max: 256.0) [2024-06-15 11:50:25,956][1648985] Avg episode reward: [(0, '96.080')] [2024-06-15 11:50:29,675][1652491] Updated weights for policy 0, policy_version 24528 (0.0015) [2024-06-15 11:50:30,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 50331648. Throughput: 0: 11719.1. Samples: 12645888. Policy #0 lag: (min: 31.0, avg: 137.0, max: 287.0) [2024-06-15 11:50:30,956][1648985] Avg episode reward: [(0, '84.930')] [2024-06-15 11:50:31,302][1652491] Updated weights for policy 0, policy_version 24592 (0.0012) [2024-06-15 11:50:32,639][1652491] Updated weights for policy 0, policy_version 24639 (0.0013) [2024-06-15 11:50:35,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 50495488. Throughput: 0: 11628.1. Samples: 12717056. 
Policy #0 lag: (min: 31.0, avg: 137.0, max: 287.0) [2024-06-15 11:50:35,956][1648985] Avg episode reward: [(0, '81.900')] [2024-06-15 11:50:36,759][1652491] Updated weights for policy 0, policy_version 24696 (0.0112) [2024-06-15 11:50:37,807][1652491] Updated weights for policy 0, policy_version 24736 (0.0014) [2024-06-15 11:50:40,632][1652491] Updated weights for policy 0, policy_version 24784 (0.0014) [2024-06-15 11:50:40,956][1648985] Fps is (10 sec: 42596.4, 60 sec: 47513.2, 300 sec: 45986.2). Total num frames: 50757632. Throughput: 0: 11798.6. Samples: 12750848. Policy #0 lag: (min: 31.0, avg: 137.0, max: 287.0) [2024-06-15 11:50:40,957][1648985] Avg episode reward: [(0, '93.520')] [2024-06-15 11:50:42,224][1652491] Updated weights for policy 0, policy_version 24848 (0.0131) [2024-06-15 11:50:42,342][1651469] Signal inference workers to stop experience collection... (1350 times) [2024-06-15 11:50:42,408][1652491] InferenceWorker_p0-w0: stopping experience collection (1350 times) [2024-06-15 11:50:42,591][1651469] Signal inference workers to resume experience collection... (1350 times) [2024-06-15 11:50:42,592][1652491] InferenceWorker_p0-w0: resuming experience collection (1350 times) [2024-06-15 11:50:45,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 50987008. Throughput: 0: 11730.5. Samples: 12818944. Policy #0 lag: (min: 31.0, avg: 137.0, max: 287.0) [2024-06-15 11:50:45,955][1648985] Avg episode reward: [(0, '96.130')] [2024-06-15 11:50:46,535][1652491] Updated weights for policy 0, policy_version 24898 (0.0015) [2024-06-15 11:50:47,612][1652491] Updated weights for policy 0, policy_version 24947 (0.0012) [2024-06-15 11:50:48,593][1652491] Updated weights for policy 0, policy_version 24979 (0.0012) [2024-06-15 11:50:50,955][1648985] Fps is (10 sec: 49154.5, 60 sec: 48059.7, 300 sec: 46319.5). Total num frames: 51249152. Throughput: 0: 11650.8. Samples: 12894208. 
Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 11:50:50,956][1648985] Avg episode reward: [(0, '98.110')] [2024-06-15 11:50:51,190][1652491] Updated weights for policy 0, policy_version 25027 (0.0014) [2024-06-15 11:50:52,240][1652491] Updated weights for policy 0, policy_version 25082 (0.0013) [2024-06-15 11:50:54,377][1652491] Updated weights for policy 0, policy_version 25148 (0.0014) [2024-06-15 11:50:55,955][1648985] Fps is (10 sec: 52427.1, 60 sec: 47513.3, 300 sec: 46319.5). Total num frames: 51511296. Throughput: 0: 11662.1. Samples: 12922880. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 11:50:55,956][1648985] Avg episode reward: [(0, '97.950')] [2024-06-15 11:50:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000025152_51511296.pth... [2024-06-15 11:50:56,025][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000019712_40370176.pth [2024-06-15 11:50:56,029][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000025152_51511296.pth [2024-06-15 11:50:58,627][1652491] Updated weights for policy 0, policy_version 25210 (0.0016) [2024-06-15 11:51:00,583][1652491] Updated weights for policy 0, policy_version 25280 (0.0041) [2024-06-15 11:51:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 51773440. Throughput: 0: 11628.1. Samples: 12998144. Policy #0 lag: (min: 31.0, avg: 120.3, max: 287.0) [2024-06-15 11:51:00,956][1648985] Avg episode reward: [(0, '88.280')] [2024-06-15 11:51:03,487][1652491] Updated weights for policy 0, policy_version 25337 (0.0013) [2024-06-15 11:51:05,833][1652491] Updated weights for policy 0, policy_version 25408 (0.0013) [2024-06-15 11:51:05,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 52035584. Throughput: 0: 11707.7. Samples: 13060608. 
Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 11:51:05,956][1648985] Avg episode reward: [(0, '86.200')] [2024-06-15 11:51:10,184][1652491] Updated weights for policy 0, policy_version 25463 (0.0013) [2024-06-15 11:51:10,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 52166656. Throughput: 0: 11605.3. Samples: 13103616. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 11:51:10,956][1648985] Avg episode reward: [(0, '96.670')] [2024-06-15 11:51:11,869][1652491] Updated weights for policy 0, policy_version 25520 (0.0013) [2024-06-15 11:51:14,093][1652491] Updated weights for policy 0, policy_version 25591 (0.0021) [2024-06-15 11:51:15,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 52428800. Throughput: 0: 11685.0. Samples: 13171712. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 11:51:15,956][1648985] Avg episode reward: [(0, '92.610')] [2024-06-15 11:51:16,813][1652491] Updated weights for policy 0, policy_version 25640 (0.0014) [2024-06-15 11:51:20,843][1652491] Updated weights for policy 0, policy_version 25697 (0.0013) [2024-06-15 11:51:20,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 52625408. Throughput: 0: 11741.9. Samples: 13245440. Policy #0 lag: (min: 15.0, avg: 148.4, max: 271.0) [2024-06-15 11:51:20,956][1648985] Avg episode reward: [(0, '87.440')] [2024-06-15 11:51:22,965][1652491] Updated weights for policy 0, policy_version 25776 (0.0013) [2024-06-15 11:51:24,938][1652491] Updated weights for policy 0, policy_version 25825 (0.0013) [2024-06-15 11:51:25,644][1652491] Updated weights for policy 0, policy_version 25856 (0.0012) [2024-06-15 11:51:25,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 52953088. Throughput: 0: 11685.1. Samples: 13276672. 
Policy #0 lag: (min: 15.0, avg: 130.5, max: 271.0) [2024-06-15 11:51:25,956][1648985] Avg episode reward: [(0, '89.670')] [2024-06-15 11:51:28,624][1652491] Updated weights for policy 0, policy_version 25919 (0.0019) [2024-06-15 11:51:30,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 53084160. Throughput: 0: 11719.1. Samples: 13346304. Policy #0 lag: (min: 15.0, avg: 130.5, max: 271.0) [2024-06-15 11:51:30,956][1648985] Avg episode reward: [(0, '86.260')] [2024-06-15 11:51:31,301][1651469] Signal inference workers to stop experience collection... (1400 times) [2024-06-15 11:51:31,424][1652491] InferenceWorker_p0-w0: stopping experience collection (1400 times) [2024-06-15 11:51:31,553][1651469] Signal inference workers to resume experience collection... (1400 times) [2024-06-15 11:51:31,555][1652491] InferenceWorker_p0-w0: resuming experience collection (1400 times) [2024-06-15 11:51:32,294][1652491] Updated weights for policy 0, policy_version 25968 (0.0012) [2024-06-15 11:51:34,549][1652491] Updated weights for policy 0, policy_version 26002 (0.0013) [2024-06-15 11:51:35,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 47513.7, 300 sec: 46652.8). Total num frames: 53346304. Throughput: 0: 11525.7. Samples: 13412864. Policy #0 lag: (min: 15.0, avg: 130.5, max: 271.0) [2024-06-15 11:51:35,955][1648985] Avg episode reward: [(0, '80.770')] [2024-06-15 11:51:36,377][1652491] Updated weights for policy 0, policy_version 26070 (0.0013) [2024-06-15 11:51:37,432][1652491] Updated weights for policy 0, policy_version 26112 (0.0013) [2024-06-15 11:51:40,073][1652491] Updated weights for policy 0, policy_version 26164 (0.0014) [2024-06-15 11:51:40,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 47513.9, 300 sec: 46652.7). Total num frames: 53608448. Throughput: 0: 11685.0. Samples: 13448704. 
Policy #0 lag: (min: 28.0, avg: 153.1, max: 284.0) [2024-06-15 11:51:40,956][1648985] Avg episode reward: [(0, '88.290')] [2024-06-15 11:51:43,912][1652491] Updated weights for policy 0, policy_version 26235 (0.0149) [2024-06-15 11:51:45,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 46421.2, 300 sec: 46652.7). Total num frames: 53772288. Throughput: 0: 11514.3. Samples: 13516288. Policy #0 lag: (min: 28.0, avg: 153.1, max: 284.0) [2024-06-15 11:51:45,956][1648985] Avg episode reward: [(0, '107.260')] [2024-06-15 11:51:46,706][1651469] Saving new best policy, reward=107.260! [2024-06-15 11:51:47,419][1652491] Updated weights for policy 0, policy_version 26306 (0.0038) [2024-06-15 11:51:48,925][1652491] Updated weights for policy 0, policy_version 26368 (0.0012) [2024-06-15 11:51:50,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.1, 300 sec: 46208.5). Total num frames: 54001664. Throughput: 0: 11593.9. Samples: 13582336. Policy #0 lag: (min: 28.0, avg: 153.1, max: 284.0) [2024-06-15 11:51:50,956][1648985] Avg episode reward: [(0, '100.810')] [2024-06-15 11:51:55,328][1652491] Updated weights for policy 0, policy_version 26451 (0.0016) [2024-06-15 11:51:55,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 54231040. Throughput: 0: 11434.7. Samples: 13618176. Policy #0 lag: (min: 28.0, avg: 153.1, max: 284.0) [2024-06-15 11:51:55,956][1648985] Avg episode reward: [(0, '86.140')] [2024-06-15 11:51:56,084][1652491] Updated weights for policy 0, policy_version 26493 (0.0014) [2024-06-15 11:51:58,050][1652491] Updated weights for policy 0, policy_version 26551 (0.0015) [2024-06-15 11:51:58,986][1652491] Updated weights for policy 0, policy_version 26579 (0.0013) [2024-06-15 11:51:59,917][1652491] Updated weights for policy 0, policy_version 26619 (0.0013) [2024-06-15 11:52:00,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 54525952. Throughput: 0: 11332.3. Samples: 13681664. 
Policy #0 lag: (min: 31.0, avg: 137.4, max: 287.0) [2024-06-15 11:52:00,956][1648985] Avg episode reward: [(0, '85.740')] [2024-06-15 11:52:03,032][1652491] Updated weights for policy 0, policy_version 26688 (0.0013) [2024-06-15 11:52:05,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 43690.8, 300 sec: 46208.5). Total num frames: 54657024. Throughput: 0: 11537.1. Samples: 13764608. Policy #0 lag: (min: 31.0, avg: 137.4, max: 287.0) [2024-06-15 11:52:05,955][1648985] Avg episode reward: [(0, '87.380')] [2024-06-15 11:52:07,997][1652491] Updated weights for policy 0, policy_version 26768 (0.0024) [2024-06-15 11:52:09,199][1652491] Updated weights for policy 0, policy_version 26812 (0.0013) [2024-06-15 11:52:10,830][1652491] Updated weights for policy 0, policy_version 26867 (0.0014) [2024-06-15 11:52:10,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 47513.7, 300 sec: 46763.8). Total num frames: 55017472. Throughput: 0: 11446.0. Samples: 13791744. Policy #0 lag: (min: 31.0, avg: 137.4, max: 287.0) [2024-06-15 11:52:10,956][1648985] Avg episode reward: [(0, '103.710')] [2024-06-15 11:52:13,232][1652491] Updated weights for policy 0, policy_version 26896 (0.0010) [2024-06-15 11:52:13,342][1651469] Signal inference workers to stop experience collection... (1450 times) [2024-06-15 11:52:13,394][1652491] InferenceWorker_p0-w0: stopping experience collection (1450 times) [2024-06-15 11:52:13,566][1651469] Signal inference workers to resume experience collection... (1450 times) [2024-06-15 11:52:13,567][1652491] InferenceWorker_p0-w0: resuming experience collection (1450 times) [2024-06-15 11:52:14,283][1652491] Updated weights for policy 0, policy_version 26942 (0.0016) [2024-06-15 11:52:15,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 55181312. Throughput: 0: 11628.1. Samples: 13869568. 
Policy #0 lag: (min: 31.0, avg: 137.4, max: 287.0) [2024-06-15 11:52:15,956][1648985] Avg episode reward: [(0, '97.730')] [2024-06-15 11:52:17,641][1652491] Updated weights for policy 0, policy_version 27002 (0.0013) [2024-06-15 11:52:19,450][1652491] Updated weights for policy 0, policy_version 27043 (0.0021) [2024-06-15 11:52:20,670][1652491] Updated weights for policy 0, policy_version 27093 (0.0014) [2024-06-15 11:52:20,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 55508992. Throughput: 0: 11707.7. Samples: 13939712. Policy #0 lag: (min: 7.0, avg: 92.8, max: 263.0) [2024-06-15 11:52:20,955][1648985] Avg episode reward: [(0, '96.300')] [2024-06-15 11:52:21,555][1652491] Updated weights for policy 0, policy_version 27131 (0.0012) [2024-06-15 11:52:24,148][1652491] Updated weights for policy 0, policy_version 27170 (0.0016) [2024-06-15 11:52:25,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 55705600. Throughput: 0: 11764.7. Samples: 13978112. Policy #0 lag: (min: 7.0, avg: 92.8, max: 263.0) [2024-06-15 11:52:25,956][1648985] Avg episode reward: [(0, '99.090')] [2024-06-15 11:52:27,452][1652491] Updated weights for policy 0, policy_version 27232 (0.0014) [2024-06-15 11:52:30,355][1652491] Updated weights for policy 0, policy_version 27326 (0.0014) [2024-06-15 11:52:30,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 48059.6, 300 sec: 46986.0). Total num frames: 55967744. Throughput: 0: 11901.1. Samples: 14051840. 
Policy #0 lag: (min: 7.0, avg: 92.8, max: 263.0) [2024-06-15 11:52:30,956][1648985] Avg episode reward: [(0, '97.060')] [2024-06-15 11:52:32,176][1652491] Updated weights for policy 0, policy_version 27383 (0.0015) [2024-06-15 11:52:34,908][1652491] Updated weights for policy 0, policy_version 27411 (0.0013) [2024-06-15 11:52:35,954][1652491] Updated weights for policy 0, policy_version 27452 (0.0016) [2024-06-15 11:52:35,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 47513.4, 300 sec: 46541.7). Total num frames: 56197120. Throughput: 0: 11980.8. Samples: 14121472. Policy #0 lag: (min: 30.0, avg: 147.4, max: 286.0) [2024-06-15 11:52:35,956][1648985] Avg episode reward: [(0, '90.220')] [2024-06-15 11:52:39,170][1652491] Updated weights for policy 0, policy_version 27520 (0.0011) [2024-06-15 11:52:40,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 46421.5, 300 sec: 46764.8). Total num frames: 56393728. Throughput: 0: 12037.7. Samples: 14159872. Policy #0 lag: (min: 30.0, avg: 147.4, max: 286.0) [2024-06-15 11:52:40,956][1648985] Avg episode reward: [(0, '85.740')] [2024-06-15 11:52:41,809][1652491] Updated weights for policy 0, policy_version 27580 (0.0167) [2024-06-15 11:52:43,956][1652491] Updated weights for policy 0, policy_version 27645 (0.0017) [2024-06-15 11:52:45,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 56623104. Throughput: 0: 11992.2. Samples: 14221312. Policy #0 lag: (min: 30.0, avg: 147.4, max: 286.0) [2024-06-15 11:52:45,956][1648985] Avg episode reward: [(0, '91.280')] [2024-06-15 11:52:47,341][1652491] Updated weights for policy 0, policy_version 27702 (0.0020) [2024-06-15 11:52:50,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 56819712. Throughput: 0: 11798.7. Samples: 14295552. 
Policy #0 lag: (min: 30.0, avg: 147.4, max: 286.0) [2024-06-15 11:52:50,956][1648985] Avg episode reward: [(0, '101.360')] [2024-06-15 11:52:51,006][1652491] Updated weights for policy 0, policy_version 27744 (0.0109) [2024-06-15 11:52:52,414][1652491] Updated weights for policy 0, policy_version 27797 (0.0012) [2024-06-15 11:52:54,138][1652491] Updated weights for policy 0, policy_version 27857 (0.0013) [2024-06-15 11:52:55,108][1652491] Updated weights for policy 0, policy_version 27903 (0.0018) [2024-06-15 11:52:55,970][1648985] Fps is (10 sec: 52348.5, 60 sec: 48593.5, 300 sec: 46983.6). Total num frames: 57147392. Throughput: 0: 11874.4. Samples: 14326272. Policy #0 lag: (min: 1.0, avg: 139.1, max: 257.0) [2024-06-15 11:52:55,971][1648985] Avg episode reward: [(0, '91.440')] [2024-06-15 11:52:55,977][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000027904_57147392.pth... [2024-06-15 11:52:56,152][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000022416_45907968.pth [2024-06-15 11:52:58,245][1651469] Signal inference workers to stop experience collection... (1500 times) [2024-06-15 11:52:58,272][1652491] InferenceWorker_p0-w0: stopping experience collection (1500 times) [2024-06-15 11:52:58,552][1651469] Signal inference workers to resume experience collection... (1500 times) [2024-06-15 11:52:58,553][1652491] InferenceWorker_p0-w0: resuming experience collection (1500 times) [2024-06-15 11:52:58,804][1652491] Updated weights for policy 0, policy_version 27960 (0.0016) [2024-06-15 11:53:00,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 57278464. Throughput: 0: 11662.2. Samples: 14394368. 
Policy #0 lag: (min: 1.0, avg: 139.1, max: 257.0) [2024-06-15 11:53:00,955][1648985] Avg episode reward: [(0, '84.930')] [2024-06-15 11:53:02,227][1652491] Updated weights for policy 0, policy_version 28022 (0.0013) [2024-06-15 11:53:03,937][1652491] Updated weights for policy 0, policy_version 28065 (0.0013) [2024-06-15 11:53:04,956][1652491] Updated weights for policy 0, policy_version 28114 (0.0015) [2024-06-15 11:53:05,955][1648985] Fps is (10 sec: 52508.8, 60 sec: 50244.0, 300 sec: 47097.1). Total num frames: 57671680. Throughput: 0: 11730.4. Samples: 14467584. Policy #0 lag: (min: 1.0, avg: 139.1, max: 257.0) [2024-06-15 11:53:05,956][1648985] Avg episode reward: [(0, '90.600')] [2024-06-15 11:53:08,788][1652491] Updated weights for policy 0, policy_version 28163 (0.0012) [2024-06-15 11:53:10,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 46421.3, 300 sec: 46209.5). Total num frames: 57802752. Throughput: 0: 11764.6. Samples: 14507520. Policy #0 lag: (min: 1.0, avg: 139.1, max: 257.0) [2024-06-15 11:53:10,956][1648985] Avg episode reward: [(0, '84.990')] [2024-06-15 11:53:11,883][1652491] Updated weights for policy 0, policy_version 28225 (0.0034) [2024-06-15 11:53:13,207][1652491] Updated weights for policy 0, policy_version 28288 (0.0012) [2024-06-15 11:53:15,664][1652491] Updated weights for policy 0, policy_version 28360 (0.0145) [2024-06-15 11:53:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48605.8, 300 sec: 46763.9). Total num frames: 58097664. Throughput: 0: 11707.7. Samples: 14578688. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 11:53:15,956][1648985] Avg episode reward: [(0, '77.200')] [2024-06-15 11:53:16,659][1652491] Updated weights for policy 0, policy_version 28407 (0.0013) [2024-06-15 11:53:20,387][1652491] Updated weights for policy 0, policy_version 28448 (0.0013) [2024-06-15 11:53:20,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 46421.2, 300 sec: 46541.7). Total num frames: 58294272. Throughput: 0: 11673.6. 
Samples: 14646784. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 11:53:20,956][1648985] Avg episode reward: [(0, '75.350')] [2024-06-15 11:53:23,590][1652491] Updated weights for policy 0, policy_version 28512 (0.0014) [2024-06-15 11:53:25,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 58523648. Throughput: 0: 11650.8. Samples: 14684160. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 11:53:25,956][1648985] Avg episode reward: [(0, '81.300')] [2024-06-15 11:53:26,384][1652491] Updated weights for policy 0, policy_version 28600 (0.0035) [2024-06-15 11:53:27,801][1652491] Updated weights for policy 0, policy_version 28656 (0.0014) [2024-06-15 11:53:30,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 58720256. Throughput: 0: 11776.0. Samples: 14751232. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 11:53:30,956][1648985] Avg episode reward: [(0, '96.020')] [2024-06-15 11:53:31,498][1652491] Updated weights for policy 0, policy_version 28694 (0.0012) [2024-06-15 11:53:35,620][1652491] Updated weights for policy 0, policy_version 28784 (0.0012) [2024-06-15 11:53:35,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 46421.5, 300 sec: 46652.8). Total num frames: 58982400. Throughput: 0: 11685.0. Samples: 14821376. Policy #0 lag: (min: 30.0, avg: 141.8, max: 286.0) [2024-06-15 11:53:35,956][1648985] Avg episode reward: [(0, '100.690')] [2024-06-15 11:53:36,980][1652491] Updated weights for policy 0, policy_version 28821 (0.0040) [2024-06-15 11:53:38,716][1652491] Updated weights for policy 0, policy_version 28880 (0.0013) [2024-06-15 11:53:40,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47513.5, 300 sec: 46763.8). Total num frames: 59244544. Throughput: 0: 11757.2. Samples: 14855168. 
Policy #0 lag: (min: 30.0, avg: 141.8, max: 286.0) [2024-06-15 11:53:40,956][1648985] Avg episode reward: [(0, '91.750')] [2024-06-15 11:53:42,936][1652491] Updated weights for policy 0, policy_version 28931 (0.0012) [2024-06-15 11:53:43,155][1651469] Signal inference workers to stop experience collection... (1550 times) [2024-06-15 11:53:43,294][1652491] InferenceWorker_p0-w0: stopping experience collection (1550 times) [2024-06-15 11:53:43,369][1651469] Signal inference workers to resume experience collection... (1550 times) [2024-06-15 11:53:43,373][1652491] InferenceWorker_p0-w0: resuming experience collection (1550 times) [2024-06-15 11:53:43,981][1652491] Updated weights for policy 0, policy_version 28988 (0.0082) [2024-06-15 11:53:45,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 59408384. Throughput: 0: 11776.0. Samples: 14924288. Policy #0 lag: (min: 30.0, avg: 141.8, max: 286.0) [2024-06-15 11:53:45,956][1648985] Avg episode reward: [(0, '77.460')] [2024-06-15 11:53:46,626][1652491] Updated weights for policy 0, policy_version 29049 (0.0018) [2024-06-15 11:53:49,438][1652491] Updated weights for policy 0, policy_version 29106 (0.0056) [2024-06-15 11:53:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 59703296. Throughput: 0: 11514.3. Samples: 14985728. Policy #0 lag: (min: 30.0, avg: 141.8, max: 286.0) [2024-06-15 11:53:50,956][1648985] Avg episode reward: [(0, '87.600')] [2024-06-15 11:53:51,145][1652491] Updated weights for policy 0, policy_version 29173 (0.0033) [2024-06-15 11:53:55,711][1652491] Updated weights for policy 0, policy_version 29217 (0.0014) [2024-06-15 11:53:55,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45340.6, 300 sec: 46097.4). Total num frames: 59867136. Throughput: 0: 11525.7. Samples: 15026176. 
Policy #0 lag: (min: 74.0, avg: 187.8, max: 298.0) [2024-06-15 11:53:55,956][1648985] Avg episode reward: [(0, '95.570')] [2024-06-15 11:53:57,604][1652491] Updated weights for policy 0, policy_version 29280 (0.0014) [2024-06-15 11:53:59,905][1652491] Updated weights for policy 0, policy_version 29328 (0.0016) [2024-06-15 11:54:00,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 47513.5, 300 sec: 46763.8). Total num frames: 60129280. Throughput: 0: 11548.5. Samples: 15098368. Policy #0 lag: (min: 74.0, avg: 187.8, max: 298.0) [2024-06-15 11:54:00,956][1648985] Avg episode reward: [(0, '102.770')] [2024-06-15 11:54:01,819][1652491] Updated weights for policy 0, policy_version 29408 (0.0083) [2024-06-15 11:54:05,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 60293120. Throughput: 0: 11503.0. Samples: 15164416. Policy #0 lag: (min: 74.0, avg: 187.8, max: 298.0) [2024-06-15 11:54:05,956][1648985] Avg episode reward: [(0, '95.670')] [2024-06-15 11:54:07,062][1652491] Updated weights for policy 0, policy_version 29461 (0.0013) [2024-06-15 11:54:09,012][1652491] Updated weights for policy 0, policy_version 29536 (0.0136) [2024-06-15 11:54:10,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 60555264. Throughput: 0: 11446.1. Samples: 15199232. Policy #0 lag: (min: 74.0, avg: 187.8, max: 298.0) [2024-06-15 11:54:10,956][1648985] Avg episode reward: [(0, '101.870')] [2024-06-15 11:54:12,764][1652491] Updated weights for policy 0, policy_version 29617 (0.0025) [2024-06-15 11:54:14,393][1652491] Updated weights for policy 0, policy_version 29688 (0.0011) [2024-06-15 11:54:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 60817408. Throughput: 0: 11343.7. Samples: 15261696. 
Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 11:54:15,955][1648985] Avg episode reward: [(0, '101.840')] [2024-06-15 11:54:18,409][1652491] Updated weights for policy 0, policy_version 29744 (0.0012) [2024-06-15 11:54:19,869][1652491] Updated weights for policy 0, policy_version 29792 (0.0012) [2024-06-15 11:54:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46421.5, 300 sec: 46652.8). Total num frames: 61079552. Throughput: 0: 11457.4. Samples: 15336960. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 11:54:20,955][1648985] Avg episode reward: [(0, '102.280')] [2024-06-15 11:54:23,161][1652491] Updated weights for policy 0, policy_version 29841 (0.0013) [2024-06-15 11:54:24,888][1652491] Updated weights for policy 0, policy_version 29906 (0.0031) [2024-06-15 11:54:25,260][1651469] Signal inference workers to stop experience collection... (1600 times) [2024-06-15 11:54:25,299][1652491] InferenceWorker_p0-w0: stopping experience collection (1600 times) [2024-06-15 11:54:25,519][1651469] Signal inference workers to resume experience collection... (1600 times) [2024-06-15 11:54:25,520][1652491] InferenceWorker_p0-w0: resuming experience collection (1600 times) [2024-06-15 11:54:25,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 61308928. Throughput: 0: 11468.8. Samples: 15371264. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 11:54:25,956][1648985] Avg episode reward: [(0, '99.410')] [2024-06-15 11:54:26,020][1652491] Updated weights for policy 0, policy_version 29952 (0.0014) [2024-06-15 11:54:30,069][1652491] Updated weights for policy 0, policy_version 30011 (0.0013) [2024-06-15 11:54:30,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.6, 300 sec: 46430.6). Total num frames: 61538304. Throughput: 0: 11491.6. Samples: 15441408. 
Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 11:54:30,956][1648985] Avg episode reward: [(0, '103.170')] [2024-06-15 11:54:31,143][1652491] Updated weights for policy 0, policy_version 30064 (0.0013) [2024-06-15 11:54:34,617][1652491] Updated weights for policy 0, policy_version 30104 (0.0013) [2024-06-15 11:54:35,719][1652491] Updated weights for policy 0, policy_version 30160 (0.0012) [2024-06-15 11:54:35,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 46421.2, 300 sec: 46986.0). Total num frames: 61767680. Throughput: 0: 11776.0. Samples: 15515648. Policy #0 lag: (min: 73.0, avg: 177.7, max: 313.0) [2024-06-15 11:54:35,956][1648985] Avg episode reward: [(0, '111.440')] [2024-06-15 11:54:36,599][1651469] Saving new best policy, reward=111.440! [2024-06-15 11:54:36,964][1652491] Updated weights for policy 0, policy_version 30207 (0.0011) [2024-06-15 11:54:40,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45329.0, 300 sec: 46430.6). Total num frames: 61964288. Throughput: 0: 11753.2. Samples: 15555072. Policy #0 lag: (min: 73.0, avg: 177.7, max: 313.0) [2024-06-15 11:54:40,956][1648985] Avg episode reward: [(0, '109.860')] [2024-06-15 11:54:41,305][1652491] Updated weights for policy 0, policy_version 30272 (0.0014) [2024-06-15 11:54:42,851][1652491] Updated weights for policy 0, policy_version 30335 (0.0013) [2024-06-15 11:54:45,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45875.3, 300 sec: 46763.8). Total num frames: 62160896. Throughput: 0: 11605.4. Samples: 15620608. Policy #0 lag: (min: 73.0, avg: 177.7, max: 313.0) [2024-06-15 11:54:45,956][1648985] Avg episode reward: [(0, '105.970')] [2024-06-15 11:54:46,680][1652491] Updated weights for policy 0, policy_version 30390 (0.0116) [2024-06-15 11:54:47,963][1652491] Updated weights for policy 0, policy_version 30455 (0.0013) [2024-06-15 11:54:50,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 44783.0, 300 sec: 46541.7). Total num frames: 62390272. Throughput: 0: 11673.6. Samples: 15689728. 
Policy #0 lag: (min: 73.0, avg: 177.7, max: 313.0) [2024-06-15 11:54:50,956][1648985] Avg episode reward: [(0, '97.240')] [2024-06-15 11:54:52,636][1652491] Updated weights for policy 0, policy_version 30528 (0.0018) [2024-06-15 11:54:54,396][1652491] Updated weights for policy 0, policy_version 30592 (0.0109) [2024-06-15 11:54:55,956][1648985] Fps is (10 sec: 49149.3, 60 sec: 46421.0, 300 sec: 46652.7). Total num frames: 62652416. Throughput: 0: 11389.0. Samples: 15711744. Policy #0 lag: (min: 1.0, avg: 89.5, max: 257.0) [2024-06-15 11:54:55,957][1648985] Avg episode reward: [(0, '92.310')] [2024-06-15 11:54:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000030592_62652416.pth... [2024-06-15 11:54:56,052][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000025152_51511296.pth [2024-06-15 11:54:58,460][1652491] Updated weights for policy 0, policy_version 30652 (0.0013) [2024-06-15 11:54:59,986][1652491] Updated weights for policy 0, policy_version 30713 (0.0012) [2024-06-15 11:55:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46421.4, 300 sec: 46652.8). Total num frames: 62914560. Throughput: 0: 11593.9. Samples: 15783424. Policy #0 lag: (min: 1.0, avg: 89.5, max: 257.0) [2024-06-15 11:55:00,956][1648985] Avg episode reward: [(0, '88.610')] [2024-06-15 11:55:03,749][1652491] Updated weights for policy 0, policy_version 30753 (0.0012) [2024-06-15 11:55:05,484][1652491] Updated weights for policy 0, policy_version 30817 (0.0012) [2024-06-15 11:55:05,955][1648985] Fps is (10 sec: 49154.7, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 63143936. Throughput: 0: 11457.4. Samples: 15852544. Policy #0 lag: (min: 1.0, avg: 89.5, max: 257.0) [2024-06-15 11:55:05,956][1648985] Avg episode reward: [(0, '95.170')] [2024-06-15 11:55:06,035][1652491] Updated weights for policy 0, policy_version 30846 (0.0019) [2024-06-15 11:55:08,185][1651469] Signal inference workers to stop experience collection... 
(1650 times) [2024-06-15 11:55:08,238][1652491] InferenceWorker_p0-w0: stopping experience collection (1650 times) [2024-06-15 11:55:08,418][1651469] Signal inference workers to resume experience collection... (1650 times) [2024-06-15 11:55:08,419][1652491] InferenceWorker_p0-w0: resuming experience collection (1650 times) [2024-06-15 11:55:09,597][1652491] Updated weights for policy 0, policy_version 30912 (0.0013) [2024-06-15 11:55:10,796][1652491] Updated weights for policy 0, policy_version 30960 (0.0013) [2024-06-15 11:55:10,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 63406080. Throughput: 0: 11650.9. Samples: 15895552. Policy #0 lag: (min: 1.0, avg: 89.5, max: 257.0) [2024-06-15 11:55:10,956][1648985] Avg episode reward: [(0, '102.860')] [2024-06-15 11:55:14,762][1652491] Updated weights for policy 0, policy_version 30995 (0.0012) [2024-06-15 11:55:15,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 63569920. Throughput: 0: 11639.5. Samples: 15965184. Policy #0 lag: (min: 9.0, avg: 100.5, max: 265.0) [2024-06-15 11:55:15,955][1648985] Avg episode reward: [(0, '99.250')] [2024-06-15 11:55:17,293][1652491] Updated weights for policy 0, policy_version 31097 (0.0119) [2024-06-15 11:55:20,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 63766528. Throughput: 0: 11491.6. Samples: 16032768. Policy #0 lag: (min: 9.0, avg: 100.5, max: 265.0) [2024-06-15 11:55:20,956][1648985] Avg episode reward: [(0, '101.380')] [2024-06-15 11:55:21,246][1652491] Updated weights for policy 0, policy_version 31152 (0.0010) [2024-06-15 11:55:22,759][1652491] Updated weights for policy 0, policy_version 31216 (0.0013) [2024-06-15 11:55:25,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 63963136. Throughput: 0: 11252.6. Samples: 16061440. 
Policy #0 lag: (min: 9.0, avg: 100.5, max: 265.0) [2024-06-15 11:55:25,956][1648985] Avg episode reward: [(0, '103.190')] [2024-06-15 11:55:26,881][1652491] Updated weights for policy 0, policy_version 31264 (0.0019) [2024-06-15 11:55:29,311][1652491] Updated weights for policy 0, policy_version 31352 (0.0013) [2024-06-15 11:55:30,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 46541.7). Total num frames: 64225280. Throughput: 0: 11252.6. Samples: 16126976. Policy #0 lag: (min: 9.0, avg: 100.5, max: 265.0) [2024-06-15 11:55:30,956][1648985] Avg episode reward: [(0, '101.030')] [2024-06-15 11:55:31,939][1652491] Updated weights for policy 0, policy_version 31392 (0.0010) [2024-06-15 11:55:33,732][1652491] Updated weights for policy 0, policy_version 31456 (0.0139) [2024-06-15 11:55:35,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 64487424. Throughput: 0: 11173.0. Samples: 16192512. Policy #0 lag: (min: 9.0, avg: 100.5, max: 265.0) [2024-06-15 11:55:35,956][1648985] Avg episode reward: [(0, '98.990')] [2024-06-15 11:55:38,963][1652491] Updated weights for policy 0, policy_version 31521 (0.0015) [2024-06-15 11:55:40,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 45329.0, 300 sec: 46430.5). Total num frames: 64684032. Throughput: 0: 11457.5. Samples: 16227328. Policy #0 lag: (min: 15.0, avg: 104.4, max: 271.0) [2024-06-15 11:55:40,956][1648985] Avg episode reward: [(0, '115.200')] [2024-06-15 11:55:41,193][1652491] Updated weights for policy 0, policy_version 31600 (0.0013) [2024-06-15 11:55:41,598][1651469] Saving new best policy, reward=115.200! [2024-06-15 11:55:43,589][1652491] Updated weights for policy 0, policy_version 31636 (0.0013) [2024-06-15 11:55:44,611][1652491] Updated weights for policy 0, policy_version 31676 (0.0026) [2024-06-15 11:55:45,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46421.2, 300 sec: 46430.6). Total num frames: 64946176. Throughput: 0: 11309.5. Samples: 16292352. 
Policy #0 lag: (min: 15.0, avg: 104.4, max: 271.0) [2024-06-15 11:55:45,956][1648985] Avg episode reward: [(0, '111.950')] [2024-06-15 11:55:46,062][1652491] Updated weights for policy 0, policy_version 31715 (0.0012) [2024-06-15 11:55:50,955][1648985] Fps is (10 sec: 36045.8, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 65044480. Throughput: 0: 11275.4. Samples: 16359936. Policy #0 lag: (min: 15.0, avg: 104.4, max: 271.0) [2024-06-15 11:55:50,956][1648985] Avg episode reward: [(0, '110.630')] [2024-06-15 11:55:51,227][1651469] Signal inference workers to stop experience collection... (1700 times) [2024-06-15 11:55:51,261][1652491] InferenceWorker_p0-w0: stopping experience collection (1700 times) [2024-06-15 11:55:51,428][1651469] Signal inference workers to resume experience collection... (1700 times) [2024-06-15 11:55:51,429][1652491] InferenceWorker_p0-w0: resuming experience collection (1700 times) [2024-06-15 11:55:51,646][1652491] Updated weights for policy 0, policy_version 31795 (0.0012) [2024-06-15 11:55:55,006][1652491] Updated weights for policy 0, policy_version 31873 (0.0012) [2024-06-15 11:55:55,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 44783.3, 300 sec: 45986.3). Total num frames: 65339392. Throughput: 0: 10854.4. Samples: 16384000. Policy #0 lag: (min: 15.0, avg: 104.4, max: 271.0) [2024-06-15 11:55:55,956][1648985] Avg episode reward: [(0, '108.650')] [2024-06-15 11:55:56,268][1652491] Updated weights for policy 0, policy_version 31925 (0.0011) [2024-06-15 11:55:58,400][1652491] Updated weights for policy 0, policy_version 31972 (0.0012) [2024-06-15 11:56:00,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 43690.8, 300 sec: 45764.2). Total num frames: 65536000. Throughput: 0: 10945.5. Samples: 16457728. 
Policy #0 lag: (min: 15.0, avg: 104.4, max: 271.0) [2024-06-15 11:56:00,955][1648985] Avg episode reward: [(0, '105.590')] [2024-06-15 11:56:02,404][1652491] Updated weights for policy 0, policy_version 32016 (0.0012) [2024-06-15 11:56:04,112][1652491] Updated weights for policy 0, policy_version 32080 (0.0013) [2024-06-15 11:56:05,185][1652491] Updated weights for policy 0, policy_version 32126 (0.0012) [2024-06-15 11:56:05,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 44236.7, 300 sec: 46208.4). Total num frames: 65798144. Throughput: 0: 10979.5. Samples: 16526848. Policy #0 lag: (min: 14.0, avg: 95.6, max: 270.0) [2024-06-15 11:56:05,956][1648985] Avg episode reward: [(0, '106.780')] [2024-06-15 11:56:06,953][1652491] Updated weights for policy 0, policy_version 32176 (0.0013) [2024-06-15 11:56:09,739][1652491] Updated weights for policy 0, policy_version 32224 (0.0013) [2024-06-15 11:56:10,635][1652491] Updated weights for policy 0, policy_version 32256 (0.0012) [2024-06-15 11:56:10,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 44236.8, 300 sec: 46208.5). Total num frames: 66060288. Throughput: 0: 11104.8. Samples: 16561152. Policy #0 lag: (min: 14.0, avg: 95.6, max: 270.0) [2024-06-15 11:56:10,956][1648985] Avg episode reward: [(0, '101.910')] [2024-06-15 11:56:15,011][1652491] Updated weights for policy 0, policy_version 32321 (0.0013) [2024-06-15 11:56:15,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 45329.1, 300 sec: 46319.5). Total num frames: 66289664. Throughput: 0: 11150.3. Samples: 16628736. Policy #0 lag: (min: 14.0, avg: 95.6, max: 270.0) [2024-06-15 11:56:15,956][1648985] Avg episode reward: [(0, '94.980')] [2024-06-15 11:56:18,024][1652491] Updated weights for policy 0, policy_version 32416 (0.0048) [2024-06-15 11:56:20,905][1652491] Updated weights for policy 0, policy_version 32464 (0.0126) [2024-06-15 11:56:20,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 66486272. Throughput: 0: 11229.9. 
Samples: 16697856. Policy #0 lag: (min: 14.0, avg: 95.6, max: 270.0) [2024-06-15 11:56:20,956][1648985] Avg episode reward: [(0, '99.420')] [2024-06-15 11:56:24,869][1652491] Updated weights for policy 0, policy_version 32513 (0.0014) [2024-06-15 11:56:25,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45329.2, 300 sec: 46097.4). Total num frames: 66682880. Throughput: 0: 11321.0. Samples: 16736768. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 11:56:25,955][1648985] Avg episode reward: [(0, '118.510')] [2024-06-15 11:56:26,227][1651469] Saving new best policy, reward=118.510! [2024-06-15 11:56:26,886][1652491] Updated weights for policy 0, policy_version 32608 (0.0012) [2024-06-15 11:56:28,426][1652491] Updated weights for policy 0, policy_version 32643 (0.0024) [2024-06-15 11:56:29,777][1652491] Updated weights for policy 0, policy_version 32700 (0.0012) [2024-06-15 11:56:30,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 66977792. Throughput: 0: 11229.9. Samples: 16797696. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 11:56:30,956][1648985] Avg episode reward: [(0, '114.910')] [2024-06-15 11:56:33,015][1652491] Updated weights for policy 0, policy_version 32752 (0.0011) [2024-06-15 11:56:35,924][1651469] Signal inference workers to stop experience collection... (1750 times) [2024-06-15 11:56:35,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 67108864. Throughput: 0: 11616.7. Samples: 16882688. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 11:56:35,956][1648985] Avg episode reward: [(0, '98.340')] [2024-06-15 11:56:35,969][1652491] InferenceWorker_p0-w0: stopping experience collection (1750 times) [2024-06-15 11:56:36,216][1651469] Signal inference workers to resume experience collection... 
(1750 times) [2024-06-15 11:56:36,217][1652491] InferenceWorker_p0-w0: resuming experience collection (1750 times) [2024-06-15 11:56:36,580][1652491] Updated weights for policy 0, policy_version 32800 (0.0013) [2024-06-15 11:56:38,213][1652491] Updated weights for policy 0, policy_version 32865 (0.0059) [2024-06-15 11:56:39,376][1652491] Updated weights for policy 0, policy_version 32914 (0.0015) [2024-06-15 11:56:40,199][1652491] Updated weights for policy 0, policy_version 32957 (0.0014) [2024-06-15 11:56:40,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46967.7, 300 sec: 46541.7). Total num frames: 67502080. Throughput: 0: 11707.8. Samples: 16910848. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 11:56:40,956][1648985] Avg episode reward: [(0, '102.010')] [2024-06-15 11:56:44,091][1652491] Updated weights for policy 0, policy_version 33016 (0.0013) [2024-06-15 11:56:45,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 44783.0, 300 sec: 46208.5). Total num frames: 67633152. Throughput: 0: 11764.6. Samples: 16987136. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 11:56:45,956][1648985] Avg episode reward: [(0, '105.390')] [2024-06-15 11:56:47,712][1652491] Updated weights for policy 0, policy_version 33072 (0.0014) [2024-06-15 11:56:48,664][1652491] Updated weights for policy 0, policy_version 33120 (0.0015) [2024-06-15 11:56:49,890][1652491] Updated weights for policy 0, policy_version 33169 (0.0013) [2024-06-15 11:56:50,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 49698.0, 300 sec: 46763.8). Total num frames: 68026368. Throughput: 0: 11650.8. Samples: 17051136. 
Policy #0 lag: (min: 15.0, avg: 96.0, max: 271.0) [2024-06-15 11:56:50,956][1648985] Avg episode reward: [(0, '92.810')] [2024-06-15 11:56:54,637][1652491] Updated weights for policy 0, policy_version 33233 (0.0016) [2024-06-15 11:56:55,636][1652491] Updated weights for policy 0, policy_version 33278 (0.0144) [2024-06-15 11:56:55,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 68157440. Throughput: 0: 11810.1. Samples: 17092608. Policy #0 lag: (min: 15.0, avg: 96.0, max: 271.0) [2024-06-15 11:56:55,956][1648985] Avg episode reward: [(0, '87.620')] [2024-06-15 11:56:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000033280_68157440.pth... [2024-06-15 11:56:56,034][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000027904_57147392.pth [2024-06-15 11:56:59,229][1652491] Updated weights for policy 0, policy_version 33330 (0.0014) [2024-06-15 11:57:00,837][1652491] Updated weights for policy 0, policy_version 33401 (0.0013) [2024-06-15 11:57:00,955][1648985] Fps is (10 sec: 36045.6, 60 sec: 47513.5, 300 sec: 46541.6). Total num frames: 68386816. Throughput: 0: 11741.9. Samples: 17157120. Policy #0 lag: (min: 15.0, avg: 96.0, max: 271.0) [2024-06-15 11:57:00,956][1648985] Avg episode reward: [(0, '94.410')] [2024-06-15 11:57:02,635][1652491] Updated weights for policy 0, policy_version 33464 (0.0044) [2024-06-15 11:57:05,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.3, 300 sec: 45875.2). Total num frames: 68550656. Throughput: 0: 11776.0. Samples: 17227776. 
Policy #0 lag: (min: 15.0, avg: 96.0, max: 271.0) [2024-06-15 11:57:05,956][1648985] Avg episode reward: [(0, '109.510')] [2024-06-15 11:57:07,096][1652491] Updated weights for policy 0, policy_version 33504 (0.0012) [2024-06-15 11:57:10,087][1652491] Updated weights for policy 0, policy_version 33570 (0.0013) [2024-06-15 11:57:10,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 68812800. Throughput: 0: 11684.9. Samples: 17262592. Policy #0 lag: (min: 15.0, avg: 109.8, max: 271.0) [2024-06-15 11:57:10,956][1648985] Avg episode reward: [(0, '108.660')] [2024-06-15 11:57:12,366][1652491] Updated weights for policy 0, policy_version 33657 (0.0013) [2024-06-15 11:57:13,737][1652491] Updated weights for policy 0, policy_version 33696 (0.0030) [2024-06-15 11:57:13,876][1651469] Signal inference workers to stop experience collection... (1800 times) [2024-06-15 11:57:13,913][1652491] InferenceWorker_p0-w0: stopping experience collection (1800 times) [2024-06-15 11:57:14,120][1651469] Signal inference workers to resume experience collection... (1800 times) [2024-06-15 11:57:14,121][1652491] InferenceWorker_p0-w0: resuming experience collection (1800 times) [2024-06-15 11:57:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46421.2, 300 sec: 45986.3). Total num frames: 69074944. Throughput: 0: 11616.7. Samples: 17320448. Policy #0 lag: (min: 15.0, avg: 109.8, max: 271.0) [2024-06-15 11:57:15,956][1648985] Avg episode reward: [(0, '108.730')] [2024-06-15 11:57:18,893][1652491] Updated weights for policy 0, policy_version 33730 (0.0013) [2024-06-15 11:57:20,152][1652491] Updated weights for policy 0, policy_version 33789 (0.0012) [2024-06-15 11:57:20,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 45329.1, 300 sec: 45764.1). Total num frames: 69206016. Throughput: 0: 11355.1. Samples: 17393664. 
Policy #0 lag: (min: 15.0, avg: 109.8, max: 271.0) [2024-06-15 11:57:20,955][1648985] Avg episode reward: [(0, '105.000')] [2024-06-15 11:57:21,932][1652491] Updated weights for policy 0, policy_version 33845 (0.0012) [2024-06-15 11:57:23,516][1652491] Updated weights for policy 0, policy_version 33888 (0.0128) [2024-06-15 11:57:24,362][1652491] Updated weights for policy 0, policy_version 33918 (0.0011) [2024-06-15 11:57:25,721][1652491] Updated weights for policy 0, policy_version 33974 (0.0077) [2024-06-15 11:57:25,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48605.8, 300 sec: 46208.5). Total num frames: 69599232. Throughput: 0: 11366.4. Samples: 17422336. Policy #0 lag: (min: 15.0, avg: 109.8, max: 271.0) [2024-06-15 11:57:25,956][1648985] Avg episode reward: [(0, '115.680')] [2024-06-15 11:57:30,870][1652491] Updated weights for policy 0, policy_version 34003 (0.0014) [2024-06-15 11:57:30,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44236.9, 300 sec: 45542.0). Total num frames: 69632000. Throughput: 0: 11332.3. Samples: 17497088. Policy #0 lag: (min: 15.0, avg: 109.8, max: 271.0) [2024-06-15 11:57:30,956][1648985] Avg episode reward: [(0, '121.620')] [2024-06-15 11:57:31,487][1651469] Saving new best policy, reward=121.620! [2024-06-15 11:57:31,927][1652491] Updated weights for policy 0, policy_version 34042 (0.0013) [2024-06-15 11:57:34,886][1652491] Updated weights for policy 0, policy_version 34128 (0.0142) [2024-06-15 11:57:35,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 47513.8, 300 sec: 45986.3). Total num frames: 69959680. Throughput: 0: 11252.7. Samples: 17557504. 
Policy #0 lag: (min: 46.0, avg: 121.9, max: 302.0) [2024-06-15 11:57:35,955][1648985] Avg episode reward: [(0, '116.420')] [2024-06-15 11:57:36,507][1652491] Updated weights for policy 0, policy_version 34179 (0.0013) [2024-06-15 11:57:37,580][1652491] Updated weights for policy 0, policy_version 34236 (0.0011) [2024-06-15 11:57:40,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 43690.5, 300 sec: 45764.1). Total num frames: 70123520. Throughput: 0: 11116.1. Samples: 17592832. Policy #0 lag: (min: 46.0, avg: 121.9, max: 302.0) [2024-06-15 11:57:40,956][1648985] Avg episode reward: [(0, '102.520')] [2024-06-15 11:57:43,520][1652491] Updated weights for policy 0, policy_version 34304 (0.0116) [2024-06-15 11:57:45,050][1652491] Updated weights for policy 0, policy_version 34368 (0.0013) [2024-06-15 11:57:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 70385664. Throughput: 0: 11173.0. Samples: 17659904. Policy #0 lag: (min: 46.0, avg: 121.9, max: 302.0) [2024-06-15 11:57:45,956][1648985] Avg episode reward: [(0, '95.120')] [2024-06-15 11:57:47,496][1652491] Updated weights for policy 0, policy_version 34429 (0.0013) [2024-06-15 11:57:48,733][1652491] Updated weights for policy 0, policy_version 34467 (0.0012) [2024-06-15 11:57:50,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 43690.8, 300 sec: 45766.5). Total num frames: 70647808. Throughput: 0: 11229.9. Samples: 17733120. Policy #0 lag: (min: 46.0, avg: 121.9, max: 302.0) [2024-06-15 11:57:50,956][1648985] Avg episode reward: [(0, '104.790')] [2024-06-15 11:57:53,807][1652491] Updated weights for policy 0, policy_version 34503 (0.0015) [2024-06-15 11:57:54,945][1652491] Updated weights for policy 0, policy_version 34560 (0.0014) [2024-06-15 11:57:55,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 44782.8, 300 sec: 45986.2). Total num frames: 70844416. Throughput: 0: 11411.9. Samples: 17776128. 
Policy #0 lag: (min: 15.0, avg: 76.6, max: 271.0) [2024-06-15 11:57:55,956][1648985] Avg episode reward: [(0, '112.070')] [2024-06-15 11:57:56,449][1652491] Updated weights for policy 0, policy_version 34622 (0.0029) [2024-06-15 11:57:58,757][1651469] Signal inference workers to stop experience collection... (1850 times) [2024-06-15 11:57:58,796][1652491] InferenceWorker_p0-w0: stopping experience collection (1850 times) [2024-06-15 11:57:58,960][1651469] Signal inference workers to resume experience collection... (1850 times) [2024-06-15 11:57:58,961][1652491] InferenceWorker_p0-w0: resuming experience collection (1850 times) [2024-06-15 11:57:59,334][1652491] Updated weights for policy 0, policy_version 34704 (0.0013) [2024-06-15 11:58:00,417][1652491] Updated weights for policy 0, policy_version 34750 (0.0013) [2024-06-15 11:58:00,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46421.3, 300 sec: 45764.1). Total num frames: 71172096. Throughput: 0: 11309.5. Samples: 17829376. Policy #0 lag: (min: 15.0, avg: 76.6, max: 271.0) [2024-06-15 11:58:00,956][1648985] Avg episode reward: [(0, '111.560')] [2024-06-15 11:58:05,931][1652491] Updated weights for policy 0, policy_version 34800 (0.0013) [2024-06-15 11:58:05,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 45329.1, 300 sec: 45653.0). Total num frames: 71270400. Throughput: 0: 11468.8. Samples: 17909760. Policy #0 lag: (min: 15.0, avg: 76.6, max: 271.0) [2024-06-15 11:58:05,956][1648985] Avg episode reward: [(0, '108.240')] [2024-06-15 11:58:07,027][1652491] Updated weights for policy 0, policy_version 34848 (0.0012) [2024-06-15 11:58:08,672][1652491] Updated weights for policy 0, policy_version 34881 (0.0038) [2024-06-15 11:58:10,065][1652491] Updated weights for policy 0, policy_version 34944 (0.0036) [2024-06-15 11:58:10,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.6, 300 sec: 45875.2). Total num frames: 71630848. Throughput: 0: 11571.2. Samples: 17943040. 
Policy #0 lag: (min: 15.0, avg: 76.6, max: 271.0) [2024-06-15 11:58:10,956][1648985] Avg episode reward: [(0, '114.500')] [2024-06-15 11:58:11,476][1652491] Updated weights for policy 0, policy_version 35000 (0.0014) [2024-06-15 11:58:15,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 44783.0, 300 sec: 45653.1). Total num frames: 71761920. Throughput: 0: 11696.4. Samples: 18023424. Policy #0 lag: (min: 15.0, avg: 76.6, max: 271.0) [2024-06-15 11:58:15,956][1648985] Avg episode reward: [(0, '102.810')] [2024-06-15 11:58:16,048][1652491] Updated weights for policy 0, policy_version 35056 (0.0013) [2024-06-15 11:58:17,903][1652491] Updated weights for policy 0, policy_version 35108 (0.0017) [2024-06-15 11:58:19,860][1652491] Updated weights for policy 0, policy_version 35152 (0.0047) [2024-06-15 11:58:20,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 47513.6, 300 sec: 45875.2). Total num frames: 72056832. Throughput: 0: 11810.1. Samples: 18088960. Policy #0 lag: (min: 2.0, avg: 107.1, max: 267.0) [2024-06-15 11:58:20,955][1648985] Avg episode reward: [(0, '106.410')] [2024-06-15 11:58:21,861][1652491] Updated weights for policy 0, policy_version 35234 (0.0013) [2024-06-15 11:58:25,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 72220672. Throughput: 0: 11764.7. Samples: 18122240. Policy #0 lag: (min: 2.0, avg: 107.1, max: 267.0) [2024-06-15 11:58:25,956][1648985] Avg episode reward: [(0, '108.090')] [2024-06-15 11:58:26,448][1652491] Updated weights for policy 0, policy_version 35280 (0.0018) [2024-06-15 11:58:27,697][1652491] Updated weights for policy 0, policy_version 35328 (0.0023) [2024-06-15 11:58:30,948][1652491] Updated weights for policy 0, policy_version 35412 (0.0098) [2024-06-15 11:58:30,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 48059.6, 300 sec: 45875.2). Total num frames: 72515584. Throughput: 0: 11969.4. Samples: 18198528. 
Policy #0 lag: (min: 2.0, avg: 107.1, max: 267.0) [2024-06-15 11:58:30,956][1648985] Avg episode reward: [(0, '115.780')] [2024-06-15 11:58:32,952][1652491] Updated weights for policy 0, policy_version 35489 (0.0012) [2024-06-15 11:58:35,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46421.3, 300 sec: 45764.1). Total num frames: 72744960. Throughput: 0: 11821.5. Samples: 18265088. Policy #0 lag: (min: 2.0, avg: 107.1, max: 267.0) [2024-06-15 11:58:35,955][1648985] Avg episode reward: [(0, '117.380')] [2024-06-15 11:58:38,186][1652491] Updated weights for policy 0, policy_version 35536 (0.0014) [2024-06-15 11:58:39,567][1652491] Updated weights for policy 0, policy_version 35600 (0.0013) [2024-06-15 11:58:39,698][1651469] Signal inference workers to stop experience collection... (1900 times) [2024-06-15 11:58:39,728][1652491] InferenceWorker_p0-w0: stopping experience collection (1900 times) [2024-06-15 11:58:39,955][1651469] Signal inference workers to resume experience collection... (1900 times) [2024-06-15 11:58:39,955][1652491] InferenceWorker_p0-w0: resuming experience collection (1900 times) [2024-06-15 11:58:40,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48059.8, 300 sec: 46097.4). Total num frames: 73007104. Throughput: 0: 11798.8. Samples: 18307072. Policy #0 lag: (min: 88.0, avg: 142.9, max: 296.0) [2024-06-15 11:58:40,956][1648985] Avg episode reward: [(0, '106.580')] [2024-06-15 11:58:42,837][1652491] Updated weights for policy 0, policy_version 35683 (0.0170) [2024-06-15 11:58:45,027][1652491] Updated weights for policy 0, policy_version 35766 (0.0112) [2024-06-15 11:58:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 45986.3). Total num frames: 73269248. Throughput: 0: 11855.7. Samples: 18362880. 
Policy #0 lag: (min: 88.0, avg: 142.9, max: 296.0) [2024-06-15 11:58:45,956][1648985] Avg episode reward: [(0, '115.510')] [2024-06-15 11:58:49,755][1652491] Updated weights for policy 0, policy_version 35808 (0.0017) [2024-06-15 11:58:50,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 73433088. Throughput: 0: 11741.9. Samples: 18438144. Policy #0 lag: (min: 88.0, avg: 142.9, max: 296.0) [2024-06-15 11:58:50,956][1648985] Avg episode reward: [(0, '124.160')] [2024-06-15 11:58:51,295][1652491] Updated weights for policy 0, policy_version 35874 (0.0014) [2024-06-15 11:58:51,551][1651469] Saving new best policy, reward=124.160! [2024-06-15 11:58:54,739][1652491] Updated weights for policy 0, policy_version 35921 (0.0012) [2024-06-15 11:58:55,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 73662464. Throughput: 0: 11730.4. Samples: 18470912. Policy #0 lag: (min: 88.0, avg: 142.9, max: 296.0) [2024-06-15 11:58:55,956][1648985] Avg episode reward: [(0, '130.480')] [2024-06-15 11:58:56,575][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000036000_73728000.pth... [2024-06-15 11:58:56,764][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000030592_62652416.pth [2024-06-15 11:58:56,769][1651469] Saving new best policy, reward=130.480! [2024-06-15 11:58:57,344][1652491] Updated weights for policy 0, policy_version 36024 (0.0012) [2024-06-15 11:59:00,955][1648985] Fps is (10 sec: 36045.2, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 73793536. Throughput: 0: 11434.7. Samples: 18537984. 
Policy #0 lag: (min: 88.0, avg: 142.9, max: 296.0) [2024-06-15 11:59:00,956][1648985] Avg episode reward: [(0, '111.740')] [2024-06-15 11:59:02,240][1652491] Updated weights for policy 0, policy_version 36090 (0.0122) [2024-06-15 11:59:03,731][1652491] Updated weights for policy 0, policy_version 36158 (0.0013) [2024-06-15 11:59:05,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46421.2, 300 sec: 45764.1). Total num frames: 74055680. Throughput: 0: 11400.5. Samples: 18601984. Policy #0 lag: (min: 94.0, avg: 167.5, max: 351.0) [2024-06-15 11:59:05,956][1648985] Avg episode reward: [(0, '109.370')] [2024-06-15 11:59:07,439][1652491] Updated weights for policy 0, policy_version 36210 (0.0011) [2024-06-15 11:59:08,671][1652491] Updated weights for policy 0, policy_version 36262 (0.0012) [2024-06-15 11:59:10,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 44782.8, 300 sec: 45764.1). Total num frames: 74317824. Throughput: 0: 11355.0. Samples: 18633216. Policy #0 lag: (min: 94.0, avg: 167.5, max: 351.0) [2024-06-15 11:59:10,956][1648985] Avg episode reward: [(0, '104.090')] [2024-06-15 11:59:12,644][1652491] Updated weights for policy 0, policy_version 36304 (0.0014) [2024-06-15 11:59:14,182][1652491] Updated weights for policy 0, policy_version 36368 (0.0015) [2024-06-15 11:59:15,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 46967.4, 300 sec: 45764.1). Total num frames: 74579968. Throughput: 0: 11309.5. Samples: 18707456. Policy #0 lag: (min: 94.0, avg: 167.5, max: 351.0) [2024-06-15 11:59:15,956][1648985] Avg episode reward: [(0, '113.270')] [2024-06-15 11:59:16,890][1652491] Updated weights for policy 0, policy_version 36423 (0.0016) [2024-06-15 11:59:18,423][1652491] Updated weights for policy 0, policy_version 36496 (0.0026) [2024-06-15 11:59:18,874][1651469] Signal inference workers to stop experience collection... 
(1950 times) [2024-06-15 11:59:18,924][1652491] InferenceWorker_p0-w0: stopping experience collection (1950 times) [2024-06-15 11:59:19,126][1651469] Signal inference workers to resume experience collection... (1950 times) [2024-06-15 11:59:19,127][1652491] InferenceWorker_p0-w0: resuming experience collection (1950 times) [2024-06-15 11:59:20,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 46421.2, 300 sec: 45875.2). Total num frames: 74842112. Throughput: 0: 11411.9. Samples: 18778624. Policy #0 lag: (min: 94.0, avg: 167.5, max: 351.0) [2024-06-15 11:59:20,956][1648985] Avg episode reward: [(0, '131.500')] [2024-06-15 11:59:20,957][1651469] Saving new best policy, reward=131.500! [2024-06-15 11:59:23,788][1652491] Updated weights for policy 0, policy_version 36546 (0.0012) [2024-06-15 11:59:25,237][1652491] Updated weights for policy 0, policy_version 36611 (0.0013) [2024-06-15 11:59:25,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 75038720. Throughput: 0: 11423.3. Samples: 18821120. Policy #0 lag: (min: 94.0, avg: 167.5, max: 351.0) [2024-06-15 11:59:25,956][1648985] Avg episode reward: [(0, '134.680')] [2024-06-15 11:59:26,500][1651469] Saving new best policy, reward=134.680! [2024-06-15 11:59:26,512][1652491] Updated weights for policy 0, policy_version 36672 (0.0012) [2024-06-15 11:59:30,384][1652491] Updated weights for policy 0, policy_version 36756 (0.0110) [2024-06-15 11:59:30,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 75300864. Throughput: 0: 11446.0. Samples: 18877952. Policy #0 lag: (min: 15.0, avg: 127.1, max: 271.0) [2024-06-15 11:59:30,956][1648985] Avg episode reward: [(0, '125.410')] [2024-06-15 11:59:31,413][1652491] Updated weights for policy 0, policy_version 36799 (0.0013) [2024-06-15 11:59:35,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45329.1, 300 sec: 45764.2). Total num frames: 75464704. Throughput: 0: 11434.7. Samples: 18952704. 
Policy #0 lag: (min: 15.0, avg: 127.1, max: 271.0) [2024-06-15 11:59:35,955][1648985] Avg episode reward: [(0, '110.990')] [2024-06-15 11:59:36,119][1652491] Updated weights for policy 0, policy_version 36857 (0.0013) [2024-06-15 11:59:37,768][1652491] Updated weights for policy 0, policy_version 36918 (0.0012) [2024-06-15 11:59:40,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 44783.0, 300 sec: 45875.2). Total num frames: 75694080. Throughput: 0: 11400.6. Samples: 18983936. Policy #0 lag: (min: 15.0, avg: 127.1, max: 271.0) [2024-06-15 11:59:40,956][1648985] Avg episode reward: [(0, '117.360')] [2024-06-15 11:59:41,050][1652491] Updated weights for policy 0, policy_version 36976 (0.0014) [2024-06-15 11:59:42,696][1652491] Updated weights for policy 0, policy_version 37043 (0.0011) [2024-06-15 11:59:45,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 43690.5, 300 sec: 45764.1). Total num frames: 75890688. Throughput: 0: 11423.2. Samples: 19052032. Policy #0 lag: (min: 15.0, avg: 127.1, max: 271.0) [2024-06-15 11:59:45,956][1648985] Avg episode reward: [(0, '112.050')] [2024-06-15 11:59:47,215][1652491] Updated weights for policy 0, policy_version 37078 (0.0011) [2024-06-15 11:59:48,681][1652491] Updated weights for policy 0, policy_version 37138 (0.0023) [2024-06-15 11:59:50,955][1648985] Fps is (10 sec: 45873.4, 60 sec: 45328.8, 300 sec: 45764.1). Total num frames: 76152832. Throughput: 0: 11525.6. Samples: 19120640. Policy #0 lag: (min: 15.0, avg: 127.1, max: 271.0) [2024-06-15 11:59:50,956][1648985] Avg episode reward: [(0, '114.490')] [2024-06-15 11:59:52,024][1652491] Updated weights for policy 0, policy_version 37185 (0.0013) [2024-06-15 11:59:53,267][1652491] Updated weights for policy 0, policy_version 37248 (0.0012) [2024-06-15 11:59:54,765][1652491] Updated weights for policy 0, policy_version 37303 (0.0011) [2024-06-15 11:59:55,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.4, 300 sec: 45764.1). Total num frames: 76414976. Throughput: 0: 11525.8. 
Samples: 19151872. Policy #0 lag: (min: 25.0, avg: 124.0, max: 265.0) [2024-06-15 11:59:55,956][1648985] Avg episode reward: [(0, '124.930')] [2024-06-15 11:59:59,288][1652491] Updated weights for policy 0, policy_version 37360 (0.0030) [2024-06-15 12:00:00,955][1648985] Fps is (10 sec: 49153.3, 60 sec: 47513.5, 300 sec: 45764.1). Total num frames: 76644352. Throughput: 0: 11491.5. Samples: 19224576. Policy #0 lag: (min: 25.0, avg: 124.0, max: 265.0) [2024-06-15 12:00:00,956][1648985] Avg episode reward: [(0, '122.540')] [2024-06-15 12:00:01,201][1652491] Updated weights for policy 0, policy_version 37440 (0.0011) [2024-06-15 12:00:03,036][1651469] Signal inference workers to stop experience collection... (2000 times) [2024-06-15 12:00:03,064][1652491] InferenceWorker_p0-w0: stopping experience collection (2000 times) [2024-06-15 12:00:03,188][1651469] Signal inference workers to resume experience collection... (2000 times) [2024-06-15 12:00:03,189][1652491] InferenceWorker_p0-w0: resuming experience collection (2000 times) [2024-06-15 12:00:03,683][1652491] Updated weights for policy 0, policy_version 37479 (0.0013) [2024-06-15 12:00:04,641][1652491] Updated weights for policy 0, policy_version 37523 (0.0014) [2024-06-15 12:00:05,637][1652491] Updated weights for policy 0, policy_version 37568 (0.0011) [2024-06-15 12:00:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.9, 300 sec: 45875.2). Total num frames: 76939264. Throughput: 0: 11400.5. Samples: 19291648. Policy #0 lag: (min: 25.0, avg: 124.0, max: 265.0) [2024-06-15 12:00:05,956][1648985] Avg episode reward: [(0, '113.400')] [2024-06-15 12:00:10,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 77103104. Throughput: 0: 11400.5. Samples: 19334144. 
Policy #0 lag: (min: 25.0, avg: 124.0, max: 265.0) [2024-06-15 12:00:10,956][1648985] Avg episode reward: [(0, '111.080')] [2024-06-15 12:00:11,201][1652491] Updated weights for policy 0, policy_version 37664 (0.0186) [2024-06-15 12:00:13,987][1652491] Updated weights for policy 0, policy_version 37713 (0.0013) [2024-06-15 12:00:15,496][1652491] Updated weights for policy 0, policy_version 37784 (0.0132) [2024-06-15 12:00:15,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 47513.5, 300 sec: 46319.5). Total num frames: 77430784. Throughput: 0: 11605.3. Samples: 19400192. Policy #0 lag: (min: 25.0, avg: 124.0, max: 265.0) [2024-06-15 12:00:15,956][1648985] Avg episode reward: [(0, '118.640')] [2024-06-15 12:00:19,912][1652491] Updated weights for policy 0, policy_version 37826 (0.0014) [2024-06-15 12:00:20,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 77561856. Throughput: 0: 11616.7. Samples: 19475456. Policy #0 lag: (min: 4.0, avg: 85.4, max: 260.0) [2024-06-15 12:00:20,956][1648985] Avg episode reward: [(0, '118.290')] [2024-06-15 12:00:21,091][1652491] Updated weights for policy 0, policy_version 37878 (0.0061) [2024-06-15 12:00:21,991][1652491] Updated weights for policy 0, policy_version 37920 (0.0018) [2024-06-15 12:00:24,913][1652491] Updated weights for policy 0, policy_version 37972 (0.0014) [2024-06-15 12:00:25,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 46421.3, 300 sec: 46097.4). Total num frames: 77824000. Throughput: 0: 11810.1. Samples: 19515392. Policy #0 lag: (min: 4.0, avg: 85.4, max: 260.0) [2024-06-15 12:00:25,956][1648985] Avg episode reward: [(0, '121.740')] [2024-06-15 12:00:27,107][1652491] Updated weights for policy 0, policy_version 38064 (0.0013) [2024-06-15 12:00:30,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 44782.9, 300 sec: 45764.1). Total num frames: 77987840. Throughput: 0: 11764.7. Samples: 19581440. 
Policy #0 lag: (min: 4.0, avg: 85.4, max: 260.0) [2024-06-15 12:00:30,956][1648985] Avg episode reward: [(0, '132.470')] [2024-06-15 12:00:32,531][1652491] Updated weights for policy 0, policy_version 38128 (0.0014) [2024-06-15 12:00:34,500][1652491] Updated weights for policy 0, policy_version 38208 (0.0011) [2024-06-15 12:00:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.2, 300 sec: 45986.3). Total num frames: 78249984. Throughput: 0: 11696.5. Samples: 19646976. Policy #0 lag: (min: 4.0, avg: 85.4, max: 260.0) [2024-06-15 12:00:35,956][1648985] Avg episode reward: [(0, '131.950')] [2024-06-15 12:00:37,697][1652491] Updated weights for policy 0, policy_version 38273 (0.0014) [2024-06-15 12:00:39,031][1652491] Updated weights for policy 0, policy_version 38330 (0.0013) [2024-06-15 12:00:40,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.4, 300 sec: 45986.3). Total num frames: 78512128. Throughput: 0: 11628.1. Samples: 19675136. Policy #0 lag: (min: 4.0, avg: 85.4, max: 260.0) [2024-06-15 12:00:40,956][1648985] Avg episode reward: [(0, '125.340')] [2024-06-15 12:00:44,076][1651469] Signal inference workers to stop experience collection... (2050 times) [2024-06-15 12:00:44,129][1652491] InferenceWorker_p0-w0: stopping experience collection (2050 times) [2024-06-15 12:00:44,132][1652491] Updated weights for policy 0, policy_version 38371 (0.0015) [2024-06-15 12:00:44,374][1651469] Signal inference workers to resume experience collection... (2050 times) [2024-06-15 12:00:44,375][1652491] InferenceWorker_p0-w0: resuming experience collection (2050 times) [2024-06-15 12:00:45,793][1652491] Updated weights for policy 0, policy_version 38434 (0.0012) [2024-06-15 12:00:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.6, 300 sec: 46319.5). Total num frames: 78708736. Throughput: 0: 11559.8. Samples: 19744768. 
Policy #0 lag: (min: 63.0, avg: 143.5, max: 319.0) [2024-06-15 12:00:45,956][1648985] Avg episode reward: [(0, '121.700')] [2024-06-15 12:00:47,774][1652491] Updated weights for policy 0, policy_version 38468 (0.0013) [2024-06-15 12:00:49,302][1652491] Updated weights for policy 0, policy_version 38529 (0.0012) [2024-06-15 12:00:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48060.1, 300 sec: 46430.6). Total num frames: 79036416. Throughput: 0: 11366.4. Samples: 19803136. Policy #0 lag: (min: 63.0, avg: 143.5, max: 319.0) [2024-06-15 12:00:50,956][1648985] Avg episode reward: [(0, '118.090')] [2024-06-15 12:00:55,244][1652491] Updated weights for policy 0, policy_version 38593 (0.0029) [2024-06-15 12:00:55,955][1648985] Fps is (10 sec: 39320.0, 60 sec: 44782.6, 300 sec: 45986.2). Total num frames: 79101952. Throughput: 0: 11343.6. Samples: 19844608. Policy #0 lag: (min: 63.0, avg: 143.5, max: 319.0) [2024-06-15 12:00:55,956][1648985] Avg episode reward: [(0, '117.780')] [2024-06-15 12:00:56,249][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000038640_79134720.pth... [2024-06-15 12:00:56,310][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000033280_68157440.pth [2024-06-15 12:00:56,437][1652491] Updated weights for policy 0, policy_version 38645 (0.0014) [2024-06-15 12:00:57,384][1652491] Updated weights for policy 0, policy_version 38674 (0.0012) [2024-06-15 12:00:58,288][1652491] Updated weights for policy 0, policy_version 38720 (0.0016) [2024-06-15 12:01:00,075][1652491] Updated weights for policy 0, policy_version 38768 (0.0012) [2024-06-15 12:01:00,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 46421.5, 300 sec: 46208.5). Total num frames: 79429632. Throughput: 0: 11400.6. Samples: 19913216. 
Policy #0 lag: (min: 63.0, avg: 143.5, max: 319.0) [2024-06-15 12:01:00,955][1648985] Avg episode reward: [(0, '115.170')] [2024-06-15 12:01:01,856][1652491] Updated weights for policy 0, policy_version 38832 (0.0014) [2024-06-15 12:01:05,955][1648985] Fps is (10 sec: 45877.3, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 79560704. Throughput: 0: 11355.0. Samples: 19986432. Policy #0 lag: (min: 63.0, avg: 143.5, max: 319.0) [2024-06-15 12:01:05,956][1648985] Avg episode reward: [(0, '111.470')] [2024-06-15 12:01:06,936][1652491] Updated weights for policy 0, policy_version 38880 (0.0011) [2024-06-15 12:01:08,658][1652491] Updated weights for policy 0, policy_version 38944 (0.0012) [2024-06-15 12:01:10,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 45329.2, 300 sec: 45875.2). Total num frames: 79822848. Throughput: 0: 11184.4. Samples: 20018688. Policy #0 lag: (min: 63.0, avg: 143.5, max: 319.0) [2024-06-15 12:01:10,956][1648985] Avg episode reward: [(0, '118.310')] [2024-06-15 12:01:11,919][1652491] Updated weights for policy 0, policy_version 39024 (0.0014) [2024-06-15 12:01:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 44236.9, 300 sec: 46097.4). Total num frames: 80084992. Throughput: 0: 10956.8. Samples: 20074496. Policy #0 lag: (min: 15.0, avg: 129.8, max: 271.0) [2024-06-15 12:01:15,956][1648985] Avg episode reward: [(0, '109.550')] [2024-06-15 12:01:18,630][1652491] Updated weights for policy 0, policy_version 39107 (0.0059) [2024-06-15 12:01:19,876][1652491] Updated weights for policy 0, policy_version 39168 (0.0013) [2024-06-15 12:01:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45329.0, 300 sec: 46097.4). Total num frames: 80281600. Throughput: 0: 11116.1. Samples: 20147200. 
Policy #0 lag: (min: 15.0, avg: 129.8, max: 271.0) [2024-06-15 12:01:20,956][1648985] Avg episode reward: [(0, '122.870')] [2024-06-15 12:01:21,220][1652491] Updated weights for policy 0, policy_version 39216 (0.0015) [2024-06-15 12:01:23,068][1652491] Updated weights for policy 0, policy_version 39251 (0.0015) [2024-06-15 12:01:24,508][1651469] Signal inference workers to stop experience collection... (2100 times) [2024-06-15 12:01:24,549][1652491] Updated weights for policy 0, policy_version 39313 (0.0093) [2024-06-15 12:01:24,578][1652491] InferenceWorker_p0-w0: stopping experience collection (2100 times) [2024-06-15 12:01:24,856][1651469] Signal inference workers to resume experience collection... (2100 times) [2024-06-15 12:01:24,856][1652491] InferenceWorker_p0-w0: resuming experience collection (2100 times) [2024-06-15 12:01:25,700][1652491] Updated weights for policy 0, policy_version 39359 (0.0049) [2024-06-15 12:01:25,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46421.4, 300 sec: 46208.5). Total num frames: 80609280. Throughput: 0: 11241.3. Samples: 20180992. Policy #0 lag: (min: 15.0, avg: 129.8, max: 271.0) [2024-06-15 12:01:25,955][1648985] Avg episode reward: [(0, '114.870')] [2024-06-15 12:01:30,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 80740352. Throughput: 0: 11377.8. Samples: 20256768. Policy #0 lag: (min: 15.0, avg: 129.8, max: 271.0) [2024-06-15 12:01:30,956][1648985] Avg episode reward: [(0, '119.210')] [2024-06-15 12:01:31,881][1652491] Updated weights for policy 0, policy_version 39426 (0.0013) [2024-06-15 12:01:34,495][1652491] Updated weights for policy 0, policy_version 39490 (0.0128) [2024-06-15 12:01:35,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 81002496. Throughput: 0: 11446.1. Samples: 20318208. 
Policy #0 lag: (min: 15.0, avg: 129.8, max: 271.0) [2024-06-15 12:01:35,956][1648985] Avg episode reward: [(0, '121.200')] [2024-06-15 12:01:36,083][1652491] Updated weights for policy 0, policy_version 39553 (0.0012) [2024-06-15 12:01:37,501][1652491] Updated weights for policy 0, policy_version 39609 (0.0014) [2024-06-15 12:01:40,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 81133568. Throughput: 0: 11321.0. Samples: 20354048. Policy #0 lag: (min: 81.0, avg: 180.6, max: 280.0) [2024-06-15 12:01:40,956][1648985] Avg episode reward: [(0, '132.090')] [2024-06-15 12:01:41,567][1652491] Updated weights for policy 0, policy_version 39652 (0.0012) [2024-06-15 12:01:44,125][1652491] Updated weights for policy 0, policy_version 39684 (0.0014) [2024-06-15 12:01:45,706][1652491] Updated weights for policy 0, policy_version 39744 (0.0034) [2024-06-15 12:01:45,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 44782.9, 300 sec: 45319.8). Total num frames: 81395712. Throughput: 0: 11343.6. Samples: 20423680. Policy #0 lag: (min: 81.0, avg: 180.6, max: 280.0) [2024-06-15 12:01:45,956][1648985] Avg episode reward: [(0, '128.960')] [2024-06-15 12:01:46,841][1652491] Updated weights for policy 0, policy_version 39795 (0.0014) [2024-06-15 12:01:48,218][1652491] Updated weights for policy 0, policy_version 39864 (0.0013) [2024-06-15 12:01:50,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 43690.5, 300 sec: 45764.1). Total num frames: 81657856. Throughput: 0: 11537.0. Samples: 20505600. 
Policy #0 lag: (min: 81.0, avg: 180.6, max: 280.0) [2024-06-15 12:01:50,956][1648985] Avg episode reward: [(0, '132.080')] [2024-06-15 12:01:51,920][1652491] Updated weights for policy 0, policy_version 39889 (0.0015) [2024-06-15 12:01:52,735][1652491] Updated weights for policy 0, policy_version 39936 (0.0024) [2024-06-15 12:01:55,460][1652491] Updated weights for policy 0, policy_version 40000 (0.0013) [2024-06-15 12:01:55,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 46967.8, 300 sec: 45875.2). Total num frames: 81920000. Throughput: 0: 11559.8. Samples: 20538880. Policy #0 lag: (min: 81.0, avg: 180.6, max: 280.0) [2024-06-15 12:01:55,956][1648985] Avg episode reward: [(0, '134.440')] [2024-06-15 12:01:57,600][1652491] Updated weights for policy 0, policy_version 40051 (0.0014) [2024-06-15 12:01:59,087][1652491] Updated weights for policy 0, policy_version 40120 (0.0013) [2024-06-15 12:02:00,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 82182144. Throughput: 0: 11707.8. Samples: 20601344. Policy #0 lag: (min: 81.0, avg: 180.6, max: 280.0) [2024-06-15 12:02:00,955][1648985] Avg episode reward: [(0, '129.620')] [2024-06-15 12:02:04,187][1652491] Updated weights for policy 0, policy_version 40170 (0.0020) [2024-06-15 12:02:05,327][1652491] Updated weights for policy 0, policy_version 40209 (0.0014) [2024-06-15 12:02:05,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 46097.4). Total num frames: 82411520. Throughput: 0: 11730.5. Samples: 20675072. Policy #0 lag: (min: 15.0, avg: 101.9, max: 271.0) [2024-06-15 12:02:05,956][1648985] Avg episode reward: [(0, '126.470')] [2024-06-15 12:02:08,318][1651469] Signal inference workers to stop experience collection... (2150 times) [2024-06-15 12:02:08,375][1652491] InferenceWorker_p0-w0: stopping experience collection (2150 times) [2024-06-15 12:02:08,505][1651469] Signal inference workers to resume experience collection... 
(2150 times) [2024-06-15 12:02:08,506][1652491] InferenceWorker_p0-w0: resuming experience collection (2150 times) [2024-06-15 12:02:08,509][1652491] Updated weights for policy 0, policy_version 40304 (0.0115) [2024-06-15 12:02:10,346][1652491] Updated weights for policy 0, policy_version 40374 (0.0012) [2024-06-15 12:02:10,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 82706432. Throughput: 0: 11650.8. Samples: 20705280. Policy #0 lag: (min: 15.0, avg: 101.9, max: 271.0) [2024-06-15 12:02:10,956][1648985] Avg episode reward: [(0, '121.260')] [2024-06-15 12:02:15,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 82771968. Throughput: 0: 11639.5. Samples: 20780544. Policy #0 lag: (min: 15.0, avg: 101.9, max: 271.0) [2024-06-15 12:02:15,956][1648985] Avg episode reward: [(0, '117.780')] [2024-06-15 12:02:15,999][1652491] Updated weights for policy 0, policy_version 40418 (0.0061) [2024-06-15 12:02:17,085][1652491] Updated weights for policy 0, policy_version 40464 (0.0015) [2024-06-15 12:02:18,757][1652491] Updated weights for policy 0, policy_version 40515 (0.0014) [2024-06-15 12:02:20,951][1652491] Updated weights for policy 0, policy_version 40593 (0.0030) [2024-06-15 12:02:20,955][1648985] Fps is (10 sec: 42599.6, 60 sec: 47513.7, 300 sec: 45875.2). Total num frames: 83132416. Throughput: 0: 11628.1. Samples: 20841472. Policy #0 lag: (min: 15.0, avg: 101.9, max: 271.0) [2024-06-15 12:02:20,955][1648985] Avg episode reward: [(0, '125.540')] [2024-06-15 12:02:22,003][1652491] Updated weights for policy 0, policy_version 40640 (0.0010) [2024-06-15 12:02:25,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 43690.6, 300 sec: 46097.3). Total num frames: 83230720. Throughput: 0: 11707.7. Samples: 20880896. 
Policy #0 lag: (min: 15.0, avg: 101.9, max: 271.0) [2024-06-15 12:02:25,956][1648985] Avg episode reward: [(0, '133.690')] [2024-06-15 12:02:27,780][1652491] Updated weights for policy 0, policy_version 40702 (0.0014) [2024-06-15 12:02:29,270][1652491] Updated weights for policy 0, policy_version 40768 (0.0015) [2024-06-15 12:02:30,955][1648985] Fps is (10 sec: 45873.8, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 83591168. Throughput: 0: 11730.5. Samples: 20951552. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 12:02:30,957][1648985] Avg episode reward: [(0, '138.430')] [2024-06-15 12:02:30,967][1651469] Saving new best policy, reward=138.430! [2024-06-15 12:02:32,192][1652491] Updated weights for policy 0, policy_version 40853 (0.0122) [2024-06-15 12:02:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 83755008. Throughput: 0: 11446.1. Samples: 21020672. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 12:02:35,956][1648985] Avg episode reward: [(0, '130.500')] [2024-06-15 12:02:37,676][1652491] Updated weights for policy 0, policy_version 40899 (0.0012) [2024-06-15 12:02:38,948][1652491] Updated weights for policy 0, policy_version 40955 (0.0013) [2024-06-15 12:02:40,404][1652491] Updated weights for policy 0, policy_version 41019 (0.0013) [2024-06-15 12:02:40,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 84017152. Throughput: 0: 11525.7. Samples: 21057536. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 12:02:40,956][1648985] Avg episode reward: [(0, '123.770')] [2024-06-15 12:02:41,787][1652491] Updated weights for policy 0, policy_version 41059 (0.0095) [2024-06-15 12:02:43,586][1652491] Updated weights for policy 0, policy_version 41124 (0.0015) [2024-06-15 12:02:45,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 84279296. Throughput: 0: 11605.3. Samples: 21123584. 
Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 12:02:45,957][1648985] Avg episode reward: [(0, '122.200')] [2024-06-15 12:02:48,847][1652491] Updated weights for policy 0, policy_version 41154 (0.0012) [2024-06-15 12:02:50,579][1652491] Updated weights for policy 0, policy_version 41232 (0.0013) [2024-06-15 12:02:50,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 46421.5, 300 sec: 46097.4). Total num frames: 84443136. Throughput: 0: 11605.3. Samples: 21197312. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 12:02:50,955][1648985] Avg episode reward: [(0, '129.120')] [2024-06-15 12:02:51,736][1652491] Updated weights for policy 0, policy_version 41280 (0.0019) [2024-06-15 12:02:52,396][1651469] Signal inference workers to stop experience collection... (2200 times) [2024-06-15 12:02:52,463][1652491] InferenceWorker_p0-w0: stopping experience collection (2200 times) [2024-06-15 12:02:52,602][1651469] Signal inference workers to resume experience collection... (2200 times) [2024-06-15 12:02:52,603][1652491] InferenceWorker_p0-w0: resuming experience collection (2200 times) [2024-06-15 12:02:53,576][1652491] Updated weights for policy 0, policy_version 41338 (0.0015) [2024-06-15 12:02:55,279][1652491] Updated weights for policy 0, policy_version 41396 (0.0013) [2024-06-15 12:02:55,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 48059.4, 300 sec: 46208.4). Total num frames: 84803584. Throughput: 0: 11593.9. Samples: 21227008. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 12:02:55,956][1648985] Avg episode reward: [(0, '127.680')] [2024-06-15 12:02:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000041408_84803584.pth... 
[2024-06-15 12:02:56,050][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000036000_73728000.pth [2024-06-15 12:03:00,565][1652491] Updated weights for policy 0, policy_version 41440 (0.0011) [2024-06-15 12:03:00,956][1648985] Fps is (10 sec: 45874.3, 60 sec: 45328.9, 300 sec: 46208.4). Total num frames: 84901888. Throughput: 0: 11787.3. Samples: 21310976. Policy #0 lag: (min: 15.0, avg: 74.6, max: 271.0) [2024-06-15 12:03:00,958][1648985] Avg episode reward: [(0, '129.390')] [2024-06-15 12:03:02,541][1652491] Updated weights for policy 0, policy_version 41529 (0.0014) [2024-06-15 12:03:04,261][1652491] Updated weights for policy 0, policy_version 41570 (0.0013) [2024-06-15 12:03:05,893][1652491] Updated weights for policy 0, policy_version 41635 (0.0027) [2024-06-15 12:03:05,955][1648985] Fps is (10 sec: 45876.7, 60 sec: 47513.5, 300 sec: 46208.4). Total num frames: 85262336. Throughput: 0: 11764.6. Samples: 21370880. Policy #0 lag: (min: 15.0, avg: 74.6, max: 271.0) [2024-06-15 12:03:05,956][1648985] Avg episode reward: [(0, '121.440')] [2024-06-15 12:03:10,931][1652491] Updated weights for policy 0, policy_version 41696 (0.0011) [2024-06-15 12:03:10,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 85393408. Throughput: 0: 11867.0. Samples: 21414912. Policy #0 lag: (min: 15.0, avg: 74.6, max: 271.0) [2024-06-15 12:03:10,956][1648985] Avg episode reward: [(0, '118.880')] [2024-06-15 12:03:12,155][1652491] Updated weights for policy 0, policy_version 41745 (0.0012) [2024-06-15 12:03:14,480][1652491] Updated weights for policy 0, policy_version 41796 (0.0014) [2024-06-15 12:03:15,614][1652491] Updated weights for policy 0, policy_version 41848 (0.0012) [2024-06-15 12:03:15,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 46319.5). Total num frames: 85721088. Throughput: 0: 11889.8. Samples: 21486592. 
Policy #0 lag: (min: 15.0, avg: 74.6, max: 271.0) [2024-06-15 12:03:15,956][1648985] Avg episode reward: [(0, '131.640')] [2024-06-15 12:03:17,619][1652491] Updated weights for policy 0, policy_version 41916 (0.0014) [2024-06-15 12:03:20,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 45329.1, 300 sec: 46208.5). Total num frames: 85852160. Throughput: 0: 11958.1. Samples: 21558784. Policy #0 lag: (min: 15.0, avg: 74.6, max: 271.0) [2024-06-15 12:03:20,956][1648985] Avg episode reward: [(0, '133.100')] [2024-06-15 12:03:22,166][1652491] Updated weights for policy 0, policy_version 41981 (0.0013) [2024-06-15 12:03:23,741][1652491] Updated weights for policy 0, policy_version 42032 (0.0016) [2024-06-15 12:03:25,517][1652491] Updated weights for policy 0, policy_version 42066 (0.0013) [2024-06-15 12:03:25,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 49152.0, 300 sec: 46319.5). Total num frames: 86179840. Throughput: 0: 11889.8. Samples: 21592576. Policy #0 lag: (min: 15.0, avg: 74.6, max: 271.0) [2024-06-15 12:03:25,956][1648985] Avg episode reward: [(0, '138.590')] [2024-06-15 12:03:26,419][1651469] Saving new best policy, reward=138.590! [2024-06-15 12:03:27,238][1652491] Updated weights for policy 0, policy_version 42115 (0.0014) [2024-06-15 12:03:28,325][1652491] Updated weights for policy 0, policy_version 42162 (0.0022) [2024-06-15 12:03:30,956][1648985] Fps is (10 sec: 52427.3, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 86376448. Throughput: 0: 12026.3. Samples: 21664768. Policy #0 lag: (min: 15.0, avg: 155.8, max: 271.0) [2024-06-15 12:03:30,957][1648985] Avg episode reward: [(0, '137.650')] [2024-06-15 12:03:32,440][1652491] Updated weights for policy 0, policy_version 42224 (0.0018) [2024-06-15 12:03:34,711][1652491] Updated weights for policy 0, policy_version 42295 (0.0014) [2024-06-15 12:03:35,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 48059.9, 300 sec: 46208.5). Total num frames: 86638592. Throughput: 0: 11992.2. Samples: 21736960. 
Policy #0 lag: (min: 15.0, avg: 155.8, max: 271.0) [2024-06-15 12:03:35,955][1648985] Avg episode reward: [(0, '132.770')] [2024-06-15 12:03:36,144][1651469] Signal inference workers to stop experience collection... (2250 times) [2024-06-15 12:03:36,196][1652491] InferenceWorker_p0-w0: stopping experience collection (2250 times) [2024-06-15 12:03:36,404][1651469] Signal inference workers to resume experience collection... (2250 times) [2024-06-15 12:03:36,404][1652491] InferenceWorker_p0-w0: resuming experience collection (2250 times) [2024-06-15 12:03:37,210][1652491] Updated weights for policy 0, policy_version 42352 (0.0011) [2024-06-15 12:03:38,670][1652491] Updated weights for policy 0, policy_version 42400 (0.0012) [2024-06-15 12:03:40,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 86900736. Throughput: 0: 12106.1. Samples: 21771776. Policy #0 lag: (min: 15.0, avg: 155.8, max: 271.0) [2024-06-15 12:03:40,956][1648985] Avg episode reward: [(0, '138.120')] [2024-06-15 12:03:43,910][1652491] Updated weights for policy 0, policy_version 42487 (0.0106) [2024-06-15 12:03:45,621][1652491] Updated weights for policy 0, policy_version 42533 (0.0012) [2024-06-15 12:03:45,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 47513.7, 300 sec: 46430.6). Total num frames: 87130112. Throughput: 0: 11889.8. Samples: 21846016. Policy #0 lag: (min: 15.0, avg: 155.8, max: 271.0) [2024-06-15 12:03:45,956][1648985] Avg episode reward: [(0, '134.770')] [2024-06-15 12:03:47,338][1652491] Updated weights for policy 0, policy_version 42579 (0.0013) [2024-06-15 12:03:48,963][1652491] Updated weights for policy 0, policy_version 42640 (0.0013) [2024-06-15 12:03:49,899][1652491] Updated weights for policy 0, policy_version 42681 (0.0014) [2024-06-15 12:03:50,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 49698.0, 300 sec: 46652.8). Total num frames: 87425024. Throughput: 0: 12117.3. Samples: 21916160. 
Policy #0 lag: (min: 15.0, avg: 155.8, max: 271.0) [2024-06-15 12:03:50,956][1648985] Avg episode reward: [(0, '130.200')] [2024-06-15 12:03:54,227][1652491] Updated weights for policy 0, policy_version 42720 (0.0014) [2024-06-15 12:03:55,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46421.5, 300 sec: 46763.8). Total num frames: 87588864. Throughput: 0: 12014.9. Samples: 21955584. Policy #0 lag: (min: 35.0, avg: 122.2, max: 291.0) [2024-06-15 12:03:55,956][1648985] Avg episode reward: [(0, '119.540')] [2024-06-15 12:03:56,416][1652491] Updated weights for policy 0, policy_version 42800 (0.0014) [2024-06-15 12:03:58,989][1652491] Updated weights for policy 0, policy_version 42848 (0.0122) [2024-06-15 12:04:00,193][1652491] Updated weights for policy 0, policy_version 42896 (0.0012) [2024-06-15 12:04:00,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 50244.3, 300 sec: 46986.0). Total num frames: 87916544. Throughput: 0: 12026.3. Samples: 22027776. Policy #0 lag: (min: 35.0, avg: 122.2, max: 291.0) [2024-06-15 12:04:00,956][1648985] Avg episode reward: [(0, '109.050')] [2024-06-15 12:04:01,186][1652491] Updated weights for policy 0, policy_version 42940 (0.0013) [2024-06-15 12:04:05,201][1652491] Updated weights for policy 0, policy_version 43002 (0.0019) [2024-06-15 12:04:05,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 46763.9). Total num frames: 88113152. Throughput: 0: 12105.9. Samples: 22103552. Policy #0 lag: (min: 35.0, avg: 122.2, max: 291.0) [2024-06-15 12:04:05,956][1648985] Avg episode reward: [(0, '103.940')] [2024-06-15 12:04:06,744][1652491] Updated weights for policy 0, policy_version 43056 (0.0016) [2024-06-15 12:04:09,988][1652491] Updated weights for policy 0, policy_version 43120 (0.0084) [2024-06-15 12:04:10,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 49698.2, 300 sec: 46763.8). Total num frames: 88375296. Throughput: 0: 12185.6. Samples: 22140928. 
Policy #0 lag: (min: 35.0, avg: 122.2, max: 291.0) [2024-06-15 12:04:10,956][1648985] Avg episode reward: [(0, '112.430')] [2024-06-15 12:04:11,514][1652491] Updated weights for policy 0, policy_version 43187 (0.0014) [2024-06-15 12:04:15,694][1652491] Updated weights for policy 0, policy_version 43232 (0.0014) [2024-06-15 12:04:15,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 46967.3, 300 sec: 46430.6). Total num frames: 88539136. Throughput: 0: 12197.0. Samples: 22213632. Policy #0 lag: (min: 35.0, avg: 122.2, max: 291.0) [2024-06-15 12:04:15,956][1648985] Avg episode reward: [(0, '115.700')] [2024-06-15 12:04:17,750][1652491] Updated weights for policy 0, policy_version 43301 (0.0013) [2024-06-15 12:04:20,494][1651469] Signal inference workers to stop experience collection... (2300 times) [2024-06-15 12:04:20,498][1652491] Updated weights for policy 0, policy_version 43329 (0.0013) [2024-06-15 12:04:20,578][1652491] InferenceWorker_p0-w0: stopping experience collection (2300 times) [2024-06-15 12:04:20,679][1651469] Signal inference workers to resume experience collection... (2300 times) [2024-06-15 12:04:20,680][1652491] InferenceWorker_p0-w0: resuming experience collection (2300 times) [2024-06-15 12:04:20,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 48605.8, 300 sec: 46541.7). Total num frames: 88768512. Throughput: 0: 12071.8. Samples: 22280192. Policy #0 lag: (min: 35.0, avg: 122.2, max: 291.0) [2024-06-15 12:04:20,956][1648985] Avg episode reward: [(0, '97.950')] [2024-06-15 12:04:21,861][1652491] Updated weights for policy 0, policy_version 43408 (0.0080) [2024-06-15 12:04:22,735][1652491] Updated weights for policy 0, policy_version 43456 (0.0012) [2024-06-15 12:04:25,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 88997888. Throughput: 0: 12060.4. Samples: 22314496. 
Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 12:04:25,956][1648985] Avg episode reward: [(0, '102.580')] [2024-06-15 12:04:28,205][1652491] Updated weights for policy 0, policy_version 43521 (0.0018) [2024-06-15 12:04:29,417][1652491] Updated weights for policy 0, policy_version 43582 (0.0013) [2024-06-15 12:04:30,956][1648985] Fps is (10 sec: 49146.0, 60 sec: 48058.9, 300 sec: 46763.6). Total num frames: 89260032. Throughput: 0: 11878.1. Samples: 22380544. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 12:04:30,957][1648985] Avg episode reward: [(0, '106.210')] [2024-06-15 12:04:33,066][1652491] Updated weights for policy 0, policy_version 43650 (0.0013) [2024-06-15 12:04:34,265][1652491] Updated weights for policy 0, policy_version 43700 (0.0014) [2024-06-15 12:04:35,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.4, 300 sec: 46874.9). Total num frames: 89522176. Throughput: 0: 12060.4. Samples: 22458880. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 12:04:35,956][1648985] Avg episode reward: [(0, '108.510')] [2024-06-15 12:04:38,264][1652491] Updated weights for policy 0, policy_version 43732 (0.0017) [2024-06-15 12:04:40,163][1652491] Updated weights for policy 0, policy_version 43808 (0.0011) [2024-06-15 12:04:40,955][1648985] Fps is (10 sec: 49158.0, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 89751552. Throughput: 0: 11969.4. Samples: 22494208. Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 12:04:40,956][1648985] Avg episode reward: [(0, '119.790')] [2024-06-15 12:04:43,534][1652491] Updated weights for policy 0, policy_version 43856 (0.0014) [2024-06-15 12:04:44,810][1652491] Updated weights for policy 0, policy_version 43911 (0.0042) [2024-06-15 12:04:45,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 90013696. Throughput: 0: 11810.1. Samples: 22559232. 
Policy #0 lag: (min: 31.0, avg: 154.0, max: 287.0) [2024-06-15 12:04:45,956][1648985] Avg episode reward: [(0, '118.160')] [2024-06-15 12:04:46,064][1652491] Updated weights for policy 0, policy_version 43968 (0.0035) [2024-06-15 12:04:50,523][1652491] Updated weights for policy 0, policy_version 44017 (0.0012) [2024-06-15 12:04:50,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 90177536. Throughput: 0: 11662.3. Samples: 22628352. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:04:50,955][1648985] Avg episode reward: [(0, '125.510')] [2024-06-15 12:04:52,281][1652491] Updated weights for policy 0, policy_version 44086 (0.0012) [2024-06-15 12:04:55,301][1652491] Updated weights for policy 0, policy_version 44129 (0.0012) [2024-06-15 12:04:55,955][1648985] Fps is (10 sec: 42596.7, 60 sec: 47513.3, 300 sec: 46763.8). Total num frames: 90439680. Throughput: 0: 11548.3. Samples: 22660608. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:04:55,956][1648985] Avg episode reward: [(0, '136.850')] [2024-06-15 12:04:56,262][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000044176_90472448.pth... [2024-06-15 12:04:56,437][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000038640_79134720.pth [2024-06-15 12:04:57,290][1652491] Updated weights for policy 0, policy_version 44224 (0.0143) [2024-06-15 12:05:00,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 90570752. Throughput: 0: 11457.5. Samples: 22729216. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:05:00,956][1648985] Avg episode reward: [(0, '119.210')] [2024-06-15 12:05:01,084][1651469] Signal inference workers to stop experience collection... 
(2350 times) [2024-06-15 12:05:01,113][1652491] InferenceWorker_p0-w0: stopping experience collection (2350 times) [2024-06-15 12:05:01,303][1651469] Signal inference workers to resume experience collection... (2350 times) [2024-06-15 12:05:01,304][1652491] InferenceWorker_p0-w0: resuming experience collection (2350 times) [2024-06-15 12:05:02,537][1652491] Updated weights for policy 0, policy_version 44304 (0.0114) [2024-06-15 12:05:05,955][1648985] Fps is (10 sec: 39323.0, 60 sec: 45329.0, 300 sec: 46541.7). Total num frames: 90832896. Throughput: 0: 11411.9. Samples: 22793728. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:05:05,956][1648985] Avg episode reward: [(0, '105.180')] [2024-06-15 12:05:07,044][1652491] Updated weights for policy 0, policy_version 44384 (0.0014) [2024-06-15 12:05:08,823][1652491] Updated weights for policy 0, policy_version 44464 (0.0028) [2024-06-15 12:05:10,974][1648985] Fps is (10 sec: 52328.8, 60 sec: 45314.6, 300 sec: 46316.5). Total num frames: 91095040. Throughput: 0: 11225.1. Samples: 22819840. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:05:10,975][1648985] Avg episode reward: [(0, '111.960')] [2024-06-15 12:05:13,899][1652491] Updated weights for policy 0, policy_version 44528 (0.0014) [2024-06-15 12:05:15,441][1652491] Updated weights for policy 0, policy_version 44592 (0.0051) [2024-06-15 12:05:15,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 91357184. Throughput: 0: 11343.9. Samples: 22891008. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:05:15,956][1648985] Avg episode reward: [(0, '126.330')] [2024-06-15 12:05:18,050][1652491] Updated weights for policy 0, policy_version 44612 (0.0013) [2024-06-15 12:05:20,232][1652491] Updated weights for policy 0, policy_version 44704 (0.0099) [2024-06-15 12:05:20,955][1648985] Fps is (10 sec: 52530.2, 60 sec: 47513.7, 300 sec: 46763.8). Total num frames: 91619328. 
Throughput: 0: 10956.9. Samples: 22951936. Policy #0 lag: (min: 0.0, avg: 100.4, max: 256.0) [2024-06-15 12:05:20,955][1648985] Avg episode reward: [(0, '125.160')] [2024-06-15 12:05:24,832][1652491] Updated weights for policy 0, policy_version 44768 (0.0012) [2024-06-15 12:05:25,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 46421.5, 300 sec: 46763.8). Total num frames: 91783168. Throughput: 0: 11241.3. Samples: 23000064. Policy #0 lag: (min: 0.0, avg: 100.4, max: 256.0) [2024-06-15 12:05:25,955][1648985] Avg episode reward: [(0, '121.410')] [2024-06-15 12:05:26,332][1652491] Updated weights for policy 0, policy_version 44832 (0.0012) [2024-06-15 12:05:28,951][1652491] Updated weights for policy 0, policy_version 44865 (0.0099) [2024-06-15 12:05:30,110][1652491] Updated weights for policy 0, policy_version 44916 (0.0126) [2024-06-15 12:05:30,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46422.2, 300 sec: 46763.8). Total num frames: 92045312. Throughput: 0: 11400.5. Samples: 23072256. Policy #0 lag: (min: 0.0, avg: 100.4, max: 256.0) [2024-06-15 12:05:30,956][1648985] Avg episode reward: [(0, '113.710')] [2024-06-15 12:05:31,414][1652491] Updated weights for policy 0, policy_version 44962 (0.0038) [2024-06-15 12:05:35,599][1652491] Updated weights for policy 0, policy_version 45015 (0.0011) [2024-06-15 12:05:35,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 44783.1, 300 sec: 46430.6). Total num frames: 92209152. Throughput: 0: 11502.9. Samples: 23145984. Policy #0 lag: (min: 0.0, avg: 100.4, max: 256.0) [2024-06-15 12:05:35,956][1648985] Avg episode reward: [(0, '103.770')] [2024-06-15 12:05:37,764][1652491] Updated weights for policy 0, policy_version 45116 (0.0113) [2024-06-15 12:05:40,668][1652491] Updated weights for policy 0, policy_version 45184 (0.0012) [2024-06-15 12:05:40,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 46421.2, 300 sec: 46874.9). Total num frames: 92536832. Throughput: 0: 11468.9. Samples: 23176704. 
Policy #0 lag: (min: 0.0, avg: 100.4, max: 256.0) [2024-06-15 12:05:40,956][1648985] Avg episode reward: [(0, '115.140')] [2024-06-15 12:05:41,363][1651469] Signal inference workers to stop experience collection... (2400 times) [2024-06-15 12:05:41,398][1652491] InferenceWorker_p0-w0: stopping experience collection (2400 times) [2024-06-15 12:05:41,591][1651469] Signal inference workers to resume experience collection... (2400 times) [2024-06-15 12:05:41,592][1652491] InferenceWorker_p0-w0: resuming experience collection (2400 times) [2024-06-15 12:05:42,733][1652491] Updated weights for policy 0, policy_version 45238 (0.0021) [2024-06-15 12:05:45,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 92667904. Throughput: 0: 11571.2. Samples: 23249920. Policy #0 lag: (min: 0.0, avg: 100.4, max: 256.0) [2024-06-15 12:05:45,956][1648985] Avg episode reward: [(0, '124.260')] [2024-06-15 12:05:46,869][1652491] Updated weights for policy 0, policy_version 45267 (0.0013) [2024-06-15 12:05:48,516][1652491] Updated weights for policy 0, policy_version 45345 (0.0015) [2024-06-15 12:05:50,303][1652491] Updated weights for policy 0, policy_version 45377 (0.0012) [2024-06-15 12:05:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46967.3, 300 sec: 47097.1). Total num frames: 92995584. Throughput: 0: 11741.8. Samples: 23322112. Policy #0 lag: (min: 15.0, avg: 97.3, max: 271.0) [2024-06-15 12:05:50,956][1648985] Avg episode reward: [(0, '117.440')] [2024-06-15 12:05:51,564][1652491] Updated weights for policy 0, policy_version 45438 (0.0011) [2024-06-15 12:05:53,429][1652491] Updated weights for policy 0, policy_version 45476 (0.0015) [2024-06-15 12:05:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.5, 300 sec: 46652.7). Total num frames: 93192192. Throughput: 0: 11849.3. Samples: 23352832. 
Policy #0 lag: (min: 15.0, avg: 97.3, max: 271.0) [2024-06-15 12:05:55,956][1648985] Avg episode reward: [(0, '114.580')] [2024-06-15 12:05:57,487][1652491] Updated weights for policy 0, policy_version 45508 (0.0011) [2024-06-15 12:05:59,343][1652491] Updated weights for policy 0, policy_version 45588 (0.0013) [2024-06-15 12:06:00,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 93454336. Throughput: 0: 11946.7. Samples: 23428608. Policy #0 lag: (min: 15.0, avg: 97.3, max: 271.0) [2024-06-15 12:06:00,956][1648985] Avg episode reward: [(0, '114.550')] [2024-06-15 12:06:01,927][1652491] Updated weights for policy 0, policy_version 45664 (0.0013) [2024-06-15 12:06:03,801][1652491] Updated weights for policy 0, policy_version 45712 (0.0024) [2024-06-15 12:06:04,934][1652491] Updated weights for policy 0, policy_version 45759 (0.0014) [2024-06-15 12:06:05,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 93716480. Throughput: 0: 12128.7. Samples: 23497728. Policy #0 lag: (min: 15.0, avg: 97.3, max: 271.0) [2024-06-15 12:06:05,956][1648985] Avg episode reward: [(0, '130.260')] [2024-06-15 12:06:09,699][1652491] Updated weights for policy 0, policy_version 45826 (0.0013) [2024-06-15 12:06:10,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47528.8, 300 sec: 46986.0). Total num frames: 93945856. Throughput: 0: 11958.0. Samples: 23538176. Policy #0 lag: (min: 15.0, avg: 97.3, max: 271.0) [2024-06-15 12:06:10,956][1648985] Avg episode reward: [(0, '132.250')] [2024-06-15 12:06:10,977][1652491] Updated weights for policy 0, policy_version 45885 (0.0012) [2024-06-15 12:06:13,595][1652491] Updated weights for policy 0, policy_version 45952 (0.0012) [2024-06-15 12:06:15,590][1652491] Updated weights for policy 0, policy_version 46008 (0.0012) [2024-06-15 12:06:15,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 48059.9, 300 sec: 47319.2). Total num frames: 94240768. Throughput: 0: 11776.0. 
Samples: 23602176. Policy #0 lag: (min: 61.0, avg: 202.0, max: 317.0) [2024-06-15 12:06:15,955][1648985] Avg episode reward: [(0, '133.940')] [2024-06-15 12:06:20,028][1652491] Updated weights for policy 0, policy_version 46048 (0.0013) [2024-06-15 12:06:20,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 94371840. Throughput: 0: 11764.6. Samples: 23675392. Policy #0 lag: (min: 61.0, avg: 202.0, max: 317.0) [2024-06-15 12:06:20,955][1648985] Avg episode reward: [(0, '123.730')] [2024-06-15 12:06:21,492][1652491] Updated weights for policy 0, policy_version 46115 (0.0011) [2024-06-15 12:06:24,075][1651469] Signal inference workers to stop experience collection... (2450 times) [2024-06-15 12:06:24,105][1652491] InferenceWorker_p0-w0: stopping experience collection (2450 times) [2024-06-15 12:06:24,429][1651469] Signal inference workers to resume experience collection... (2450 times) [2024-06-15 12:06:24,430][1652491] InferenceWorker_p0-w0: resuming experience collection (2450 times) [2024-06-15 12:06:24,432][1652491] Updated weights for policy 0, policy_version 46176 (0.0089) [2024-06-15 12:06:25,913][1652491] Updated weights for policy 0, policy_version 46224 (0.0011) [2024-06-15 12:06:25,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48059.7, 300 sec: 47208.2). Total num frames: 94666752. Throughput: 0: 11878.5. Samples: 23711232. Policy #0 lag: (min: 61.0, avg: 202.0, max: 317.0) [2024-06-15 12:06:25,955][1648985] Avg episode reward: [(0, '117.880')] [2024-06-15 12:06:27,178][1652491] Updated weights for policy 0, policy_version 46268 (0.0012) [2024-06-15 12:06:30,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 45329.1, 300 sec: 46652.7). Total num frames: 94765056. Throughput: 0: 11776.0. Samples: 23779840. 
Policy #0 lag: (min: 61.0, avg: 202.0, max: 317.0) [2024-06-15 12:06:30,956][1648985] Avg episode reward: [(0, '126.770')] [2024-06-15 12:06:32,180][1652491] Updated weights for policy 0, policy_version 46325 (0.0012) [2024-06-15 12:06:33,092][1652491] Updated weights for policy 0, policy_version 46357 (0.0012) [2024-06-15 12:06:35,673][1652491] Updated weights for policy 0, policy_version 46416 (0.0015) [2024-06-15 12:06:35,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 47513.5, 300 sec: 47208.1). Total num frames: 95059968. Throughput: 0: 11673.6. Samples: 23847424. Policy #0 lag: (min: 61.0, avg: 202.0, max: 317.0) [2024-06-15 12:06:35,956][1648985] Avg episode reward: [(0, '140.260')] [2024-06-15 12:06:36,491][1651469] Saving new best policy, reward=140.260! [2024-06-15 12:06:37,254][1652491] Updated weights for policy 0, policy_version 46465 (0.0013) [2024-06-15 12:06:40,956][1648985] Fps is (10 sec: 52427.1, 60 sec: 45875.0, 300 sec: 47097.0). Total num frames: 95289344. Throughput: 0: 11582.5. Samples: 23874048. Policy #0 lag: (min: 61.0, avg: 202.0, max: 317.0) [2024-06-15 12:06:40,957][1648985] Avg episode reward: [(0, '127.090')] [2024-06-15 12:06:42,381][1652491] Updated weights for policy 0, policy_version 46530 (0.0026) [2024-06-15 12:06:43,938][1652491] Updated weights for policy 0, policy_version 46596 (0.0116) [2024-06-15 12:06:45,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 95551488. Throughput: 0: 11525.7. Samples: 23947264. 
Policy #0 lag: (min: 71.0, avg: 179.6, max: 322.0) [2024-06-15 12:06:45,956][1648985] Avg episode reward: [(0, '111.820')] [2024-06-15 12:06:47,184][1652491] Updated weights for policy 0, policy_version 46672 (0.0013) [2024-06-15 12:06:48,225][1652491] Updated weights for policy 0, policy_version 46717 (0.0015) [2024-06-15 12:06:49,658][1652491] Updated weights for policy 0, policy_version 46781 (0.0083) [2024-06-15 12:06:50,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 46967.5, 300 sec: 47097.0). Total num frames: 95813632. Throughput: 0: 11593.9. Samples: 24019456. Policy #0 lag: (min: 71.0, avg: 179.6, max: 322.0) [2024-06-15 12:06:50,956][1648985] Avg episode reward: [(0, '105.090')] [2024-06-15 12:06:54,006][1652491] Updated weights for policy 0, policy_version 46832 (0.0021) [2024-06-15 12:06:55,035][1652491] Updated weights for policy 0, policy_version 46868 (0.0013) [2024-06-15 12:06:55,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 96043008. Throughput: 0: 11628.1. Samples: 24061440. Policy #0 lag: (min: 71.0, avg: 179.6, max: 322.0) [2024-06-15 12:06:55,956][1648985] Avg episode reward: [(0, '111.660')] [2024-06-15 12:06:56,083][1652491] Updated weights for policy 0, policy_version 46909 (0.0013) [2024-06-15 12:06:56,137][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000046912_96075776.pth... [2024-06-15 12:06:56,237][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000041408_84803584.pth [2024-06-15 12:06:58,210][1652491] Updated weights for policy 0, policy_version 46947 (0.0016) [2024-06-15 12:06:59,918][1652491] Updated weights for policy 0, policy_version 47008 (0.0016) [2024-06-15 12:07:00,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 48059.8, 300 sec: 47208.1). Total num frames: 96337920. Throughput: 0: 11696.3. Samples: 24128512. 
Policy #0 lag: (min: 71.0, avg: 179.6, max: 322.0) [2024-06-15 12:07:00,956][1648985] Avg episode reward: [(0, '121.470')] [2024-06-15 12:07:04,491][1652491] Updated weights for policy 0, policy_version 47062 (0.0014) [2024-06-15 12:07:05,457][1652491] Updated weights for policy 0, policy_version 47104 (0.0013) [2024-06-15 12:07:05,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 96468992. Throughput: 0: 11662.1. Samples: 24200192. Policy #0 lag: (min: 71.0, avg: 179.6, max: 322.0) [2024-06-15 12:07:05,956][1648985] Avg episode reward: [(0, '129.220')] [2024-06-15 12:07:07,531][1652491] Updated weights for policy 0, policy_version 47159 (0.0012) [2024-06-15 12:07:08,905][1651469] Signal inference workers to stop experience collection... (2500 times) [2024-06-15 12:07:08,987][1652491] InferenceWorker_p0-w0: stopping experience collection (2500 times) [2024-06-15 12:07:09,110][1651469] Signal inference workers to resume experience collection... (2500 times) [2024-06-15 12:07:09,131][1652491] InferenceWorker_p0-w0: resuming experience collection (2500 times) [2024-06-15 12:07:10,211][1652491] Updated weights for policy 0, policy_version 47230 (0.0084) [2024-06-15 12:07:10,955][1648985] Fps is (10 sec: 42596.5, 60 sec: 46967.2, 300 sec: 47430.2). Total num frames: 96763904. Throughput: 0: 11605.2. Samples: 24233472. Policy #0 lag: (min: 71.0, avg: 179.6, max: 322.0) [2024-06-15 12:07:10,956][1648985] Avg episode reward: [(0, '121.740')] [2024-06-15 12:07:11,600][1652491] Updated weights for policy 0, policy_version 47282 (0.0134) [2024-06-15 12:07:15,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 43690.6, 300 sec: 46541.6). Total num frames: 96862208. Throughput: 0: 11582.6. Samples: 24301056. 
Policy #0 lag: (min: 71.0, avg: 179.6, max: 322.0) [2024-06-15 12:07:15,956][1648985] Avg episode reward: [(0, '125.700')] [2024-06-15 12:07:16,657][1652491] Updated weights for policy 0, policy_version 47314 (0.0011) [2024-06-15 12:07:17,551][1652491] Updated weights for policy 0, policy_version 47357 (0.0017) [2024-06-15 12:07:18,742][1652491] Updated weights for policy 0, policy_version 47408 (0.0013) [2024-06-15 12:07:20,955][1648985] Fps is (10 sec: 39323.3, 60 sec: 46421.2, 300 sec: 47208.1). Total num frames: 97157120. Throughput: 0: 11582.6. Samples: 24368640. Policy #0 lag: (min: 8.0, avg: 90.2, max: 264.0) [2024-06-15 12:07:20,956][1648985] Avg episode reward: [(0, '120.360')] [2024-06-15 12:07:21,532][1652491] Updated weights for policy 0, policy_version 47472 (0.0013) [2024-06-15 12:07:22,839][1652491] Updated weights for policy 0, policy_version 47536 (0.0013) [2024-06-15 12:07:25,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45329.1, 300 sec: 46763.9). Total num frames: 97386496. Throughput: 0: 11730.6. Samples: 24401920. Policy #0 lag: (min: 8.0, avg: 90.2, max: 264.0) [2024-06-15 12:07:25,956][1648985] Avg episode reward: [(0, '131.670')] [2024-06-15 12:07:28,592][1652491] Updated weights for policy 0, policy_version 47584 (0.0012) [2024-06-15 12:07:29,967][1652491] Updated weights for policy 0, policy_version 47633 (0.0011) [2024-06-15 12:07:30,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 97648640. Throughput: 0: 11662.2. Samples: 24472064. Policy #0 lag: (min: 8.0, avg: 90.2, max: 264.0) [2024-06-15 12:07:30,956][1648985] Avg episode reward: [(0, '130.080')] [2024-06-15 12:07:32,345][1652491] Updated weights for policy 0, policy_version 47696 (0.0012) [2024-06-15 12:07:34,608][1652491] Updated weights for policy 0, policy_version 47792 (0.0012) [2024-06-15 12:07:35,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 97910784. Throughput: 0: 11434.7. 
Samples: 24534016. Policy #0 lag: (min: 8.0, avg: 90.2, max: 264.0) [2024-06-15 12:07:35,956][1648985] Avg episode reward: [(0, '135.030')] [2024-06-15 12:07:40,487][1652491] Updated weights for policy 0, policy_version 47826 (0.0021) [2024-06-15 12:07:40,955][1648985] Fps is (10 sec: 32767.7, 60 sec: 44783.2, 300 sec: 46430.6). Total num frames: 97976320. Throughput: 0: 11355.0. Samples: 24572416. Policy #0 lag: (min: 8.0, avg: 90.2, max: 264.0) [2024-06-15 12:07:40,956][1648985] Avg episode reward: [(0, '144.300')] [2024-06-15 12:07:41,586][1651469] Saving new best policy, reward=144.300! [2024-06-15 12:07:42,359][1652491] Updated weights for policy 0, policy_version 47891 (0.0013) [2024-06-15 12:07:43,520][1652491] Updated weights for policy 0, policy_version 47936 (0.0014) [2024-06-15 12:07:45,284][1652491] Updated weights for policy 0, policy_version 47989 (0.0013) [2024-06-15 12:07:45,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 98336768. Throughput: 0: 11173.0. Samples: 24631296. Policy #0 lag: (min: 8.0, avg: 90.2, max: 264.0) [2024-06-15 12:07:45,955][1648985] Avg episode reward: [(0, '130.280')] [2024-06-15 12:07:46,494][1652491] Updated weights for policy 0, policy_version 48033 (0.0011) [2024-06-15 12:07:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 98435072. Throughput: 0: 11150.2. Samples: 24701952. Policy #0 lag: (min: 62.0, avg: 204.2, max: 354.0) [2024-06-15 12:07:50,956][1648985] Avg episode reward: [(0, '97.720')] [2024-06-15 12:07:51,792][1652491] Updated weights for policy 0, policy_version 48068 (0.0013) [2024-06-15 12:07:52,531][1651469] Signal inference workers to stop experience collection... (2550 times) [2024-06-15 12:07:52,604][1652491] InferenceWorker_p0-w0: stopping experience collection (2550 times) [2024-06-15 12:07:52,822][1651469] Signal inference workers to resume experience collection... 
(2550 times) [2024-06-15 12:07:52,823][1652491] InferenceWorker_p0-w0: resuming experience collection (2550 times) [2024-06-15 12:07:54,260][1652491] Updated weights for policy 0, policy_version 48163 (0.0155) [2024-06-15 12:07:55,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 44236.8, 300 sec: 46763.8). Total num frames: 98697216. Throughput: 0: 11036.5. Samples: 24730112. Policy #0 lag: (min: 62.0, avg: 204.2, max: 354.0) [2024-06-15 12:07:55,956][1648985] Avg episode reward: [(0, '93.810')] [2024-06-15 12:07:56,214][1652491] Updated weights for policy 0, policy_version 48208 (0.0011) [2024-06-15 12:07:57,594][1652491] Updated weights for policy 0, policy_version 48257 (0.0011) [2024-06-15 12:07:59,250][1652491] Updated weights for policy 0, policy_version 48320 (0.0012) [2024-06-15 12:08:00,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 43690.7, 300 sec: 46430.6). Total num frames: 98959360. Throughput: 0: 10911.3. Samples: 24792064. Policy #0 lag: (min: 62.0, avg: 204.2, max: 354.0) [2024-06-15 12:08:00,956][1648985] Avg episode reward: [(0, '112.270')] [2024-06-15 12:08:05,804][1652491] Updated weights for policy 0, policy_version 48384 (0.0012) [2024-06-15 12:08:05,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 43690.8, 300 sec: 46430.6). Total num frames: 99090432. Throughput: 0: 11002.3. Samples: 24863744. Policy #0 lag: (min: 62.0, avg: 204.2, max: 354.0) [2024-06-15 12:08:05,955][1648985] Avg episode reward: [(0, '140.200')] [2024-06-15 12:08:06,974][1652491] Updated weights for policy 0, policy_version 48433 (0.0011) [2024-06-15 12:08:08,446][1652491] Updated weights for policy 0, policy_version 48484 (0.0012) [2024-06-15 12:08:09,084][1652491] Updated weights for policy 0, policy_version 48510 (0.0012) [2024-06-15 12:08:10,831][1652491] Updated weights for policy 0, policy_version 48565 (0.0145) [2024-06-15 12:08:10,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 44783.1, 300 sec: 46541.6). Total num frames: 99450880. Throughput: 0: 10979.5. 
Samples: 24896000. Policy #0 lag: (min: 62.0, avg: 204.2, max: 354.0) [2024-06-15 12:08:10,956][1648985] Avg episode reward: [(0, '137.180')] [2024-06-15 12:08:15,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 99483648. Throughput: 0: 11002.3. Samples: 24967168. Policy #0 lag: (min: 62.0, avg: 204.2, max: 354.0) [2024-06-15 12:08:15,956][1648985] Avg episode reward: [(0, '125.830')] [2024-06-15 12:08:16,744][1652491] Updated weights for policy 0, policy_version 48610 (0.0014) [2024-06-15 12:08:18,665][1652491] Updated weights for policy 0, policy_version 48696 (0.0124) [2024-06-15 12:08:20,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 44782.8, 300 sec: 46319.5). Total num frames: 99844096. Throughput: 0: 10968.1. Samples: 25027584. Policy #0 lag: (min: 63.0, avg: 125.6, max: 303.0) [2024-06-15 12:08:20,956][1648985] Avg episode reward: [(0, '104.470')] [2024-06-15 12:08:21,009][1652491] Updated weights for policy 0, policy_version 48768 (0.0034) [2024-06-15 12:08:22,654][1652491] Updated weights for policy 0, policy_version 48830 (0.0014) [2024-06-15 12:08:25,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 100007936. Throughput: 0: 10831.6. Samples: 25059840. Policy #0 lag: (min: 63.0, avg: 125.6, max: 303.0) [2024-06-15 12:08:25,956][1648985] Avg episode reward: [(0, '118.140')] [2024-06-15 12:08:28,658][1652491] Updated weights for policy 0, policy_version 48896 (0.0083) [2024-06-15 12:08:29,943][1652491] Updated weights for policy 0, policy_version 48953 (0.0012) [2024-06-15 12:08:30,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 100270080. Throughput: 0: 11116.1. Samples: 25131520. 
Policy #0 lag: (min: 63.0, avg: 125.6, max: 303.0) [2024-06-15 12:08:30,956][1648985] Avg episode reward: [(0, '137.100')] [2024-06-15 12:08:32,258][1652491] Updated weights for policy 0, policy_version 49019 (0.0014) [2024-06-15 12:08:33,634][1651469] Signal inference workers to stop experience collection... (2600 times) [2024-06-15 12:08:33,678][1652491] Updated weights for policy 0, policy_version 49058 (0.0011) [2024-06-15 12:08:33,691][1652491] InferenceWorker_p0-w0: stopping experience collection (2600 times) [2024-06-15 12:08:33,962][1651469] Signal inference workers to resume experience collection... (2600 times) [2024-06-15 12:08:33,963][1652491] InferenceWorker_p0-w0: resuming experience collection (2600 times) [2024-06-15 12:08:35,962][1648985] Fps is (10 sec: 52395.4, 60 sec: 43686.0, 300 sec: 46207.4). Total num frames: 100532224. Throughput: 0: 11137.3. Samples: 25203200. Policy #0 lag: (min: 63.0, avg: 125.6, max: 303.0) [2024-06-15 12:08:35,962][1648985] Avg episode reward: [(0, '142.340')] [2024-06-15 12:08:38,918][1652491] Updated weights for policy 0, policy_version 49104 (0.0011) [2024-06-15 12:08:40,466][1652491] Updated weights for policy 0, policy_version 49170 (0.0014) [2024-06-15 12:08:40,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 100761600. Throughput: 0: 11389.1. Samples: 25242624. Policy #0 lag: (min: 63.0, avg: 125.6, max: 303.0) [2024-06-15 12:08:40,956][1648985] Avg episode reward: [(0, '136.660')] [2024-06-15 12:08:42,236][1652491] Updated weights for policy 0, policy_version 49236 (0.0014) [2024-06-15 12:08:44,561][1652491] Updated weights for policy 0, policy_version 49296 (0.0018) [2024-06-15 12:08:45,650][1652491] Updated weights for policy 0, policy_version 49343 (0.0014) [2024-06-15 12:08:45,956][1648985] Fps is (10 sec: 52459.4, 60 sec: 45328.5, 300 sec: 46208.3). Total num frames: 101056512. Throughput: 0: 11457.2. Samples: 25307648. 
Policy #0 lag: (min: 63.0, avg: 125.6, max: 303.0) [2024-06-15 12:08:45,956][1648985] Avg episode reward: [(0, '134.180')] [2024-06-15 12:08:50,551][1652491] Updated weights for policy 0, policy_version 49406 (0.0018) [2024-06-15 12:08:50,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45875.3, 300 sec: 46097.4). Total num frames: 101187584. Throughput: 0: 11468.8. Samples: 25379840. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 12:08:50,956][1648985] Avg episode reward: [(0, '136.390')] [2024-06-15 12:08:52,323][1652491] Updated weights for policy 0, policy_version 49464 (0.0014) [2024-06-15 12:08:54,124][1652491] Updated weights for policy 0, policy_version 49520 (0.0015) [2024-06-15 12:08:55,955][1648985] Fps is (10 sec: 45877.1, 60 sec: 46967.3, 300 sec: 46097.3). Total num frames: 101515264. Throughput: 0: 11468.8. Samples: 25412096. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 12:08:55,956][1648985] Avg episode reward: [(0, '113.240')] [2024-06-15 12:08:55,975][1652491] Updated weights for policy 0, policy_version 49572 (0.0014) [2024-06-15 12:08:56,320][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000049600_101580800.pth... [2024-06-15 12:08:56,383][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000044176_90472448.pth [2024-06-15 12:09:00,603][1652491] Updated weights for policy 0, policy_version 49616 (0.0044) [2024-06-15 12:09:00,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 44782.8, 300 sec: 45875.2). Total num frames: 101646336. Throughput: 0: 11662.2. Samples: 25491968. 
Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 12:09:00,956][1648985] Avg episode reward: [(0, '109.300')] [2024-06-15 12:09:02,230][1652491] Updated weights for policy 0, policy_version 49666 (0.0012) [2024-06-15 12:09:03,498][1652491] Updated weights for policy 0, policy_version 49725 (0.0012) [2024-06-15 12:09:05,048][1652491] Updated weights for policy 0, policy_version 49792 (0.0018) [2024-06-15 12:09:05,955][1648985] Fps is (10 sec: 45876.8, 60 sec: 48059.7, 300 sec: 46097.4). Total num frames: 101974016. Throughput: 0: 11787.4. Samples: 25558016. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 12:09:05,956][1648985] Avg episode reward: [(0, '114.230')] [2024-06-15 12:09:07,089][1652491] Updated weights for policy 0, policy_version 49856 (0.0019) [2024-06-15 12:09:10,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 44236.9, 300 sec: 45986.3). Total num frames: 102105088. Throughput: 0: 11912.6. Samples: 25595904. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 12:09:10,956][1648985] Avg episode reward: [(0, '120.590')] [2024-06-15 12:09:12,515][1652491] Updated weights for policy 0, policy_version 49916 (0.0011) [2024-06-15 12:09:14,236][1652491] Updated weights for policy 0, policy_version 49957 (0.0011) [2024-06-15 12:09:15,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 49151.9, 300 sec: 46319.5). Total num frames: 102432768. Throughput: 0: 11889.7. Samples: 25666560. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 12:09:15,956][1648985] Avg episode reward: [(0, '122.820')] [2024-06-15 12:09:16,007][1652491] Updated weights for policy 0, policy_version 50032 (0.0014) [2024-06-15 12:09:17,972][1651469] Signal inference workers to stop experience collection... (2650 times) [2024-06-15 12:09:17,992][1652491] InferenceWorker_p0-w0: stopping experience collection (2650 times) [2024-06-15 12:09:18,228][1651469] Signal inference workers to resume experience collection... 
(2650 times) [2024-06-15 12:09:18,228][1652491] InferenceWorker_p0-w0: resuming experience collection (2650 times) [2024-06-15 12:09:18,231][1652491] Updated weights for policy 0, policy_version 50096 (0.0011) [2024-06-15 12:09:20,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46421.5, 300 sec: 46208.5). Total num frames: 102629376. Throughput: 0: 11846.0. Samples: 25736192. Policy #0 lag: (min: 15.0, avg: 88.7, max: 271.0) [2024-06-15 12:09:20,956][1648985] Avg episode reward: [(0, '100.870')] [2024-06-15 12:09:22,656][1652491] Updated weights for policy 0, policy_version 50130 (0.0012) [2024-06-15 12:09:25,955][1648985] Fps is (10 sec: 36045.2, 60 sec: 46421.4, 300 sec: 45875.4). Total num frames: 102793216. Throughput: 0: 11685.0. Samples: 25768448. Policy #0 lag: (min: 6.0, avg: 91.6, max: 262.0) [2024-06-15 12:09:25,956][1648985] Avg episode reward: [(0, '93.470')] [2024-06-15 12:09:26,456][1652491] Updated weights for policy 0, policy_version 50209 (0.0013) [2024-06-15 12:09:28,042][1652491] Updated weights for policy 0, policy_version 50274 (0.0014) [2024-06-15 12:09:29,184][1652491] Updated weights for policy 0, policy_version 50321 (0.0011) [2024-06-15 12:09:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 46208.5). Total num frames: 103153664. Throughput: 0: 11673.8. Samples: 25832960. Policy #0 lag: (min: 6.0, avg: 91.6, max: 262.0) [2024-06-15 12:09:30,955][1648985] Avg episode reward: [(0, '91.130')] [2024-06-15 12:09:34,226][1652491] Updated weights for policy 0, policy_version 50385 (0.0014) [2024-06-15 12:09:35,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45880.2, 300 sec: 45875.2). Total num frames: 103284736. Throughput: 0: 11787.4. Samples: 25910272. 
Policy #0 lag: (min: 6.0, avg: 91.6, max: 262.0) [2024-06-15 12:09:35,956][1648985] Avg episode reward: [(0, '101.090')] [2024-06-15 12:09:36,809][1652491] Updated weights for policy 0, policy_version 50437 (0.0011) [2024-06-15 12:09:38,512][1652491] Updated weights for policy 0, policy_version 50512 (0.0012) [2024-06-15 12:09:40,464][1652491] Updated weights for policy 0, policy_version 50576 (0.0012) [2024-06-15 12:09:40,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 47513.5, 300 sec: 46097.3). Total num frames: 103612416. Throughput: 0: 11764.6. Samples: 25941504. Policy #0 lag: (min: 6.0, avg: 91.6, max: 262.0) [2024-06-15 12:09:40,956][1648985] Avg episode reward: [(0, '110.370')] [2024-06-15 12:09:41,411][1652491] Updated weights for policy 0, policy_version 50615 (0.0013) [2024-06-15 12:09:45,953][1652491] Updated weights for policy 0, policy_version 50683 (0.0014) [2024-06-15 12:09:45,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45329.6, 300 sec: 46097.3). Total num frames: 103776256. Throughput: 0: 11685.0. Samples: 26017792. Policy #0 lag: (min: 6.0, avg: 91.6, max: 262.0) [2024-06-15 12:09:45,956][1648985] Avg episode reward: [(0, '111.630')] [2024-06-15 12:09:49,320][1652491] Updated weights for policy 0, policy_version 50738 (0.0049) [2024-06-15 12:09:50,869][1652491] Updated weights for policy 0, policy_version 50800 (0.0043) [2024-06-15 12:09:50,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 47513.6, 300 sec: 46097.4). Total num frames: 104038400. Throughput: 0: 11582.6. Samples: 26079232. Policy #0 lag: (min: 6.0, avg: 91.6, max: 262.0) [2024-06-15 12:09:50,955][1648985] Avg episode reward: [(0, '119.450')] [2024-06-15 12:09:52,369][1652491] Updated weights for policy 0, policy_version 50853 (0.0013) [2024-06-15 12:09:55,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44783.2, 300 sec: 46208.5). Total num frames: 104202240. Throughput: 0: 11502.9. Samples: 26113536. 
Policy #0 lag: (min: 6.0, avg: 91.6, max: 262.0) [2024-06-15 12:09:55,955][1648985] Avg episode reward: [(0, '117.450')] [2024-06-15 12:09:56,640][1652491] Updated weights for policy 0, policy_version 50896 (0.0015) [2024-06-15 12:09:57,413][1652491] Updated weights for policy 0, policy_version 50941 (0.0028) [2024-06-15 12:10:00,411][1652491] Updated weights for policy 0, policy_version 50992 (0.0016) [2024-06-15 12:10:00,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 104464384. Throughput: 0: 11605.4. Samples: 26188800. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 12:10:00,956][1648985] Avg episode reward: [(0, '109.730')] [2024-06-15 12:10:02,259][1652491] Updated weights for policy 0, policy_version 51065 (0.0012) [2024-06-15 12:10:03,408][1651469] Signal inference workers to stop experience collection... (2700 times) [2024-06-15 12:10:03,491][1652491] InferenceWorker_p0-w0: stopping experience collection (2700 times) [2024-06-15 12:10:03,634][1651469] Signal inference workers to resume experience collection... (2700 times) [2024-06-15 12:10:03,635][1652491] InferenceWorker_p0-w0: resuming experience collection (2700 times) [2024-06-15 12:10:04,329][1652491] Updated weights for policy 0, policy_version 51126 (0.0015) [2024-06-15 12:10:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46211.4). Total num frames: 104726528. Throughput: 0: 11502.9. Samples: 26253824. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 12:10:05,956][1648985] Avg episode reward: [(0, '104.210')] [2024-06-15 12:10:08,444][1652491] Updated weights for policy 0, policy_version 51173 (0.0022) [2024-06-15 12:10:10,976][1648985] Fps is (10 sec: 39242.1, 60 sec: 45859.7, 300 sec: 45761.0). Total num frames: 104857600. Throughput: 0: 11543.2. Samples: 26288128. 
Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 12:10:10,976][1648985] Avg episode reward: [(0, '116.730')] [2024-06-15 12:10:11,267][1652491] Updated weights for policy 0, policy_version 51219 (0.0028) [2024-06-15 12:10:12,707][1652491] Updated weights for policy 0, policy_version 51269 (0.0014) [2024-06-15 12:10:14,017][1652491] Updated weights for policy 0, policy_version 51328 (0.0013) [2024-06-15 12:10:15,913][1652491] Updated weights for policy 0, policy_version 51378 (0.0013) [2024-06-15 12:10:15,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 46421.4, 300 sec: 46097.3). Total num frames: 105218048. Throughput: 0: 11605.3. Samples: 26355200. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 12:10:15,956][1648985] Avg episode reward: [(0, '121.810')] [2024-06-15 12:10:18,815][1652491] Updated weights for policy 0, policy_version 51411 (0.0015) [2024-06-15 12:10:20,955][1648985] Fps is (10 sec: 52536.1, 60 sec: 45875.3, 300 sec: 46097.4). Total num frames: 105381888. Throughput: 0: 11480.2. Samples: 26426880. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 12:10:20,956][1648985] Avg episode reward: [(0, '123.300')] [2024-06-15 12:10:22,798][1652491] Updated weights for policy 0, policy_version 51473 (0.0012) [2024-06-15 12:10:24,537][1652491] Updated weights for policy 0, policy_version 51536 (0.0012) [2024-06-15 12:10:25,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 47513.5, 300 sec: 46097.3). Total num frames: 105644032. Throughput: 0: 11491.6. Samples: 26458624. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 12:10:25,956][1648985] Avg episode reward: [(0, '135.230')] [2024-06-15 12:10:27,906][1652491] Updated weights for policy 0, policy_version 51641 (0.0015) [2024-06-15 12:10:30,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 44236.7, 300 sec: 46097.3). Total num frames: 105807872. Throughput: 0: 11241.2. Samples: 26523648. 
Policy #0 lag: (min: 15.0, avg: 132.7, max: 271.0) [2024-06-15 12:10:30,955][1648985] Avg episode reward: [(0, '163.830')] [2024-06-15 12:10:31,594][1651469] Saving new best policy, reward=163.830! [2024-06-15 12:10:31,708][1652491] Updated weights for policy 0, policy_version 51698 (0.0013) [2024-06-15 12:10:35,061][1652491] Updated weights for policy 0, policy_version 51745 (0.0014) [2024-06-15 12:10:35,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 106037248. Throughput: 0: 11332.2. Samples: 26589184. Policy #0 lag: (min: 15.0, avg: 132.7, max: 271.0) [2024-06-15 12:10:35,956][1648985] Avg episode reward: [(0, '135.120')] [2024-06-15 12:10:36,191][1652491] Updated weights for policy 0, policy_version 51796 (0.0012) [2024-06-15 12:10:36,918][1652491] Updated weights for policy 0, policy_version 51835 (0.0011) [2024-06-15 12:10:39,260][1652491] Updated weights for policy 0, policy_version 51875 (0.0011) [2024-06-15 12:10:40,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 106299392. Throughput: 0: 11423.2. Samples: 26627584. Policy #0 lag: (min: 15.0, avg: 132.7, max: 271.0) [2024-06-15 12:10:40,956][1648985] Avg episode reward: [(0, '106.950')] [2024-06-15 12:10:41,524][1652491] Updated weights for policy 0, policy_version 51905 (0.0049) [2024-06-15 12:10:44,575][1652491] Updated weights for policy 0, policy_version 51971 (0.0120) [2024-06-15 12:10:45,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 106561536. Throughput: 0: 11411.9. Samples: 26702336. Policy #0 lag: (min: 15.0, avg: 132.7, max: 271.0) [2024-06-15 12:10:45,956][1648985] Avg episode reward: [(0, '103.300')] [2024-06-15 12:10:46,263][1652491] Updated weights for policy 0, policy_version 52035 (0.0012) [2024-06-15 12:10:46,982][1651469] Signal inference workers to stop experience collection... 
(2750 times) [2024-06-15 12:10:47,031][1652491] InferenceWorker_p0-w0: stopping experience collection (2750 times) [2024-06-15 12:10:47,184][1651469] Signal inference workers to resume experience collection... (2750 times) [2024-06-15 12:10:47,185][1652491] InferenceWorker_p0-w0: resuming experience collection (2750 times) [2024-06-15 12:10:49,101][1652491] Updated weights for policy 0, policy_version 52101 (0.0130) [2024-06-15 12:10:50,342][1652491] Updated weights for policy 0, policy_version 52154 (0.0015) [2024-06-15 12:10:50,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 106823680. Throughput: 0: 11594.0. Samples: 26775552. Policy #0 lag: (min: 15.0, avg: 132.7, max: 271.0) [2024-06-15 12:10:50,955][1648985] Avg episode reward: [(0, '113.300')] [2024-06-15 12:10:53,125][1652491] Updated weights for policy 0, policy_version 52195 (0.0013) [2024-06-15 12:10:55,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 106987520. Throughput: 0: 11644.7. Samples: 26811904. Policy #0 lag: (min: 15.0, avg: 132.7, max: 271.0) [2024-06-15 12:10:55,956][1648985] Avg episode reward: [(0, '113.510')] [2024-06-15 12:10:56,125][1652491] Updated weights for policy 0, policy_version 52257 (0.0013) [2024-06-15 12:10:56,287][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000052272_107053056.pth... [2024-06-15 12:10:56,354][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000046912_96075776.pth [2024-06-15 12:10:56,360][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000052272_107053056.pth [2024-06-15 12:10:56,946][1652491] Updated weights for policy 0, policy_version 52296 (0.0013) [2024-06-15 12:11:00,929][1652491] Updated weights for policy 0, policy_version 52370 (0.0015) [2024-06-15 12:11:00,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 46421.3, 300 sec: 45875.2). 
Total num frames: 107249664. Throughput: 0: 11662.2. Samples: 26880000. Policy #0 lag: (min: 15.0, avg: 132.7, max: 271.0) [2024-06-15 12:11:00,956][1648985] Avg episode reward: [(0, '122.550')] [2024-06-15 12:11:03,981][1652491] Updated weights for policy 0, policy_version 52418 (0.0012) [2024-06-15 12:11:05,277][1652491] Updated weights for policy 0, policy_version 52473 (0.0045) [2024-06-15 12:11:05,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 107479040. Throughput: 0: 11639.4. Samples: 26950656. Policy #0 lag: (min: 15.0, avg: 117.0, max: 271.0) [2024-06-15 12:11:05,956][1648985] Avg episode reward: [(0, '115.630')] [2024-06-15 12:11:08,621][1652491] Updated weights for policy 0, policy_version 52545 (0.0018) [2024-06-15 12:11:10,962][1648985] Fps is (10 sec: 49116.7, 60 sec: 48070.1, 300 sec: 45763.0). Total num frames: 107741184. Throughput: 0: 11603.5. Samples: 26980864. Policy #0 lag: (min: 15.0, avg: 117.0, max: 271.0) [2024-06-15 12:11:10,963][1648985] Avg episode reward: [(0, '118.850')] [2024-06-15 12:11:11,572][1652491] Updated weights for policy 0, policy_version 52609 (0.0025) [2024-06-15 12:11:13,095][1652491] Updated weights for policy 0, policy_version 52672 (0.0013) [2024-06-15 12:11:15,887][1652491] Updated weights for policy 0, policy_version 52729 (0.0013) [2024-06-15 12:11:15,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45875.3, 300 sec: 46097.3). Total num frames: 107970560. Throughput: 0: 11821.5. Samples: 27055616. Policy #0 lag: (min: 15.0, avg: 117.0, max: 271.0) [2024-06-15 12:11:15,955][1648985] Avg episode reward: [(0, '127.540')] [2024-06-15 12:11:20,001][1652491] Updated weights for policy 0, policy_version 52804 (0.0139) [2024-06-15 12:11:20,945][1652491] Updated weights for policy 0, policy_version 52852 (0.0016) [2024-06-15 12:11:20,955][1648985] Fps is (10 sec: 49187.6, 60 sec: 47513.4, 300 sec: 45986.3). Total num frames: 108232704. Throughput: 0: 11810.1. Samples: 27120640. 
Policy #0 lag: (min: 15.0, avg: 117.0, max: 271.0) [2024-06-15 12:11:20,956][1648985] Avg episode reward: [(0, '146.220')] [2024-06-15 12:11:23,737][1652491] Updated weights for policy 0, policy_version 52898 (0.0017) [2024-06-15 12:11:24,432][1652491] Updated weights for policy 0, policy_version 52928 (0.0012) [2024-06-15 12:11:25,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 108429312. Throughput: 0: 11753.3. Samples: 27156480. Policy #0 lag: (min: 15.0, avg: 117.0, max: 271.0) [2024-06-15 12:11:25,956][1648985] Avg episode reward: [(0, '150.290')] [2024-06-15 12:11:26,838][1652491] Updated weights for policy 0, policy_version 52982 (0.0015) [2024-06-15 12:11:30,854][1652491] Updated weights for policy 0, policy_version 53040 (0.0017) [2024-06-15 12:11:30,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 108625920. Throughput: 0: 11798.7. Samples: 27233280. Policy #0 lag: (min: 15.0, avg: 117.0, max: 271.0) [2024-06-15 12:11:30,956][1648985] Avg episode reward: [(0, '141.730')] [2024-06-15 12:11:31,987][1651469] Signal inference workers to stop experience collection... (2800 times) [2024-06-15 12:11:32,022][1652491] InferenceWorker_p0-w0: stopping experience collection (2800 times) [2024-06-15 12:11:32,185][1651469] Signal inference workers to resume experience collection... (2800 times) [2024-06-15 12:11:32,186][1652491] InferenceWorker_p0-w0: resuming experience collection (2800 times) [2024-06-15 12:11:32,533][1652491] Updated weights for policy 0, policy_version 53120 (0.0014) [2024-06-15 12:11:35,860][1652491] Updated weights for policy 0, policy_version 53173 (0.0014) [2024-06-15 12:11:35,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 47513.6, 300 sec: 46097.4). Total num frames: 108888064. Throughput: 0: 11491.5. Samples: 27292672. 
Policy #0 lag: (min: 15.0, avg: 117.0, max: 271.0) [2024-06-15 12:11:35,956][1648985] Avg episode reward: [(0, '124.100')] [2024-06-15 12:11:37,939][1652491] Updated weights for policy 0, policy_version 53218 (0.0012) [2024-06-15 12:11:40,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 109051904. Throughput: 0: 11457.4. Samples: 27327488. Policy #0 lag: (min: 31.0, avg: 158.6, max: 287.0) [2024-06-15 12:11:40,956][1648985] Avg episode reward: [(0, '121.830')] [2024-06-15 12:11:41,907][1652491] Updated weights for policy 0, policy_version 53264 (0.0011) [2024-06-15 12:11:43,348][1652491] Updated weights for policy 0, policy_version 53328 (0.0011) [2024-06-15 12:11:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 109314048. Throughput: 0: 11503.0. Samples: 27397632. Policy #0 lag: (min: 31.0, avg: 158.6, max: 287.0) [2024-06-15 12:11:45,956][1648985] Avg episode reward: [(0, '120.520')] [2024-06-15 12:11:46,376][1652491] Updated weights for policy 0, policy_version 53395 (0.0014) [2024-06-15 12:11:49,081][1652491] Updated weights for policy 0, policy_version 53458 (0.0014) [2024-06-15 12:11:50,123][1652491] Updated weights for policy 0, policy_version 53504 (0.0013) [2024-06-15 12:11:50,955][1648985] Fps is (10 sec: 52431.1, 60 sec: 45875.3, 300 sec: 45875.2). Total num frames: 109576192. Throughput: 0: 11423.3. Samples: 27464704. Policy #0 lag: (min: 31.0, avg: 158.6, max: 287.0) [2024-06-15 12:11:50,955][1648985] Avg episode reward: [(0, '112.810')] [2024-06-15 12:11:53,969][1652491] Updated weights for policy 0, policy_version 53564 (0.0021) [2024-06-15 12:11:55,610][1652491] Updated weights for policy 0, policy_version 53632 (0.0015) [2024-06-15 12:11:55,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 47513.5, 300 sec: 45764.1). Total num frames: 109838336. Throughput: 0: 11516.1. Samples: 27499008. 
Policy #0 lag: (min: 31.0, avg: 158.6, max: 287.0) [2024-06-15 12:11:55,956][1648985] Avg episode reward: [(0, '121.210')] [2024-06-15 12:12:00,940][1652491] Updated weights for policy 0, policy_version 53712 (0.0013) [2024-06-15 12:12:00,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 45875.3, 300 sec: 45875.2). Total num frames: 110002176. Throughput: 0: 11355.0. Samples: 27566592. Policy #0 lag: (min: 31.0, avg: 158.6, max: 287.0) [2024-06-15 12:12:00,956][1648985] Avg episode reward: [(0, '110.730')] [2024-06-15 12:12:02,134][1652491] Updated weights for policy 0, policy_version 53755 (0.0012) [2024-06-15 12:12:05,427][1652491] Updated weights for policy 0, policy_version 53817 (0.0014) [2024-06-15 12:12:05,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 45875.1, 300 sec: 45653.1). Total num frames: 110231552. Throughput: 0: 11400.5. Samples: 27633664. Policy #0 lag: (min: 31.0, avg: 158.6, max: 287.0) [2024-06-15 12:12:05,956][1648985] Avg episode reward: [(0, '103.630')] [2024-06-15 12:12:07,752][1652491] Updated weights for policy 0, policy_version 53882 (0.0019) [2024-06-15 12:12:09,811][1652491] Updated weights for policy 0, policy_version 53947 (0.0128) [2024-06-15 12:12:10,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45880.8, 300 sec: 46208.4). Total num frames: 110493696. Throughput: 0: 11309.5. Samples: 27665408. Policy #0 lag: (min: 31.0, avg: 158.6, max: 287.0) [2024-06-15 12:12:10,956][1648985] Avg episode reward: [(0, '111.340')] [2024-06-15 12:12:13,500][1652491] Updated weights for policy 0, policy_version 54010 (0.0013) [2024-06-15 12:12:15,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 44782.9, 300 sec: 45764.1). Total num frames: 110657536. Throughput: 0: 11218.5. Samples: 27738112. 
Policy #0 lag: (min: 15.0, avg: 129.6, max: 271.0) [2024-06-15 12:12:15,956][1648985] Avg episode reward: [(0, '115.370')] [2024-06-15 12:12:16,232][1652491] Updated weights for policy 0, policy_version 54051 (0.0014) [2024-06-15 12:12:18,450][1652491] Updated weights for policy 0, policy_version 54112 (0.0015) [2024-06-15 12:12:19,901][1651469] Signal inference workers to stop experience collection... (2850 times) [2024-06-15 12:12:19,961][1652491] InferenceWorker_p0-w0: stopping experience collection (2850 times) [2024-06-15 12:12:20,128][1651469] Signal inference workers to resume experience collection... (2850 times) [2024-06-15 12:12:20,129][1652491] InferenceWorker_p0-w0: resuming experience collection (2850 times) [2024-06-15 12:12:20,668][1652491] Updated weights for policy 0, policy_version 54177 (0.0043) [2024-06-15 12:12:20,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45329.2, 300 sec: 45986.3). Total num frames: 110952448. Throughput: 0: 11332.3. Samples: 27802624. Policy #0 lag: (min: 15.0, avg: 129.6, max: 271.0) [2024-06-15 12:12:20,955][1648985] Avg episode reward: [(0, '105.640')] [2024-06-15 12:12:24,145][1652491] Updated weights for policy 0, policy_version 54209 (0.0011) [2024-06-15 12:12:25,115][1652491] Updated weights for policy 0, policy_version 54266 (0.0075) [2024-06-15 12:12:25,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 45329.1, 300 sec: 45764.1). Total num frames: 111149056. Throughput: 0: 11355.1. Samples: 27838464. Policy #0 lag: (min: 15.0, avg: 129.6, max: 271.0) [2024-06-15 12:12:25,956][1648985] Avg episode reward: [(0, '98.740')] [2024-06-15 12:12:27,812][1652491] Updated weights for policy 0, policy_version 54326 (0.0015) [2024-06-15 12:12:30,247][1652491] Updated weights for policy 0, policy_version 54400 (0.0013) [2024-06-15 12:12:30,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 46421.3, 300 sec: 45764.1). Total num frames: 111411200. Throughput: 0: 11434.6. Samples: 27912192. 
Policy #0 lag: (min: 15.0, avg: 129.6, max: 271.0) [2024-06-15 12:12:30,956][1648985] Avg episode reward: [(0, '112.370')] [2024-06-15 12:12:32,682][1652491] Updated weights for policy 0, policy_version 54457 (0.0014) [2024-06-15 12:12:35,616][1652491] Updated weights for policy 0, policy_version 54512 (0.0012) [2024-06-15 12:12:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 111673344. Throughput: 0: 11423.2. Samples: 27978752. Policy #0 lag: (min: 15.0, avg: 129.6, max: 271.0) [2024-06-15 12:12:35,956][1648985] Avg episode reward: [(0, '130.610')] [2024-06-15 12:12:39,238][1652491] Updated weights for policy 0, policy_version 54580 (0.0144) [2024-06-15 12:12:40,502][1652491] Updated weights for policy 0, policy_version 54608 (0.0012) [2024-06-15 12:12:40,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.6, 300 sec: 45875.2). Total num frames: 111869952. Throughput: 0: 11503.0. Samples: 28016640. Policy #0 lag: (min: 15.0, avg: 129.6, max: 271.0) [2024-06-15 12:12:40,956][1648985] Avg episode reward: [(0, '118.200')] [2024-06-15 12:12:41,554][1652491] Updated weights for policy 0, policy_version 54656 (0.0012) [2024-06-15 12:12:43,936][1652491] Updated weights for policy 0, policy_version 54716 (0.0015) [2024-06-15 12:12:45,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 112066560. Throughput: 0: 11548.4. Samples: 28086272. Policy #0 lag: (min: 15.0, avg: 129.6, max: 271.0) [2024-06-15 12:12:45,956][1648985] Avg episode reward: [(0, '107.710')] [2024-06-15 12:12:47,178][1652491] Updated weights for policy 0, policy_version 54779 (0.0085) [2024-06-15 12:12:50,307][1652491] Updated weights for policy 0, policy_version 54820 (0.0013) [2024-06-15 12:12:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 112328704. Throughput: 0: 11753.3. Samples: 28162560. 
Policy #0 lag: (min: 27.0, avg: 138.3, max: 283.0) [2024-06-15 12:12:50,956][1648985] Avg episode reward: [(0, '113.720')] [2024-06-15 12:12:52,038][1652491] Updated weights for policy 0, policy_version 54880 (0.0013) [2024-06-15 12:12:53,468][1652491] Updated weights for policy 0, policy_version 54916 (0.0013) [2024-06-15 12:12:55,956][1648985] Fps is (10 sec: 52427.7, 60 sec: 45875.0, 300 sec: 46208.3). Total num frames: 112590848. Throughput: 0: 11662.1. Samples: 28190208. Policy #0 lag: (min: 27.0, avg: 138.3, max: 283.0) [2024-06-15 12:12:55,956][1648985] Avg episode reward: [(0, '126.850')] [2024-06-15 12:12:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000054976_112590848.pth... [2024-06-15 12:12:56,011][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000049600_101580800.pth [2024-06-15 12:12:58,559][1652491] Updated weights for policy 0, policy_version 55008 (0.0015) [2024-06-15 12:13:00,370][1652491] Updated weights for policy 0, policy_version 55043 (0.0010) [2024-06-15 12:13:00,955][1648985] Fps is (10 sec: 45873.9, 60 sec: 46421.2, 300 sec: 46430.5). Total num frames: 112787456. Throughput: 0: 11582.5. Samples: 28259328. Policy #0 lag: (min: 27.0, avg: 138.3, max: 283.0) [2024-06-15 12:13:00,957][1648985] Avg episode reward: [(0, '136.610')] [2024-06-15 12:13:01,570][1652491] Updated weights for policy 0, policy_version 55099 (0.0013) [2024-06-15 12:13:04,214][1652491] Updated weights for policy 0, policy_version 55138 (0.0014) [2024-06-15 12:13:05,163][1651469] Signal inference workers to stop experience collection... (2900 times) [2024-06-15 12:13:05,200][1652491] InferenceWorker_p0-w0: stopping experience collection (2900 times) [2024-06-15 12:13:05,449][1651469] Signal inference workers to resume experience collection... 
(2900 times) [2024-06-15 12:13:05,450][1652491] InferenceWorker_p0-w0: resuming experience collection (2900 times) [2024-06-15 12:13:05,688][1652491] Updated weights for policy 0, policy_version 55203 (0.0012) [2024-06-15 12:13:05,955][1648985] Fps is (10 sec: 49154.5, 60 sec: 47513.7, 300 sec: 46208.5). Total num frames: 113082368. Throughput: 0: 11707.7. Samples: 28329472. Policy #0 lag: (min: 27.0, avg: 138.3, max: 283.0) [2024-06-15 12:13:05,956][1648985] Avg episode reward: [(0, '141.080')] [2024-06-15 12:13:09,700][1652491] Updated weights for policy 0, policy_version 55241 (0.0031) [2024-06-15 12:13:10,955][1648985] Fps is (10 sec: 42599.6, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 113213440. Throughput: 0: 11730.5. Samples: 28366336. Policy #0 lag: (min: 27.0, avg: 138.3, max: 283.0) [2024-06-15 12:13:10,956][1648985] Avg episode reward: [(0, '117.020')] [2024-06-15 12:13:11,615][1652491] Updated weights for policy 0, policy_version 55299 (0.0019) [2024-06-15 12:13:13,098][1652491] Updated weights for policy 0, policy_version 55360 (0.0012) [2024-06-15 12:13:15,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 46967.3, 300 sec: 46208.4). Total num frames: 113475584. Throughput: 0: 11685.0. Samples: 28438016. Policy #0 lag: (min: 27.0, avg: 138.3, max: 283.0) [2024-06-15 12:13:15,956][1648985] Avg episode reward: [(0, '109.650')] [2024-06-15 12:13:16,435][1652491] Updated weights for policy 0, policy_version 55430 (0.0089) [2024-06-15 12:13:17,484][1652491] Updated weights for policy 0, policy_version 55479 (0.0017) [2024-06-15 12:13:20,832][1652491] Updated weights for policy 0, policy_version 55511 (0.0011) [2024-06-15 12:13:20,955][1648985] Fps is (10 sec: 45873.9, 60 sec: 45328.8, 300 sec: 46319.5). Total num frames: 113672192. Throughput: 0: 11753.2. Samples: 28507648. 
Policy #0 lag: (min: 27.0, avg: 138.3, max: 283.0) [2024-06-15 12:13:20,956][1648985] Avg episode reward: [(0, '118.260')] [2024-06-15 12:13:22,816][1652491] Updated weights for policy 0, policy_version 55584 (0.0089) [2024-06-15 12:13:25,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46421.2, 300 sec: 46319.5). Total num frames: 113934336. Throughput: 0: 11616.7. Samples: 28539392. Policy #0 lag: (min: 47.0, avg: 150.8, max: 303.0) [2024-06-15 12:13:25,956][1648985] Avg episode reward: [(0, '123.460')] [2024-06-15 12:13:26,011][1652491] Updated weights for policy 0, policy_version 55635 (0.0025) [2024-06-15 12:13:27,269][1652491] Updated weights for policy 0, policy_version 55685 (0.0013) [2024-06-15 12:13:30,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 45875.2, 300 sec: 46209.4). Total num frames: 114163712. Throughput: 0: 11787.4. Samples: 28616704. Policy #0 lag: (min: 47.0, avg: 150.8, max: 303.0) [2024-06-15 12:13:30,956][1648985] Avg episode reward: [(0, '135.860')] [2024-06-15 12:13:31,447][1652491] Updated weights for policy 0, policy_version 55760 (0.0014) [2024-06-15 12:13:32,816][1652491] Updated weights for policy 0, policy_version 55812 (0.0014) [2024-06-15 12:13:35,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 114425856. Throughput: 0: 11525.7. Samples: 28681216. Policy #0 lag: (min: 47.0, avg: 150.8, max: 303.0) [2024-06-15 12:13:35,956][1648985] Avg episode reward: [(0, '135.330')] [2024-06-15 12:13:37,007][1652491] Updated weights for policy 0, policy_version 55888 (0.0015) [2024-06-15 12:13:38,173][1652491] Updated weights for policy 0, policy_version 55936 (0.0012) [2024-06-15 12:13:40,217][1652491] Updated weights for policy 0, policy_version 55993 (0.0012) [2024-06-15 12:13:40,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 46967.5, 300 sec: 46208.5). Total num frames: 114688000. Throughput: 0: 11673.7. Samples: 28715520. 
Policy #0 lag: (min: 47.0, avg: 150.8, max: 303.0) [2024-06-15 12:13:40,956][1648985] Avg episode reward: [(0, '137.520')] [2024-06-15 12:13:44,541][1652491] Updated weights for policy 0, policy_version 56053 (0.0012) [2024-06-15 12:13:45,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.7, 300 sec: 46430.6). Total num frames: 114884608. Throughput: 0: 11650.9. Samples: 28783616. Policy #0 lag: (min: 47.0, avg: 150.8, max: 303.0) [2024-06-15 12:13:45,956][1648985] Avg episode reward: [(0, '133.750')] [2024-06-15 12:13:46,440][1652491] Updated weights for policy 0, policy_version 56124 (0.0012) [2024-06-15 12:13:50,154][1652491] Updated weights for policy 0, policy_version 56185 (0.0013) [2024-06-15 12:13:50,919][1651469] Signal inference workers to stop experience collection... (2950 times) [2024-06-15 12:13:50,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 46421.2, 300 sec: 46097.4). Total num frames: 115113984. Throughput: 0: 11446.0. Samples: 28844544. Policy #0 lag: (min: 47.0, avg: 150.8, max: 303.0) [2024-06-15 12:13:50,956][1648985] Avg episode reward: [(0, '117.870')] [2024-06-15 12:13:50,979][1652491] InferenceWorker_p0-w0: stopping experience collection (2950 times) [2024-06-15 12:13:51,211][1651469] Signal inference workers to resume experience collection... (2950 times) [2024-06-15 12:13:51,212][1652491] InferenceWorker_p0-w0: resuming experience collection (2950 times) [2024-06-15 12:13:51,410][1652491] Updated weights for policy 0, policy_version 56225 (0.0025) [2024-06-15 12:13:55,938][1652491] Updated weights for policy 0, policy_version 56262 (0.0012) [2024-06-15 12:13:55,955][1648985] Fps is (10 sec: 32768.0, 60 sec: 43691.0, 300 sec: 45986.3). Total num frames: 115212288. Throughput: 0: 11400.5. Samples: 28879360. 
Policy #0 lag: (min: 47.0, avg: 150.8, max: 303.0) [2024-06-15 12:13:55,956][1648985] Avg episode reward: [(0, '113.910')] [2024-06-15 12:13:57,772][1652491] Updated weights for policy 0, policy_version 56336 (0.0019) [2024-06-15 12:14:00,955][1648985] Fps is (10 sec: 36045.7, 60 sec: 44783.2, 300 sec: 45764.1). Total num frames: 115474432. Throughput: 0: 11229.9. Samples: 28943360. Policy #0 lag: (min: 21.0, avg: 107.0, max: 277.0) [2024-06-15 12:14:00,955][1648985] Avg episode reward: [(0, '108.200')] [2024-06-15 12:14:01,797][1652491] Updated weights for policy 0, policy_version 56432 (0.0012) [2024-06-15 12:14:03,628][1652491] Updated weights for policy 0, policy_version 56508 (0.0013) [2024-06-15 12:14:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 115736576. Throughput: 0: 11070.6. Samples: 29005824. Policy #0 lag: (min: 21.0, avg: 107.0, max: 277.0) [2024-06-15 12:14:05,956][1648985] Avg episode reward: [(0, '111.900')] [2024-06-15 12:14:09,307][1652491] Updated weights for policy 0, policy_version 56576 (0.0016) [2024-06-15 12:14:10,488][1652491] Updated weights for policy 0, policy_version 56634 (0.0016) [2024-06-15 12:14:10,962][1648985] Fps is (10 sec: 52391.3, 60 sec: 46415.8, 300 sec: 45985.2). Total num frames: 115998720. Throughput: 0: 11239.5. Samples: 29045248. Policy #0 lag: (min: 21.0, avg: 107.0, max: 277.0) [2024-06-15 12:14:10,963][1648985] Avg episode reward: [(0, '109.310')] [2024-06-15 12:14:13,150][1652491] Updated weights for policy 0, policy_version 56688 (0.0049) [2024-06-15 12:14:14,939][1652491] Updated weights for policy 0, policy_version 56764 (0.0013) [2024-06-15 12:14:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.5, 300 sec: 46208.4). Total num frames: 116260864. Throughput: 0: 10865.8. Samples: 29105664. 
Policy #0 lag: (min: 21.0, avg: 107.0, max: 277.0) [2024-06-15 12:14:15,955][1648985] Avg episode reward: [(0, '103.160')] [2024-06-15 12:14:20,163][1652491] Updated weights for policy 0, policy_version 56816 (0.0011) [2024-06-15 12:14:20,955][1648985] Fps is (10 sec: 42628.8, 60 sec: 45875.4, 300 sec: 46208.4). Total num frames: 116424704. Throughput: 0: 11173.0. Samples: 29184000. Policy #0 lag: (min: 21.0, avg: 107.0, max: 277.0) [2024-06-15 12:14:20,956][1648985] Avg episode reward: [(0, '115.810')] [2024-06-15 12:14:23,577][1652491] Updated weights for policy 0, policy_version 56897 (0.0013) [2024-06-15 12:14:25,638][1652491] Updated weights for policy 0, policy_version 56976 (0.0013) [2024-06-15 12:14:25,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.3, 300 sec: 45875.2). Total num frames: 116686848. Throughput: 0: 11081.9. Samples: 29214208. Policy #0 lag: (min: 21.0, avg: 107.0, max: 277.0) [2024-06-15 12:14:25,956][1648985] Avg episode reward: [(0, '123.070')] [2024-06-15 12:14:26,842][1652491] Updated weights for policy 0, policy_version 57023 (0.0013) [2024-06-15 12:14:30,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 116817920. Throughput: 0: 11275.3. Samples: 29291008. Policy #0 lag: (min: 21.0, avg: 107.0, max: 277.0) [2024-06-15 12:14:30,956][1648985] Avg episode reward: [(0, '117.710')] [2024-06-15 12:14:31,503][1652491] Updated weights for policy 0, policy_version 57073 (0.0013) [2024-06-15 12:14:33,080][1652491] Updated weights for policy 0, policy_version 57150 (0.0013) [2024-06-15 12:14:34,527][1651469] Signal inference workers to stop experience collection... (3000 times) [2024-06-15 12:14:34,590][1652491] InferenceWorker_p0-w0: stopping experience collection (3000 times) [2024-06-15 12:14:34,810][1651469] Signal inference workers to resume experience collection... 
(3000 times) [2024-06-15 12:14:34,811][1652491] InferenceWorker_p0-w0: resuming experience collection (3000 times) [2024-06-15 12:14:35,748][1652491] Updated weights for policy 0, policy_version 57216 (0.0013) [2024-06-15 12:14:35,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 45875.0, 300 sec: 45986.3). Total num frames: 117178368. Throughput: 0: 11320.9. Samples: 29353984. Policy #0 lag: (min: 54.0, avg: 130.1, max: 310.0) [2024-06-15 12:14:35,956][1648985] Avg episode reward: [(0, '120.570')] [2024-06-15 12:14:37,838][1652491] Updated weights for policy 0, policy_version 57264 (0.0020) [2024-06-15 12:14:40,956][1648985] Fps is (10 sec: 49150.8, 60 sec: 43690.4, 300 sec: 45875.1). Total num frames: 117309440. Throughput: 0: 11343.6. Samples: 29389824. Policy #0 lag: (min: 54.0, avg: 130.1, max: 310.0) [2024-06-15 12:14:40,957][1648985] Avg episode reward: [(0, '120.500')] [2024-06-15 12:14:41,498][1652491] Updated weights for policy 0, policy_version 57283 (0.0012) [2024-06-15 12:14:43,433][1652491] Updated weights for policy 0, policy_version 57366 (0.0012) [2024-06-15 12:14:45,354][1652491] Updated weights for policy 0, policy_version 57409 (0.0021) [2024-06-15 12:14:45,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 45329.1, 300 sec: 45986.3). Total num frames: 117604352. Throughput: 0: 11548.4. Samples: 29463040. Policy #0 lag: (min: 54.0, avg: 130.1, max: 310.0) [2024-06-15 12:14:45,956][1648985] Avg episode reward: [(0, '114.370')] [2024-06-15 12:14:46,576][1652491] Updated weights for policy 0, policy_version 57467 (0.0107) [2024-06-15 12:14:49,200][1652491] Updated weights for policy 0, policy_version 57520 (0.0028) [2024-06-15 12:14:50,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 117833728. Throughput: 0: 11764.6. Samples: 29535232. 
Policy #0 lag: (min: 54.0, avg: 130.1, max: 310.0) [2024-06-15 12:14:50,956][1648985] Avg episode reward: [(0, '118.660')] [2024-06-15 12:14:53,902][1652491] Updated weights for policy 0, policy_version 57569 (0.0013) [2024-06-15 12:14:55,494][1652491] Updated weights for policy 0, policy_version 57638 (0.0014) [2024-06-15 12:14:55,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 118095872. Throughput: 0: 11812.0. Samples: 29576704. Policy #0 lag: (min: 54.0, avg: 130.1, max: 310.0) [2024-06-15 12:14:55,956][1648985] Avg episode reward: [(0, '125.880')] [2024-06-15 12:14:55,974][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000057664_118095872.pth... [2024-06-15 12:14:56,074][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000052272_107053056.pth [2024-06-15 12:14:56,406][1652491] Updated weights for policy 0, policy_version 57668 (0.0034) [2024-06-15 12:14:57,604][1652491] Updated weights for policy 0, policy_version 57724 (0.0101) [2024-06-15 12:15:00,032][1652491] Updated weights for policy 0, policy_version 57776 (0.0013) [2024-06-15 12:15:00,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 118358016. Throughput: 0: 11912.5. Samples: 29641728. Policy #0 lag: (min: 54.0, avg: 130.1, max: 310.0) [2024-06-15 12:15:00,955][1648985] Avg episode reward: [(0, '128.720')] [2024-06-15 12:15:04,914][1652491] Updated weights for policy 0, policy_version 57827 (0.0015) [2024-06-15 12:15:05,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.1, 300 sec: 46211.6). Total num frames: 118489088. Throughput: 0: 11776.0. Samples: 29713920. 
Policy #0 lag: (min: 54.0, avg: 130.1, max: 310.0) [2024-06-15 12:15:05,956][1648985] Avg episode reward: [(0, '116.450')] [2024-06-15 12:15:06,510][1652491] Updated weights for policy 0, policy_version 57889 (0.0014) [2024-06-15 12:15:07,980][1652491] Updated weights for policy 0, policy_version 57968 (0.0014) [2024-06-15 12:15:10,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 46972.9, 300 sec: 46097.3). Total num frames: 118816768. Throughput: 0: 11776.0. Samples: 29744128. Policy #0 lag: (min: 61.0, avg: 174.0, max: 285.0) [2024-06-15 12:15:10,956][1648985] Avg episode reward: [(0, '95.110')] [2024-06-15 12:15:11,030][1652491] Updated weights for policy 0, policy_version 58021 (0.0014) [2024-06-15 12:15:15,280][1652491] Updated weights for policy 0, policy_version 58054 (0.0016) [2024-06-15 12:15:15,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 118947840. Throughput: 0: 11878.4. Samples: 29825536. Policy #0 lag: (min: 61.0, avg: 174.0, max: 285.0) [2024-06-15 12:15:15,956][1648985] Avg episode reward: [(0, '112.150')] [2024-06-15 12:15:16,631][1651469] Signal inference workers to stop experience collection... (3050 times) [2024-06-15 12:15:16,681][1652491] InferenceWorker_p0-w0: stopping experience collection (3050 times) [2024-06-15 12:15:16,857][1651469] Signal inference workers to resume experience collection... (3050 times) [2024-06-15 12:15:16,858][1652491] InferenceWorker_p0-w0: resuming experience collection (3050 times) [2024-06-15 12:15:17,008][1652491] Updated weights for policy 0, policy_version 58130 (0.0175) [2024-06-15 12:15:18,567][1652491] Updated weights for policy 0, policy_version 58198 (0.0012) [2024-06-15 12:15:20,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 47513.4, 300 sec: 46208.4). Total num frames: 119275520. Throughput: 0: 11810.1. Samples: 29885440. 
Policy #0 lag: (min: 61.0, avg: 174.0, max: 285.0) [2024-06-15 12:15:20,956][1648985] Avg episode reward: [(0, '131.540')] [2024-06-15 12:15:22,433][1652491] Updated weights for policy 0, policy_version 58279 (0.0013) [2024-06-15 12:15:22,977][1652491] Updated weights for policy 0, policy_version 58304 (0.0013) [2024-06-15 12:15:25,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 119406592. Throughput: 0: 11810.2. Samples: 29921280. Policy #0 lag: (min: 61.0, avg: 174.0, max: 285.0) [2024-06-15 12:15:25,956][1648985] Avg episode reward: [(0, '121.850')] [2024-06-15 12:15:28,404][1652491] Updated weights for policy 0, policy_version 58384 (0.0013) [2024-06-15 12:15:29,890][1652491] Updated weights for policy 0, policy_version 58448 (0.0179) [2024-06-15 12:15:30,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 49152.0, 300 sec: 46541.7). Total num frames: 119767040. Throughput: 0: 11719.1. Samples: 29990400. Policy #0 lag: (min: 61.0, avg: 174.0, max: 285.0) [2024-06-15 12:15:30,956][1648985] Avg episode reward: [(0, '118.040')] [2024-06-15 12:15:33,249][1652491] Updated weights for policy 0, policy_version 58497 (0.0014) [2024-06-15 12:15:34,429][1652491] Updated weights for policy 0, policy_version 58554 (0.0014) [2024-06-15 12:15:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 119930880. Throughput: 0: 11696.4. Samples: 30061568. Policy #0 lag: (min: 61.0, avg: 174.0, max: 285.0) [2024-06-15 12:15:35,956][1648985] Avg episode reward: [(0, '119.410')] [2024-06-15 12:15:39,127][1652491] Updated weights for policy 0, policy_version 58597 (0.0017) [2024-06-15 12:15:40,335][1652491] Updated weights for policy 0, policy_version 58656 (0.0013) [2024-06-15 12:15:40,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 48060.1, 300 sec: 46208.4). Total num frames: 120193024. Throughput: 0: 11594.0. Samples: 30098432. 
Policy #0 lag: (min: 61.0, avg: 174.0, max: 285.0) [2024-06-15 12:15:40,955][1648985] Avg episode reward: [(0, '108.980')] [2024-06-15 12:15:41,422][1652491] Updated weights for policy 0, policy_version 58709 (0.0014) [2024-06-15 12:15:45,166][1652491] Updated weights for policy 0, policy_version 58800 (0.0015) [2024-06-15 12:15:45,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 47513.4, 300 sec: 46208.4). Total num frames: 120455168. Throughput: 0: 11616.7. Samples: 30164480. Policy #0 lag: (min: 15.0, avg: 147.0, max: 271.0) [2024-06-15 12:15:45,956][1648985] Avg episode reward: [(0, '107.720')] [2024-06-15 12:15:49,849][1652491] Updated weights for policy 0, policy_version 58850 (0.0094) [2024-06-15 12:15:50,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 120619008. Throughput: 0: 11673.6. Samples: 30239232. Policy #0 lag: (min: 15.0, avg: 147.0, max: 271.0) [2024-06-15 12:15:50,956][1648985] Avg episode reward: [(0, '116.800')] [2024-06-15 12:15:51,221][1652491] Updated weights for policy 0, policy_version 58915 (0.0013) [2024-06-15 12:15:52,862][1652491] Updated weights for policy 0, policy_version 58992 (0.0105) [2024-06-15 12:15:55,863][1651469] Signal inference workers to stop experience collection... (3100 times) [2024-06-15 12:15:55,916][1652491] InferenceWorker_p0-w0: stopping experience collection (3100 times) [2024-06-15 12:15:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 120881152. Throughput: 0: 11650.8. Samples: 30268416. Policy #0 lag: (min: 15.0, avg: 147.0, max: 271.0) [2024-06-15 12:15:55,956][1648985] Avg episode reward: [(0, '133.240')] [2024-06-15 12:15:56,060][1651469] Signal inference workers to resume experience collection... 
(3100 times) [2024-06-15 12:15:56,061][1652491] InferenceWorker_p0-w0: resuming experience collection (3100 times) [2024-06-15 12:15:56,223][1652491] Updated weights for policy 0, policy_version 59046 (0.0014) [2024-06-15 12:16:00,955][1648985] Fps is (10 sec: 36045.6, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 120979456. Throughput: 0: 11628.1. Samples: 30348800. Policy #0 lag: (min: 15.0, avg: 147.0, max: 271.0) [2024-06-15 12:16:00,956][1648985] Avg episode reward: [(0, '113.370')] [2024-06-15 12:16:01,189][1652491] Updated weights for policy 0, policy_version 59088 (0.0105) [2024-06-15 12:16:02,344][1652491] Updated weights for policy 0, policy_version 59140 (0.0055) [2024-06-15 12:16:03,894][1652491] Updated weights for policy 0, policy_version 59201 (0.0031) [2024-06-15 12:16:05,213][1652491] Updated weights for policy 0, policy_version 59264 (0.0012) [2024-06-15 12:16:05,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48059.7, 300 sec: 46209.6). Total num frames: 121372672. Throughput: 0: 11810.2. Samples: 30416896. Policy #0 lag: (min: 15.0, avg: 147.0, max: 271.0) [2024-06-15 12:16:05,956][1648985] Avg episode reward: [(0, '99.430')] [2024-06-15 12:16:07,424][1652491] Updated weights for policy 0, policy_version 59322 (0.0015) [2024-06-15 12:16:10,955][1648985] Fps is (10 sec: 52427.2, 60 sec: 44782.9, 300 sec: 45875.2). Total num frames: 121503744. Throughput: 0: 11810.1. Samples: 30452736. Policy #0 lag: (min: 15.0, avg: 147.0, max: 271.0) [2024-06-15 12:16:10,956][1648985] Avg episode reward: [(0, '111.170')] [2024-06-15 12:16:12,770][1652491] Updated weights for policy 0, policy_version 59377 (0.0015) [2024-06-15 12:16:14,030][1652491] Updated weights for policy 0, policy_version 59428 (0.0190) [2024-06-15 12:16:15,508][1652491] Updated weights for policy 0, policy_version 59488 (0.0010) [2024-06-15 12:16:15,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 48059.7, 300 sec: 46097.4). Total num frames: 121831424. Throughput: 0: 11753.3. 
Samples: 30519296. Policy #0 lag: (min: 15.0, avg: 147.0, max: 271.0) [2024-06-15 12:16:15,956][1648985] Avg episode reward: [(0, '136.170')] [2024-06-15 12:16:18,093][1652491] Updated weights for policy 0, policy_version 59536 (0.0012) [2024-06-15 12:16:19,084][1652491] Updated weights for policy 0, policy_version 59582 (0.0013) [2024-06-15 12:16:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 46097.3). Total num frames: 122028032. Throughput: 0: 11832.8. Samples: 30594048. Policy #0 lag: (min: 15.0, avg: 147.0, max: 271.0) [2024-06-15 12:16:20,956][1648985] Avg episode reward: [(0, '139.750')] [2024-06-15 12:16:23,660][1652491] Updated weights for policy 0, policy_version 59632 (0.0013) [2024-06-15 12:16:24,863][1652491] Updated weights for policy 0, policy_version 59683 (0.0013) [2024-06-15 12:16:25,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48605.9, 300 sec: 46430.6). Total num frames: 122322944. Throughput: 0: 11821.5. Samples: 30630400. Policy #0 lag: (min: 1.0, avg: 75.1, max: 257.0) [2024-06-15 12:16:25,956][1648985] Avg episode reward: [(0, '125.350')] [2024-06-15 12:16:26,064][1652491] Updated weights for policy 0, policy_version 59744 (0.0015) [2024-06-15 12:16:26,916][1652491] Updated weights for policy 0, policy_version 59776 (0.0012) [2024-06-15 12:16:29,671][1652491] Updated weights for policy 0, policy_version 59833 (0.0020) [2024-06-15 12:16:30,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 122552320. Throughput: 0: 11798.8. Samples: 30695424. Policy #0 lag: (min: 1.0, avg: 75.1, max: 257.0) [2024-06-15 12:16:30,956][1648985] Avg episode reward: [(0, '114.780')] [2024-06-15 12:16:34,419][1652491] Updated weights for policy 0, policy_version 59892 (0.0013) [2024-06-15 12:16:35,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46421.4, 300 sec: 46319.6). Total num frames: 122716160. Throughput: 0: 11764.7. Samples: 30768640. 
Policy #0 lag: (min: 1.0, avg: 75.1, max: 257.0) [2024-06-15 12:16:35,956][1648985] Avg episode reward: [(0, '125.090')] [2024-06-15 12:16:35,969][1652491] Updated weights for policy 0, policy_version 59936 (0.0015) [2024-06-15 12:16:36,433][1651469] Signal inference workers to stop experience collection... (3150 times) [2024-06-15 12:16:36,483][1652491] InferenceWorker_p0-w0: stopping experience collection (3150 times) [2024-06-15 12:16:36,660][1651469] Signal inference workers to resume experience collection... (3150 times) [2024-06-15 12:16:36,661][1652491] InferenceWorker_p0-w0: resuming experience collection (3150 times) [2024-06-15 12:16:37,668][1652491] Updated weights for policy 0, policy_version 60001 (0.0129) [2024-06-15 12:16:39,927][1652491] Updated weights for policy 0, policy_version 60033 (0.0041) [2024-06-15 12:16:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 123043840. Throughput: 0: 11764.7. Samples: 30797824. Policy #0 lag: (min: 1.0, avg: 75.1, max: 257.0) [2024-06-15 12:16:40,956][1648985] Avg episode reward: [(0, '131.040')] [2024-06-15 12:16:44,820][1652491] Updated weights for policy 0, policy_version 60098 (0.0014) [2024-06-15 12:16:45,950][1652491] Updated weights for policy 0, policy_version 60160 (0.0016) [2024-06-15 12:16:45,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45875.4, 300 sec: 46208.4). Total num frames: 123207680. Throughput: 0: 11662.2. Samples: 30873600. Policy #0 lag: (min: 1.0, avg: 75.1, max: 257.0) [2024-06-15 12:16:45,955][1648985] Avg episode reward: [(0, '128.130')] [2024-06-15 12:16:48,157][1652491] Updated weights for policy 0, policy_version 60214 (0.0011) [2024-06-15 12:16:49,773][1652491] Updated weights for policy 0, policy_version 60282 (0.0013) [2024-06-15 12:16:50,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 47513.7, 300 sec: 46208.5). Total num frames: 123469824. Throughput: 0: 11503.0. Samples: 30934528. 
Policy #0 lag: (min: 1.0, avg: 75.1, max: 257.0) [2024-06-15 12:16:50,956][1648985] Avg episode reward: [(0, '126.780')] [2024-06-15 12:16:51,917][1652491] Updated weights for policy 0, policy_version 60320 (0.0031) [2024-06-15 12:16:55,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45329.2, 300 sec: 46097.4). Total num frames: 123600896. Throughput: 0: 11548.5. Samples: 30972416. Policy #0 lag: (min: 1.0, avg: 75.1, max: 257.0) [2024-06-15 12:16:55,956][1648985] Avg episode reward: [(0, '119.740')] [2024-06-15 12:16:55,982][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000060352_123600896.pth... [2024-06-15 12:16:56,202][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000054976_112590848.pth [2024-06-15 12:16:56,478][1652491] Updated weights for policy 0, policy_version 60368 (0.0011) [2024-06-15 12:16:58,612][1652491] Updated weights for policy 0, policy_version 60418 (0.0013) [2024-06-15 12:17:00,056][1652491] Updated weights for policy 0, policy_version 60480 (0.0041) [2024-06-15 12:17:00,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 49151.8, 300 sec: 46430.6). Total num frames: 123928576. Throughput: 0: 11571.2. Samples: 31040000. Policy #0 lag: (min: 63.0, avg: 152.7, max: 300.0) [2024-06-15 12:17:00,956][1648985] Avg episode reward: [(0, '117.140')] [2024-06-15 12:17:01,633][1652491] Updated weights for policy 0, policy_version 60544 (0.0070) [2024-06-15 12:17:04,288][1652491] Updated weights for policy 0, policy_version 60598 (0.0012) [2024-06-15 12:17:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 124125184. Throughput: 0: 11491.6. Samples: 31111168. 
Policy #0 lag: (min: 63.0, avg: 152.7, max: 300.0) [2024-06-15 12:17:05,956][1648985] Avg episode reward: [(0, '119.650')] [2024-06-15 12:17:08,363][1652491] Updated weights for policy 0, policy_version 60640 (0.0013) [2024-06-15 12:17:08,966][1652491] Updated weights for policy 0, policy_version 60672 (0.0016) [2024-06-15 12:17:10,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.8, 300 sec: 46541.6). Total num frames: 124387328. Throughput: 0: 11480.1. Samples: 31147008. Policy #0 lag: (min: 63.0, avg: 152.7, max: 300.0) [2024-06-15 12:17:10,956][1648985] Avg episode reward: [(0, '124.460')] [2024-06-15 12:17:11,103][1652491] Updated weights for policy 0, policy_version 60738 (0.0013) [2024-06-15 12:17:12,433][1652491] Updated weights for policy 0, policy_version 60800 (0.0012) [2024-06-15 12:17:15,632][1652491] Updated weights for policy 0, policy_version 60858 (0.0024) [2024-06-15 12:17:15,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 124649472. Throughput: 0: 11514.3. Samples: 31213568. Policy #0 lag: (min: 63.0, avg: 152.7, max: 300.0) [2024-06-15 12:17:15,956][1648985] Avg episode reward: [(0, '133.540')] [2024-06-15 12:17:20,652][1652491] Updated weights for policy 0, policy_version 60928 (0.0014) [2024-06-15 12:17:20,753][1651469] Signal inference workers to stop experience collection... (3200 times) [2024-06-15 12:17:20,818][1652491] InferenceWorker_p0-w0: stopping experience collection (3200 times) [2024-06-15 12:17:20,946][1651469] Signal inference workers to resume experience collection... (3200 times) [2024-06-15 12:17:20,951][1652491] InferenceWorker_p0-w0: resuming experience collection (3200 times) [2024-06-15 12:17:20,955][1648985] Fps is (10 sec: 42599.7, 60 sec: 46421.6, 300 sec: 46319.5). Total num frames: 124813312. Throughput: 0: 11514.3. Samples: 31286784. 
Policy #0 lag: (min: 63.0, avg: 152.7, max: 300.0) [2024-06-15 12:17:20,955][1648985] Avg episode reward: [(0, '124.570')] [2024-06-15 12:17:22,458][1652491] Updated weights for policy 0, policy_version 61008 (0.0097) [2024-06-15 12:17:23,699][1652491] Updated weights for policy 0, policy_version 61054 (0.0022) [2024-06-15 12:17:25,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 125075456. Throughput: 0: 11434.7. Samples: 31312384. Policy #0 lag: (min: 63.0, avg: 152.7, max: 300.0) [2024-06-15 12:17:25,956][1648985] Avg episode reward: [(0, '127.920')] [2024-06-15 12:17:26,695][1652491] Updated weights for policy 0, policy_version 61112 (0.0127) [2024-06-15 12:17:30,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 44236.7, 300 sec: 45875.2). Total num frames: 125206528. Throughput: 0: 11537.0. Samples: 31392768. Policy #0 lag: (min: 63.0, avg: 152.7, max: 300.0) [2024-06-15 12:17:30,956][1648985] Avg episode reward: [(0, '128.500')] [2024-06-15 12:17:31,693][1652491] Updated weights for policy 0, policy_version 61172 (0.0013) [2024-06-15 12:17:33,493][1652491] Updated weights for policy 0, policy_version 61249 (0.0101) [2024-06-15 12:17:34,804][1652491] Updated weights for policy 0, policy_version 61302 (0.0016) [2024-06-15 12:17:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 46430.6). Total num frames: 125566976. Throughput: 0: 11673.6. Samples: 31459840. Policy #0 lag: (min: 63.0, avg: 152.7, max: 300.0) [2024-06-15 12:17:35,956][1648985] Avg episode reward: [(0, '129.880')] [2024-06-15 12:17:37,231][1652491] Updated weights for policy 0, policy_version 61344 (0.0014) [2024-06-15 12:17:40,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 44236.7, 300 sec: 46208.5). Total num frames: 125698048. Throughput: 0: 11673.6. Samples: 31497728. 
Policy #0 lag: (min: 63.0, avg: 198.8, max: 319.0) [2024-06-15 12:17:40,956][1648985] Avg episode reward: [(0, '136.660')] [2024-06-15 12:17:42,060][1652491] Updated weights for policy 0, policy_version 61394 (0.0020) [2024-06-15 12:17:43,314][1652491] Updated weights for policy 0, policy_version 61459 (0.0014) [2024-06-15 12:17:45,139][1652491] Updated weights for policy 0, policy_version 61536 (0.0017) [2024-06-15 12:17:45,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 126058496. Throughput: 0: 11639.5. Samples: 31563776. Policy #0 lag: (min: 63.0, avg: 198.8, max: 319.0) [2024-06-15 12:17:45,955][1648985] Avg episode reward: [(0, '128.610')] [2024-06-15 12:17:48,104][1652491] Updated weights for policy 0, policy_version 61571 (0.0014) [2024-06-15 12:17:50,974][1648985] Fps is (10 sec: 52329.9, 60 sec: 45860.7, 300 sec: 46205.5). Total num frames: 126222336. Throughput: 0: 11748.3. Samples: 31640064. Policy #0 lag: (min: 63.0, avg: 198.8, max: 319.0) [2024-06-15 12:17:50,975][1648985] Avg episode reward: [(0, '125.700')] [2024-06-15 12:17:52,692][1652491] Updated weights for policy 0, policy_version 61633 (0.0012) [2024-06-15 12:17:54,820][1652491] Updated weights for policy 0, policy_version 61734 (0.0114) [2024-06-15 12:17:55,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48605.9, 300 sec: 46541.7). Total num frames: 126517248. Throughput: 0: 11707.8. Samples: 31673856. Policy #0 lag: (min: 63.0, avg: 198.8, max: 319.0) [2024-06-15 12:17:55,955][1648985] Avg episode reward: [(0, '137.470')] [2024-06-15 12:17:56,628][1652491] Updated weights for policy 0, policy_version 61812 (0.0022) [2024-06-15 12:17:59,257][1651469] Signal inference workers to stop experience collection... 
(3250 times) [2024-06-15 12:17:59,330][1652491] InferenceWorker_p0-w0: stopping experience collection (3250 times) [2024-06-15 12:17:59,332][1652491] Updated weights for policy 0, policy_version 61844 (0.0012) [2024-06-15 12:17:59,531][1651469] Signal inference workers to resume experience collection... (3250 times) [2024-06-15 12:17:59,532][1652491] InferenceWorker_p0-w0: resuming experience collection (3250 times) [2024-06-15 12:18:00,955][1648985] Fps is (10 sec: 52527.6, 60 sec: 46967.5, 300 sec: 46319.5). Total num frames: 126746624. Throughput: 0: 11696.3. Samples: 31739904. Policy #0 lag: (min: 63.0, avg: 198.8, max: 319.0) [2024-06-15 12:18:00,956][1648985] Avg episode reward: [(0, '137.910')] [2024-06-15 12:18:04,496][1652491] Updated weights for policy 0, policy_version 61904 (0.0014) [2024-06-15 12:18:05,874][1652491] Updated weights for policy 0, policy_version 61968 (0.0014) [2024-06-15 12:18:05,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 126910464. Throughput: 0: 11821.5. Samples: 31818752. Policy #0 lag: (min: 63.0, avg: 198.8, max: 319.0) [2024-06-15 12:18:05,956][1648985] Avg episode reward: [(0, '141.430')] [2024-06-15 12:18:07,479][1652491] Updated weights for policy 0, policy_version 62035 (0.0067) [2024-06-15 12:18:09,803][1652491] Updated weights for policy 0, policy_version 62083 (0.0013) [2024-06-15 12:18:10,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 47513.8, 300 sec: 46652.8). Total num frames: 127238144. Throughput: 0: 11878.4. Samples: 31846912. Policy #0 lag: (min: 63.0, avg: 198.8, max: 319.0) [2024-06-15 12:18:10,956][1648985] Avg episode reward: [(0, '132.510')] [2024-06-15 12:18:11,046][1652491] Updated weights for policy 0, policy_version 62137 (0.0011) [2024-06-15 12:18:15,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44236.9, 300 sec: 46208.5). Total num frames: 127303680. Throughput: 0: 11912.6. Samples: 31928832. 
Policy #0 lag: (min: 63.0, avg: 198.8, max: 319.0) [2024-06-15 12:18:15,955][1648985] Avg episode reward: [(0, '134.910')] [2024-06-15 12:18:16,628][1652491] Updated weights for policy 0, policy_version 62192 (0.0013) [2024-06-15 12:18:18,029][1652491] Updated weights for policy 0, policy_version 62243 (0.0013) [2024-06-15 12:18:19,883][1652491] Updated weights for policy 0, policy_version 62331 (0.0014) [2024-06-15 12:18:20,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 47513.5, 300 sec: 46541.7). Total num frames: 127664128. Throughput: 0: 11719.1. Samples: 31987200. Policy #0 lag: (min: 47.0, avg: 112.1, max: 303.0) [2024-06-15 12:18:20,956][1648985] Avg episode reward: [(0, '132.520')] [2024-06-15 12:18:22,435][1652491] Updated weights for policy 0, policy_version 62384 (0.0013) [2024-06-15 12:18:25,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45329.1, 300 sec: 46208.5). Total num frames: 127795200. Throughput: 0: 11685.0. Samples: 32023552. Policy #0 lag: (min: 47.0, avg: 112.1, max: 303.0) [2024-06-15 12:18:25,955][1648985] Avg episode reward: [(0, '136.600')] [2024-06-15 12:18:26,990][1652491] Updated weights for policy 0, policy_version 62404 (0.0041) [2024-06-15 12:18:28,937][1652491] Updated weights for policy 0, policy_version 62480 (0.0012) [2024-06-15 12:18:30,958][1648985] Fps is (10 sec: 42585.0, 60 sec: 48057.3, 300 sec: 46319.0). Total num frames: 128090112. Throughput: 0: 11672.8. Samples: 32089088. Policy #0 lag: (min: 47.0, avg: 112.1, max: 303.0) [2024-06-15 12:18:30,959][1648985] Avg episode reward: [(0, '139.500')] [2024-06-15 12:18:31,056][1652491] Updated weights for policy 0, policy_version 62560 (0.0013) [2024-06-15 12:18:33,043][1652491] Updated weights for policy 0, policy_version 62608 (0.0013) [2024-06-15 12:18:34,160][1652491] Updated weights for policy 0, policy_version 62656 (0.0011) [2024-06-15 12:18:35,956][1648985] Fps is (10 sec: 52426.5, 60 sec: 45874.9, 300 sec: 46208.4). Total num frames: 128319488. Throughput: 0: 11484.9. 
Samples: 32156672. Policy #0 lag: (min: 47.0, avg: 112.1, max: 303.0) [2024-06-15 12:18:35,957][1648985] Avg episode reward: [(0, '126.780')] [2024-06-15 12:18:40,955][1648985] Fps is (10 sec: 36055.9, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 128450560. Throughput: 0: 11605.3. Samples: 32196096. Policy #0 lag: (min: 47.0, avg: 112.1, max: 303.0) [2024-06-15 12:18:40,956][1648985] Avg episode reward: [(0, '120.710')] [2024-06-15 12:18:41,404][1652491] Updated weights for policy 0, policy_version 62736 (0.0014) [2024-06-15 12:18:41,886][1651469] Signal inference workers to stop experience collection... (3300 times) [2024-06-15 12:18:41,928][1652491] InferenceWorker_p0-w0: stopping experience collection (3300 times) [2024-06-15 12:18:42,139][1651469] Signal inference workers to resume experience collection... (3300 times) [2024-06-15 12:18:42,140][1652491] InferenceWorker_p0-w0: resuming experience collection (3300 times) [2024-06-15 12:18:42,666][1652491] Updated weights for policy 0, policy_version 62787 (0.0012) [2024-06-15 12:18:45,097][1652491] Updated weights for policy 0, policy_version 62864 (0.0013) [2024-06-15 12:18:45,936][1652491] Updated weights for policy 0, policy_version 62905 (0.0012) [2024-06-15 12:18:45,955][1648985] Fps is (10 sec: 49154.4, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 128811008. Throughput: 0: 11309.6. Samples: 32248832. Policy #0 lag: (min: 47.0, avg: 112.1, max: 303.0) [2024-06-15 12:18:45,955][1648985] Avg episode reward: [(0, '101.790')] [2024-06-15 12:18:50,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 43704.5, 300 sec: 46208.4). Total num frames: 128843776. Throughput: 0: 11241.2. Samples: 32324608. 
Policy #0 lag: (min: 47.0, avg: 112.1, max: 303.0) [2024-06-15 12:18:50,956][1648985] Avg episode reward: [(0, '119.310')] [2024-06-15 12:18:52,406][1652491] Updated weights for policy 0, policy_version 62960 (0.0016) [2024-06-15 12:18:54,245][1652491] Updated weights for policy 0, policy_version 63040 (0.0012) [2024-06-15 12:18:55,545][1652491] Updated weights for policy 0, policy_version 63102 (0.0013) [2024-06-15 12:18:55,955][1648985] Fps is (10 sec: 42597.2, 60 sec: 45328.9, 300 sec: 46652.7). Total num frames: 129236992. Throughput: 0: 11172.9. Samples: 32349696. Policy #0 lag: (min: 175.0, avg: 246.4, max: 447.0) [2024-06-15 12:18:55,956][1648985] Avg episode reward: [(0, '126.070')] [2024-06-15 12:18:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000063104_129236992.pth... [2024-06-15 12:18:56,029][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000057664_118095872.pth [2024-06-15 12:18:57,731][1652491] Updated weights for policy 0, policy_version 63152 (0.0030) [2024-06-15 12:19:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 129368064. Throughput: 0: 10934.0. Samples: 32420864. Policy #0 lag: (min: 175.0, avg: 246.4, max: 447.0) [2024-06-15 12:19:00,956][1648985] Avg episode reward: [(0, '129.320')] [2024-06-15 12:19:03,507][1652491] Updated weights for policy 0, policy_version 63204 (0.0012) [2024-06-15 12:19:05,958][1648985] Fps is (10 sec: 39310.3, 60 sec: 45326.7, 300 sec: 46209.1). Total num frames: 129630208. Throughput: 0: 10944.7. Samples: 32479744. 
Policy #0 lag: (min: 175.0, avg: 246.4, max: 447.0) [2024-06-15 12:19:05,959][1648985] Avg episode reward: [(0, '130.690')] [2024-06-15 12:19:06,285][1652491] Updated weights for policy 0, policy_version 63314 (0.0014) [2024-06-15 12:19:09,014][1652491] Updated weights for policy 0, policy_version 63377 (0.0014) [2024-06-15 12:19:10,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 44236.6, 300 sec: 46208.4). Total num frames: 129892352. Throughput: 0: 10877.1. Samples: 32513024. Policy #0 lag: (min: 175.0, avg: 246.4, max: 447.0) [2024-06-15 12:19:10,956][1648985] Avg episode reward: [(0, '137.300')] [2024-06-15 12:19:14,432][1652491] Updated weights for policy 0, policy_version 63425 (0.0020) [2024-06-15 12:19:15,955][1648985] Fps is (10 sec: 36056.0, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 129990656. Throughput: 0: 11082.7. Samples: 32587776. Policy #0 lag: (min: 175.0, avg: 246.4, max: 447.0) [2024-06-15 12:19:15,956][1648985] Avg episode reward: [(0, '135.430')] [2024-06-15 12:19:16,667][1652491] Updated weights for policy 0, policy_version 63505 (0.0012) [2024-06-15 12:19:18,216][1652491] Updated weights for policy 0, policy_version 63571 (0.0013) [2024-06-15 12:19:19,110][1652491] Updated weights for policy 0, policy_version 63616 (0.0020) [2024-06-15 12:19:20,955][1648985] Fps is (10 sec: 42599.9, 60 sec: 44236.9, 300 sec: 46208.5). Total num frames: 130318336. Throughput: 0: 10900.0. Samples: 32647168. Policy #0 lag: (min: 175.0, avg: 246.4, max: 447.0) [2024-06-15 12:19:20,955][1648985] Avg episode reward: [(0, '123.180')] [2024-06-15 12:19:21,636][1652491] Updated weights for policy 0, policy_version 63669 (0.0011) [2024-06-15 12:19:25,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 46097.4). Total num frames: 130416640. Throughput: 0: 10877.1. Samples: 32685568. 
Policy #0 lag: (min: 175.0, avg: 246.4, max: 447.0) [2024-06-15 12:19:25,956][1648985] Avg episode reward: [(0, '125.160')] [2024-06-15 12:19:26,028][1651469] Signal inference workers to stop experience collection... (3350 times) [2024-06-15 12:19:26,109][1652491] InferenceWorker_p0-w0: stopping experience collection (3350 times) [2024-06-15 12:19:26,308][1651469] Signal inference workers to resume experience collection... (3350 times) [2024-06-15 12:19:26,325][1652491] InferenceWorker_p0-w0: resuming experience collection (3350 times) [2024-06-15 12:19:26,855][1652491] Updated weights for policy 0, policy_version 63714 (0.0013) [2024-06-15 12:19:28,758][1652491] Updated weights for policy 0, policy_version 63793 (0.0013) [2024-06-15 12:19:29,801][1652491] Updated weights for policy 0, policy_version 63841 (0.0012) [2024-06-15 12:19:30,955][1648985] Fps is (10 sec: 49150.6, 60 sec: 45331.3, 300 sec: 46208.4). Total num frames: 130809856. Throughput: 0: 11161.5. Samples: 32751104. Policy #0 lag: (min: 175.0, avg: 246.4, max: 447.0) [2024-06-15 12:19:30,956][1648985] Avg episode reward: [(0, '116.000')] [2024-06-15 12:19:32,637][1652491] Updated weights for policy 0, policy_version 63905 (0.0016) [2024-06-15 12:19:33,135][1652491] Updated weights for policy 0, policy_version 63935 (0.0020) [2024-06-15 12:19:35,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43691.0, 300 sec: 46208.5). Total num frames: 130940928. Throughput: 0: 11218.5. Samples: 32829440. Policy #0 lag: (min: 175.0, avg: 246.4, max: 447.0) [2024-06-15 12:19:35,956][1648985] Avg episode reward: [(0, '131.510')] [2024-06-15 12:19:38,508][1652491] Updated weights for policy 0, policy_version 63985 (0.0012) [2024-06-15 12:19:39,856][1652491] Updated weights for policy 0, policy_version 64034 (0.0012) [2024-06-15 12:19:41,007][1648985] Fps is (10 sec: 42380.7, 60 sec: 46381.5, 300 sec: 46200.4). Total num frames: 131235840. Throughput: 0: 11330.7. Samples: 32860160. 
Policy #0 lag: (min: 15.0, avg: 74.7, max: 268.0) [2024-06-15 12:19:41,007][1648985] Avg episode reward: [(0, '117.950')] [2024-06-15 12:19:41,155][1652491] Updated weights for policy 0, policy_version 64096 (0.0013) [2024-06-15 12:19:43,956][1652491] Updated weights for policy 0, policy_version 64145 (0.0013) [2024-06-15 12:19:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 44236.8, 300 sec: 46208.5). Total num frames: 131465216. Throughput: 0: 11241.2. Samples: 32926720. Policy #0 lag: (min: 15.0, avg: 74.7, max: 268.0) [2024-06-15 12:19:45,956][1648985] Avg episode reward: [(0, '119.970')] [2024-06-15 12:19:49,269][1652491] Updated weights for policy 0, policy_version 64240 (0.0113) [2024-06-15 12:19:50,955][1648985] Fps is (10 sec: 46112.8, 60 sec: 47513.6, 300 sec: 46097.4). Total num frames: 131694592. Throughput: 0: 11401.3. Samples: 32992768. Policy #0 lag: (min: 15.0, avg: 74.7, max: 268.0) [2024-06-15 12:19:50,956][1648985] Avg episode reward: [(0, '119.680')] [2024-06-15 12:19:51,039][1652491] Updated weights for policy 0, policy_version 64313 (0.0013) [2024-06-15 12:19:52,465][1652491] Updated weights for policy 0, policy_version 64352 (0.0014) [2024-06-15 12:19:55,153][1652491] Updated weights for policy 0, policy_version 64387 (0.0014) [2024-06-15 12:19:55,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45329.2, 300 sec: 46097.3). Total num frames: 131956736. Throughput: 0: 11491.6. Samples: 33030144. Policy #0 lag: (min: 15.0, avg: 74.7, max: 268.0) [2024-06-15 12:19:55,956][1648985] Avg episode reward: [(0, '115.600')] [2024-06-15 12:19:59,846][1652491] Updated weights for policy 0, policy_version 64466 (0.0014) [2024-06-15 12:20:00,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 132120576. Throughput: 0: 11525.7. Samples: 33106432. 
Policy #0 lag: (min: 15.0, avg: 74.7, max: 268.0) [2024-06-15 12:20:00,956][1648985] Avg episode reward: [(0, '127.410')] [2024-06-15 12:20:01,401][1652491] Updated weights for policy 0, policy_version 64531 (0.0012) [2024-06-15 12:20:03,200][1652491] Updated weights for policy 0, policy_version 64581 (0.0014) [2024-06-15 12:20:04,623][1652491] Updated weights for policy 0, policy_version 64636 (0.0067) [2024-06-15 12:20:05,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45877.6, 300 sec: 45986.3). Total num frames: 132382720. Throughput: 0: 11525.7. Samples: 33165824. Policy #0 lag: (min: 15.0, avg: 74.7, max: 268.0) [2024-06-15 12:20:05,956][1648985] Avg episode reward: [(0, '124.660')] [2024-06-15 12:20:06,860][1651469] Signal inference workers to stop experience collection... (3400 times) [2024-06-15 12:20:06,903][1652491] InferenceWorker_p0-w0: stopping experience collection (3400 times) [2024-06-15 12:20:07,116][1651469] Signal inference workers to resume experience collection... (3400 times) [2024-06-15 12:20:07,117][1652491] InferenceWorker_p0-w0: resuming experience collection (3400 times) [2024-06-15 12:20:07,789][1652491] Updated weights for policy 0, policy_version 64677 (0.0069) [2024-06-15 12:20:10,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 43690.7, 300 sec: 45986.2). Total num frames: 132513792. Throughput: 0: 11446.0. Samples: 33200640. Policy #0 lag: (min: 15.0, avg: 74.7, max: 268.0) [2024-06-15 12:20:10,956][1648985] Avg episode reward: [(0, '132.850')] [2024-06-15 12:20:12,057][1652491] Updated weights for policy 0, policy_version 64740 (0.0015) [2024-06-15 12:20:14,030][1652491] Updated weights for policy 0, policy_version 64821 (0.0013) [2024-06-15 12:20:15,386][1652491] Updated weights for policy 0, policy_version 64849 (0.0012) [2024-06-15 12:20:15,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 45986.3). Total num frames: 132841472. Throughput: 0: 11480.2. Samples: 33267712. 
Policy #0 lag: (min: 15.0, avg: 74.7, max: 268.0) [2024-06-15 12:20:15,956][1648985] Avg episode reward: [(0, '138.450')] [2024-06-15 12:20:16,244][1652491] Updated weights for policy 0, policy_version 64892 (0.0032) [2024-06-15 12:20:19,271][1652491] Updated weights for policy 0, policy_version 64936 (0.0012) [2024-06-15 12:20:20,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 133038080. Throughput: 0: 11537.1. Samples: 33348608. Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:20:20,955][1648985] Avg episode reward: [(0, '127.270')] [2024-06-15 12:20:22,041][1652491] Updated weights for policy 0, policy_version 64976 (0.0112) [2024-06-15 12:20:23,801][1652491] Updated weights for policy 0, policy_version 65044 (0.0012) [2024-06-15 12:20:24,687][1652491] Updated weights for policy 0, policy_version 65088 (0.0017) [2024-06-15 12:20:25,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48606.0, 300 sec: 45986.3). Total num frames: 133332992. Throughput: 0: 11516.1. Samples: 33377792. Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:20:25,955][1648985] Avg episode reward: [(0, '133.360')] [2024-06-15 12:20:26,863][1652491] Updated weights for policy 0, policy_version 65142 (0.0028) [2024-06-15 12:20:29,848][1652491] Updated weights for policy 0, policy_version 65184 (0.0013) [2024-06-15 12:20:30,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 133562368. Throughput: 0: 11776.0. Samples: 33456640. Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:20:30,956][1648985] Avg episode reward: [(0, '145.310')] [2024-06-15 12:20:32,080][1652491] Updated weights for policy 0, policy_version 65219 (0.0016) [2024-06-15 12:20:34,115][1652491] Updated weights for policy 0, policy_version 65299 (0.0030) [2024-06-15 12:20:35,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 133824512. Throughput: 0: 11776.0. 
Samples: 33522688. Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:20:35,956][1648985] Avg episode reward: [(0, '135.760')] [2024-06-15 12:20:37,107][1652491] Updated weights for policy 0, policy_version 65365 (0.0025) [2024-06-15 12:20:40,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45368.0, 300 sec: 45764.1). Total num frames: 133955584. Throughput: 0: 11707.7. Samples: 33556992. Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:20:40,956][1648985] Avg episode reward: [(0, '124.480')] [2024-06-15 12:20:41,361][1652491] Updated weights for policy 0, policy_version 65440 (0.0012) [2024-06-15 12:20:43,771][1652491] Updated weights for policy 0, policy_version 65489 (0.0013) [2024-06-15 12:20:45,547][1652491] Updated weights for policy 0, policy_version 65557 (0.0186) [2024-06-15 12:20:45,956][1648985] Fps is (10 sec: 45871.7, 60 sec: 46966.9, 300 sec: 46319.4). Total num frames: 134283264. Throughput: 0: 11627.9. Samples: 33629696. Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:20:45,957][1648985] Avg episode reward: [(0, '135.010')] [2024-06-15 12:20:48,200][1652491] Updated weights for policy 0, policy_version 65616 (0.0012) [2024-06-15 12:20:48,324][1651469] Signal inference workers to stop experience collection... (3450 times) [2024-06-15 12:20:48,404][1652491] InferenceWorker_p0-w0: stopping experience collection (3450 times) [2024-06-15 12:20:48,673][1651469] Signal inference workers to resume experience collection... (3450 times) [2024-06-15 12:20:48,674][1652491] InferenceWorker_p0-w0: resuming experience collection (3450 times) [2024-06-15 12:20:50,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 46421.4, 300 sec: 46097.4). Total num frames: 134479872. Throughput: 0: 11707.7. Samples: 33692672. 
Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:20:50,956][1648985] Avg episode reward: [(0, '114.850')] [2024-06-15 12:20:52,768][1652491] Updated weights for policy 0, policy_version 65668 (0.0098) [2024-06-15 12:20:53,785][1652491] Updated weights for policy 0, policy_version 65722 (0.0035) [2024-06-15 12:20:55,487][1652491] Updated weights for policy 0, policy_version 65789 (0.0077) [2024-06-15 12:20:55,955][1648985] Fps is (10 sec: 45878.4, 60 sec: 46421.3, 300 sec: 46652.7). Total num frames: 134742016. Throughput: 0: 11901.2. Samples: 33736192. Policy #0 lag: (min: 15.0, avg: 133.4, max: 271.0) [2024-06-15 12:20:55,956][1648985] Avg episode reward: [(0, '121.170')] [2024-06-15 12:20:56,617][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000065824_134807552.pth... [2024-06-15 12:20:56,746][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000060352_123600896.pth [2024-06-15 12:20:57,424][1652491] Updated weights for policy 0, policy_version 65852 (0.0012) [2024-06-15 12:21:00,436][1652491] Updated weights for policy 0, policy_version 65909 (0.0012) [2024-06-15 12:21:00,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 46208.5). Total num frames: 135004160. Throughput: 0: 11855.6. Samples: 33801216. Policy #0 lag: (min: 6.0, avg: 137.8, max: 262.0) [2024-06-15 12:21:00,966][1648985] Avg episode reward: [(0, '118.140')] [2024-06-15 12:21:03,773][1652491] Updated weights for policy 0, policy_version 65952 (0.0012) [2024-06-15 12:21:05,595][1652491] Updated weights for policy 0, policy_version 66000 (0.0033) [2024-06-15 12:21:05,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 46967.3, 300 sec: 46430.6). Total num frames: 135200768. Throughput: 0: 11832.8. Samples: 33881088. 
Policy #0 lag: (min: 6.0, avg: 137.8, max: 262.0) [2024-06-15 12:21:05,956][1648985] Avg episode reward: [(0, '130.940')] [2024-06-15 12:21:06,648][1652491] Updated weights for policy 0, policy_version 66050 (0.0013) [2024-06-15 12:21:10,024][1652491] Updated weights for policy 0, policy_version 66114 (0.0014) [2024-06-15 12:21:10,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 49152.1, 300 sec: 46208.4). Total num frames: 135462912. Throughput: 0: 11901.1. Samples: 33913344. Policy #0 lag: (min: 6.0, avg: 137.8, max: 262.0) [2024-06-15 12:21:10,956][1648985] Avg episode reward: [(0, '142.280')] [2024-06-15 12:21:11,328][1652491] Updated weights for policy 0, policy_version 66171 (0.0015) [2024-06-15 12:21:15,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.2, 300 sec: 46208.4). Total num frames: 135659520. Throughput: 0: 11741.8. Samples: 33985024. Policy #0 lag: (min: 6.0, avg: 137.8, max: 262.0) [2024-06-15 12:21:15,956][1648985] Avg episode reward: [(0, '140.790')] [2024-06-15 12:21:16,704][1652491] Updated weights for policy 0, policy_version 66242 (0.0014) [2024-06-15 12:21:18,323][1652491] Updated weights for policy 0, policy_version 66320 (0.0019) [2024-06-15 12:21:20,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 48059.8, 300 sec: 46097.4). Total num frames: 135921664. Throughput: 0: 11753.3. Samples: 34051584. Policy #0 lag: (min: 6.0, avg: 137.8, max: 262.0) [2024-06-15 12:21:20,955][1648985] Avg episode reward: [(0, '129.480')] [2024-06-15 12:21:21,226][1652491] Updated weights for policy 0, policy_version 66373 (0.0013) [2024-06-15 12:21:22,531][1652491] Updated weights for policy 0, policy_version 66425 (0.0011) [2024-06-15 12:21:25,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 45875.1, 300 sec: 45875.2). Total num frames: 136085504. Throughput: 0: 11764.6. Samples: 34086400. 
Policy #0 lag: (min: 6.0, avg: 137.8, max: 262.0) [2024-06-15 12:21:25,956][1648985] Avg episode reward: [(0, '131.610')] [2024-06-15 12:21:26,434][1652491] Updated weights for policy 0, policy_version 66482 (0.0026) [2024-06-15 12:21:28,600][1652491] Updated weights for policy 0, policy_version 66528 (0.0014) [2024-06-15 12:21:29,797][1652491] Updated weights for policy 0, policy_version 66576 (0.0014) [2024-06-15 12:21:30,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48059.7, 300 sec: 46541.6). Total num frames: 136445952. Throughput: 0: 11696.5. Samples: 34156032. Policy #0 lag: (min: 6.0, avg: 137.8, max: 262.0) [2024-06-15 12:21:30,956][1648985] Avg episode reward: [(0, '124.350')] [2024-06-15 12:21:32,123][1651469] Signal inference workers to stop experience collection... (3500 times) [2024-06-15 12:21:32,199][1652491] InferenceWorker_p0-w0: stopping experience collection (3500 times) [2024-06-15 12:21:32,426][1651469] Signal inference workers to resume experience collection... (3500 times) [2024-06-15 12:21:32,428][1652491] InferenceWorker_p0-w0: resuming experience collection (3500 times) [2024-06-15 12:21:32,530][1652491] Updated weights for policy 0, policy_version 66640 (0.0116) [2024-06-15 12:21:35,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 45875.0, 300 sec: 45875.2). Total num frames: 136577024. Throughput: 0: 11810.0. Samples: 34224128. Policy #0 lag: (min: 6.0, avg: 137.8, max: 262.0) [2024-06-15 12:21:35,956][1648985] Avg episode reward: [(0, '125.450')] [2024-06-15 12:21:37,692][1652491] Updated weights for policy 0, policy_version 66704 (0.0013) [2024-06-15 12:21:39,015][1652491] Updated weights for policy 0, policy_version 66768 (0.0035) [2024-06-15 12:21:40,417][1652491] Updated weights for policy 0, policy_version 66816 (0.0013) [2024-06-15 12:21:40,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 136839168. Throughput: 0: 11673.6. Samples: 34261504. 
Policy #0 lag: (min: 6.0, avg: 137.8, max: 262.0) [2024-06-15 12:21:40,955][1648985] Avg episode reward: [(0, '144.250')] [2024-06-15 12:21:42,148][1652491] Updated weights for policy 0, policy_version 66857 (0.0014) [2024-06-15 12:21:43,917][1652491] Updated weights for policy 0, policy_version 66884 (0.0014) [2024-06-15 12:21:45,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 46968.0, 300 sec: 46208.4). Total num frames: 137101312. Throughput: 0: 11639.5. Samples: 34324992. Policy #0 lag: (min: 4.0, avg: 131.0, max: 269.0) [2024-06-15 12:21:45,956][1648985] Avg episode reward: [(0, '165.690')] [2024-06-15 12:21:45,957][1651469] Saving new best policy, reward=165.690! [2024-06-15 12:21:49,748][1652491] Updated weights for policy 0, policy_version 66964 (0.0014) [2024-06-15 12:21:50,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 137232384. Throughput: 0: 11491.6. Samples: 34398208. Policy #0 lag: (min: 4.0, avg: 131.0, max: 269.0) [2024-06-15 12:21:50,956][1648985] Avg episode reward: [(0, '143.010')] [2024-06-15 12:21:51,252][1652491] Updated weights for policy 0, policy_version 67028 (0.0066) [2024-06-15 12:21:53,198][1652491] Updated weights for policy 0, policy_version 67090 (0.0013) [2024-06-15 12:21:53,997][1652491] Updated weights for policy 0, policy_version 67132 (0.0014) [2024-06-15 12:21:55,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 46208.5). Total num frames: 137560064. Throughput: 0: 11411.9. Samples: 34426880. Policy #0 lag: (min: 4.0, avg: 131.0, max: 269.0) [2024-06-15 12:21:55,955][1648985] Avg episode reward: [(0, '126.880')] [2024-06-15 12:21:56,217][1652491] Updated weights for policy 0, policy_version 67184 (0.0014) [2024-06-15 12:21:56,584][1652491] Updated weights for policy 0, policy_version 67198 (0.0010) [2024-06-15 12:22:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 137691136. Throughput: 0: 11650.9. Samples: 34509312. 
Policy #0 lag: (min: 4.0, avg: 131.0, max: 269.0) [2024-06-15 12:22:00,956][1648985] Avg episode reward: [(0, '130.960')] [2024-06-15 12:22:01,431][1652491] Updated weights for policy 0, policy_version 67250 (0.0012) [2024-06-15 12:22:02,706][1652491] Updated weights for policy 0, policy_version 67312 (0.0014) [2024-06-15 12:22:03,717][1652491] Updated weights for policy 0, policy_version 67344 (0.0012) [2024-06-15 12:22:04,767][1652491] Updated weights for policy 0, policy_version 67385 (0.0012) [2024-06-15 12:22:05,956][1648985] Fps is (10 sec: 45873.0, 60 sec: 46967.3, 300 sec: 46208.4). Total num frames: 138018816. Throughput: 0: 11719.0. Samples: 34578944. Policy #0 lag: (min: 4.0, avg: 131.0, max: 269.0) [2024-06-15 12:22:05,957][1648985] Avg episode reward: [(0, '132.790')] [2024-06-15 12:22:06,476][1652491] Updated weights for policy 0, policy_version 67413 (0.0014) [2024-06-15 12:22:10,663][1652491] Updated weights for policy 0, policy_version 67460 (0.0014) [2024-06-15 12:22:10,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 138182656. Throughput: 0: 11821.5. Samples: 34618368. Policy #0 lag: (min: 4.0, avg: 131.0, max: 269.0) [2024-06-15 12:22:10,956][1648985] Avg episode reward: [(0, '122.820')] [2024-06-15 12:22:12,987][1652491] Updated weights for policy 0, policy_version 67562 (0.0112) [2024-06-15 12:22:15,068][1651469] Signal inference workers to stop experience collection... (3550 times) [2024-06-15 12:22:15,123][1652491] InferenceWorker_p0-w0: stopping experience collection (3550 times) [2024-06-15 12:22:15,125][1652491] Updated weights for policy 0, policy_version 67602 (0.0012) [2024-06-15 12:22:15,393][1651469] Signal inference workers to resume experience collection... (3550 times) [2024-06-15 12:22:15,395][1652491] InferenceWorker_p0-w0: resuming experience collection (3550 times) [2024-06-15 12:22:15,955][1648985] Fps is (10 sec: 49153.7, 60 sec: 47513.7, 300 sec: 46430.6). 
Total num frames: 138510336. Throughput: 0: 11764.6. Samples: 34685440. Policy #0 lag: (min: 4.0, avg: 131.0, max: 269.0) [2024-06-15 12:22:15,956][1648985] Avg episode reward: [(0, '143.790')] [2024-06-15 12:22:16,224][1652491] Updated weights for policy 0, policy_version 67648 (0.0011) [2024-06-15 12:22:18,366][1652491] Updated weights for policy 0, policy_version 67712 (0.0014) [2024-06-15 12:22:20,956][1648985] Fps is (10 sec: 49150.6, 60 sec: 45874.9, 300 sec: 46097.3). Total num frames: 138674176. Throughput: 0: 11810.1. Samples: 34755584. Policy #0 lag: (min: 4.0, avg: 131.0, max: 269.0) [2024-06-15 12:22:20,957][1648985] Avg episode reward: [(0, '137.130')] [2024-06-15 12:22:23,469][1652491] Updated weights for policy 0, policy_version 67763 (0.0013) [2024-06-15 12:22:24,935][1652491] Updated weights for policy 0, policy_version 67836 (0.0013) [2024-06-15 12:22:25,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 138936320. Throughput: 0: 11696.3. Samples: 34787840. Policy #0 lag: (min: 34.0, avg: 115.5, max: 290.0) [2024-06-15 12:22:25,956][1648985] Avg episode reward: [(0, '139.780')] [2024-06-15 12:22:27,823][1652491] Updated weights for policy 0, policy_version 67894 (0.0012) [2024-06-15 12:22:29,838][1652491] Updated weights for policy 0, policy_version 67952 (0.0013) [2024-06-15 12:22:30,955][1648985] Fps is (10 sec: 52430.7, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 139198464. Throughput: 0: 11787.4. Samples: 34855424. Policy #0 lag: (min: 34.0, avg: 115.5, max: 290.0) [2024-06-15 12:22:30,956][1648985] Avg episode reward: [(0, '120.420')] [2024-06-15 12:22:34,059][1652491] Updated weights for policy 0, policy_version 68002 (0.0088) [2024-06-15 12:22:35,700][1652491] Updated weights for policy 0, policy_version 68050 (0.0014) [2024-06-15 12:22:35,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.7, 300 sec: 46430.6). Total num frames: 139395072. Throughput: 0: 11639.5. Samples: 34921984. 
Policy #0 lag: (min: 34.0, avg: 115.5, max: 290.0) [2024-06-15 12:22:35,956][1648985] Avg episode reward: [(0, '109.910')] [2024-06-15 12:22:38,262][1652491] Updated weights for policy 0, policy_version 68115 (0.0012) [2024-06-15 12:22:40,155][1652491] Updated weights for policy 0, policy_version 68176 (0.0014) [2024-06-15 12:22:40,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 47513.5, 300 sec: 46208.4). Total num frames: 139689984. Throughput: 0: 11696.3. Samples: 34953216. Policy #0 lag: (min: 34.0, avg: 115.5, max: 290.0) [2024-06-15 12:22:40,956][1648985] Avg episode reward: [(0, '112.140')] [2024-06-15 12:22:41,144][1652491] Updated weights for policy 0, policy_version 68220 (0.0012) [2024-06-15 12:22:45,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 44782.9, 300 sec: 45989.2). Total num frames: 139788288. Throughput: 0: 11537.0. Samples: 35028480. Policy #0 lag: (min: 34.0, avg: 115.5, max: 290.0) [2024-06-15 12:22:45,956][1648985] Avg episode reward: [(0, '118.220')] [2024-06-15 12:22:46,064][1652491] Updated weights for policy 0, policy_version 68272 (0.0012) [2024-06-15 12:22:47,640][1652491] Updated weights for policy 0, policy_version 68320 (0.0012) [2024-06-15 12:22:48,388][1652491] Updated weights for policy 0, policy_version 68352 (0.0016) [2024-06-15 12:22:50,909][1652491] Updated weights for policy 0, policy_version 68403 (0.0010) [2024-06-15 12:22:50,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 47513.5, 300 sec: 45986.3). Total num frames: 140083200. Throughput: 0: 11377.9. Samples: 35090944. Policy #0 lag: (min: 34.0, avg: 115.5, max: 290.0) [2024-06-15 12:22:50,956][1648985] Avg episode reward: [(0, '129.530')] [2024-06-15 12:22:52,229][1652491] Updated weights for policy 0, policy_version 68471 (0.0014) [2024-06-15 12:22:55,962][1648985] Fps is (10 sec: 45842.8, 60 sec: 44777.5, 300 sec: 45763.0). Total num frames: 140247040. Throughput: 0: 11296.3. Samples: 35126784. 
Policy #0 lag: (min: 34.0, avg: 115.5, max: 290.0) [2024-06-15 12:22:55,963][1648985] Avg episode reward: [(0, '126.720')] [2024-06-15 12:22:55,968][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000068480_140247040.pth... [2024-06-15 12:22:56,049][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000063104_129236992.pth [2024-06-15 12:22:57,233][1652491] Updated weights for policy 0, policy_version 68514 (0.0019) [2024-06-15 12:22:59,474][1652491] Updated weights for policy 0, policy_version 68592 (0.0118) [2024-06-15 12:23:00,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46967.5, 300 sec: 46097.4). Total num frames: 140509184. Throughput: 0: 11389.2. Samples: 35197952. Policy #0 lag: (min: 34.0, avg: 115.5, max: 290.0) [2024-06-15 12:23:00,955][1648985] Avg episode reward: [(0, '132.370')] [2024-06-15 12:23:01,094][1652491] Updated weights for policy 0, policy_version 68624 (0.0011) [2024-06-15 12:23:01,237][1651469] Signal inference workers to stop experience collection... (3600 times) [2024-06-15 12:23:01,297][1652491] InferenceWorker_p0-w0: stopping experience collection (3600 times) [2024-06-15 12:23:01,519][1651469] Signal inference workers to resume experience collection... (3600 times) [2024-06-15 12:23:01,520][1652491] InferenceWorker_p0-w0: resuming experience collection (3600 times) [2024-06-15 12:23:03,469][1652491] Updated weights for policy 0, policy_version 68720 (0.0011) [2024-06-15 12:23:05,955][1648985] Fps is (10 sec: 52466.2, 60 sec: 45875.5, 300 sec: 45875.2). Total num frames: 140771328. Throughput: 0: 11195.8. Samples: 35259392. 
Policy #0 lag: (min: 34.0, avg: 115.5, max: 290.0) [2024-06-15 12:23:05,956][1648985] Avg episode reward: [(0, '135.720')] [2024-06-15 12:23:09,052][1652491] Updated weights for policy 0, policy_version 68772 (0.0129) [2024-06-15 12:23:10,831][1652491] Updated weights for policy 0, policy_version 68820 (0.0018) [2024-06-15 12:23:10,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 140935168. Throughput: 0: 11332.3. Samples: 35297792. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 12:23:10,956][1648985] Avg episode reward: [(0, '136.470')] [2024-06-15 12:23:12,933][1652491] Updated weights for policy 0, policy_version 68880 (0.0018) [2024-06-15 12:23:14,278][1652491] Updated weights for policy 0, policy_version 68929 (0.0013) [2024-06-15 12:23:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 141295616. Throughput: 0: 11286.7. Samples: 35363328. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 12:23:15,956][1648985] Avg episode reward: [(0, '140.730')] [2024-06-15 12:23:19,646][1652491] Updated weights for policy 0, policy_version 68996 (0.0014) [2024-06-15 12:23:20,775][1652491] Updated weights for policy 0, policy_version 69050 (0.0077) [2024-06-15 12:23:20,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 45875.5, 300 sec: 46208.4). Total num frames: 141426688. Throughput: 0: 11423.3. Samples: 35436032. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 12:23:20,955][1648985] Avg episode reward: [(0, '153.680')] [2024-06-15 12:23:22,718][1652491] Updated weights for policy 0, policy_version 69104 (0.0013) [2024-06-15 12:23:25,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 46097.8). Total num frames: 141688832. Throughput: 0: 11559.8. Samples: 35473408. 
Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 12:23:25,956][1648985] Avg episode reward: [(0, '141.900')] [2024-06-15 12:23:26,085][1652491] Updated weights for policy 0, policy_version 69200 (0.0109) [2024-06-15 12:23:27,060][1652491] Updated weights for policy 0, policy_version 69248 (0.0012) [2024-06-15 12:23:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 45875.3). Total num frames: 141852672. Throughput: 0: 11434.7. Samples: 35543040. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 12:23:30,956][1648985] Avg episode reward: [(0, '152.020')] [2024-06-15 12:23:31,938][1652491] Updated weights for policy 0, policy_version 69312 (0.0040) [2024-06-15 12:23:34,104][1652491] Updated weights for policy 0, policy_version 69371 (0.0012) [2024-06-15 12:23:35,955][1648985] Fps is (10 sec: 42597.2, 60 sec: 45328.8, 300 sec: 46319.5). Total num frames: 142114816. Throughput: 0: 11628.0. Samples: 35614208. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 12:23:35,956][1648985] Avg episode reward: [(0, '154.210')] [2024-06-15 12:23:36,327][1652491] Updated weights for policy 0, policy_version 69413 (0.0023) [2024-06-15 12:23:38,120][1652491] Updated weights for policy 0, policy_version 69474 (0.0012) [2024-06-15 12:23:40,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 44236.7, 300 sec: 45875.2). Total num frames: 142344192. Throughput: 0: 11379.6. Samples: 35638784. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 12:23:40,956][1648985] Avg episode reward: [(0, '119.080')] [2024-06-15 12:23:42,917][1652491] Updated weights for policy 0, policy_version 69526 (0.0012) [2024-06-15 12:23:45,225][1652491] Updated weights for policy 0, policy_version 69585 (0.0013) [2024-06-15 12:23:45,561][1651469] Signal inference workers to stop experience collection... 
(3650 times) [2024-06-15 12:23:45,631][1652491] InferenceWorker_p0-w0: stopping experience collection (3650 times) [2024-06-15 12:23:45,873][1651469] Signal inference workers to resume experience collection... (3650 times) [2024-06-15 12:23:45,874][1652491] InferenceWorker_p0-w0: resuming experience collection (3650 times) [2024-06-15 12:23:45,955][1648985] Fps is (10 sec: 45876.7, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 142573568. Throughput: 0: 11502.9. Samples: 35715584. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 12:23:45,956][1648985] Avg episode reward: [(0, '111.160')] [2024-06-15 12:23:46,261][1652491] Updated weights for policy 0, policy_version 69632 (0.0017) [2024-06-15 12:23:48,390][1652491] Updated weights for policy 0, policy_version 69683 (0.0102) [2024-06-15 12:23:50,371][1652491] Updated weights for policy 0, policy_version 69757 (0.0086) [2024-06-15 12:23:50,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 46421.4, 300 sec: 46208.5). Total num frames: 142868480. Throughput: 0: 11355.0. Samples: 35770368. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 12:23:50,956][1648985] Avg episode reward: [(0, '135.030')] [2024-06-15 12:23:55,244][1652491] Updated weights for policy 0, policy_version 69824 (0.0090) [2024-06-15 12:23:55,998][1648985] Fps is (10 sec: 42414.9, 60 sec: 45847.6, 300 sec: 46201.7). Total num frames: 142999552. Throughput: 0: 11446.4. Samples: 35813376. Policy #0 lag: (min: 15.0, avg: 106.9, max: 271.0) [2024-06-15 12:23:55,999][1648985] Avg episode reward: [(0, '134.470')] [2024-06-15 12:23:58,104][1652491] Updated weights for policy 0, policy_version 69887 (0.0013) [2024-06-15 12:24:00,472][1652491] Updated weights for policy 0, policy_version 69955 (0.0012) [2024-06-15 12:24:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.3, 300 sec: 46320.0). Total num frames: 143294464. Throughput: 0: 11366.4. Samples: 35874816. 
Policy #0 lag: (min: 15.0, avg: 106.9, max: 271.0) [2024-06-15 12:24:00,956][1648985] Avg episode reward: [(0, '131.830')] [2024-06-15 12:24:01,706][1652491] Updated weights for policy 0, policy_version 70010 (0.0012) [2024-06-15 12:24:05,955][1648985] Fps is (10 sec: 42783.4, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 143425536. Throughput: 0: 11423.3. Samples: 35950080. Policy #0 lag: (min: 15.0, avg: 106.9, max: 271.0) [2024-06-15 12:24:05,956][1648985] Avg episode reward: [(0, '131.270')] [2024-06-15 12:24:06,330][1652491] Updated weights for policy 0, policy_version 70053 (0.0013) [2024-06-15 12:24:09,068][1652491] Updated weights for policy 0, policy_version 70101 (0.0014) [2024-06-15 12:24:10,618][1652491] Updated weights for policy 0, policy_version 70162 (0.0013) [2024-06-15 12:24:10,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 143720448. Throughput: 0: 11377.8. Samples: 35985408. Policy #0 lag: (min: 15.0, avg: 106.9, max: 271.0) [2024-06-15 12:24:10,956][1648985] Avg episode reward: [(0, '140.470')] [2024-06-15 12:24:12,556][1652491] Updated weights for policy 0, policy_version 70240 (0.0012) [2024-06-15 12:24:15,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 43690.7, 300 sec: 46097.3). Total num frames: 143917056. Throughput: 0: 11184.3. Samples: 36046336. Policy #0 lag: (min: 15.0, avg: 106.9, max: 271.0) [2024-06-15 12:24:15,956][1648985] Avg episode reward: [(0, '140.020')] [2024-06-15 12:24:16,573][1652491] Updated weights for policy 0, policy_version 70273 (0.0011) [2024-06-15 12:24:17,761][1652491] Updated weights for policy 0, policy_version 70336 (0.0051) [2024-06-15 12:24:20,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45329.0, 300 sec: 46541.7). Total num frames: 144146432. Throughput: 0: 11366.5. Samples: 36125696. 
Policy #0 lag: (min: 15.0, avg: 106.9, max: 271.0) [2024-06-15 12:24:20,956][1648985] Avg episode reward: [(0, '135.540')] [2024-06-15 12:24:21,094][1652491] Updated weights for policy 0, policy_version 70387 (0.0021) [2024-06-15 12:24:22,604][1652491] Updated weights for policy 0, policy_version 70452 (0.0014) [2024-06-15 12:24:24,289][1652491] Updated weights for policy 0, policy_version 70521 (0.0015) [2024-06-15 12:24:25,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 144441344. Throughput: 0: 11389.2. Samples: 36151296. Policy #0 lag: (min: 15.0, avg: 106.9, max: 271.0) [2024-06-15 12:24:25,956][1648985] Avg episode reward: [(0, '124.660')] [2024-06-15 12:24:28,256][1651469] Signal inference workers to stop experience collection... (3700 times) [2024-06-15 12:24:28,285][1652491] InferenceWorker_p0-w0: stopping experience collection (3700 times) [2024-06-15 12:24:28,422][1651469] Signal inference workers to resume experience collection... (3700 times) [2024-06-15 12:24:28,423][1652491] InferenceWorker_p0-w0: resuming experience collection (3700 times) [2024-06-15 12:24:28,544][1652491] Updated weights for policy 0, policy_version 70583 (0.0013) [2024-06-15 12:24:30,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 144605184. Throughput: 0: 11491.5. Samples: 36232704. Policy #0 lag: (min: 15.0, avg: 106.9, max: 271.0) [2024-06-15 12:24:30,956][1648985] Avg episode reward: [(0, '135.080')] [2024-06-15 12:24:31,746][1652491] Updated weights for policy 0, policy_version 70640 (0.0013) [2024-06-15 12:24:32,445][1652491] Updated weights for policy 0, policy_version 70657 (0.0014) [2024-06-15 12:24:34,284][1652491] Updated weights for policy 0, policy_version 70739 (0.0013) [2024-06-15 12:24:35,131][1652491] Updated weights for policy 0, policy_version 70784 (0.0013) [2024-06-15 12:24:35,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.9, 300 sec: 46549.8). 
Total num frames: 144965632. Throughput: 0: 11662.2. Samples: 36295168. Policy #0 lag: (min: 15.0, avg: 106.9, max: 271.0) [2024-06-15 12:24:35,956][1648985] Avg episode reward: [(0, '136.750')] [2024-06-15 12:24:40,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 145096704. Throughput: 0: 11696.2. Samples: 36339200. Policy #0 lag: (min: 15.0, avg: 114.2, max: 271.0) [2024-06-15 12:24:40,956][1648985] Avg episode reward: [(0, '146.320')] [2024-06-15 12:24:42,188][1652491] Updated weights for policy 0, policy_version 70864 (0.0019) [2024-06-15 12:24:43,458][1652491] Updated weights for policy 0, policy_version 70912 (0.0044) [2024-06-15 12:24:45,232][1652491] Updated weights for policy 0, policy_version 70980 (0.0019) [2024-06-15 12:24:45,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 145424384. Throughput: 0: 11719.1. Samples: 36402176. Policy #0 lag: (min: 15.0, avg: 114.2, max: 271.0) [2024-06-15 12:24:45,956][1648985] Avg episode reward: [(0, '136.840')] [2024-06-15 12:24:46,211][1652491] Updated weights for policy 0, policy_version 71035 (0.0025) [2024-06-15 12:24:50,345][1652491] Updated weights for policy 0, policy_version 71089 (0.0125) [2024-06-15 12:24:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 145620992. Throughput: 0: 11707.7. Samples: 36476928. Policy #0 lag: (min: 15.0, avg: 114.2, max: 271.0) [2024-06-15 12:24:50,956][1648985] Avg episode reward: [(0, '141.840')] [2024-06-15 12:24:53,679][1652491] Updated weights for policy 0, policy_version 71120 (0.0026) [2024-06-15 12:24:54,983][1652491] Updated weights for policy 0, policy_version 71171 (0.0012) [2024-06-15 12:24:55,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 47001.2, 300 sec: 46430.6). Total num frames: 145817600. Throughput: 0: 11832.8. Samples: 36517888. 
Policy #0 lag: (min: 15.0, avg: 114.2, max: 271.0) [2024-06-15 12:24:55,956][1648985] Avg episode reward: [(0, '151.800')] [2024-06-15 12:24:56,386][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000071232_145883136.pth... [2024-06-15 12:24:56,517][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000065824_134807552.pth [2024-06-15 12:24:57,086][1652491] Updated weights for policy 0, policy_version 71253 (0.0013) [2024-06-15 12:24:57,978][1652491] Updated weights for policy 0, policy_version 71296 (0.0013) [2024-06-15 12:25:00,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45875.3, 300 sec: 46319.5). Total num frames: 146046976. Throughput: 0: 11878.4. Samples: 36580864. Policy #0 lag: (min: 15.0, avg: 114.2, max: 271.0) [2024-06-15 12:25:00,955][1648985] Avg episode reward: [(0, '151.740')] [2024-06-15 12:25:01,866][1652491] Updated weights for policy 0, policy_version 71356 (0.0017) [2024-06-15 12:25:05,343][1652491] Updated weights for policy 0, policy_version 71419 (0.0013) [2024-06-15 12:25:05,955][1648985] Fps is (10 sec: 45876.5, 60 sec: 47513.7, 300 sec: 46652.8). Total num frames: 146276352. Throughput: 0: 11821.5. Samples: 36657664. Policy #0 lag: (min: 15.0, avg: 114.2, max: 271.0) [2024-06-15 12:25:05,955][1648985] Avg episode reward: [(0, '135.610')] [2024-06-15 12:25:06,991][1652491] Updated weights for policy 0, policy_version 71472 (0.0012) [2024-06-15 12:25:07,076][1651469] Signal inference workers to stop experience collection... (3750 times) [2024-06-15 12:25:07,133][1652491] InferenceWorker_p0-w0: stopping experience collection (3750 times) [2024-06-15 12:25:07,386][1651469] Signal inference workers to resume experience collection... 
(3750 times) [2024-06-15 12:25:07,388][1652491] InferenceWorker_p0-w0: resuming experience collection (3750 times) [2024-06-15 12:25:08,537][1652491] Updated weights for policy 0, policy_version 71536 (0.0015) [2024-06-15 12:25:10,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 146538496. Throughput: 0: 11798.8. Samples: 36682240. Policy #0 lag: (min: 15.0, avg: 114.2, max: 271.0) [2024-06-15 12:25:10,956][1648985] Avg episode reward: [(0, '121.770')] [2024-06-15 12:25:13,597][1652491] Updated weights for policy 0, policy_version 71600 (0.0018) [2024-06-15 12:25:15,955][1648985] Fps is (10 sec: 39320.4, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 146669568. Throughput: 0: 11662.2. Samples: 36757504. Policy #0 lag: (min: 15.0, avg: 114.2, max: 271.0) [2024-06-15 12:25:15,956][1648985] Avg episode reward: [(0, '106.670')] [2024-06-15 12:25:16,517][1652491] Updated weights for policy 0, policy_version 71648 (0.0040) [2024-06-15 12:25:17,923][1652491] Updated weights for policy 0, policy_version 71698 (0.0029) [2024-06-15 12:25:20,041][1652491] Updated weights for policy 0, policy_version 71777 (0.0122) [2024-06-15 12:25:20,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48605.9, 300 sec: 46541.7). Total num frames: 147062784. Throughput: 0: 11502.9. Samples: 36812800. Policy #0 lag: (min: 15.0, avg: 114.2, max: 271.0) [2024-06-15 12:25:20,956][1648985] Avg episode reward: [(0, '117.860')] [2024-06-15 12:25:25,293][1652491] Updated weights for policy 0, policy_version 71840 (0.0015) [2024-06-15 12:25:25,955][1648985] Fps is (10 sec: 49153.2, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 147161088. Throughput: 0: 11434.7. Samples: 36853760. 
Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 12:25:25,956][1648985] Avg episode reward: [(0, '151.850')] [2024-06-15 12:25:27,980][1652491] Updated weights for policy 0, policy_version 71888 (0.0014) [2024-06-15 12:25:29,981][1652491] Updated weights for policy 0, policy_version 71968 (0.0013) [2024-06-15 12:25:30,956][1648985] Fps is (10 sec: 39320.5, 60 sec: 47513.4, 300 sec: 46208.4). Total num frames: 147456000. Throughput: 0: 11480.1. Samples: 36918784. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 12:25:30,957][1648985] Avg episode reward: [(0, '159.150')] [2024-06-15 12:25:31,916][1652491] Updated weights for policy 0, policy_version 72048 (0.0013) [2024-06-15 12:25:35,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 147587072. Throughput: 0: 11400.5. Samples: 36989952. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 12:25:35,956][1648985] Avg episode reward: [(0, '139.480')] [2024-06-15 12:25:37,238][1652491] Updated weights for policy 0, policy_version 72096 (0.0014) [2024-06-15 12:25:37,999][1652491] Updated weights for policy 0, policy_version 72128 (0.0015) [2024-06-15 12:25:40,706][1652491] Updated weights for policy 0, policy_version 72178 (0.0021) [2024-06-15 12:25:40,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45875.0, 300 sec: 45986.4). Total num frames: 147849216. Throughput: 0: 11309.5. Samples: 37026816. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 12:25:40,956][1648985] Avg episode reward: [(0, '121.140')] [2024-06-15 12:25:42,565][1652491] Updated weights for policy 0, policy_version 72256 (0.0213) [2024-06-15 12:25:43,566][1652491] Updated weights for policy 0, policy_version 72309 (0.0103) [2024-06-15 12:25:45,957][1648985] Fps is (10 sec: 52421.2, 60 sec: 44781.9, 300 sec: 46208.2). Total num frames: 148111360. Throughput: 0: 11252.2. Samples: 37087232. 
Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 12:25:45,958][1648985] Avg episode reward: [(0, '126.700')] [2024-06-15 12:25:48,876][1652491] Updated weights for policy 0, policy_version 72372 (0.0013) [2024-06-15 12:25:50,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 148242432. Throughput: 0: 11320.9. Samples: 37167104. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 12:25:50,956][1648985] Avg episode reward: [(0, '129.820')] [2024-06-15 12:25:51,237][1651469] Signal inference workers to stop experience collection... (3800 times) [2024-06-15 12:25:51,381][1652491] InferenceWorker_p0-w0: stopping experience collection (3800 times) [2024-06-15 12:25:51,404][1652491] Updated weights for policy 0, policy_version 72412 (0.0012) [2024-06-15 12:25:51,480][1651469] Signal inference workers to resume experience collection... (3800 times) [2024-06-15 12:25:51,481][1652491] InferenceWorker_p0-w0: resuming experience collection (3800 times) [2024-06-15 12:25:53,488][1652491] Updated weights for policy 0, policy_version 72496 (0.0026) [2024-06-15 12:25:54,868][1652491] Updated weights for policy 0, policy_version 72560 (0.0083) [2024-06-15 12:25:55,955][1648985] Fps is (10 sec: 52436.3, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 148635648. Throughput: 0: 11355.0. Samples: 37193216. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 12:25:55,956][1648985] Avg episode reward: [(0, '118.330')] [2024-06-15 12:25:59,424][1652491] Updated weights for policy 0, policy_version 72592 (0.0011) [2024-06-15 12:26:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45329.0, 300 sec: 45986.3). Total num frames: 148766720. Throughput: 0: 11298.2. Samples: 37265920. 
Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 12:26:00,956][1648985] Avg episode reward: [(0, '121.320')] [2024-06-15 12:26:02,142][1652491] Updated weights for policy 0, policy_version 72641 (0.0093) [2024-06-15 12:26:04,246][1652491] Updated weights for policy 0, policy_version 72736 (0.0014) [2024-06-15 12:26:05,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 149094400. Throughput: 0: 11389.1. Samples: 37325312. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 12:26:05,956][1648985] Avg episode reward: [(0, '109.080')] [2024-06-15 12:26:06,372][1652491] Updated weights for policy 0, policy_version 72828 (0.0017) [2024-06-15 12:26:10,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 149192704. Throughput: 0: 11343.6. Samples: 37364224. Policy #0 lag: (min: 15.0, avg: 109.9, max: 271.0) [2024-06-15 12:26:10,956][1648985] Avg episode reward: [(0, '120.250')] [2024-06-15 12:26:12,062][1652491] Updated weights for policy 0, policy_version 72892 (0.0012) [2024-06-15 12:26:14,941][1652491] Updated weights for policy 0, policy_version 72935 (0.0015) [2024-06-15 12:26:15,956][1648985] Fps is (10 sec: 36040.7, 60 sec: 46420.6, 300 sec: 45875.0). Total num frames: 149454848. Throughput: 0: 11605.1. Samples: 37441024. Policy #0 lag: (min: 15.0, avg: 109.9, max: 271.0) [2024-06-15 12:26:15,957][1648985] Avg episode reward: [(0, '130.100')] [2024-06-15 12:26:16,392][1652491] Updated weights for policy 0, policy_version 72993 (0.0014) [2024-06-15 12:26:17,680][1652491] Updated weights for policy 0, policy_version 73056 (0.0017) [2024-06-15 12:26:20,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 46097.4). Total num frames: 149684224. Throughput: 0: 11548.5. Samples: 37509632. 
Policy #0 lag: (min: 15.0, avg: 109.9, max: 271.0) [2024-06-15 12:26:20,956][1648985] Avg episode reward: [(0, '128.250')] [2024-06-15 12:26:21,728][1652491] Updated weights for policy 0, policy_version 73109 (0.0019) [2024-06-15 12:26:24,898][1652491] Updated weights for policy 0, policy_version 73154 (0.0016) [2024-06-15 12:26:25,955][1648985] Fps is (10 sec: 42603.2, 60 sec: 45329.0, 300 sec: 45542.0). Total num frames: 149880832. Throughput: 0: 11514.4. Samples: 37544960. Policy #0 lag: (min: 15.0, avg: 109.9, max: 271.0) [2024-06-15 12:26:25,956][1648985] Avg episode reward: [(0, '152.430')] [2024-06-15 12:26:26,231][1652491] Updated weights for policy 0, policy_version 73205 (0.0011) [2024-06-15 12:26:28,032][1652491] Updated weights for policy 0, policy_version 73280 (0.0012) [2024-06-15 12:26:28,551][1651469] Signal inference workers to stop experience collection... (3850 times) [2024-06-15 12:26:28,626][1652491] InferenceWorker_p0-w0: stopping experience collection (3850 times) [2024-06-15 12:26:28,764][1651469] Signal inference workers to resume experience collection... (3850 times) [2024-06-15 12:26:28,765][1652491] InferenceWorker_p0-w0: resuming experience collection (3850 times) [2024-06-15 12:26:29,221][1652491] Updated weights for policy 0, policy_version 73331 (0.0010) [2024-06-15 12:26:30,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 150208512. Throughput: 0: 11662.5. Samples: 37612032. Policy #0 lag: (min: 15.0, avg: 109.9, max: 271.0) [2024-06-15 12:26:30,956][1648985] Avg episode reward: [(0, '141.550')] [2024-06-15 12:26:32,779][1652491] Updated weights for policy 0, policy_version 73377 (0.0012) [2024-06-15 12:26:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 150372352. Throughput: 0: 11616.7. Samples: 37689856. 
Policy #0 lag: (min: 15.0, avg: 109.9, max: 271.0) [2024-06-15 12:26:35,956][1648985] Avg episode reward: [(0, '128.780')] [2024-06-15 12:26:36,004][1652491] Updated weights for policy 0, policy_version 73426 (0.0033) [2024-06-15 12:26:36,973][1652491] Updated weights for policy 0, policy_version 73467 (0.0013) [2024-06-15 12:26:38,261][1652491] Updated weights for policy 0, policy_version 73507 (0.0014) [2024-06-15 12:26:39,436][1652491] Updated weights for policy 0, policy_version 73557 (0.0013) [2024-06-15 12:26:40,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 150732800. Throughput: 0: 11662.2. Samples: 37718016. Policy #0 lag: (min: 15.0, avg: 109.9, max: 271.0) [2024-06-15 12:26:40,956][1648985] Avg episode reward: [(0, '121.500')] [2024-06-15 12:26:43,198][1652491] Updated weights for policy 0, policy_version 73616 (0.0117) [2024-06-15 12:26:45,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 45876.2, 300 sec: 46208.4). Total num frames: 150863872. Throughput: 0: 11707.7. Samples: 37792768. Policy #0 lag: (min: 15.0, avg: 109.9, max: 271.0) [2024-06-15 12:26:45,956][1648985] Avg episode reward: [(0, '127.060')] [2024-06-15 12:26:47,109][1652491] Updated weights for policy 0, policy_version 73680 (0.0095) [2024-06-15 12:26:48,803][1652491] Updated weights for policy 0, policy_version 73730 (0.0012) [2024-06-15 12:26:50,159][1652491] Updated weights for policy 0, policy_version 73808 (0.0013) [2024-06-15 12:26:50,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 49698.0, 300 sec: 46319.5). Total num frames: 151224320. Throughput: 0: 11832.9. Samples: 37857792. 
Policy #0 lag: (min: 15.0, avg: 109.9, max: 271.0) [2024-06-15 12:26:50,956][1648985] Avg episode reward: [(0, '133.230')] [2024-06-15 12:26:51,104][1652491] Updated weights for policy 0, policy_version 73856 (0.0020) [2024-06-15 12:26:54,642][1652491] Updated weights for policy 0, policy_version 73910 (0.0016) [2024-06-15 12:26:55,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 151388160. Throughput: 0: 11946.7. Samples: 37901824. Policy #0 lag: (min: 15.0, avg: 109.9, max: 271.0) [2024-06-15 12:26:55,956][1648985] Avg episode reward: [(0, '147.760')] [2024-06-15 12:26:56,013][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000073920_151388160.pth... [2024-06-15 12:26:56,068][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000068480_140247040.pth [2024-06-15 12:26:58,002][1652491] Updated weights for policy 0, policy_version 73953 (0.0014) [2024-06-15 12:26:59,579][1652491] Updated weights for policy 0, policy_version 74000 (0.0027) [2024-06-15 12:27:00,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 48059.7, 300 sec: 46208.5). Total num frames: 151650304. Throughput: 0: 11730.8. Samples: 37968896. Policy #0 lag: (min: 9.0, avg: 96.3, max: 265.0) [2024-06-15 12:27:00,956][1648985] Avg episode reward: [(0, '145.570')] [2024-06-15 12:27:01,202][1652491] Updated weights for policy 0, policy_version 74064 (0.0082) [2024-06-15 12:27:05,163][1652491] Updated weights for policy 0, policy_version 74128 (0.0012) [2024-06-15 12:27:05,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46421.4, 300 sec: 46430.6). Total num frames: 151879680. Throughput: 0: 11787.4. Samples: 38040064. 
Policy #0 lag: (min: 9.0, avg: 96.3, max: 265.0) [2024-06-15 12:27:05,956][1648985] Avg episode reward: [(0, '142.170')] [2024-06-15 12:27:09,202][1652491] Updated weights for policy 0, policy_version 74193 (0.0016) [2024-06-15 12:27:10,904][1652491] Updated weights for policy 0, policy_version 74242 (0.0012) [2024-06-15 12:27:10,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 47513.6, 300 sec: 45875.2). Total num frames: 152043520. Throughput: 0: 11889.8. Samples: 38080000. Policy #0 lag: (min: 9.0, avg: 96.3, max: 265.0) [2024-06-15 12:27:10,956][1648985] Avg episode reward: [(0, '126.760')] [2024-06-15 12:27:12,061][1651469] Signal inference workers to stop experience collection... (3900 times) [2024-06-15 12:27:12,115][1652491] InferenceWorker_p0-w0: stopping experience collection (3900 times) [2024-06-15 12:27:12,249][1651469] Signal inference workers to resume experience collection... (3900 times) [2024-06-15 12:27:12,250][1652491] InferenceWorker_p0-w0: resuming experience collection (3900 times) [2024-06-15 12:27:12,904][1652491] Updated weights for policy 0, policy_version 74336 (0.0014) [2024-06-15 12:27:15,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 47514.5, 300 sec: 46208.5). Total num frames: 152305664. Throughput: 0: 11821.6. Samples: 38144000. Policy #0 lag: (min: 9.0, avg: 96.3, max: 265.0) [2024-06-15 12:27:15,956][1648985] Avg episode reward: [(0, '119.900')] [2024-06-15 12:27:16,362][1652491] Updated weights for policy 0, policy_version 74384 (0.0037) [2024-06-15 12:27:17,445][1652491] Updated weights for policy 0, policy_version 74430 (0.0012) [2024-06-15 12:27:20,920][1652491] Updated weights for policy 0, policy_version 74488 (0.0014) [2024-06-15 12:27:20,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 46097.4). Total num frames: 152535040. Throughput: 0: 11639.5. Samples: 38213632. 
Policy #0 lag: (min: 9.0, avg: 96.3, max: 265.0) [2024-06-15 12:27:20,956][1648985] Avg episode reward: [(0, '137.760')] [2024-06-15 12:27:22,932][1652491] Updated weights for policy 0, policy_version 74534 (0.0083) [2024-06-15 12:27:24,786][1652491] Updated weights for policy 0, policy_version 74615 (0.0012) [2024-06-15 12:27:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 49152.1, 300 sec: 46208.4). Total num frames: 152829952. Throughput: 0: 11719.2. Samples: 38245376. Policy #0 lag: (min: 9.0, avg: 96.3, max: 265.0) [2024-06-15 12:27:25,955][1648985] Avg episode reward: [(0, '132.430')] [2024-06-15 12:27:28,023][1652491] Updated weights for policy 0, policy_version 74656 (0.0016) [2024-06-15 12:27:28,887][1652491] Updated weights for policy 0, policy_version 74688 (0.0012) [2024-06-15 12:27:30,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.4, 300 sec: 45986.3). Total num frames: 152961024. Throughput: 0: 11753.3. Samples: 38321664. Policy #0 lag: (min: 9.0, avg: 96.3, max: 265.0) [2024-06-15 12:27:30,956][1648985] Avg episode reward: [(0, '117.280')] [2024-06-15 12:27:32,420][1652491] Updated weights for policy 0, policy_version 74746 (0.0040) [2024-06-15 12:27:35,120][1652491] Updated weights for policy 0, policy_version 74818 (0.0012) [2024-06-15 12:27:35,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48605.9, 300 sec: 46097.4). Total num frames: 153288704. Throughput: 0: 11730.5. Samples: 38385664. Policy #0 lag: (min: 9.0, avg: 96.3, max: 265.0) [2024-06-15 12:27:35,956][1648985] Avg episode reward: [(0, '111.370')] [2024-06-15 12:27:36,400][1652491] Updated weights for policy 0, policy_version 74879 (0.0013) [2024-06-15 12:27:39,261][1652491] Updated weights for policy 0, policy_version 74940 (0.0013) [2024-06-15 12:27:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 153485312. Throughput: 0: 11605.3. Samples: 38424064. 
Policy #0 lag: (min: 9.0, avg: 96.3, max: 265.0) [2024-06-15 12:27:40,956][1648985] Avg episode reward: [(0, '114.250')] [2024-06-15 12:27:42,866][1652491] Updated weights for policy 0, policy_version 74980 (0.0011) [2024-06-15 12:27:45,146][1652491] Updated weights for policy 0, policy_version 75042 (0.0013) [2024-06-15 12:27:45,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48606.1, 300 sec: 46430.6). Total num frames: 153780224. Throughput: 0: 11821.6. Samples: 38500864. Policy #0 lag: (min: 47.0, avg: 133.8, max: 303.0) [2024-06-15 12:27:45,956][1648985] Avg episode reward: [(0, '133.770')] [2024-06-15 12:27:46,074][1652491] Updated weights for policy 0, policy_version 75092 (0.0013) [2024-06-15 12:27:46,839][1652491] Updated weights for policy 0, policy_version 75136 (0.0012) [2024-06-15 12:27:50,017][1652491] Updated weights for policy 0, policy_version 75194 (0.0014) [2024-06-15 12:27:50,974][1648985] Fps is (10 sec: 52328.8, 60 sec: 46406.7, 300 sec: 46650.9). Total num frames: 154009600. Throughput: 0: 11827.9. Samples: 38572544. Policy #0 lag: (min: 47.0, avg: 133.8, max: 303.0) [2024-06-15 12:27:50,975][1648985] Avg episode reward: [(0, '131.740')] [2024-06-15 12:27:54,149][1652491] Updated weights for policy 0, policy_version 75233 (0.0027) [2024-06-15 12:27:55,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 154173440. Throughput: 0: 11810.1. Samples: 38611456. Policy #0 lag: (min: 47.0, avg: 133.8, max: 303.0) [2024-06-15 12:27:55,956][1648985] Avg episode reward: [(0, '109.920')] [2024-06-15 12:27:56,091][1652491] Updated weights for policy 0, policy_version 75292 (0.0013) [2024-06-15 12:27:56,282][1651469] Signal inference workers to stop experience collection... (3950 times) [2024-06-15 12:27:56,326][1652491] InferenceWorker_p0-w0: stopping experience collection (3950 times) [2024-06-15 12:27:56,544][1651469] Signal inference workers to resume experience collection... 
(3950 times) [2024-06-15 12:27:56,546][1652491] InferenceWorker_p0-w0: resuming experience collection (3950 times) [2024-06-15 12:27:57,806][1652491] Updated weights for policy 0, policy_version 75363 (0.0013) [2024-06-15 12:28:00,413][1652491] Updated weights for policy 0, policy_version 75427 (0.0015) [2024-06-15 12:28:00,955][1648985] Fps is (10 sec: 52529.7, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 154533888. Throughput: 0: 11844.3. Samples: 38676992. Policy #0 lag: (min: 47.0, avg: 133.8, max: 303.0) [2024-06-15 12:28:00,955][1648985] Avg episode reward: [(0, '111.640')] [2024-06-15 12:28:04,848][1652491] Updated weights for policy 0, policy_version 75474 (0.0016) [2024-06-15 12:28:05,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 154664960. Throughput: 0: 11923.9. Samples: 38750208. Policy #0 lag: (min: 47.0, avg: 133.8, max: 303.0) [2024-06-15 12:28:05,956][1648985] Avg episode reward: [(0, '126.650')] [2024-06-15 12:28:06,906][1652491] Updated weights for policy 0, policy_version 75536 (0.0013) [2024-06-15 12:28:08,157][1652491] Updated weights for policy 0, policy_version 75593 (0.0013) [2024-06-15 12:28:09,388][1652491] Updated weights for policy 0, policy_version 75639 (0.0012) [2024-06-15 12:28:10,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48605.8, 300 sec: 46319.5). Total num frames: 154959872. Throughput: 0: 11901.1. Samples: 38780928. Policy #0 lag: (min: 47.0, avg: 133.8, max: 303.0) [2024-06-15 12:28:10,956][1648985] Avg episode reward: [(0, '121.120')] [2024-06-15 12:28:11,510][1652491] Updated weights for policy 0, policy_version 75685 (0.0014) [2024-06-15 12:28:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 155090944. Throughput: 0: 11969.4. Samples: 38860288. 
Policy #0 lag: (min: 47.0, avg: 133.8, max: 303.0) [2024-06-15 12:28:15,955][1648985] Avg episode reward: [(0, '107.010')] [2024-06-15 12:28:16,092][1652491] Updated weights for policy 0, policy_version 75744 (0.0024) [2024-06-15 12:28:18,020][1652491] Updated weights for policy 0, policy_version 75795 (0.0014) [2024-06-15 12:28:19,460][1652491] Updated weights for policy 0, policy_version 75860 (0.0013) [2024-06-15 12:28:20,416][1652491] Updated weights for policy 0, policy_version 75903 (0.0013) [2024-06-15 12:28:20,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 48605.9, 300 sec: 46652.8). Total num frames: 155451392. Throughput: 0: 12026.3. Samples: 38926848. Policy #0 lag: (min: 47.0, avg: 133.8, max: 303.0) [2024-06-15 12:28:20,956][1648985] Avg episode reward: [(0, '110.730')] [2024-06-15 12:28:22,552][1652491] Updated weights for policy 0, policy_version 75968 (0.0018) [2024-06-15 12:28:25,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 45875.1, 300 sec: 46541.6). Total num frames: 155582464. Throughput: 0: 11946.7. Samples: 38961664. Policy #0 lag: (min: 47.0, avg: 133.8, max: 303.0) [2024-06-15 12:28:25,956][1648985] Avg episode reward: [(0, '119.700')] [2024-06-15 12:28:27,349][1652491] Updated weights for policy 0, policy_version 76026 (0.0025) [2024-06-15 12:28:29,204][1652491] Updated weights for policy 0, policy_version 76096 (0.0012) [2024-06-15 12:28:30,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 49698.0, 300 sec: 46874.9). Total num frames: 155942912. Throughput: 0: 11889.7. Samples: 39035904. Policy #0 lag: (min: 8.0, avg: 148.5, max: 312.0) [2024-06-15 12:28:30,956][1648985] Avg episode reward: [(0, '125.550')] [2024-06-15 12:28:31,095][1652491] Updated weights for policy 0, policy_version 76154 (0.0129) [2024-06-15 12:28:34,179][1652491] Updated weights for policy 0, policy_version 76210 (0.0013) [2024-06-15 12:28:35,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 46967.2, 300 sec: 46652.7). Total num frames: 156106752. Throughput: 0: 11826.5. 
Samples: 39104512. Policy #0 lag: (min: 8.0, avg: 148.5, max: 312.0) [2024-06-15 12:28:35,957][1648985] Avg episode reward: [(0, '121.380')] [2024-06-15 12:28:38,398][1652491] Updated weights for policy 0, policy_version 76256 (0.0123) [2024-06-15 12:28:39,989][1651469] Signal inference workers to stop experience collection... (4000 times) [2024-06-15 12:28:40,042][1652491] InferenceWorker_p0-w0: stopping experience collection (4000 times) [2024-06-15 12:28:40,161][1651469] Signal inference workers to resume experience collection... (4000 times) [2024-06-15 12:28:40,173][1652491] InferenceWorker_p0-w0: resuming experience collection (4000 times) [2024-06-15 12:28:40,175][1652491] Updated weights for policy 0, policy_version 76320 (0.0033) [2024-06-15 12:28:40,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 156368896. Throughput: 0: 11707.7. Samples: 39138304. Policy #0 lag: (min: 8.0, avg: 148.5, max: 312.0) [2024-06-15 12:28:40,956][1648985] Avg episode reward: [(0, '118.370')] [2024-06-15 12:28:41,185][1652491] Updated weights for policy 0, policy_version 76355 (0.0021) [2024-06-15 12:28:44,163][1652491] Updated weights for policy 0, policy_version 76419 (0.0013) [2024-06-15 12:28:45,469][1652491] Updated weights for policy 0, policy_version 76477 (0.0019) [2024-06-15 12:28:45,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 156631040. Throughput: 0: 11719.1. Samples: 39204352. Policy #0 lag: (min: 8.0, avg: 148.5, max: 312.0) [2024-06-15 12:28:45,956][1648985] Avg episode reward: [(0, '123.640')] [2024-06-15 12:28:49,859][1652491] Updated weights for policy 0, policy_version 76538 (0.0013) [2024-06-15 12:28:50,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45889.8, 300 sec: 46659.6). Total num frames: 156762112. Throughput: 0: 11673.6. Samples: 39275520. 
Policy #0 lag: (min: 8.0, avg: 148.5, max: 312.0) [2024-06-15 12:28:50,956][1648985] Avg episode reward: [(0, '100.740')] [2024-06-15 12:28:52,570][1652491] Updated weights for policy 0, policy_version 76603 (0.0014) [2024-06-15 12:28:54,159][1652491] Updated weights for policy 0, policy_version 76656 (0.0018) [2024-06-15 12:28:55,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 48605.7, 300 sec: 46763.8). Total num frames: 157089792. Throughput: 0: 11696.3. Samples: 39307264. Policy #0 lag: (min: 8.0, avg: 148.5, max: 312.0) [2024-06-15 12:28:55,956][1648985] Avg episode reward: [(0, '100.660')] [2024-06-15 12:28:56,041][1652491] Updated weights for policy 0, policy_version 76707 (0.0015) [2024-06-15 12:28:56,186][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000076720_157122560.pth... [2024-06-15 12:28:56,237][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000071232_145883136.pth [2024-06-15 12:28:59,668][1652491] Updated weights for policy 0, policy_version 76755 (0.0014) [2024-06-15 12:29:00,571][1652491] Updated weights for policy 0, policy_version 76800 (0.0016) [2024-06-15 12:29:00,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 157286400. Throughput: 0: 11650.8. Samples: 39384576. Policy #0 lag: (min: 8.0, avg: 148.5, max: 312.0) [2024-06-15 12:29:00,956][1648985] Avg episode reward: [(0, '117.450')] [2024-06-15 12:29:04,042][1652491] Updated weights for policy 0, policy_version 76859 (0.0014) [2024-06-15 12:29:05,724][1652491] Updated weights for policy 0, policy_version 76915 (0.0014) [2024-06-15 12:29:05,967][1648985] Fps is (10 sec: 45819.7, 60 sec: 48049.9, 300 sec: 46872.9). Total num frames: 157548544. Throughput: 0: 11738.6. Samples: 39455232. 
Policy #0 lag: (min: 8.0, avg: 148.5, max: 312.0) [2024-06-15 12:29:05,968][1648985] Avg episode reward: [(0, '125.350')] [2024-06-15 12:29:07,317][1652491] Updated weights for policy 0, policy_version 76984 (0.0011) [2024-06-15 12:29:10,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 46421.2, 300 sec: 46874.9). Total num frames: 157745152. Throughput: 0: 11639.4. Samples: 39485440. Policy #0 lag: (min: 8.0, avg: 148.5, max: 312.0) [2024-06-15 12:29:10,956][1648985] Avg episode reward: [(0, '126.580')] [2024-06-15 12:29:11,533][1652491] Updated weights for policy 0, policy_version 77056 (0.0013) [2024-06-15 12:29:15,375][1652491] Updated weights for policy 0, policy_version 77105 (0.0014) [2024-06-15 12:29:15,955][1648985] Fps is (10 sec: 39370.0, 60 sec: 47513.5, 300 sec: 46763.8). Total num frames: 157941760. Throughput: 0: 11628.1. Samples: 39559168. Policy #0 lag: (min: 8.0, avg: 148.5, max: 312.0) [2024-06-15 12:29:15,956][1648985] Avg episode reward: [(0, '126.270')] [2024-06-15 12:29:16,905][1652491] Updated weights for policy 0, policy_version 77152 (0.0013) [2024-06-15 12:29:18,961][1652491] Updated weights for policy 0, policy_version 77232 (0.0011) [2024-06-15 12:29:20,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 158203904. Throughput: 0: 11468.8. Samples: 39620608. Policy #0 lag: (min: 32.0, avg: 138.0, max: 288.0) [2024-06-15 12:29:20,956][1648985] Avg episode reward: [(0, '126.340')] [2024-06-15 12:29:22,252][1652491] Updated weights for policy 0, policy_version 77280 (0.0017) [2024-06-15 12:29:25,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 45875.1, 300 sec: 46541.7). Total num frames: 158334976. Throughput: 0: 11480.1. Samples: 39654912. Policy #0 lag: (min: 32.0, avg: 138.0, max: 288.0) [2024-06-15 12:29:25,956][1648985] Avg episode reward: [(0, '133.330')] [2024-06-15 12:29:26,451][1651469] Signal inference workers to stop experience collection... 
(4050 times) [2024-06-15 12:29:26,508][1652491] InferenceWorker_p0-w0: stopping experience collection (4050 times) [2024-06-15 12:29:26,713][1651469] Signal inference workers to resume experience collection... (4050 times) [2024-06-15 12:29:26,715][1652491] InferenceWorker_p0-w0: resuming experience collection (4050 times) [2024-06-15 12:29:26,718][1652491] Updated weights for policy 0, policy_version 77344 (0.0017) [2024-06-15 12:29:28,570][1652491] Updated weights for policy 0, policy_version 77409 (0.0013) [2024-06-15 12:29:29,744][1652491] Updated weights for policy 0, policy_version 77456 (0.0036) [2024-06-15 12:29:30,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45875.3, 300 sec: 46541.7). Total num frames: 158695424. Throughput: 0: 11559.8. Samples: 39724544. Policy #0 lag: (min: 32.0, avg: 138.0, max: 288.0) [2024-06-15 12:29:30,956][1648985] Avg episode reward: [(0, '129.720')] [2024-06-15 12:29:33,523][1652491] Updated weights for policy 0, policy_version 77506 (0.0013) [2024-06-15 12:29:34,816][1652491] Updated weights for policy 0, policy_version 77565 (0.0012) [2024-06-15 12:29:35,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 158859264. Throughput: 0: 11537.1. Samples: 39794688. Policy #0 lag: (min: 32.0, avg: 138.0, max: 288.0) [2024-06-15 12:29:35,956][1648985] Avg episode reward: [(0, '147.230')] [2024-06-15 12:29:38,406][1652491] Updated weights for policy 0, policy_version 77601 (0.0013) [2024-06-15 12:29:40,305][1652491] Updated weights for policy 0, policy_version 77680 (0.0011) [2024-06-15 12:29:40,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 159121408. Throughput: 0: 11628.1. Samples: 39830528. 
Policy #0 lag: (min: 32.0, avg: 138.0, max: 288.0) [2024-06-15 12:29:40,956][1648985] Avg episode reward: [(0, '146.000')] [2024-06-15 12:29:42,029][1652491] Updated weights for policy 0, policy_version 77750 (0.0032) [2024-06-15 12:29:45,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 44236.7, 300 sec: 46319.5). Total num frames: 159285248. Throughput: 0: 11320.9. Samples: 39894016. Policy #0 lag: (min: 32.0, avg: 138.0, max: 288.0) [2024-06-15 12:29:45,956][1648985] Avg episode reward: [(0, '120.580')] [2024-06-15 12:29:46,581][1652491] Updated weights for policy 0, policy_version 77808 (0.0013) [2024-06-15 12:29:49,882][1652491] Updated weights for policy 0, policy_version 77872 (0.0085) [2024-06-15 12:29:50,243][1652491] Updated weights for policy 0, policy_version 77886 (0.0010) [2024-06-15 12:29:50,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 159514624. Throughput: 0: 11358.1. Samples: 39966208. Policy #0 lag: (min: 32.0, avg: 138.0, max: 288.0) [2024-06-15 12:29:50,955][1648985] Avg episode reward: [(0, '125.630')] [2024-06-15 12:29:51,885][1652491] Updated weights for policy 0, policy_version 77947 (0.0012) [2024-06-15 12:29:53,033][1652491] Updated weights for policy 0, policy_version 77988 (0.0011) [2024-06-15 12:29:55,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 44783.1, 300 sec: 46541.7). Total num frames: 159776768. Throughput: 0: 11389.2. Samples: 39997952. Policy #0 lag: (min: 32.0, avg: 138.0, max: 288.0) [2024-06-15 12:29:55,956][1648985] Avg episode reward: [(0, '129.640')] [2024-06-15 12:29:56,862][1652491] Updated weights for policy 0, policy_version 78033 (0.0016) [2024-06-15 12:29:58,076][1652491] Updated weights for policy 0, policy_version 78080 (0.0019) [2024-06-15 12:30:00,956][1648985] Fps is (10 sec: 45873.1, 60 sec: 44782.6, 300 sec: 46430.5). Total num frames: 159973376. Throughput: 0: 11286.7. Samples: 40067072. 
Policy #0 lag: (min: 32.0, avg: 138.0, max: 288.0) [2024-06-15 12:30:00,956][1648985] Avg episode reward: [(0, '103.500')] [2024-06-15 12:30:01,625][1652491] Updated weights for policy 0, policy_version 78136 (0.0026) [2024-06-15 12:30:03,126][1652491] Updated weights for policy 0, policy_version 78176 (0.0011) [2024-06-15 12:30:05,148][1652491] Updated weights for policy 0, policy_version 78256 (0.0013) [2024-06-15 12:30:05,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45884.6, 300 sec: 46652.7). Total num frames: 160301056. Throughput: 0: 11355.0. Samples: 40131584. Policy #0 lag: (min: 76.0, avg: 180.3, max: 287.0) [2024-06-15 12:30:05,956][1648985] Avg episode reward: [(0, '107.230')] [2024-06-15 12:30:07,940][1651469] Signal inference workers to stop experience collection... (4100 times) [2024-06-15 12:30:08,009][1652491] InferenceWorker_p0-w0: stopping experience collection (4100 times) [2024-06-15 12:30:08,096][1651469] Signal inference workers to resume experience collection... (4100 times) [2024-06-15 12:30:08,102][1652491] InferenceWorker_p0-w0: resuming experience collection (4100 times) [2024-06-15 12:30:08,329][1652491] Updated weights for policy 0, policy_version 78307 (0.0012) [2024-06-15 12:30:10,955][1648985] Fps is (10 sec: 45877.0, 60 sec: 44783.1, 300 sec: 46652.8). Total num frames: 160432128. Throughput: 0: 11411.9. Samples: 40168448. Policy #0 lag: (min: 76.0, avg: 180.3, max: 287.0) [2024-06-15 12:30:10,956][1648985] Avg episode reward: [(0, '115.000')] [2024-06-15 12:30:11,615][1652491] Updated weights for policy 0, policy_version 78342 (0.0011) [2024-06-15 12:30:12,531][1652491] Updated weights for policy 0, policy_version 78398 (0.0026) [2024-06-15 12:30:15,609][1652491] Updated weights for policy 0, policy_version 78465 (0.0013) [2024-06-15 12:30:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 160727040. Throughput: 0: 11457.4. Samples: 40240128. 
Policy #0 lag: (min: 76.0, avg: 180.3, max: 287.0) [2024-06-15 12:30:15,956][1648985] Avg episode reward: [(0, '127.890')] [2024-06-15 12:30:16,837][1652491] Updated weights for policy 0, policy_version 78520 (0.0010) [2024-06-15 12:30:19,789][1652491] Updated weights for policy 0, policy_version 78584 (0.0012) [2024-06-15 12:30:20,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 160956416. Throughput: 0: 11434.6. Samples: 40309248. Policy #0 lag: (min: 76.0, avg: 180.3, max: 287.0) [2024-06-15 12:30:20,956][1648985] Avg episode reward: [(0, '132.860')] [2024-06-15 12:30:23,524][1652491] Updated weights for policy 0, policy_version 78610 (0.0112) [2024-06-15 12:30:24,317][1652491] Updated weights for policy 0, policy_version 78656 (0.0013) [2024-06-15 12:30:25,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46421.5, 300 sec: 46319.6). Total num frames: 161120256. Throughput: 0: 11434.7. Samples: 40345088. Policy #0 lag: (min: 76.0, avg: 180.3, max: 287.0) [2024-06-15 12:30:25,955][1648985] Avg episode reward: [(0, '117.220')] [2024-06-15 12:30:26,778][1652491] Updated weights for policy 0, policy_version 78720 (0.0013) [2024-06-15 12:30:27,992][1652491] Updated weights for policy 0, policy_version 78764 (0.0011) [2024-06-15 12:30:29,660][1652491] Updated weights for policy 0, policy_version 78800 (0.0015) [2024-06-15 12:30:30,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 161480704. Throughput: 0: 11650.9. Samples: 40418304. Policy #0 lag: (min: 76.0, avg: 180.3, max: 287.0) [2024-06-15 12:30:30,956][1648985] Avg episode reward: [(0, '102.910')] [2024-06-15 12:30:34,509][1652491] Updated weights for policy 0, policy_version 78864 (0.0013) [2024-06-15 12:30:35,580][1652491] Updated weights for policy 0, policy_version 78907 (0.0013) [2024-06-15 12:30:35,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 161611776. Throughput: 0: 11662.2. 
Samples: 40491008. Policy #0 lag: (min: 76.0, avg: 180.3, max: 287.0) [2024-06-15 12:30:35,956][1648985] Avg episode reward: [(0, '113.410')] [2024-06-15 12:30:37,399][1652491] Updated weights for policy 0, policy_version 78946 (0.0013) [2024-06-15 12:30:39,458][1652491] Updated weights for policy 0, policy_version 79031 (0.0012) [2024-06-15 12:30:40,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.5, 300 sec: 46764.1). Total num frames: 161906688. Throughput: 0: 11571.2. Samples: 40518656. Policy #0 lag: (min: 76.0, avg: 180.3, max: 287.0) [2024-06-15 12:30:40,955][1648985] Avg episode reward: [(0, '122.040')] [2024-06-15 12:30:41,140][1652491] Updated weights for policy 0, policy_version 79073 (0.0012) [2024-06-15 12:30:45,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45329.2, 300 sec: 46652.8). Total num frames: 162004992. Throughput: 0: 11616.8. Samples: 40589824. Policy #0 lag: (min: 76.0, avg: 180.3, max: 287.0) [2024-06-15 12:30:45,955][1648985] Avg episode reward: [(0, '117.750')] [2024-06-15 12:30:46,257][1652491] Updated weights for policy 0, policy_version 79106 (0.0024) [2024-06-15 12:30:47,446][1652491] Updated weights for policy 0, policy_version 79166 (0.0013) [2024-06-15 12:30:49,436][1652491] Updated weights for policy 0, policy_version 79216 (0.0022) [2024-06-15 12:30:50,814][1651469] Signal inference workers to stop experience collection... (4150 times) [2024-06-15 12:30:50,846][1652491] InferenceWorker_p0-w0: stopping experience collection (4150 times) [2024-06-15 12:30:50,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 162332672. Throughput: 0: 11571.2. Samples: 40652288. Policy #0 lag: (min: 76.0, avg: 180.3, max: 287.0) [2024-06-15 12:30:50,956][1648985] Avg episode reward: [(0, '113.400')] [2024-06-15 12:30:51,042][1651469] Signal inference workers to resume experience collection... 
(4150 times) [2024-06-15 12:30:51,043][1652491] InferenceWorker_p0-w0: resuming experience collection (4150 times) [2024-06-15 12:30:51,213][1652491] Updated weights for policy 0, policy_version 79289 (0.0014) [2024-06-15 12:30:52,665][1652491] Updated weights for policy 0, policy_version 79328 (0.0012) [2024-06-15 12:30:55,955][1648985] Fps is (10 sec: 52426.5, 60 sec: 45874.9, 300 sec: 46652.7). Total num frames: 162529280. Throughput: 0: 11434.6. Samples: 40683008. Policy #0 lag: (min: 47.0, avg: 204.9, max: 303.0) [2024-06-15 12:30:55,956][1648985] Avg episode reward: [(0, '128.350')] [2024-06-15 12:30:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000079360_162529280.pth... [2024-06-15 12:30:56,029][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000073920_151388160.pth [2024-06-15 12:30:56,035][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000079360_162529280.pth [2024-06-15 12:30:59,116][1652491] Updated weights for policy 0, policy_version 79414 (0.0015) [2024-06-15 12:31:00,829][1652491] Updated weights for policy 0, policy_version 79456 (0.0014) [2024-06-15 12:31:00,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 45875.4, 300 sec: 46208.4). Total num frames: 162725888. Throughput: 0: 11502.9. Samples: 40757760. Policy #0 lag: (min: 47.0, avg: 204.9, max: 303.0) [2024-06-15 12:31:00,956][1648985] Avg episode reward: [(0, '122.960')] [2024-06-15 12:31:02,624][1652491] Updated weights for policy 0, policy_version 79520 (0.0012) [2024-06-15 12:31:04,854][1652491] Updated weights for policy 0, policy_version 79600 (0.0013) [2024-06-15 12:31:05,955][1648985] Fps is (10 sec: 52430.5, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 163053568. Throughput: 0: 11252.6. Samples: 40815616. 
Policy #0 lag: (min: 47.0, avg: 204.9, max: 303.0) [2024-06-15 12:31:05,956][1648985] Avg episode reward: [(0, '120.380')] [2024-06-15 12:31:10,685][1652491] Updated weights for policy 0, policy_version 79664 (0.0013) [2024-06-15 12:31:10,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 45875.2, 300 sec: 46541.9). Total num frames: 163184640. Throughput: 0: 11343.6. Samples: 40855552. Policy #0 lag: (min: 47.0, avg: 204.9, max: 303.0) [2024-06-15 12:31:10,956][1648985] Avg episode reward: [(0, '125.150')] [2024-06-15 12:31:13,611][1652491] Updated weights for policy 0, policy_version 79728 (0.0104) [2024-06-15 12:31:15,040][1652491] Updated weights for policy 0, policy_version 79777 (0.0012) [2024-06-15 12:31:15,966][1648985] Fps is (10 sec: 39277.9, 60 sec: 45320.7, 300 sec: 46651.0). Total num frames: 163446784. Throughput: 0: 11045.1. Samples: 40915456. Policy #0 lag: (min: 47.0, avg: 204.9, max: 303.0) [2024-06-15 12:31:15,967][1648985] Avg episode reward: [(0, '124.800')] [2024-06-15 12:31:16,028][1652491] Updated weights for policy 0, policy_version 79812 (0.0012) [2024-06-15 12:31:17,290][1652491] Updated weights for policy 0, policy_version 79862 (0.0015) [2024-06-15 12:31:20,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 43690.8, 300 sec: 46430.6). Total num frames: 163577856. Throughput: 0: 11059.2. Samples: 40988672. Policy #0 lag: (min: 47.0, avg: 204.9, max: 303.0) [2024-06-15 12:31:20,956][1648985] Avg episode reward: [(0, '122.660')] [2024-06-15 12:31:22,139][1652491] Updated weights for policy 0, policy_version 79892 (0.0013) [2024-06-15 12:31:24,933][1652491] Updated weights for policy 0, policy_version 79953 (0.0015) [2024-06-15 12:31:25,955][1648985] Fps is (10 sec: 36085.0, 60 sec: 44782.9, 300 sec: 46097.4). Total num frames: 163807232. Throughput: 0: 11241.2. Samples: 41024512. 
Policy #0 lag: (min: 47.0, avg: 204.9, max: 303.0) [2024-06-15 12:31:25,956][1648985] Avg episode reward: [(0, '120.130')] [2024-06-15 12:31:27,084][1652491] Updated weights for policy 0, policy_version 80037 (0.0012) [2024-06-15 12:31:28,446][1652491] Updated weights for policy 0, policy_version 80098 (0.0017) [2024-06-15 12:31:30,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 43690.6, 300 sec: 46541.7). Total num frames: 164102144. Throughput: 0: 10945.4. Samples: 41082368. Policy #0 lag: (min: 47.0, avg: 204.9, max: 303.0) [2024-06-15 12:31:30,956][1648985] Avg episode reward: [(0, '114.550')] [2024-06-15 12:31:34,027][1652491] Updated weights for policy 0, policy_version 80160 (0.0015) [2024-06-15 12:31:35,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 164233216. Throughput: 0: 11309.5. Samples: 41161216. Policy #0 lag: (min: 47.0, avg: 204.9, max: 303.0) [2024-06-15 12:31:35,956][1648985] Avg episode reward: [(0, '124.710')] [2024-06-15 12:31:36,075][1652491] Updated weights for policy 0, policy_version 80193 (0.0013) [2024-06-15 12:31:36,865][1651469] Signal inference workers to stop experience collection... (4200 times) [2024-06-15 12:31:36,913][1652491] InferenceWorker_p0-w0: stopping experience collection (4200 times) [2024-06-15 12:31:37,064][1651469] Signal inference workers to resume experience collection... (4200 times) [2024-06-15 12:31:37,065][1652491] InferenceWorker_p0-w0: resuming experience collection (4200 times) [2024-06-15 12:31:37,205][1652491] Updated weights for policy 0, policy_version 80241 (0.0012) [2024-06-15 12:31:38,801][1652491] Updated weights for policy 0, policy_version 80304 (0.0011) [2024-06-15 12:31:39,669][1652491] Updated weights for policy 0, policy_version 80342 (0.0011) [2024-06-15 12:31:40,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45329.0, 300 sec: 46652.8). Total num frames: 164626432. Throughput: 0: 11195.8. Samples: 41186816. 
Policy #0 lag: (min: 47.0, avg: 204.9, max: 303.0) [2024-06-15 12:31:40,956][1648985] Avg episode reward: [(0, '125.650')] [2024-06-15 12:31:44,040][1652491] Updated weights for policy 0, policy_version 80386 (0.0015) [2024-06-15 12:31:45,955][1648985] Fps is (10 sec: 52427.2, 60 sec: 45874.9, 300 sec: 45875.2). Total num frames: 164757504. Throughput: 0: 11332.2. Samples: 41267712. Policy #0 lag: (min: 15.0, avg: 103.8, max: 271.0) [2024-06-15 12:31:45,956][1648985] Avg episode reward: [(0, '119.760')] [2024-06-15 12:31:47,480][1652491] Updated weights for policy 0, policy_version 80480 (0.0133) [2024-06-15 12:31:48,935][1652491] Updated weights for policy 0, policy_version 80529 (0.0011) [2024-06-15 12:31:50,768][1652491] Updated weights for policy 0, policy_version 80596 (0.0011) [2024-06-15 12:31:50,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 165085184. Throughput: 0: 11355.1. Samples: 41326592. Policy #0 lag: (min: 15.0, avg: 103.8, max: 271.0) [2024-06-15 12:31:50,955][1648985] Avg episode reward: [(0, '104.660')] [2024-06-15 12:31:55,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 43690.8, 300 sec: 45764.1). Total num frames: 165150720. Throughput: 0: 11332.2. Samples: 41365504. Policy #0 lag: (min: 15.0, avg: 103.8, max: 271.0) [2024-06-15 12:31:55,956][1648985] Avg episode reward: [(0, '125.910')] [2024-06-15 12:31:56,395][1652491] Updated weights for policy 0, policy_version 80656 (0.0015) [2024-06-15 12:31:59,259][1652491] Updated weights for policy 0, policy_version 80725 (0.0016) [2024-06-15 12:32:00,955][1648985] Fps is (10 sec: 36044.3, 60 sec: 45329.2, 300 sec: 45986.3). Total num frames: 165445632. Throughput: 0: 11528.6. Samples: 41434112. 
Policy #0 lag: (min: 15.0, avg: 103.8, max: 271.0) [2024-06-15 12:32:00,956][1648985] Avg episode reward: [(0, '121.440')] [2024-06-15 12:32:01,379][1652491] Updated weights for policy 0, policy_version 80801 (0.0012) [2024-06-15 12:32:03,440][1652491] Updated weights for policy 0, policy_version 80888 (0.0233) [2024-06-15 12:32:05,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 165675008. Throughput: 0: 11173.0. Samples: 41491456. Policy #0 lag: (min: 15.0, avg: 103.8, max: 271.0) [2024-06-15 12:32:05,956][1648985] Avg episode reward: [(0, '121.630')] [2024-06-15 12:32:10,486][1652491] Updated weights for policy 0, policy_version 80947 (0.0014) [2024-06-15 12:32:10,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 165806080. Throughput: 0: 11195.8. Samples: 41528320. Policy #0 lag: (min: 15.0, avg: 103.8, max: 271.0) [2024-06-15 12:32:10,956][1648985] Avg episode reward: [(0, '108.160')] [2024-06-15 12:32:11,512][1652491] Updated weights for policy 0, policy_version 80977 (0.0013) [2024-06-15 12:32:12,947][1652491] Updated weights for policy 0, policy_version 81040 (0.0013) [2024-06-15 12:32:14,592][1652491] Updated weights for policy 0, policy_version 81110 (0.0012) [2024-06-15 12:32:14,901][1651469] Signal inference workers to stop experience collection... (4250 times) [2024-06-15 12:32:14,952][1652491] InferenceWorker_p0-w0: stopping experience collection (4250 times) [2024-06-15 12:32:15,144][1651469] Signal inference workers to resume experience collection... (4250 times) [2024-06-15 12:32:15,146][1652491] InferenceWorker_p0-w0: resuming experience collection (4250 times) [2024-06-15 12:32:15,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 45883.5, 300 sec: 46319.5). Total num frames: 166199296. Throughput: 0: 11275.3. Samples: 41589760. 
Policy #0 lag: (min: 15.0, avg: 103.8, max: 271.0) [2024-06-15 12:32:15,956][1648985] Avg episode reward: [(0, '105.370')] [2024-06-15 12:32:20,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 45319.8). Total num frames: 166199296. Throughput: 0: 11229.8. Samples: 41666560. Policy #0 lag: (min: 15.0, avg: 103.8, max: 271.0) [2024-06-15 12:32:20,956][1648985] Avg episode reward: [(0, '123.250')] [2024-06-15 12:32:21,033][1652491] Updated weights for policy 0, policy_version 81155 (0.0013) [2024-06-15 12:32:22,187][1652491] Updated weights for policy 0, policy_version 81205 (0.0012) [2024-06-15 12:32:23,513][1652491] Updated weights for policy 0, policy_version 81251 (0.0031) [2024-06-15 12:32:25,255][1652491] Updated weights for policy 0, policy_version 81328 (0.0012) [2024-06-15 12:32:25,958][1648985] Fps is (10 sec: 42586.2, 60 sec: 46965.0, 300 sec: 46319.0). Total num frames: 166625280. Throughput: 0: 11377.0. Samples: 41698816. Policy #0 lag: (min: 15.0, avg: 103.8, max: 271.0) [2024-06-15 12:32:25,959][1648985] Avg episode reward: [(0, '134.760')] [2024-06-15 12:32:26,929][1652491] Updated weights for policy 0, policy_version 81405 (0.0015) [2024-06-15 12:32:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 45541.9). Total num frames: 166723584. Throughput: 0: 11059.2. Samples: 41765376. Policy #0 lag: (min: 15.0, avg: 103.8, max: 271.0) [2024-06-15 12:32:30,956][1648985] Avg episode reward: [(0, '135.700')] [2024-06-15 12:32:34,103][1652491] Updated weights for policy 0, policy_version 81460 (0.0012) [2024-06-15 12:32:35,955][1648985] Fps is (10 sec: 36055.9, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 166985728. Throughput: 0: 11116.0. Samples: 41826816. 
Policy #0 lag: (min: 34.0, avg: 101.3, max: 290.0) [2024-06-15 12:32:35,956][1648985] Avg episode reward: [(0, '132.880')] [2024-06-15 12:32:36,206][1652491] Updated weights for policy 0, policy_version 81552 (0.0011) [2024-06-15 12:32:37,538][1652491] Updated weights for policy 0, policy_version 81616 (0.0012) [2024-06-15 12:32:38,609][1652491] Updated weights for policy 0, policy_version 81659 (0.0015) [2024-06-15 12:32:40,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 43690.7, 300 sec: 45653.0). Total num frames: 167247872. Throughput: 0: 10911.4. Samples: 41856512. Policy #0 lag: (min: 34.0, avg: 101.3, max: 290.0) [2024-06-15 12:32:40,955][1648985] Avg episode reward: [(0, '130.670')] [2024-06-15 12:32:45,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 43144.7, 300 sec: 45211.7). Total num frames: 167346176. Throughput: 0: 11184.3. Samples: 41937408. Policy #0 lag: (min: 34.0, avg: 101.3, max: 290.0) [2024-06-15 12:32:45,956][1648985] Avg episode reward: [(0, '148.440')] [2024-06-15 12:32:46,029][1652491] Updated weights for policy 0, policy_version 81728 (0.0013) [2024-06-15 12:32:48,194][1652491] Updated weights for policy 0, policy_version 81810 (0.0101) [2024-06-15 12:32:49,660][1652491] Updated weights for policy 0, policy_version 81876 (0.0014) [2024-06-15 12:32:50,537][1652491] Updated weights for policy 0, policy_version 81917 (0.0017) [2024-06-15 12:32:50,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 44782.8, 300 sec: 46097.4). Total num frames: 167772160. Throughput: 0: 11070.6. Samples: 41989632. Policy #0 lag: (min: 34.0, avg: 101.3, max: 290.0) [2024-06-15 12:32:50,956][1648985] Avg episode reward: [(0, '142.630')] [2024-06-15 12:32:55,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 43690.6, 300 sec: 44875.4). Total num frames: 167772160. Throughput: 0: 11184.3. Samples: 42031616. 
Policy #0 lag: (min: 34.0, avg: 101.3, max: 290.0) [2024-06-15 12:32:55,956][1648985] Avg episode reward: [(0, '133.910')] [2024-06-15 12:32:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000081920_167772160.pth... [2024-06-15 12:32:56,010][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000076720_157122560.pth [2024-06-15 12:32:57,771][1652491] Updated weights for policy 0, policy_version 81984 (0.0013) [2024-06-15 12:32:57,956][1651469] Signal inference workers to stop experience collection... (4300 times) [2024-06-15 12:32:58,034][1652491] InferenceWorker_p0-w0: stopping experience collection (4300 times) [2024-06-15 12:32:58,235][1651469] Signal inference workers to resume experience collection... (4300 times) [2024-06-15 12:32:58,236][1652491] InferenceWorker_p0-w0: resuming experience collection (4300 times) [2024-06-15 12:32:59,900][1652491] Updated weights for policy 0, policy_version 82052 (0.0012) [2024-06-15 12:33:00,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 44782.9, 300 sec: 45653.0). Total num frames: 168132608. Throughput: 0: 11218.6. Samples: 42094592. Policy #0 lag: (min: 34.0, avg: 101.3, max: 290.0) [2024-06-15 12:33:00,956][1648985] Avg episode reward: [(0, '131.170')] [2024-06-15 12:33:01,800][1652491] Updated weights for policy 0, policy_version 82129 (0.0013) [2024-06-15 12:33:02,867][1652491] Updated weights for policy 0, policy_version 82176 (0.0016) [2024-06-15 12:33:05,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 43690.6, 300 sec: 45208.7). Total num frames: 168296448. Throughput: 0: 11025.1. Samples: 42162688. Policy #0 lag: (min: 34.0, avg: 101.3, max: 290.0) [2024-06-15 12:33:05,956][1648985] Avg episode reward: [(0, '146.110')] [2024-06-15 12:33:09,641][1652491] Updated weights for policy 0, policy_version 82256 (0.0130) [2024-06-15 12:33:10,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45875.1, 300 sec: 45653.0). Total num frames: 168558592. 
Throughput: 0: 11162.4. Samples: 42201088. Policy #0 lag: (min: 34.0, avg: 101.3, max: 290.0) [2024-06-15 12:33:10,956][1648985] Avg episode reward: [(0, '139.760')] [2024-06-15 12:33:11,579][1652491] Updated weights for policy 0, policy_version 82337 (0.0012) [2024-06-15 12:33:13,148][1652491] Updated weights for policy 0, policy_version 82400 (0.0133) [2024-06-15 12:33:15,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 45319.8). Total num frames: 168820736. Throughput: 0: 10934.1. Samples: 42257408. Policy #0 lag: (min: 34.0, avg: 101.3, max: 290.0) [2024-06-15 12:33:15,956][1648985] Avg episode reward: [(0, '138.470')] [2024-06-15 12:33:20,586][1652491] Updated weights for policy 0, policy_version 82464 (0.0036) [2024-06-15 12:33:20,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 45329.0, 300 sec: 45208.7). Total num frames: 168919040. Throughput: 0: 11332.2. Samples: 42336768. Policy #0 lag: (min: 34.0, avg: 101.3, max: 290.0) [2024-06-15 12:33:20,956][1648985] Avg episode reward: [(0, '125.700')] [2024-06-15 12:33:22,095][1652491] Updated weights for policy 0, policy_version 82537 (0.0012) [2024-06-15 12:33:24,642][1652491] Updated weights for policy 0, policy_version 82626 (0.0113) [2024-06-15 12:33:25,952][1652491] Updated weights for policy 0, policy_version 82687 (0.0014) [2024-06-15 12:33:25,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 44785.3, 300 sec: 45319.8). Total num frames: 169312256. Throughput: 0: 11218.5. Samples: 42361344. Policy #0 lag: (min: 159.0, avg: 204.6, max: 375.0) [2024-06-15 12:33:25,956][1648985] Avg episode reward: [(0, '134.900')] [2024-06-15 12:33:30,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 43690.8, 300 sec: 44875.5). Total num frames: 169345024. Throughput: 0: 11047.8. Samples: 42434560. 
Policy #0 lag: (min: 159.0, avg: 204.6, max: 375.0) [2024-06-15 12:33:30,956][1648985] Avg episode reward: [(0, '107.090')] [2024-06-15 12:33:32,765][1652491] Updated weights for policy 0, policy_version 82759 (0.0049) [2024-06-15 12:33:34,398][1651469] Signal inference workers to stop experience collection... (4350 times) [2024-06-15 12:33:34,432][1652491] InferenceWorker_p0-w0: stopping experience collection (4350 times) [2024-06-15 12:33:34,759][1651469] Signal inference workers to resume experience collection... (4350 times) [2024-06-15 12:33:34,760][1652491] InferenceWorker_p0-w0: resuming experience collection (4350 times) [2024-06-15 12:33:34,762][1652491] Updated weights for policy 0, policy_version 82848 (0.0014) [2024-06-15 12:33:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.3, 300 sec: 45319.8). Total num frames: 169738240. Throughput: 0: 11241.2. Samples: 42495488. Policy #0 lag: (min: 159.0, avg: 204.6, max: 375.0) [2024-06-15 12:33:35,956][1648985] Avg episode reward: [(0, '113.300')] [2024-06-15 12:33:36,456][1652491] Updated weights for policy 0, policy_version 82882 (0.0011) [2024-06-15 12:33:40,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 44875.5). Total num frames: 169869312. Throughput: 0: 11116.2. Samples: 42531840. Policy #0 lag: (min: 159.0, avg: 204.6, max: 375.0) [2024-06-15 12:33:40,956][1648985] Avg episode reward: [(0, '123.160')] [2024-06-15 12:33:42,321][1652491] Updated weights for policy 0, policy_version 82945 (0.0013) [2024-06-15 12:33:43,611][1652491] Updated weights for policy 0, policy_version 82994 (0.0016) [2024-06-15 12:33:45,088][1652491] Updated weights for policy 0, policy_version 83057 (0.0012) [2024-06-15 12:33:45,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46967.5, 300 sec: 45430.9). Total num frames: 170164224. Throughput: 0: 11400.5. Samples: 42607616. 
Policy #0 lag: (min: 159.0, avg: 204.6, max: 375.0) [2024-06-15 12:33:45,956][1648985] Avg episode reward: [(0, '116.770')] [2024-06-15 12:33:46,577][1652491] Updated weights for policy 0, policy_version 83127 (0.0012) [2024-06-15 12:33:48,561][1652491] Updated weights for policy 0, policy_version 83168 (0.0011) [2024-06-15 12:33:50,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 45097.7). Total num frames: 170393600. Throughput: 0: 11616.7. Samples: 42685440. Policy #0 lag: (min: 159.0, avg: 204.6, max: 375.0) [2024-06-15 12:33:50,956][1648985] Avg episode reward: [(0, '118.420')] [2024-06-15 12:33:52,521][1652491] Updated weights for policy 0, policy_version 83202 (0.0017) [2024-06-15 12:33:53,819][1652491] Updated weights for policy 0, policy_version 83264 (0.0012) [2024-06-15 12:33:55,246][1652491] Updated weights for policy 0, policy_version 83328 (0.0014) [2024-06-15 12:33:55,955][1648985] Fps is (10 sec: 55704.9, 60 sec: 49152.2, 300 sec: 45541.9). Total num frames: 170721280. Throughput: 0: 11559.8. Samples: 42721280. Policy #0 lag: (min: 159.0, avg: 204.6, max: 375.0) [2024-06-15 12:33:55,956][1648985] Avg episode reward: [(0, '126.670')] [2024-06-15 12:33:56,571][1652491] Updated weights for policy 0, policy_version 83385 (0.0014) [2024-06-15 12:33:59,802][1652491] Updated weights for policy 0, policy_version 83443 (0.0047) [2024-06-15 12:34:00,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.4, 300 sec: 45321.7). Total num frames: 170917888. Throughput: 0: 11764.7. Samples: 42786816. Policy #0 lag: (min: 159.0, avg: 204.6, max: 375.0) [2024-06-15 12:34:00,956][1648985] Avg episode reward: [(0, '127.900')] [2024-06-15 12:34:04,171][1652491] Updated weights for policy 0, policy_version 83515 (0.0015) [2024-06-15 12:34:05,581][1652491] Updated weights for policy 0, policy_version 83570 (0.0015) [2024-06-15 12:34:05,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 48059.6, 300 sec: 45542.0). Total num frames: 171180032. 
Throughput: 0: 11673.6. Samples: 42862080. Policy #0 lag: (min: 159.0, avg: 204.6, max: 375.0) [2024-06-15 12:34:05,956][1648985] Avg episode reward: [(0, '126.340')] [2024-06-15 12:34:07,179][1652491] Updated weights for policy 0, policy_version 83642 (0.0024) [2024-06-15 12:34:10,431][1652491] Updated weights for policy 0, policy_version 83680 (0.0012) [2024-06-15 12:34:10,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 47513.6, 300 sec: 45653.0). Total num frames: 171409408. Throughput: 0: 11867.0. Samples: 42895360. Policy #0 lag: (min: 159.0, avg: 204.6, max: 375.0) [2024-06-15 12:34:10,956][1648985] Avg episode reward: [(0, '110.060')] [2024-06-15 12:34:11,186][1652491] Updated weights for policy 0, policy_version 83712 (0.0013) [2024-06-15 12:34:14,926][1652491] Updated weights for policy 0, policy_version 83766 (0.0013) [2024-06-15 12:34:15,239][1651469] Signal inference workers to stop experience collection... (4400 times) [2024-06-15 12:34:15,278][1652491] InferenceWorker_p0-w0: stopping experience collection (4400 times) [2024-06-15 12:34:15,415][1651469] Signal inference workers to resume experience collection... (4400 times) [2024-06-15 12:34:15,416][1652491] InferenceWorker_p0-w0: resuming experience collection (4400 times) [2024-06-15 12:34:15,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 46967.5, 300 sec: 45542.0). Total num frames: 171638784. Throughput: 0: 12037.7. Samples: 42976256. Policy #0 lag: (min: 63.0, avg: 138.0, max: 319.0) [2024-06-15 12:34:15,956][1648985] Avg episode reward: [(0, '107.830')] [2024-06-15 12:34:16,185][1652491] Updated weights for policy 0, policy_version 83816 (0.0016) [2024-06-15 12:34:17,696][1652491] Updated weights for policy 0, policy_version 83877 (0.0013) [2024-06-15 12:34:20,726][1652491] Updated weights for policy 0, policy_version 83907 (0.0015) [2024-06-15 12:34:20,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 49152.2, 300 sec: 45875.2). Total num frames: 171868160. Throughput: 0: 12219.7. 
Samples: 43045376. Policy #0 lag: (min: 63.0, avg: 138.0, max: 319.0) [2024-06-15 12:34:20,956][1648985] Avg episode reward: [(0, '115.780')] [2024-06-15 12:34:21,742][1652491] Updated weights for policy 0, policy_version 83955 (0.0069) [2024-06-15 12:34:25,211][1652491] Updated weights for policy 0, policy_version 84000 (0.0036) [2024-06-15 12:34:25,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46421.3, 300 sec: 45430.9). Total num frames: 172097536. Throughput: 0: 12265.3. Samples: 43083776. Policy #0 lag: (min: 63.0, avg: 138.0, max: 319.0) [2024-06-15 12:34:25,955][1648985] Avg episode reward: [(0, '128.050')] [2024-06-15 12:34:27,001][1652491] Updated weights for policy 0, policy_version 84080 (0.0017) [2024-06-15 12:34:28,156][1652491] Updated weights for policy 0, policy_version 84128 (0.0018) [2024-06-15 12:34:28,929][1652491] Updated weights for policy 0, policy_version 84159 (0.0012) [2024-06-15 12:34:30,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 50244.2, 300 sec: 45764.1). Total num frames: 172359680. Throughput: 0: 12117.3. Samples: 43152896. Policy #0 lag: (min: 63.0, avg: 138.0, max: 319.0) [2024-06-15 12:34:30,956][1648985] Avg episode reward: [(0, '129.560')] [2024-06-15 12:34:32,609][1652491] Updated weights for policy 0, policy_version 84217 (0.0026) [2024-06-15 12:34:35,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.4, 300 sec: 45542.0). Total num frames: 172556288. Throughput: 0: 12037.7. Samples: 43227136. 
Policy #0 lag: (min: 63.0, avg: 138.0, max: 319.0) [2024-06-15 12:34:35,956][1648985] Avg episode reward: [(0, '127.190')] [2024-06-15 12:34:36,072][1652491] Updated weights for policy 0, policy_version 84259 (0.0012) [2024-06-15 12:34:37,713][1652491] Updated weights for policy 0, policy_version 84308 (0.0015) [2024-06-15 12:34:39,271][1652491] Updated weights for policy 0, policy_version 84370 (0.0012) [2024-06-15 12:34:40,424][1652491] Updated weights for policy 0, policy_version 84415 (0.0014) [2024-06-15 12:34:40,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 50244.3, 300 sec: 46097.4). Total num frames: 172883968. Throughput: 0: 11946.7. Samples: 43258880. Policy #0 lag: (min: 63.0, avg: 138.0, max: 319.0) [2024-06-15 12:34:40,956][1648985] Avg episode reward: [(0, '122.340')] [2024-06-15 12:34:43,638][1652491] Updated weights for policy 0, policy_version 84456 (0.0015) [2024-06-15 12:34:45,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 47513.7, 300 sec: 45764.1). Total num frames: 173015040. Throughput: 0: 12083.2. Samples: 43330560. Policy #0 lag: (min: 63.0, avg: 138.0, max: 319.0) [2024-06-15 12:34:45,955][1648985] Avg episode reward: [(0, '122.710')] [2024-06-15 12:34:46,498][1652491] Updated weights for policy 0, policy_version 84487 (0.0013) [2024-06-15 12:34:48,848][1652491] Updated weights for policy 0, policy_version 84560 (0.0014) [2024-06-15 12:34:50,386][1652491] Updated weights for policy 0, policy_version 84624 (0.0011) [2024-06-15 12:34:50,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 49151.8, 300 sec: 45986.3). Total num frames: 173342720. Throughput: 0: 11889.8. Samples: 43397120. 
Policy #0 lag: (min: 63.0, avg: 138.0, max: 319.0) [2024-06-15 12:34:50,956][1648985] Avg episode reward: [(0, '134.940')] [2024-06-15 12:34:51,512][1652491] Updated weights for policy 0, policy_version 84669 (0.0011) [2024-06-15 12:34:54,974][1652491] Updated weights for policy 0, policy_version 84729 (0.0013) [2024-06-15 12:34:55,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.6, 300 sec: 45986.3). Total num frames: 173539328. Throughput: 0: 12071.9. Samples: 43438592. Policy #0 lag: (min: 63.0, avg: 138.0, max: 319.0) [2024-06-15 12:34:55,956][1648985] Avg episode reward: [(0, '127.610')] [2024-06-15 12:34:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000084736_173539328.pth... [2024-06-15 12:34:56,020][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000079360_162529280.pth [2024-06-15 12:34:57,174][1651469] Signal inference workers to stop experience collection... (4450 times) [2024-06-15 12:34:57,201][1652491] InferenceWorker_p0-w0: stopping experience collection (4450 times) [2024-06-15 12:34:57,313][1651469] Signal inference workers to resume experience collection... (4450 times) [2024-06-15 12:34:57,314][1652491] InferenceWorker_p0-w0: resuming experience collection (4450 times) [2024-06-15 12:34:57,907][1652491] Updated weights for policy 0, policy_version 84784 (0.0012) [2024-06-15 12:35:00,402][1652491] Updated weights for policy 0, policy_version 84848 (0.0012) [2024-06-15 12:35:00,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 48059.7, 300 sec: 45764.1). Total num frames: 173801472. Throughput: 0: 11878.4. Samples: 43510784. Policy #0 lag: (min: 63.0, avg: 138.0, max: 319.0) [2024-06-15 12:35:00,956][1648985] Avg episode reward: [(0, '129.270')] [2024-06-15 12:35:02,197][1652491] Updated weights for policy 0, policy_version 84920 (0.0014) [2024-06-15 12:35:05,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.6, 300 sec: 45986.3). Total num frames: 173998080. 
Throughput: 0: 11821.5. Samples: 43577344. Policy #0 lag: (min: 63.0, avg: 138.0, max: 319.0) [2024-06-15 12:35:05,956][1648985] Avg episode reward: [(0, '125.990')] [2024-06-15 12:35:06,143][1652491] Updated weights for policy 0, policy_version 84976 (0.0012) [2024-06-15 12:35:08,415][1652491] Updated weights for policy 0, policy_version 85024 (0.0013) [2024-06-15 12:35:10,588][1652491] Updated weights for policy 0, policy_version 85078 (0.0012) [2024-06-15 12:35:10,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 47513.8, 300 sec: 45875.2). Total num frames: 174260224. Throughput: 0: 11844.3. Samples: 43616768. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 12:35:10,955][1648985] Avg episode reward: [(0, '117.390')] [2024-06-15 12:35:11,324][1652491] Updated weights for policy 0, policy_version 85120 (0.0013) [2024-06-15 12:35:13,225][1652491] Updated weights for policy 0, policy_version 85184 (0.0013) [2024-06-15 12:35:15,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.4, 300 sec: 45764.1). Total num frames: 174456832. Throughput: 0: 11923.9. Samples: 43689472. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 12:35:15,956][1648985] Avg episode reward: [(0, '103.920')] [2024-06-15 12:35:17,590][1652491] Updated weights for policy 0, policy_version 85245 (0.0012) [2024-06-15 12:35:19,836][1652491] Updated weights for policy 0, policy_version 85300 (0.0013) [2024-06-15 12:35:20,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 47513.6, 300 sec: 46097.4). Total num frames: 174718976. Throughput: 0: 11901.2. Samples: 43762688. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 12:35:20,956][1648985] Avg episode reward: [(0, '126.680')] [2024-06-15 12:35:21,433][1652491] Updated weights for policy 0, policy_version 85344 (0.0026) [2024-06-15 12:35:23,017][1652491] Updated weights for policy 0, policy_version 85392 (0.0013) [2024-06-15 12:35:25,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 45764.1). 
Total num frames: 174981120. Throughput: 0: 11855.6. Samples: 43792384. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 12:35:25,956][1648985] Avg episode reward: [(0, '129.160')] [2024-06-15 12:35:28,061][1652491] Updated weights for policy 0, policy_version 85472 (0.0015) [2024-06-15 12:35:30,567][1652491] Updated weights for policy 0, policy_version 85506 (0.0012) [2024-06-15 12:35:30,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 175144960. Throughput: 0: 11901.1. Samples: 43866112. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 12:35:30,956][1648985] Avg episode reward: [(0, '132.550')] [2024-06-15 12:35:32,117][1652491] Updated weights for policy 0, policy_version 85584 (0.0013) [2024-06-15 12:35:34,023][1652491] Updated weights for policy 0, policy_version 85638 (0.0097) [2024-06-15 12:35:35,214][1652491] Updated weights for policy 0, policy_version 85691 (0.0019) [2024-06-15 12:35:35,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 49151.9, 300 sec: 46097.3). Total num frames: 175505408. Throughput: 0: 11935.3. Samples: 43934208. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 12:35:35,956][1648985] Avg episode reward: [(0, '119.620')] [2024-06-15 12:35:39,556][1652491] Updated weights for policy 0, policy_version 85730 (0.0013) [2024-06-15 12:35:40,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 175636480. Throughput: 0: 11889.8. Samples: 43973632. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 12:35:40,956][1648985] Avg episode reward: [(0, '120.550')] [2024-06-15 12:35:41,355][1651469] Signal inference workers to stop experience collection... (4500 times) [2024-06-15 12:35:41,439][1652491] InferenceWorker_p0-w0: stopping experience collection (4500 times) [2024-06-15 12:35:41,596][1651469] Signal inference workers to resume experience collection... 
(4500 times) [2024-06-15 12:35:41,598][1652491] InferenceWorker_p0-w0: resuming experience collection (4500 times) [2024-06-15 12:35:42,055][1652491] Updated weights for policy 0, policy_version 85794 (0.0037) [2024-06-15 12:35:43,624][1652491] Updated weights for policy 0, policy_version 85859 (0.0126) [2024-06-15 12:35:45,919][1652491] Updated weights for policy 0, policy_version 85904 (0.0014) [2024-06-15 12:35:45,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 48605.9, 300 sec: 46097.4). Total num frames: 175931392. Throughput: 0: 11798.8. Samples: 44041728. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 12:35:45,956][1648985] Avg episode reward: [(0, '109.650')] [2024-06-15 12:35:50,165][1652491] Updated weights for policy 0, policy_version 85953 (0.0014) [2024-06-15 12:35:50,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 45875.3, 300 sec: 45986.3). Total num frames: 176095232. Throughput: 0: 11958.0. Samples: 44115456. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 12:35:50,956][1648985] Avg episode reward: [(0, '116.530')] [2024-06-15 12:35:52,326][1652491] Updated weights for policy 0, policy_version 86021 (0.0014) [2024-06-15 12:35:53,827][1652491] Updated weights for policy 0, policy_version 86083 (0.0011) [2024-06-15 12:35:55,104][1652491] Updated weights for policy 0, policy_version 86144 (0.0012) [2024-06-15 12:35:55,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 176422912. Throughput: 0: 11741.9. Samples: 44145152. Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 12:35:55,956][1648985] Avg episode reward: [(0, '122.420')] [2024-06-15 12:35:58,312][1652491] Updated weights for policy 0, policy_version 86208 (0.0151) [2024-06-15 12:36:00,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 176553984. Throughput: 0: 11764.6. Samples: 44218880. 
Policy #0 lag: (min: 10.0, avg: 111.8, max: 266.0) [2024-06-15 12:36:00,956][1648985] Avg episode reward: [(0, '121.960')] [2024-06-15 12:36:02,963][1652491] Updated weights for policy 0, policy_version 86272 (0.0013) [2024-06-15 12:36:04,247][1652491] Updated weights for policy 0, policy_version 86330 (0.0022) [2024-06-15 12:36:05,946][1652491] Updated weights for policy 0, policy_version 86370 (0.0015) [2024-06-15 12:36:05,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 176881664. Throughput: 0: 11559.8. Samples: 44282880. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 12:36:05,956][1648985] Avg episode reward: [(0, '136.910')] [2024-06-15 12:36:08,504][1652491] Updated weights for policy 0, policy_version 86403 (0.0012) [2024-06-15 12:36:09,791][1652491] Updated weights for policy 0, policy_version 86462 (0.0014) [2024-06-15 12:36:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.4, 300 sec: 46210.2). Total num frames: 177078272. Throughput: 0: 11844.3. Samples: 44325376. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 12:36:10,956][1648985] Avg episode reward: [(0, '135.420')] [2024-06-15 12:36:14,802][1652491] Updated weights for policy 0, policy_version 86544 (0.0136) [2024-06-15 12:36:15,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 177340416. Throughput: 0: 11662.2. Samples: 44390912. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 12:36:15,956][1648985] Avg episode reward: [(0, '130.760')] [2024-06-15 12:36:16,270][1652491] Updated weights for policy 0, policy_version 86593 (0.0024) [2024-06-15 12:36:17,884][1652491] Updated weights for policy 0, policy_version 86656 (0.0012) [2024-06-15 12:36:20,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 46967.3, 300 sec: 46541.6). Total num frames: 177537024. Throughput: 0: 11650.8. Samples: 44458496. 
Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 12:36:20,956][1648985] Avg episode reward: [(0, '137.600')] [2024-06-15 12:36:21,285][1652491] Updated weights for policy 0, policy_version 86715 (0.0012) [2024-06-15 12:36:24,863][1651469] Signal inference workers to stop experience collection... (4550 times) [2024-06-15 12:36:24,919][1652491] InferenceWorker_p0-w0: stopping experience collection (4550 times) [2024-06-15 12:36:25,035][1651469] Signal inference workers to resume experience collection... (4550 times) [2024-06-15 12:36:25,046][1652491] InferenceWorker_p0-w0: resuming experience collection (4550 times) [2024-06-15 12:36:25,704][1652491] Updated weights for policy 0, policy_version 86784 (0.0014) [2024-06-15 12:36:25,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 177733632. Throughput: 0: 11616.6. Samples: 44496384. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 12:36:25,956][1648985] Avg episode reward: [(0, '132.900')] [2024-06-15 12:36:28,077][1652491] Updated weights for policy 0, policy_version 86849 (0.0057) [2024-06-15 12:36:29,534][1652491] Updated weights for policy 0, policy_version 86911 (0.0013) [2024-06-15 12:36:30,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 177995776. Throughput: 0: 11457.4. Samples: 44557312. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 12:36:30,956][1648985] Avg episode reward: [(0, '123.880')] [2024-06-15 12:36:35,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 178126848. Throughput: 0: 11548.5. Samples: 44635136. 
Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 12:36:35,956][1648985] Avg episode reward: [(0, '128.570')] [2024-06-15 12:36:36,099][1652491] Updated weights for policy 0, policy_version 86978 (0.0014) [2024-06-15 12:36:37,547][1652491] Updated weights for policy 0, policy_version 87056 (0.0014) [2024-06-15 12:36:38,657][1652491] Updated weights for policy 0, policy_version 87103 (0.0012) [2024-06-15 12:36:40,176][1652491] Updated weights for policy 0, policy_version 87158 (0.0011) [2024-06-15 12:36:40,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.6, 300 sec: 46652.8). Total num frames: 178520064. Throughput: 0: 11548.4. Samples: 44664832. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 12:36:40,956][1648985] Avg episode reward: [(0, '133.400')] [2024-06-15 12:36:43,959][1652491] Updated weights for policy 0, policy_version 87216 (0.0013) [2024-06-15 12:36:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45329.0, 300 sec: 45986.3). Total num frames: 178651136. Throughput: 0: 11525.7. Samples: 44737536. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 12:36:45,955][1648985] Avg episode reward: [(0, '155.510')] [2024-06-15 12:36:47,746][1652491] Updated weights for policy 0, policy_version 87269 (0.0173) [2024-06-15 12:36:49,246][1652491] Updated weights for policy 0, policy_version 87344 (0.0246) [2024-06-15 12:36:50,640][1652491] Updated weights for policy 0, policy_version 87378 (0.0013) [2024-06-15 12:36:50,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 178978816. Throughput: 0: 11594.0. Samples: 44804608. 
Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 12:36:50,956][1648985] Avg episode reward: [(0, '153.070')] [2024-06-15 12:36:54,646][1652491] Updated weights for policy 0, policy_version 87444 (0.0013) [2024-06-15 12:36:55,517][1652491] Updated weights for policy 0, policy_version 87488 (0.0011) [2024-06-15 12:36:55,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 179175424. Throughput: 0: 11559.8. Samples: 44845568. Policy #0 lag: (min: 29.0, avg: 139.6, max: 285.0) [2024-06-15 12:36:55,956][1648985] Avg episode reward: [(0, '132.570')] [2024-06-15 12:36:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000087488_179175424.pth... [2024-06-15 12:36:56,019][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000081920_167772160.pth [2024-06-15 12:36:58,929][1652491] Updated weights for policy 0, policy_version 87552 (0.0012) [2024-06-15 12:37:00,151][1652491] Updated weights for policy 0, policy_version 87607 (0.0023) [2024-06-15 12:37:00,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 179437568. Throughput: 0: 11719.1. Samples: 44918272. Policy #0 lag: (min: 29.0, avg: 139.6, max: 285.0) [2024-06-15 12:37:00,956][1648985] Avg episode reward: [(0, '131.370')] [2024-06-15 12:37:01,564][1652491] Updated weights for policy 0, policy_version 87648 (0.0011) [2024-06-15 12:37:04,905][1652491] Updated weights for policy 0, policy_version 87696 (0.0011) [2024-06-15 12:37:05,417][1651469] Signal inference workers to stop experience collection... (4600 times) [2024-06-15 12:37:05,455][1652491] InferenceWorker_p0-w0: stopping experience collection (4600 times) [2024-06-15 12:37:05,617][1651469] Signal inference workers to resume experience collection... 
(4600 times) [2024-06-15 12:37:05,618][1652491] InferenceWorker_p0-w0: resuming experience collection (4600 times) [2024-06-15 12:37:05,953][1652491] Updated weights for policy 0, policy_version 87741 (0.0013) [2024-06-15 12:37:05,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 46421.5, 300 sec: 46986.0). Total num frames: 179666944. Throughput: 0: 11764.7. Samples: 44987904. Policy #0 lag: (min: 29.0, avg: 139.6, max: 285.0) [2024-06-15 12:37:05,956][1648985] Avg episode reward: [(0, '137.780')] [2024-06-15 12:37:09,508][1652491] Updated weights for policy 0, policy_version 87796 (0.0014) [2024-06-15 12:37:10,974][1648985] Fps is (10 sec: 49058.1, 60 sec: 47498.4, 300 sec: 46538.7). Total num frames: 179929088. Throughput: 0: 11884.8. Samples: 45031424. Policy #0 lag: (min: 29.0, avg: 139.6, max: 285.0) [2024-06-15 12:37:10,975][1648985] Avg episode reward: [(0, '139.070')] [2024-06-15 12:37:11,001][1652491] Updated weights for policy 0, policy_version 87866 (0.0103) [2024-06-15 12:37:13,599][1652491] Updated weights for policy 0, policy_version 87934 (0.0026) [2024-06-15 12:37:15,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46421.4, 300 sec: 47208.2). Total num frames: 180125696. Throughput: 0: 11946.7. Samples: 45094912. Policy #0 lag: (min: 29.0, avg: 139.6, max: 285.0) [2024-06-15 12:37:15,956][1648985] Avg episode reward: [(0, '138.240')] [2024-06-15 12:37:16,760][1652491] Updated weights for policy 0, policy_version 87989 (0.0015) [2024-06-15 12:37:20,095][1652491] Updated weights for policy 0, policy_version 88032 (0.0028) [2024-06-15 12:37:20,955][1648985] Fps is (10 sec: 42679.5, 60 sec: 46967.5, 300 sec: 46542.1). Total num frames: 180355072. Throughput: 0: 11798.7. Samples: 45166080. 
Policy #0 lag: (min: 29.0, avg: 139.6, max: 285.0) [2024-06-15 12:37:20,956][1648985] Avg episode reward: [(0, '141.890')] [2024-06-15 12:37:21,741][1652491] Updated weights for policy 0, policy_version 88099 (0.0013) [2024-06-15 12:37:24,085][1652491] Updated weights for policy 0, policy_version 88160 (0.0012) [2024-06-15 12:37:24,880][1652491] Updated weights for policy 0, policy_version 88191 (0.0012) [2024-06-15 12:37:25,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 48059.8, 300 sec: 47097.0). Total num frames: 180617216. Throughput: 0: 11935.3. Samples: 45201920. Policy #0 lag: (min: 29.0, avg: 139.6, max: 285.0) [2024-06-15 12:37:25,956][1648985] Avg episode reward: [(0, '147.850')] [2024-06-15 12:37:28,120][1652491] Updated weights for policy 0, policy_version 88243 (0.0017) [2024-06-15 12:37:30,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 180748288. Throughput: 0: 11855.6. Samples: 45271040. Policy #0 lag: (min: 29.0, avg: 139.6, max: 285.0) [2024-06-15 12:37:30,956][1648985] Avg episode reward: [(0, '149.210')] [2024-06-15 12:37:32,223][1652491] Updated weights for policy 0, policy_version 88304 (0.0118) [2024-06-15 12:37:33,533][1652491] Updated weights for policy 0, policy_version 88354 (0.0012) [2024-06-15 12:37:35,418][1652491] Updated weights for policy 0, policy_version 88416 (0.0011) [2024-06-15 12:37:35,955][1648985] Fps is (10 sec: 49153.4, 60 sec: 49698.2, 300 sec: 46986.0). Total num frames: 181108736. Throughput: 0: 11889.8. Samples: 45339648. Policy #0 lag: (min: 29.0, avg: 139.6, max: 285.0) [2024-06-15 12:37:35,955][1648985] Avg episode reward: [(0, '139.630')] [2024-06-15 12:37:36,152][1652491] Updated weights for policy 0, policy_version 88447 (0.0013) [2024-06-15 12:37:40,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.1, 300 sec: 47208.1). Total num frames: 181272576. Throughput: 0: 11810.1. Samples: 45377024. 
Policy #0 lag: (min: 29.0, avg: 139.6, max: 285.0) [2024-06-15 12:37:40,956][1648985] Avg episode reward: [(0, '138.350')] [2024-06-15 12:37:42,839][1652491] Updated weights for policy 0, policy_version 88528 (0.0013) [2024-06-15 12:37:44,299][1652491] Updated weights for policy 0, policy_version 88577 (0.0011) [2024-06-15 12:37:45,412][1652491] Updated weights for policy 0, policy_version 88639 (0.0019) [2024-06-15 12:37:45,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 181534720. Throughput: 0: 11559.8. Samples: 45438464. Policy #0 lag: (min: 29.0, avg: 139.6, max: 285.0) [2024-06-15 12:37:45,955][1648985] Avg episode reward: [(0, '129.270')] [2024-06-15 12:37:47,474][1652491] Updated weights for policy 0, policy_version 88693 (0.0027) [2024-06-15 12:37:50,955][1648985] Fps is (10 sec: 39323.5, 60 sec: 44783.1, 300 sec: 47097.1). Total num frames: 181665792. Throughput: 0: 11571.2. Samples: 45508608. Policy #0 lag: (min: 29.0, avg: 139.6, max: 285.0) [2024-06-15 12:37:50,955][1648985] Avg episode reward: [(0, '119.420')] [2024-06-15 12:37:51,010][1651469] Signal inference workers to stop experience collection... (4650 times) [2024-06-15 12:37:51,096][1652491] InferenceWorker_p0-w0: stopping experience collection (4650 times) [2024-06-15 12:37:51,223][1651469] Signal inference workers to resume experience collection... (4650 times) [2024-06-15 12:37:51,224][1652491] InferenceWorker_p0-w0: resuming experience collection (4650 times) [2024-06-15 12:37:52,048][1652491] Updated weights for policy 0, policy_version 88762 (0.0015) [2024-06-15 12:37:55,704][1652491] Updated weights for policy 0, policy_version 88819 (0.0076) [2024-06-15 12:37:55,962][1648985] Fps is (10 sec: 39292.9, 60 sec: 45869.7, 300 sec: 46762.7). Total num frames: 181927936. Throughput: 0: 11358.0. Samples: 45542400. 
Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 12:37:55,963][1648985] Avg episode reward: [(0, '134.230')] [2024-06-15 12:37:56,828][1652491] Updated weights for policy 0, policy_version 88871 (0.0012) [2024-06-15 12:37:58,153][1652491] Updated weights for policy 0, policy_version 88914 (0.0094) [2024-06-15 12:38:00,955][1648985] Fps is (10 sec: 52426.9, 60 sec: 45875.1, 300 sec: 47097.0). Total num frames: 182190080. Throughput: 0: 11355.0. Samples: 45605888. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 12:38:00,956][1648985] Avg episode reward: [(0, '141.080')] [2024-06-15 12:38:02,945][1652491] Updated weights for policy 0, policy_version 88980 (0.0011) [2024-06-15 12:38:05,955][1648985] Fps is (10 sec: 39350.0, 60 sec: 44236.7, 300 sec: 46652.8). Total num frames: 182321152. Throughput: 0: 11514.3. Samples: 45684224. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 12:38:05,956][1648985] Avg episode reward: [(0, '135.290')] [2024-06-15 12:38:06,624][1652491] Updated weights for policy 0, policy_version 89056 (0.0014) [2024-06-15 12:38:08,084][1652491] Updated weights for policy 0, policy_version 89122 (0.0014) [2024-06-15 12:38:09,328][1652491] Updated weights for policy 0, policy_version 89168 (0.0012) [2024-06-15 12:38:10,398][1652491] Updated weights for policy 0, policy_version 89216 (0.0016) [2024-06-15 12:38:10,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 46436.1, 300 sec: 47097.1). Total num frames: 182714368. Throughput: 0: 11275.4. Samples: 45709312. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 12:38:10,956][1648985] Avg episode reward: [(0, '139.310')] [2024-06-15 12:38:14,592][1652491] Updated weights for policy 0, policy_version 89275 (0.0032) [2024-06-15 12:38:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45329.0, 300 sec: 47208.2). Total num frames: 182845440. Throughput: 0: 11491.6. Samples: 45788160. 
Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 12:38:15,956][1648985] Avg episode reward: [(0, '130.990')] [2024-06-15 12:38:17,702][1652491] Updated weights for policy 0, policy_version 89332 (0.0013) [2024-06-15 12:38:19,141][1652491] Updated weights for policy 0, policy_version 89392 (0.0013) [2024-06-15 12:38:20,596][1652491] Updated weights for policy 0, policy_version 89424 (0.0010) [2024-06-15 12:38:20,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 183140352. Throughput: 0: 11502.9. Samples: 45857280. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 12:38:20,957][1648985] Avg episode reward: [(0, '135.740')] [2024-06-15 12:38:24,843][1652491] Updated weights for policy 0, policy_version 89488 (0.0015) [2024-06-15 12:38:25,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.4, 300 sec: 47541.4). Total num frames: 183369728. Throughput: 0: 11559.9. Samples: 45897216. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 12:38:25,956][1648985] Avg episode reward: [(0, '144.280')] [2024-06-15 12:38:28,281][1652491] Updated weights for policy 0, policy_version 89555 (0.0013) [2024-06-15 12:38:29,934][1652491] Updated weights for policy 0, policy_version 89634 (0.0163) [2024-06-15 12:38:30,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 48059.8, 300 sec: 47097.0). Total num frames: 183631872. Throughput: 0: 11537.0. Samples: 45957632. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 12:38:30,956][1648985] Avg episode reward: [(0, '138.530')] [2024-06-15 12:38:31,835][1651469] Signal inference workers to stop experience collection... (4700 times) [2024-06-15 12:38:31,961][1652491] InferenceWorker_p0-w0: stopping experience collection (4700 times) [2024-06-15 12:38:31,979][1652491] Updated weights for policy 0, policy_version 89687 (0.0014) [2024-06-15 12:38:32,126][1651469] Signal inference workers to resume experience collection... 
(4700 times) [2024-06-15 12:38:32,128][1652491] InferenceWorker_p0-w0: resuming experience collection (4700 times) [2024-06-15 12:38:35,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 44236.7, 300 sec: 47097.1). Total num frames: 183762944. Throughput: 0: 11753.2. Samples: 46037504. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 12:38:35,956][1648985] Avg episode reward: [(0, '123.280')] [2024-06-15 12:38:35,968][1652491] Updated weights for policy 0, policy_version 89731 (0.0023) [2024-06-15 12:38:37,036][1652491] Updated weights for policy 0, policy_version 89785 (0.0012) [2024-06-15 12:38:40,025][1652491] Updated weights for policy 0, policy_version 89845 (0.0014) [2024-06-15 12:38:40,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 46967.7, 300 sec: 47208.1). Total num frames: 184090624. Throughput: 0: 11812.1. Samples: 46073856. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 12:38:40,956][1648985] Avg episode reward: [(0, '115.090')] [2024-06-15 12:38:41,254][1652491] Updated weights for policy 0, policy_version 89904 (0.0014) [2024-06-15 12:38:42,343][1652491] Updated weights for policy 0, policy_version 89937 (0.0017) [2024-06-15 12:38:43,324][1652491] Updated weights for policy 0, policy_version 89982 (0.0011) [2024-06-15 12:38:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45875.0, 300 sec: 47097.0). Total num frames: 184287232. Throughput: 0: 11980.8. Samples: 46145024. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 12:38:45,957][1648985] Avg episode reward: [(0, '120.860')] [2024-06-15 12:38:48,675][1652491] Updated weights for policy 0, policy_version 90039 (0.0020) [2024-06-15 12:38:49,742][1652491] Updated weights for policy 0, policy_version 90080 (0.0080) [2024-06-15 12:38:50,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 48059.6, 300 sec: 46874.9). Total num frames: 184549376. Throughput: 0: 11821.5. Samples: 46216192. 
Policy #0 lag: (min: 15.0, avg: 101.2, max: 271.0) [2024-06-15 12:38:50,957][1648985] Avg episode reward: [(0, '128.630')] [2024-06-15 12:38:51,360][1652491] Updated weights for policy 0, policy_version 90144 (0.0082) [2024-06-15 12:38:52,795][1652491] Updated weights for policy 0, policy_version 90192 (0.0014) [2024-06-15 12:38:55,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48065.5, 300 sec: 47097.0). Total num frames: 184811520. Throughput: 0: 11912.5. Samples: 46245376. Policy #0 lag: (min: 15.0, avg: 101.2, max: 271.0) [2024-06-15 12:38:55,956][1648985] Avg episode reward: [(0, '126.360')] [2024-06-15 12:38:55,969][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000090240_184811520.pth... [2024-06-15 12:38:56,054][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000084736_173539328.pth [2024-06-15 12:38:58,510][1652491] Updated weights for policy 0, policy_version 90256 (0.0014) [2024-06-15 12:38:59,951][1652491] Updated weights for policy 0, policy_version 90307 (0.0013) [2024-06-15 12:39:00,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.8, 300 sec: 46986.0). Total num frames: 185040896. Throughput: 0: 11923.9. Samples: 46324736. Policy #0 lag: (min: 15.0, avg: 101.2, max: 271.0) [2024-06-15 12:39:00,955][1648985] Avg episode reward: [(0, '137.910')] [2024-06-15 12:39:01,166][1652491] Updated weights for policy 0, policy_version 90368 (0.0013) [2024-06-15 12:39:04,224][1652491] Updated weights for policy 0, policy_version 90455 (0.0013) [2024-06-15 12:39:05,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 50244.4, 300 sec: 47208.2). Total num frames: 185335808. Throughput: 0: 11798.8. Samples: 46388224. 
Policy #0 lag: (min: 15.0, avg: 101.2, max: 271.0) [2024-06-15 12:39:05,956][1648985] Avg episode reward: [(0, '135.860')] [2024-06-15 12:39:09,590][1652491] Updated weights for policy 0, policy_version 90499 (0.0014) [2024-06-15 12:39:10,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 185466880. Throughput: 0: 12026.3. Samples: 46438400. Policy #0 lag: (min: 15.0, avg: 101.2, max: 271.0) [2024-06-15 12:39:10,956][1648985] Avg episode reward: [(0, '134.590')] [2024-06-15 12:39:11,282][1652491] Updated weights for policy 0, policy_version 90576 (0.0128) [2024-06-15 12:39:12,411][1651469] Signal inference workers to stop experience collection... (4750 times) [2024-06-15 12:39:12,494][1652491] InferenceWorker_p0-w0: stopping experience collection (4750 times) [2024-06-15 12:39:12,654][1651469] Signal inference workers to resume experience collection... (4750 times) [2024-06-15 12:39:12,655][1652491] InferenceWorker_p0-w0: resuming experience collection (4750 times) [2024-06-15 12:39:12,858][1652491] Updated weights for policy 0, policy_version 90643 (0.0012) [2024-06-15 12:39:15,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 48605.8, 300 sec: 47097.0). Total num frames: 185761792. Throughput: 0: 12003.5. Samples: 46497792. Policy #0 lag: (min: 15.0, avg: 101.2, max: 271.0) [2024-06-15 12:39:15,956][1648985] Avg episode reward: [(0, '131.800')] [2024-06-15 12:39:16,034][1652491] Updated weights for policy 0, policy_version 90720 (0.0015) [2024-06-15 12:39:20,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45328.9, 300 sec: 46652.7). Total num frames: 185860096. Throughput: 0: 11969.4. Samples: 46576128. 
Policy #0 lag: (min: 15.0, avg: 101.2, max: 271.0) [2024-06-15 12:39:20,956][1648985] Avg episode reward: [(0, '130.500')] [2024-06-15 12:39:21,397][1652491] Updated weights for policy 0, policy_version 90768 (0.0014) [2024-06-15 12:39:22,844][1652491] Updated weights for policy 0, policy_version 90832 (0.0070) [2024-06-15 12:39:24,259][1652491] Updated weights for policy 0, policy_version 90896 (0.0012) [2024-06-15 12:39:25,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 186253312. Throughput: 0: 11901.1. Samples: 46609408. Policy #0 lag: (min: 15.0, avg: 101.2, max: 271.0) [2024-06-15 12:39:25,956][1648985] Avg episode reward: [(0, '141.670')] [2024-06-15 12:39:27,261][1652491] Updated weights for policy 0, policy_version 90976 (0.0018) [2024-06-15 12:39:30,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 186384384. Throughput: 0: 11798.8. Samples: 46675968. Policy #0 lag: (min: 15.0, avg: 101.2, max: 271.0) [2024-06-15 12:39:30,956][1648985] Avg episode reward: [(0, '130.430')] [2024-06-15 12:39:32,964][1652491] Updated weights for policy 0, policy_version 91026 (0.0013) [2024-06-15 12:39:34,154][1652491] Updated weights for policy 0, policy_version 91076 (0.0012) [2024-06-15 12:39:35,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48605.8, 300 sec: 46763.8). Total num frames: 186679296. Throughput: 0: 11753.2. Samples: 46745088. Policy #0 lag: (min: 15.0, avg: 101.2, max: 271.0) [2024-06-15 12:39:35,956][1648985] Avg episode reward: [(0, '116.590')] [2024-06-15 12:39:36,077][1652491] Updated weights for policy 0, policy_version 91155 (0.0100) [2024-06-15 12:39:38,478][1652491] Updated weights for policy 0, policy_version 91232 (0.0026) [2024-06-15 12:39:40,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 186908672. Throughput: 0: 11707.8. Samples: 46772224. 
Policy #0 lag: (min: 15.0, avg: 101.2, max: 271.0) [2024-06-15 12:39:40,956][1648985] Avg episode reward: [(0, '118.690')] [2024-06-15 12:39:43,992][1652491] Updated weights for policy 0, policy_version 91280 (0.0011) [2024-06-15 12:39:45,388][1652491] Updated weights for policy 0, policy_version 91344 (0.0020) [2024-06-15 12:39:45,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46967.6, 300 sec: 46652.8). Total num frames: 187105280. Throughput: 0: 11764.6. Samples: 46854144. Policy #0 lag: (min: 9.0, avg: 73.2, max: 265.0) [2024-06-15 12:39:45,956][1648985] Avg episode reward: [(0, '129.100')] [2024-06-15 12:39:47,270][1652491] Updated weights for policy 0, policy_version 91414 (0.0101) [2024-06-15 12:39:48,161][1652491] Updated weights for policy 0, policy_version 91456 (0.0015) [2024-06-15 12:39:50,548][1652491] Updated weights for policy 0, policy_version 91510 (0.0095) [2024-06-15 12:39:50,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 187432960. Throughput: 0: 11605.3. Samples: 46910464. Policy #0 lag: (min: 9.0, avg: 73.2, max: 265.0) [2024-06-15 12:39:50,956][1648985] Avg episode reward: [(0, '134.610')] [2024-06-15 12:39:55,318][1651469] Signal inference workers to stop experience collection... (4800 times) [2024-06-15 12:39:55,358][1652491] InferenceWorker_p0-w0: stopping experience collection (4800 times) [2024-06-15 12:39:55,562][1651469] Signal inference workers to resume experience collection... (4800 times) [2024-06-15 12:39:55,563][1652491] InferenceWorker_p0-w0: resuming experience collection (4800 times) [2024-06-15 12:39:55,785][1652491] Updated weights for policy 0, policy_version 91583 (0.0012) [2024-06-15 12:39:55,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 187564032. Throughput: 0: 11491.6. Samples: 46955520. 
Policy #0 lag: (min: 9.0, avg: 73.2, max: 265.0) [2024-06-15 12:39:55,956][1648985] Avg episode reward: [(0, '141.320')] [2024-06-15 12:39:58,091][1652491] Updated weights for policy 0, policy_version 91648 (0.0027) [2024-06-15 12:39:59,787][1652491] Updated weights for policy 0, policy_version 91709 (0.0014) [2024-06-15 12:40:00,956][1648985] Fps is (10 sec: 42592.7, 60 sec: 46966.4, 300 sec: 46985.8). Total num frames: 187858944. Throughput: 0: 11548.1. Samples: 47017472. Policy #0 lag: (min: 9.0, avg: 73.2, max: 265.0) [2024-06-15 12:40:00,957][1648985] Avg episode reward: [(0, '131.750')] [2024-06-15 12:40:01,667][1652491] Updated weights for policy 0, policy_version 91765 (0.0060) [2024-06-15 12:40:05,957][1648985] Fps is (10 sec: 39313.3, 60 sec: 43689.0, 300 sec: 46430.2). Total num frames: 187957248. Throughput: 0: 11456.9. Samples: 47091712. Policy #0 lag: (min: 9.0, avg: 73.2, max: 265.0) [2024-06-15 12:40:05,958][1648985] Avg episode reward: [(0, '135.190')] [2024-06-15 12:40:07,011][1652491] Updated weights for policy 0, policy_version 91808 (0.0013) [2024-06-15 12:40:08,687][1652491] Updated weights for policy 0, policy_version 91859 (0.0012) [2024-06-15 12:40:10,149][1652491] Updated weights for policy 0, policy_version 91920 (0.0013) [2024-06-15 12:40:10,955][1648985] Fps is (10 sec: 42604.5, 60 sec: 46967.6, 300 sec: 46874.9). Total num frames: 188284928. Throughput: 0: 11457.4. Samples: 47124992. Policy #0 lag: (min: 9.0, avg: 73.2, max: 265.0) [2024-06-15 12:40:10,955][1648985] Avg episode reward: [(0, '134.890')] [2024-06-15 12:40:11,838][1652491] Updated weights for policy 0, policy_version 91984 (0.0011) [2024-06-15 12:40:15,955][1648985] Fps is (10 sec: 52440.3, 60 sec: 45329.2, 300 sec: 46652.7). Total num frames: 188481536. Throughput: 0: 11377.8. Samples: 47187968. 
Policy #0 lag: (min: 9.0, avg: 73.2, max: 265.0) [2024-06-15 12:40:15,955][1648985] Avg episode reward: [(0, '141.010')] [2024-06-15 12:40:17,809][1652491] Updated weights for policy 0, policy_version 92035 (0.0013) [2024-06-15 12:40:18,844][1652491] Updated weights for policy 0, policy_version 92087 (0.0012) [2024-06-15 12:40:20,328][1652491] Updated weights for policy 0, policy_version 92131 (0.0022) [2024-06-15 12:40:20,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 47513.8, 300 sec: 46541.7). Total num frames: 188710912. Throughput: 0: 11605.4. Samples: 47267328. Policy #0 lag: (min: 9.0, avg: 73.2, max: 265.0) [2024-06-15 12:40:20,956][1648985] Avg episode reward: [(0, '130.210')] [2024-06-15 12:40:21,684][1652491] Updated weights for policy 0, policy_version 92181 (0.0013) [2024-06-15 12:40:22,832][1652491] Updated weights for policy 0, policy_version 92240 (0.0012) [2024-06-15 12:40:24,095][1652491] Updated weights for policy 0, policy_version 92288 (0.0011) [2024-06-15 12:40:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 189005824. Throughput: 0: 11571.2. Samples: 47292928. Policy #0 lag: (min: 9.0, avg: 73.2, max: 265.0) [2024-06-15 12:40:25,956][1648985] Avg episode reward: [(0, '124.810')] [2024-06-15 12:40:29,177][1652491] Updated weights for policy 0, policy_version 92336 (0.0013) [2024-06-15 12:40:30,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 189136896. Throughput: 0: 11582.6. Samples: 47375360. Policy #0 lag: (min: 9.0, avg: 73.2, max: 265.0) [2024-06-15 12:40:30,956][1648985] Avg episode reward: [(0, '121.860')] [2024-06-15 12:40:32,168][1652491] Updated weights for policy 0, policy_version 92400 (0.0114) [2024-06-15 12:40:33,924][1652491] Updated weights for policy 0, policy_version 92464 (0.0011) [2024-06-15 12:40:34,858][1651469] Signal inference workers to stop experience collection... 
(4850 times) [2024-06-15 12:40:34,909][1652491] InferenceWorker_p0-w0: stopping experience collection (4850 times) [2024-06-15 12:40:35,111][1651469] Signal inference workers to resume experience collection... (4850 times) [2024-06-15 12:40:35,111][1652491] InferenceWorker_p0-w0: resuming experience collection (4850 times) [2024-06-15 12:40:35,723][1652491] Updated weights for policy 0, policy_version 92538 (0.0013) [2024-06-15 12:40:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.7, 300 sec: 47097.1). Total num frames: 189530112. Throughput: 0: 11480.2. Samples: 47427072. Policy #0 lag: (min: 9.0, avg: 73.2, max: 265.0) [2024-06-15 12:40:35,956][1648985] Avg episode reward: [(0, '132.410')] [2024-06-15 12:40:40,217][1652491] Updated weights for policy 0, policy_version 92583 (0.0020) [2024-06-15 12:40:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 189661184. Throughput: 0: 11457.4. Samples: 47471104. Policy #0 lag: (min: 31.0, avg: 124.2, max: 287.0) [2024-06-15 12:40:40,956][1648985] Avg episode reward: [(0, '126.180')] [2024-06-15 12:40:43,643][1652491] Updated weights for policy 0, policy_version 92625 (0.0023) [2024-06-15 12:40:45,552][1652491] Updated weights for policy 0, policy_version 92690 (0.0015) [2024-06-15 12:40:45,955][1648985] Fps is (10 sec: 32768.2, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 189857792. Throughput: 0: 11617.1. Samples: 47540224. Policy #0 lag: (min: 31.0, avg: 124.2, max: 287.0) [2024-06-15 12:40:45,955][1648985] Avg episode reward: [(0, '128.820')] [2024-06-15 12:40:47,619][1652491] Updated weights for policy 0, policy_version 92768 (0.0015) [2024-06-15 12:40:48,179][1652491] Updated weights for policy 0, policy_version 92800 (0.0011) [2024-06-15 12:40:50,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 190054400. Throughput: 0: 11423.8. Samples: 47605760. 
Policy #0 lag: (min: 31.0, avg: 124.2, max: 287.0) [2024-06-15 12:40:50,955][1648985] Avg episode reward: [(0, '125.940')] [2024-06-15 12:40:52,100][1652491] Updated weights for policy 0, policy_version 92848 (0.0018) [2024-06-15 12:40:55,252][1652491] Updated weights for policy 0, policy_version 92882 (0.0016) [2024-06-15 12:40:55,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 45329.0, 300 sec: 46541.6). Total num frames: 190283776. Throughput: 0: 11502.9. Samples: 47642624. Policy #0 lag: (min: 31.0, avg: 124.2, max: 287.0) [2024-06-15 12:40:55,956][1648985] Avg episode reward: [(0, '129.430')] [2024-06-15 12:40:56,362][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000092928_190316544.pth... [2024-06-15 12:40:56,490][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000087488_179175424.pth [2024-06-15 12:40:57,299][1652491] Updated weights for policy 0, policy_version 92960 (0.0012) [2024-06-15 12:40:59,309][1652491] Updated weights for policy 0, policy_version 93030 (0.0083) [2024-06-15 12:41:00,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 45330.0, 300 sec: 46430.6). Total num frames: 190578688. Throughput: 0: 11161.5. Samples: 47690240. Policy #0 lag: (min: 31.0, avg: 124.2, max: 287.0) [2024-06-15 12:41:00,956][1648985] Avg episode reward: [(0, '146.150')] [2024-06-15 12:41:04,286][1652491] Updated weights for policy 0, policy_version 93088 (0.0018) [2024-06-15 12:41:05,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 45876.9, 300 sec: 46208.4). Total num frames: 190709760. Throughput: 0: 11093.3. Samples: 47766528. 
Policy #0 lag: (min: 31.0, avg: 124.2, max: 287.0) [2024-06-15 12:41:05,955][1648985] Avg episode reward: [(0, '131.980')] [2024-06-15 12:41:07,949][1652491] Updated weights for policy 0, policy_version 93152 (0.0025) [2024-06-15 12:41:09,390][1652491] Updated weights for policy 0, policy_version 93205 (0.0012) [2024-06-15 12:41:10,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44782.7, 300 sec: 46208.4). Total num frames: 190971904. Throughput: 0: 11195.7. Samples: 47796736. Policy #0 lag: (min: 31.0, avg: 124.2, max: 287.0) [2024-06-15 12:41:10,956][1648985] Avg episode reward: [(0, '124.710')] [2024-06-15 12:41:11,191][1652491] Updated weights for policy 0, policy_version 93266 (0.0013) [2024-06-15 12:41:15,955][1648985] Fps is (10 sec: 42596.7, 60 sec: 44236.5, 300 sec: 46097.3). Total num frames: 191135744. Throughput: 0: 10888.5. Samples: 47865344. Policy #0 lag: (min: 31.0, avg: 124.2, max: 287.0) [2024-06-15 12:41:15,956][1648985] Avg episode reward: [(0, '143.750')] [2024-06-15 12:41:16,017][1652491] Updated weights for policy 0, policy_version 93344 (0.0013) [2024-06-15 12:41:19,557][1652491] Updated weights for policy 0, policy_version 93415 (0.0115) [2024-06-15 12:41:19,826][1651469] Signal inference workers to stop experience collection... (4900 times) [2024-06-15 12:41:19,884][1652491] InferenceWorker_p0-w0: stopping experience collection (4900 times) [2024-06-15 12:41:20,080][1651469] Signal inference workers to resume experience collection... (4900 times) [2024-06-15 12:41:20,081][1652491] InferenceWorker_p0-w0: resuming experience collection (4900 times) [2024-06-15 12:41:20,955][1648985] Fps is (10 sec: 45876.8, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 191430656. Throughput: 0: 11161.6. Samples: 47929344. 
Policy #0 lag: (min: 31.0, avg: 124.2, max: 287.0) [2024-06-15 12:41:20,955][1648985] Avg episode reward: [(0, '144.370')] [2024-06-15 12:41:21,209][1652491] Updated weights for policy 0, policy_version 93488 (0.0013) [2024-06-15 12:41:22,600][1652491] Updated weights for policy 0, policy_version 93539 (0.0016) [2024-06-15 12:41:25,955][1648985] Fps is (10 sec: 49153.4, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 191627264. Throughput: 0: 10888.5. Samples: 47961088. Policy #0 lag: (min: 31.0, avg: 124.2, max: 287.0) [2024-06-15 12:41:25,956][1648985] Avg episode reward: [(0, '146.630')] [2024-06-15 12:41:27,965][1652491] Updated weights for policy 0, policy_version 93603 (0.0052) [2024-06-15 12:41:30,922][1652491] Updated weights for policy 0, policy_version 93688 (0.0060) [2024-06-15 12:41:30,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 45329.0, 300 sec: 46541.6). Total num frames: 191856640. Throughput: 0: 11093.3. Samples: 48039424. Policy #0 lag: (min: 31.0, avg: 124.2, max: 287.0) [2024-06-15 12:41:30,956][1648985] Avg episode reward: [(0, '133.780')] [2024-06-15 12:41:32,230][1652491] Updated weights for policy 0, policy_version 93744 (0.0012) [2024-06-15 12:41:34,233][1652491] Updated weights for policy 0, policy_version 93816 (0.0103) [2024-06-15 12:41:35,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 192151552. Throughput: 0: 11002.3. Samples: 48100864. Policy #0 lag: (min: 31.0, avg: 124.2, max: 287.0) [2024-06-15 12:41:35,955][1648985] Avg episode reward: [(0, '137.780')] [2024-06-15 12:41:38,990][1652491] Updated weights for policy 0, policy_version 93876 (0.0104) [2024-06-15 12:41:40,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 192282624. Throughput: 0: 11082.0. Samples: 48141312. 
Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 12:41:40,956][1648985] Avg episode reward: [(0, '146.970')] [2024-06-15 12:41:42,527][1652491] Updated weights for policy 0, policy_version 93941 (0.0091) [2024-06-15 12:41:43,374][1652491] Updated weights for policy 0, policy_version 93970 (0.0011) [2024-06-15 12:41:45,652][1652491] Updated weights for policy 0, policy_version 94051 (0.0034) [2024-06-15 12:41:45,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 46421.2, 300 sec: 46319.5). Total num frames: 192643072. Throughput: 0: 11411.9. Samples: 48203776. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 12:41:45,956][1648985] Avg episode reward: [(0, '138.830')] [2024-06-15 12:41:49,630][1652491] Updated weights for policy 0, policy_version 94117 (0.0014) [2024-06-15 12:41:50,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 192806912. Throughput: 0: 11377.8. Samples: 48278528. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 12:41:50,955][1648985] Avg episode reward: [(0, '134.200')] [2024-06-15 12:41:53,025][1652491] Updated weights for policy 0, policy_version 94163 (0.0018) [2024-06-15 12:41:55,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 45875.3, 300 sec: 46097.4). Total num frames: 193036288. Throughput: 0: 11434.7. Samples: 48311296. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 12:41:55,956][1648985] Avg episode reward: [(0, '143.090')] [2024-06-15 12:41:56,075][1652491] Updated weights for policy 0, policy_version 94261 (0.0013) [2024-06-15 12:41:57,432][1652491] Updated weights for policy 0, policy_version 94325 (0.0012) [2024-06-15 12:42:00,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 43690.8, 300 sec: 45875.2). Total num frames: 193200128. Throughput: 0: 11400.6. Samples: 48378368. 
Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 12:42:00,956][1648985] Avg episode reward: [(0, '138.030')] [2024-06-15 12:42:01,478][1651469] Signal inference workers to stop experience collection... (4950 times) [2024-06-15 12:42:01,558][1652491] InferenceWorker_p0-w0: stopping experience collection (4950 times) [2024-06-15 12:42:01,608][1652491] Updated weights for policy 0, policy_version 94373 (0.0011) [2024-06-15 12:42:01,792][1651469] Signal inference workers to resume experience collection... (4950 times) [2024-06-15 12:42:01,794][1652491] InferenceWorker_p0-w0: resuming experience collection (4950 times) [2024-06-15 12:42:04,247][1652491] Updated weights for policy 0, policy_version 94418 (0.0012) [2024-06-15 12:42:05,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45875.1, 300 sec: 45878.2). Total num frames: 193462272. Throughput: 0: 11605.3. Samples: 48451584. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 12:42:05,956][1648985] Avg episode reward: [(0, '125.620')] [2024-06-15 12:42:06,335][1652491] Updated weights for policy 0, policy_version 94480 (0.0012) [2024-06-15 12:42:08,460][1652491] Updated weights for policy 0, policy_version 94566 (0.0012) [2024-06-15 12:42:10,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.4, 300 sec: 46097.4). Total num frames: 193724416. Throughput: 0: 11503.0. Samples: 48478720. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 12:42:10,956][1648985] Avg episode reward: [(0, '105.130')] [2024-06-15 12:42:12,693][1652491] Updated weights for policy 0, policy_version 94624 (0.0011) [2024-06-15 12:42:15,803][1652491] Updated weights for policy 0, policy_version 94688 (0.0013) [2024-06-15 12:42:15,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46421.6, 300 sec: 45986.3). Total num frames: 193921024. Throughput: 0: 11446.1. Samples: 48554496. 
Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 12:42:15,956][1648985] Avg episode reward: [(0, '102.280')] [2024-06-15 12:42:17,762][1652491] Updated weights for policy 0, policy_version 94736 (0.0012) [2024-06-15 12:42:19,773][1652491] Updated weights for policy 0, policy_version 94816 (0.0090) [2024-06-15 12:42:20,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 46967.3, 300 sec: 46208.4). Total num frames: 194248704. Throughput: 0: 11355.0. Samples: 48611840. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 12:42:20,956][1648985] Avg episode reward: [(0, '115.740')] [2024-06-15 12:42:24,118][1652491] Updated weights for policy 0, policy_version 94864 (0.0026) [2024-06-15 12:42:25,242][1652491] Updated weights for policy 0, policy_version 94910 (0.0016) [2024-06-15 12:42:25,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 194379776. Throughput: 0: 11343.7. Samples: 48651776. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 12:42:25,956][1648985] Avg episode reward: [(0, '134.030')] [2024-06-15 12:42:28,246][1652491] Updated weights for policy 0, policy_version 94972 (0.0013) [2024-06-15 12:42:30,221][1652491] Updated weights for policy 0, policy_version 95030 (0.0016) [2024-06-15 12:42:30,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 194641920. Throughput: 0: 11423.3. Samples: 48717824. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 12:42:30,956][1648985] Avg episode reward: [(0, '144.180')] [2024-06-15 12:42:31,405][1652491] Updated weights for policy 0, policy_version 95072 (0.0012) [2024-06-15 12:42:35,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 45875.3). Total num frames: 194805760. Throughput: 0: 11309.5. Samples: 48787456. 
Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:42:35,956][1648985] Avg episode reward: [(0, '146.060')] [2024-06-15 12:42:36,479][1652491] Updated weights for policy 0, policy_version 95152 (0.0036) [2024-06-15 12:42:39,956][1652491] Updated weights for policy 0, policy_version 95216 (0.0021) [2024-06-15 12:42:40,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 195035136. Throughput: 0: 11377.8. Samples: 48823296. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:42:40,956][1648985] Avg episode reward: [(0, '163.600')] [2024-06-15 12:42:42,072][1652491] Updated weights for policy 0, policy_version 95264 (0.0050) [2024-06-15 12:42:43,725][1652491] Updated weights for policy 0, policy_version 95330 (0.0013) [2024-06-15 12:42:45,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 44236.9, 300 sec: 46208.4). Total num frames: 195297280. Throughput: 0: 11184.4. Samples: 48881664. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:42:45,956][1648985] Avg episode reward: [(0, '157.210')] [2024-06-15 12:42:47,535][1652491] Updated weights for policy 0, policy_version 95373 (0.0015) [2024-06-15 12:42:47,671][1651469] Signal inference workers to stop experience collection... (5000 times) [2024-06-15 12:42:47,764][1652491] InferenceWorker_p0-w0: stopping experience collection (5000 times) [2024-06-15 12:42:47,931][1651469] Signal inference workers to resume experience collection... (5000 times) [2024-06-15 12:42:47,932][1652491] InferenceWorker_p0-w0: resuming experience collection (5000 times) [2024-06-15 12:42:48,598][1652491] Updated weights for policy 0, policy_version 95421 (0.0015) [2024-06-15 12:42:50,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 44782.8, 300 sec: 45987.4). Total num frames: 195493888. Throughput: 0: 11309.5. Samples: 48960512. 
Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:42:50,956][1648985] Avg episode reward: [(0, '148.140')] [2024-06-15 12:42:51,340][1652491] Updated weights for policy 0, policy_version 95480 (0.0013) [2024-06-15 12:42:53,291][1652491] Updated weights for policy 0, policy_version 95536 (0.0037) [2024-06-15 12:42:54,851][1652491] Updated weights for policy 0, policy_version 95600 (0.0092) [2024-06-15 12:42:55,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 195821568. Throughput: 0: 11434.6. Samples: 48993280. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:42:55,956][1648985] Avg episode reward: [(0, '143.550')] [2024-06-15 12:42:56,010][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000095616_195821568.pth... [2024-06-15 12:42:56,064][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000090240_184811520.pth [2024-06-15 12:42:58,718][1652491] Updated weights for policy 0, policy_version 95648 (0.0161) [2024-06-15 12:43:00,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 195985408. Throughput: 0: 11389.1. Samples: 49067008. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:43:00,956][1648985] Avg episode reward: [(0, '138.380')] [2024-06-15 12:43:01,466][1652491] Updated weights for policy 0, policy_version 95714 (0.0011) [2024-06-15 12:43:03,096][1652491] Updated weights for policy 0, policy_version 95752 (0.0038) [2024-06-15 12:43:04,596][1652491] Updated weights for policy 0, policy_version 95812 (0.0013) [2024-06-15 12:43:05,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 48059.9, 300 sec: 46208.5). Total num frames: 196345856. Throughput: 0: 11639.5. Samples: 49135616. 
Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:43:05,955][1648985] Avg episode reward: [(0, '140.660')] [2024-06-15 12:43:09,255][1652491] Updated weights for policy 0, policy_version 95888 (0.0014) [2024-06-15 12:43:10,378][1652491] Updated weights for policy 0, policy_version 95934 (0.0019) [2024-06-15 12:43:10,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 196476928. Throughput: 0: 11650.8. Samples: 49176064. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:43:10,956][1648985] Avg episode reward: [(0, '151.400')] [2024-06-15 12:43:13,041][1652491] Updated weights for policy 0, policy_version 95989 (0.0012) [2024-06-15 12:43:14,222][1652491] Updated weights for policy 0, policy_version 96017 (0.0029) [2024-06-15 12:43:15,787][1652491] Updated weights for policy 0, policy_version 96072 (0.0016) [2024-06-15 12:43:15,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 196771840. Throughput: 0: 11673.6. Samples: 49243136. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:43:15,956][1648985] Avg episode reward: [(0, '146.720')] [2024-06-15 12:43:20,034][1652491] Updated weights for policy 0, policy_version 96129 (0.0016) [2024-06-15 12:43:20,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 44783.0, 300 sec: 45986.3). Total num frames: 196935680. Throughput: 0: 11673.6. Samples: 49312768. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:43:20,956][1648985] Avg episode reward: [(0, '143.060')] [2024-06-15 12:43:21,296][1652491] Updated weights for policy 0, policy_version 96187 (0.0014) [2024-06-15 12:43:24,182][1652491] Updated weights for policy 0, policy_version 96240 (0.0014) [2024-06-15 12:43:25,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 46421.5, 300 sec: 45875.2). Total num frames: 197165056. Throughput: 0: 11719.2. Samples: 49350656. 
Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:43:25,955][1648985] Avg episode reward: [(0, '146.750')] [2024-06-15 12:43:26,312][1652491] Updated weights for policy 0, policy_version 96292 (0.0013) [2024-06-15 12:43:27,405][1652491] Updated weights for policy 0, policy_version 96336 (0.0011) [2024-06-15 12:43:28,293][1652491] Updated weights for policy 0, policy_version 96383 (0.0014) [2024-06-15 12:43:30,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 197394432. Throughput: 0: 11992.1. Samples: 49421312. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:43:30,956][1648985] Avg episode reward: [(0, '124.930')] [2024-06-15 12:43:31,898][1651469] Signal inference workers to stop experience collection... (5050 times) [2024-06-15 12:43:31,947][1652491] InferenceWorker_p0-w0: stopping experience collection (5050 times) [2024-06-15 12:43:32,129][1651469] Signal inference workers to resume experience collection... (5050 times) [2024-06-15 12:43:32,130][1652491] InferenceWorker_p0-w0: resuming experience collection (5050 times) [2024-06-15 12:43:32,236][1652491] Updated weights for policy 0, policy_version 96437 (0.0013) [2024-06-15 12:43:35,072][1652491] Updated weights for policy 0, policy_version 96506 (0.0013) [2024-06-15 12:43:35,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 47513.6, 300 sec: 45986.3). Total num frames: 197656576. Throughput: 0: 11764.7. Samples: 49489920. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 12:43:35,956][1648985] Avg episode reward: [(0, '116.210')] [2024-06-15 12:43:37,559][1652491] Updated weights for policy 0, policy_version 96547 (0.0013) [2024-06-15 12:43:39,261][1652491] Updated weights for policy 0, policy_version 96624 (0.0020) [2024-06-15 12:43:40,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 46208.5). Total num frames: 197918720. Throughput: 0: 11764.6. Samples: 49522688. 
Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 12:43:40,956][1648985] Avg episode reward: [(0, '125.300')] [2024-06-15 12:43:43,201][1652491] Updated weights for policy 0, policy_version 96660 (0.0013) [2024-06-15 12:43:45,838][1652491] Updated weights for policy 0, policy_version 96724 (0.0041) [2024-06-15 12:43:45,955][1648985] Fps is (10 sec: 42597.2, 60 sec: 46421.1, 300 sec: 45875.2). Total num frames: 198082560. Throughput: 0: 11776.0. Samples: 49596928. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 12:43:45,956][1648985] Avg episode reward: [(0, '141.500')] [2024-06-15 12:43:47,532][1652491] Updated weights for policy 0, policy_version 96769 (0.0015) [2024-06-15 12:43:48,898][1652491] Updated weights for policy 0, policy_version 96826 (0.0012) [2024-06-15 12:43:50,442][1652491] Updated weights for policy 0, policy_version 96890 (0.0011) [2024-06-15 12:43:50,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 49152.2, 300 sec: 46208.5). Total num frames: 198443008. Throughput: 0: 11685.0. Samples: 49661440. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 12:43:50,955][1648985] Avg episode reward: [(0, '148.380')] [2024-06-15 12:43:55,955][1648985] Fps is (10 sec: 49153.4, 60 sec: 45875.3, 300 sec: 45875.2). Total num frames: 198574080. Throughput: 0: 11673.6. Samples: 49701376. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 12:43:55,955][1648985] Avg episode reward: [(0, '131.330')] [2024-06-15 12:43:56,901][1652491] Updated weights for policy 0, policy_version 96962 (0.0014) [2024-06-15 12:43:58,146][1652491] Updated weights for policy 0, policy_version 97017 (0.0014) [2024-06-15 12:43:59,437][1652491] Updated weights for policy 0, policy_version 97056 (0.0152) [2024-06-15 12:44:00,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 47513.7, 300 sec: 45764.1). Total num frames: 198836224. Throughput: 0: 11571.2. Samples: 49763840. 
Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 12:44:00,956][1648985] Avg episode reward: [(0, '130.360')] [2024-06-15 12:44:01,157][1652491] Updated weights for policy 0, policy_version 97104 (0.0011) [2024-06-15 12:44:04,539][1652491] Updated weights for policy 0, policy_version 97153 (0.0014) [2024-06-15 12:44:05,773][1652491] Updated weights for policy 0, policy_version 97215 (0.0043) [2024-06-15 12:44:05,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 199098368. Throughput: 0: 11696.4. Samples: 49839104. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 12:44:05,956][1648985] Avg episode reward: [(0, '135.090')] [2024-06-15 12:44:09,643][1652491] Updated weights for policy 0, policy_version 97264 (0.0011) [2024-06-15 12:44:10,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.4, 300 sec: 45764.1). Total num frames: 199262208. Throughput: 0: 11832.8. Samples: 49883136. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 12:44:10,956][1648985] Avg episode reward: [(0, '133.040')] [2024-06-15 12:44:11,549][1652491] Updated weights for policy 0, policy_version 97328 (0.0013) [2024-06-15 12:44:13,040][1652491] Updated weights for policy 0, policy_version 97365 (0.0011) [2024-06-15 12:44:15,713][1652491] Updated weights for policy 0, policy_version 97410 (0.0012) [2024-06-15 12:44:15,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45875.3, 300 sec: 46319.6). Total num frames: 199524352. Throughput: 0: 11616.8. Samples: 49944064. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 12:44:15,956][1648985] Avg episode reward: [(0, '130.290')] [2024-06-15 12:44:16,862][1652491] Updated weights for policy 0, policy_version 97465 (0.0012) [2024-06-15 12:44:19,845][1651469] Signal inference workers to stop experience collection... 
(5100 times) [2024-06-15 12:44:19,917][1652491] InferenceWorker_p0-w0: stopping experience collection (5100 times) [2024-06-15 12:44:20,166][1651469] Signal inference workers to resume experience collection... (5100 times) [2024-06-15 12:44:20,167][1652491] InferenceWorker_p0-w0: resuming experience collection (5100 times) [2024-06-15 12:44:20,956][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.2, 300 sec: 45541.9). Total num frames: 199688192. Throughput: 0: 11741.8. Samples: 50018304. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 12:44:20,957][1648985] Avg episode reward: [(0, '119.630')] [2024-06-15 12:44:21,017][1652491] Updated weights for policy 0, policy_version 97520 (0.0018) [2024-06-15 12:44:22,922][1652491] Updated weights for policy 0, policy_version 97592 (0.0012) [2024-06-15 12:44:25,086][1652491] Updated weights for policy 0, policy_version 97648 (0.0025) [2024-06-15 12:44:25,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 47513.5, 300 sec: 46208.4). Total num frames: 200015872. Throughput: 0: 11594.0. Samples: 50044416. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 12:44:25,956][1648985] Avg episode reward: [(0, '122.370')] [2024-06-15 12:44:28,311][1652491] Updated weights for policy 0, policy_version 97712 (0.0016) [2024-06-15 12:44:30,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 45875.3, 300 sec: 45653.1). Total num frames: 200146944. Throughput: 0: 11480.3. Samples: 50113536. Policy #0 lag: (min: 15.0, avg: 118.0, max: 271.0) [2024-06-15 12:44:30,955][1648985] Avg episode reward: [(0, '138.930')] [2024-06-15 12:44:32,436][1652491] Updated weights for policy 0, policy_version 97760 (0.0014) [2024-06-15 12:44:34,580][1652491] Updated weights for policy 0, policy_version 97840 (0.0015) [2024-06-15 12:44:35,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 200409088. Throughput: 0: 11480.1. Samples: 50178048. 
Policy #0 lag: (min: 63.0, avg: 139.9, max: 319.0) [2024-06-15 12:44:35,956][1648985] Avg episode reward: [(0, '135.020')] [2024-06-15 12:44:37,007][1652491] Updated weights for policy 0, policy_version 97891 (0.0037) [2024-06-15 12:44:40,570][1652491] Updated weights for policy 0, policy_version 97956 (0.0013) [2024-06-15 12:44:40,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 200638464. Throughput: 0: 11332.3. Samples: 50211328. Policy #0 lag: (min: 63.0, avg: 139.9, max: 319.0) [2024-06-15 12:44:40,956][1648985] Avg episode reward: [(0, '123.400')] [2024-06-15 12:44:44,638][1652491] Updated weights for policy 0, policy_version 98016 (0.0013) [2024-06-15 12:44:45,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.4, 300 sec: 45430.9). Total num frames: 200835072. Throughput: 0: 11434.7. Samples: 50278400. Policy #0 lag: (min: 63.0, avg: 139.9, max: 319.0) [2024-06-15 12:44:45,956][1648985] Avg episode reward: [(0, '121.530')] [2024-06-15 12:44:46,451][1652491] Updated weights for policy 0, policy_version 98081 (0.0012) [2024-06-15 12:44:49,485][1652491] Updated weights for policy 0, policy_version 98146 (0.0112) [2024-06-15 12:44:50,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 201064448. Throughput: 0: 11116.1. Samples: 50339328. Policy #0 lag: (min: 63.0, avg: 139.9, max: 319.0) [2024-06-15 12:44:50,956][1648985] Avg episode reward: [(0, '119.270')] [2024-06-15 12:44:52,228][1652491] Updated weights for policy 0, policy_version 98230 (0.0105) [2024-06-15 12:44:55,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 43690.6, 300 sec: 45208.9). Total num frames: 201195520. Throughput: 0: 10797.5. Samples: 50369024. Policy #0 lag: (min: 63.0, avg: 139.9, max: 319.0) [2024-06-15 12:44:55,956][1648985] Avg episode reward: [(0, '135.040')] [2024-06-15 12:44:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000098240_201195520.pth... 
[2024-06-15 12:44:56,141][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000092928_190316544.pth [2024-06-15 12:44:57,168][1652491] Updated weights for policy 0, policy_version 98288 (0.0014) [2024-06-15 12:44:58,665][1652491] Updated weights for policy 0, policy_version 98339 (0.0117) [2024-06-15 12:45:00,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 45764.5). Total num frames: 201457664. Throughput: 0: 10922.7. Samples: 50435584. Policy #0 lag: (min: 63.0, avg: 139.9, max: 319.0) [2024-06-15 12:45:00,955][1648985] Avg episode reward: [(0, '146.570')] [2024-06-15 12:45:01,975][1652491] Updated weights for policy 0, policy_version 98427 (0.0014) [2024-06-15 12:45:03,259][1651469] Signal inference workers to stop experience collection... (5150 times) [2024-06-15 12:45:03,302][1652491] InferenceWorker_p0-w0: stopping experience collection (5150 times) [2024-06-15 12:45:03,523][1651469] Signal inference workers to resume experience collection... (5150 times) [2024-06-15 12:45:03,524][1652491] InferenceWorker_p0-w0: resuming experience collection (5150 times) [2024-06-15 12:45:03,906][1652491] Updated weights for policy 0, policy_version 98481 (0.0012) [2024-06-15 12:45:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 45542.0). Total num frames: 201719808. Throughput: 0: 11047.9. Samples: 50515456. Policy #0 lag: (min: 63.0, avg: 139.9, max: 319.0) [2024-06-15 12:45:05,956][1648985] Avg episode reward: [(0, '141.820')] [2024-06-15 12:45:07,235][1652491] Updated weights for policy 0, policy_version 98519 (0.0067) [2024-06-15 12:45:08,696][1652491] Updated weights for policy 0, policy_version 98576 (0.0013) [2024-06-15 12:45:10,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 45328.9, 300 sec: 45764.1). Total num frames: 201981952. Throughput: 0: 11184.3. Samples: 50547712. 
Policy #0 lag: (min: 63.0, avg: 139.9, max: 319.0) [2024-06-15 12:45:10,956][1648985] Avg episode reward: [(0, '148.240')] [2024-06-15 12:45:11,764][1652491] Updated weights for policy 0, policy_version 98626 (0.0031) [2024-06-15 12:45:12,860][1652491] Updated weights for policy 0, policy_version 98680 (0.0098) [2024-06-15 12:45:14,540][1652491] Updated weights for policy 0, policy_version 98736 (0.0012) [2024-06-15 12:45:15,957][1648985] Fps is (10 sec: 52419.0, 60 sec: 45327.6, 300 sec: 45874.9). Total num frames: 202244096. Throughput: 0: 11218.0. Samples: 50618368. Policy #0 lag: (min: 63.0, avg: 139.9, max: 319.0) [2024-06-15 12:45:15,958][1648985] Avg episode reward: [(0, '124.640')] [2024-06-15 12:45:18,177][1652491] Updated weights for policy 0, policy_version 98785 (0.0012) [2024-06-15 12:45:19,877][1652491] Updated weights for policy 0, policy_version 98841 (0.0014) [2024-06-15 12:45:20,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 46967.6, 300 sec: 45764.1). Total num frames: 202506240. Throughput: 0: 11366.4. Samples: 50689536. Policy #0 lag: (min: 63.0, avg: 139.9, max: 319.0) [2024-06-15 12:45:20,956][1648985] Avg episode reward: [(0, '130.660')] [2024-06-15 12:45:22,987][1652491] Updated weights for policy 0, policy_version 98896 (0.0121) [2024-06-15 12:45:24,661][1652491] Updated weights for policy 0, policy_version 98946 (0.0013) [2024-06-15 12:45:25,687][1652491] Updated weights for policy 0, policy_version 98993 (0.0017) [2024-06-15 12:45:25,955][1648985] Fps is (10 sec: 49160.9, 60 sec: 45329.0, 300 sec: 46097.3). Total num frames: 202735616. Throughput: 0: 11434.7. Samples: 50725888. 
Policy #0 lag: (min: 63.0, avg: 139.9, max: 319.0) [2024-06-15 12:45:25,956][1648985] Avg episode reward: [(0, '132.860')] [2024-06-15 12:45:26,013][1652491] Updated weights for policy 0, policy_version 99007 (0.0014) [2024-06-15 12:45:28,961][1652491] Updated weights for policy 0, policy_version 99045 (0.0015) [2024-06-15 12:45:30,086][1652491] Updated weights for policy 0, policy_version 99089 (0.0015) [2024-06-15 12:45:30,833][1652491] Updated weights for policy 0, policy_version 99129 (0.0015) [2024-06-15 12:45:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 45764.1). Total num frames: 203030528. Throughput: 0: 11730.5. Samples: 50806272. Policy #0 lag: (min: 63.0, avg: 139.9, max: 319.0) [2024-06-15 12:45:30,956][1648985] Avg episode reward: [(0, '108.060')] [2024-06-15 12:45:34,346][1652491] Updated weights for policy 0, policy_version 99189 (0.0011) [2024-06-15 12:45:35,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 203227136. Throughput: 0: 11935.3. Samples: 50876416. Policy #0 lag: (min: 15.0, avg: 124.8, max: 271.0) [2024-06-15 12:45:35,956][1648985] Avg episode reward: [(0, '128.500')] [2024-06-15 12:45:36,066][1652491] Updated weights for policy 0, policy_version 99248 (0.0015) [2024-06-15 12:45:39,435][1652491] Updated weights for policy 0, policy_version 99296 (0.0012) [2024-06-15 12:45:40,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 46421.2, 300 sec: 45986.2). Total num frames: 203423744. Throughput: 0: 12151.4. Samples: 50915840. 
Policy #0 lag: (min: 15.0, avg: 124.8, max: 271.0) [2024-06-15 12:45:40,956][1648985] Avg episode reward: [(0, '125.420')] [2024-06-15 12:45:41,454][1652491] Updated weights for policy 0, policy_version 99349 (0.0016) [2024-06-15 12:45:44,022][1652491] Updated weights for policy 0, policy_version 99396 (0.0013) [2024-06-15 12:45:45,245][1652491] Updated weights for policy 0, policy_version 99444 (0.0013) [2024-06-15 12:45:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 47513.7, 300 sec: 46208.4). Total num frames: 203685888. Throughput: 0: 12083.2. Samples: 50979328. Policy #0 lag: (min: 15.0, avg: 124.8, max: 271.0) [2024-06-15 12:45:45,956][1648985] Avg episode reward: [(0, '137.390')] [2024-06-15 12:45:46,352][1652491] Updated weights for policy 0, policy_version 99458 (0.0011) [2024-06-15 12:45:46,758][1651469] Signal inference workers to stop experience collection... (5200 times) [2024-06-15 12:45:46,780][1652491] InferenceWorker_p0-w0: stopping experience collection (5200 times) [2024-06-15 12:45:46,945][1651469] Signal inference workers to resume experience collection... (5200 times) [2024-06-15 12:45:46,946][1652491] InferenceWorker_p0-w0: resuming experience collection (5200 times) [2024-06-15 12:45:50,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 203849728. Throughput: 0: 12014.9. Samples: 51056128. Policy #0 lag: (min: 15.0, avg: 124.8, max: 271.0) [2024-06-15 12:45:50,956][1648985] Avg episode reward: [(0, '137.850')] [2024-06-15 12:45:51,029][1652491] Updated weights for policy 0, policy_version 99552 (0.0037) [2024-06-15 12:45:52,521][1652491] Updated weights for policy 0, policy_version 99604 (0.0038) [2024-06-15 12:45:55,687][1652491] Updated weights for policy 0, policy_version 99666 (0.0013) [2024-06-15 12:45:55,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 49151.8, 300 sec: 45986.3). Total num frames: 204144640. Throughput: 0: 11878.4. Samples: 51082240. 
Policy #0 lag: (min: 15.0, avg: 124.8, max: 271.0) [2024-06-15 12:45:55,956][1648985] Avg episode reward: [(0, '121.260')] [2024-06-15 12:45:56,455][1652491] Updated weights for policy 0, policy_version 99712 (0.0012) [2024-06-15 12:45:58,975][1652491] Updated weights for policy 0, policy_version 99776 (0.0014) [2024-06-15 12:46:00,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 204341248. Throughput: 0: 11969.9. Samples: 51156992. Policy #0 lag: (min: 15.0, avg: 124.8, max: 271.0) [2024-06-15 12:46:00,956][1648985] Avg episode reward: [(0, '131.320')] [2024-06-15 12:46:02,798][1652491] Updated weights for policy 0, policy_version 99812 (0.0011) [2024-06-15 12:46:04,066][1652491] Updated weights for policy 0, policy_version 99858 (0.0013) [2024-06-15 12:46:05,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 48059.8, 300 sec: 46208.5). Total num frames: 204603392. Throughput: 0: 11912.5. Samples: 51225600. Policy #0 lag: (min: 15.0, avg: 124.8, max: 271.0) [2024-06-15 12:46:05,956][1648985] Avg episode reward: [(0, '146.280')] [2024-06-15 12:46:06,792][1652491] Updated weights for policy 0, policy_version 99924 (0.0016) [2024-06-15 12:46:09,019][1652491] Updated weights for policy 0, policy_version 99969 (0.0016) [2024-06-15 12:46:10,243][1652491] Updated weights for policy 0, policy_version 100024 (0.0013) [2024-06-15 12:46:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.9, 300 sec: 46541.7). Total num frames: 204865536. Throughput: 0: 11855.7. Samples: 51259392. Policy #0 lag: (min: 15.0, avg: 124.8, max: 271.0) [2024-06-15 12:46:10,956][1648985] Avg episode reward: [(0, '145.440')] [2024-06-15 12:46:13,675][1652491] Updated weights for policy 0, policy_version 100068 (0.0020) [2024-06-15 12:46:14,620][1652491] Updated weights for policy 0, policy_version 100112 (0.0014) [2024-06-15 12:46:15,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48061.2, 300 sec: 46430.6). Total num frames: 205127680. 
Throughput: 0: 11719.1. Samples: 51333632. Policy #0 lag: (min: 15.0, avg: 124.8, max: 271.0) [2024-06-15 12:46:15,956][1648985] Avg episode reward: [(0, '143.590')] [2024-06-15 12:46:18,384][1652491] Updated weights for policy 0, policy_version 100176 (0.0016) [2024-06-15 12:46:19,385][1652491] Updated weights for policy 0, policy_version 100224 (0.0013) [2024-06-15 12:46:20,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 47513.4, 300 sec: 46541.7). Total num frames: 205357056. Throughput: 0: 11628.0. Samples: 51399680. Policy #0 lag: (min: 15.0, avg: 124.8, max: 271.0) [2024-06-15 12:46:20,957][1648985] Avg episode reward: [(0, '136.920')] [2024-06-15 12:46:20,960][1652491] Updated weights for policy 0, policy_version 100276 (0.0014) [2024-06-15 12:46:24,187][1652491] Updated weights for policy 0, policy_version 100304 (0.0012) [2024-06-15 12:46:25,363][1652491] Updated weights for policy 0, policy_version 100347 (0.0014) [2024-06-15 12:46:25,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 205553664. Throughput: 0: 11719.2. Samples: 51443200. Policy #0 lag: (min: 15.0, avg: 124.8, max: 271.0) [2024-06-15 12:46:25,956][1648985] Avg episode reward: [(0, '125.210')] [2024-06-15 12:46:26,918][1652491] Updated weights for policy 0, policy_version 100410 (0.0133) [2024-06-15 12:46:29,934][1652491] Updated weights for policy 0, policy_version 100479 (0.0014) [2024-06-15 12:46:30,433][1651469] Signal inference workers to stop experience collection... (5250 times) [2024-06-15 12:46:30,483][1652491] InferenceWorker_p0-w0: stopping experience collection (5250 times) [2024-06-15 12:46:30,712][1651469] Signal inference workers to resume experience collection... (5250 times) [2024-06-15 12:46:30,712][1652491] InferenceWorker_p0-w0: resuming experience collection (5250 times) [2024-06-15 12:46:30,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46421.2, 300 sec: 46319.5). Total num frames: 205815808. Throughput: 0: 11787.4. 
Samples: 51509760. Policy #0 lag: (min: 15.0, avg: 124.8, max: 271.0) [2024-06-15 12:46:30,956][1648985] Avg episode reward: [(0, '108.930')] [2024-06-15 12:46:31,619][1652491] Updated weights for policy 0, policy_version 100532 (0.0012) [2024-06-15 12:46:35,359][1652491] Updated weights for policy 0, policy_version 100576 (0.0012) [2024-06-15 12:46:35,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 206012416. Throughput: 0: 11639.5. Samples: 51579904. Policy #0 lag: (min: 2.0, avg: 100.1, max: 258.0) [2024-06-15 12:46:35,955][1648985] Avg episode reward: [(0, '116.090')] [2024-06-15 12:46:36,831][1652491] Updated weights for policy 0, policy_version 100611 (0.0053) [2024-06-15 12:46:40,762][1652491] Updated weights for policy 0, policy_version 100688 (0.0014) [2024-06-15 12:46:40,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46421.5, 300 sec: 45986.3). Total num frames: 206209024. Throughput: 0: 11764.7. Samples: 51611648. Policy #0 lag: (min: 2.0, avg: 100.1, max: 258.0) [2024-06-15 12:46:40,956][1648985] Avg episode reward: [(0, '116.310')] [2024-06-15 12:46:42,740][1652491] Updated weights for policy 0, policy_version 100752 (0.0012) [2024-06-15 12:46:43,916][1652491] Updated weights for policy 0, policy_version 100800 (0.0015) [2024-06-15 12:46:45,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 206438400. Throughput: 0: 11616.7. Samples: 51679744. Policy #0 lag: (min: 2.0, avg: 100.1, max: 258.0) [2024-06-15 12:46:45,956][1648985] Avg episode reward: [(0, '125.940')] [2024-06-15 12:46:48,818][1652491] Updated weights for policy 0, policy_version 100880 (0.0017) [2024-06-15 12:46:49,974][1652491] Updated weights for policy 0, policy_version 100927 (0.0013) [2024-06-15 12:46:50,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 47513.7, 300 sec: 46319.5). Total num frames: 206700544. Throughput: 0: 11491.6. Samples: 51742720. 
Policy #0 lag: (min: 2.0, avg: 100.1, max: 258.0) [2024-06-15 12:46:50,955][1648985] Avg episode reward: [(0, '145.760')] [2024-06-15 12:46:53,804][1652491] Updated weights for policy 0, policy_version 100984 (0.0017) [2024-06-15 12:46:55,236][1652491] Updated weights for policy 0, policy_version 101052 (0.0180) [2024-06-15 12:46:55,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 46967.7, 300 sec: 46652.7). Total num frames: 206962688. Throughput: 0: 11582.6. Samples: 51780608. Policy #0 lag: (min: 2.0, avg: 100.1, max: 258.0) [2024-06-15 12:46:55,956][1648985] Avg episode reward: [(0, '150.960')] [2024-06-15 12:46:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000101056_206962688.pth... [2024-06-15 12:46:56,076][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000095616_195821568.pth [2024-06-15 12:46:59,740][1652491] Updated weights for policy 0, policy_version 101104 (0.0014) [2024-06-15 12:47:00,978][1648985] Fps is (10 sec: 42499.4, 60 sec: 46403.4, 300 sec: 46315.9). Total num frames: 207126528. Throughput: 0: 11360.6. Samples: 51845120. Policy #0 lag: (min: 2.0, avg: 100.1, max: 258.0) [2024-06-15 12:47:00,979][1648985] Avg episode reward: [(0, '149.480')] [2024-06-15 12:47:01,541][1652491] Updated weights for policy 0, policy_version 101174 (0.0102) [2024-06-15 12:47:05,522][1652491] Updated weights for policy 0, policy_version 101247 (0.0013) [2024-06-15 12:47:05,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 207355904. Throughput: 0: 11389.2. Samples: 51912192. Policy #0 lag: (min: 2.0, avg: 100.1, max: 258.0) [2024-06-15 12:47:05,956][1648985] Avg episode reward: [(0, '147.320')] [2024-06-15 12:47:06,935][1652491] Updated weights for policy 0, policy_version 101306 (0.0011) [2024-06-15 12:47:10,965][1648985] Fps is (10 sec: 42653.8, 60 sec: 44775.3, 300 sec: 46206.8). Total num frames: 207552512. Throughput: 0: 11227.3. Samples: 51948544. 
Policy #0 lag: (min: 2.0, avg: 100.1, max: 258.0) [2024-06-15 12:47:10,966][1648985] Avg episode reward: [(0, '145.580')] [2024-06-15 12:47:11,674][1652491] Updated weights for policy 0, policy_version 101371 (0.0013) [2024-06-15 12:47:13,389][1652491] Updated weights for policy 0, policy_version 101439 (0.0015) [2024-06-15 12:47:15,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 43690.7, 300 sec: 45764.2). Total num frames: 207749120. Throughput: 0: 11207.1. Samples: 52014080. Policy #0 lag: (min: 2.0, avg: 100.1, max: 258.0) [2024-06-15 12:47:15,956][1648985] Avg episode reward: [(0, '140.010')] [2024-06-15 12:47:17,012][1651469] Signal inference workers to stop experience collection... (5300 times) [2024-06-15 12:47:17,045][1652491] InferenceWorker_p0-w0: stopping experience collection (5300 times) [2024-06-15 12:47:17,159][1651469] Signal inference workers to resume experience collection... (5300 times) [2024-06-15 12:47:17,160][1652491] InferenceWorker_p0-w0: resuming experience collection (5300 times) [2024-06-15 12:47:17,263][1652491] Updated weights for policy 0, policy_version 101488 (0.0114) [2024-06-15 12:47:18,688][1652491] Updated weights for policy 0, policy_version 101552 (0.0014) [2024-06-15 12:47:20,955][1648985] Fps is (10 sec: 45921.9, 60 sec: 44236.9, 300 sec: 46208.4). Total num frames: 208011264. Throughput: 0: 11229.8. Samples: 52085248. Policy #0 lag: (min: 2.0, avg: 100.1, max: 258.0) [2024-06-15 12:47:20,956][1648985] Avg episode reward: [(0, '122.300')] [2024-06-15 12:47:21,732][1652491] Updated weights for policy 0, policy_version 101584 (0.0012) [2024-06-15 12:47:23,822][1652491] Updated weights for policy 0, policy_version 101648 (0.0013) [2024-06-15 12:47:25,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 208273408. Throughput: 0: 11184.4. Samples: 52114944. 
Policy #0 lag: (min: 2.0, avg: 100.1, max: 258.0) [2024-06-15 12:47:25,956][1648985] Avg episode reward: [(0, '123.320')] [2024-06-15 12:47:28,002][1652491] Updated weights for policy 0, policy_version 101700 (0.0016) [2024-06-15 12:47:29,568][1652491] Updated weights for policy 0, policy_version 101767 (0.0014) [2024-06-15 12:47:30,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 208535552. Throughput: 0: 11218.6. Samples: 52184576. Policy #0 lag: (min: 2.0, avg: 100.1, max: 258.0) [2024-06-15 12:47:30,955][1648985] Avg episode reward: [(0, '132.670')] [2024-06-15 12:47:33,195][1652491] Updated weights for policy 0, policy_version 101842 (0.0016) [2024-06-15 12:47:34,413][1652491] Updated weights for policy 0, policy_version 101888 (0.0011) [2024-06-15 12:47:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45329.0, 300 sec: 46430.6). Total num frames: 208732160. Throughput: 0: 11332.2. Samples: 52252672. Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:47:35,956][1648985] Avg episode reward: [(0, '128.850')] [2024-06-15 12:47:36,336][1652491] Updated weights for policy 0, policy_version 101940 (0.0015) [2024-06-15 12:47:39,608][1652491] Updated weights for policy 0, policy_version 101984 (0.0012) [2024-06-15 12:47:40,901][1652491] Updated weights for policy 0, policy_version 102048 (0.0014) [2024-06-15 12:47:40,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46421.4, 300 sec: 46430.6). Total num frames: 208994304. Throughput: 0: 11446.0. Samples: 52295680. Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:47:40,956][1648985] Avg episode reward: [(0, '128.240')] [2024-06-15 12:47:44,708][1652491] Updated weights for policy 0, policy_version 102099 (0.0013) [2024-06-15 12:47:45,502][1652491] Updated weights for policy 0, policy_version 102139 (0.0017) [2024-06-15 12:47:45,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 209190912. 
Throughput: 0: 11531.6. Samples: 52363776. Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:47:45,956][1648985] Avg episode reward: [(0, '93.730')] [2024-06-15 12:47:50,128][1652491] Updated weights for policy 0, policy_version 102210 (0.0057) [2024-06-15 12:47:50,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 44782.8, 300 sec: 45986.3). Total num frames: 209387520. Throughput: 0: 11616.7. Samples: 52434944. Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:47:50,956][1648985] Avg episode reward: [(0, '111.160')] [2024-06-15 12:47:51,669][1652491] Updated weights for policy 0, policy_version 102281 (0.0150) [2024-06-15 12:47:52,641][1652491] Updated weights for policy 0, policy_version 102333 (0.0052) [2024-06-15 12:47:55,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 43690.6, 300 sec: 46097.4). Total num frames: 209584128. Throughput: 0: 11642.1. Samples: 52472320. Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:47:55,956][1648985] Avg episode reward: [(0, '138.010')] [2024-06-15 12:47:57,477][1652491] Updated weights for policy 0, policy_version 102400 (0.0118) [2024-06-15 12:47:57,890][1651469] Signal inference workers to stop experience collection... (5350 times) [2024-06-15 12:47:57,994][1652491] InferenceWorker_p0-w0: stopping experience collection (5350 times) [2024-06-15 12:47:58,171][1651469] Signal inference workers to resume experience collection... (5350 times) [2024-06-15 12:47:58,172][1652491] InferenceWorker_p0-w0: resuming experience collection (5350 times) [2024-06-15 12:47:58,717][1652491] Updated weights for policy 0, policy_version 102461 (0.0015) [2024-06-15 12:48:00,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 45346.6, 300 sec: 45764.1). Total num frames: 209846272. Throughput: 0: 11628.1. Samples: 52537344. 
Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:48:00,956][1648985] Avg episode reward: [(0, '123.790')] [2024-06-15 12:48:03,534][1652491] Updated weights for policy 0, policy_version 102546 (0.0037) [2024-06-15 12:48:04,307][1652491] Updated weights for policy 0, policy_version 102587 (0.0012) [2024-06-15 12:48:05,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 210108416. Throughput: 0: 11571.1. Samples: 52605952. Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:48:05,956][1648985] Avg episode reward: [(0, '133.920')] [2024-06-15 12:48:08,440][1652491] Updated weights for policy 0, policy_version 102628 (0.0014) [2024-06-15 12:48:10,405][1652491] Updated weights for policy 0, policy_version 102712 (0.0032) [2024-06-15 12:48:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 46975.5, 300 sec: 46097.4). Total num frames: 210370560. Throughput: 0: 11616.7. Samples: 52637696. Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:48:10,956][1648985] Avg episode reward: [(0, '140.810')] [2024-06-15 12:48:13,458][1652491] Updated weights for policy 0, policy_version 102752 (0.0012) [2024-06-15 12:48:15,021][1652491] Updated weights for policy 0, policy_version 102816 (0.0123) [2024-06-15 12:48:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.5, 300 sec: 46430.6). Total num frames: 210632704. Throughput: 0: 11559.7. Samples: 52704768. Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:48:15,956][1648985] Avg episode reward: [(0, '127.860')] [2024-06-15 12:48:19,598][1652491] Updated weights for policy 0, policy_version 102880 (0.0033) [2024-06-15 12:48:20,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 210796544. Throughput: 0: 11491.6. Samples: 52769792. 
Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:48:20,955][1648985] Avg episode reward: [(0, '138.500')] [2024-06-15 12:48:21,817][1652491] Updated weights for policy 0, policy_version 102969 (0.0014) [2024-06-15 12:48:25,955][1648985] Fps is (10 sec: 36045.5, 60 sec: 45329.0, 300 sec: 46097.4). Total num frames: 210993152. Throughput: 0: 11366.4. Samples: 52807168. Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:48:25,956][1648985] Avg episode reward: [(0, '128.530')] [2024-06-15 12:48:26,309][1652491] Updated weights for policy 0, policy_version 103040 (0.0013) [2024-06-15 12:48:27,934][1652491] Updated weights for policy 0, policy_version 103101 (0.0013) [2024-06-15 12:48:30,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 211156992. Throughput: 0: 11286.8. Samples: 52871680. Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:48:30,956][1648985] Avg episode reward: [(0, '108.920')] [2024-06-15 12:48:32,484][1652491] Updated weights for policy 0, policy_version 103166 (0.0015) [2024-06-15 12:48:34,100][1652491] Updated weights for policy 0, policy_version 103224 (0.0019) [2024-06-15 12:48:35,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44782.9, 300 sec: 45764.1). Total num frames: 211419136. Throughput: 0: 11173.0. Samples: 52937728. Policy #0 lag: (min: 15.0, avg: 149.4, max: 271.0) [2024-06-15 12:48:35,956][1648985] Avg episode reward: [(0, '109.580')] [2024-06-15 12:48:37,766][1652491] Updated weights for policy 0, policy_version 103280 (0.0012) [2024-06-15 12:48:39,424][1652491] Updated weights for policy 0, policy_version 103347 (0.0014) [2024-06-15 12:48:40,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 44782.9, 300 sec: 46097.4). Total num frames: 211681280. Throughput: 0: 11036.5. Samples: 52968960. 
Policy #0 lag: (min: 5.0, avg: 98.3, max: 245.0) [2024-06-15 12:48:40,956][1648985] Avg episode reward: [(0, '115.320')] [2024-06-15 12:48:41,815][1651469] Signal inference workers to stop experience collection... (5400 times) [2024-06-15 12:48:41,867][1652491] InferenceWorker_p0-w0: stopping experience collection (5400 times) [2024-06-15 12:48:42,074][1651469] Signal inference workers to resume experience collection... (5400 times) [2024-06-15 12:48:42,075][1652491] InferenceWorker_p0-w0: resuming experience collection (5400 times) [2024-06-15 12:48:42,505][1652491] Updated weights for policy 0, policy_version 103392 (0.0014) [2024-06-15 12:48:43,113][1652491] Updated weights for policy 0, policy_version 103417 (0.0037) [2024-06-15 12:48:45,078][1652491] Updated weights for policy 0, policy_version 103456 (0.0012) [2024-06-15 12:48:45,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 211943424. Throughput: 0: 11173.0. Samples: 53040128. Policy #0 lag: (min: 5.0, avg: 98.3, max: 245.0) [2024-06-15 12:48:45,956][1648985] Avg episode reward: [(0, '129.880')] [2024-06-15 12:48:47,966][1652491] Updated weights for policy 0, policy_version 103490 (0.0017) [2024-06-15 12:48:50,299][1652491] Updated weights for policy 0, policy_version 103585 (0.0096) [2024-06-15 12:48:50,797][1652491] Updated weights for policy 0, policy_version 103616 (0.0012) [2024-06-15 12:48:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 212205568. Throughput: 0: 11070.7. Samples: 53104128. Policy #0 lag: (min: 5.0, avg: 98.3, max: 245.0) [2024-06-15 12:48:50,956][1648985] Avg episode reward: [(0, '122.290')] [2024-06-15 12:48:55,737][1652491] Updated weights for policy 0, policy_version 103681 (0.0094) [2024-06-15 12:48:55,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 212336640. Throughput: 0: 11355.0. Samples: 53148672. 
Policy #0 lag: (min: 5.0, avg: 98.3, max: 245.0) [2024-06-15 12:48:55,956][1648985] Avg episode reward: [(0, '117.960')] [2024-06-15 12:48:56,382][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000103712_212402176.pth... [2024-06-15 12:48:56,539][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000098240_201195520.pth [2024-06-15 12:48:58,920][1652491] Updated weights for policy 0, policy_version 103750 (0.0014) [2024-06-15 12:49:00,136][1652491] Updated weights for policy 0, policy_version 103799 (0.0015) [2024-06-15 12:49:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 212631552. Throughput: 0: 11355.1. Samples: 53215744. Policy #0 lag: (min: 5.0, avg: 98.3, max: 245.0) [2024-06-15 12:49:00,956][1648985] Avg episode reward: [(0, '120.450')] [2024-06-15 12:49:01,577][1652491] Updated weights for policy 0, policy_version 103856 (0.0015) [2024-06-15 12:49:05,140][1652491] Updated weights for policy 0, policy_version 103920 (0.0013) [2024-06-15 12:49:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.4, 300 sec: 46097.4). Total num frames: 212860928. Throughput: 0: 11537.1. Samples: 53288960. Policy #0 lag: (min: 5.0, avg: 98.3, max: 245.0) [2024-06-15 12:49:05,956][1648985] Avg episode reward: [(0, '110.560')] [2024-06-15 12:49:08,032][1652491] Updated weights for policy 0, policy_version 103986 (0.0013) [2024-06-15 12:49:10,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44782.9, 300 sec: 45875.2). Total num frames: 213057536. Throughput: 0: 11525.7. Samples: 53325824. 
Policy #0 lag: (min: 5.0, avg: 98.3, max: 245.0) [2024-06-15 12:49:10,956][1648985] Avg episode reward: [(0, '110.640')] [2024-06-15 12:49:11,033][1652491] Updated weights for policy 0, policy_version 104048 (0.0015) [2024-06-15 12:49:12,077][1652491] Updated weights for policy 0, policy_version 104067 (0.0013) [2024-06-15 12:49:13,438][1652491] Updated weights for policy 0, policy_version 104128 (0.0010) [2024-06-15 12:49:15,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 213319680. Throughput: 0: 11605.3. Samples: 53393920. Policy #0 lag: (min: 5.0, avg: 98.3, max: 245.0) [2024-06-15 12:49:15,956][1648985] Avg episode reward: [(0, '131.210')] [2024-06-15 12:49:16,217][1652491] Updated weights for policy 0, policy_version 104184 (0.0012) [2024-06-15 12:49:19,264][1652491] Updated weights for policy 0, policy_version 104253 (0.0013) [2024-06-15 12:49:20,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45329.0, 300 sec: 45764.1). Total num frames: 213516288. Throughput: 0: 11662.2. Samples: 53462528. Policy #0 lag: (min: 5.0, avg: 98.3, max: 245.0) [2024-06-15 12:49:20,956][1648985] Avg episode reward: [(0, '136.920')] [2024-06-15 12:49:22,574][1652491] Updated weights for policy 0, policy_version 104309 (0.0013) [2024-06-15 12:49:24,616][1652491] Updated weights for policy 0, policy_version 104352 (0.0023) [2024-06-15 12:49:25,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 213778432. Throughput: 0: 11673.6. Samples: 53494272. Policy #0 lag: (min: 5.0, avg: 98.3, max: 245.0) [2024-06-15 12:49:25,956][1648985] Avg episode reward: [(0, '130.260')] [2024-06-15 12:49:26,427][1651469] Signal inference workers to stop experience collection... (5450 times) [2024-06-15 12:49:26,469][1652491] InferenceWorker_p0-w0: stopping experience collection (5450 times) [2024-06-15 12:49:26,663][1651469] Signal inference workers to resume experience collection... 
(5450 times) [2024-06-15 12:49:26,664][1652491] InferenceWorker_p0-w0: resuming experience collection (5450 times) [2024-06-15 12:49:26,833][1652491] Updated weights for policy 0, policy_version 104404 (0.0017) [2024-06-15 12:49:27,644][1652491] Updated weights for policy 0, policy_version 104447 (0.0012) [2024-06-15 12:49:30,570][1652491] Updated weights for policy 0, policy_version 104511 (0.0013) [2024-06-15 12:49:30,958][1648985] Fps is (10 sec: 52415.5, 60 sec: 48057.7, 300 sec: 46208.0). Total num frames: 214040576. Throughput: 0: 11684.3. Samples: 53565952. Policy #0 lag: (min: 5.0, avg: 98.3, max: 245.0) [2024-06-15 12:49:30,959][1648985] Avg episode reward: [(0, '125.360')] [2024-06-15 12:49:34,244][1652491] Updated weights for policy 0, policy_version 104560 (0.0012) [2024-06-15 12:49:35,449][1652491] Updated weights for policy 0, policy_version 104592 (0.0013) [2024-06-15 12:49:35,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.4, 300 sec: 46097.4). Total num frames: 214237184. Throughput: 0: 11787.4. Samples: 53634560. Policy #0 lag: (min: 5.0, avg: 98.3, max: 245.0) [2024-06-15 12:49:35,956][1648985] Avg episode reward: [(0, '113.240')] [2024-06-15 12:49:37,983][1652491] Updated weights for policy 0, policy_version 104658 (0.0012) [2024-06-15 12:49:38,748][1652491] Updated weights for policy 0, policy_version 104703 (0.0014) [2024-06-15 12:49:40,955][1648985] Fps is (10 sec: 39331.4, 60 sec: 45875.1, 300 sec: 46097.4). Total num frames: 214433792. Throughput: 0: 11502.9. Samples: 53666304. Policy #0 lag: (min: 2.0, avg: 133.2, max: 258.0) [2024-06-15 12:49:40,956][1648985] Avg episode reward: [(0, '112.150')] [2024-06-15 12:49:41,736][1652491] Updated weights for policy 0, policy_version 104754 (0.0015) [2024-06-15 12:49:43,879][1652491] Updated weights for policy 0, policy_version 104800 (0.0040) [2024-06-15 12:49:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 214695936. Throughput: 0: 11764.6. 
Samples: 53745152. Policy #0 lag: (min: 2.0, avg: 133.2, max: 258.0) [2024-06-15 12:49:45,956][1648985] Avg episode reward: [(0, '118.660')] [2024-06-15 12:49:46,366][1652491] Updated weights for policy 0, policy_version 104849 (0.0041) [2024-06-15 12:49:47,484][1652491] Updated weights for policy 0, policy_version 104892 (0.0012) [2024-06-15 12:49:48,855][1652491] Updated weights for policy 0, policy_version 104933 (0.0015) [2024-06-15 12:49:50,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 214958080. Throughput: 0: 11798.8. Samples: 53819904. Policy #0 lag: (min: 2.0, avg: 133.2, max: 258.0) [2024-06-15 12:49:50,955][1648985] Avg episode reward: [(0, '129.370')] [2024-06-15 12:49:51,789][1652491] Updated weights for policy 0, policy_version 104977 (0.0014) [2024-06-15 12:49:54,241][1652491] Updated weights for policy 0, policy_version 105031 (0.0012) [2024-06-15 12:49:55,389][1652491] Updated weights for policy 0, policy_version 105082 (0.0012) [2024-06-15 12:49:55,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 215220224. Throughput: 0: 11832.9. Samples: 53858304. Policy #0 lag: (min: 2.0, avg: 133.2, max: 258.0) [2024-06-15 12:49:55,955][1648985] Avg episode reward: [(0, '134.550')] [2024-06-15 12:49:57,609][1652491] Updated weights for policy 0, policy_version 105136 (0.0014) [2024-06-15 12:49:59,967][1652491] Updated weights for policy 0, policy_version 105213 (0.0113) [2024-06-15 12:50:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 215482368. Throughput: 0: 11673.6. Samples: 53919232. 
Policy #0 lag: (min: 2.0, avg: 133.2, max: 258.0) [2024-06-15 12:50:00,956][1648985] Avg episode reward: [(0, '122.740')] [2024-06-15 12:50:03,304][1652491] Updated weights for policy 0, policy_version 105264 (0.0012) [2024-06-15 12:50:05,762][1652491] Updated weights for policy 0, policy_version 105312 (0.0014) [2024-06-15 12:50:05,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 215678976. Throughput: 0: 11889.8. Samples: 53997568. Policy #0 lag: (min: 2.0, avg: 133.2, max: 258.0) [2024-06-15 12:50:05,956][1648985] Avg episode reward: [(0, '105.250')] [2024-06-15 12:50:09,143][1652491] Updated weights for policy 0, policy_version 105384 (0.0013) [2024-06-15 12:50:10,396][1651469] Signal inference workers to stop experience collection... (5500 times) [2024-06-15 12:50:10,442][1652491] InferenceWorker_p0-w0: stopping experience collection (5500 times) [2024-06-15 12:50:10,628][1651469] Signal inference workers to resume experience collection... (5500 times) [2024-06-15 12:50:10,629][1652491] InferenceWorker_p0-w0: resuming experience collection (5500 times) [2024-06-15 12:50:10,786][1652491] Updated weights for policy 0, policy_version 105442 (0.0014) [2024-06-15 12:50:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.7, 300 sec: 46430.9). Total num frames: 215941120. Throughput: 0: 12003.6. Samples: 54034432. Policy #0 lag: (min: 2.0, avg: 133.2, max: 258.0) [2024-06-15 12:50:10,956][1648985] Avg episode reward: [(0, '112.480')] [2024-06-15 12:50:13,711][1652491] Updated weights for policy 0, policy_version 105492 (0.0012) [2024-06-15 12:50:15,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 216137728. Throughput: 0: 11936.0. Samples: 54103040. 
Policy #0 lag: (min: 2.0, avg: 133.2, max: 258.0) [2024-06-15 12:50:15,955][1648985] Avg episode reward: [(0, '131.840')] [2024-06-15 12:50:16,626][1652491] Updated weights for policy 0, policy_version 105553 (0.0012) [2024-06-15 12:50:19,742][1652491] Updated weights for policy 0, policy_version 105616 (0.0013) [2024-06-15 12:50:20,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 216367104. Throughput: 0: 11935.3. Samples: 54171648. Policy #0 lag: (min: 2.0, avg: 133.2, max: 258.0) [2024-06-15 12:50:20,956][1648985] Avg episode reward: [(0, '125.120')] [2024-06-15 12:50:21,570][1652491] Updated weights for policy 0, policy_version 105681 (0.0026) [2024-06-15 12:50:24,488][1652491] Updated weights for policy 0, policy_version 105730 (0.0012) [2024-06-15 12:50:25,652][1652491] Updated weights for policy 0, policy_version 105792 (0.0013) [2024-06-15 12:50:25,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 216662016. Throughput: 0: 12060.4. Samples: 54209024. Policy #0 lag: (min: 2.0, avg: 133.2, max: 258.0) [2024-06-15 12:50:25,956][1648985] Avg episode reward: [(0, '109.450')] [2024-06-15 12:50:28,445][1652491] Updated weights for policy 0, policy_version 105847 (0.0017) [2024-06-15 12:50:30,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46423.4, 300 sec: 46097.4). Total num frames: 216825856. Throughput: 0: 11946.7. Samples: 54282752. Policy #0 lag: (min: 2.0, avg: 133.2, max: 258.0) [2024-06-15 12:50:30,955][1648985] Avg episode reward: [(0, '102.250')] [2024-06-15 12:50:32,004][1652491] Updated weights for policy 0, policy_version 105920 (0.0092) [2024-06-15 12:50:33,361][1652491] Updated weights for policy 0, policy_version 105980 (0.0014) [2024-06-15 12:50:35,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 47513.7, 300 sec: 46319.6). Total num frames: 217088000. Throughput: 0: 11844.3. Samples: 54352896. 
Policy #0 lag: (min: 2.0, avg: 133.2, max: 258.0) [2024-06-15 12:50:35,955][1648985] Avg episode reward: [(0, '105.290')] [2024-06-15 12:50:36,733][1652491] Updated weights for policy 0, policy_version 106048 (0.0014) [2024-06-15 12:50:39,969][1652491] Updated weights for policy 0, policy_version 106110 (0.0012) [2024-06-15 12:50:40,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 217317376. Throughput: 0: 11764.6. Samples: 54387712. Policy #0 lag: (min: 15.0, avg: 120.3, max: 271.0) [2024-06-15 12:50:40,956][1648985] Avg episode reward: [(0, '118.090')] [2024-06-15 12:50:42,665][1652491] Updated weights for policy 0, policy_version 106160 (0.0012) [2024-06-15 12:50:44,288][1652491] Updated weights for policy 0, policy_version 106232 (0.0014) [2024-06-15 12:50:45,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 217579520. Throughput: 0: 11878.4. Samples: 54453760. Policy #0 lag: (min: 15.0, avg: 120.3, max: 271.0) [2024-06-15 12:50:45,955][1648985] Avg episode reward: [(0, '117.850')] [2024-06-15 12:50:48,026][1652491] Updated weights for policy 0, policy_version 106272 (0.0014) [2024-06-15 12:50:49,284][1652491] Updated weights for policy 0, policy_version 106328 (0.0013) [2024-06-15 12:50:50,085][1652491] Updated weights for policy 0, policy_version 106366 (0.0023) [2024-06-15 12:50:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 217841664. Throughput: 0: 11832.9. Samples: 54530048. Policy #0 lag: (min: 15.0, avg: 120.3, max: 271.0) [2024-06-15 12:50:50,956][1648985] Avg episode reward: [(0, '101.870')] [2024-06-15 12:50:53,166][1651469] Signal inference workers to stop experience collection... (5550 times) [2024-06-15 12:50:53,216][1652491] InferenceWorker_p0-w0: stopping experience collection (5550 times) [2024-06-15 12:50:53,491][1651469] Signal inference workers to resume experience collection... 
(5550 times) [2024-06-15 12:50:53,492][1652491] InferenceWorker_p0-w0: resuming experience collection (5550 times) [2024-06-15 12:50:53,757][1652491] Updated weights for policy 0, policy_version 106428 (0.0013) [2024-06-15 12:50:55,428][1652491] Updated weights for policy 0, policy_version 106487 (0.0066) [2024-06-15 12:50:55,955][1648985] Fps is (10 sec: 52426.7, 60 sec: 48059.5, 300 sec: 46652.7). Total num frames: 218103808. Throughput: 0: 11662.1. Samples: 54559232. Policy #0 lag: (min: 15.0, avg: 120.3, max: 271.0) [2024-06-15 12:50:55,956][1648985] Avg episode reward: [(0, '105.040')] [2024-06-15 12:50:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000106496_218103808.pth... [2024-06-15 12:50:56,018][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000101056_206962688.pth [2024-06-15 12:50:56,022][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000106496_218103808.pth [2024-06-15 12:50:59,827][1652491] Updated weights for policy 0, policy_version 106544 (0.0033) [2024-06-15 12:51:00,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 218234880. Throughput: 0: 11753.2. Samples: 54631936. Policy #0 lag: (min: 15.0, avg: 120.3, max: 271.0) [2024-06-15 12:51:00,956][1648985] Avg episode reward: [(0, '114.250')] [2024-06-15 12:51:01,409][1652491] Updated weights for policy 0, policy_version 106579 (0.0039) [2024-06-15 12:51:04,592][1652491] Updated weights for policy 0, policy_version 106645 (0.0012) [2024-06-15 12:51:05,955][1648985] Fps is (10 sec: 39323.1, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 218497024. Throughput: 0: 11616.7. Samples: 54694400. 
Policy #0 lag: (min: 15.0, avg: 120.3, max: 271.0) [2024-06-15 12:51:05,955][1648985] Avg episode reward: [(0, '111.430')] [2024-06-15 12:51:06,398][1652491] Updated weights for policy 0, policy_version 106709 (0.0014) [2024-06-15 12:51:10,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 44783.0, 300 sec: 45764.1). Total num frames: 218628096. Throughput: 0: 11548.5. Samples: 54728704. Policy #0 lag: (min: 15.0, avg: 120.3, max: 271.0) [2024-06-15 12:51:10,956][1648985] Avg episode reward: [(0, '104.910')] [2024-06-15 12:51:11,062][1652491] Updated weights for policy 0, policy_version 106768 (0.0015) [2024-06-15 12:51:12,201][1652491] Updated weights for policy 0, policy_version 106810 (0.0013) [2024-06-15 12:51:14,162][1652491] Updated weights for policy 0, policy_version 106872 (0.0014) [2024-06-15 12:51:15,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 45875.1, 300 sec: 45875.2). Total num frames: 218890240. Throughput: 0: 11366.4. Samples: 54794240. Policy #0 lag: (min: 15.0, avg: 120.3, max: 271.0) [2024-06-15 12:51:15,956][1648985] Avg episode reward: [(0, '105.690')] [2024-06-15 12:51:17,223][1652491] Updated weights for policy 0, policy_version 106913 (0.0012) [2024-06-15 12:51:18,980][1652491] Updated weights for policy 0, policy_version 106994 (0.0012) [2024-06-15 12:51:20,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46421.3, 300 sec: 46097.4). Total num frames: 219152384. Throughput: 0: 11366.4. Samples: 54864384. Policy #0 lag: (min: 15.0, avg: 120.3, max: 271.0) [2024-06-15 12:51:20,956][1648985] Avg episode reward: [(0, '119.700')] [2024-06-15 12:51:22,516][1652491] Updated weights for policy 0, policy_version 107013 (0.0013) [2024-06-15 12:51:23,922][1652491] Updated weights for policy 0, policy_version 107072 (0.0014) [2024-06-15 12:51:25,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45329.2, 300 sec: 45986.3). Total num frames: 219381760. Throughput: 0: 11298.1. Samples: 54896128. 
Policy #0 lag: (min: 15.0, avg: 120.3, max: 271.0) [2024-06-15 12:51:25,956][1648985] Avg episode reward: [(0, '125.510')] [2024-06-15 12:51:28,173][1652491] Updated weights for policy 0, policy_version 107142 (0.0016) [2024-06-15 12:51:30,107][1652491] Updated weights for policy 0, policy_version 107232 (0.0012) [2024-06-15 12:51:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 219676672. Throughput: 0: 11218.5. Samples: 54958592. Policy #0 lag: (min: 15.0, avg: 120.3, max: 271.0) [2024-06-15 12:51:30,956][1648985] Avg episode reward: [(0, '109.800')] [2024-06-15 12:51:34,738][1652491] Updated weights for policy 0, policy_version 107266 (0.0012) [2024-06-15 12:51:35,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 219774976. Throughput: 0: 11082.0. Samples: 55028736. Policy #0 lag: (min: 15.0, avg: 120.3, max: 271.0) [2024-06-15 12:51:35,956][1648985] Avg episode reward: [(0, '124.490')] [2024-06-15 12:51:36,097][1652491] Updated weights for policy 0, policy_version 107325 (0.0012) [2024-06-15 12:51:37,315][1652491] Updated weights for policy 0, policy_version 107363 (0.0013) [2024-06-15 12:51:39,270][1651469] Signal inference workers to stop experience collection... (5600 times) [2024-06-15 12:51:39,309][1652491] InferenceWorker_p0-w0: stopping experience collection (5600 times) [2024-06-15 12:51:39,471][1651469] Signal inference workers to resume experience collection... (5600 times) [2024-06-15 12:51:39,487][1652491] InferenceWorker_p0-w0: resuming experience collection (5600 times) [2024-06-15 12:51:39,987][1652491] Updated weights for policy 0, policy_version 107450 (0.0012) [2024-06-15 12:51:40,956][1648985] Fps is (10 sec: 42596.8, 60 sec: 46421.1, 300 sec: 46319.5). Total num frames: 220102656. Throughput: 0: 11229.8. Samples: 55064576. 
Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:51:40,957][1648985] Avg episode reward: [(0, '130.910')] [2024-06-15 12:51:41,650][1652491] Updated weights for policy 0, policy_version 107504 (0.0015) [2024-06-15 12:51:45,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 220200960. Throughput: 0: 11275.4. Samples: 55139328. Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:51:45,955][1648985] Avg episode reward: [(0, '124.680')] [2024-06-15 12:51:45,982][1652491] Updated weights for policy 0, policy_version 107536 (0.0012) [2024-06-15 12:51:48,037][1652491] Updated weights for policy 0, policy_version 107600 (0.0012) [2024-06-15 12:51:49,833][1652491] Updated weights for policy 0, policy_version 107651 (0.0015) [2024-06-15 12:51:50,890][1652491] Updated weights for policy 0, policy_version 107712 (0.0093) [2024-06-15 12:51:50,955][1648985] Fps is (10 sec: 49153.9, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 220594176. Throughput: 0: 11320.9. Samples: 55203840. Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:51:50,956][1648985] Avg episode reward: [(0, '110.810')] [2024-06-15 12:51:53,128][1652491] Updated weights for policy 0, policy_version 107761 (0.0013) [2024-06-15 12:51:55,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 43690.8, 300 sec: 46101.0). Total num frames: 220725248. Throughput: 0: 11400.5. Samples: 55241728. Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:51:55,956][1648985] Avg episode reward: [(0, '95.630')] [2024-06-15 12:51:57,177][1652491] Updated weights for policy 0, policy_version 107796 (0.0012) [2024-06-15 12:51:58,178][1652491] Updated weights for policy 0, policy_version 107838 (0.0038) [2024-06-15 12:52:00,693][1652491] Updated weights for policy 0, policy_version 107888 (0.0016) [2024-06-15 12:52:00,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 220954624. 
Throughput: 0: 11594.0. Samples: 55315968. Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:52:00,955][1648985] Avg episode reward: [(0, '101.980')] [2024-06-15 12:52:02,277][1652491] Updated weights for policy 0, policy_version 107967 (0.0105) [2024-06-15 12:52:04,036][1652491] Updated weights for policy 0, policy_version 108018 (0.0015) [2024-06-15 12:52:05,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 45875.2, 300 sec: 46432.2). Total num frames: 221249536. Throughput: 0: 11446.1. Samples: 55379456. Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:52:05,956][1648985] Avg episode reward: [(0, '107.430')] [2024-06-15 12:52:08,645][1652491] Updated weights for policy 0, policy_version 108065 (0.0012) [2024-06-15 12:52:10,955][1648985] Fps is (10 sec: 42597.2, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 221380608. Throughput: 0: 11491.5. Samples: 55413248. Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:52:10,956][1648985] Avg episode reward: [(0, '108.660')] [2024-06-15 12:52:11,824][1652491] Updated weights for policy 0, policy_version 108128 (0.0018) [2024-06-15 12:52:13,502][1652491] Updated weights for policy 0, policy_version 108192 (0.0019) [2024-06-15 12:52:15,214][1652491] Updated weights for policy 0, policy_version 108240 (0.0012) [2024-06-15 12:52:15,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 221708288. Throughput: 0: 11582.6. Samples: 55479808. Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:52:15,956][1648985] Avg episode reward: [(0, '115.000')] [2024-06-15 12:52:20,058][1652491] Updated weights for policy 0, policy_version 108304 (0.0013) [2024-06-15 12:52:20,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 221839360. Throughput: 0: 11537.1. Samples: 55547904. 
Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:52:20,956][1648985] Avg episode reward: [(0, '119.090')] [2024-06-15 12:52:23,055][1652491] Updated weights for policy 0, policy_version 108368 (0.0013) [2024-06-15 12:52:24,297][1651469] Signal inference workers to stop experience collection... (5650 times) [2024-06-15 12:52:24,324][1652491] InferenceWorker_p0-w0: stopping experience collection (5650 times) [2024-06-15 12:52:24,535][1651469] Signal inference workers to resume experience collection... (5650 times) [2024-06-15 12:52:24,536][1652491] InferenceWorker_p0-w0: resuming experience collection (5650 times) [2024-06-15 12:52:24,637][1652491] Updated weights for policy 0, policy_version 108432 (0.0110) [2024-06-15 12:52:25,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46421.2, 300 sec: 46208.4). Total num frames: 222167040. Throughput: 0: 11514.4. Samples: 55582720. Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:52:25,956][1648985] Avg episode reward: [(0, '128.070')] [2024-06-15 12:52:26,564][1652491] Updated weights for policy 0, policy_version 108496 (0.0013) [2024-06-15 12:52:27,749][1652491] Updated weights for policy 0, policy_version 108539 (0.0015) [2024-06-15 12:52:30,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 222298112. Throughput: 0: 11366.4. Samples: 55650816. Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:52:30,956][1648985] Avg episode reward: [(0, '119.370')] [2024-06-15 12:52:32,550][1652491] Updated weights for policy 0, policy_version 108577 (0.0046) [2024-06-15 12:52:34,627][1652491] Updated weights for policy 0, policy_version 108627 (0.0012) [2024-06-15 12:52:35,941][1652491] Updated weights for policy 0, policy_version 108688 (0.0013) [2024-06-15 12:52:35,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 46967.5, 300 sec: 46097.4). Total num frames: 222593024. Throughput: 0: 11491.6. Samples: 55720960. 
Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:52:35,956][1648985] Avg episode reward: [(0, '117.360')] [2024-06-15 12:52:37,704][1652491] Updated weights for policy 0, policy_version 108737 (0.0012) [2024-06-15 12:52:38,960][1652491] Updated weights for policy 0, policy_version 108799 (0.0013) [2024-06-15 12:52:40,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45329.3, 300 sec: 46208.4). Total num frames: 222822400. Throughput: 0: 11377.8. Samples: 55753728. Policy #0 lag: (min: 47.0, avg: 165.4, max: 303.0) [2024-06-15 12:52:40,956][1648985] Avg episode reward: [(0, '116.650')] [2024-06-15 12:52:43,345][1652491] Updated weights for policy 0, policy_version 108848 (0.0015) [2024-06-15 12:52:45,720][1652491] Updated weights for policy 0, policy_version 108897 (0.0012) [2024-06-15 12:52:45,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 223019008. Throughput: 0: 11514.3. Samples: 55834112. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 12:52:45,956][1648985] Avg episode reward: [(0, '124.850')] [2024-06-15 12:52:47,081][1652491] Updated weights for policy 0, policy_version 108964 (0.0011) [2024-06-15 12:52:49,140][1652491] Updated weights for policy 0, policy_version 109012 (0.0014) [2024-06-15 12:52:50,082][1652491] Updated weights for policy 0, policy_version 109054 (0.0056) [2024-06-15 12:52:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 223346688. Throughput: 0: 11594.0. Samples: 55901184. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 12:52:50,955][1648985] Avg episode reward: [(0, '120.700')] [2024-06-15 12:52:54,926][1652491] Updated weights for policy 0, policy_version 109112 (0.0012) [2024-06-15 12:52:55,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 223477760. Throughput: 0: 11673.6. Samples: 55938560. 
Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 12:52:55,956][1648985] Avg episode reward: [(0, '116.100')] [2024-06-15 12:52:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000109120_223477760.pth... [2024-06-15 12:52:56,010][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000103712_212402176.pth [2024-06-15 12:52:56,743][1652491] Updated weights for policy 0, policy_version 109138 (0.0013) [2024-06-15 12:52:58,975][1652491] Updated weights for policy 0, policy_version 109232 (0.0011) [2024-06-15 12:53:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46421.3, 300 sec: 46208.5). Total num frames: 223739904. Throughput: 0: 11514.3. Samples: 55997952. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 12:53:00,956][1648985] Avg episode reward: [(0, '125.560')] [2024-06-15 12:53:01,674][1652491] Updated weights for policy 0, policy_version 109282 (0.0014) [2024-06-15 12:53:05,955][1648985] Fps is (10 sec: 42599.6, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 223903744. Throughput: 0: 11582.6. Samples: 56069120. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 12:53:05,956][1648985] Avg episode reward: [(0, '144.470')] [2024-06-15 12:53:06,089][1652491] Updated weights for policy 0, policy_version 109330 (0.0014) [2024-06-15 12:53:08,235][1652491] Updated weights for policy 0, policy_version 109378 (0.0013) [2024-06-15 12:53:08,952][1651469] Signal inference workers to stop experience collection... (5700 times) [2024-06-15 12:53:09,022][1652491] InferenceWorker_p0-w0: stopping experience collection (5700 times) [2024-06-15 12:53:09,220][1651469] Signal inference workers to resume experience collection... 
(5700 times) [2024-06-15 12:53:09,222][1652491] InferenceWorker_p0-w0: resuming experience collection (5700 times) [2024-06-15 12:53:10,172][1652491] Updated weights for policy 0, policy_version 109472 (0.0111) [2024-06-15 12:53:10,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48059.8, 300 sec: 46208.5). Total num frames: 224264192. Throughput: 0: 11525.7. Samples: 56101376. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 12:53:10,956][1648985] Avg episode reward: [(0, '152.500')] [2024-06-15 12:53:12,669][1652491] Updated weights for policy 0, policy_version 109526 (0.0013) [2024-06-15 12:53:15,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 44783.0, 300 sec: 46097.4). Total num frames: 224395264. Throughput: 0: 11525.7. Samples: 56169472. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 12:53:15,956][1648985] Avg episode reward: [(0, '155.240')] [2024-06-15 12:53:17,902][1652491] Updated weights for policy 0, policy_version 109587 (0.0013) [2024-06-15 12:53:18,920][1652491] Updated weights for policy 0, policy_version 109631 (0.0013) [2024-06-15 12:53:20,536][1652491] Updated weights for policy 0, policy_version 109692 (0.0014) [2024-06-15 12:53:20,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 46967.5, 300 sec: 46319.5). Total num frames: 224657408. Throughput: 0: 11446.0. Samples: 56236032. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 12:53:20,956][1648985] Avg episode reward: [(0, '123.250')] [2024-06-15 12:53:22,181][1652491] Updated weights for policy 0, policy_version 109744 (0.0020) [2024-06-15 12:53:23,696][1652491] Updated weights for policy 0, policy_version 109792 (0.0011) [2024-06-15 12:53:25,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 224919552. Throughput: 0: 11434.7. Samples: 56268288. 
Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 12:53:25,956][1648985] Avg episode reward: [(0, '111.340')] [2024-06-15 12:53:29,710][1652491] Updated weights for policy 0, policy_version 109840 (0.0012) [2024-06-15 12:53:30,887][1652491] Updated weights for policy 0, policy_version 109882 (0.0013) [2024-06-15 12:53:30,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 225017856. Throughput: 0: 11343.7. Samples: 56344576. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 12:53:30,955][1648985] Avg episode reward: [(0, '115.210')] [2024-06-15 12:53:32,419][1652491] Updated weights for policy 0, policy_version 109952 (0.0013) [2024-06-15 12:53:33,702][1652491] Updated weights for policy 0, policy_version 110010 (0.0012) [2024-06-15 12:53:35,394][1652491] Updated weights for policy 0, policy_version 110049 (0.0045) [2024-06-15 12:53:35,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46967.4, 300 sec: 46541.7). Total num frames: 225411072. Throughput: 0: 11173.0. Samples: 56403968. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 12:53:35,956][1648985] Avg episode reward: [(0, '141.340')] [2024-06-15 12:53:40,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 225476608. Throughput: 0: 11195.8. Samples: 56442368. Policy #0 lag: (min: 15.0, avg: 98.4, max: 271.0) [2024-06-15 12:53:40,956][1648985] Avg episode reward: [(0, '143.270')] [2024-06-15 12:53:41,178][1652491] Updated weights for policy 0, policy_version 110098 (0.0029) [2024-06-15 12:53:42,100][1652491] Updated weights for policy 0, policy_version 110141 (0.0013) [2024-06-15 12:53:44,138][1652491] Updated weights for policy 0, policy_version 110212 (0.0013) [2024-06-15 12:53:45,049][1652491] Updated weights for policy 0, policy_version 110265 (0.0016) [2024-06-15 12:53:45,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 47513.5, 300 sec: 46319.5). Total num frames: 225869824. 
Throughput: 0: 11423.2. Samples: 56512000. Policy #0 lag: (min: 79.0, avg: 168.6, max: 326.0) [2024-06-15 12:53:45,956][1648985] Avg episode reward: [(0, '146.980')] [2024-06-15 12:53:46,584][1652491] Updated weights for policy 0, policy_version 110306 (0.0012) [2024-06-15 12:53:50,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 225968128. Throughput: 0: 11457.4. Samples: 56584704. Policy #0 lag: (min: 79.0, avg: 168.6, max: 326.0) [2024-06-15 12:53:50,956][1648985] Avg episode reward: [(0, '157.710')] [2024-06-15 12:53:52,392][1652491] Updated weights for policy 0, policy_version 110338 (0.0042) [2024-06-15 12:53:53,857][1651469] Signal inference workers to stop experience collection... (5750 times) [2024-06-15 12:53:53,893][1652491] InferenceWorker_p0-w0: stopping experience collection (5750 times) [2024-06-15 12:53:53,984][1651469] Signal inference workers to resume experience collection... (5750 times) [2024-06-15 12:53:53,985][1652491] InferenceWorker_p0-w0: resuming experience collection (5750 times) [2024-06-15 12:53:53,987][1652491] Updated weights for policy 0, policy_version 110400 (0.0012) [2024-06-15 12:53:55,199][1652491] Updated weights for policy 0, policy_version 110448 (0.0013) [2024-06-15 12:53:55,955][1648985] Fps is (10 sec: 36045.4, 60 sec: 45875.4, 300 sec: 46097.4). Total num frames: 226230272. Throughput: 0: 11400.6. Samples: 56614400. Policy #0 lag: (min: 79.0, avg: 168.6, max: 326.0) [2024-06-15 12:53:55,956][1648985] Avg episode reward: [(0, '169.120')] [2024-06-15 12:53:55,960][1651469] Saving new best policy, reward=169.120! [2024-06-15 12:53:56,522][1652491] Updated weights for policy 0, policy_version 110485 (0.0013) [2024-06-15 12:53:57,525][1652491] Updated weights for policy 0, policy_version 110544 (0.0026) [2024-06-15 12:54:00,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 226492416. Throughput: 0: 11309.5. Samples: 56678400. 
Policy #0 lag: (min: 79.0, avg: 168.6, max: 326.0) [2024-06-15 12:54:00,956][1648985] Avg episode reward: [(0, '152.830')] [2024-06-15 12:54:04,433][1652491] Updated weights for policy 0, policy_version 110608 (0.0102) [2024-06-15 12:54:05,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45329.1, 300 sec: 45986.3). Total num frames: 226623488. Throughput: 0: 11491.5. Samples: 56753152. Policy #0 lag: (min: 79.0, avg: 168.6, max: 326.0) [2024-06-15 12:54:05,955][1648985] Avg episode reward: [(0, '128.900')] [2024-06-15 12:54:06,223][1652491] Updated weights for policy 0, policy_version 110675 (0.0107) [2024-06-15 12:54:07,200][1652491] Updated weights for policy 0, policy_version 110720 (0.0012) [2024-06-15 12:54:08,613][1652491] Updated weights for policy 0, policy_version 110777 (0.0014) [2024-06-15 12:54:10,120][1652491] Updated weights for policy 0, policy_version 110832 (0.0022) [2024-06-15 12:54:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 227016704. Throughput: 0: 11423.3. Samples: 56782336. Policy #0 lag: (min: 79.0, avg: 168.6, max: 326.0) [2024-06-15 12:54:10,956][1648985] Avg episode reward: [(0, '123.550')] [2024-06-15 12:54:15,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 227049472. Throughput: 0: 11411.9. Samples: 56858112. Policy #0 lag: (min: 79.0, avg: 168.6, max: 326.0) [2024-06-15 12:54:15,956][1648985] Avg episode reward: [(0, '122.090')] [2024-06-15 12:54:16,302][1652491] Updated weights for policy 0, policy_version 110883 (0.0012) [2024-06-15 12:54:17,887][1652491] Updated weights for policy 0, policy_version 110947 (0.0012) [2024-06-15 12:54:19,183][1652491] Updated weights for policy 0, policy_version 110981 (0.0012) [2024-06-15 12:54:20,869][1652491] Updated weights for policy 0, policy_version 111060 (0.0013) [2024-06-15 12:54:20,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46421.2, 300 sec: 46319.5). Total num frames: 227442688. 
Throughput: 0: 11423.3. Samples: 56918016. Policy #0 lag: (min: 79.0, avg: 168.6, max: 326.0) [2024-06-15 12:54:20,956][1648985] Avg episode reward: [(0, '118.760')] [2024-06-15 12:54:21,991][1652491] Updated weights for policy 0, policy_version 111104 (0.0029) [2024-06-15 12:54:25,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 45764.5). Total num frames: 227540992. Throughput: 0: 11332.3. Samples: 56952320. Policy #0 lag: (min: 79.0, avg: 168.6, max: 326.0) [2024-06-15 12:54:25,956][1648985] Avg episode reward: [(0, '127.100')] [2024-06-15 12:54:27,958][1652491] Updated weights for policy 0, policy_version 111162 (0.0021) [2024-06-15 12:54:29,031][1652491] Updated weights for policy 0, policy_version 111200 (0.0032) [2024-06-15 12:54:30,262][1652491] Updated weights for policy 0, policy_version 111235 (0.0015) [2024-06-15 12:54:30,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 47513.3, 300 sec: 46208.4). Total num frames: 227868672. Throughput: 0: 11366.4. Samples: 57023488. Policy #0 lag: (min: 79.0, avg: 168.6, max: 326.0) [2024-06-15 12:54:30,956][1648985] Avg episode reward: [(0, '138.680')] [2024-06-15 12:54:31,565][1652491] Updated weights for policy 0, policy_version 111296 (0.0013) [2024-06-15 12:54:32,854][1651469] Signal inference workers to stop experience collection... (5800 times) [2024-06-15 12:54:32,941][1652491] InferenceWorker_p0-w0: stopping experience collection (5800 times) [2024-06-15 12:54:33,113][1651469] Signal inference workers to resume experience collection... (5800 times) [2024-06-15 12:54:33,114][1652491] InferenceWorker_p0-w0: resuming experience collection (5800 times) [2024-06-15 12:54:33,472][1652491] Updated weights for policy 0, policy_version 111358 (0.0014) [2024-06-15 12:54:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 228065280. Throughput: 0: 11241.2. Samples: 57090560. 
Policy #0 lag: (min: 79.0, avg: 168.6, max: 326.0) [2024-06-15 12:54:35,956][1648985] Avg episode reward: [(0, '140.900')] [2024-06-15 12:54:40,121][1652491] Updated weights for policy 0, policy_version 111411 (0.0013) [2024-06-15 12:54:40,955][1648985] Fps is (10 sec: 36046.0, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 228229120. Throughput: 0: 11525.7. Samples: 57133056. Policy #0 lag: (min: 79.0, avg: 168.6, max: 326.0) [2024-06-15 12:54:40,956][1648985] Avg episode reward: [(0, '149.660')] [2024-06-15 12:54:42,153][1652491] Updated weights for policy 0, policy_version 111492 (0.0012) [2024-06-15 12:54:43,953][1652491] Updated weights for policy 0, policy_version 111557 (0.0013) [2024-06-15 12:54:45,101][1652491] Updated weights for policy 0, policy_version 111610 (0.0012) [2024-06-15 12:54:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 228589568. Throughput: 0: 11343.6. Samples: 57188864. Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:54:45,956][1648985] Avg episode reward: [(0, '145.880')] [2024-06-15 12:54:50,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 45430.9). Total num frames: 228622336. Throughput: 0: 11514.3. Samples: 57271296. Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:54:50,955][1648985] Avg episode reward: [(0, '131.140')] [2024-06-15 12:54:51,702][1652491] Updated weights for policy 0, policy_version 111680 (0.0015) [2024-06-15 12:54:53,827][1652491] Updated weights for policy 0, policy_version 111763 (0.0018) [2024-06-15 12:54:55,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.4, 300 sec: 45986.3). Total num frames: 229048320. Throughput: 0: 11468.8. Samples: 57298432. 
Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:54:55,956][1648985] Avg episode reward: [(0, '122.680')] [2024-06-15 12:54:56,163][1652491] Updated weights for policy 0, policy_version 111856 (0.0013) [2024-06-15 12:54:56,409][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000111872_229113856.pth... [2024-06-15 12:54:56,493][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000106496_218103808.pth [2024-06-15 12:55:00,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 43690.7, 300 sec: 45542.0). Total num frames: 229113856. Throughput: 0: 11241.2. Samples: 57363968. Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:55:00,956][1648985] Avg episode reward: [(0, '123.830')] [2024-06-15 12:55:02,770][1652491] Updated weights for policy 0, policy_version 111904 (0.0013) [2024-06-15 12:55:04,556][1652491] Updated weights for policy 0, policy_version 111992 (0.0012) [2024-06-15 12:55:05,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46967.4, 300 sec: 45764.1). Total num frames: 229441536. Throughput: 0: 11332.3. Samples: 57427968. Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:55:05,956][1648985] Avg episode reward: [(0, '121.380')] [2024-06-15 12:55:06,284][1652491] Updated weights for policy 0, policy_version 112055 (0.0017) [2024-06-15 12:55:07,753][1652491] Updated weights for policy 0, policy_version 112112 (0.0013) [2024-06-15 12:55:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 229638144. Throughput: 0: 11241.2. Samples: 57458176. Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:55:10,956][1648985] Avg episode reward: [(0, '137.010')] [2024-06-15 12:55:14,510][1652491] Updated weights for policy 0, policy_version 112161 (0.0048) [2024-06-15 12:55:15,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.3, 300 sec: 45653.0). Total num frames: 229834752. Throughput: 0: 11400.6. 
Samples: 57536512. Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:55:15,956][1648985] Avg episode reward: [(0, '124.680')] [2024-06-15 12:55:16,288][1652491] Updated weights for policy 0, policy_version 112240 (0.0013) [2024-06-15 12:55:16,408][1651469] Signal inference workers to stop experience collection... (5850 times) [2024-06-15 12:55:16,462][1652491] InferenceWorker_p0-w0: stopping experience collection (5850 times) [2024-06-15 12:55:16,614][1651469] Signal inference workers to resume experience collection... (5850 times) [2024-06-15 12:55:16,616][1652491] InferenceWorker_p0-w0: resuming experience collection (5850 times) [2024-06-15 12:55:17,539][1652491] Updated weights for policy 0, policy_version 112294 (0.0022) [2024-06-15 12:55:19,128][1652491] Updated weights for policy 0, policy_version 112368 (0.0117) [2024-06-15 12:55:20,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45329.2, 300 sec: 45764.1). Total num frames: 230162432. Throughput: 0: 11332.3. Samples: 57600512. Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:55:20,956][1648985] Avg episode reward: [(0, '122.110')] [2024-06-15 12:55:25,113][1652491] Updated weights for policy 0, policy_version 112416 (0.0020) [2024-06-15 12:55:25,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 45653.0). Total num frames: 230293504. Throughput: 0: 11343.6. Samples: 57643520. Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:55:25,956][1648985] Avg episode reward: [(0, '130.100')] [2024-06-15 12:55:26,196][1652491] Updated weights for policy 0, policy_version 112464 (0.0015) [2024-06-15 12:55:27,814][1652491] Updated weights for policy 0, policy_version 112528 (0.0030) [2024-06-15 12:55:29,375][1652491] Updated weights for policy 0, policy_version 112595 (0.0015) [2024-06-15 12:55:30,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 46967.6, 300 sec: 46097.3). Total num frames: 230686720. Throughput: 0: 11491.5. Samples: 57705984. 
Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:55:30,956][1648985] Avg episode reward: [(0, '134.540')] [2024-06-15 12:55:35,954][1652491] Updated weights for policy 0, policy_version 112656 (0.0013) [2024-06-15 12:55:35,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 45430.9). Total num frames: 230719488. Throughput: 0: 11480.2. Samples: 57787904. Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:55:35,956][1648985] Avg episode reward: [(0, '129.650')] [2024-06-15 12:55:38,596][1652491] Updated weights for policy 0, policy_version 112738 (0.0028) [2024-06-15 12:55:40,440][1652491] Updated weights for policy 0, policy_version 112808 (0.0012) [2024-06-15 12:55:40,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 47513.6, 300 sec: 45764.1). Total num frames: 231079936. Throughput: 0: 11446.1. Samples: 57813504. Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:55:40,956][1648985] Avg episode reward: [(0, '129.290')] [2024-06-15 12:55:41,427][1652491] Updated weights for policy 0, policy_version 112852 (0.0013) [2024-06-15 12:55:42,592][1652491] Updated weights for policy 0, policy_version 112896 (0.0034) [2024-06-15 12:55:45,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 231211008. Throughput: 0: 11411.9. Samples: 57877504. Policy #0 lag: (min: 35.0, avg: 202.3, max: 291.0) [2024-06-15 12:55:45,956][1648985] Avg episode reward: [(0, '130.760')] [2024-06-15 12:55:48,587][1652491] Updated weights for policy 0, policy_version 112951 (0.0012) [2024-06-15 12:55:50,573][1652491] Updated weights for policy 0, policy_version 112993 (0.0015) [2024-06-15 12:55:50,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 46967.5, 300 sec: 45208.8). Total num frames: 231440384. Throughput: 0: 11628.1. Samples: 57951232. 
Policy #0 lag: (min: 15.0, avg: 84.0, max: 271.0) [2024-06-15 12:55:50,956][1648985] Avg episode reward: [(0, '130.560')] [2024-06-15 12:55:52,236][1652491] Updated weights for policy 0, policy_version 113072 (0.0012) [2024-06-15 12:55:53,700][1651469] Signal inference workers to stop experience collection... (5900 times) [2024-06-15 12:55:53,750][1652491] InferenceWorker_p0-w0: stopping experience collection (5900 times) [2024-06-15 12:55:53,772][1652491] Updated weights for policy 0, policy_version 113122 (0.0011) [2024-06-15 12:55:54,047][1651469] Signal inference workers to resume experience collection... (5900 times) [2024-06-15 12:55:54,048][1652491] InferenceWorker_p0-w0: resuming experience collection (5900 times) [2024-06-15 12:55:55,955][1648985] Fps is (10 sec: 52427.1, 60 sec: 44782.8, 300 sec: 45764.1). Total num frames: 231735296. Throughput: 0: 11548.4. Samples: 57977856. Policy #0 lag: (min: 15.0, avg: 84.0, max: 271.0) [2024-06-15 12:55:55,956][1648985] Avg episode reward: [(0, '126.030')] [2024-06-15 12:55:59,249][1652491] Updated weights for policy 0, policy_version 113159 (0.0011) [2024-06-15 12:56:00,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 45319.8). Total num frames: 231866368. Throughput: 0: 11514.3. Samples: 58054656. Policy #0 lag: (min: 15.0, avg: 84.0, max: 271.0) [2024-06-15 12:56:00,956][1648985] Avg episode reward: [(0, '114.370')] [2024-06-15 12:56:01,089][1652491] Updated weights for policy 0, policy_version 113222 (0.0034) [2024-06-15 12:56:02,685][1652491] Updated weights for policy 0, policy_version 113296 (0.0011) [2024-06-15 12:56:04,691][1652491] Updated weights for policy 0, policy_version 113368 (0.0168) [2024-06-15 12:56:05,472][1652491] Updated weights for policy 0, policy_version 113408 (0.0062) [2024-06-15 12:56:05,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 232259584. Throughput: 0: 11298.1. Samples: 58108928. 
Policy #0 lag: (min: 15.0, avg: 84.0, max: 271.0) [2024-06-15 12:56:05,956][1648985] Avg episode reward: [(0, '129.480')] [2024-06-15 12:56:10,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 232259584. Throughput: 0: 11241.2. Samples: 58149376. Policy #0 lag: (min: 15.0, avg: 84.0, max: 271.0) [2024-06-15 12:56:10,956][1648985] Avg episode reward: [(0, '129.090')] [2024-06-15 12:56:12,168][1652491] Updated weights for policy 0, policy_version 113457 (0.0014) [2024-06-15 12:56:14,076][1652491] Updated weights for policy 0, policy_version 113533 (0.0097) [2024-06-15 12:56:15,663][1652491] Updated weights for policy 0, policy_version 113585 (0.0026) [2024-06-15 12:56:15,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 232652800. Throughput: 0: 11377.8. Samples: 58217984. Policy #0 lag: (min: 15.0, avg: 84.0, max: 271.0) [2024-06-15 12:56:15,956][1648985] Avg episode reward: [(0, '127.170')] [2024-06-15 12:56:16,749][1652491] Updated weights for policy 0, policy_version 113633 (0.0012) [2024-06-15 12:56:20,964][1648985] Fps is (10 sec: 52380.4, 60 sec: 43683.9, 300 sec: 45429.5). Total num frames: 232783872. Throughput: 0: 11318.6. Samples: 58297344. Policy #0 lag: (min: 15.0, avg: 84.0, max: 271.0) [2024-06-15 12:56:20,965][1648985] Avg episode reward: [(0, '134.440')] [2024-06-15 12:56:22,255][1652491] Updated weights for policy 0, policy_version 113680 (0.0013) [2024-06-15 12:56:23,777][1652491] Updated weights for policy 0, policy_version 113746 (0.0113) [2024-06-15 12:56:25,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.3, 300 sec: 45430.9). Total num frames: 233078784. Throughput: 0: 11457.4. Samples: 58329088. 
Policy #0 lag: (min: 15.0, avg: 84.0, max: 271.0) [2024-06-15 12:56:25,956][1648985] Avg episode reward: [(0, '137.360')] [2024-06-15 12:56:25,984][1652491] Updated weights for policy 0, policy_version 113824 (0.0122) [2024-06-15 12:56:27,424][1652491] Updated weights for policy 0, policy_version 113888 (0.0013) [2024-06-15 12:56:30,955][1648985] Fps is (10 sec: 52477.5, 60 sec: 43690.8, 300 sec: 45875.2). Total num frames: 233308160. Throughput: 0: 11548.4. Samples: 58397184. Policy #0 lag: (min: 15.0, avg: 84.0, max: 271.0) [2024-06-15 12:56:30,955][1648985] Avg episode reward: [(0, '139.980')] [2024-06-15 12:56:32,566][1652491] Updated weights for policy 0, policy_version 113923 (0.0013) [2024-06-15 12:56:34,567][1652491] Updated weights for policy 0, policy_version 114001 (0.0012) [2024-06-15 12:56:34,988][1651469] Signal inference workers to stop experience collection... (5950 times) [2024-06-15 12:56:35,061][1652491] InferenceWorker_p0-w0: stopping experience collection (5950 times) [2024-06-15 12:56:35,239][1651469] Signal inference workers to resume experience collection... (5950 times) [2024-06-15 12:56:35,240][1652491] InferenceWorker_p0-w0: resuming experience collection (5950 times) [2024-06-15 12:56:35,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 47513.6, 300 sec: 45653.1). Total num frames: 233570304. Throughput: 0: 11491.5. Samples: 58468352. Policy #0 lag: (min: 15.0, avg: 84.0, max: 271.0) [2024-06-15 12:56:35,956][1648985] Avg episode reward: [(0, '150.530')] [2024-06-15 12:56:36,735][1652491] Updated weights for policy 0, policy_version 114052 (0.0040) [2024-06-15 12:56:39,337][1652491] Updated weights for policy 0, policy_version 114170 (0.0085) [2024-06-15 12:56:40,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 233832448. Throughput: 0: 11412.0. Samples: 58491392. 
Policy #0 lag: (min: 15.0, avg: 84.0, max: 271.0) [2024-06-15 12:56:40,956][1648985] Avg episode reward: [(0, '147.230')] [2024-06-15 12:56:45,730][1652491] Updated weights for policy 0, policy_version 114228 (0.0012) [2024-06-15 12:56:45,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 45319.8). Total num frames: 233963520. Throughput: 0: 11502.9. Samples: 58572288. Policy #0 lag: (min: 15.0, avg: 84.0, max: 271.0) [2024-06-15 12:56:45,956][1648985] Avg episode reward: [(0, '156.530')] [2024-06-15 12:56:47,215][1652491] Updated weights for policy 0, policy_version 114302 (0.0015) [2024-06-15 12:56:49,793][1652491] Updated weights for policy 0, policy_version 114356 (0.0013) [2024-06-15 12:56:50,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 47513.6, 300 sec: 45986.3). Total num frames: 234291200. Throughput: 0: 11594.0. Samples: 58630656. Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:56:50,956][1648985] Avg episode reward: [(0, '145.430')] [2024-06-15 12:56:51,364][1652491] Updated weights for policy 0, policy_version 114420 (0.0013) [2024-06-15 12:56:55,955][1648985] Fps is (10 sec: 42596.9, 60 sec: 44236.8, 300 sec: 45541.9). Total num frames: 234389504. Throughput: 0: 11571.1. Samples: 58670080. Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:56:55,956][1648985] Avg episode reward: [(0, '135.250')] [2024-06-15 12:56:56,611][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000114480_234455040.pth... [2024-06-15 12:56:56,762][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000109120_223477760.pth [2024-06-15 12:56:57,630][1652491] Updated weights for policy 0, policy_version 114512 (0.0107) [2024-06-15 12:57:00,955][1648985] Fps is (10 sec: 36043.8, 60 sec: 46421.1, 300 sec: 45430.8). Total num frames: 234651648. Throughput: 0: 11400.5. Samples: 58731008. 
Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:57:00,956][1648985] Avg episode reward: [(0, '119.990')] [2024-06-15 12:57:01,071][1652491] Updated weights for policy 0, policy_version 114579 (0.0105) [2024-06-15 12:57:02,692][1652491] Updated weights for policy 0, policy_version 114656 (0.0012) [2024-06-15 12:57:03,473][1652491] Updated weights for policy 0, policy_version 114688 (0.0017) [2024-06-15 12:57:05,955][1648985] Fps is (10 sec: 49153.8, 60 sec: 43690.8, 300 sec: 45764.2). Total num frames: 234881024. Throughput: 0: 11107.0. Samples: 58797056. Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:57:05,956][1648985] Avg episode reward: [(0, '125.700')] [2024-06-15 12:57:09,695][1652491] Updated weights for policy 0, policy_version 114755 (0.0012) [2024-06-15 12:57:10,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 47513.6, 300 sec: 45430.9). Total num frames: 235110400. Throughput: 0: 11184.4. Samples: 58832384. Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:57:10,956][1648985] Avg episode reward: [(0, '122.280')] [2024-06-15 12:57:11,004][1652491] Updated weights for policy 0, policy_version 114804 (0.0011) [2024-06-15 12:57:13,030][1652491] Updated weights for policy 0, policy_version 114848 (0.0015) [2024-06-15 12:57:14,249][1652491] Updated weights for policy 0, policy_version 114898 (0.0014) [2024-06-15 12:57:15,364][1652491] Updated weights for policy 0, policy_version 114944 (0.0011) [2024-06-15 12:57:15,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 235405312. Throughput: 0: 11104.7. Samples: 58896896. Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:57:15,956][1648985] Avg episode reward: [(0, '129.100')] [2024-06-15 12:57:19,614][1651469] Signal inference workers to stop experience collection... 
(6000 times) [2024-06-15 12:57:19,649][1652491] InferenceWorker_p0-w0: stopping experience collection (6000 times) [2024-06-15 12:57:19,793][1651469] Signal inference workers to resume experience collection... (6000 times) [2024-06-15 12:57:19,794][1652491] InferenceWorker_p0-w0: resuming experience collection (6000 times) [2024-06-15 12:57:20,571][1652491] Updated weights for policy 0, policy_version 115010 (0.0013) [2024-06-15 12:57:20,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46428.5, 300 sec: 45430.9). Total num frames: 235569152. Throughput: 0: 11093.3. Samples: 58967552. Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:57:20,956][1648985] Avg episode reward: [(0, '132.820')] [2024-06-15 12:57:21,942][1652491] Updated weights for policy 0, policy_version 115064 (0.0010) [2024-06-15 12:57:24,947][1652491] Updated weights for policy 0, policy_version 115104 (0.0077) [2024-06-15 12:57:25,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 235831296. Throughput: 0: 11411.9. Samples: 59004928. Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:57:25,956][1648985] Avg episode reward: [(0, '143.060')] [2024-06-15 12:57:26,442][1652491] Updated weights for policy 0, policy_version 115174 (0.0190) [2024-06-15 12:57:30,744][1652491] Updated weights for policy 0, policy_version 115234 (0.0016) [2024-06-15 12:57:30,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45329.0, 300 sec: 45542.0). Total num frames: 236027904. Throughput: 0: 11218.5. Samples: 59077120. Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:57:30,956][1648985] Avg episode reward: [(0, '128.430')] [2024-06-15 12:57:31,597][1652491] Updated weights for policy 0, policy_version 115284 (0.0013) [2024-06-15 12:57:35,387][1652491] Updated weights for policy 0, policy_version 115344 (0.0209) [2024-06-15 12:57:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 44782.9, 300 sec: 45542.0). Total num frames: 236257280. 
Throughput: 0: 11548.4. Samples: 59150336. Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:57:35,956][1648985] Avg episode reward: [(0, '137.530')] [2024-06-15 12:57:36,824][1652491] Updated weights for policy 0, policy_version 115408 (0.0013) [2024-06-15 12:57:37,998][1652491] Updated weights for policy 0, policy_version 115456 (0.0016) [2024-06-15 12:57:40,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 45542.0). Total num frames: 236453888. Throughput: 0: 11298.2. Samples: 59178496. Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:57:40,956][1648985] Avg episode reward: [(0, '132.950')] [2024-06-15 12:57:43,003][1652491] Updated weights for policy 0, policy_version 115536 (0.0014) [2024-06-15 12:57:44,151][1652491] Updated weights for policy 0, policy_version 115584 (0.0040) [2024-06-15 12:57:45,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45875.1, 300 sec: 45319.8). Total num frames: 236716032. Throughput: 0: 11400.6. Samples: 59244032. Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:57:45,956][1648985] Avg episode reward: [(0, '126.370')] [2024-06-15 12:57:48,038][1652491] Updated weights for policy 0, policy_version 115643 (0.0084) [2024-06-15 12:57:49,159][1652491] Updated weights for policy 0, policy_version 115684 (0.0012) [2024-06-15 12:57:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 45764.2). Total num frames: 236978176. Throughput: 0: 11548.4. Samples: 59316736. Policy #0 lag: (min: 31.0, avg: 147.2, max: 287.0) [2024-06-15 12:57:50,956][1648985] Avg episode reward: [(0, '147.190')] [2024-06-15 12:57:52,932][1652491] Updated weights for policy 0, policy_version 115744 (0.0091) [2024-06-15 12:57:54,407][1652491] Updated weights for policy 0, policy_version 115808 (0.0205) [2024-06-15 12:57:55,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47513.9, 300 sec: 45764.1). Total num frames: 237240320. Throughput: 0: 11582.6. Samples: 59353600. 
Policy #0 lag: (min: 10.0, avg: 104.9, max: 266.0) [2024-06-15 12:57:55,956][1648985] Avg episode reward: [(0, '127.100')] [2024-06-15 12:57:58,658][1652491] Updated weights for policy 0, policy_version 115872 (0.0013) [2024-06-15 12:57:58,839][1651469] Signal inference workers to stop experience collection... (6050 times) [2024-06-15 12:57:58,934][1652491] InferenceWorker_p0-w0: stopping experience collection (6050 times) [2024-06-15 12:57:59,150][1651469] Signal inference workers to resume experience collection... (6050 times) [2024-06-15 12:57:59,151][1652491] InferenceWorker_p0-w0: resuming experience collection (6050 times) [2024-06-15 12:58:00,364][1652491] Updated weights for policy 0, policy_version 115922 (0.0011) [2024-06-15 12:58:00,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46967.7, 300 sec: 45986.3). Total num frames: 237469696. Throughput: 0: 11616.7. Samples: 59419648. Policy #0 lag: (min: 10.0, avg: 104.9, max: 266.0) [2024-06-15 12:58:00,956][1648985] Avg episode reward: [(0, '119.080')] [2024-06-15 12:58:01,263][1652491] Updated weights for policy 0, policy_version 115968 (0.0014) [2024-06-15 12:58:04,965][1652491] Updated weights for policy 0, policy_version 116023 (0.0014) [2024-06-15 12:58:05,885][1652491] Updated weights for policy 0, policy_version 116064 (0.0013) [2024-06-15 12:58:05,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.4, 300 sec: 45542.0). Total num frames: 237699072. Throughput: 0: 11639.5. Samples: 59491328. Policy #0 lag: (min: 10.0, avg: 104.9, max: 266.0) [2024-06-15 12:58:05,956][1648985] Avg episode reward: [(0, '132.790')] [2024-06-15 12:58:09,228][1652491] Updated weights for policy 0, policy_version 116100 (0.0014) [2024-06-15 12:58:10,786][1652491] Updated weights for policy 0, policy_version 116167 (0.0013) [2024-06-15 12:58:10,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.6, 300 sec: 45875.2). Total num frames: 237928448. Throughput: 0: 11685.0. Samples: 59530752. 
Policy #0 lag: (min: 10.0, avg: 104.9, max: 266.0) [2024-06-15 12:58:10,956][1648985] Avg episode reward: [(0, '132.700')] [2024-06-15 12:58:14,880][1652491] Updated weights for policy 0, policy_version 116229 (0.0013) [2024-06-15 12:58:15,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45329.1, 300 sec: 45653.0). Total num frames: 238125056. Throughput: 0: 11639.5. Samples: 59600896. Policy #0 lag: (min: 10.0, avg: 104.9, max: 266.0) [2024-06-15 12:58:15,956][1648985] Avg episode reward: [(0, '111.880')] [2024-06-15 12:58:16,051][1652491] Updated weights for policy 0, policy_version 116279 (0.0020) [2024-06-15 12:58:17,696][1652491] Updated weights for policy 0, policy_version 116346 (0.0020) [2024-06-15 12:58:20,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46421.3, 300 sec: 45542.0). Total num frames: 238354432. Throughput: 0: 11480.2. Samples: 59666944. Policy #0 lag: (min: 10.0, avg: 104.9, max: 266.0) [2024-06-15 12:58:20,956][1648985] Avg episode reward: [(0, '113.160')] [2024-06-15 12:58:21,449][1652491] Updated weights for policy 0, policy_version 116410 (0.0075) [2024-06-15 12:58:22,995][1652491] Updated weights for policy 0, policy_version 116456 (0.0013) [2024-06-15 12:58:25,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 238551040. Throughput: 0: 11593.9. Samples: 59700224. Policy #0 lag: (min: 10.0, avg: 104.9, max: 266.0) [2024-06-15 12:58:25,956][1648985] Avg episode reward: [(0, '136.850')] [2024-06-15 12:58:27,787][1652491] Updated weights for policy 0, policy_version 116532 (0.0071) [2024-06-15 12:58:29,316][1652491] Updated weights for policy 0, policy_version 116605 (0.0012) [2024-06-15 12:58:30,960][1648985] Fps is (10 sec: 45851.9, 60 sec: 46417.4, 300 sec: 45430.1). Total num frames: 238813184. Throughput: 0: 11581.3. Samples: 59765248. 
Policy #0 lag: (min: 10.0, avg: 104.9, max: 266.0) [2024-06-15 12:58:30,961][1648985] Avg episode reward: [(0, '159.600')] [2024-06-15 12:58:33,299][1652491] Updated weights for policy 0, policy_version 116672 (0.0014) [2024-06-15 12:58:34,596][1652491] Updated weights for policy 0, policy_version 116736 (0.0013) [2024-06-15 12:58:35,956][1648985] Fps is (10 sec: 52424.4, 60 sec: 46966.8, 300 sec: 46097.2). Total num frames: 239075328. Throughput: 0: 11593.7. Samples: 59838464. Policy #0 lag: (min: 10.0, avg: 104.9, max: 266.0) [2024-06-15 12:58:35,957][1648985] Avg episode reward: [(0, '144.560')] [2024-06-15 12:58:38,790][1652491] Updated weights for policy 0, policy_version 116774 (0.0013) [2024-06-15 12:58:40,245][1651469] Signal inference workers to stop experience collection... (6100 times) [2024-06-15 12:58:40,299][1652491] InferenceWorker_p0-w0: stopping experience collection (6100 times) [2024-06-15 12:58:40,440][1651469] Signal inference workers to resume experience collection... (6100 times) [2024-06-15 12:58:40,441][1652491] InferenceWorker_p0-w0: resuming experience collection (6100 times) [2024-06-15 12:58:40,443][1652491] Updated weights for policy 0, policy_version 116848 (0.0112) [2024-06-15 12:58:40,955][1648985] Fps is (10 sec: 52455.5, 60 sec: 48059.7, 300 sec: 45653.1). Total num frames: 239337472. Throughput: 0: 11639.5. Samples: 59877376. Policy #0 lag: (min: 10.0, avg: 104.9, max: 266.0) [2024-06-15 12:58:40,956][1648985] Avg episode reward: [(0, '145.620')] [2024-06-15 12:58:43,333][1652491] Updated weights for policy 0, policy_version 116868 (0.0012) [2024-06-15 12:58:44,894][1652491] Updated weights for policy 0, policy_version 116944 (0.0011) [2024-06-15 12:58:45,955][1648985] Fps is (10 sec: 52433.6, 60 sec: 48059.9, 300 sec: 46208.4). Total num frames: 239599616. Throughput: 0: 11628.1. Samples: 59942912. 
Policy #0 lag: (min: 10.0, avg: 104.9, max: 266.0) [2024-06-15 12:58:45,956][1648985] Avg episode reward: [(0, '145.050')] [2024-06-15 12:58:49,403][1652491] Updated weights for policy 0, policy_version 117008 (0.0050) [2024-06-15 12:58:50,743][1652491] Updated weights for policy 0, policy_version 117060 (0.0147) [2024-06-15 12:58:50,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 239763456. Throughput: 0: 11605.4. Samples: 60013568. Policy #0 lag: (min: 10.0, avg: 104.9, max: 266.0) [2024-06-15 12:58:50,956][1648985] Avg episode reward: [(0, '121.720')] [2024-06-15 12:58:52,218][1652491] Updated weights for policy 0, policy_version 117118 (0.0013) [2024-06-15 12:58:55,532][1652491] Updated weights for policy 0, policy_version 117184 (0.0015) [2024-06-15 12:58:55,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 239992832. Throughput: 0: 11525.6. Samples: 60049408. Policy #0 lag: (min: 3.0, avg: 100.7, max: 259.0) [2024-06-15 12:58:55,956][1648985] Avg episode reward: [(0, '114.980')] [2024-06-15 12:58:56,391][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000117200_240025600.pth... [2024-06-15 12:58:56,549][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000111872_229113856.pth [2024-06-15 12:58:57,348][1652491] Updated weights for policy 0, policy_version 117246 (0.0013) [2024-06-15 12:59:00,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 44236.7, 300 sec: 45764.1). Total num frames: 240123904. Throughput: 0: 11468.8. Samples: 60116992. 
Policy #0 lag: (min: 3.0, avg: 100.7, max: 259.0) [2024-06-15 12:59:00,956][1648985] Avg episode reward: [(0, '112.170')] [2024-06-15 12:59:01,793][1652491] Updated weights for policy 0, policy_version 117296 (0.0012) [2024-06-15 12:59:03,385][1652491] Updated weights for policy 0, policy_version 117360 (0.0034) [2024-06-15 12:59:05,897][1652491] Updated weights for policy 0, policy_version 117395 (0.0014) [2024-06-15 12:59:05,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 45329.1, 300 sec: 45430.9). Total num frames: 240418816. Throughput: 0: 11628.1. Samples: 60190208. Policy #0 lag: (min: 3.0, avg: 100.7, max: 259.0) [2024-06-15 12:59:05,956][1648985] Avg episode reward: [(0, '124.230')] [2024-06-15 12:59:08,466][1652491] Updated weights for policy 0, policy_version 117488 (0.0015) [2024-06-15 12:59:10,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45328.9, 300 sec: 46097.3). Total num frames: 240648192. Throughput: 0: 11514.3. Samples: 60218368. Policy #0 lag: (min: 3.0, avg: 100.7, max: 259.0) [2024-06-15 12:59:10,956][1648985] Avg episode reward: [(0, '134.740')] [2024-06-15 12:59:12,204][1652491] Updated weights for policy 0, policy_version 117536 (0.0014) [2024-06-15 12:59:13,910][1652491] Updated weights for policy 0, policy_version 117622 (0.0014) [2024-06-15 12:59:15,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 46421.3, 300 sec: 45653.1). Total num frames: 240910336. Throughput: 0: 11765.9. Samples: 60294656. Policy #0 lag: (min: 3.0, avg: 100.7, max: 259.0) [2024-06-15 12:59:15,956][1648985] Avg episode reward: [(0, '141.170')] [2024-06-15 12:59:17,098][1652491] Updated weights for policy 0, policy_version 117664 (0.0013) [2024-06-15 12:59:17,911][1652491] Updated weights for policy 0, policy_version 117696 (0.0024) [2024-06-15 12:59:19,480][1652491] Updated weights for policy 0, policy_version 117754 (0.0166) [2024-06-15 12:59:20,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 241172480. 
Throughput: 0: 11787.6. Samples: 60368896. Policy #0 lag: (min: 3.0, avg: 100.7, max: 259.0) [2024-06-15 12:59:20,956][1648985] Avg episode reward: [(0, '143.700')] [2024-06-15 12:59:23,368][1652491] Updated weights for policy 0, policy_version 117804 (0.0015) [2024-06-15 12:59:23,578][1651469] Signal inference workers to stop experience collection... (6150 times) [2024-06-15 12:59:23,610][1652491] InferenceWorker_p0-w0: stopping experience collection (6150 times) [2024-06-15 12:59:23,750][1651469] Signal inference workers to resume experience collection... (6150 times) [2024-06-15 12:59:23,751][1652491] InferenceWorker_p0-w0: resuming experience collection (6150 times) [2024-06-15 12:59:24,976][1652491] Updated weights for policy 0, policy_version 117879 (0.0012) [2024-06-15 12:59:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 45986.3). Total num frames: 241434624. Throughput: 0: 11707.7. Samples: 60404224. Policy #0 lag: (min: 3.0, avg: 100.7, max: 259.0) [2024-06-15 12:59:25,956][1648985] Avg episode reward: [(0, '145.180')] [2024-06-15 12:59:28,886][1652491] Updated weights for policy 0, policy_version 117924 (0.0012) [2024-06-15 12:59:30,377][1652491] Updated weights for policy 0, policy_version 117992 (0.0013) [2024-06-15 12:59:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48063.7, 300 sec: 46208.4). Total num frames: 241696768. Throughput: 0: 11855.6. Samples: 60476416. Policy #0 lag: (min: 3.0, avg: 100.7, max: 259.0) [2024-06-15 12:59:30,956][1648985] Avg episode reward: [(0, '140.900')] [2024-06-15 12:59:33,809][1652491] Updated weights for policy 0, policy_version 118052 (0.0159) [2024-06-15 12:59:35,414][1652491] Updated weights for policy 0, policy_version 118120 (0.0014) [2024-06-15 12:59:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48060.4, 300 sec: 46541.7). Total num frames: 241958912. Throughput: 0: 11650.8. Samples: 60537856. 
Policy #0 lag: (min: 3.0, avg: 100.7, max: 259.0) [2024-06-15 12:59:35,956][1648985] Avg episode reward: [(0, '133.520')] [2024-06-15 12:59:40,384][1652491] Updated weights for policy 0, policy_version 118160 (0.0011) [2024-06-15 12:59:40,955][1648985] Fps is (10 sec: 32768.6, 60 sec: 44783.0, 300 sec: 45542.0). Total num frames: 242024448. Throughput: 0: 11764.7. Samples: 60578816. Policy #0 lag: (min: 3.0, avg: 100.7, max: 259.0) [2024-06-15 12:59:40,955][1648985] Avg episode reward: [(0, '129.960')] [2024-06-15 12:59:42,094][1652491] Updated weights for policy 0, policy_version 118244 (0.0015) [2024-06-15 12:59:44,185][1652491] Updated weights for policy 0, policy_version 118276 (0.0014) [2024-06-15 12:59:45,640][1652491] Updated weights for policy 0, policy_version 118358 (0.0014) [2024-06-15 12:59:45,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.4, 300 sec: 46763.8). Total num frames: 242417664. Throughput: 0: 11867.0. Samples: 60651008. Policy #0 lag: (min: 3.0, avg: 100.7, max: 259.0) [2024-06-15 12:59:45,956][1648985] Avg episode reward: [(0, '128.110')] [2024-06-15 12:59:46,293][1652491] Updated weights for policy 0, policy_version 118400 (0.0013) [2024-06-15 12:59:50,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46421.3, 300 sec: 45764.1). Total num frames: 242548736. Throughput: 0: 11969.4. Samples: 60728832. Policy #0 lag: (min: 3.0, avg: 100.7, max: 259.0) [2024-06-15 12:59:50,956][1648985] Avg episode reward: [(0, '140.080')] [2024-06-15 12:59:51,398][1652491] Updated weights for policy 0, policy_version 118455 (0.0014) [2024-06-15 12:59:52,704][1652491] Updated weights for policy 0, policy_version 118512 (0.0015) [2024-06-15 12:59:55,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 242810880. Throughput: 0: 12060.5. Samples: 60761088. 
Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 12:59:55,956][1648985] Avg episode reward: [(0, '145.980')] [2024-06-15 12:59:56,224][1652491] Updated weights for policy 0, policy_version 118576 (0.0030) [2024-06-15 12:59:57,593][1652491] Updated weights for policy 0, policy_version 118650 (0.0014) [2024-06-15 13:00:00,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.8, 300 sec: 45986.3). Total num frames: 243007488. Throughput: 0: 11912.6. Samples: 60830720. Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 13:00:00,955][1648985] Avg episode reward: [(0, '155.940')] [2024-06-15 13:00:02,206][1652491] Updated weights for policy 0, policy_version 118704 (0.0015) [2024-06-15 13:00:03,735][1651469] Signal inference workers to stop experience collection... (6200 times) [2024-06-15 13:00:03,775][1652491] InferenceWorker_p0-w0: stopping experience collection (6200 times) [2024-06-15 13:00:04,104][1651469] Signal inference workers to resume experience collection... (6200 times) [2024-06-15 13:00:04,105][1652491] InferenceWorker_p0-w0: resuming experience collection (6200 times) [2024-06-15 13:00:04,285][1652491] Updated weights for policy 0, policy_version 118754 (0.0012) [2024-06-15 13:00:05,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 243269632. Throughput: 0: 11707.7. Samples: 60895744. Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 13:00:05,956][1648985] Avg episode reward: [(0, '160.980')] [2024-06-15 13:00:07,508][1652491] Updated weights for policy 0, policy_version 118820 (0.0012) [2024-06-15 13:00:08,860][1652491] Updated weights for policy 0, policy_version 118880 (0.0139) [2024-06-15 13:00:10,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 243531776. Throughput: 0: 11593.9. Samples: 60925952. 
Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 13:00:10,956][1648985] Avg episode reward: [(0, '159.380')] [2024-06-15 13:00:13,462][1652491] Updated weights for policy 0, policy_version 118928 (0.0012) [2024-06-15 13:00:15,701][1652491] Updated weights for policy 0, policy_version 118993 (0.0014) [2024-06-15 13:00:15,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 46967.4, 300 sec: 45986.3). Total num frames: 243728384. Throughput: 0: 11571.2. Samples: 60997120. Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 13:00:15,956][1648985] Avg episode reward: [(0, '156.030')] [2024-06-15 13:00:18,384][1652491] Updated weights for policy 0, policy_version 119056 (0.0014) [2024-06-15 13:00:19,970][1652491] Updated weights for policy 0, policy_version 119120 (0.0014) [2024-06-15 13:00:20,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 244023296. Throughput: 0: 11605.3. Samples: 61060096. Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 13:00:20,956][1648985] Avg episode reward: [(0, '134.260')] [2024-06-15 13:00:25,713][1652491] Updated weights for policy 0, policy_version 119200 (0.0012) [2024-06-15 13:00:25,955][1648985] Fps is (10 sec: 39322.5, 60 sec: 44783.0, 300 sec: 45542.0). Total num frames: 244121600. Throughput: 0: 11537.1. Samples: 61097984. Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 13:00:25,955][1648985] Avg episode reward: [(0, '135.950')] [2024-06-15 13:00:27,661][1652491] Updated weights for policy 0, policy_version 119264 (0.0017) [2024-06-15 13:00:30,297][1652491] Updated weights for policy 0, policy_version 119301 (0.0011) [2024-06-15 13:00:30,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 44783.0, 300 sec: 46319.5). Total num frames: 244383744. Throughput: 0: 11446.1. Samples: 61166080. 
Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 13:00:30,956][1648985] Avg episode reward: [(0, '138.870')] [2024-06-15 13:00:32,142][1652491] Updated weights for policy 0, policy_version 119376 (0.0011) [2024-06-15 13:00:35,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 244580352. Throughput: 0: 11127.5. Samples: 61229568. Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 13:00:35,956][1648985] Avg episode reward: [(0, '128.020')] [2024-06-15 13:00:37,469][1652491] Updated weights for policy 0, policy_version 119444 (0.0130) [2024-06-15 13:00:38,518][1652491] Updated weights for policy 0, policy_version 119488 (0.0032) [2024-06-15 13:00:40,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 244842496. Throughput: 0: 11229.9. Samples: 61266432. Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 13:00:40,956][1648985] Avg episode reward: [(0, '127.900')] [2024-06-15 13:00:42,091][1652491] Updated weights for policy 0, policy_version 119568 (0.0026) [2024-06-15 13:00:44,492][1652491] Updated weights for policy 0, policy_version 119664 (0.0072) [2024-06-15 13:00:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 44783.0, 300 sec: 46319.5). Total num frames: 245104640. Throughput: 0: 10945.4. Samples: 61323264. Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 13:00:45,956][1648985] Avg episode reward: [(0, '120.920')] [2024-06-15 13:00:49,425][1651469] Signal inference workers to stop experience collection... (6250 times) [2024-06-15 13:00:49,485][1652491] InferenceWorker_p0-w0: stopping experience collection (6250 times) [2024-06-15 13:00:49,680][1651469] Signal inference workers to resume experience collection... 
(6250 times) [2024-06-15 13:00:49,684][1652491] InferenceWorker_p0-w0: resuming experience collection (6250 times) [2024-06-15 13:00:49,686][1652491] Updated weights for policy 0, policy_version 119712 (0.0025) [2024-06-15 13:00:50,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 245268480. Throughput: 0: 11093.3. Samples: 61394944. Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 13:00:50,956][1648985] Avg episode reward: [(0, '121.250')] [2024-06-15 13:00:51,180][1652491] Updated weights for policy 0, policy_version 119776 (0.0012) [2024-06-15 13:00:51,929][1652491] Updated weights for policy 0, policy_version 119808 (0.0033) [2024-06-15 13:00:54,603][1652491] Updated weights for policy 0, policy_version 119868 (0.0013) [2024-06-15 13:00:55,955][1648985] Fps is (10 sec: 45873.6, 60 sec: 45875.0, 300 sec: 46430.5). Total num frames: 245563392. Throughput: 0: 11286.7. Samples: 61433856. Policy #0 lag: (min: 31.0, avg: 134.7, max: 287.0) [2024-06-15 13:00:55,956][1648985] Avg episode reward: [(0, '125.940')] [2024-06-15 13:00:56,430][1652491] Updated weights for policy 0, policy_version 119925 (0.0013) [2024-06-15 13:00:56,651][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000119936_245628928.pth... [2024-06-15 13:00:56,722][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000114480_234455040.pth [2024-06-15 13:01:00,586][1652491] Updated weights for policy 0, policy_version 119968 (0.0014) [2024-06-15 13:01:00,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45329.0, 300 sec: 45653.1). Total num frames: 245727232. Throughput: 0: 11264.0. Samples: 61504000. 
Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:01:00,956][1648985] Avg episode reward: [(0, '133.180')] [2024-06-15 13:01:02,261][1652491] Updated weights for policy 0, policy_version 120032 (0.0014) [2024-06-15 13:01:04,284][1652491] Updated weights for policy 0, policy_version 120066 (0.0012) [2024-06-15 13:01:05,955][1648985] Fps is (10 sec: 45877.6, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 246022144. Throughput: 0: 11423.3. Samples: 61574144. Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:01:05,955][1648985] Avg episode reward: [(0, '143.970')] [2024-06-15 13:01:06,097][1652491] Updated weights for policy 0, policy_version 120144 (0.0012) [2024-06-15 13:01:07,294][1652491] Updated weights for policy 0, policy_version 120190 (0.0014) [2024-06-15 13:01:10,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 44237.0, 300 sec: 45875.2). Total num frames: 246185984. Throughput: 0: 11411.9. Samples: 61611520. Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:01:10,955][1648985] Avg episode reward: [(0, '137.130')] [2024-06-15 13:01:11,493][1652491] Updated weights for policy 0, policy_version 120240 (0.0015) [2024-06-15 13:01:13,227][1652491] Updated weights for policy 0, policy_version 120304 (0.0013) [2024-06-15 13:01:15,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 45875.3, 300 sec: 46432.0). Total num frames: 246480896. Throughput: 0: 11605.3. Samples: 61688320. Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:01:15,956][1648985] Avg episode reward: [(0, '122.880')] [2024-06-15 13:01:16,670][1652491] Updated weights for policy 0, policy_version 120384 (0.0128) [2024-06-15 13:01:18,332][1652491] Updated weights for policy 0, policy_version 120448 (0.0013) [2024-06-15 13:01:20,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 44236.8, 300 sec: 46097.4). Total num frames: 246677504. Throughput: 0: 11639.5. Samples: 61753344. 
Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:01:20,956][1648985] Avg episode reward: [(0, '133.220')] [2024-06-15 13:01:23,092][1652491] Updated weights for policy 0, policy_version 120512 (0.0013) [2024-06-15 13:01:24,635][1652491] Updated weights for policy 0, policy_version 120573 (0.0014) [2024-06-15 13:01:25,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 246939648. Throughput: 0: 11616.7. Samples: 61789184. Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:01:25,956][1648985] Avg episode reward: [(0, '129.020')] [2024-06-15 13:01:28,217][1652491] Updated weights for policy 0, policy_version 120643 (0.0012) [2024-06-15 13:01:28,648][1651469] Signal inference workers to stop experience collection... (6300 times) [2024-06-15 13:01:28,720][1652491] InferenceWorker_p0-w0: stopping experience collection (6300 times) [2024-06-15 13:01:28,956][1651469] Signal inference workers to resume experience collection... (6300 times) [2024-06-15 13:01:28,956][1652491] InferenceWorker_p0-w0: resuming experience collection (6300 times) [2024-06-15 13:01:29,711][1652491] Updated weights for policy 0, policy_version 120700 (0.0093) [2024-06-15 13:01:30,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 247201792. Throughput: 0: 11685.0. Samples: 61849088. Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:01:30,955][1648985] Avg episode reward: [(0, '147.880')] [2024-06-15 13:01:34,331][1652491] Updated weights for policy 0, policy_version 120766 (0.0110) [2024-06-15 13:01:35,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.4, 300 sec: 45986.3). Total num frames: 247398400. Throughput: 0: 11776.0. Samples: 61924864. 
Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:01:35,956][1648985] Avg episode reward: [(0, '146.220')] [2024-06-15 13:01:36,379][1652491] Updated weights for policy 0, policy_version 120832 (0.0027) [2024-06-15 13:01:40,360][1652491] Updated weights for policy 0, policy_version 120896 (0.0012) [2024-06-15 13:01:40,958][1648985] Fps is (10 sec: 42587.6, 60 sec: 46419.4, 300 sec: 46319.1). Total num frames: 247627776. Throughput: 0: 11741.3. Samples: 61962240. Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:01:40,958][1648985] Avg episode reward: [(0, '139.280')] [2024-06-15 13:01:41,848][1652491] Updated weights for policy 0, policy_version 120956 (0.0013) [2024-06-15 13:01:45,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44782.9, 300 sec: 45764.1). Total num frames: 247791616. Throughput: 0: 11582.6. Samples: 62025216. Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:01:45,956][1648985] Avg episode reward: [(0, '136.050')] [2024-06-15 13:01:46,417][1652491] Updated weights for policy 0, policy_version 121021 (0.0014) [2024-06-15 13:01:47,276][1652491] Updated weights for policy 0, policy_version 121063 (0.0013) [2024-06-15 13:01:50,308][1652491] Updated weights for policy 0, policy_version 121104 (0.0013) [2024-06-15 13:01:50,955][1648985] Fps is (10 sec: 42609.8, 60 sec: 46421.5, 300 sec: 46319.6). Total num frames: 248053760. Throughput: 0: 11559.8. Samples: 62094336. Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:01:50,955][1648985] Avg episode reward: [(0, '144.940')] [2024-06-15 13:01:51,736][1652491] Updated weights for policy 0, policy_version 121153 (0.0013) [2024-06-15 13:01:53,166][1652491] Updated weights for policy 0, policy_version 121211 (0.0012) [2024-06-15 13:01:55,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 44783.0, 300 sec: 46097.4). Total num frames: 248250368. Throughput: 0: 11468.7. Samples: 62127616. 
Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:01:55,956][1648985] Avg episode reward: [(0, '144.510')] [2024-06-15 13:01:57,581][1652491] Updated weights for policy 0, policy_version 121253 (0.0012) [2024-06-15 13:01:58,585][1652491] Updated weights for policy 0, policy_version 121301 (0.0125) [2024-06-15 13:02:00,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 248512512. Throughput: 0: 11309.5. Samples: 62197248. Policy #0 lag: (min: 15.0, avg: 98.3, max: 271.0) [2024-06-15 13:02:00,955][1648985] Avg episode reward: [(0, '153.340')] [2024-06-15 13:02:02,451][1652491] Updated weights for policy 0, policy_version 121383 (0.0037) [2024-06-15 13:02:03,969][1652491] Updated weights for policy 0, policy_version 121440 (0.0012) [2024-06-15 13:02:05,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.0, 300 sec: 46319.5). Total num frames: 248774656. Throughput: 0: 11411.9. Samples: 62266880. Policy #0 lag: (min: 24.0, avg: 132.1, max: 280.0) [2024-06-15 13:02:05,956][1648985] Avg episode reward: [(0, '167.940')] [2024-06-15 13:02:08,547][1652491] Updated weights for policy 0, policy_version 121491 (0.0014) [2024-06-15 13:02:09,372][1652491] Updated weights for policy 0, policy_version 121535 (0.0034) [2024-06-15 13:02:10,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 248971264. Throughput: 0: 11411.9. Samples: 62302720. Policy #0 lag: (min: 24.0, avg: 132.1, max: 280.0) [2024-06-15 13:02:10,955][1648985] Avg episode reward: [(0, '148.170')] [2024-06-15 13:02:12,341][1652491] Updated weights for policy 0, policy_version 121604 (0.0128) [2024-06-15 13:02:12,725][1651469] Signal inference workers to stop experience collection... (6350 times) [2024-06-15 13:02:12,763][1652491] InferenceWorker_p0-w0: stopping experience collection (6350 times) [2024-06-15 13:02:12,930][1651469] Signal inference workers to resume experience collection... 
(6350 times) [2024-06-15 13:02:12,931][1652491] InferenceWorker_p0-w0: resuming experience collection (6350 times) [2024-06-15 13:02:13,689][1652491] Updated weights for policy 0, policy_version 121661 (0.0014) [2024-06-15 13:02:15,060][1652491] Updated weights for policy 0, policy_version 121719 (0.0013) [2024-06-15 13:02:15,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 46967.6, 300 sec: 46541.7). Total num frames: 249298944. Throughput: 0: 11525.7. Samples: 62367744. Policy #0 lag: (min: 24.0, avg: 132.1, max: 280.0) [2024-06-15 13:02:15,955][1648985] Avg episode reward: [(0, '126.580')] [2024-06-15 13:02:19,435][1652491] Updated weights for policy 0, policy_version 121744 (0.0012) [2024-06-15 13:02:20,793][1652491] Updated weights for policy 0, policy_version 121792 (0.0014) [2024-06-15 13:02:20,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 249430016. Throughput: 0: 11548.4. Samples: 62444544. Policy #0 lag: (min: 24.0, avg: 132.1, max: 280.0) [2024-06-15 13:02:20,956][1648985] Avg episode reward: [(0, '133.630')] [2024-06-15 13:02:23,414][1652491] Updated weights for policy 0, policy_version 121857 (0.0016) [2024-06-15 13:02:24,836][1652491] Updated weights for policy 0, policy_version 121914 (0.0013) [2024-06-15 13:02:25,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 46967.4, 300 sec: 46541.7). Total num frames: 249757696. Throughput: 0: 11480.8. Samples: 62478848. Policy #0 lag: (min: 24.0, avg: 132.1, max: 280.0) [2024-06-15 13:02:25,956][1648985] Avg episode reward: [(0, '124.970')] [2024-06-15 13:02:26,309][1652491] Updated weights for policy 0, policy_version 121974 (0.0013) [2024-06-15 13:02:30,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 44236.7, 300 sec: 46097.4). Total num frames: 249856000. Throughput: 0: 11662.2. Samples: 62550016. 
Policy #0 lag: (min: 24.0, avg: 132.1, max: 280.0) [2024-06-15 13:02:30,956][1648985] Avg episode reward: [(0, '125.410')] [2024-06-15 13:02:31,797][1652491] Updated weights for policy 0, policy_version 122032 (0.0021) [2024-06-15 13:02:32,808][1652491] Updated weights for policy 0, policy_version 122065 (0.0021) [2024-06-15 13:02:34,921][1652491] Updated weights for policy 0, policy_version 122128 (0.0014) [2024-06-15 13:02:35,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 250183680. Throughput: 0: 11491.5. Samples: 62611456. Policy #0 lag: (min: 24.0, avg: 132.1, max: 280.0) [2024-06-15 13:02:35,956][1648985] Avg episode reward: [(0, '136.090')] [2024-06-15 13:02:36,105][1652491] Updated weights for policy 0, policy_version 122176 (0.0072) [2024-06-15 13:02:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45330.9, 300 sec: 46208.4). Total num frames: 250347520. Throughput: 0: 11468.8. Samples: 62643712. Policy #0 lag: (min: 24.0, avg: 132.1, max: 280.0) [2024-06-15 13:02:40,956][1648985] Avg episode reward: [(0, '128.230')] [2024-06-15 13:02:42,038][1652491] Updated weights for policy 0, policy_version 122243 (0.0107) [2024-06-15 13:02:43,820][1652491] Updated weights for policy 0, policy_version 122304 (0.0013) [2024-06-15 13:02:45,143][1652491] Updated weights for policy 0, policy_version 122357 (0.0013) [2024-06-15 13:02:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 250609664. Throughput: 0: 11468.8. Samples: 62713344. 
Policy #0 lag: (min: 24.0, avg: 132.1, max: 280.0) [2024-06-15 13:02:45,956][1648985] Avg episode reward: [(0, '130.060')] [2024-06-15 13:02:46,977][1652491] Updated weights for policy 0, policy_version 122400 (0.0012) [2024-06-15 13:02:48,130][1652491] Updated weights for policy 0, policy_version 122448 (0.0012) [2024-06-15 13:02:49,268][1652491] Updated weights for policy 0, policy_version 122488 (0.0014) [2024-06-15 13:02:50,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46967.3, 300 sec: 46208.4). Total num frames: 250871808. Throughput: 0: 11548.5. Samples: 62786560. Policy #0 lag: (min: 24.0, avg: 132.1, max: 280.0) [2024-06-15 13:02:50,956][1648985] Avg episode reward: [(0, '126.400')] [2024-06-15 13:02:54,377][1652491] Updated weights for policy 0, policy_version 122531 (0.0014) [2024-06-15 13:02:55,148][1651469] Signal inference workers to stop experience collection... (6400 times) [2024-06-15 13:02:55,189][1652491] InferenceWorker_p0-w0: stopping experience collection (6400 times) [2024-06-15 13:02:55,432][1651469] Signal inference workers to resume experience collection... (6400 times) [2024-06-15 13:02:55,433][1652491] InferenceWorker_p0-w0: resuming experience collection (6400 times) [2024-06-15 13:02:55,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.6, 300 sec: 46097.3). Total num frames: 251068416. Throughput: 0: 11753.2. Samples: 62831616. Policy #0 lag: (min: 24.0, avg: 132.1, max: 280.0) [2024-06-15 13:02:55,956][1648985] Avg episode reward: [(0, '154.430')] [2024-06-15 13:02:55,972][1652491] Updated weights for policy 0, policy_version 122593 (0.0012) [2024-06-15 13:02:56,479][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000122624_251133952.pth... 
[2024-06-15 13:02:56,563][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000117200_240025600.pth [2024-06-15 13:02:58,204][1652491] Updated weights for policy 0, policy_version 122672 (0.0045) [2024-06-15 13:02:59,937][1652491] Updated weights for policy 0, policy_version 122720 (0.0013) [2024-06-15 13:03:00,956][1648985] Fps is (10 sec: 52427.6, 60 sec: 48059.5, 300 sec: 46430.6). Total num frames: 251396096. Throughput: 0: 11696.2. Samples: 62894080. Policy #0 lag: (min: 24.0, avg: 132.1, max: 280.0) [2024-06-15 13:03:00,957][1648985] Avg episode reward: [(0, '167.680')] [2024-06-15 13:03:04,615][1652491] Updated weights for policy 0, policy_version 122754 (0.0023) [2024-06-15 13:03:05,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45329.1, 300 sec: 45986.3). Total num frames: 251494400. Throughput: 0: 11662.2. Samples: 62969344. Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:03:05,956][1648985] Avg episode reward: [(0, '159.230')] [2024-06-15 13:03:05,958][1652491] Updated weights for policy 0, policy_version 122816 (0.0013) [2024-06-15 13:03:07,223][1652491] Updated weights for policy 0, policy_version 122868 (0.0017) [2024-06-15 13:03:09,073][1652491] Updated weights for policy 0, policy_version 122916 (0.0068) [2024-06-15 13:03:09,932][1652491] Updated weights for policy 0, policy_version 122949 (0.0057) [2024-06-15 13:03:10,955][1648985] Fps is (10 sec: 49153.3, 60 sec: 48605.9, 300 sec: 46652.7). Total num frames: 251887616. Throughput: 0: 11616.7. Samples: 63001600. Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:03:10,956][1648985] Avg episode reward: [(0, '149.000')] [2024-06-15 13:03:11,039][1652491] Updated weights for policy 0, policy_version 123008 (0.0015) [2024-06-15 13:03:15,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 43690.4, 300 sec: 45986.2). Total num frames: 251920384. Throughput: 0: 11719.1. Samples: 63077376. 
Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:03:15,956][1648985] Avg episode reward: [(0, '160.190')] [2024-06-15 13:03:17,502][1652491] Updated weights for policy 0, policy_version 123076 (0.0015) [2024-06-15 13:03:18,547][1652491] Updated weights for policy 0, policy_version 123136 (0.0014) [2024-06-15 13:03:20,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 252313600. Throughput: 0: 11776.0. Samples: 63141376. Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:03:20,955][1648985] Avg episode reward: [(0, '133.540')] [2024-06-15 13:03:21,191][1652491] Updated weights for policy 0, policy_version 123216 (0.0012) [2024-06-15 13:03:22,262][1652491] Updated weights for policy 0, policy_version 123256 (0.0011) [2024-06-15 13:03:25,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 44782.9, 300 sec: 46209.2). Total num frames: 252444672. Throughput: 0: 11889.8. Samples: 63178752. Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:03:25,956][1648985] Avg episode reward: [(0, '113.020')] [2024-06-15 13:03:28,249][1652491] Updated weights for policy 0, policy_version 123328 (0.0077) [2024-06-15 13:03:29,564][1652491] Updated weights for policy 0, policy_version 123387 (0.0012) [2024-06-15 13:03:30,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48605.9, 300 sec: 46430.7). Total num frames: 252772352. Throughput: 0: 11969.4. Samples: 63251968. Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:03:30,956][1648985] Avg episode reward: [(0, '127.540')] [2024-06-15 13:03:31,347][1652491] Updated weights for policy 0, policy_version 123454 (0.0013) [2024-06-15 13:03:33,266][1652491] Updated weights for policy 0, policy_version 123512 (0.0015) [2024-06-15 13:03:35,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 252968960. Throughput: 0: 11889.8. Samples: 63321600. 
Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:03:35,956][1648985] Avg episode reward: [(0, '130.630')] [2024-06-15 13:03:37,960][1651469] Signal inference workers to stop experience collection... (6450 times) [2024-06-15 13:03:38,072][1652491] InferenceWorker_p0-w0: stopping experience collection (6450 times) [2024-06-15 13:03:38,248][1651469] Signal inference workers to resume experience collection... (6450 times) [2024-06-15 13:03:38,249][1652491] InferenceWorker_p0-w0: resuming experience collection (6450 times) [2024-06-15 13:03:39,228][1652491] Updated weights for policy 0, policy_version 123582 (0.0149) [2024-06-15 13:03:40,353][1652491] Updated weights for policy 0, policy_version 123645 (0.0188) [2024-06-15 13:03:40,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 253231104. Throughput: 0: 11764.6. Samples: 63361024. Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:03:40,956][1648985] Avg episode reward: [(0, '114.200')] [2024-06-15 13:03:42,659][1652491] Updated weights for policy 0, policy_version 123705 (0.0060) [2024-06-15 13:03:44,403][1652491] Updated weights for policy 0, policy_version 123760 (0.0019) [2024-06-15 13:03:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 253493248. Throughput: 0: 11605.4. Samples: 63416320. Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:03:45,956][1648985] Avg episode reward: [(0, '105.660')] [2024-06-15 13:03:49,823][1652491] Updated weights for policy 0, policy_version 123797 (0.0017) [2024-06-15 13:03:50,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 253624320. Throughput: 0: 11832.9. Samples: 63501824. 
Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:03:50,956][1648985] Avg episode reward: [(0, '105.490')] [2024-06-15 13:03:50,983][1652491] Updated weights for policy 0, policy_version 123856 (0.0017) [2024-06-15 13:03:52,730][1652491] Updated weights for policy 0, policy_version 123920 (0.0119) [2024-06-15 13:03:54,442][1652491] Updated weights for policy 0, policy_version 123970 (0.0013) [2024-06-15 13:03:55,681][1652491] Updated weights for policy 0, policy_version 124027 (0.0012) [2024-06-15 13:03:55,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 49152.0, 300 sec: 47097.1). Total num frames: 254017536. Throughput: 0: 11776.0. Samples: 63531520. Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:03:55,956][1648985] Avg episode reward: [(0, '123.270')] [2024-06-15 13:04:00,598][1652491] Updated weights for policy 0, policy_version 124089 (0.0016) [2024-06-15 13:04:00,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.4, 300 sec: 46541.7). Total num frames: 254148608. Throughput: 0: 11855.7. Samples: 63610880. Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:04:00,956][1648985] Avg episode reward: [(0, '141.010')] [2024-06-15 13:04:01,719][1652491] Updated weights for policy 0, policy_version 124130 (0.0181) [2024-06-15 13:04:03,228][1652491] Updated weights for policy 0, policy_version 124165 (0.0015) [2024-06-15 13:04:04,007][1652491] Updated weights for policy 0, policy_version 124216 (0.0015) [2024-06-15 13:04:05,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 49151.9, 300 sec: 46763.8). Total num frames: 254443520. Throughput: 0: 11980.8. Samples: 63680512. 
Policy #0 lag: (min: 15.0, avg: 87.5, max: 271.0) [2024-06-15 13:04:05,956][1648985] Avg episode reward: [(0, '129.430')] [2024-06-15 13:04:06,515][1652491] Updated weights for policy 0, policy_version 124264 (0.0013) [2024-06-15 13:04:10,876][1652491] Updated weights for policy 0, policy_version 124306 (0.0013) [2024-06-15 13:04:10,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 254574592. Throughput: 0: 12037.7. Samples: 63720448. Policy #0 lag: (min: 63.0, avg: 202.1, max: 319.0) [2024-06-15 13:04:10,955][1648985] Avg episode reward: [(0, '125.840')] [2024-06-15 13:04:12,291][1652491] Updated weights for policy 0, policy_version 124356 (0.0046) [2024-06-15 13:04:13,696][1652491] Updated weights for policy 0, policy_version 124416 (0.0171) [2024-06-15 13:04:15,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 50244.4, 300 sec: 46652.8). Total num frames: 254935040. Throughput: 0: 11810.1. Samples: 63783424. Policy #0 lag: (min: 63.0, avg: 202.1, max: 319.0) [2024-06-15 13:04:15,956][1648985] Avg episode reward: [(0, '145.500')] [2024-06-15 13:04:16,636][1652491] Updated weights for policy 0, policy_version 124482 (0.0027) [2024-06-15 13:04:16,874][1651469] Signal inference workers to stop experience collection... (6500 times) [2024-06-15 13:04:16,934][1652491] InferenceWorker_p0-w0: stopping experience collection (6500 times) [2024-06-15 13:04:17,198][1651469] Signal inference workers to resume experience collection... (6500 times) [2024-06-15 13:04:17,199][1652491] InferenceWorker_p0-w0: resuming experience collection (6500 times) [2024-06-15 13:04:17,897][1652491] Updated weights for policy 0, policy_version 124538 (0.0012) [2024-06-15 13:04:20,955][1648985] Fps is (10 sec: 49150.5, 60 sec: 45874.9, 300 sec: 46208.4). Total num frames: 255066112. Throughput: 0: 12049.0. Samples: 63863808. 
Policy #0 lag: (min: 63.0, avg: 202.1, max: 319.0) [2024-06-15 13:04:20,956][1648985] Avg episode reward: [(0, '142.920')] [2024-06-15 13:04:22,994][1652491] Updated weights for policy 0, policy_version 124592 (0.0011) [2024-06-15 13:04:24,229][1652491] Updated weights for policy 0, policy_version 124643 (0.0041) [2024-06-15 13:04:25,454][1652491] Updated weights for policy 0, policy_version 124704 (0.0017) [2024-06-15 13:04:25,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 49698.1, 300 sec: 46541.7). Total num frames: 255426560. Throughput: 0: 11901.2. Samples: 63896576. Policy #0 lag: (min: 63.0, avg: 202.1, max: 319.0) [2024-06-15 13:04:25,956][1648985] Avg episode reward: [(0, '134.190')] [2024-06-15 13:04:26,301][1652491] Updated weights for policy 0, policy_version 124736 (0.0012) [2024-06-15 13:04:28,224][1652491] Updated weights for policy 0, policy_version 124791 (0.0013) [2024-06-15 13:04:30,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 255590400. Throughput: 0: 12379.0. Samples: 63973376. Policy #0 lag: (min: 63.0, avg: 202.1, max: 319.0) [2024-06-15 13:04:30,956][1648985] Avg episode reward: [(0, '119.430')] [2024-06-15 13:04:33,574][1652491] Updated weights for policy 0, policy_version 124835 (0.0024) [2024-06-15 13:04:34,627][1652491] Updated weights for policy 0, policy_version 124884 (0.0137) [2024-06-15 13:04:35,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 48059.5, 300 sec: 46874.9). Total num frames: 255852544. Throughput: 0: 12014.9. Samples: 64042496. Policy #0 lag: (min: 63.0, avg: 202.1, max: 319.0) [2024-06-15 13:04:35,956][1648985] Avg episode reward: [(0, '130.660')] [2024-06-15 13:04:36,236][1652491] Updated weights for policy 0, policy_version 124946 (0.0013) [2024-06-15 13:04:39,091][1652491] Updated weights for policy 0, policy_version 125027 (0.0039) [2024-06-15 13:04:40,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 256114688. 
Throughput: 0: 12003.5. Samples: 64071680. Policy #0 lag: (min: 63.0, avg: 202.1, max: 319.0) [2024-06-15 13:04:40,956][1648985] Avg episode reward: [(0, '131.070')] [2024-06-15 13:04:44,362][1652491] Updated weights for policy 0, policy_version 125077 (0.0019) [2024-06-15 13:04:45,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 256278528. Throughput: 0: 11958.0. Samples: 64148992. Policy #0 lag: (min: 63.0, avg: 202.1, max: 319.0) [2024-06-15 13:04:45,956][1648985] Avg episode reward: [(0, '151.220')] [2024-06-15 13:04:46,063][1652491] Updated weights for policy 0, policy_version 125152 (0.0013) [2024-06-15 13:04:47,249][1652491] Updated weights for policy 0, policy_version 125204 (0.0012) [2024-06-15 13:04:50,397][1652491] Updated weights for policy 0, policy_version 125251 (0.0012) [2024-06-15 13:04:50,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 48605.9, 300 sec: 46541.7). Total num frames: 256540672. Throughput: 0: 11787.4. Samples: 64210944. Policy #0 lag: (min: 63.0, avg: 202.1, max: 319.0) [2024-06-15 13:04:50,956][1648985] Avg episode reward: [(0, '136.320')] [2024-06-15 13:04:51,595][1652491] Updated weights for policy 0, policy_version 125309 (0.0013) [2024-06-15 13:04:55,962][1648985] Fps is (10 sec: 42568.2, 60 sec: 44777.7, 300 sec: 46429.5). Total num frames: 256704512. Throughput: 0: 11831.0. Samples: 64252928. Policy #0 lag: (min: 63.0, avg: 202.1, max: 319.0) [2024-06-15 13:04:55,963][1648985] Avg episode reward: [(0, '124.190')] [2024-06-15 13:04:56,339][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000125360_256737280.pth... [2024-06-15 13:04:56,525][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000119936_245628928.pth [2024-06-15 13:04:56,980][1652491] Updated weights for policy 0, policy_version 125382 (0.0013) [2024-06-15 13:04:57,532][1651469] Signal inference workers to stop experience collection... 
(6550 times) [2024-06-15 13:04:57,593][1652491] InferenceWorker_p0-w0: stopping experience collection (6550 times) [2024-06-15 13:04:57,747][1651469] Signal inference workers to resume experience collection... (6550 times) [2024-06-15 13:04:57,748][1652491] InferenceWorker_p0-w0: resuming experience collection (6550 times) [2024-06-15 13:04:58,692][1652491] Updated weights for policy 0, policy_version 125466 (0.0021) [2024-06-15 13:05:00,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 257032192. Throughput: 0: 11730.5. Samples: 64311296. Policy #0 lag: (min: 63.0, avg: 202.1, max: 319.0) [2024-06-15 13:05:00,955][1648985] Avg episode reward: [(0, '144.500')] [2024-06-15 13:05:01,779][1652491] Updated weights for policy 0, policy_version 125521 (0.0015) [2024-06-15 13:05:05,955][1648985] Fps is (10 sec: 45907.7, 60 sec: 45329.1, 300 sec: 46208.5). Total num frames: 257163264. Throughput: 0: 11662.3. Samples: 64388608. Policy #0 lag: (min: 63.0, avg: 202.1, max: 319.0) [2024-06-15 13:05:05,956][1648985] Avg episode reward: [(0, '147.460')] [2024-06-15 13:05:07,473][1652491] Updated weights for policy 0, policy_version 125600 (0.0014) [2024-06-15 13:05:09,611][1652491] Updated weights for policy 0, policy_version 125681 (0.0014) [2024-06-15 13:05:10,795][1652491] Updated weights for policy 0, policy_version 125732 (0.0012) [2024-06-15 13:05:10,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 48605.7, 300 sec: 46652.7). Total num frames: 257490944. Throughput: 0: 11593.9. Samples: 64418304. Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:05:10,956][1648985] Avg episode reward: [(0, '142.690')] [2024-06-15 13:05:13,331][1652491] Updated weights for policy 0, policy_version 125776 (0.0043) [2024-06-15 13:05:15,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 257687552. Throughput: 0: 11434.7. Samples: 64487936. 
Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:05:15,956][1648985] Avg episode reward: [(0, '137.520')] [2024-06-15 13:05:18,697][1652491] Updated weights for policy 0, policy_version 125856 (0.0016) [2024-06-15 13:05:20,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 47513.8, 300 sec: 46763.8). Total num frames: 257916928. Throughput: 0: 11423.3. Samples: 64556544. Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:05:20,956][1648985] Avg episode reward: [(0, '138.130')] [2024-06-15 13:05:21,061][1652491] Updated weights for policy 0, policy_version 125952 (0.0109) [2024-06-15 13:05:24,664][1652491] Updated weights for policy 0, policy_version 126032 (0.0016) [2024-06-15 13:05:25,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 258211840. Throughput: 0: 11457.4. Samples: 64587264. Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:05:25,956][1648985] Avg episode reward: [(0, '144.320')] [2024-06-15 13:05:29,860][1652491] Updated weights for policy 0, policy_version 126096 (0.0014) [2024-06-15 13:05:30,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 258342912. Throughput: 0: 11548.4. Samples: 64668672. Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:05:30,956][1648985] Avg episode reward: [(0, '146.810')] [2024-06-15 13:05:31,207][1652491] Updated weights for policy 0, policy_version 126160 (0.0013) [2024-06-15 13:05:32,953][1652491] Updated weights for policy 0, policy_version 126228 (0.0012) [2024-06-15 13:05:35,871][1652491] Updated weights for policy 0, policy_version 126304 (0.0013) [2024-06-15 13:05:35,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46967.6, 300 sec: 46874.9). Total num frames: 258670592. Throughput: 0: 11434.6. Samples: 64725504. 
Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:05:35,956][1648985] Avg episode reward: [(0, '150.610')] [2024-06-15 13:05:40,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 258736128. Throughput: 0: 11413.7. Samples: 64766464. Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:05:40,956][1648985] Avg episode reward: [(0, '163.770')] [2024-06-15 13:05:40,961][1651469] Signal inference workers to stop experience collection... (6600 times) [2024-06-15 13:05:41,003][1652491] InferenceWorker_p0-w0: stopping experience collection (6600 times) [2024-06-15 13:05:41,204][1651469] Signal inference workers to resume experience collection... (6600 times) [2024-06-15 13:05:41,205][1652491] InferenceWorker_p0-w0: resuming experience collection (6600 times) [2024-06-15 13:05:41,384][1652491] Updated weights for policy 0, policy_version 126353 (0.0012) [2024-06-15 13:05:43,348][1652491] Updated weights for policy 0, policy_version 126435 (0.0012) [2024-06-15 13:05:44,875][1652491] Updated weights for policy 0, policy_version 126496 (0.0013) [2024-06-15 13:05:45,955][1648985] Fps is (10 sec: 45873.9, 60 sec: 47513.4, 300 sec: 46985.9). Total num frames: 259129344. Throughput: 0: 11616.6. Samples: 64834048. Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:05:45,956][1648985] Avg episode reward: [(0, '154.670')] [2024-06-15 13:05:46,788][1652491] Updated weights for policy 0, policy_version 126560 (0.0014) [2024-06-15 13:05:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 259260416. Throughput: 0: 11571.2. Samples: 64909312. 
Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:05:50,956][1648985] Avg episode reward: [(0, '139.030')] [2024-06-15 13:05:51,291][1652491] Updated weights for policy 0, policy_version 126593 (0.0011) [2024-06-15 13:05:52,384][1652491] Updated weights for policy 0, policy_version 126645 (0.0074) [2024-06-15 13:05:55,337][1652491] Updated weights for policy 0, policy_version 126708 (0.0043) [2024-06-15 13:05:55,955][1648985] Fps is (10 sec: 42599.7, 60 sec: 47519.2, 300 sec: 46874.9). Total num frames: 259555328. Throughput: 0: 11696.4. Samples: 64944640. Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:05:55,956][1648985] Avg episode reward: [(0, '135.180')] [2024-06-15 13:05:56,933][1652491] Updated weights for policy 0, policy_version 126779 (0.0012) [2024-06-15 13:05:58,197][1652491] Updated weights for policy 0, policy_version 126817 (0.0013) [2024-06-15 13:05:58,877][1652491] Updated weights for policy 0, policy_version 126847 (0.0009) [2024-06-15 13:06:00,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 259784704. Throughput: 0: 11457.5. Samples: 65003520. Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:06:00,955][1648985] Avg episode reward: [(0, '150.880')] [2024-06-15 13:06:04,534][1652491] Updated weights for policy 0, policy_version 126910 (0.0013) [2024-06-15 13:06:05,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 259981312. Throughput: 0: 11559.8. Samples: 65076736. 
Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:06:05,956][1648985] Avg episode reward: [(0, '150.240')] [2024-06-15 13:06:06,533][1652491] Updated weights for policy 0, policy_version 126976 (0.0025) [2024-06-15 13:06:08,347][1652491] Updated weights for policy 0, policy_version 127036 (0.0024) [2024-06-15 13:06:10,212][1652491] Updated weights for policy 0, policy_version 127093 (0.0120) [2024-06-15 13:06:10,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 260308992. Throughput: 0: 11548.4. Samples: 65106944. Policy #0 lag: (min: 85.0, avg: 142.2, max: 341.0) [2024-06-15 13:06:10,956][1648985] Avg episode reward: [(0, '147.660')] [2024-06-15 13:06:15,190][1652491] Updated weights for policy 0, policy_version 127136 (0.0012) [2024-06-15 13:06:15,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 260440064. Throughput: 0: 11491.6. Samples: 65185792. Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:06:15,956][1648985] Avg episode reward: [(0, '140.390')] [2024-06-15 13:06:16,583][1652491] Updated weights for policy 0, policy_version 127184 (0.0012) [2024-06-15 13:06:18,016][1652491] Updated weights for policy 0, policy_version 127248 (0.0014) [2024-06-15 13:06:19,181][1652491] Updated weights for policy 0, policy_version 127293 (0.0016) [2024-06-15 13:06:20,978][1648985] Fps is (10 sec: 45772.1, 60 sec: 47495.7, 300 sec: 46871.3). Total num frames: 260767744. Throughput: 0: 11633.6. Samples: 65249280. Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:06:20,978][1648985] Avg episode reward: [(0, '147.580')] [2024-06-15 13:06:21,286][1652491] Updated weights for policy 0, policy_version 127360 (0.0024) [2024-06-15 13:06:25,331][1651469] Signal inference workers to stop experience collection... 
(6650 times) [2024-06-15 13:06:25,362][1652491] InferenceWorker_p0-w0: stopping experience collection (6650 times) [2024-06-15 13:06:25,564][1651469] Signal inference workers to resume experience collection... (6650 times) [2024-06-15 13:06:25,564][1652491] InferenceWorker_p0-w0: resuming experience collection (6650 times) [2024-06-15 13:06:25,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 260898816. Throughput: 0: 11673.6. Samples: 65291776. Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:06:25,956][1648985] Avg episode reward: [(0, '150.460')] [2024-06-15 13:06:26,442][1652491] Updated weights for policy 0, policy_version 127424 (0.0016) [2024-06-15 13:06:29,273][1652491] Updated weights for policy 0, policy_version 127496 (0.0100) [2024-06-15 13:06:30,387][1652491] Updated weights for policy 0, policy_version 127541 (0.0016) [2024-06-15 13:06:30,955][1648985] Fps is (10 sec: 45979.2, 60 sec: 48059.9, 300 sec: 46874.9). Total num frames: 261226496. Throughput: 0: 11616.8. Samples: 65356800. Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:06:30,956][1648985] Avg episode reward: [(0, '146.120')] [2024-06-15 13:06:32,252][1652491] Updated weights for policy 0, policy_version 127616 (0.0012) [2024-06-15 13:06:35,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 44782.9, 300 sec: 46542.0). Total num frames: 261357568. Throughput: 0: 11593.9. Samples: 65431040. Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:06:35,956][1648985] Avg episode reward: [(0, '133.410')] [2024-06-15 13:06:37,874][1652491] Updated weights for policy 0, policy_version 127678 (0.0036) [2024-06-15 13:06:40,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 48059.6, 300 sec: 46874.9). Total num frames: 261619712. Throughput: 0: 11684.9. Samples: 65470464. 
Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:06:40,956][1648985] Avg episode reward: [(0, '124.700')] [2024-06-15 13:06:41,089][1652491] Updated weights for policy 0, policy_version 127760 (0.0013) [2024-06-15 13:06:42,304][1652491] Updated weights for policy 0, policy_version 127807 (0.0102) [2024-06-15 13:06:45,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.5, 300 sec: 46874.9). Total num frames: 261881856. Throughput: 0: 11605.3. Samples: 65525760. Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:06:45,956][1648985] Avg episode reward: [(0, '129.680')] [2024-06-15 13:06:48,098][1652491] Updated weights for policy 0, policy_version 127874 (0.0013) [2024-06-15 13:06:49,349][1652491] Updated weights for policy 0, policy_version 127935 (0.0115) [2024-06-15 13:06:50,955][1648985] Fps is (10 sec: 39322.7, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 262012928. Throughput: 0: 11707.8. Samples: 65603584. Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:06:50,956][1648985] Avg episode reward: [(0, '128.970')] [2024-06-15 13:06:52,821][1652491] Updated weights for policy 0, policy_version 128019 (0.0012) [2024-06-15 13:06:54,639][1652491] Updated weights for policy 0, policy_version 128071 (0.0013) [2024-06-15 13:06:55,883][1652491] Updated weights for policy 0, policy_version 128127 (0.0012) [2024-06-15 13:06:55,955][1648985] Fps is (10 sec: 52427.0, 60 sec: 47513.3, 300 sec: 47097.0). Total num frames: 262406144. Throughput: 0: 11502.9. Samples: 65624576. Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:06:55,956][1648985] Avg episode reward: [(0, '130.580')] [2024-06-15 13:06:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000128128_262406144.pth... 
[2024-06-15 13:06:56,096][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000122624_251133952.pth [2024-06-15 13:07:00,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 44236.7, 300 sec: 46319.5). Total num frames: 262438912. Throughput: 0: 11446.0. Samples: 65700864. Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:07:00,956][1648985] Avg episode reward: [(0, '142.880')] [2024-06-15 13:07:01,755][1652491] Updated weights for policy 0, policy_version 128181 (0.0051) [2024-06-15 13:07:02,848][1652491] Updated weights for policy 0, policy_version 128209 (0.0017) [2024-06-15 13:07:05,049][1651469] Signal inference workers to stop experience collection... (6700 times) [2024-06-15 13:07:05,124][1652491] Updated weights for policy 0, policy_version 128291 (0.0081) [2024-06-15 13:07:05,178][1652491] InferenceWorker_p0-w0: stopping experience collection (6700 times) [2024-06-15 13:07:05,317][1651469] Signal inference workers to resume experience collection... (6700 times) [2024-06-15 13:07:05,318][1652491] InferenceWorker_p0-w0: resuming experience collection (6700 times) [2024-06-15 13:07:05,955][1648985] Fps is (10 sec: 39322.7, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 262799360. Throughput: 0: 11281.0. Samples: 65756672. Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:07:05,956][1648985] Avg episode reward: [(0, '151.740')] [2024-06-15 13:07:06,605][1652491] Updated weights for policy 0, policy_version 128321 (0.0012) [2024-06-15 13:07:07,901][1652491] Updated weights for policy 0, policy_version 128373 (0.0010) [2024-06-15 13:07:10,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 262930432. Throughput: 0: 11104.7. Samples: 65791488. 
Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:07:10,956][1648985] Avg episode reward: [(0, '151.230')] [2024-06-15 13:07:13,956][1652491] Updated weights for policy 0, policy_version 128432 (0.0029) [2024-06-15 13:07:15,143][1652491] Updated weights for policy 0, policy_version 128480 (0.0011) [2024-06-15 13:07:15,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 263192576. Throughput: 0: 11252.6. Samples: 65863168. Policy #0 lag: (min: 13.0, avg: 83.2, max: 269.0) [2024-06-15 13:07:15,956][1648985] Avg episode reward: [(0, '129.740')] [2024-06-15 13:07:16,431][1652491] Updated weights for policy 0, policy_version 128530 (0.0012) [2024-06-15 13:07:17,283][1652491] Updated weights for policy 0, policy_version 128570 (0.0013) [2024-06-15 13:07:18,802][1652491] Updated weights for policy 0, policy_version 128612 (0.0011) [2024-06-15 13:07:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 44799.9, 300 sec: 46430.6). Total num frames: 263454720. Throughput: 0: 11047.9. Samples: 65928192. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:07:20,956][1648985] Avg episode reward: [(0, '138.020')] [2024-06-15 13:07:25,052][1652491] Updated weights for policy 0, policy_version 128673 (0.0015) [2024-06-15 13:07:25,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44783.1, 300 sec: 46541.7). Total num frames: 263585792. Throughput: 0: 11082.0. Samples: 65969152. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:07:25,956][1648985] Avg episode reward: [(0, '142.180')] [2024-06-15 13:07:26,869][1652491] Updated weights for policy 0, policy_version 128736 (0.0015) [2024-06-15 13:07:28,517][1652491] Updated weights for policy 0, policy_version 128802 (0.0021) [2024-06-15 13:07:30,158][1652491] Updated weights for policy 0, policy_version 128864 (0.0014) [2024-06-15 13:07:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 263979008. 
Throughput: 0: 11059.2. Samples: 66023424. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:07:30,955][1648985] Avg episode reward: [(0, '137.600')] [2024-06-15 13:07:35,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 263979008. Throughput: 0: 11013.6. Samples: 66099200. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:07:35,956][1648985] Avg episode reward: [(0, '123.560')] [2024-06-15 13:07:36,519][1652491] Updated weights for policy 0, policy_version 128914 (0.0013) [2024-06-15 13:07:38,744][1652491] Updated weights for policy 0, policy_version 128998 (0.0013) [2024-06-15 13:07:40,882][1652491] Updated weights for policy 0, policy_version 129072 (0.0012) [2024-06-15 13:07:40,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 264339456. Throughput: 0: 11229.9. Samples: 66129920. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:07:40,956][1648985] Avg episode reward: [(0, '131.420')] [2024-06-15 13:07:42,520][1652491] Updated weights for policy 0, policy_version 129122 (0.0017) [2024-06-15 13:07:45,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 264503296. Throughput: 0: 10899.9. Samples: 66191360. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:07:45,955][1648985] Avg episode reward: [(0, '159.420')] [2024-06-15 13:07:48,189][1652491] Updated weights for policy 0, policy_version 129171 (0.0031) [2024-06-15 13:07:49,978][1652491] Updated weights for policy 0, policy_version 129219 (0.0013) [2024-06-15 13:07:50,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 45329.0, 300 sec: 46319.5). Total num frames: 264732672. Throughput: 0: 11320.9. Samples: 66266112. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:07:50,956][1648985] Avg episode reward: [(0, '156.860')] [2024-06-15 13:07:51,080][1651469] Signal inference workers to stop experience collection... 
(6750 times) [2024-06-15 13:07:51,139][1652491] InferenceWorker_p0-w0: stopping experience collection (6750 times) [2024-06-15 13:07:51,418][1651469] Signal inference workers to resume experience collection... (6750 times) [2024-06-15 13:07:51,419][1652491] InferenceWorker_p0-w0: resuming experience collection (6750 times) [2024-06-15 13:07:52,245][1652491] Updated weights for policy 0, policy_version 129312 (0.0012) [2024-06-15 13:07:54,228][1652491] Updated weights for policy 0, policy_version 129402 (0.0018) [2024-06-15 13:07:55,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 43690.8, 300 sec: 46208.5). Total num frames: 265027584. Throughput: 0: 11070.6. Samples: 66289664. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:07:55,956][1648985] Avg episode reward: [(0, '161.690')] [2024-06-15 13:08:00,488][1652491] Updated weights for policy 0, policy_version 129464 (0.0103) [2024-06-15 13:08:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45329.0, 300 sec: 46319.5). Total num frames: 265158656. Throughput: 0: 11275.4. Samples: 66370560. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:08:00,956][1648985] Avg episode reward: [(0, '141.930')] [2024-06-15 13:08:02,414][1652491] Updated weights for policy 0, policy_version 129506 (0.0026) [2024-06-15 13:08:03,799][1652491] Updated weights for policy 0, policy_version 129569 (0.0114) [2024-06-15 13:08:05,084][1652491] Updated weights for policy 0, policy_version 129622 (0.0018) [2024-06-15 13:08:05,955][1648985] Fps is (10 sec: 49153.6, 60 sec: 45329.3, 300 sec: 46208.5). Total num frames: 265519104. Throughput: 0: 11116.1. Samples: 66428416. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:08:05,955][1648985] Avg episode reward: [(0, '130.250')] [2024-06-15 13:08:05,988][1652491] Updated weights for policy 0, policy_version 129662 (0.0012) [2024-06-15 13:08:10,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 44236.7, 300 sec: 46319.5). Total num frames: 265584640. 
Throughput: 0: 11173.0. Samples: 66471936. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:08:10,956][1648985] Avg episode reward: [(0, '121.440')] [2024-06-15 13:08:11,733][1652491] Updated weights for policy 0, policy_version 129719 (0.0013) [2024-06-15 13:08:13,773][1652491] Updated weights for policy 0, policy_version 129765 (0.0056) [2024-06-15 13:08:15,656][1652491] Updated weights for policy 0, policy_version 129856 (0.0018) [2024-06-15 13:08:15,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 265945088. Throughput: 0: 11400.5. Samples: 66536448. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:08:15,956][1648985] Avg episode reward: [(0, '138.140')] [2024-06-15 13:08:17,409][1652491] Updated weights for policy 0, policy_version 129911 (0.0014) [2024-06-15 13:08:20,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 266076160. Throughput: 0: 11332.3. Samples: 66609152. Policy #0 lag: (min: 97.0, avg: 182.2, max: 321.0) [2024-06-15 13:08:20,955][1648985] Avg episode reward: [(0, '137.600')] [2024-06-15 13:08:23,236][1652491] Updated weights for policy 0, policy_version 129979 (0.0013) [2024-06-15 13:08:25,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 266338304. Throughput: 0: 11389.2. Samples: 66642432. Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 13:08:25,956][1648985] Avg episode reward: [(0, '127.040')] [2024-06-15 13:08:26,267][1652491] Updated weights for policy 0, policy_version 130064 (0.0013) [2024-06-15 13:08:28,032][1652491] Updated weights for policy 0, policy_version 130128 (0.0148) [2024-06-15 13:08:30,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 266600448. Throughput: 0: 11309.5. Samples: 66700288. 
Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 13:08:30,956][1648985] Avg episode reward: [(0, '133.630')] [2024-06-15 13:08:34,221][1652491] Updated weights for policy 0, policy_version 130208 (0.0014) [2024-06-15 13:08:35,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 266731520. Throughput: 0: 11377.8. Samples: 66778112. Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 13:08:35,956][1648985] Avg episode reward: [(0, '132.110')] [2024-06-15 13:08:36,378][1652491] Updated weights for policy 0, policy_version 130256 (0.0012) [2024-06-15 13:08:36,475][1651469] Signal inference workers to stop experience collection... (6800 times) [2024-06-15 13:08:36,523][1652491] InferenceWorker_p0-w0: stopping experience collection (6800 times) [2024-06-15 13:08:36,671][1651469] Signal inference workers to resume experience collection... (6800 times) [2024-06-15 13:08:36,672][1652491] InferenceWorker_p0-w0: resuming experience collection (6800 times) [2024-06-15 13:08:37,630][1652491] Updated weights for policy 0, policy_version 130320 (0.0020) [2024-06-15 13:08:38,669][1652491] Updated weights for policy 0, policy_version 130360 (0.0034) [2024-06-15 13:08:40,256][1652491] Updated weights for policy 0, policy_version 130428 (0.0127) [2024-06-15 13:08:40,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 267124736. Throughput: 0: 11548.5. Samples: 66809344. Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 13:08:40,955][1648985] Avg episode reward: [(0, '128.070')] [2024-06-15 13:08:45,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45329.0, 300 sec: 46097.4). Total num frames: 267223040. Throughput: 0: 11366.4. Samples: 66882048. 
Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 13:08:45,956][1648985] Avg episode reward: [(0, '127.950')] [2024-06-15 13:08:45,970][1652491] Updated weights for policy 0, policy_version 130486 (0.0015) [2024-06-15 13:08:48,040][1652491] Updated weights for policy 0, policy_version 130528 (0.0012) [2024-06-15 13:08:50,105][1652491] Updated weights for policy 0, policy_version 130617 (0.0087) [2024-06-15 13:08:50,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 267550720. Throughput: 0: 11446.0. Samples: 66943488. Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 13:08:50,955][1648985] Avg episode reward: [(0, '120.660')] [2024-06-15 13:08:51,415][1652491] Updated weights for policy 0, policy_version 130672 (0.0036) [2024-06-15 13:08:55,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 267649024. Throughput: 0: 11355.0. Samples: 66982912. Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 13:08:55,956][1648985] Avg episode reward: [(0, '124.220')] [2024-06-15 13:08:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000130688_267649024.pth... [2024-06-15 13:08:56,016][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000125360_256737280.pth [2024-06-15 13:08:56,944][1652491] Updated weights for policy 0, policy_version 130720 (0.0043) [2024-06-15 13:08:59,070][1652491] Updated weights for policy 0, policy_version 130768 (0.0026) [2024-06-15 13:09:00,494][1652491] Updated weights for policy 0, policy_version 130818 (0.0012) [2024-06-15 13:09:00,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.3, 300 sec: 45764.1). Total num frames: 267943936. Throughput: 0: 11480.2. Samples: 67053056. 
Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 13:09:00,956][1648985] Avg episode reward: [(0, '109.440')] [2024-06-15 13:09:01,624][1652491] Updated weights for policy 0, policy_version 130869 (0.0074) [2024-06-15 13:09:02,905][1652491] Updated weights for policy 0, policy_version 130940 (0.0017) [2024-06-15 13:09:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 44236.6, 300 sec: 46097.4). Total num frames: 268173312. Throughput: 0: 11366.4. Samples: 67120640. Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 13:09:05,956][1648985] Avg episode reward: [(0, '112.450')] [2024-06-15 13:09:08,685][1652491] Updated weights for policy 0, policy_version 130995 (0.0013) [2024-06-15 13:09:10,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.3, 300 sec: 45430.9). Total num frames: 268337152. Throughput: 0: 11423.3. Samples: 67156480. Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 13:09:10,956][1648985] Avg episode reward: [(0, '142.920')] [2024-06-15 13:09:11,572][1652491] Updated weights for policy 0, policy_version 131056 (0.0119) [2024-06-15 13:09:13,198][1652491] Updated weights for policy 0, policy_version 131136 (0.0014) [2024-06-15 13:09:14,077][1651469] Signal inference workers to stop experience collection... (6850 times) [2024-06-15 13:09:14,171][1652491] InferenceWorker_p0-w0: stopping experience collection (6850 times) [2024-06-15 13:09:14,271][1651469] Signal inference workers to resume experience collection... (6850 times) [2024-06-15 13:09:14,271][1652491] InferenceWorker_p0-w0: resuming experience collection (6850 times) [2024-06-15 13:09:14,421][1652491] Updated weights for policy 0, policy_version 131192 (0.0012) [2024-06-15 13:09:15,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 268697600. Throughput: 0: 11605.3. Samples: 67222528. 
Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 13:09:15,956][1648985] Avg episode reward: [(0, '133.260')] [2024-06-15 13:09:19,703][1652491] Updated weights for policy 0, policy_version 131234 (0.0020) [2024-06-15 13:09:20,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 45874.9, 300 sec: 45430.9). Total num frames: 268828672. Throughput: 0: 11639.4. Samples: 67301888. Policy #0 lag: (min: 15.0, avg: 94.9, max: 271.0) [2024-06-15 13:09:20,957][1648985] Avg episode reward: [(0, '129.000')] [2024-06-15 13:09:22,265][1652491] Updated weights for policy 0, policy_version 131312 (0.0156) [2024-06-15 13:09:23,127][1652491] Updated weights for policy 0, policy_version 131349 (0.0012) [2024-06-15 13:09:24,442][1652491] Updated weights for policy 0, policy_version 131409 (0.0012) [2024-06-15 13:09:25,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 269221888. Throughput: 0: 11559.8. Samples: 67329536. Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:09:25,956][1648985] Avg episode reward: [(0, '138.570')] [2024-06-15 13:09:30,034][1652491] Updated weights for policy 0, policy_version 131488 (0.0013) [2024-06-15 13:09:30,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 269352960. Throughput: 0: 11776.0. Samples: 67411968. Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:09:30,956][1648985] Avg episode reward: [(0, '132.190')] [2024-06-15 13:09:32,044][1652491] Updated weights for policy 0, policy_version 131524 (0.0014) [2024-06-15 13:09:34,005][1652491] Updated weights for policy 0, policy_version 131600 (0.0014) [2024-06-15 13:09:35,332][1652491] Updated weights for policy 0, policy_version 131656 (0.0012) [2024-06-15 13:09:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 45986.3). Total num frames: 269680640. Throughput: 0: 11844.3. Samples: 67476480. 
Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:09:35,956][1648985] Avg episode reward: [(0, '110.260')] [2024-06-15 13:09:36,335][1652491] Updated weights for policy 0, policy_version 131711 (0.0012) [2024-06-15 13:09:40,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 45764.1). Total num frames: 269778944. Throughput: 0: 11878.4. Samples: 67517440. Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:09:40,956][1648985] Avg episode reward: [(0, '98.870')] [2024-06-15 13:09:41,724][1652491] Updated weights for policy 0, policy_version 131760 (0.0010) [2024-06-15 13:09:44,151][1652491] Updated weights for policy 0, policy_version 131824 (0.0013) [2024-06-15 13:09:45,413][1652491] Updated weights for policy 0, policy_version 131872 (0.0118) [2024-06-15 13:09:45,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 48059.8, 300 sec: 45986.3). Total num frames: 270106624. Throughput: 0: 11844.3. Samples: 67586048. Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:09:45,955][1648985] Avg episode reward: [(0, '123.470')] [2024-06-15 13:09:47,196][1652491] Updated weights for policy 0, policy_version 131957 (0.0014) [2024-06-15 13:09:50,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45329.0, 300 sec: 45987.4). Total num frames: 270270464. Throughput: 0: 11923.9. Samples: 67657216. Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:09:50,956][1648985] Avg episode reward: [(0, '137.430')] [2024-06-15 13:09:52,441][1652491] Updated weights for policy 0, policy_version 132006 (0.0013) [2024-06-15 13:09:55,140][1652491] Updated weights for policy 0, policy_version 132064 (0.0012) [2024-06-15 13:09:55,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 48059.8, 300 sec: 45764.1). Total num frames: 270532608. Throughput: 0: 11923.9. Samples: 67693056. 
Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:09:55,956][1648985] Avg episode reward: [(0, '134.030')] [2024-06-15 13:09:56,290][1651469] Signal inference workers to stop experience collection... (6900 times) [2024-06-15 13:09:56,413][1652491] InferenceWorker_p0-w0: stopping experience collection (6900 times) [2024-06-15 13:09:56,600][1651469] Signal inference workers to resume experience collection... (6900 times) [2024-06-15 13:09:56,601][1652491] InferenceWorker_p0-w0: resuming experience collection (6900 times) [2024-06-15 13:09:56,788][1652491] Updated weights for policy 0, policy_version 132132 (0.0013) [2024-06-15 13:09:59,149][1652491] Updated weights for policy 0, policy_version 132215 (0.0014) [2024-06-15 13:10:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 270794752. Throughput: 0: 11810.1. Samples: 67753984. Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:10:00,956][1648985] Avg episode reward: [(0, '135.110')] [2024-06-15 13:10:03,544][1652491] Updated weights for policy 0, policy_version 132272 (0.0087) [2024-06-15 13:10:05,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 45542.0). Total num frames: 270925824. Throughput: 0: 11741.9. Samples: 67830272. Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:10:05,956][1648985] Avg episode reward: [(0, '128.190')] [2024-06-15 13:10:06,278][1652491] Updated weights for policy 0, policy_version 132304 (0.0029) [2024-06-15 13:10:07,879][1652491] Updated weights for policy 0, policy_version 132371 (0.0014) [2024-06-15 13:10:09,941][1652491] Updated weights for policy 0, policy_version 132436 (0.0087) [2024-06-15 13:10:10,899][1652491] Updated weights for policy 0, policy_version 132478 (0.0012) [2024-06-15 13:10:10,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 49151.9, 300 sec: 46097.4). Total num frames: 271286272. Throughput: 0: 11787.4. Samples: 67859968. 
Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:10:10,956][1648985] Avg episode reward: [(0, '130.970')] [2024-06-15 13:10:15,249][1652491] Updated weights for policy 0, policy_version 132535 (0.0035) [2024-06-15 13:10:15,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 271450112. Throughput: 0: 11548.5. Samples: 67931648. Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:10:15,956][1648985] Avg episode reward: [(0, '111.470')] [2024-06-15 13:10:18,623][1652491] Updated weights for policy 0, policy_version 132608 (0.0015) [2024-06-15 13:10:19,736][1652491] Updated weights for policy 0, policy_version 132667 (0.0013) [2024-06-15 13:10:20,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 48060.0, 300 sec: 45764.1). Total num frames: 271712256. Throughput: 0: 11685.0. Samples: 68002304. Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:10:20,956][1648985] Avg episode reward: [(0, '109.580')] [2024-06-15 13:10:22,144][1652491] Updated weights for policy 0, policy_version 132729 (0.0012) [2024-06-15 13:10:25,500][1652491] Updated weights for policy 0, policy_version 132784 (0.0012) [2024-06-15 13:10:25,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 271974400. Throughput: 0: 11593.9. Samples: 68039168. Policy #0 lag: (min: 47.0, avg: 202.3, max: 319.0) [2024-06-15 13:10:25,956][1648985] Avg episode reward: [(0, '117.190')] [2024-06-15 13:10:27,933][1652491] Updated weights for policy 0, policy_version 132805 (0.0013) [2024-06-15 13:10:29,045][1652491] Updated weights for policy 0, policy_version 132863 (0.0012) [2024-06-15 13:10:30,627][1652491] Updated weights for policy 0, policy_version 132924 (0.0019) [2024-06-15 13:10:30,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 45986.3). Total num frames: 272236544. Throughput: 0: 11673.6. Samples: 68111360. 
Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:10:30,956][1648985] Avg episode reward: [(0, '139.410')] [2024-06-15 13:10:32,898][1652491] Updated weights for policy 0, policy_version 132982 (0.0014) [2024-06-15 13:10:35,975][1648985] Fps is (10 sec: 45783.8, 60 sec: 45859.8, 300 sec: 46427.4). Total num frames: 272433152. Throughput: 0: 11645.6. Samples: 68181504. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:10:35,976][1648985] Avg episode reward: [(0, '153.140')] [2024-06-15 13:10:36,100][1652491] Updated weights for policy 0, policy_version 133040 (0.0014) [2024-06-15 13:10:40,050][1652491] Updated weights for policy 0, policy_version 133118 (0.0124) [2024-06-15 13:10:40,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 47513.5, 300 sec: 45764.2). Total num frames: 272629760. Throughput: 0: 11719.1. Samples: 68220416. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:10:40,956][1648985] Avg episode reward: [(0, '138.640')] [2024-06-15 13:10:41,160][1651469] Signal inference workers to stop experience collection... (6950 times) [2024-06-15 13:10:41,282][1652491] InferenceWorker_p0-w0: stopping experience collection (6950 times) [2024-06-15 13:10:41,452][1651469] Signal inference workers to resume experience collection... (6950 times) [2024-06-15 13:10:41,453][1652491] InferenceWorker_p0-w0: resuming experience collection (6950 times) [2024-06-15 13:10:41,763][1652491] Updated weights for policy 0, policy_version 133168 (0.0013) [2024-06-15 13:10:43,032][1652491] Updated weights for policy 0, policy_version 133201 (0.0013) [2024-06-15 13:10:44,200][1652491] Updated weights for policy 0, policy_version 133248 (0.0013) [2024-06-15 13:10:45,955][1648985] Fps is (10 sec: 45967.4, 60 sec: 46421.2, 300 sec: 46208.4). Total num frames: 272891904. Throughput: 0: 11867.0. Samples: 68288000. 
Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:10:45,956][1648985] Avg episode reward: [(0, '139.250')] [2024-06-15 13:10:47,418][1652491] Updated weights for policy 0, policy_version 133308 (0.0011) [2024-06-15 13:10:50,749][1652491] Updated weights for policy 0, policy_version 133348 (0.0013) [2024-06-15 13:10:50,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 47513.6, 300 sec: 45986.3). Total num frames: 273121280. Throughput: 0: 11730.5. Samples: 68358144. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:10:50,956][1648985] Avg episode reward: [(0, '137.290')] [2024-06-15 13:10:52,595][1652491] Updated weights for policy 0, policy_version 133400 (0.0012) [2024-06-15 13:10:54,367][1652491] Updated weights for policy 0, policy_version 133444 (0.0013) [2024-06-15 13:10:55,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 48059.5, 300 sec: 46208.4). Total num frames: 273416192. Throughput: 0: 11821.4. Samples: 68391936. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:10:55,956][1648985] Avg episode reward: [(0, '138.280')] [2024-06-15 13:10:55,966][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000133504_273416192.pth... [2024-06-15 13:10:56,009][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000128128_262406144.pth [2024-06-15 13:10:56,015][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000133504_273416192.pth [2024-06-15 13:10:57,709][1652491] Updated weights for policy 0, policy_version 133505 (0.0014) [2024-06-15 13:11:00,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 273547264. Throughput: 0: 11798.7. Samples: 68462592. 
Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:11:00,956][1648985] Avg episode reward: [(0, '133.890')] [2024-06-15 13:11:01,457][1652491] Updated weights for policy 0, policy_version 133569 (0.0013) [2024-06-15 13:11:03,392][1652491] Updated weights for policy 0, policy_version 133635 (0.0022) [2024-06-15 13:11:05,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 48059.7, 300 sec: 45764.1). Total num frames: 273809408. Throughput: 0: 11764.6. Samples: 68531712. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:11:05,956][1648985] Avg episode reward: [(0, '132.760')] [2024-06-15 13:11:06,175][1652491] Updated weights for policy 0, policy_version 133697 (0.0013) [2024-06-15 13:11:07,693][1652491] Updated weights for policy 0, policy_version 133751 (0.0011) [2024-06-15 13:11:09,646][1652491] Updated weights for policy 0, policy_version 133779 (0.0014) [2024-06-15 13:11:10,609][1652491] Updated weights for policy 0, policy_version 133824 (0.0036) [2024-06-15 13:11:10,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 274071552. Throughput: 0: 11650.9. Samples: 68563456. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:11:10,956][1648985] Avg episode reward: [(0, '124.380')] [2024-06-15 13:11:13,729][1652491] Updated weights for policy 0, policy_version 133881 (0.0012) [2024-06-15 13:11:15,763][1652491] Updated weights for policy 0, policy_version 133946 (0.0095) [2024-06-15 13:11:15,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.7, 300 sec: 45989.8). Total num frames: 274333696. Throughput: 0: 11650.8. Samples: 68635648. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:11:15,956][1648985] Avg episode reward: [(0, '122.810')] [2024-06-15 13:11:18,640][1652491] Updated weights for policy 0, policy_version 134000 (0.0124) [2024-06-15 13:11:20,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 46421.4, 300 sec: 46097.4). Total num frames: 274497536. 
Throughput: 0: 11542.3. Samples: 68700672. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:11:20,955][1648985] Avg episode reward: [(0, '126.230')] [2024-06-15 13:11:21,322][1652491] Updated weights for policy 0, policy_version 134049 (0.0013) [2024-06-15 13:11:24,359][1652491] Updated weights for policy 0, policy_version 134096 (0.0012) [2024-06-15 13:11:25,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 274726912. Throughput: 0: 11457.4. Samples: 68736000. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:11:25,956][1648985] Avg episode reward: [(0, '144.830')] [2024-06-15 13:11:26,694][1652491] Updated weights for policy 0, policy_version 134161 (0.0015) [2024-06-15 13:11:28,410][1651469] Signal inference workers to stop experience collection... (7000 times) [2024-06-15 13:11:28,477][1652491] InferenceWorker_p0-w0: stopping experience collection (7000 times) [2024-06-15 13:11:28,616][1651469] Signal inference workers to resume experience collection... (7000 times) [2024-06-15 13:11:28,617][1652491] InferenceWorker_p0-w0: resuming experience collection (7000 times) [2024-06-15 13:11:28,766][1652491] Updated weights for policy 0, policy_version 134225 (0.0012) [2024-06-15 13:11:30,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 274989056. Throughput: 0: 11559.8. Samples: 68808192. Policy #0 lag: (min: 15.0, avg: 104.2, max: 271.0) [2024-06-15 13:11:30,956][1648985] Avg episode reward: [(0, '137.760')] [2024-06-15 13:11:31,914][1652491] Updated weights for policy 0, policy_version 134288 (0.0095) [2024-06-15 13:11:33,034][1652491] Updated weights for policy 0, policy_version 134336 (0.0011) [2024-06-15 13:11:35,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46983.3, 300 sec: 46208.5). Total num frames: 275251200. Throughput: 0: 11719.1. Samples: 68885504. 
Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 13:11:35,956][1648985] Avg episode reward: [(0, '126.870')] [2024-06-15 13:11:36,834][1652491] Updated weights for policy 0, policy_version 134402 (0.0015) [2024-06-15 13:11:37,772][1652491] Updated weights for policy 0, policy_version 134462 (0.0012) [2024-06-15 13:11:39,542][1652491] Updated weights for policy 0, policy_version 134516 (0.0014) [2024-06-15 13:11:40,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 275513344. Throughput: 0: 11776.1. Samples: 68921856. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 13:11:40,956][1648985] Avg episode reward: [(0, '125.360')] [2024-06-15 13:11:42,823][1652491] Updated weights for policy 0, policy_version 134544 (0.0016) [2024-06-15 13:11:44,051][1652491] Updated weights for policy 0, policy_version 134590 (0.0014) [2024-06-15 13:11:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 275677184. Throughput: 0: 11855.6. Samples: 68996096. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 13:11:45,956][1648985] Avg episode reward: [(0, '123.600')] [2024-06-15 13:11:46,789][1652491] Updated weights for policy 0, policy_version 134655 (0.0116) [2024-06-15 13:11:48,831][1652491] Updated weights for policy 0, policy_version 134715 (0.0034) [2024-06-15 13:11:50,189][1652491] Updated weights for policy 0, policy_version 134755 (0.0013) [2024-06-15 13:11:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48605.9, 300 sec: 46208.5). Total num frames: 276037632. Throughput: 0: 11764.6. Samples: 69061120. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 13:11:50,956][1648985] Avg episode reward: [(0, '129.620')] [2024-06-15 13:11:54,974][1652491] Updated weights for policy 0, policy_version 134832 (0.0024) [2024-06-15 13:11:55,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45875.5, 300 sec: 46541.7). Total num frames: 276168704. 
Throughput: 0: 11935.3. Samples: 69100544. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 13:11:55,956][1648985] Avg episode reward: [(0, '132.170')] [2024-06-15 13:11:56,846][1652491] Updated weights for policy 0, policy_version 134850 (0.0028) [2024-06-15 13:11:58,116][1652491] Updated weights for policy 0, policy_version 134910 (0.0023) [2024-06-15 13:12:00,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 276430848. Throughput: 0: 11730.5. Samples: 69163520. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 13:12:00,956][1648985] Avg episode reward: [(0, '129.440')] [2024-06-15 13:12:01,654][1652491] Updated weights for policy 0, policy_version 134997 (0.0016) [2024-06-15 13:12:05,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 276594688. Throughput: 0: 11958.0. Samples: 69238784. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 13:12:05,956][1648985] Avg episode reward: [(0, '137.870')] [2024-06-15 13:12:06,353][1652491] Updated weights for policy 0, policy_version 135072 (0.0014) [2024-06-15 13:12:08,471][1652491] Updated weights for policy 0, policy_version 135122 (0.0012) [2024-06-15 13:12:10,801][1652491] Updated weights for policy 0, policy_version 135184 (0.0015) [2024-06-15 13:12:10,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 276856832. Throughput: 0: 11821.5. Samples: 69267968. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 13:12:10,956][1648985] Avg episode reward: [(0, '132.870')] [2024-06-15 13:12:13,219][1652491] Updated weights for policy 0, policy_version 135248 (0.0103) [2024-06-15 13:12:13,588][1651469] Signal inference workers to stop experience collection... 
(7050 times) [2024-06-15 13:12:13,630][1652491] InferenceWorker_p0-w0: stopping experience collection (7050 times) [2024-06-15 13:12:13,783][1651469] Signal inference workers to resume experience collection... (7050 times) [2024-06-15 13:12:13,783][1652491] InferenceWorker_p0-w0: resuming experience collection (7050 times) [2024-06-15 13:12:15,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 277086208. Throughput: 0: 11685.0. Samples: 69334016. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 13:12:15,955][1648985] Avg episode reward: [(0, '128.260')] [2024-06-15 13:12:17,608][1652491] Updated weights for policy 0, policy_version 135344 (0.0013) [2024-06-15 13:12:20,727][1652491] Updated weights for policy 0, policy_version 135413 (0.0013) [2024-06-15 13:12:20,955][1648985] Fps is (10 sec: 49153.6, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 277348352. Throughput: 0: 11559.9. Samples: 69405696. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 13:12:20,955][1648985] Avg episode reward: [(0, '134.670')] [2024-06-15 13:12:22,775][1652491] Updated weights for policy 0, policy_version 135456 (0.0014) [2024-06-15 13:12:25,796][1652491] Updated weights for policy 0, policy_version 135546 (0.0015) [2024-06-15 13:12:25,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 277610496. Throughput: 0: 11491.5. Samples: 69438976. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 13:12:25,956][1648985] Avg episode reward: [(0, '165.450')] [2024-06-15 13:12:28,887][1652491] Updated weights for policy 0, policy_version 135586 (0.0010) [2024-06-15 13:12:30,789][1652491] Updated weights for policy 0, policy_version 135632 (0.0013) [2024-06-15 13:12:30,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 277774336. Throughput: 0: 11423.3. Samples: 69510144. 
Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 13:12:30,955][1648985] Avg episode reward: [(0, '152.380')] [2024-06-15 13:12:31,797][1652491] Updated weights for policy 0, policy_version 135679 (0.0013) [2024-06-15 13:12:34,623][1652491] Updated weights for policy 0, policy_version 135738 (0.0013) [2024-06-15 13:12:35,743][1652491] Updated weights for policy 0, policy_version 135778 (0.0012) [2024-06-15 13:12:35,955][1648985] Fps is (10 sec: 49153.5, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 278102016. Throughput: 0: 11605.4. Samples: 69583360. Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:12:35,955][1648985] Avg episode reward: [(0, '140.090')] [2024-06-15 13:12:39,186][1652491] Updated weights for policy 0, policy_version 135825 (0.0014) [2024-06-15 13:12:40,366][1652491] Updated weights for policy 0, policy_version 135872 (0.0012) [2024-06-15 13:12:40,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 278265856. Throughput: 0: 11639.4. Samples: 69624320. Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:12:40,956][1648985] Avg episode reward: [(0, '147.490')] [2024-06-15 13:12:42,805][1652491] Updated weights for policy 0, policy_version 135929 (0.0099) [2024-06-15 13:12:45,111][1652491] Updated weights for policy 0, policy_version 135956 (0.0013) [2024-06-15 13:12:45,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 278495232. Throughput: 0: 11787.3. Samples: 69693952. 
Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:12:45,956][1648985] Avg episode reward: [(0, '154.290')] [2024-06-15 13:12:46,707][1652491] Updated weights for policy 0, policy_version 136020 (0.0014) [2024-06-15 13:12:47,622][1652491] Updated weights for policy 0, policy_version 136060 (0.0028) [2024-06-15 13:12:50,623][1652491] Updated weights for policy 0, policy_version 136121 (0.0014) [2024-06-15 13:12:50,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 278790144. Throughput: 0: 11605.4. Samples: 69761024. Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:12:50,956][1648985] Avg episode reward: [(0, '155.250')] [2024-06-15 13:12:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 278921216. Throughput: 0: 11628.1. Samples: 69791232. Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:12:55,956][1648985] Avg episode reward: [(0, '138.680')] [2024-06-15 13:12:55,964][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000136192_278921216.pth... [2024-06-15 13:12:56,124][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000130688_267649024.pth [2024-06-15 13:12:56,371][1652491] Updated weights for policy 0, policy_version 136198 (0.0013) [2024-06-15 13:12:57,602][1652491] Updated weights for policy 0, policy_version 136251 (0.0114) [2024-06-15 13:12:58,832][1652491] Updated weights for policy 0, policy_version 136304 (0.0014) [2024-06-15 13:13:00,366][1651469] Signal inference workers to stop experience collection... (7100 times) [2024-06-15 13:13:00,523][1652491] InferenceWorker_p0-w0: stopping experience collection (7100 times) [2024-06-15 13:13:00,635][1651469] Signal inference workers to resume experience collection... 
(7100 times) [2024-06-15 13:13:00,636][1652491] InferenceWorker_p0-w0: resuming experience collection (7100 times) [2024-06-15 13:13:00,638][1652491] Updated weights for policy 0, policy_version 136336 (0.0011) [2024-06-15 13:13:00,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 279216128. Throughput: 0: 11810.1. Samples: 69865472. Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:13:00,956][1648985] Avg episode reward: [(0, '145.660')] [2024-06-15 13:13:03,892][1652491] Updated weights for policy 0, policy_version 136400 (0.0012) [2024-06-15 13:13:05,125][1652491] Updated weights for policy 0, policy_version 136448 (0.0020) [2024-06-15 13:13:05,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 47513.7, 300 sec: 46986.0). Total num frames: 279445504. Throughput: 0: 11707.7. Samples: 69932544. Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:13:05,955][1648985] Avg episode reward: [(0, '152.590')] [2024-06-15 13:13:08,960][1652491] Updated weights for policy 0, policy_version 136507 (0.0014) [2024-06-15 13:13:10,756][1652491] Updated weights for policy 0, policy_version 136566 (0.0014) [2024-06-15 13:13:10,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.6, 300 sec: 46541.7). Total num frames: 279674880. Throughput: 0: 11798.8. Samples: 69969920. Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:13:10,956][1648985] Avg episode reward: [(0, '141.570')] [2024-06-15 13:13:12,625][1652491] Updated weights for policy 0, policy_version 136608 (0.0011) [2024-06-15 13:13:14,392][1652491] Updated weights for policy 0, policy_version 136656 (0.0014) [2024-06-15 13:13:15,378][1652491] Updated weights for policy 0, policy_version 136702 (0.0025) [2024-06-15 13:13:15,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 279969792. Throughput: 0: 11753.2. Samples: 70039040. 
Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:13:15,956][1648985] Avg episode reward: [(0, '129.350')] [2024-06-15 13:13:20,425][1652491] Updated weights for policy 0, policy_version 136763 (0.0021) [2024-06-15 13:13:20,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.1, 300 sec: 46652.8). Total num frames: 280100864. Throughput: 0: 11685.0. Samples: 70109184. Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:13:20,956][1648985] Avg episode reward: [(0, '137.840')] [2024-06-15 13:13:22,354][1652491] Updated weights for policy 0, policy_version 136822 (0.0012) [2024-06-15 13:13:24,065][1652491] Updated weights for policy 0, policy_version 136864 (0.0011) [2024-06-15 13:13:25,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.4, 300 sec: 46652.7). Total num frames: 280363008. Throughput: 0: 11582.6. Samples: 70145536. Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:13:25,956][1648985] Avg episode reward: [(0, '140.560')] [2024-06-15 13:13:26,547][1652491] Updated weights for policy 0, policy_version 136913 (0.0018) [2024-06-15 13:13:30,551][1652491] Updated weights for policy 0, policy_version 136962 (0.0012) [2024-06-15 13:13:30,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.1, 300 sec: 46763.8). Total num frames: 280526848. Throughput: 0: 11537.1. Samples: 70213120. Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:13:30,956][1648985] Avg episode reward: [(0, '143.390')] [2024-06-15 13:13:31,771][1652491] Updated weights for policy 0, policy_version 137021 (0.0012) [2024-06-15 13:13:33,470][1652491] Updated weights for policy 0, policy_version 137077 (0.0011) [2024-06-15 13:13:35,226][1652491] Updated weights for policy 0, policy_version 137120 (0.0012) [2024-06-15 13:13:35,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46421.2, 300 sec: 46652.7). Total num frames: 280887296. Throughput: 0: 11525.7. Samples: 70279680. 
Policy #0 lag: (min: 57.0, avg: 166.3, max: 313.0) [2024-06-15 13:13:35,956][1648985] Avg episode reward: [(0, '143.320')] [2024-06-15 13:13:37,995][1652491] Updated weights for policy 0, policy_version 137184 (0.0017) [2024-06-15 13:13:40,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 281018368. Throughput: 0: 11616.8. Samples: 70313984. Policy #0 lag: (min: 0.0, avg: 125.1, max: 256.0) [2024-06-15 13:13:40,956][1648985] Avg episode reward: [(0, '140.230')] [2024-06-15 13:13:42,684][1652491] Updated weights for policy 0, policy_version 137238 (0.0015) [2024-06-15 13:13:44,075][1652491] Updated weights for policy 0, policy_version 137297 (0.0012) [2024-06-15 13:13:45,105][1652491] Updated weights for policy 0, policy_version 137343 (0.0012) [2024-06-15 13:13:45,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 281280512. Throughput: 0: 11514.3. Samples: 70383616. Policy #0 lag: (min: 0.0, avg: 125.1, max: 256.0) [2024-06-15 13:13:45,956][1648985] Avg episode reward: [(0, '115.010')] [2024-06-15 13:13:46,648][1651469] Signal inference workers to stop experience collection... (7150 times) [2024-06-15 13:13:46,686][1652491] InferenceWorker_p0-w0: stopping experience collection (7150 times) [2024-06-15 13:13:46,877][1651469] Signal inference workers to resume experience collection... (7150 times) [2024-06-15 13:13:46,890][1652491] InferenceWorker_p0-w0: resuming experience collection (7150 times) [2024-06-15 13:13:47,402][1652491] Updated weights for policy 0, policy_version 137401 (0.0012) [2024-06-15 13:13:49,223][1652491] Updated weights for policy 0, policy_version 137442 (0.0042) [2024-06-15 13:13:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 281542656. Throughput: 0: 11639.4. Samples: 70456320. 
Policy #0 lag: (min: 0.0, avg: 125.1, max: 256.0) [2024-06-15 13:13:50,956][1648985] Avg episode reward: [(0, '124.920')] [2024-06-15 13:13:53,715][1652491] Updated weights for policy 0, policy_version 137496 (0.0013) [2024-06-15 13:13:55,086][1652491] Updated weights for policy 0, policy_version 137556 (0.0012) [2024-06-15 13:13:55,956][1648985] Fps is (10 sec: 49148.8, 60 sec: 47513.2, 300 sec: 46874.8). Total num frames: 281772032. Throughput: 0: 11662.0. Samples: 70494720. Policy #0 lag: (min: 0.0, avg: 125.1, max: 256.0) [2024-06-15 13:13:55,957][1648985] Avg episode reward: [(0, '134.950')] [2024-06-15 13:13:56,235][1652491] Updated weights for policy 0, policy_version 137600 (0.0014) [2024-06-15 13:13:58,513][1652491] Updated weights for policy 0, policy_version 137661 (0.0014) [2024-06-15 13:14:00,729][1652491] Updated weights for policy 0, policy_version 137716 (0.0012) [2024-06-15 13:14:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 282066944. Throughput: 0: 11525.7. Samples: 70557696. Policy #0 lag: (min: 0.0, avg: 125.1, max: 256.0) [2024-06-15 13:14:00,956][1648985] Avg episode reward: [(0, '148.910')] [2024-06-15 13:14:04,924][1652491] Updated weights for policy 0, policy_version 137744 (0.0012) [2024-06-15 13:14:05,955][1648985] Fps is (10 sec: 39324.4, 60 sec: 45329.0, 300 sec: 46874.9). Total num frames: 282165248. Throughput: 0: 11559.8. Samples: 70629376. Policy #0 lag: (min: 0.0, avg: 125.1, max: 256.0) [2024-06-15 13:14:05,956][1648985] Avg episode reward: [(0, '128.260')] [2024-06-15 13:14:06,716][1652491] Updated weights for policy 0, policy_version 137808 (0.0105) [2024-06-15 13:14:07,878][1652491] Updated weights for policy 0, policy_version 137852 (0.0011) [2024-06-15 13:14:08,882][1652491] Updated weights for policy 0, policy_version 137890 (0.0012) [2024-06-15 13:14:10,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46967.3, 300 sec: 46763.8). Total num frames: 282492928. 
Throughput: 0: 11548.4. Samples: 70665216. Policy #0 lag: (min: 0.0, avg: 125.1, max: 256.0) [2024-06-15 13:14:10,956][1648985] Avg episode reward: [(0, '121.010')] [2024-06-15 13:14:11,736][1652491] Updated weights for policy 0, policy_version 137980 (0.0014) [2024-06-15 13:14:15,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 46652.8). Total num frames: 282591232. Throughput: 0: 11650.9. Samples: 70737408. Policy #0 lag: (min: 0.0, avg: 125.1, max: 256.0) [2024-06-15 13:14:15,955][1648985] Avg episode reward: [(0, '119.070')] [2024-06-15 13:14:17,090][1652491] Updated weights for policy 0, policy_version 138041 (0.0121) [2024-06-15 13:14:18,973][1652491] Updated weights for policy 0, policy_version 138110 (0.0112) [2024-06-15 13:14:20,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 282951680. Throughput: 0: 11537.1. Samples: 70798848. Policy #0 lag: (min: 0.0, avg: 125.1, max: 256.0) [2024-06-15 13:14:20,956][1648985] Avg episode reward: [(0, '139.010')] [2024-06-15 13:14:21,073][1652491] Updated weights for policy 0, policy_version 138171 (0.0034) [2024-06-15 13:14:23,428][1652491] Updated weights for policy 0, policy_version 138234 (0.0014) [2024-06-15 13:14:25,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 283115520. Throughput: 0: 11548.4. Samples: 70833664. Policy #0 lag: (min: 0.0, avg: 125.1, max: 256.0) [2024-06-15 13:14:25,956][1648985] Avg episode reward: [(0, '160.430')] [2024-06-15 13:14:27,794][1652491] Updated weights for policy 0, policy_version 138288 (0.0013) [2024-06-15 13:14:30,028][1652491] Updated weights for policy 0, policy_version 138356 (0.0019) [2024-06-15 13:14:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 47513.6, 300 sec: 46430.6). Total num frames: 283377664. Throughput: 0: 11685.0. Samples: 70909440. 
Policy #0 lag: (min: 0.0, avg: 125.1, max: 256.0) [2024-06-15 13:14:30,956][1648985] Avg episode reward: [(0, '159.780')] [2024-06-15 13:14:31,072][1651469] Signal inference workers to stop experience collection... (7200 times) [2024-06-15 13:14:31,137][1652491] InferenceWorker_p0-w0: stopping experience collection (7200 times) [2024-06-15 13:14:31,275][1651469] Signal inference workers to resume experience collection... (7200 times) [2024-06-15 13:14:31,276][1652491] InferenceWorker_p0-w0: resuming experience collection (7200 times) [2024-06-15 13:14:31,603][1652491] Updated weights for policy 0, policy_version 138416 (0.0013) [2024-06-15 13:14:33,378][1652491] Updated weights for policy 0, policy_version 138438 (0.0012) [2024-06-15 13:14:34,533][1652491] Updated weights for policy 0, policy_version 138496 (0.0014) [2024-06-15 13:14:35,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.3, 300 sec: 46986.0). Total num frames: 283639808. Throughput: 0: 11696.4. Samples: 70982656. Policy #0 lag: (min: 0.0, avg: 125.1, max: 256.0) [2024-06-15 13:14:35,956][1648985] Avg episode reward: [(0, '133.830')] [2024-06-15 13:14:38,563][1652491] Updated weights for policy 0, policy_version 138552 (0.0014) [2024-06-15 13:14:40,429][1652491] Updated weights for policy 0, policy_version 138592 (0.0013) [2024-06-15 13:14:40,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 283869184. Throughput: 0: 11696.5. Samples: 71021056. Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:14:40,956][1648985] Avg episode reward: [(0, '129.330')] [2024-06-15 13:14:42,049][1652491] Updated weights for policy 0, policy_version 138641 (0.0028) [2024-06-15 13:14:44,536][1652491] Updated weights for policy 0, policy_version 138712 (0.0012) [2024-06-15 13:14:45,206][1652491] Updated weights for policy 0, policy_version 138749 (0.0017) [2024-06-15 13:14:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 47097.1). 
Total num frames: 284164096. Throughput: 0: 11753.3. Samples: 71086592. Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:14:45,956][1648985] Avg episode reward: [(0, '123.520')] [2024-06-15 13:14:49,590][1652491] Updated weights for policy 0, policy_version 138806 (0.0014) [2024-06-15 13:14:50,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 284295168. Throughput: 0: 12003.5. Samples: 71169536. Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:14:50,956][1648985] Avg episode reward: [(0, '117.520')] [2024-06-15 13:14:51,480][1652491] Updated weights for policy 0, policy_version 138836 (0.0012) [2024-06-15 13:14:52,681][1652491] Updated weights for policy 0, policy_version 138896 (0.0043) [2024-06-15 13:14:54,260][1652491] Updated weights for policy 0, policy_version 138960 (0.0012) [2024-06-15 13:14:55,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48606.4, 300 sec: 47097.1). Total num frames: 284688384. Throughput: 0: 11912.6. Samples: 71201280. Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:14:55,956][1648985] Avg episode reward: [(0, '131.740')] [2024-06-15 13:14:55,975][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000139008_284688384.pth... [2024-06-15 13:14:56,030][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000133504_273416192.pth [2024-06-15 13:15:00,063][1652491] Updated weights for policy 0, policy_version 139041 (0.0012) [2024-06-15 13:15:00,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 284819456. Throughput: 0: 12003.5. Samples: 71277568. 
Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:15:00,956][1648985] Avg episode reward: [(0, '143.750')] [2024-06-15 13:15:02,284][1652491] Updated weights for policy 0, policy_version 139092 (0.0014) [2024-06-15 13:15:03,775][1652491] Updated weights for policy 0, policy_version 139168 (0.0014) [2024-06-15 13:15:05,168][1652491] Updated weights for policy 0, policy_version 139202 (0.0016) [2024-06-15 13:15:05,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 49698.0, 300 sec: 46986.0). Total num frames: 285147136. Throughput: 0: 12128.7. Samples: 71344640. Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:15:05,956][1648985] Avg episode reward: [(0, '135.930')] [2024-06-15 13:15:06,336][1652491] Updated weights for policy 0, policy_version 139254 (0.0015) [2024-06-15 13:15:10,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45329.2, 300 sec: 46652.7). Total num frames: 285212672. Throughput: 0: 12242.5. Samples: 71384576. Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:15:10,956][1648985] Avg episode reward: [(0, '126.860')] [2024-06-15 13:15:11,545][1652491] Updated weights for policy 0, policy_version 139297 (0.0015) [2024-06-15 13:15:12,578][1652491] Updated weights for policy 0, policy_version 139329 (0.0012) [2024-06-15 13:15:14,073][1652491] Updated weights for policy 0, policy_version 139392 (0.0013) [2024-06-15 13:15:14,205][1651469] Signal inference workers to stop experience collection... (7250 times) [2024-06-15 13:15:14,265][1652491] InferenceWorker_p0-w0: stopping experience collection (7250 times) [2024-06-15 13:15:14,427][1651469] Signal inference workers to resume experience collection... (7250 times) [2024-06-15 13:15:14,428][1652491] InferenceWorker_p0-w0: resuming experience collection (7250 times) [2024-06-15 13:15:15,112][1652491] Updated weights for policy 0, policy_version 139442 (0.0105) [2024-06-15 13:15:15,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 50244.1, 300 sec: 47097.0). 
Total num frames: 285605888. Throughput: 0: 12060.4. Samples: 71452160. Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:15:15,956][1648985] Avg episode reward: [(0, '128.660')] [2024-06-15 13:15:17,292][1652491] Updated weights for policy 0, policy_version 139495 (0.0028) [2024-06-15 13:15:20,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46421.3, 300 sec: 46652.8). Total num frames: 285736960. Throughput: 0: 12151.5. Samples: 71529472. Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:15:20,956][1648985] Avg episode reward: [(0, '123.390')] [2024-06-15 13:15:22,582][1652491] Updated weights for policy 0, policy_version 139552 (0.0016) [2024-06-15 13:15:23,588][1652491] Updated weights for policy 0, policy_version 139589 (0.0014) [2024-06-15 13:15:25,822][1652491] Updated weights for policy 0, policy_version 139669 (0.0013) [2024-06-15 13:15:25,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 49152.1, 300 sec: 46874.9). Total num frames: 286064640. Throughput: 0: 12071.8. Samples: 71564288. Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:15:25,956][1648985] Avg episode reward: [(0, '123.390')] [2024-06-15 13:15:27,830][1652491] Updated weights for policy 0, policy_version 139714 (0.0013) [2024-06-15 13:15:30,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.6, 300 sec: 46878.1). Total num frames: 286261248. Throughput: 0: 11855.6. Samples: 71620096. Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:15:30,956][1648985] Avg episode reward: [(0, '129.590')] [2024-06-15 13:15:34,033][1652491] Updated weights for policy 0, policy_version 139792 (0.0015) [2024-06-15 13:15:35,241][1652491] Updated weights for policy 0, policy_version 139837 (0.0015) [2024-06-15 13:15:35,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 286425088. Throughput: 0: 11741.9. Samples: 71697920. 
Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:15:35,956][1648985] Avg episode reward: [(0, '133.940')] [2024-06-15 13:15:36,845][1652491] Updated weights for policy 0, policy_version 139889 (0.0013) [2024-06-15 13:15:37,841][1652491] Updated weights for policy 0, policy_version 139936 (0.0011) [2024-06-15 13:15:40,012][1652491] Updated weights for policy 0, policy_version 140000 (0.0022) [2024-06-15 13:15:40,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48605.9, 300 sec: 47097.1). Total num frames: 286785536. Throughput: 0: 11719.1. Samples: 71728640. Policy #0 lag: (min: 20.0, avg: 124.0, max: 276.0) [2024-06-15 13:15:40,955][1648985] Avg episode reward: [(0, '147.990')] [2024-06-15 13:15:45,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 44782.9, 300 sec: 46541.7). Total num frames: 286851072. Throughput: 0: 11741.9. Samples: 71805952. Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:15:45,956][1648985] Avg episode reward: [(0, '151.850')] [2024-06-15 13:15:46,040][1652491] Updated weights for policy 0, policy_version 140066 (0.0012) [2024-06-15 13:15:46,925][1652491] Updated weights for policy 0, policy_version 140099 (0.0011) [2024-06-15 13:15:48,998][1652491] Updated weights for policy 0, policy_version 140180 (0.0132) [2024-06-15 13:15:49,820][1652491] Updated weights for policy 0, policy_version 140220 (0.0030) [2024-06-15 13:15:50,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 49152.2, 300 sec: 46875.0). Total num frames: 287244288. Throughput: 0: 11480.2. Samples: 71861248. Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:15:50,956][1648985] Avg episode reward: [(0, '143.430')] [2024-06-15 13:15:51,465][1652491] Updated weights for policy 0, policy_version 140281 (0.0013) [2024-06-15 13:15:55,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 46652.7). Total num frames: 287309824. Throughput: 0: 11514.3. Samples: 71902720. 
Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:15:55,956][1648985] Avg episode reward: [(0, '140.490')] [2024-06-15 13:15:57,224][1652491] Updated weights for policy 0, policy_version 140308 (0.0013) [2024-06-15 13:15:57,983][1651469] Signal inference workers to stop experience collection... (7300 times) [2024-06-15 13:15:58,038][1652491] InferenceWorker_p0-w0: stopping experience collection (7300 times) [2024-06-15 13:15:58,284][1651469] Signal inference workers to resume experience collection... (7300 times) [2024-06-15 13:15:58,285][1652491] InferenceWorker_p0-w0: resuming experience collection (7300 times) [2024-06-15 13:15:59,132][1652491] Updated weights for policy 0, policy_version 140384 (0.0012) [2024-06-15 13:16:00,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 287637504. Throughput: 0: 11503.0. Samples: 71969792. Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:16:00,956][1648985] Avg episode reward: [(0, '132.420')] [2024-06-15 13:16:01,203][1652491] Updated weights for policy 0, policy_version 140475 (0.0014) [2024-06-15 13:16:02,574][1652491] Updated weights for policy 0, policy_version 140539 (0.0015) [2024-06-15 13:16:05,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 44783.0, 300 sec: 46652.8). Total num frames: 287834112. Throughput: 0: 11411.9. Samples: 72043008. Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:16:05,956][1648985] Avg episode reward: [(0, '139.310')] [2024-06-15 13:16:09,155][1652491] Updated weights for policy 0, policy_version 140595 (0.0168) [2024-06-15 13:16:10,755][1652491] Updated weights for policy 0, policy_version 140668 (0.0013) [2024-06-15 13:16:10,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 288096256. Throughput: 0: 11400.5. Samples: 72077312. 
Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:16:10,956][1648985] Avg episode reward: [(0, '123.800')] [2024-06-15 13:16:12,028][1652491] Updated weights for policy 0, policy_version 140706 (0.0014) [2024-06-15 13:16:13,253][1652491] Updated weights for policy 0, policy_version 140752 (0.0021) [2024-06-15 13:16:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.3, 300 sec: 46986.0). Total num frames: 288358400. Throughput: 0: 11571.2. Samples: 72140800. Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:16:15,956][1648985] Avg episode reward: [(0, '133.180')] [2024-06-15 13:16:19,293][1652491] Updated weights for policy 0, policy_version 140804 (0.0013) [2024-06-15 13:16:20,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 288489472. Throughput: 0: 11468.8. Samples: 72214016. Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:16:20,956][1648985] Avg episode reward: [(0, '128.100')] [2024-06-15 13:16:21,004][1652491] Updated weights for policy 0, policy_version 140867 (0.0013) [2024-06-15 13:16:22,583][1652491] Updated weights for policy 0, policy_version 140925 (0.0016) [2024-06-15 13:16:23,903][1652491] Updated weights for policy 0, policy_version 140982 (0.0012) [2024-06-15 13:16:25,125][1652491] Updated weights for policy 0, policy_version 141028 (0.0037) [2024-06-15 13:16:25,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 288882688. Throughput: 0: 11377.8. Samples: 72240640. Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:16:25,956][1648985] Avg episode reward: [(0, '120.740')] [2024-06-15 13:16:30,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 288882688. Throughput: 0: 11355.0. Samples: 72316928. 
Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:16:30,956][1648985] Avg episode reward: [(0, '137.160')] [2024-06-15 13:16:32,190][1652491] Updated weights for policy 0, policy_version 141088 (0.0017) [2024-06-15 13:16:34,274][1652491] Updated weights for policy 0, policy_version 141168 (0.0011) [2024-06-15 13:16:35,446][1652491] Updated weights for policy 0, policy_version 141217 (0.0026) [2024-06-15 13:16:35,955][1648985] Fps is (10 sec: 36045.2, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 289243136. Throughput: 0: 11400.6. Samples: 72374272. Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:16:35,955][1648985] Avg episode reward: [(0, '133.980')] [2024-06-15 13:16:36,132][1652491] Updated weights for policy 0, policy_version 141248 (0.0012) [2024-06-15 13:16:36,768][1651469] Signal inference workers to stop experience collection... (7350 times) [2024-06-15 13:16:36,800][1652491] InferenceWorker_p0-w0: stopping experience collection (7350 times) [2024-06-15 13:16:36,981][1651469] Signal inference workers to resume experience collection... (7350 times) [2024-06-15 13:16:36,983][1652491] InferenceWorker_p0-w0: resuming experience collection (7350 times) [2024-06-15 13:16:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 46541.7). Total num frames: 289406976. Throughput: 0: 11377.8. Samples: 72414720. Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:16:40,956][1648985] Avg episode reward: [(0, '138.780')] [2024-06-15 13:16:42,767][1652491] Updated weights for policy 0, policy_version 141313 (0.0078) [2024-06-15 13:16:44,236][1652491] Updated weights for policy 0, policy_version 141376 (0.0099) [2024-06-15 13:16:45,624][1652491] Updated weights for policy 0, policy_version 141425 (0.0038) [2024-06-15 13:16:45,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.5, 300 sec: 46208.5). Total num frames: 289669120. Throughput: 0: 11502.9. Samples: 72487424. 
Policy #0 lag: (min: 0.0, avg: 73.3, max: 256.0) [2024-06-15 13:16:45,956][1648985] Avg episode reward: [(0, '133.460')] [2024-06-15 13:16:46,998][1652491] Updated weights for policy 0, policy_version 141496 (0.0011) [2024-06-15 13:16:48,788][1652491] Updated weights for policy 0, policy_version 141559 (0.0015) [2024-06-15 13:16:50,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 44782.9, 300 sec: 46652.7). Total num frames: 289931264. Throughput: 0: 11332.2. Samples: 72552960. Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:16:50,956][1648985] Avg episode reward: [(0, '139.410')] [2024-06-15 13:16:55,859][1652491] Updated weights for policy 0, policy_version 141632 (0.0012) [2024-06-15 13:16:55,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 290062336. Throughput: 0: 11491.5. Samples: 72594432. Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:16:55,956][1648985] Avg episode reward: [(0, '140.500')] [2024-06-15 13:16:56,335][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000141648_290095104.pth... [2024-06-15 13:16:56,489][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000136192_278921216.pth [2024-06-15 13:16:57,465][1652491] Updated weights for policy 0, policy_version 141696 (0.0016) [2024-06-15 13:16:58,872][1652491] Updated weights for policy 0, policy_version 141759 (0.0012) [2024-06-15 13:17:00,376][1652491] Updated weights for policy 0, policy_version 141812 (0.0014) [2024-06-15 13:17:00,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 46967.5, 300 sec: 46986.0). Total num frames: 290455552. Throughput: 0: 11355.0. Samples: 72651776. Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:17:00,956][1648985] Avg episode reward: [(0, '125.860')] [2024-06-15 13:17:05,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 43690.6, 300 sec: 46097.4). Total num frames: 290455552. Throughput: 0: 11468.8. Samples: 72730112. 
Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:17:05,956][1648985] Avg episode reward: [(0, '126.330')] [2024-06-15 13:17:06,869][1652491] Updated weights for policy 0, policy_version 141859 (0.0012) [2024-06-15 13:17:08,641][1652491] Updated weights for policy 0, policy_version 141925 (0.0013) [2024-06-15 13:17:10,039][1652491] Updated weights for policy 0, policy_version 141986 (0.0082) [2024-06-15 13:17:10,971][1648985] Fps is (10 sec: 39259.4, 60 sec: 45863.1, 300 sec: 46650.2). Total num frames: 290848768. Throughput: 0: 11362.4. Samples: 72752128. Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:17:10,972][1648985] Avg episode reward: [(0, '128.410')] [2024-06-15 13:17:11,317][1652491] Updated weights for policy 0, policy_version 142048 (0.0012) [2024-06-15 13:17:15,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 290979840. Throughput: 0: 11241.3. Samples: 72822784. Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:17:15,955][1648985] Avg episode reward: [(0, '121.790')] [2024-06-15 13:17:17,861][1652491] Updated weights for policy 0, policy_version 142085 (0.0011) [2024-06-15 13:17:19,417][1652491] Updated weights for policy 0, policy_version 142147 (0.0012) [2024-06-15 13:17:20,235][1651469] Signal inference workers to stop experience collection... (7400 times) [2024-06-15 13:17:20,264][1652491] InferenceWorker_p0-w0: stopping experience collection (7400 times) [2024-06-15 13:17:20,479][1651469] Signal inference workers to resume experience collection... (7400 times) [2024-06-15 13:17:20,482][1652491] InferenceWorker_p0-w0: resuming experience collection (7400 times) [2024-06-15 13:17:20,955][1648985] Fps is (10 sec: 39383.9, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 291241984. Throughput: 0: 11366.4. Samples: 72885760. 
Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:17:20,955][1648985] Avg episode reward: [(0, '116.870')] [2024-06-15 13:17:21,464][1652491] Updated weights for policy 0, policy_version 142226 (0.0011) [2024-06-15 13:17:22,998][1652491] Updated weights for policy 0, policy_version 142276 (0.0013) [2024-06-15 13:17:24,252][1652491] Updated weights for policy 0, policy_version 142331 (0.0145) [2024-06-15 13:17:25,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 43690.6, 300 sec: 46541.6). Total num frames: 291504128. Throughput: 0: 11070.5. Samples: 72912896. Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:17:25,956][1648985] Avg episode reward: [(0, '123.980')] [2024-06-15 13:17:30,955][1648985] Fps is (10 sec: 32767.8, 60 sec: 44782.9, 300 sec: 45653.0). Total num frames: 291569664. Throughput: 0: 11309.5. Samples: 72996352. Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:17:30,956][1648985] Avg episode reward: [(0, '118.140')] [2024-06-15 13:17:31,429][1652491] Updated weights for policy 0, policy_version 142400 (0.0104) [2024-06-15 13:17:32,862][1652491] Updated weights for policy 0, policy_version 142464 (0.0017) [2024-06-15 13:17:34,173][1652491] Updated weights for policy 0, policy_version 142527 (0.0024) [2024-06-15 13:17:35,726][1652491] Updated weights for policy 0, policy_version 142583 (0.0013) [2024-06-15 13:17:35,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 46421.3, 300 sec: 46652.8). Total num frames: 292028416. Throughput: 0: 11082.0. Samples: 73051648. Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:17:35,956][1648985] Avg episode reward: [(0, '120.430')] [2024-06-15 13:17:40,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 43690.7, 300 sec: 45875.2). Total num frames: 292028416. Throughput: 0: 11082.0. Samples: 73093120. 
Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:17:40,956][1648985] Avg episode reward: [(0, '121.250')] [2024-06-15 13:17:42,182][1652491] Updated weights for policy 0, policy_version 142625 (0.0013) [2024-06-15 13:17:43,943][1652491] Updated weights for policy 0, policy_version 142704 (0.0013) [2024-06-15 13:17:45,809][1652491] Updated weights for policy 0, policy_version 142782 (0.0012) [2024-06-15 13:17:45,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 292421632. Throughput: 0: 11241.2. Samples: 73157632. Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:17:45,956][1648985] Avg episode reward: [(0, '119.740')] [2024-06-15 13:17:47,578][1652491] Updated weights for policy 0, policy_version 142845 (0.0024) [2024-06-15 13:17:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 46208.5). Total num frames: 292552704. Throughput: 0: 11161.6. Samples: 73232384. Policy #0 lag: (min: 17.0, avg: 179.3, max: 273.0) [2024-06-15 13:17:50,956][1648985] Avg episode reward: [(0, '127.670')] [2024-06-15 13:17:53,697][1652491] Updated weights for policy 0, policy_version 142896 (0.0026) [2024-06-15 13:17:55,319][1652491] Updated weights for policy 0, policy_version 142962 (0.0121) [2024-06-15 13:17:55,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.3, 300 sec: 46097.3). Total num frames: 292814848. Throughput: 0: 11472.8. Samples: 73268224. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:17:55,966][1648985] Avg episode reward: [(0, '132.500')] [2024-06-15 13:17:56,796][1652491] Updated weights for policy 0, policy_version 143026 (0.0012) [2024-06-15 13:17:57,353][1652491] Updated weights for policy 0, policy_version 143041 (0.0013) [2024-06-15 13:17:58,152][1651469] Signal inference workers to stop experience collection... 
(7450 times) [2024-06-15 13:17:58,207][1652491] InferenceWorker_p0-w0: stopping experience collection (7450 times) [2024-06-15 13:17:58,337][1651469] Signal inference workers to resume experience collection... (7450 times) [2024-06-15 13:17:58,338][1652491] InferenceWorker_p0-w0: resuming experience collection (7450 times) [2024-06-15 13:17:58,650][1652491] Updated weights for policy 0, policy_version 143102 (0.0014) [2024-06-15 13:18:00,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 293076992. Throughput: 0: 11218.5. Samples: 73327616. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:18:00,956][1648985] Avg episode reward: [(0, '138.340')] [2024-06-15 13:18:05,453][1652491] Updated weights for policy 0, policy_version 143152 (0.0013) [2024-06-15 13:18:05,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.3, 300 sec: 45875.2). Total num frames: 293208064. Throughput: 0: 11525.7. Samples: 73404416. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:18:05,956][1648985] Avg episode reward: [(0, '128.030')] [2024-06-15 13:18:07,052][1652491] Updated weights for policy 0, policy_version 143232 (0.0012) [2024-06-15 13:18:08,315][1652491] Updated weights for policy 0, policy_version 143295 (0.0014) [2024-06-15 13:18:10,102][1652491] Updated weights for policy 0, policy_version 143348 (0.0016) [2024-06-15 13:18:10,966][1648985] Fps is (10 sec: 52370.7, 60 sec: 45878.8, 300 sec: 46206.7). Total num frames: 293601280. Throughput: 0: 11591.1. Samples: 73434624. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:18:10,967][1648985] Avg episode reward: [(0, '147.320')] [2024-06-15 13:18:15,934][1652491] Updated weights for policy 0, policy_version 143380 (0.0012) [2024-06-15 13:18:15,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 293634048. Throughput: 0: 11434.7. Samples: 73510912. 
Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:18:15,956][1648985] Avg episode reward: [(0, '148.710')] [2024-06-15 13:18:18,229][1652491] Updated weights for policy 0, policy_version 143488 (0.0139) [2024-06-15 13:18:19,829][1652491] Updated weights for policy 0, policy_version 143549 (0.0013) [2024-06-15 13:18:20,955][1648985] Fps is (10 sec: 42646.1, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 294027264. Throughput: 0: 11423.3. Samples: 73565696. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:18:20,956][1648985] Avg episode reward: [(0, '147.010')] [2024-06-15 13:18:21,507][1652491] Updated weights for policy 0, policy_version 143600 (0.0012) [2024-06-15 13:18:25,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 43690.8, 300 sec: 46097.4). Total num frames: 294125568. Throughput: 0: 11366.4. Samples: 73604608. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:18:25,956][1648985] Avg episode reward: [(0, '138.140')] [2024-06-15 13:18:26,672][1652491] Updated weights for policy 0, policy_version 143621 (0.0013) [2024-06-15 13:18:28,036][1652491] Updated weights for policy 0, policy_version 143674 (0.0015) [2024-06-15 13:18:29,600][1652491] Updated weights for policy 0, policy_version 143730 (0.0044) [2024-06-15 13:18:30,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 48059.7, 300 sec: 45986.3). Total num frames: 294453248. Throughput: 0: 11480.2. Samples: 73674240. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:18:30,956][1648985] Avg episode reward: [(0, '120.830')] [2024-06-15 13:18:31,399][1652491] Updated weights for policy 0, policy_version 143802 (0.0122) [2024-06-15 13:18:32,829][1652491] Updated weights for policy 0, policy_version 143847 (0.0019) [2024-06-15 13:18:35,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 294649856. Throughput: 0: 11514.3. Samples: 73750528. 
Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:18:35,955][1648985] Avg episode reward: [(0, '128.360')] [2024-06-15 13:18:37,989][1652491] Updated weights for policy 0, policy_version 143889 (0.0014) [2024-06-15 13:18:39,342][1652491] Updated weights for policy 0, policy_version 143936 (0.0057) [2024-06-15 13:18:40,885][1651469] Signal inference workers to stop experience collection... (7500 times) [2024-06-15 13:18:40,932][1652491] Updated weights for policy 0, policy_version 143988 (0.0012) [2024-06-15 13:18:40,954][1652491] InferenceWorker_p0-w0: stopping experience collection (7500 times) [2024-06-15 13:18:40,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 47513.6, 300 sec: 46097.4). Total num frames: 294879232. Throughput: 0: 11525.7. Samples: 73786880. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:18:40,956][1648985] Avg episode reward: [(0, '138.310')] [2024-06-15 13:18:41,202][1651469] Signal inference workers to resume experience collection... (7500 times) [2024-06-15 13:18:41,203][1652491] InferenceWorker_p0-w0: resuming experience collection (7500 times) [2024-06-15 13:18:42,396][1652491] Updated weights for policy 0, policy_version 144048 (0.0026) [2024-06-15 13:18:43,706][1652491] Updated weights for policy 0, policy_version 144096 (0.0083) [2024-06-15 13:18:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 295174144. Throughput: 0: 11457.4. Samples: 73843200. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:18:45,956][1648985] Avg episode reward: [(0, '137.740')] [2024-06-15 13:18:49,610][1652491] Updated weights for policy 0, policy_version 144148 (0.0016) [2024-06-15 13:18:50,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45875.2, 300 sec: 45875.3). Total num frames: 295305216. Throughput: 0: 11514.3. Samples: 73922560. 
Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:18:50,955][1648985] Avg episode reward: [(0, '132.440')] [2024-06-15 13:18:51,749][1652491] Updated weights for policy 0, policy_version 144225 (0.0097) [2024-06-15 13:18:53,264][1652491] Updated weights for policy 0, policy_version 144289 (0.0014) [2024-06-15 13:18:54,963][1652491] Updated weights for policy 0, policy_version 144339 (0.0013) [2024-06-15 13:18:55,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 295698432. Throughput: 0: 11380.6. Samples: 73946624. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 13:18:55,956][1648985] Avg episode reward: [(0, '110.420')] [2024-06-15 13:18:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000144384_295698432.pth... [2024-06-15 13:18:56,003][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000139008_284688384.pth [2024-06-15 13:19:00,514][1652491] Updated weights for policy 0, policy_version 144390 (0.0048) [2024-06-15 13:19:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 45986.3). Total num frames: 295731200. Throughput: 0: 11548.4. Samples: 74030592. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:19:00,956][1648985] Avg episode reward: [(0, '124.780')] [2024-06-15 13:19:02,704][1652491] Updated weights for policy 0, policy_version 144469 (0.0015) [2024-06-15 13:19:03,992][1652491] Updated weights for policy 0, policy_version 144520 (0.0164) [2024-06-15 13:19:05,250][1652491] Updated weights for policy 0, policy_version 144571 (0.0015) [2024-06-15 13:19:05,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 48059.8, 300 sec: 46097.4). Total num frames: 296091648. Throughput: 0: 11594.0. Samples: 74087424. 
Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:19:05,956][1648985] Avg episode reward: [(0, '128.610')] [2024-06-15 13:19:06,951][1652491] Updated weights for policy 0, policy_version 144637 (0.0021) [2024-06-15 13:19:10,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 43698.7, 300 sec: 46208.4). Total num frames: 296222720. Throughput: 0: 11571.2. Samples: 74125312. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:19:10,956][1648985] Avg episode reward: [(0, '152.150')] [2024-06-15 13:19:13,859][1652491] Updated weights for policy 0, policy_version 144704 (0.0013) [2024-06-15 13:19:15,518][1652491] Updated weights for policy 0, policy_version 144770 (0.0172) [2024-06-15 13:19:15,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48059.7, 300 sec: 45986.3). Total num frames: 296517632. Throughput: 0: 11548.5. Samples: 74193920. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:19:15,956][1648985] Avg episode reward: [(0, '157.660')] [2024-06-15 13:19:16,823][1652491] Updated weights for policy 0, policy_version 144821 (0.0012) [2024-06-15 13:19:18,312][1652491] Updated weights for policy 0, policy_version 144895 (0.0012) [2024-06-15 13:19:20,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45329.1, 300 sec: 46208.5). Total num frames: 296747008. Throughput: 0: 11366.4. Samples: 74262016. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:19:20,955][1648985] Avg episode reward: [(0, '139.490')] [2024-06-15 13:19:23,719][1651469] Signal inference workers to stop experience collection... (7550 times) [2024-06-15 13:19:23,743][1652491] InferenceWorker_p0-w0: stopping experience collection (7550 times) [2024-06-15 13:19:23,934][1651469] Signal inference workers to resume experience collection... 
(7550 times) [2024-06-15 13:19:23,935][1652491] InferenceWorker_p0-w0: resuming experience collection (7550 times) [2024-06-15 13:19:25,046][1652491] Updated weights for policy 0, policy_version 144947 (0.0013) [2024-06-15 13:19:25,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 296910848. Throughput: 0: 11514.3. Samples: 74305024. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:19:25,956][1648985] Avg episode reward: [(0, '136.830')] [2024-06-15 13:19:26,795][1652491] Updated weights for policy 0, policy_version 145010 (0.0013) [2024-06-15 13:19:28,445][1652491] Updated weights for policy 0, policy_version 145080 (0.0011) [2024-06-15 13:19:29,883][1652491] Updated weights for policy 0, policy_version 145136 (0.0011) [2024-06-15 13:19:30,972][1648985] Fps is (10 sec: 52338.2, 60 sec: 46954.0, 300 sec: 46205.7). Total num frames: 297271296. Throughput: 0: 11271.1. Samples: 74350592. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:19:30,973][1648985] Avg episode reward: [(0, '135.030')] [2024-06-15 13:19:35,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 43690.6, 300 sec: 45430.9). Total num frames: 297271296. Throughput: 0: 11366.4. Samples: 74434048. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:19:35,956][1648985] Avg episode reward: [(0, '143.610')] [2024-06-15 13:19:36,265][1652491] Updated weights for policy 0, policy_version 145170 (0.0012) [2024-06-15 13:19:37,550][1652491] Updated weights for policy 0, policy_version 145232 (0.0015) [2024-06-15 13:19:38,782][1652491] Updated weights for policy 0, policy_version 145280 (0.0015) [2024-06-15 13:19:40,645][1652491] Updated weights for policy 0, policy_version 145348 (0.0013) [2024-06-15 13:19:40,955][1648985] Fps is (10 sec: 42672.3, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 297697280. Throughput: 0: 11446.1. Samples: 74461696. 
Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:19:40,956][1648985] Avg episode reward: [(0, '141.490')] [2024-06-15 13:19:41,704][1652491] Updated weights for policy 0, policy_version 145397 (0.0012) [2024-06-15 13:19:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 45764.2). Total num frames: 297795584. Throughput: 0: 11252.6. Samples: 74536960. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:19:45,956][1648985] Avg episode reward: [(0, '137.380')] [2024-06-15 13:19:47,588][1652491] Updated weights for policy 0, policy_version 145440 (0.0014) [2024-06-15 13:19:49,448][1652491] Updated weights for policy 0, policy_version 145520 (0.0013) [2024-06-15 13:19:50,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46967.5, 300 sec: 45542.0). Total num frames: 298123264. Throughput: 0: 11389.2. Samples: 74599936. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:19:50,955][1648985] Avg episode reward: [(0, '125.160')] [2024-06-15 13:19:51,286][1652491] Updated weights for policy 0, policy_version 145587 (0.0012) [2024-06-15 13:19:52,149][1652491] Updated weights for policy 0, policy_version 145620 (0.0012) [2024-06-15 13:19:55,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.8, 300 sec: 45764.1). Total num frames: 298319872. Throughput: 0: 11264.0. Samples: 74632192. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:19:55,956][1648985] Avg episode reward: [(0, '121.790')] [2024-06-15 13:19:58,583][1652491] Updated weights for policy 0, policy_version 145680 (0.0013) [2024-06-15 13:20:00,155][1651469] Signal inference workers to stop experience collection... (7600 times) [2024-06-15 13:20:00,195][1652491] Updated weights for policy 0, policy_version 145748 (0.0039) [2024-06-15 13:20:00,219][1652491] InferenceWorker_p0-w0: stopping experience collection (7600 times) [2024-06-15 13:20:00,375][1651469] Signal inference workers to resume experience collection... 
(7600 times) [2024-06-15 13:20:00,376][1652491] InferenceWorker_p0-w0: resuming experience collection (7600 times) [2024-06-15 13:20:00,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46967.5, 300 sec: 45430.9). Total num frames: 298549248. Throughput: 0: 11434.7. Samples: 74708480. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:20:00,955][1648985] Avg episode reward: [(0, '128.060')] [2024-06-15 13:20:01,874][1652491] Updated weights for policy 0, policy_version 145812 (0.0014) [2024-06-15 13:20:03,502][1652491] Updated weights for policy 0, policy_version 145888 (0.0011) [2024-06-15 13:20:05,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 298844160. Throughput: 0: 11275.4. Samples: 74769408. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:20:05,956][1648985] Avg episode reward: [(0, '129.260')] [2024-06-15 13:20:10,556][1652491] Updated weights for policy 0, policy_version 145940 (0.0070) [2024-06-15 13:20:10,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 44783.0, 300 sec: 45097.7). Total num frames: 298909696. Throughput: 0: 11264.0. Samples: 74811904. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:20:10,956][1648985] Avg episode reward: [(0, '137.090')] [2024-06-15 13:20:12,130][1652491] Updated weights for policy 0, policy_version 146001 (0.0115) [2024-06-15 13:20:14,895][1652491] Updated weights for policy 0, policy_version 146112 (0.0122) [2024-06-15 13:20:15,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46967.4, 300 sec: 46097.4). Total num frames: 299335680. Throughput: 0: 11359.4. Samples: 74861568. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:20:15,956][1648985] Avg episode reward: [(0, '129.320')] [2024-06-15 13:20:16,233][1652491] Updated weights for policy 0, policy_version 146176 (0.0013) [2024-06-15 13:20:20,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 43690.5, 300 sec: 45097.6). Total num frames: 299368448. 
Throughput: 0: 11229.8. Samples: 74939392. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:20:20,956][1648985] Avg episode reward: [(0, '133.000')] [2024-06-15 13:20:24,475][1652491] Updated weights for policy 0, policy_version 146240 (0.0013) [2024-06-15 13:20:25,955][1648985] Fps is (10 sec: 29491.2, 60 sec: 45329.1, 300 sec: 45319.8). Total num frames: 299630592. Throughput: 0: 11377.8. Samples: 74973696. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:20:25,956][1648985] Avg episode reward: [(0, '128.710')] [2024-06-15 13:20:26,070][1652491] Updated weights for policy 0, policy_version 146305 (0.0026) [2024-06-15 13:20:27,735][1652491] Updated weights for policy 0, policy_version 146384 (0.0012) [2024-06-15 13:20:28,787][1652491] Updated weights for policy 0, policy_version 146432 (0.0029) [2024-06-15 13:20:30,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 43703.2, 300 sec: 45653.0). Total num frames: 299892736. Throughput: 0: 11002.3. Samples: 75032064. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:20:30,956][1648985] Avg episode reward: [(0, '124.070')] [2024-06-15 13:20:35,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 45329.0, 300 sec: 44764.4). Total num frames: 299991040. Throughput: 0: 11320.8. Samples: 75109376. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:20:35,956][1648985] Avg episode reward: [(0, '124.860')] [2024-06-15 13:20:36,297][1652491] Updated weights for policy 0, policy_version 146500 (0.0013) [2024-06-15 13:20:37,975][1652491] Updated weights for policy 0, policy_version 146576 (0.0013) [2024-06-15 13:20:38,450][1651469] Signal inference workers to stop experience collection... (7650 times) [2024-06-15 13:20:38,479][1652491] InferenceWorker_p0-w0: stopping experience collection (7650 times) [2024-06-15 13:20:38,752][1651469] Signal inference workers to resume experience collection... 
(7650 times) [2024-06-15 13:20:38,753][1652491] InferenceWorker_p0-w0: resuming experience collection (7650 times) [2024-06-15 13:20:39,707][1652491] Updated weights for policy 0, policy_version 146640 (0.0013) [2024-06-15 13:20:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45329.0, 300 sec: 45986.3). Total num frames: 300417024. Throughput: 0: 11104.7. Samples: 75131904. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:20:40,956][1648985] Avg episode reward: [(0, '121.340')] [2024-06-15 13:20:45,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 43690.6, 300 sec: 44653.3). Total num frames: 300417024. Throughput: 0: 11025.0. Samples: 75204608. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:20:45,956][1648985] Avg episode reward: [(0, '141.470')] [2024-06-15 13:20:46,863][1652491] Updated weights for policy 0, policy_version 146704 (0.0015) [2024-06-15 13:20:48,214][1652491] Updated weights for policy 0, policy_version 146755 (0.0012) [2024-06-15 13:20:49,759][1652491] Updated weights for policy 0, policy_version 146817 (0.0018) [2024-06-15 13:20:50,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 44236.6, 300 sec: 45653.0). Total num frames: 300777472. Throughput: 0: 11070.6. Samples: 75267584. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:20:50,956][1648985] Avg episode reward: [(0, '132.190')] [2024-06-15 13:20:51,286][1652491] Updated weights for policy 0, policy_version 146883 (0.0073) [2024-06-15 13:20:52,407][1652491] Updated weights for policy 0, policy_version 146941 (0.0012) [2024-06-15 13:20:55,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 45097.7). Total num frames: 300941312. Throughput: 0: 10831.6. Samples: 75299328. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:20:55,955][1648985] Avg episode reward: [(0, '142.010')] [2024-06-15 13:20:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000146944_300941312.pth... 
[2024-06-15 13:20:55,990][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000141648_290095104.pth [2024-06-15 13:20:59,006][1652491] Updated weights for policy 0, policy_version 146997 (0.0014) [2024-06-15 13:21:00,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44236.7, 300 sec: 45319.8). Total num frames: 301203456. Throughput: 0: 11457.4. Samples: 75377152. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:21:00,956][1648985] Avg episode reward: [(0, '138.740')] [2024-06-15 13:21:01,056][1652491] Updated weights for policy 0, policy_version 147088 (0.0116) [2024-06-15 13:21:02,519][1652491] Updated weights for policy 0, policy_version 147152 (0.0014) [2024-06-15 13:21:05,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 301465600. Throughput: 0: 11093.4. Samples: 75438592. Policy #0 lag: (min: 93.0, avg: 234.2, max: 328.0) [2024-06-15 13:21:05,956][1648985] Avg episode reward: [(0, '138.690')] [2024-06-15 13:21:09,717][1652491] Updated weights for policy 0, policy_version 147216 (0.0014) [2024-06-15 13:21:10,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 44782.9, 300 sec: 44875.5). Total num frames: 301596672. Throughput: 0: 11286.8. Samples: 75481600. Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:21:10,956][1648985] Avg episode reward: [(0, '148.250')] [2024-06-15 13:21:11,542][1652491] Updated weights for policy 0, policy_version 147296 (0.0085) [2024-06-15 13:21:13,383][1652491] Updated weights for policy 0, policy_version 147366 (0.0013) [2024-06-15 13:21:15,140][1652491] Updated weights for policy 0, policy_version 147440 (0.0153) [2024-06-15 13:21:15,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 44236.8, 300 sec: 45764.1). Total num frames: 301989888. Throughput: 0: 11036.4. Samples: 75528704. 
Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:21:15,956][1648985] Avg episode reward: [(0, '147.360')] [2024-06-15 13:21:20,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 43690.8, 300 sec: 44431.2). Total num frames: 301989888. Throughput: 0: 11264.0. Samples: 75616256. Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:21:20,956][1648985] Avg episode reward: [(0, '154.720')] [2024-06-15 13:21:21,247][1651469] Signal inference workers to stop experience collection... (7700 times) [2024-06-15 13:21:21,276][1652491] InferenceWorker_p0-w0: stopping experience collection (7700 times) [2024-06-15 13:21:21,497][1651469] Signal inference workers to resume experience collection... (7700 times) [2024-06-15 13:21:21,498][1652491] InferenceWorker_p0-w0: resuming experience collection (7700 times) [2024-06-15 13:21:22,085][1652491] Updated weights for policy 0, policy_version 147489 (0.0012) [2024-06-15 13:21:23,922][1652491] Updated weights for policy 0, policy_version 147568 (0.0013) [2024-06-15 13:21:25,518][1652491] Updated weights for policy 0, policy_version 147633 (0.0015) [2024-06-15 13:21:25,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 302383104. Throughput: 0: 11434.7. Samples: 75646464. Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:21:25,956][1648985] Avg episode reward: [(0, '151.650')] [2024-06-15 13:21:26,985][1652491] Updated weights for policy 0, policy_version 147712 (0.0014) [2024-06-15 13:21:30,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 44986.6). Total num frames: 302514176. Throughput: 0: 11264.0. Samples: 75711488. 
Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:21:30,956][1648985] Avg episode reward: [(0, '138.950')] [2024-06-15 13:21:33,976][1652491] Updated weights for policy 0, policy_version 147782 (0.0076) [2024-06-15 13:21:35,444][1652491] Updated weights for policy 0, policy_version 147841 (0.0016) [2024-06-15 13:21:35,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 47513.7, 300 sec: 45542.0). Total num frames: 302841856. Throughput: 0: 11446.1. Samples: 75782656. Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:21:35,956][1648985] Avg episode reward: [(0, '157.270')] [2024-06-15 13:21:36,919][1652491] Updated weights for policy 0, policy_version 147920 (0.0013) [2024-06-15 13:21:37,973][1652491] Updated weights for policy 0, policy_version 147967 (0.0012) [2024-06-15 13:21:40,955][1648985] Fps is (10 sec: 52427.1, 60 sec: 43690.5, 300 sec: 45319.8). Total num frames: 303038464. Throughput: 0: 11434.6. Samples: 75813888. Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:21:40,956][1648985] Avg episode reward: [(0, '144.420')] [2024-06-15 13:21:44,460][1652491] Updated weights for policy 0, policy_version 148006 (0.0013) [2024-06-15 13:21:45,620][1652491] Updated weights for policy 0, policy_version 148055 (0.0012) [2024-06-15 13:21:45,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.5, 300 sec: 45097.7). Total num frames: 303235072. Throughput: 0: 11502.9. Samples: 75894784. Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:21:45,956][1648985] Avg episode reward: [(0, '121.160')] [2024-06-15 13:21:47,008][1652491] Updated weights for policy 0, policy_version 148112 (0.0037) [2024-06-15 13:21:48,487][1652491] Updated weights for policy 0, policy_version 148177 (0.0013) [2024-06-15 13:21:50,955][1648985] Fps is (10 sec: 52430.7, 60 sec: 46421.4, 300 sec: 45764.2). Total num frames: 303562752. Throughput: 0: 11537.1. Samples: 75957760. 
Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:21:50,956][1648985] Avg episode reward: [(0, '128.600')] [2024-06-15 13:21:54,687][1652491] Updated weights for policy 0, policy_version 148228 (0.0116) [2024-06-15 13:21:55,971][1648985] Fps is (10 sec: 42531.4, 60 sec: 45317.2, 300 sec: 44762.0). Total num frames: 303661056. Throughput: 0: 11578.5. Samples: 76002816. Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:21:55,971][1648985] Avg episode reward: [(0, '138.180')] [2024-06-15 13:21:56,364][1652491] Updated weights for policy 0, policy_version 148293 (0.0014) [2024-06-15 13:21:56,689][1651469] Signal inference workers to stop experience collection... (7750 times) [2024-06-15 13:21:56,755][1652491] InferenceWorker_p0-w0: stopping experience collection (7750 times) [2024-06-15 13:21:57,048][1651469] Signal inference workers to resume experience collection... (7750 times) [2024-06-15 13:21:57,049][1652491] InferenceWorker_p0-w0: resuming experience collection (7750 times) [2024-06-15 13:21:58,193][1652491] Updated weights for policy 0, policy_version 148368 (0.0209) [2024-06-15 13:21:59,824][1652491] Updated weights for policy 0, policy_version 148448 (0.0013) [2024-06-15 13:22:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 46208.5). Total num frames: 304087040. Throughput: 0: 11628.1. Samples: 76051968. Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:22:00,956][1648985] Avg episode reward: [(0, '166.380')] [2024-06-15 13:22:05,955][1648985] Fps is (10 sec: 42665.4, 60 sec: 43690.7, 300 sec: 44877.9). Total num frames: 304087040. Throughput: 0: 11559.8. Samples: 76136448. 
Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:22:05,956][1648985] Avg episode reward: [(0, '152.010')] [2024-06-15 13:22:07,196][1652491] Updated weights for policy 0, policy_version 148512 (0.0012) [2024-06-15 13:22:09,919][1652491] Updated weights for policy 0, policy_version 148609 (0.0011) [2024-06-15 13:22:10,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 47513.6, 300 sec: 45653.0). Total num frames: 304447488. Throughput: 0: 11480.2. Samples: 76163072. Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:22:10,955][1648985] Avg episode reward: [(0, '148.110')] [2024-06-15 13:22:11,415][1652491] Updated weights for policy 0, policy_version 148688 (0.0053) [2024-06-15 13:22:12,457][1652491] Updated weights for policy 0, policy_version 148729 (0.0014) [2024-06-15 13:22:15,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 45319.8). Total num frames: 304611328. Throughput: 0: 11525.7. Samples: 76230144. Policy #0 lag: (min: 15.0, avg: 69.1, max: 271.0) [2024-06-15 13:22:15,956][1648985] Avg episode reward: [(0, '149.590')] [2024-06-15 13:22:19,012][1652491] Updated weights for policy 0, policy_version 148785 (0.0012) [2024-06-15 13:22:20,405][1652491] Updated weights for policy 0, policy_version 148836 (0.0014) [2024-06-15 13:22:20,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 47513.6, 300 sec: 45208.8). Total num frames: 304840704. Throughput: 0: 11514.3. Samples: 76300800. Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:22:20,956][1648985] Avg episode reward: [(0, '130.760')] [2024-06-15 13:22:21,976][1652491] Updated weights for policy 0, policy_version 148901 (0.0012) [2024-06-15 13:22:23,584][1652491] Updated weights for policy 0, policy_version 148976 (0.0017) [2024-06-15 13:22:25,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 305135616. Throughput: 0: 11412.0. Samples: 76327424. 
Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:22:25,956][1648985] Avg episode reward: [(0, '119.230')] [2024-06-15 13:22:30,636][1652491] Updated weights for policy 0, policy_version 149040 (0.0128) [2024-06-15 13:22:30,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45329.0, 300 sec: 44764.4). Total num frames: 305233920. Throughput: 0: 11525.7. Samples: 76413440. Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:22:30,956][1648985] Avg episode reward: [(0, '128.940')] [2024-06-15 13:22:32,336][1652491] Updated weights for policy 0, policy_version 149089 (0.0043) [2024-06-15 13:22:33,731][1651469] Signal inference workers to stop experience collection... (7800 times) [2024-06-15 13:22:33,766][1652491] InferenceWorker_p0-w0: stopping experience collection (7800 times) [2024-06-15 13:22:33,777][1652491] Updated weights for policy 0, policy_version 149154 (0.0010) [2024-06-15 13:22:33,991][1651469] Signal inference workers to resume experience collection... (7800 times) [2024-06-15 13:22:33,992][1652491] InferenceWorker_p0-w0: resuming experience collection (7800 times) [2024-06-15 13:22:35,556][1652491] Updated weights for policy 0, policy_version 149239 (0.0016) [2024-06-15 13:22:35,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 305659904. Throughput: 0: 11195.7. Samples: 76461568. Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:22:35,955][1648985] Avg episode reward: [(0, '134.090')] [2024-06-15 13:22:40,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 43690.9, 300 sec: 44875.5). Total num frames: 305659904. Throughput: 0: 11211.0. Samples: 76507136. 
Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:22:40,956][1648985] Avg episode reward: [(0, '132.900')] [2024-06-15 13:22:41,910][1652491] Updated weights for policy 0, policy_version 149281 (0.0014) [2024-06-15 13:22:43,942][1652491] Updated weights for policy 0, policy_version 149362 (0.0054) [2024-06-15 13:22:45,469][1652491] Updated weights for policy 0, policy_version 149424 (0.0012) [2024-06-15 13:22:45,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 46967.4, 300 sec: 45764.1). Total num frames: 306053120. Throughput: 0: 11628.1. Samples: 76575232. Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:22:45,956][1648985] Avg episode reward: [(0, '131.570')] [2024-06-15 13:22:47,035][1652491] Updated weights for policy 0, policy_version 149495 (0.0016) [2024-06-15 13:22:50,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 306184192. Throughput: 0: 11366.4. Samples: 76647936. Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:22:50,956][1648985] Avg episode reward: [(0, '132.860')] [2024-06-15 13:22:53,464][1652491] Updated weights for policy 0, policy_version 149537 (0.0019) [2024-06-15 13:22:55,601][1652491] Updated weights for policy 0, policy_version 149618 (0.0015) [2024-06-15 13:22:55,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46433.5, 300 sec: 45319.8). Total num frames: 306446336. Throughput: 0: 11628.1. Samples: 76686336. Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:22:55,955][1648985] Avg episode reward: [(0, '122.540')] [2024-06-15 13:22:56,581][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000149664_306511872.pth... 
[2024-06-15 13:22:56,752][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000144384_295698432.pth [2024-06-15 13:22:57,647][1652491] Updated weights for policy 0, policy_version 149696 (0.0011) [2024-06-15 13:22:58,987][1652491] Updated weights for policy 0, policy_version 149753 (0.0056) [2024-06-15 13:23:00,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 306708480. Throughput: 0: 11286.8. Samples: 76738048. Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:23:00,956][1648985] Avg episode reward: [(0, '124.240')] [2024-06-15 13:23:04,867][1652491] Updated weights for policy 0, policy_version 149792 (0.0021) [2024-06-15 13:23:05,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 45875.1, 300 sec: 44877.2). Total num frames: 306839552. Throughput: 0: 11446.0. Samples: 76815872. Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:23:05,956][1648985] Avg episode reward: [(0, '135.570')] [2024-06-15 13:23:06,514][1652491] Updated weights for policy 0, policy_version 149858 (0.0027) [2024-06-15 13:23:07,990][1652491] Updated weights for policy 0, policy_version 149920 (0.0012) [2024-06-15 13:23:09,024][1652491] Updated weights for policy 0, policy_version 149957 (0.0012) [2024-06-15 13:23:10,256][1652491] Updated weights for policy 0, policy_version 150015 (0.0049) [2024-06-15 13:23:10,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.3, 300 sec: 46097.4). Total num frames: 307232768. Throughput: 0: 11446.1. Samples: 76842496. Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:23:10,956][1648985] Avg episode reward: [(0, '126.620')] [2024-06-15 13:23:15,816][1651469] Signal inference workers to stop experience collection... (7850 times) [2024-06-15 13:23:15,872][1652491] InferenceWorker_p0-w0: stopping experience collection (7850 times) [2024-06-15 13:23:15,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 44764.4). 
Total num frames: 307232768. Throughput: 0: 11195.7. Samples: 76917248. Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:23:15,956][1648985] Avg episode reward: [(0, '143.940')] [2024-06-15 13:23:16,145][1651469] Signal inference workers to resume experience collection... (7850 times) [2024-06-15 13:23:16,146][1652491] InferenceWorker_p0-w0: resuming experience collection (7850 times) [2024-06-15 13:23:17,725][1652491] Updated weights for policy 0, policy_version 150096 (0.0104) [2024-06-15 13:23:19,254][1652491] Updated weights for policy 0, policy_version 150160 (0.0173) [2024-06-15 13:23:20,848][1652491] Updated weights for policy 0, policy_version 150231 (0.0012) [2024-06-15 13:23:20,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46967.4, 300 sec: 45875.2). Total num frames: 307658752. Throughput: 0: 11434.6. Samples: 76976128. Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:23:20,956][1648985] Avg episode reward: [(0, '158.620')] [2024-06-15 13:23:25,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 43690.7, 300 sec: 45097.6). Total num frames: 307757056. Throughput: 0: 11366.4. Samples: 77018624. Policy #0 lag: (min: 11.0, avg: 67.7, max: 267.0) [2024-06-15 13:23:25,956][1648985] Avg episode reward: [(0, '135.090')] [2024-06-15 13:23:27,349][1652491] Updated weights for policy 0, policy_version 150276 (0.0011) [2024-06-15 13:23:30,077][1652491] Updated weights for policy 0, policy_version 150384 (0.0013) [2024-06-15 13:23:30,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 46967.4, 300 sec: 45430.9). Total num frames: 308051968. Throughput: 0: 11184.3. Samples: 77078528. 
Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:23:30,956][1648985] Avg episode reward: [(0, '115.380')] [2024-06-15 13:23:31,332][1652491] Updated weights for policy 0, policy_version 150436 (0.0014) [2024-06-15 13:23:32,856][1652491] Updated weights for policy 0, policy_version 150512 (0.0012) [2024-06-15 13:23:35,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 43690.6, 300 sec: 45430.9). Total num frames: 308281344. Throughput: 0: 11275.3. Samples: 77155328. Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:23:35,956][1648985] Avg episode reward: [(0, '105.420')] [2024-06-15 13:23:38,691][1652491] Updated weights for policy 0, policy_version 150544 (0.0015) [2024-06-15 13:23:40,722][1652491] Updated weights for policy 0, policy_version 150624 (0.0013) [2024-06-15 13:23:40,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 46967.5, 300 sec: 45097.7). Total num frames: 308477952. Throughput: 0: 11298.1. Samples: 77194752. Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:23:40,956][1648985] Avg episode reward: [(0, '128.290')] [2024-06-15 13:23:42,395][1652491] Updated weights for policy 0, policy_version 150692 (0.0017) [2024-06-15 13:23:43,452][1652491] Updated weights for policy 0, policy_version 150752 (0.0014) [2024-06-15 13:23:44,229][1652491] Updated weights for policy 0, policy_version 150784 (0.0015) [2024-06-15 13:23:45,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 308805632. Throughput: 0: 11491.6. Samples: 77255168. Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:23:45,956][1648985] Avg episode reward: [(0, '141.590')] [2024-06-15 13:23:50,184][1652491] Updated weights for policy 0, policy_version 150848 (0.0023) [2024-06-15 13:23:50,637][1651469] Signal inference workers to stop experience collection... 
(7900 times) [2024-06-15 13:23:50,684][1652491] InferenceWorker_p0-w0: stopping experience collection (7900 times) [2024-06-15 13:23:50,853][1651469] Signal inference workers to resume experience collection... (7900 times) [2024-06-15 13:23:50,853][1652491] InferenceWorker_p0-w0: resuming experience collection (7900 times) [2024-06-15 13:23:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.4, 300 sec: 45097.7). Total num frames: 309002240. Throughput: 0: 11514.3. Samples: 77334016. Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:23:50,956][1648985] Avg episode reward: [(0, '140.350')] [2024-06-15 13:23:51,520][1652491] Updated weights for policy 0, policy_version 150908 (0.0013) [2024-06-15 13:23:53,430][1652491] Updated weights for policy 0, policy_version 150961 (0.0013) [2024-06-15 13:23:54,861][1652491] Updated weights for policy 0, policy_version 151028 (0.0011) [2024-06-15 13:23:55,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 46097.4). Total num frames: 309329920. Throughput: 0: 11582.6. Samples: 77363712. Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:23:55,956][1648985] Avg episode reward: [(0, '138.930')] [2024-06-15 13:23:59,995][1652491] Updated weights for policy 0, policy_version 151060 (0.0014) [2024-06-15 13:24:00,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 45875.1, 300 sec: 45319.8). Total num frames: 309460992. Throughput: 0: 11650.8. Samples: 77441536. Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:24:00,956][1648985] Avg episode reward: [(0, '144.840')] [2024-06-15 13:24:01,924][1652491] Updated weights for policy 0, policy_version 151140 (0.0220) [2024-06-15 13:24:04,660][1652491] Updated weights for policy 0, policy_version 151200 (0.0012) [2024-06-15 13:24:05,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48606.0, 300 sec: 45875.2). Total num frames: 309755904. Throughput: 0: 11719.1. Samples: 77503488. 
Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:24:05,955][1648985] Avg episode reward: [(0, '153.050')] [2024-06-15 13:24:06,618][1652491] Updated weights for policy 0, policy_version 151280 (0.0022) [2024-06-15 13:24:10,857][1652491] Updated weights for policy 0, policy_version 151313 (0.0011) [2024-06-15 13:24:10,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 44236.8, 300 sec: 45319.8). Total num frames: 309886976. Throughput: 0: 11650.9. Samples: 77542912. Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:24:10,956][1648985] Avg episode reward: [(0, '159.500')] [2024-06-15 13:24:12,682][1652491] Updated weights for policy 0, policy_version 151392 (0.0011) [2024-06-15 13:24:15,699][1652491] Updated weights for policy 0, policy_version 151456 (0.0018) [2024-06-15 13:24:15,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 49152.0, 300 sec: 45542.0). Total num frames: 310181888. Throughput: 0: 11901.2. Samples: 77614080. Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:24:15,956][1648985] Avg episode reward: [(0, '152.790')] [2024-06-15 13:24:17,299][1652491] Updated weights for policy 0, policy_version 151532 (0.0125) [2024-06-15 13:24:20,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45329.1, 300 sec: 45653.0). Total num frames: 310378496. Throughput: 0: 11946.7. Samples: 77692928. Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:24:20,956][1648985] Avg episode reward: [(0, '153.780')] [2024-06-15 13:24:21,798][1652491] Updated weights for policy 0, policy_version 151568 (0.0024) [2024-06-15 13:24:22,802][1652491] Updated weights for policy 0, policy_version 151603 (0.0014) [2024-06-15 13:24:24,387][1652491] Updated weights for policy 0, policy_version 151671 (0.0014) [2024-06-15 13:24:25,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.8, 300 sec: 45322.5). Total num frames: 310640640. Throughput: 0: 11730.5. Samples: 77722624. 
Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:24:25,956][1648985] Avg episode reward: [(0, '156.390')] [2024-06-15 13:24:26,657][1652491] Updated weights for policy 0, policy_version 151731 (0.0013) [2024-06-15 13:24:27,741][1651469] Signal inference workers to stop experience collection... (7950 times) [2024-06-15 13:24:27,777][1652491] InferenceWorker_p0-w0: stopping experience collection (7950 times) [2024-06-15 13:24:27,947][1651469] Signal inference workers to resume experience collection... (7950 times) [2024-06-15 13:24:27,948][1652491] InferenceWorker_p0-w0: resuming experience collection (7950 times) [2024-06-15 13:24:28,110][1652491] Updated weights for policy 0, policy_version 151805 (0.0021) [2024-06-15 13:24:30,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.7, 300 sec: 46208.4). Total num frames: 310902784. Throughput: 0: 11969.4. Samples: 77793792. Policy #0 lag: (min: 15.0, avg: 71.0, max: 271.0) [2024-06-15 13:24:30,956][1648985] Avg episode reward: [(0, '136.640')] [2024-06-15 13:24:34,083][1652491] Updated weights for policy 0, policy_version 151874 (0.0108) [2024-06-15 13:24:35,506][1652491] Updated weights for policy 0, policy_version 151936 (0.0021) [2024-06-15 13:24:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 45653.0). Total num frames: 311164928. Throughput: 0: 11616.7. Samples: 77856768. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:24:35,956][1648985] Avg episode reward: [(0, '113.890')] [2024-06-15 13:24:39,366][1652491] Updated weights for policy 0, policy_version 152018 (0.0015) [2024-06-15 13:24:40,161][1652491] Updated weights for policy 0, policy_version 152064 (0.0168) [2024-06-15 13:24:40,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 49152.0, 300 sec: 46208.4). Total num frames: 311427072. Throughput: 0: 11685.0. Samples: 77889536. 
Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:24:40,956][1648985] Avg episode reward: [(0, '115.140')] [2024-06-15 13:24:45,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 45541.9). Total num frames: 311558144. Throughput: 0: 11605.4. Samples: 77963776. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:24:45,956][1648985] Avg episode reward: [(0, '111.790')] [2024-06-15 13:24:46,753][1652491] Updated weights for policy 0, policy_version 152160 (0.0013) [2024-06-15 13:24:49,214][1652491] Updated weights for policy 0, policy_version 152208 (0.0013) [2024-06-15 13:24:50,535][1652491] Updated weights for policy 0, policy_version 152262 (0.0013) [2024-06-15 13:24:50,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.8, 300 sec: 45986.3). Total num frames: 311885824. Throughput: 0: 11559.8. Samples: 78023680. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:24:50,956][1648985] Avg episode reward: [(0, '128.960')] [2024-06-15 13:24:51,459][1652491] Updated weights for policy 0, policy_version 152318 (0.0021) [2024-06-15 13:24:55,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 43690.5, 300 sec: 45430.8). Total num frames: 311951360. Throughput: 0: 11593.9. Samples: 78064640. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:24:55,956][1648985] Avg episode reward: [(0, '135.780')] [2024-06-15 13:24:56,621][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000152352_312016896.pth... [2024-06-15 13:24:56,745][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000146944_300941312.pth [2024-06-15 13:24:57,420][1652491] Updated weights for policy 0, policy_version 152380 (0.0011) [2024-06-15 13:24:59,327][1652491] Updated weights for policy 0, policy_version 152441 (0.0013) [2024-06-15 13:25:00,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46967.6, 300 sec: 45542.0). Total num frames: 312279040. Throughput: 0: 11491.5. Samples: 78131200. 
Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:25:00,956][1648985] Avg episode reward: [(0, '120.750')] [2024-06-15 13:25:02,073][1652491] Updated weights for policy 0, policy_version 152544 (0.0100) [2024-06-15 13:25:05,955][1648985] Fps is (10 sec: 52430.4, 60 sec: 45329.1, 300 sec: 45986.3). Total num frames: 312475648. Throughput: 0: 11298.1. Samples: 78201344. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:25:05,956][1648985] Avg episode reward: [(0, '108.810')] [2024-06-15 13:25:07,331][1652491] Updated weights for policy 0, policy_version 152592 (0.0013) [2024-06-15 13:25:10,277][1652491] Updated weights for policy 0, policy_version 152675 (0.0014) [2024-06-15 13:25:10,955][1648985] Fps is (10 sec: 45873.9, 60 sec: 47513.4, 300 sec: 45430.8). Total num frames: 312737792. Throughput: 0: 11366.3. Samples: 78234112. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:25:10,956][1648985] Avg episode reward: [(0, '119.870')] [2024-06-15 13:25:12,029][1652491] Updated weights for policy 0, policy_version 152736 (0.0032) [2024-06-15 13:25:13,490][1651469] Signal inference workers to stop experience collection... (8000 times) [2024-06-15 13:25:13,547][1652491] InferenceWorker_p0-w0: stopping experience collection (8000 times) [2024-06-15 13:25:13,669][1651469] Signal inference workers to resume experience collection... (8000 times) [2024-06-15 13:25:13,671][1652491] InferenceWorker_p0-w0: resuming experience collection (8000 times) [2024-06-15 13:25:13,851][1652491] Updated weights for policy 0, policy_version 152786 (0.0012) [2024-06-15 13:25:15,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 46967.4, 300 sec: 46208.5). Total num frames: 312999936. Throughput: 0: 11229.9. Samples: 78299136. 
Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:25:15,956][1648985] Avg episode reward: [(0, '131.590')] [2024-06-15 13:25:19,215][1652491] Updated weights for policy 0, policy_version 152866 (0.0013) [2024-06-15 13:25:20,653][1652491] Updated weights for policy 0, policy_version 152912 (0.0013) [2024-06-15 13:25:20,955][1648985] Fps is (10 sec: 42599.9, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 313163776. Throughput: 0: 11502.9. Samples: 78374400. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:25:20,956][1648985] Avg episode reward: [(0, '131.830')] [2024-06-15 13:25:21,519][1652491] Updated weights for policy 0, policy_version 152956 (0.0021) [2024-06-15 13:25:24,028][1652491] Updated weights for policy 0, policy_version 153024 (0.0017) [2024-06-15 13:25:25,214][1652491] Updated weights for policy 0, policy_version 153081 (0.0021) [2024-06-15 13:25:25,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 313524224. Throughput: 0: 11662.2. Samples: 78414336. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:25:25,956][1648985] Avg episode reward: [(0, '137.930')] [2024-06-15 13:25:30,345][1652491] Updated weights for policy 0, policy_version 153144 (0.0017) [2024-06-15 13:25:30,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 313655296. Throughput: 0: 11605.3. Samples: 78486016. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:25:30,956][1648985] Avg episode reward: [(0, '138.980')] [2024-06-15 13:25:32,389][1652491] Updated weights for policy 0, policy_version 153206 (0.0012) [2024-06-15 13:25:34,202][1652491] Updated weights for policy 0, policy_version 153248 (0.0013) [2024-06-15 13:25:35,047][1652491] Updated weights for policy 0, policy_version 153283 (0.0013) [2024-06-15 13:25:35,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 313982976. 
Throughput: 0: 11810.1. Samples: 78555136. Policy #0 lag: (min: 15.0, avg: 91.5, max: 271.0) [2024-06-15 13:25:35,956][1648985] Avg episode reward: [(0, '134.360')] [2024-06-15 13:25:40,974][1648985] Fps is (10 sec: 42517.1, 60 sec: 44222.7, 300 sec: 46316.5). Total num frames: 314081280. Throughput: 0: 11634.6. Samples: 78588416. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:25:40,975][1648985] Avg episode reward: [(0, '123.190')] [2024-06-15 13:25:41,158][1652491] Updated weights for policy 0, policy_version 153377 (0.0016) [2024-06-15 13:25:41,803][1652491] Updated weights for policy 0, policy_version 153408 (0.0012) [2024-06-15 13:25:43,743][1652491] Updated weights for policy 0, policy_version 153462 (0.0013) [2024-06-15 13:25:45,914][1652491] Updated weights for policy 0, policy_version 153492 (0.0013) [2024-06-15 13:25:45,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 314343424. Throughput: 0: 11741.9. Samples: 78659584. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:25:45,956][1648985] Avg episode reward: [(0, '109.450')] [2024-06-15 13:25:47,900][1652491] Updated weights for policy 0, policy_version 153584 (0.0015) [2024-06-15 13:25:50,955][1648985] Fps is (10 sec: 49246.4, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 314572800. Throughput: 0: 11685.0. Samples: 78727168. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:25:50,956][1648985] Avg episode reward: [(0, '116.730')] [2024-06-15 13:25:51,504][1652491] Updated weights for policy 0, policy_version 153617 (0.0013) [2024-06-15 13:25:55,016][1652491] Updated weights for policy 0, policy_version 153697 (0.0015) [2024-06-15 13:25:55,955][1648985] Fps is (10 sec: 49150.6, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 314834944. Throughput: 0: 11673.6. Samples: 78759424. 
Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:25:55,956][1648985] Avg episode reward: [(0, '135.530')] [2024-06-15 13:25:57,630][1651469] Signal inference workers to stop experience collection... (8050 times) [2024-06-15 13:25:57,682][1652491] InferenceWorker_p0-w0: stopping experience collection (8050 times) [2024-06-15 13:25:57,690][1652491] Updated weights for policy 0, policy_version 153767 (0.0014) [2024-06-15 13:25:57,847][1651469] Signal inference workers to resume experience collection... (8050 times) [2024-06-15 13:25:57,847][1652491] InferenceWorker_p0-w0: resuming experience collection (8050 times) [2024-06-15 13:25:59,220][1652491] Updated weights for policy 0, policy_version 153830 (0.0011) [2024-06-15 13:26:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 315097088. Throughput: 0: 11730.5. Samples: 78827008. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:26:00,956][1648985] Avg episode reward: [(0, '130.030')] [2024-06-15 13:26:02,724][1652491] Updated weights for policy 0, policy_version 153862 (0.0013) [2024-06-15 13:26:05,633][1652491] Updated weights for policy 0, policy_version 153921 (0.0026) [2024-06-15 13:26:05,955][1648985] Fps is (10 sec: 42600.0, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 315260928. Throughput: 0: 11730.5. Samples: 78902272. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:26:05,956][1648985] Avg episode reward: [(0, '143.110')] [2024-06-15 13:26:07,078][1652491] Updated weights for policy 0, policy_version 153984 (0.0014) [2024-06-15 13:26:10,481][1652491] Updated weights for policy 0, policy_version 154064 (0.0012) [2024-06-15 13:26:10,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.7, 300 sec: 45986.3). Total num frames: 315555840. Throughput: 0: 11582.6. Samples: 78935552. 
Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:26:10,956][1648985] Avg episode reward: [(0, '134.790')] [2024-06-15 13:26:13,899][1652491] Updated weights for policy 0, policy_version 154114 (0.0012) [2024-06-15 13:26:15,062][1652491] Updated weights for policy 0, policy_version 154166 (0.0014) [2024-06-15 13:26:15,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 315752448. Throughput: 0: 11468.8. Samples: 79002112. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:26:15,956][1648985] Avg episode reward: [(0, '136.380')] [2024-06-15 13:26:17,729][1652491] Updated weights for policy 0, policy_version 154210 (0.0012) [2024-06-15 13:26:20,907][1652491] Updated weights for policy 0, policy_version 154288 (0.0012) [2024-06-15 13:26:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.4, 300 sec: 46097.4). Total num frames: 315981824. Throughput: 0: 11525.7. Samples: 79073792. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:26:20,956][1648985] Avg episode reward: [(0, '139.070')] [2024-06-15 13:26:22,237][1652491] Updated weights for policy 0, policy_version 154342 (0.0012) [2024-06-15 13:26:25,185][1652491] Updated weights for policy 0, policy_version 154389 (0.0016) [2024-06-15 13:26:25,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 316243968. Throughput: 0: 11553.4. Samples: 79108096. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:26:25,956][1648985] Avg episode reward: [(0, '147.620')] [2024-06-15 13:26:26,157][1652491] Updated weights for policy 0, policy_version 154431 (0.0011) [2024-06-15 13:26:29,249][1652491] Updated weights for policy 0, policy_version 154489 (0.0014) [2024-06-15 13:26:30,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 316407808. Throughput: 0: 11582.6. Samples: 79180800. 
Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:26:30,956][1648985] Avg episode reward: [(0, '165.690')] [2024-06-15 13:26:32,059][1652491] Updated weights for policy 0, policy_version 154533 (0.0061) [2024-06-15 13:26:33,856][1652491] Updated weights for policy 0, policy_version 154617 (0.0135) [2024-06-15 13:26:35,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 44782.9, 300 sec: 46208.5). Total num frames: 316669952. Throughput: 0: 11662.2. Samples: 79251968. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:26:35,956][1648985] Avg episode reward: [(0, '154.410')] [2024-06-15 13:26:36,383][1652491] Updated weights for policy 0, policy_version 154644 (0.0014) [2024-06-15 13:26:39,722][1652491] Updated weights for policy 0, policy_version 154704 (0.0015) [2024-06-15 13:26:40,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 47528.7, 300 sec: 46430.6). Total num frames: 316932096. Throughput: 0: 11776.1. Samples: 79289344. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 13:26:40,956][1648985] Avg episode reward: [(0, '146.920')] [2024-06-15 13:26:42,690][1651469] Signal inference workers to stop experience collection... (8100 times) [2024-06-15 13:26:42,721][1652491] InferenceWorker_p0-w0: stopping experience collection (8100 times) [2024-06-15 13:26:42,916][1651469] Signal inference workers to resume experience collection... (8100 times) [2024-06-15 13:26:42,917][1652491] InferenceWorker_p0-w0: resuming experience collection (8100 times) [2024-06-15 13:26:43,368][1652491] Updated weights for policy 0, policy_version 154784 (0.0014) [2024-06-15 13:26:45,352][1652491] Updated weights for policy 0, policy_version 154871 (0.0013) [2024-06-15 13:26:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.7, 300 sec: 46208.4). Total num frames: 317194240. Throughput: 0: 11559.8. Samples: 79347200. 
Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:26:45,956][1648985] Avg episode reward: [(0, '124.510')] [2024-06-15 13:26:48,151][1652491] Updated weights for policy 0, policy_version 154913 (0.0014) [2024-06-15 13:26:50,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.1, 300 sec: 46322.0). Total num frames: 317325312. Throughput: 0: 11571.2. Samples: 79422976. Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:26:50,956][1648985] Avg episode reward: [(0, '143.170')] [2024-06-15 13:26:51,687][1652491] Updated weights for policy 0, policy_version 154976 (0.0013) [2024-06-15 13:26:54,324][1652491] Updated weights for policy 0, policy_version 155024 (0.0029) [2024-06-15 13:26:55,500][1652491] Updated weights for policy 0, policy_version 155075 (0.0012) [2024-06-15 13:26:55,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46421.6, 300 sec: 45875.2). Total num frames: 317620224. Throughput: 0: 11582.6. Samples: 79456768. Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:26:55,956][1648985] Avg episode reward: [(0, '146.720')] [2024-06-15 13:26:56,582][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000155120_317685760.pth... [2024-06-15 13:26:56,632][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000149664_306511872.pth [2024-06-15 13:26:56,801][1652491] Updated weights for policy 0, policy_version 155126 (0.0012) [2024-06-15 13:26:58,423][1652491] Updated weights for policy 0, policy_version 155159 (0.0013) [2024-06-15 13:26:59,403][1652491] Updated weights for policy 0, policy_version 155200 (0.0016) [2024-06-15 13:27:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 317849600. Throughput: 0: 11559.8. Samples: 79522304. 
Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:27:00,956][1648985] Avg episode reward: [(0, '140.960')] [2024-06-15 13:27:04,157][1652491] Updated weights for policy 0, policy_version 155258 (0.0015) [2024-06-15 13:27:05,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46421.3, 300 sec: 46097.4). Total num frames: 318046208. Throughput: 0: 11639.5. Samples: 79597568. Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:27:05,956][1648985] Avg episode reward: [(0, '133.180')] [2024-06-15 13:27:06,339][1652491] Updated weights for policy 0, policy_version 155328 (0.0012) [2024-06-15 13:27:07,471][1652491] Updated weights for policy 0, policy_version 155380 (0.0013) [2024-06-15 13:27:09,620][1652491] Updated weights for policy 0, policy_version 155453 (0.0012) [2024-06-15 13:27:10,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 46967.4, 300 sec: 46652.8). Total num frames: 318373888. Throughput: 0: 11730.5. Samples: 79635968. Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:27:10,956][1648985] Avg episode reward: [(0, '112.590')] [2024-06-15 13:27:15,356][1652491] Updated weights for policy 0, policy_version 155516 (0.0013) [2024-06-15 13:27:15,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 318504960. Throughput: 0: 11810.1. Samples: 79712256. Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:27:15,956][1648985] Avg episode reward: [(0, '117.200')] [2024-06-15 13:27:17,586][1652491] Updated weights for policy 0, policy_version 155606 (0.0014) [2024-06-15 13:27:18,365][1652491] Updated weights for policy 0, policy_version 155648 (0.0013) [2024-06-15 13:27:20,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 318865408. Throughput: 0: 11548.4. Samples: 79771648. 
Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:27:20,956][1648985] Avg episode reward: [(0, '116.430')] [2024-06-15 13:27:21,014][1652491] Updated weights for policy 0, policy_version 155712 (0.0013) [2024-06-15 13:27:25,430][1651469] Signal inference workers to stop experience collection... (8150 times) [2024-06-15 13:27:25,517][1652491] InferenceWorker_p0-w0: stopping experience collection (8150 times) [2024-06-15 13:27:25,704][1651469] Signal inference workers to resume experience collection... (8150 times) [2024-06-15 13:27:25,705][1652491] InferenceWorker_p0-w0: resuming experience collection (8150 times) [2024-06-15 13:27:25,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 45329.0, 300 sec: 46541.7). Total num frames: 318963712. Throughput: 0: 11650.9. Samples: 79813632. Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:27:25,956][1648985] Avg episode reward: [(0, '121.330')] [2024-06-15 13:27:26,436][1652491] Updated weights for policy 0, policy_version 155767 (0.0012) [2024-06-15 13:27:27,369][1652491] Updated weights for policy 0, policy_version 155808 (0.0014) [2024-06-15 13:27:28,692][1652491] Updated weights for policy 0, policy_version 155876 (0.0106) [2024-06-15 13:27:30,681][1652491] Updated weights for policy 0, policy_version 155923 (0.0015) [2024-06-15 13:27:30,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 49151.9, 300 sec: 46430.6). Total num frames: 319356928. Throughput: 0: 11958.0. Samples: 79885312. Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:27:30,956][1648985] Avg episode reward: [(0, '137.320')] [2024-06-15 13:27:35,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 319422464. Throughput: 0: 12060.4. Samples: 79965696. 
Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:27:35,956][1648985] Avg episode reward: [(0, '130.450')] [2024-06-15 13:27:36,404][1652491] Updated weights for policy 0, policy_version 155984 (0.0030) [2024-06-15 13:27:38,259][1652491] Updated weights for policy 0, policy_version 156048 (0.0017) [2024-06-15 13:27:39,706][1652491] Updated weights for policy 0, policy_version 156116 (0.0120) [2024-06-15 13:27:40,613][1652491] Updated weights for policy 0, policy_version 156153 (0.0011) [2024-06-15 13:27:40,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.6, 300 sec: 46652.7). Total num frames: 319815680. Throughput: 0: 11923.9. Samples: 79993344. Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:27:40,956][1648985] Avg episode reward: [(0, '134.690')] [2024-06-15 13:27:42,393][1652491] Updated weights for policy 0, policy_version 156208 (0.0012) [2024-06-15 13:27:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 319946752. Throughput: 0: 12026.3. Samples: 80063488. Policy #0 lag: (min: 64.0, avg: 167.0, max: 320.0) [2024-06-15 13:27:45,956][1648985] Avg episode reward: [(0, '134.950')] [2024-06-15 13:27:48,423][1652491] Updated weights for policy 0, policy_version 156258 (0.0013) [2024-06-15 13:27:50,223][1652491] Updated weights for policy 0, policy_version 156336 (0.0014) [2024-06-15 13:27:50,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 320208896. Throughput: 0: 11832.8. Samples: 80130048. Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:27:50,956][1648985] Avg episode reward: [(0, '138.980')] [2024-06-15 13:27:51,558][1652491] Updated weights for policy 0, policy_version 156388 (0.0013) [2024-06-15 13:27:53,513][1652491] Updated weights for policy 0, policy_version 156448 (0.0013) [2024-06-15 13:27:55,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 320471040. 
Throughput: 0: 11571.2. Samples: 80156672. Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:27:55,956][1648985] Avg episode reward: [(0, '128.980')] [2024-06-15 13:27:59,908][1652491] Updated weights for policy 0, policy_version 156500 (0.0013) [2024-06-15 13:28:00,955][1648985] Fps is (10 sec: 36045.6, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 320569344. Throughput: 0: 11639.5. Samples: 80236032. Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:28:00,955][1648985] Avg episode reward: [(0, '149.250')] [2024-06-15 13:28:00,958][1652491] Updated weights for policy 0, policy_version 156544 (0.0014) [2024-06-15 13:28:02,839][1652491] Updated weights for policy 0, policy_version 156613 (0.0023) [2024-06-15 13:28:03,160][1651469] Signal inference workers to stop experience collection... (8200 times) [2024-06-15 13:28:03,220][1652491] InferenceWorker_p0-w0: stopping experience collection (8200 times) [2024-06-15 13:28:03,514][1651469] Signal inference workers to resume experience collection... (8200 times) [2024-06-15 13:28:03,514][1652491] InferenceWorker_p0-w0: resuming experience collection (8200 times) [2024-06-15 13:28:04,847][1652491] Updated weights for policy 0, policy_version 156675 (0.0014) [2024-06-15 13:28:05,900][1652491] Updated weights for policy 0, policy_version 156726 (0.0013) [2024-06-15 13:28:05,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 48605.7, 300 sec: 46541.6). Total num frames: 320962560. Throughput: 0: 11514.3. Samples: 80289792. Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:28:05,956][1648985] Avg episode reward: [(0, '153.950')] [2024-06-15 13:28:10,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 44236.8, 300 sec: 46763.8). Total num frames: 321028096. Throughput: 0: 11639.5. Samples: 80337408. 
Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:28:10,956][1648985] Avg episode reward: [(0, '147.820')] [2024-06-15 13:28:11,443][1652491] Updated weights for policy 0, policy_version 156773 (0.0012) [2024-06-15 13:28:11,965][1652491] Updated weights for policy 0, policy_version 156799 (0.0011) [2024-06-15 13:28:13,521][1652491] Updated weights for policy 0, policy_version 156851 (0.0016) [2024-06-15 13:28:15,157][1652491] Updated weights for policy 0, policy_version 156924 (0.0012) [2024-06-15 13:28:15,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 48606.0, 300 sec: 46652.8). Total num frames: 321421312. Throughput: 0: 11457.5. Samples: 80400896. Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:28:15,955][1648985] Avg episode reward: [(0, '157.160')] [2024-06-15 13:28:16,890][1652491] Updated weights for policy 0, policy_version 156986 (0.0013) [2024-06-15 13:28:20,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 44236.8, 300 sec: 46652.8). Total num frames: 321519616. Throughput: 0: 11229.9. Samples: 80471040. Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:28:20,956][1648985] Avg episode reward: [(0, '143.590')] [2024-06-15 13:28:23,124][1652491] Updated weights for policy 0, policy_version 157040 (0.0015) [2024-06-15 13:28:25,125][1652491] Updated weights for policy 0, policy_version 157088 (0.0014) [2024-06-15 13:28:25,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 46967.4, 300 sec: 46541.7). Total num frames: 321781760. Throughput: 0: 11400.6. Samples: 80506368. 
Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:28:25,956][1648985] Avg episode reward: [(0, '141.210')] [2024-06-15 13:28:26,300][1652491] Updated weights for policy 0, policy_version 157137 (0.0022) [2024-06-15 13:28:27,893][1652491] Updated weights for policy 0, policy_version 157200 (0.0013) [2024-06-15 13:28:28,957][1652491] Updated weights for policy 0, policy_version 157243 (0.0012) [2024-06-15 13:28:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 44783.1, 300 sec: 46652.8). Total num frames: 322043904. Throughput: 0: 11434.7. Samples: 80578048. Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:28:30,956][1648985] Avg episode reward: [(0, '135.870')] [2024-06-15 13:28:33,791][1652491] Updated weights for policy 0, policy_version 157300 (0.0013) [2024-06-15 13:28:35,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 322207744. Throughput: 0: 11628.1. Samples: 80653312. Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:28:35,956][1648985] Avg episode reward: [(0, '137.900')] [2024-06-15 13:28:36,339][1652491] Updated weights for policy 0, policy_version 157344 (0.0014) [2024-06-15 13:28:38,417][1652491] Updated weights for policy 0, policy_version 157424 (0.0016) [2024-06-15 13:28:39,897][1652491] Updated weights for policy 0, policy_version 157488 (0.0013) [2024-06-15 13:28:40,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 322568192. Throughput: 0: 11525.7. Samples: 80675328. Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:28:40,956][1648985] Avg episode reward: [(0, '130.890')] [2024-06-15 13:28:44,361][1652491] Updated weights for policy 0, policy_version 157506 (0.0011) [2024-06-15 13:28:45,630][1652491] Updated weights for policy 0, policy_version 157568 (0.0014) [2024-06-15 13:28:45,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 322699264. 
Throughput: 0: 11457.4. Samples: 80751616. Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:28:45,956][1648985] Avg episode reward: [(0, '147.670')] [2024-06-15 13:28:46,995][1651469] Signal inference workers to stop experience collection... (8250 times) [2024-06-15 13:28:47,038][1652491] InferenceWorker_p0-w0: stopping experience collection (8250 times) [2024-06-15 13:28:47,263][1651469] Signal inference workers to resume experience collection... (8250 times) [2024-06-15 13:28:47,263][1652491] InferenceWorker_p0-w0: resuming experience collection (8250 times) [2024-06-15 13:28:48,892][1652491] Updated weights for policy 0, policy_version 157648 (0.0106) [2024-06-15 13:28:50,404][1652491] Updated weights for policy 0, policy_version 157712 (0.0013) [2024-06-15 13:28:50,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 46967.6, 300 sec: 46430.6). Total num frames: 323026944. Throughput: 0: 11594.0. Samples: 80811520. Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:28:50,956][1648985] Avg episode reward: [(0, '133.970')] [2024-06-15 13:28:51,657][1652491] Updated weights for policy 0, policy_version 157756 (0.0013) [2024-06-15 13:28:55,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 323092480. Throughput: 0: 11411.9. Samples: 80850944. Policy #0 lag: (min: 47.0, avg: 112.4, max: 303.0) [2024-06-15 13:28:55,956][1648985] Avg episode reward: [(0, '145.740')] [2024-06-15 13:28:56,396][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000157792_323158016.pth... 
[2024-06-15 13:28:56,579][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000152352_312016896.pth [2024-06-15 13:28:56,987][1652491] Updated weights for policy 0, policy_version 157811 (0.0026) [2024-06-15 13:28:59,071][1652491] Updated weights for policy 0, policy_version 157840 (0.0011) [2024-06-15 13:29:00,930][1652491] Updated weights for policy 0, policy_version 157920 (0.0013) [2024-06-15 13:29:00,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 47513.5, 300 sec: 46319.5). Total num frames: 323420160. Throughput: 0: 11639.4. Samples: 80924672. Policy #0 lag: (min: 13.0, avg: 99.3, max: 269.0) [2024-06-15 13:29:00,956][1648985] Avg episode reward: [(0, '140.770')] [2024-06-15 13:29:02,594][1652491] Updated weights for policy 0, policy_version 157987 (0.0207) [2024-06-15 13:29:05,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 44236.8, 300 sec: 46541.7). Total num frames: 323616768. Throughput: 0: 11480.2. Samples: 80987648. Policy #0 lag: (min: 13.0, avg: 99.3, max: 269.0) [2024-06-15 13:29:05,956][1648985] Avg episode reward: [(0, '145.820')] [2024-06-15 13:29:07,188][1652491] Updated weights for policy 0, policy_version 158018 (0.0012) [2024-06-15 13:29:08,425][1652491] Updated weights for policy 0, policy_version 158080 (0.0013) [2024-06-15 13:29:10,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46967.5, 300 sec: 46319.5). Total num frames: 323846144. Throughput: 0: 11571.2. Samples: 81027072. Policy #0 lag: (min: 13.0, avg: 99.3, max: 269.0) [2024-06-15 13:29:10,956][1648985] Avg episode reward: [(0, '148.540')] [2024-06-15 13:29:11,368][1652491] Updated weights for policy 0, policy_version 158146 (0.0016) [2024-06-15 13:29:13,344][1652491] Updated weights for policy 0, policy_version 158224 (0.0013) [2024-06-15 13:29:14,506][1652491] Updated weights for policy 0, policy_version 158272 (0.0012) [2024-06-15 13:29:15,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45329.1, 300 sec: 46652.8). Total num frames: 324141056. 
Throughput: 0: 11218.5. Samples: 81082880. Policy #0 lag: (min: 13.0, avg: 99.3, max: 269.0) [2024-06-15 13:29:15,956][1648985] Avg episode reward: [(0, '152.620')] [2024-06-15 13:29:20,064][1652491] Updated weights for policy 0, policy_version 158336 (0.0024) [2024-06-15 13:29:20,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 324272128. Throughput: 0: 11286.8. Samples: 81161216. Policy #0 lag: (min: 13.0, avg: 99.3, max: 269.0) [2024-06-15 13:29:20,956][1648985] Avg episode reward: [(0, '169.220')] [2024-06-15 13:29:20,957][1651469] Saving new best policy, reward=169.220! [2024-06-15 13:29:22,993][1652491] Updated weights for policy 0, policy_version 158384 (0.0103) [2024-06-15 13:29:24,740][1652491] Updated weights for policy 0, policy_version 158452 (0.0016) [2024-06-15 13:29:25,018][1651469] Signal inference workers to stop experience collection... (8300 times) [2024-06-15 13:29:25,062][1652491] InferenceWorker_p0-w0: stopping experience collection (8300 times) [2024-06-15 13:29:25,304][1651469] Signal inference workers to resume experience collection... (8300 times) [2024-06-15 13:29:25,305][1652491] InferenceWorker_p0-w0: resuming experience collection (8300 times) [2024-06-15 13:29:25,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 324599808. Throughput: 0: 11503.0. Samples: 81192960. Policy #0 lag: (min: 13.0, avg: 99.3, max: 269.0) [2024-06-15 13:29:25,955][1648985] Avg episode reward: [(0, '152.780')] [2024-06-15 13:29:26,566][1652491] Updated weights for policy 0, policy_version 158528 (0.0021) [2024-06-15 13:29:30,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 44236.6, 300 sec: 45875.2). Total num frames: 324698112. Throughput: 0: 11286.7. Samples: 81259520. 
Policy #0 lag: (min: 13.0, avg: 99.3, max: 269.0) [2024-06-15 13:29:30,956][1648985] Avg episode reward: [(0, '113.760')] [2024-06-15 13:29:33,433][1652491] Updated weights for policy 0, policy_version 158598 (0.0011) [2024-06-15 13:29:34,842][1652491] Updated weights for policy 0, policy_version 158656 (0.0011) [2024-06-15 13:29:35,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 46421.4, 300 sec: 45986.3). Total num frames: 324993024. Throughput: 0: 11480.2. Samples: 81328128. Policy #0 lag: (min: 13.0, avg: 99.3, max: 269.0) [2024-06-15 13:29:35,955][1648985] Avg episode reward: [(0, '108.370')] [2024-06-15 13:29:36,606][1652491] Updated weights for policy 0, policy_version 158720 (0.0012) [2024-06-15 13:29:38,080][1652491] Updated weights for policy 0, policy_version 158784 (0.0126) [2024-06-15 13:29:40,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 325189632. Throughput: 0: 11241.2. Samples: 81356800. Policy #0 lag: (min: 13.0, avg: 99.3, max: 269.0) [2024-06-15 13:29:40,956][1648985] Avg episode reward: [(0, '124.790')] [2024-06-15 13:29:42,711][1652491] Updated weights for policy 0, policy_version 158839 (0.0014) [2024-06-15 13:29:45,092][1652491] Updated weights for policy 0, policy_version 158880 (0.0014) [2024-06-15 13:29:45,955][1648985] Fps is (10 sec: 45873.5, 60 sec: 45875.1, 300 sec: 45986.2). Total num frames: 325451776. Throughput: 0: 11480.1. Samples: 81441280. Policy #0 lag: (min: 13.0, avg: 99.3, max: 269.0) [2024-06-15 13:29:45,956][1648985] Avg episode reward: [(0, '132.450')] [2024-06-15 13:29:46,338][1652491] Updated weights for policy 0, policy_version 158931 (0.0012) [2024-06-15 13:29:48,709][1652491] Updated weights for policy 0, policy_version 159033 (0.0013) [2024-06-15 13:29:50,956][1648985] Fps is (10 sec: 52426.7, 60 sec: 44782.6, 300 sec: 46652.7). Total num frames: 325713920. Throughput: 0: 11298.0. Samples: 81496064. 
Policy #0 lag: (min: 13.0, avg: 99.3, max: 269.0) [2024-06-15 13:29:50,957][1648985] Avg episode reward: [(0, '128.590')] [2024-06-15 13:29:54,201][1652491] Updated weights for policy 0, policy_version 159092 (0.0013) [2024-06-15 13:29:55,955][1648985] Fps is (10 sec: 39322.8, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 325844992. Throughput: 0: 11389.2. Samples: 81539584. Policy #0 lag: (min: 13.0, avg: 99.3, max: 269.0) [2024-06-15 13:29:55,956][1648985] Avg episode reward: [(0, '142.190')] [2024-06-15 13:29:56,112][1652491] Updated weights for policy 0, policy_version 159106 (0.0011) [2024-06-15 13:29:57,632][1652491] Updated weights for policy 0, policy_version 159170 (0.0013) [2024-06-15 13:29:59,012][1652491] Updated weights for policy 0, policy_version 159232 (0.0013) [2024-06-15 13:30:00,269][1652491] Updated weights for policy 0, policy_version 159293 (0.0014) [2024-06-15 13:30:00,956][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.2, 300 sec: 46652.7). Total num frames: 326238208. Throughput: 0: 11548.3. Samples: 81602560. Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:30:00,956][1648985] Avg episode reward: [(0, '127.070')] [2024-06-15 13:30:05,099][1652491] Updated weights for policy 0, policy_version 159355 (0.0013) [2024-06-15 13:30:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 326369280. Throughput: 0: 11480.2. Samples: 81677824. Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:30:05,956][1648985] Avg episode reward: [(0, '105.080')] [2024-06-15 13:30:08,326][1651469] Signal inference workers to stop experience collection... (8350 times) [2024-06-15 13:30:08,388][1652491] InferenceWorker_p0-w0: stopping experience collection (8350 times) [2024-06-15 13:30:08,651][1651469] Signal inference workers to resume experience collection... 
(8350 times) [2024-06-15 13:30:08,652][1652491] InferenceWorker_p0-w0: resuming experience collection (8350 times) [2024-06-15 13:30:08,861][1652491] Updated weights for policy 0, policy_version 159410 (0.0014) [2024-06-15 13:30:10,452][1652491] Updated weights for policy 0, policy_version 159477 (0.0015) [2024-06-15 13:30:10,955][1648985] Fps is (10 sec: 39323.1, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 326631424. Throughput: 0: 11480.2. Samples: 81709568. Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:30:10,956][1648985] Avg episode reward: [(0, '112.010')] [2024-06-15 13:30:11,899][1652491] Updated weights for policy 0, policy_version 159551 (0.0014) [2024-06-15 13:30:15,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 326828032. Throughput: 0: 11514.4. Samples: 81777664. Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:30:15,956][1648985] Avg episode reward: [(0, '120.040')] [2024-06-15 13:30:16,281][1652491] Updated weights for policy 0, policy_version 159600 (0.0105) [2024-06-15 13:30:20,240][1652491] Updated weights for policy 0, policy_version 159664 (0.0015) [2024-06-15 13:30:20,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 327057408. Throughput: 0: 11514.3. Samples: 81846272. Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:30:20,956][1648985] Avg episode reward: [(0, '139.070')] [2024-06-15 13:30:21,640][1652491] Updated weights for policy 0, policy_version 159728 (0.0030) [2024-06-15 13:30:22,927][1652491] Updated weights for policy 0, policy_version 159782 (0.0012) [2024-06-15 13:30:25,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 327286784. Throughput: 0: 11514.3. Samples: 81874944. 
Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:30:25,956][1648985] Avg episode reward: [(0, '122.180')] [2024-06-15 13:30:26,776][1652491] Updated weights for policy 0, policy_version 159824 (0.0014) [2024-06-15 13:30:27,731][1652491] Updated weights for policy 0, policy_version 159872 (0.0117) [2024-06-15 13:30:30,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 45329.2, 300 sec: 45542.0). Total num frames: 327417856. Throughput: 0: 11264.1. Samples: 81948160. Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:30:30,956][1648985] Avg episode reward: [(0, '128.350')] [2024-06-15 13:30:32,390][1652491] Updated weights for policy 0, policy_version 159939 (0.0013) [2024-06-15 13:30:33,452][1652491] Updated weights for policy 0, policy_version 159986 (0.0014) [2024-06-15 13:30:34,877][1652491] Updated weights for policy 0, policy_version 160054 (0.0012) [2024-06-15 13:30:35,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.4, 300 sec: 46544.7). Total num frames: 327811072. Throughput: 0: 11423.4. Samples: 82010112. Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:30:35,956][1648985] Avg episode reward: [(0, '142.900')] [2024-06-15 13:30:39,026][1652491] Updated weights for policy 0, policy_version 160112 (0.0013) [2024-06-15 13:30:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 327942144. Throughput: 0: 11343.6. Samples: 82050048. Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:30:40,956][1648985] Avg episode reward: [(0, '154.690')] [2024-06-15 13:30:43,425][1652491] Updated weights for policy 0, policy_version 160192 (0.0036) [2024-06-15 13:30:44,796][1652491] Updated weights for policy 0, policy_version 160265 (0.0013) [2024-06-15 13:30:45,408][1651469] Signal inference workers to stop experience collection... 
(8400 times) [2024-06-15 13:30:45,500][1652491] InferenceWorker_p0-w0: stopping experience collection (8400 times) [2024-06-15 13:30:45,714][1651469] Signal inference workers to resume experience collection... (8400 times) [2024-06-15 13:30:45,715][1652491] InferenceWorker_p0-w0: resuming experience collection (8400 times) [2024-06-15 13:30:45,952][1652491] Updated weights for policy 0, policy_version 160313 (0.0118) [2024-06-15 13:30:45,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 47513.8, 300 sec: 46541.7). Total num frames: 328302592. Throughput: 0: 11548.5. Samples: 82122240. Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:30:45,956][1648985] Avg episode reward: [(0, '145.990')] [2024-06-15 13:30:49,776][1652491] Updated weights for policy 0, policy_version 160356 (0.0013) [2024-06-15 13:30:50,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 328466432. Throughput: 0: 11491.5. Samples: 82194944. Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:30:50,956][1648985] Avg episode reward: [(0, '157.380')] [2024-06-15 13:30:53,709][1652491] Updated weights for policy 0, policy_version 160419 (0.0013) [2024-06-15 13:30:55,411][1652491] Updated weights for policy 0, policy_version 160504 (0.0012) [2024-06-15 13:30:55,955][1648985] Fps is (10 sec: 42597.1, 60 sec: 48059.5, 300 sec: 46208.4). Total num frames: 328728576. Throughput: 0: 11605.3. Samples: 82231808. Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:30:55,956][1648985] Avg episode reward: [(0, '143.030')] [2024-06-15 13:30:56,277][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000160544_328794112.pth... 
[2024-06-15 13:30:56,402][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000155120_317685760.pth [2024-06-15 13:30:56,405][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000160544_328794112.pth [2024-06-15 13:30:56,730][1652491] Updated weights for policy 0, policy_version 160560 (0.0013) [2024-06-15 13:31:00,773][1652491] Updated weights for policy 0, policy_version 160610 (0.0017) [2024-06-15 13:31:00,966][1648985] Fps is (10 sec: 49098.9, 60 sec: 45321.0, 300 sec: 46428.8). Total num frames: 328957952. Throughput: 0: 11659.3. Samples: 82302464. Policy #0 lag: (min: 155.0, avg: 251.4, max: 415.0) [2024-06-15 13:31:00,967][1648985] Avg episode reward: [(0, '132.140')] [2024-06-15 13:31:04,509][1652491] Updated weights for policy 0, policy_version 160672 (0.0085) [2024-06-15 13:31:05,955][1648985] Fps is (10 sec: 42599.8, 60 sec: 46421.3, 300 sec: 46097.4). Total num frames: 329154560. Throughput: 0: 11571.2. Samples: 82366976. Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:31:05,956][1648985] Avg episode reward: [(0, '126.310')] [2024-06-15 13:31:06,471][1652491] Updated weights for policy 0, policy_version 160752 (0.0015) [2024-06-15 13:31:07,759][1652491] Updated weights for policy 0, policy_version 160801 (0.0160) [2024-06-15 13:31:10,920][1652491] Updated weights for policy 0, policy_version 160836 (0.0034) [2024-06-15 13:31:10,955][1648985] Fps is (10 sec: 42645.8, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 329383936. Throughput: 0: 11673.6. Samples: 82400256. Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:31:10,956][1648985] Avg episode reward: [(0, '130.830')] [2024-06-15 13:31:15,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45329.1, 300 sec: 45986.3). Total num frames: 329547776. Throughput: 0: 11798.8. Samples: 82479104. 
Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:31:15,956][1648985] Avg episode reward: [(0, '129.870')] [2024-06-15 13:31:15,968][1652491] Updated weights for policy 0, policy_version 160916 (0.0014) [2024-06-15 13:31:17,264][1652491] Updated weights for policy 0, policy_version 160976 (0.0015) [2024-06-15 13:31:18,844][1652491] Updated weights for policy 0, policy_version 161040 (0.0124) [2024-06-15 13:31:19,664][1652491] Updated weights for policy 0, policy_version 161082 (0.0013) [2024-06-15 13:31:20,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 47513.5, 300 sec: 46319.5). Total num frames: 329908224. Throughput: 0: 11753.2. Samples: 82539008. Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:31:20,956][1648985] Avg episode reward: [(0, '124.620')] [2024-06-15 13:31:23,472][1652491] Updated weights for policy 0, policy_version 161143 (0.0015) [2024-06-15 13:31:25,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 330039296. Throughput: 0: 11639.4. Samples: 82573824. Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:31:25,956][1648985] Avg episode reward: [(0, '125.790')] [2024-06-15 13:31:28,704][1652491] Updated weights for policy 0, policy_version 161201 (0.0014) [2024-06-15 13:31:29,345][1651469] Signal inference workers to stop experience collection... (8450 times) [2024-06-15 13:31:29,388][1652491] InferenceWorker_p0-w0: stopping experience collection (8450 times) [2024-06-15 13:31:29,568][1651469] Signal inference workers to resume experience collection... (8450 times) [2024-06-15 13:31:29,568][1652491] InferenceWorker_p0-w0: resuming experience collection (8450 times) [2024-06-15 13:31:30,236][1652491] Updated weights for policy 0, policy_version 161280 (0.0017) [2024-06-15 13:31:30,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48605.9, 300 sec: 46319.5). Total num frames: 330334208. Throughput: 0: 11594.0. Samples: 82643968. 
Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:31:30,956][1648985] Avg episode reward: [(0, '133.180')] [2024-06-15 13:31:31,862][1652491] Updated weights for policy 0, policy_version 161339 (0.0016) [2024-06-15 13:31:34,548][1652491] Updated weights for policy 0, policy_version 161376 (0.0014) [2024-06-15 13:31:35,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 330563584. Throughput: 0: 11503.0. Samples: 82712576. Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:31:35,955][1648985] Avg episode reward: [(0, '122.930')] [2024-06-15 13:31:39,804][1652491] Updated weights for policy 0, policy_version 161456 (0.0014) [2024-06-15 13:31:40,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 46967.3, 300 sec: 45986.2). Total num frames: 330760192. Throughput: 0: 11639.5. Samples: 82755584. Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:31:40,956][1648985] Avg episode reward: [(0, '140.150')] [2024-06-15 13:31:41,938][1652491] Updated weights for policy 0, policy_version 161552 (0.0014) [2024-06-15 13:31:43,062][1652491] Updated weights for policy 0, policy_version 161594 (0.0012) [2024-06-15 13:31:45,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 330989568. Throughput: 0: 11357.8. Samples: 82813440. Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:31:45,956][1648985] Avg episode reward: [(0, '142.950')] [2024-06-15 13:31:50,148][1652491] Updated weights for policy 0, policy_version 161680 (0.0021) [2024-06-15 13:31:50,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 45329.3, 300 sec: 45986.3). Total num frames: 331186176. Throughput: 0: 11594.0. Samples: 82888704. 
Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:31:50,955][1648985] Avg episode reward: [(0, '137.890')] [2024-06-15 13:31:51,339][1652491] Updated weights for policy 0, policy_version 161733 (0.0014) [2024-06-15 13:31:52,408][1652491] Updated weights for policy 0, policy_version 161785 (0.0012) [2024-06-15 13:31:53,741][1652491] Updated weights for policy 0, policy_version 161845 (0.0014) [2024-06-15 13:31:55,955][1648985] Fps is (10 sec: 49150.3, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 331481088. Throughput: 0: 11491.5. Samples: 82917376. Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:31:55,957][1648985] Avg episode reward: [(0, '128.850')] [2024-06-15 13:31:56,683][1652491] Updated weights for policy 0, policy_version 161888 (0.0014) [2024-06-15 13:32:00,146][1652491] Updated weights for policy 0, policy_version 161936 (0.0018) [2024-06-15 13:32:00,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 45337.4, 300 sec: 46208.4). Total num frames: 331677696. Throughput: 0: 11719.1. Samples: 83006464. Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:32:00,956][1648985] Avg episode reward: [(0, '130.100')] [2024-06-15 13:32:01,639][1652491] Updated weights for policy 0, policy_version 161984 (0.0012) [2024-06-15 13:32:03,403][1652491] Updated weights for policy 0, policy_version 162064 (0.0014) [2024-06-15 13:32:05,968][1648985] Fps is (10 sec: 52365.1, 60 sec: 47503.7, 300 sec: 46206.5). Total num frames: 332005376. Throughput: 0: 11784.1. Samples: 83069440. Policy #0 lag: (min: 47.0, avg: 125.5, max: 303.0) [2024-06-15 13:32:05,968][1648985] Avg episode reward: [(0, '132.390')] [2024-06-15 13:32:06,799][1652491] Updated weights for policy 0, policy_version 162117 (0.0017) [2024-06-15 13:32:07,902][1652491] Updated weights for policy 0, policy_version 162176 (0.0015) [2024-06-15 13:32:10,655][1651469] Signal inference workers to stop experience collection... 
(8500 times) [2024-06-15 13:32:10,725][1652491] InferenceWorker_p0-w0: stopping experience collection (8500 times) [2024-06-15 13:32:10,879][1651469] Signal inference workers to resume experience collection... (8500 times) [2024-06-15 13:32:10,879][1652491] InferenceWorker_p0-w0: resuming experience collection (8500 times) [2024-06-15 13:32:10,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 332169216. Throughput: 0: 11832.9. Samples: 83106304. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:32:10,956][1648985] Avg episode reward: [(0, '132.920')] [2024-06-15 13:32:11,707][1652491] Updated weights for policy 0, policy_version 162224 (0.0014) [2024-06-15 13:32:12,896][1652491] Updated weights for policy 0, policy_version 162272 (0.0091) [2024-06-15 13:32:14,229][1652491] Updated weights for policy 0, policy_version 162320 (0.0014) [2024-06-15 13:32:15,354][1652491] Updated weights for policy 0, policy_version 162363 (0.0013) [2024-06-15 13:32:15,955][1648985] Fps is (10 sec: 52494.3, 60 sec: 49698.1, 300 sec: 46319.5). Total num frames: 332529664. Throughput: 0: 11946.6. Samples: 83181568. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:32:15,956][1648985] Avg episode reward: [(0, '138.670')] [2024-06-15 13:32:18,227][1652491] Updated weights for policy 0, policy_version 162423 (0.0017) [2024-06-15 13:32:20,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 332660736. Throughput: 0: 12071.8. Samples: 83255808. 
Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:32:20,956][1648985] Avg episode reward: [(0, '155.020')] [2024-06-15 13:32:22,435][1652491] Updated weights for policy 0, policy_version 162485 (0.0052) [2024-06-15 13:32:23,877][1652491] Updated weights for policy 0, policy_version 162516 (0.0021) [2024-06-15 13:32:25,672][1652491] Updated weights for policy 0, policy_version 162592 (0.0016) [2024-06-15 13:32:25,965][1648985] Fps is (10 sec: 45829.0, 60 sec: 49143.8, 300 sec: 46206.9). Total num frames: 332988416. Throughput: 0: 11932.7. Samples: 83292672. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:32:25,966][1648985] Avg episode reward: [(0, '144.750')] [2024-06-15 13:32:28,410][1652491] Updated weights for policy 0, policy_version 162642 (0.0014) [2024-06-15 13:32:30,965][1648985] Fps is (10 sec: 52381.2, 60 sec: 47506.4, 300 sec: 46651.3). Total num frames: 333185024. Throughput: 0: 12149.0. Samples: 83360256. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:32:30,967][1648985] Avg episode reward: [(0, '139.060')] [2024-06-15 13:32:32,727][1652491] Updated weights for policy 0, policy_version 162720 (0.0014) [2024-06-15 13:32:35,033][1652491] Updated weights for policy 0, policy_version 162768 (0.0013) [2024-06-15 13:32:35,955][1648985] Fps is (10 sec: 42641.6, 60 sec: 47513.6, 300 sec: 46097.4). Total num frames: 333414400. Throughput: 0: 12003.5. Samples: 83428864. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:32:35,956][1648985] Avg episode reward: [(0, '126.890')] [2024-06-15 13:32:37,321][1652491] Updated weights for policy 0, policy_version 162838 (0.0134) [2024-06-15 13:32:38,783][1652491] Updated weights for policy 0, policy_version 162881 (0.0013) [2024-06-15 13:32:40,955][1648985] Fps is (10 sec: 52476.9, 60 sec: 49152.2, 300 sec: 46652.8). Total num frames: 333709312. Throughput: 0: 12185.7. Samples: 83465728. 
Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:32:40,956][1648985] Avg episode reward: [(0, '130.400')] [2024-06-15 13:32:43,641][1652491] Updated weights for policy 0, policy_version 162960 (0.0017) [2024-06-15 13:32:44,970][1652491] Updated weights for policy 0, policy_version 163004 (0.0019) [2024-06-15 13:32:45,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 333840384. Throughput: 0: 11616.7. Samples: 83529216. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:32:45,956][1648985] Avg episode reward: [(0, '123.710')] [2024-06-15 13:32:47,487][1652491] Updated weights for policy 0, policy_version 163056 (0.0083) [2024-06-15 13:32:49,197][1652491] Updated weights for policy 0, policy_version 163104 (0.0012) [2024-06-15 13:32:50,954][1652491] Updated weights for policy 0, policy_version 163184 (0.0016) [2024-06-15 13:32:50,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 50244.3, 300 sec: 46541.7). Total num frames: 334200832. Throughput: 0: 11802.1. Samples: 83600384. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:32:50,955][1648985] Avg episode reward: [(0, '118.840')] [2024-06-15 13:32:54,971][1651469] Signal inference workers to stop experience collection... (8550 times) [2024-06-15 13:32:55,031][1652491] InferenceWorker_p0-w0: stopping experience collection (8550 times) [2024-06-15 13:32:55,151][1651469] Signal inference workers to resume experience collection... (8550 times) [2024-06-15 13:32:55,152][1652491] InferenceWorker_p0-w0: resuming experience collection (8550 times) [2024-06-15 13:32:55,734][1652491] Updated weights for policy 0, policy_version 163233 (0.0017) [2024-06-15 13:32:55,962][1648985] Fps is (10 sec: 45846.4, 60 sec: 46962.8, 300 sec: 46540.7). Total num frames: 334299136. Throughput: 0: 11785.7. Samples: 83636736. 
Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:32:55,963][1648985] Avg episode reward: [(0, '131.170')] [2024-06-15 13:32:56,205][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000163264_334364672.pth... [2024-06-15 13:32:56,242][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000157792_323158016.pth [2024-06-15 13:32:58,072][1652491] Updated weights for policy 0, policy_version 163285 (0.0043) [2024-06-15 13:32:59,769][1652491] Updated weights for policy 0, policy_version 163331 (0.0011) [2024-06-15 13:33:00,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 48606.0, 300 sec: 46208.5). Total num frames: 334594048. Throughput: 0: 11844.3. Samples: 83714560. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:33:00,956][1648985] Avg episode reward: [(0, '147.830')] [2024-06-15 13:33:01,604][1652491] Updated weights for policy 0, policy_version 163408 (0.0011) [2024-06-15 13:33:05,955][1648985] Fps is (10 sec: 45904.3, 60 sec: 45884.8, 300 sec: 46541.7). Total num frames: 334757888. Throughput: 0: 11673.6. Samples: 83781120. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:33:05,956][1648985] Avg episode reward: [(0, '153.660')] [2024-06-15 13:33:06,039][1652491] Updated weights for policy 0, policy_version 163472 (0.0013) [2024-06-15 13:33:06,789][1652491] Updated weights for policy 0, policy_version 163520 (0.0012) [2024-06-15 13:33:10,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 46967.3, 300 sec: 45986.2). Total num frames: 334987264. Throughput: 0: 11710.3. Samples: 83819520. 
Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:33:10,956][1648985] Avg episode reward: [(0, '142.800')] [2024-06-15 13:33:11,281][1652491] Updated weights for policy 0, policy_version 163586 (0.0011) [2024-06-15 13:33:13,216][1652491] Updated weights for policy 0, policy_version 163664 (0.0012) [2024-06-15 13:33:14,378][1652491] Updated weights for policy 0, policy_version 163712 (0.0102) [2024-06-15 13:33:15,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 335282176. Throughput: 0: 11493.9. Samples: 83877376. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 13:33:15,956][1648985] Avg episode reward: [(0, '134.370')] [2024-06-15 13:33:18,486][1652491] Updated weights for policy 0, policy_version 163769 (0.0013) [2024-06-15 13:33:20,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 335413248. Throughput: 0: 11639.5. Samples: 83952640. Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:33:20,956][1648985] Avg episode reward: [(0, '147.890')] [2024-06-15 13:33:22,477][1652491] Updated weights for policy 0, policy_version 163824 (0.0012) [2024-06-15 13:33:24,767][1652491] Updated weights for policy 0, policy_version 163904 (0.0092) [2024-06-15 13:33:25,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45882.9, 300 sec: 46430.6). Total num frames: 335740928. Throughput: 0: 11377.7. Samples: 83977728. Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:33:25,956][1648985] Avg episode reward: [(0, '151.160')] [2024-06-15 13:33:26,387][1652491] Updated weights for policy 0, policy_version 163964 (0.0013) [2024-06-15 13:33:30,492][1652491] Updated weights for policy 0, policy_version 164032 (0.0014) [2024-06-15 13:33:30,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45882.2, 300 sec: 46541.7). Total num frames: 335937536. Throughput: 0: 11377.8. Samples: 84041216. 
Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:33:30,956][1648985] Avg episode reward: [(0, '157.250')] [2024-06-15 13:33:34,847][1652491] Updated weights for policy 0, policy_version 164086 (0.0106) [2024-06-15 13:33:35,864][1651469] Signal inference workers to stop experience collection... (8600 times) [2024-06-15 13:33:35,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45329.0, 300 sec: 45986.3). Total num frames: 336134144. Throughput: 0: 11355.0. Samples: 84111360. Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:33:35,956][1648985] Avg episode reward: [(0, '148.620')] [2024-06-15 13:33:35,972][1652491] InferenceWorker_p0-w0: stopping experience collection (8600 times) [2024-06-15 13:33:36,152][1651469] Signal inference workers to resume experience collection... (8600 times) [2024-06-15 13:33:36,153][1652491] InferenceWorker_p0-w0: resuming experience collection (8600 times) [2024-06-15 13:33:36,359][1652491] Updated weights for policy 0, policy_version 164151 (0.0014) [2024-06-15 13:33:37,421][1652491] Updated weights for policy 0, policy_version 164192 (0.0013) [2024-06-15 13:33:38,100][1652491] Updated weights for policy 0, policy_version 164223 (0.0013) [2024-06-15 13:33:40,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 336396288. Throughput: 0: 11368.0. Samples: 84148224. Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:33:40,955][1648985] Avg episode reward: [(0, '144.110')] [2024-06-15 13:33:41,476][1652491] Updated weights for policy 0, policy_version 164280 (0.0011) [2024-06-15 13:33:45,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 44782.9, 300 sec: 45764.1). Total num frames: 336527360. Throughput: 0: 11252.6. Samples: 84220928. 
Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:33:45,956][1648985] Avg episode reward: [(0, '136.740')] [2024-06-15 13:33:46,288][1652491] Updated weights for policy 0, policy_version 164338 (0.0012) [2024-06-15 13:33:47,800][1652491] Updated weights for policy 0, policy_version 164390 (0.0020) [2024-06-15 13:33:49,415][1652491] Updated weights for policy 0, policy_version 164450 (0.0026) [2024-06-15 13:33:50,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 44236.6, 300 sec: 46652.7). Total num frames: 336855040. Throughput: 0: 11013.6. Samples: 84276736. Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:33:50,956][1648985] Avg episode reward: [(0, '138.800')] [2024-06-15 13:33:52,699][1652491] Updated weights for policy 0, policy_version 164499 (0.0014) [2024-06-15 13:33:55,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 44787.6, 300 sec: 45986.3). Total num frames: 336986112. Throughput: 0: 10968.2. Samples: 84313088. Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:33:55,956][1648985] Avg episode reward: [(0, '138.050')] [2024-06-15 13:33:57,942][1652491] Updated weights for policy 0, policy_version 164594 (0.0134) [2024-06-15 13:33:59,499][1652491] Updated weights for policy 0, policy_version 164666 (0.0016) [2024-06-15 13:34:00,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 337313792. Throughput: 0: 11150.2. Samples: 84379136. Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:34:00,956][1648985] Avg episode reward: [(0, '134.480')] [2024-06-15 13:34:01,218][1652491] Updated weights for policy 0, policy_version 164720 (0.0012) [2024-06-15 13:34:05,009][1652491] Updated weights for policy 0, policy_version 164791 (0.0011) [2024-06-15 13:34:05,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 337510400. Throughput: 0: 10990.9. Samples: 84447232. 
Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:34:05,957][1648985] Avg episode reward: [(0, '139.440')] [2024-06-15 13:34:09,736][1652491] Updated weights for policy 0, policy_version 164841 (0.0011) [2024-06-15 13:34:10,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 44783.1, 300 sec: 45875.2). Total num frames: 337674240. Throughput: 0: 11343.7. Samples: 84488192. Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:34:10,955][1648985] Avg episode reward: [(0, '148.940')] [2024-06-15 13:34:11,196][1652491] Updated weights for policy 0, policy_version 164899 (0.0012) [2024-06-15 13:34:12,685][1652491] Updated weights for policy 0, policy_version 164960 (0.0015) [2024-06-15 13:34:15,000][1652491] Updated weights for policy 0, policy_version 165008 (0.0013) [2024-06-15 13:34:15,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 338034688. Throughput: 0: 11400.5. Samples: 84554240. Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:34:15,956][1648985] Avg episode reward: [(0, '131.600')] [2024-06-15 13:34:19,369][1652491] Updated weights for policy 0, policy_version 165060 (0.0059) [2024-06-15 13:34:19,635][1651469] Signal inference workers to stop experience collection... (8650 times) [2024-06-15 13:34:19,709][1652491] InferenceWorker_p0-w0: stopping experience collection (8650 times) [2024-06-15 13:34:19,882][1651469] Signal inference workers to resume experience collection... (8650 times) [2024-06-15 13:34:19,883][1652491] InferenceWorker_p0-w0: resuming experience collection (8650 times) [2024-06-15 13:34:20,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 46421.2, 300 sec: 46097.3). Total num frames: 338198528. Throughput: 0: 11468.8. Samples: 84627456. 
Policy #0 lag: (min: 15.0, avg: 136.0, max: 271.0) [2024-06-15 13:34:20,956][1648985] Avg episode reward: [(0, '115.600')] [2024-06-15 13:34:21,434][1652491] Updated weights for policy 0, policy_version 165153 (0.0014) [2024-06-15 13:34:22,551][1652491] Updated weights for policy 0, policy_version 165200 (0.0022) [2024-06-15 13:34:25,783][1652491] Updated weights for policy 0, policy_version 165250 (0.0019) [2024-06-15 13:34:25,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44783.0, 300 sec: 46541.7). Total num frames: 338427904. Throughput: 0: 11320.9. Samples: 84657664. Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:34:25,956][1648985] Avg episode reward: [(0, '131.920')] [2024-06-15 13:34:27,242][1652491] Updated weights for policy 0, policy_version 165311 (0.0012) [2024-06-15 13:34:30,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 338624512. Throughput: 0: 11503.0. Samples: 84738560. Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:34:30,956][1648985] Avg episode reward: [(0, '141.810')] [2024-06-15 13:34:31,194][1652491] Updated weights for policy 0, policy_version 165368 (0.0014) [2024-06-15 13:34:33,143][1652491] Updated weights for policy 0, policy_version 165443 (0.0014) [2024-06-15 13:34:34,236][1652491] Updated weights for policy 0, policy_version 165497 (0.0013) [2024-06-15 13:34:35,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 338952192. Throughput: 0: 11753.3. Samples: 84805632. Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:34:35,956][1648985] Avg episode reward: [(0, '137.010')] [2024-06-15 13:34:38,233][1652491] Updated weights for policy 0, policy_version 165557 (0.0013) [2024-06-15 13:34:40,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 44782.8, 300 sec: 46208.5). Total num frames: 339083264. Throughput: 0: 11753.2. Samples: 84841984. 
Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:34:40,956][1648985] Avg episode reward: [(0, '126.050')] [2024-06-15 13:34:42,049][1652491] Updated weights for policy 0, policy_version 165601 (0.0012) [2024-06-15 13:34:42,921][1652491] Updated weights for policy 0, policy_version 165648 (0.0013) [2024-06-15 13:34:44,657][1652491] Updated weights for policy 0, policy_version 165713 (0.0013) [2024-06-15 13:34:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 49152.1, 300 sec: 46652.8). Total num frames: 339476480. Throughput: 0: 11741.9. Samples: 84907520. Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:34:45,956][1648985] Avg episode reward: [(0, '117.290')] [2024-06-15 13:34:48,559][1652491] Updated weights for policy 0, policy_version 165777 (0.0015) [2024-06-15 13:34:49,597][1652491] Updated weights for policy 0, policy_version 165819 (0.0012) [2024-06-15 13:34:50,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 339607552. Throughput: 0: 11946.6. Samples: 84984832. Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:34:50,956][1648985] Avg episode reward: [(0, '131.030')] [2024-06-15 13:34:53,399][1652491] Updated weights for policy 0, policy_version 165886 (0.0029) [2024-06-15 13:34:54,658][1652491] Updated weights for policy 0, policy_version 165938 (0.0016) [2024-06-15 13:34:55,850][1652491] Updated weights for policy 0, policy_version 166000 (0.0013) [2024-06-15 13:34:55,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 49698.0, 300 sec: 46541.7). Total num frames: 339968000. Throughput: 0: 11878.3. Samples: 85022720. Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:34:55,956][1648985] Avg episode reward: [(0, '142.670')] [2024-06-15 13:34:56,259][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000166016_340000768.pth... 
[2024-06-15 13:34:56,305][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000160544_328794112.pth [2024-06-15 13:34:58,907][1651469] Signal inference workers to stop experience collection... (8700 times) [2024-06-15 13:34:58,946][1652491] InferenceWorker_p0-w0: stopping experience collection (8700 times) [2024-06-15 13:34:59,175][1651469] Signal inference workers to resume experience collection... (8700 times) [2024-06-15 13:34:59,176][1652491] InferenceWorker_p0-w0: resuming experience collection (8700 times) [2024-06-15 13:34:59,191][1652491] Updated weights for policy 0, policy_version 166048 (0.0025) [2024-06-15 13:35:00,955][1648985] Fps is (10 sec: 52430.5, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 340131840. Throughput: 0: 12015.0. Samples: 85094912. Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:35:00,955][1648985] Avg episode reward: [(0, '143.970')] [2024-06-15 13:35:03,757][1652491] Updated weights for policy 0, policy_version 166128 (0.0017) [2024-06-15 13:35:04,395][1652491] Updated weights for policy 0, policy_version 166160 (0.0013) [2024-06-15 13:35:05,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 48606.0, 300 sec: 46763.8). Total num frames: 340426752. Throughput: 0: 12083.2. Samples: 85171200. Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:35:05,956][1648985] Avg episode reward: [(0, '134.940')] [2024-06-15 13:35:06,146][1652491] Updated weights for policy 0, policy_version 166243 (0.0014) [2024-06-15 13:35:09,707][1652491] Updated weights for policy 0, policy_version 166320 (0.0012) [2024-06-15 13:35:10,955][1648985] Fps is (10 sec: 52426.8, 60 sec: 49697.8, 300 sec: 46874.8). Total num frames: 340656128. Throughput: 0: 12208.3. Samples: 85207040. 
Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:35:10,956][1648985] Avg episode reward: [(0, '129.510')] [2024-06-15 13:35:15,218][1652491] Updated weights for policy 0, policy_version 166388 (0.0013) [2024-06-15 13:35:15,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46421.4, 300 sec: 46652.8). Total num frames: 340819968. Throughput: 0: 12140.1. Samples: 85284864. Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:35:15,955][1648985] Avg episode reward: [(0, '123.770')] [2024-06-15 13:35:17,228][1652491] Updated weights for policy 0, policy_version 166483 (0.0023) [2024-06-15 13:35:17,966][1652491] Updated weights for policy 0, policy_version 166525 (0.0013) [2024-06-15 13:35:20,834][1652491] Updated weights for policy 0, policy_version 166576 (0.0011) [2024-06-15 13:35:20,955][1648985] Fps is (10 sec: 49154.0, 60 sec: 49152.1, 300 sec: 46986.0). Total num frames: 341147648. Throughput: 0: 11969.4. Samples: 85344256. Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:35:20,955][1648985] Avg episode reward: [(0, '127.740')] [2024-06-15 13:35:25,955][1648985] Fps is (10 sec: 39319.9, 60 sec: 46421.1, 300 sec: 46763.8). Total num frames: 341213184. Throughput: 0: 12162.8. Samples: 85389312. Policy #0 lag: (min: 17.0, avg: 162.6, max: 303.0) [2024-06-15 13:35:25,956][1648985] Avg episode reward: [(0, '119.570')] [2024-06-15 13:35:26,412][1652491] Updated weights for policy 0, policy_version 166640 (0.0019) [2024-06-15 13:35:28,336][1652491] Updated weights for policy 0, policy_version 166724 (0.0012) [2024-06-15 13:35:29,533][1652491] Updated weights for policy 0, policy_version 166783 (0.0015) [2024-06-15 13:35:30,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 49152.0, 300 sec: 46652.8). Total num frames: 341573632. Throughput: 0: 12037.7. Samples: 85449216. 
Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:35:30,955][1648985] Avg episode reward: [(0, '134.580')] [2024-06-15 13:35:32,524][1652491] Updated weights for policy 0, policy_version 166841 (0.0130) [2024-06-15 13:35:35,955][1648985] Fps is (10 sec: 49153.5, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 341704704. Throughput: 0: 12106.0. Samples: 85529600. Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:35:35,956][1648985] Avg episode reward: [(0, '134.430')] [2024-06-15 13:35:37,391][1652491] Updated weights for policy 0, policy_version 166882 (0.0013) [2024-06-15 13:35:38,957][1651469] Signal inference workers to stop experience collection... (8750 times) [2024-06-15 13:35:39,015][1652491] InferenceWorker_p0-w0: stopping experience collection (8750 times) [2024-06-15 13:35:39,268][1651469] Signal inference workers to resume experience collection... (8750 times) [2024-06-15 13:35:39,271][1652491] InferenceWorker_p0-w0: resuming experience collection (8750 times) [2024-06-15 13:35:39,274][1652491] Updated weights for policy 0, policy_version 166960 (0.0015) [2024-06-15 13:35:40,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 49152.1, 300 sec: 46541.7). Total num frames: 342032384. Throughput: 0: 11832.9. Samples: 85555200. Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:35:40,956][1648985] Avg episode reward: [(0, '130.200')] [2024-06-15 13:35:41,511][1652491] Updated weights for policy 0, policy_version 167040 (0.0037) [2024-06-15 13:35:44,585][1652491] Updated weights for policy 0, policy_version 167090 (0.0012) [2024-06-15 13:35:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 342228992. Throughput: 0: 11491.5. Samples: 85612032. 
Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:35:45,956][1648985] Avg episode reward: [(0, '116.440')] [2024-06-15 13:35:49,021][1652491] Updated weights for policy 0, policy_version 167105 (0.0011) [2024-06-15 13:35:50,542][1652491] Updated weights for policy 0, policy_version 167169 (0.0051) [2024-06-15 13:35:50,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 46421.6, 300 sec: 46319.6). Total num frames: 342392832. Throughput: 0: 11548.5. Samples: 85690880. Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:35:50,955][1648985] Avg episode reward: [(0, '120.930')] [2024-06-15 13:35:52,586][1652491] Updated weights for policy 0, policy_version 167248 (0.0013) [2024-06-15 13:35:53,783][1652491] Updated weights for policy 0, policy_version 167296 (0.0014) [2024-06-15 13:35:55,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45329.3, 300 sec: 46543.4). Total num frames: 342687744. Throughput: 0: 11252.7. Samples: 85713408. Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:35:55,955][1648985] Avg episode reward: [(0, '142.450')] [2024-06-15 13:35:56,368][1652491] Updated weights for policy 0, policy_version 167355 (0.0013) [2024-06-15 13:36:00,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 342786048. Throughput: 0: 11309.5. Samples: 85793792. Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:36:00,956][1648985] Avg episode reward: [(0, '140.330')] [2024-06-15 13:36:02,017][1652491] Updated weights for policy 0, policy_version 167424 (0.0014) [2024-06-15 13:36:03,704][1652491] Updated weights for policy 0, policy_version 167488 (0.0084) [2024-06-15 13:36:04,902][1652491] Updated weights for policy 0, policy_version 167536 (0.0027) [2024-06-15 13:36:05,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 45329.1, 300 sec: 46652.7). Total num frames: 343146496. Throughput: 0: 11286.7. Samples: 85852160. 
Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:36:05,956][1648985] Avg episode reward: [(0, '123.450')] [2024-06-15 13:36:06,716][1652491] Updated weights for policy 0, policy_version 167573 (0.0018) [2024-06-15 13:36:07,637][1652491] Updated weights for policy 0, policy_version 167615 (0.0012) [2024-06-15 13:36:10,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 43690.9, 300 sec: 46541.7). Total num frames: 343277568. Throughput: 0: 11195.8. Samples: 85893120. Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:36:10,956][1648985] Avg episode reward: [(0, '138.850')] [2024-06-15 13:36:12,549][1652491] Updated weights for policy 0, policy_version 167667 (0.0015) [2024-06-15 13:36:14,266][1652491] Updated weights for policy 0, policy_version 167728 (0.0025) [2024-06-15 13:36:15,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 343605248. Throughput: 0: 11252.6. Samples: 85955584. Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:36:15,956][1648985] Avg episode reward: [(0, '137.070')] [2024-06-15 13:36:17,473][1652491] Updated weights for policy 0, policy_version 167809 (0.0016) [2024-06-15 13:36:18,872][1652491] Updated weights for policy 0, policy_version 167868 (0.0013) [2024-06-15 13:36:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 44236.8, 300 sec: 46652.8). Total num frames: 343801856. Throughput: 0: 11070.6. Samples: 86027776. Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:36:20,956][1648985] Avg episode reward: [(0, '133.350')] [2024-06-15 13:36:23,402][1651469] Signal inference workers to stop experience collection... (8800 times) [2024-06-15 13:36:23,500][1652491] InferenceWorker_p0-w0: stopping experience collection (8800 times) [2024-06-15 13:36:23,589][1651469] Signal inference workers to resume experience collection... 
(8800 times) [2024-06-15 13:36:23,591][1652491] InferenceWorker_p0-w0: resuming experience collection (8800 times) [2024-06-15 13:36:24,087][1652491] Updated weights for policy 0, policy_version 167908 (0.0122) [2024-06-15 13:36:25,683][1652491] Updated weights for policy 0, policy_version 167970 (0.0012) [2024-06-15 13:36:25,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46967.8, 300 sec: 46430.6). Total num frames: 344031232. Throughput: 0: 11355.0. Samples: 86066176. Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:36:25,955][1648985] Avg episode reward: [(0, '129.610')] [2024-06-15 13:36:27,723][1652491] Updated weights for policy 0, policy_version 168053 (0.0027) [2024-06-15 13:36:29,831][1652491] Updated weights for policy 0, policy_version 168112 (0.0012) [2024-06-15 13:36:30,956][1648985] Fps is (10 sec: 52426.2, 60 sec: 45874.8, 300 sec: 46652.7). Total num frames: 344326144. Throughput: 0: 11434.5. Samples: 86126592. Policy #0 lag: (min: 85.0, avg: 126.0, max: 277.0) [2024-06-15 13:36:30,957][1648985] Avg episode reward: [(0, '136.270')] [2024-06-15 13:36:34,883][1652491] Updated weights for policy 0, policy_version 168161 (0.0011) [2024-06-15 13:36:35,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 344457216. Throughput: 0: 11400.5. Samples: 86203904. 
Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:36:35,956][1648985] Avg episode reward: [(0, '131.840')] [2024-06-15 13:36:36,382][1652491] Updated weights for policy 0, policy_version 168209 (0.0014) [2024-06-15 13:36:37,339][1652491] Updated weights for policy 0, policy_version 168253 (0.0013) [2024-06-15 13:36:38,836][1652491] Updated weights for policy 0, policy_version 168304 (0.0013) [2024-06-15 13:36:40,071][1652491] Updated weights for policy 0, policy_version 168341 (0.0083) [2024-06-15 13:36:40,873][1652491] Updated weights for policy 0, policy_version 168384 (0.0014) [2024-06-15 13:36:40,955][1648985] Fps is (10 sec: 52431.1, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 344850432. Throughput: 0: 11582.5. Samples: 86234624. Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:36:40,956][1648985] Avg episode reward: [(0, '135.990')] [2024-06-15 13:36:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45875.0, 300 sec: 46763.8). Total num frames: 344981504. Throughput: 0: 11628.0. Samples: 86317056. Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:36:45,956][1648985] Avg episode reward: [(0, '141.490')] [2024-06-15 13:36:46,886][1652491] Updated weights for policy 0, policy_version 168469 (0.0047) [2024-06-15 13:36:47,672][1652491] Updated weights for policy 0, policy_version 168509 (0.0013) [2024-06-15 13:36:49,683][1652491] Updated weights for policy 0, policy_version 168565 (0.0013) [2024-06-15 13:36:50,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48605.8, 300 sec: 46875.0). Total num frames: 345309184. Throughput: 0: 11798.7. Samples: 86383104. Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:36:50,956][1648985] Avg episode reward: [(0, '129.190')] [2024-06-15 13:36:51,524][1652491] Updated weights for policy 0, policy_version 168631 (0.0012) [2024-06-15 13:36:55,957][1648985] Fps is (10 sec: 45868.6, 60 sec: 45873.8, 300 sec: 46652.5). Total num frames: 345440256. 
Throughput: 0: 11787.0. Samples: 86423552. Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:36:55,957][1648985] Avg episode reward: [(0, '109.090')] [2024-06-15 13:36:55,995][1652491] Updated weights for policy 0, policy_version 168688 (0.0011) [2024-06-15 13:36:56,236][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000168704_345505792.pth... [2024-06-15 13:36:56,280][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000163264_334364672.pth [2024-06-15 13:36:57,911][1652491] Updated weights for policy 0, policy_version 168738 (0.0012) [2024-06-15 13:36:58,574][1652491] Updated weights for policy 0, policy_version 168768 (0.0012) [2024-06-15 13:37:00,615][1652491] Updated weights for policy 0, policy_version 168828 (0.0099) [2024-06-15 13:37:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 49698.1, 300 sec: 46654.7). Total num frames: 345767936. Throughput: 0: 12003.5. Samples: 86495744. Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:37:00,956][1648985] Avg episode reward: [(0, '111.240')] [2024-06-15 13:37:01,799][1651469] Signal inference workers to stop experience collection... (8850 times) [2024-06-15 13:37:01,894][1652491] InferenceWorker_p0-w0: stopping experience collection (8850 times) [2024-06-15 13:37:02,013][1651469] Signal inference workers to resume experience collection... (8850 times) [2024-06-15 13:37:02,015][1652491] InferenceWorker_p0-w0: resuming experience collection (8850 times) [2024-06-15 13:37:02,018][1652491] Updated weights for policy 0, policy_version 168880 (0.0095) [2024-06-15 13:37:05,907][1652491] Updated weights for policy 0, policy_version 168912 (0.0031) [2024-06-15 13:37:05,955][1648985] Fps is (10 sec: 49160.2, 60 sec: 46421.3, 300 sec: 46652.7). Total num frames: 345931776. Throughput: 0: 12071.8. Samples: 86571008. 
Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:37:05,956][1648985] Avg episode reward: [(0, '142.330')] [2024-06-15 13:37:09,075][1652491] Updated weights for policy 0, policy_version 168964 (0.0012) [2024-06-15 13:37:10,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 346161152. Throughput: 0: 12049.1. Samples: 86608384. Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:37:10,956][1648985] Avg episode reward: [(0, '151.060')] [2024-06-15 13:37:11,055][1652491] Updated weights for policy 0, policy_version 169031 (0.0022) [2024-06-15 13:37:12,972][1652491] Updated weights for policy 0, policy_version 169106 (0.0012) [2024-06-15 13:37:13,860][1652491] Updated weights for policy 0, policy_version 169152 (0.0012) [2024-06-15 13:37:15,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 346423296. Throughput: 0: 12060.6. Samples: 86669312. Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:37:15,956][1648985] Avg episode reward: [(0, '141.130')] [2024-06-15 13:37:20,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.3, 300 sec: 46098.9). Total num frames: 346587136. Throughput: 0: 11969.5. Samples: 86742528. Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:37:20,956][1648985] Avg episode reward: [(0, '118.560')] [2024-06-15 13:37:20,987][1652491] Updated weights for policy 0, policy_version 169241 (0.0014) [2024-06-15 13:37:21,716][1652491] Updated weights for policy 0, policy_version 169273 (0.0011) [2024-06-15 13:37:23,568][1652491] Updated weights for policy 0, policy_version 169344 (0.0013) [2024-06-15 13:37:25,239][1652491] Updated weights for policy 0, policy_version 169408 (0.0013) [2024-06-15 13:37:25,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48605.8, 300 sec: 46654.2). Total num frames: 346947584. Throughput: 0: 11992.2. Samples: 86774272. 
Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:37:25,956][1648985] Avg episode reward: [(0, '115.130')] [2024-06-15 13:37:28,792][1652491] Updated weights for policy 0, policy_version 169469 (0.0012) [2024-06-15 13:37:30,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 45875.5, 300 sec: 46319.5). Total num frames: 347078656. Throughput: 0: 11923.9. Samples: 86853632. Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:37:30,956][1648985] Avg episode reward: [(0, '117.120')] [2024-06-15 13:37:32,133][1652491] Updated weights for policy 0, policy_version 169520 (0.0029) [2024-06-15 13:37:33,754][1652491] Updated weights for policy 0, policy_version 169595 (0.0012) [2024-06-15 13:37:35,430][1652491] Updated weights for policy 0, policy_version 169648 (0.0013) [2024-06-15 13:37:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 50244.3, 300 sec: 46652.7). Total num frames: 347471872. Throughput: 0: 11832.9. Samples: 86915584. Policy #0 lag: (min: 8.0, avg: 76.2, max: 264.0) [2024-06-15 13:37:35,956][1648985] Avg episode reward: [(0, '129.330')] [2024-06-15 13:37:39,270][1652491] Updated weights for policy 0, policy_version 169696 (0.0013) [2024-06-15 13:37:40,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 347602944. Throughput: 0: 11878.8. Samples: 86958080. Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:37:40,956][1648985] Avg episode reward: [(0, '129.780')] [2024-06-15 13:37:43,626][1652491] Updated weights for policy 0, policy_version 169776 (0.0106) [2024-06-15 13:37:44,928][1652491] Updated weights for policy 0, policy_version 169827 (0.0036) [2024-06-15 13:37:45,274][1651469] Signal inference workers to stop experience collection... (8900 times) [2024-06-15 13:37:45,314][1652491] InferenceWorker_p0-w0: stopping experience collection (8900 times) [2024-06-15 13:37:45,550][1651469] Signal inference workers to resume experience collection... 
(8900 times) [2024-06-15 13:37:45,551][1652491] InferenceWorker_p0-w0: resuming experience collection (8900 times) [2024-06-15 13:37:45,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 48059.9, 300 sec: 46319.5). Total num frames: 347865088. Throughput: 0: 11798.8. Samples: 87026688. Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:37:45,955][1648985] Avg episode reward: [(0, '129.020')] [2024-06-15 13:37:46,814][1652491] Updated weights for policy 0, policy_version 169904 (0.0016) [2024-06-15 13:37:50,351][1652491] Updated weights for policy 0, policy_version 169924 (0.0012) [2024-06-15 13:37:50,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.2, 300 sec: 46653.7). Total num frames: 348061696. Throughput: 0: 11764.6. Samples: 87100416. Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:37:50,956][1648985] Avg episode reward: [(0, '141.490')] [2024-06-15 13:37:51,513][1652491] Updated weights for policy 0, policy_version 169978 (0.0020) [2024-06-15 13:37:54,420][1652491] Updated weights for policy 0, policy_version 170016 (0.0012) [2024-06-15 13:37:55,478][1652491] Updated weights for policy 0, policy_version 170070 (0.0096) [2024-06-15 13:37:55,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48607.2, 300 sec: 46652.7). Total num frames: 348356608. Throughput: 0: 11867.0. Samples: 87142400. Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:37:55,955][1648985] Avg episode reward: [(0, '137.960')] [2024-06-15 13:37:57,189][1652491] Updated weights for policy 0, policy_version 170144 (0.0131) [2024-06-15 13:38:00,955][1648985] Fps is (10 sec: 45873.8, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 348520448. Throughput: 0: 11901.1. Samples: 87204864. 
Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:38:00,957][1648985] Avg episode reward: [(0, '129.840')] [2024-06-15 13:38:01,382][1652491] Updated weights for policy 0, policy_version 170193 (0.0015) [2024-06-15 13:38:02,334][1652491] Updated weights for policy 0, policy_version 170240 (0.0040) [2024-06-15 13:38:05,824][1652491] Updated weights for policy 0, policy_version 170289 (0.0018) [2024-06-15 13:38:05,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 348749824. Throughput: 0: 11980.8. Samples: 87281664. Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:38:05,956][1648985] Avg episode reward: [(0, '133.030')] [2024-06-15 13:38:07,659][1652491] Updated weights for policy 0, policy_version 170384 (0.0011) [2024-06-15 13:38:08,673][1652491] Updated weights for policy 0, policy_version 170429 (0.0014) [2024-06-15 13:38:10,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 349044736. Throughput: 0: 11832.9. Samples: 87306752. Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:38:10,955][1648985] Avg episode reward: [(0, '155.590')] [2024-06-15 13:38:13,314][1652491] Updated weights for policy 0, policy_version 170486 (0.0015) [2024-06-15 13:38:15,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 349208576. Throughput: 0: 11798.8. Samples: 87384576. 
Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:38:15,956][1648985] Avg episode reward: [(0, '159.400')] [2024-06-15 13:38:16,226][1652491] Updated weights for policy 0, policy_version 170529 (0.0012) [2024-06-15 13:38:17,278][1652491] Updated weights for policy 0, policy_version 170577 (0.0015) [2024-06-15 13:38:18,791][1652491] Updated weights for policy 0, policy_version 170656 (0.0132) [2024-06-15 13:38:19,677][1652491] Updated weights for policy 0, policy_version 170688 (0.0037) [2024-06-15 13:38:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 49698.1, 300 sec: 46874.9). Total num frames: 349569024. Throughput: 0: 11992.2. Samples: 87455232. Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:38:20,956][1648985] Avg episode reward: [(0, '153.730')] [2024-06-15 13:38:23,740][1652491] Updated weights for policy 0, policy_version 170745 (0.0015) [2024-06-15 13:38:25,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 349700096. Throughput: 0: 11923.9. Samples: 87494656. Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:38:25,956][1648985] Avg episode reward: [(0, '143.510')] [2024-06-15 13:38:27,362][1651469] Signal inference workers to stop experience collection... (8950 times) [2024-06-15 13:38:27,417][1652491] InferenceWorker_p0-w0: stopping experience collection (8950 times) [2024-06-15 13:38:27,418][1652491] Updated weights for policy 0, policy_version 170791 (0.0011) [2024-06-15 13:38:27,547][1651469] Signal inference workers to resume experience collection... (8950 times) [2024-06-15 13:38:27,548][1652491] InferenceWorker_p0-w0: resuming experience collection (8950 times) [2024-06-15 13:38:28,780][1652491] Updated weights for policy 0, policy_version 170864 (0.0113) [2024-06-15 13:38:30,518][1652491] Updated weights for policy 0, policy_version 170939 (0.0013) [2024-06-15 13:38:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 50244.3, 300 sec: 47319.2). 
Total num frames: 350093312. Throughput: 0: 11912.5. Samples: 87562752. Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:38:30,956][1648985] Avg episode reward: [(0, '138.220')] [2024-06-15 13:38:34,681][1652491] Updated weights for policy 0, policy_version 171000 (0.0012) [2024-06-15 13:38:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 350224384. Throughput: 0: 11844.3. Samples: 87633408. Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:38:35,956][1648985] Avg episode reward: [(0, '130.720')] [2024-06-15 13:38:39,461][1652491] Updated weights for policy 0, policy_version 171059 (0.0021) [2024-06-15 13:38:40,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 47513.6, 300 sec: 47208.1). Total num frames: 350453760. Throughput: 0: 11696.3. Samples: 87668736. Policy #0 lag: (min: 31.0, avg: 138.1, max: 287.0) [2024-06-15 13:38:40,956][1648985] Avg episode reward: [(0, '139.640')] [2024-06-15 13:38:41,915][1652491] Updated weights for policy 0, policy_version 171168 (0.0011) [2024-06-15 13:38:45,945][1652491] Updated weights for policy 0, policy_version 171232 (0.0014) [2024-06-15 13:38:45,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 350683136. Throughput: 0: 11764.7. Samples: 87734272. Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:38:45,956][1648985] Avg episode reward: [(0, '139.850')] [2024-06-15 13:38:49,897][1652491] Updated weights for policy 0, policy_version 171284 (0.0012) [2024-06-15 13:38:50,912][1652491] Updated weights for policy 0, policy_version 171330 (0.0012) [2024-06-15 13:38:50,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 350879744. Throughput: 0: 11605.3. Samples: 87803904. 
Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:38:50,956][1648985] Avg episode reward: [(0, '136.680')] [2024-06-15 13:38:52,824][1652491] Updated weights for policy 0, policy_version 171393 (0.0013) [2024-06-15 13:38:53,868][1652491] Updated weights for policy 0, policy_version 171451 (0.0012) [2024-06-15 13:38:55,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 351141888. Throughput: 0: 11685.0. Samples: 87832576. Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:38:55,956][1648985] Avg episode reward: [(0, '118.190')] [2024-06-15 13:38:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000171456_351141888.pth... [2024-06-15 13:38:56,042][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000166016_340000768.pth [2024-06-15 13:38:57,835][1652491] Updated weights for policy 0, policy_version 171513 (0.0013) [2024-06-15 13:39:00,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.5, 300 sec: 46652.8). Total num frames: 351272960. Throughput: 0: 11594.0. Samples: 87906304. Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:39:00,955][1648985] Avg episode reward: [(0, '131.840')] [2024-06-15 13:39:01,786][1652491] Updated weights for policy 0, policy_version 171557 (0.0013) [2024-06-15 13:39:02,983][1652491] Updated weights for policy 0, policy_version 171620 (0.0013) [2024-06-15 13:39:03,984][1652491] Updated weights for policy 0, policy_version 171649 (0.0012) [2024-06-15 13:39:05,045][1652491] Updated weights for policy 0, policy_version 171703 (0.0013) [2024-06-15 13:39:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48605.8, 300 sec: 47430.3). Total num frames: 351666176. Throughput: 0: 11582.6. Samples: 87976448. 
Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:39:05,956][1648985] Avg episode reward: [(0, '128.040')] [2024-06-15 13:39:07,719][1651469] Signal inference workers to stop experience collection... (9000 times) [2024-06-15 13:39:07,779][1652491] InferenceWorker_p0-w0: stopping experience collection (9000 times) [2024-06-15 13:39:08,014][1651469] Signal inference workers to resume experience collection... (9000 times) [2024-06-15 13:39:08,015][1652491] InferenceWorker_p0-w0: resuming experience collection (9000 times) [2024-06-15 13:39:08,017][1652491] Updated weights for policy 0, policy_version 171744 (0.0013) [2024-06-15 13:39:08,928][1652491] Updated weights for policy 0, policy_version 171776 (0.0017) [2024-06-15 13:39:10,966][1648985] Fps is (10 sec: 52369.7, 60 sec: 45866.6, 300 sec: 46651.0). Total num frames: 351797248. Throughput: 0: 11500.1. Samples: 88012288. Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:39:10,967][1648985] Avg episode reward: [(0, '147.530')] [2024-06-15 13:39:13,769][1652491] Updated weights for policy 0, policy_version 171856 (0.0012) [2024-06-15 13:39:15,202][1652491] Updated weights for policy 0, policy_version 171905 (0.0014) [2024-06-15 13:39:15,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48605.9, 300 sec: 47208.2). Total num frames: 352124928. Throughput: 0: 11468.8. Samples: 88078848. Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:39:15,956][1648985] Avg episode reward: [(0, '137.100')] [2024-06-15 13:39:16,434][1652491] Updated weights for policy 0, policy_version 171961 (0.0013) [2024-06-15 13:39:20,037][1652491] Updated weights for policy 0, policy_version 172021 (0.0014) [2024-06-15 13:39:20,955][1648985] Fps is (10 sec: 52487.8, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 352321536. Throughput: 0: 11537.1. Samples: 88152576. 
Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:39:20,956][1648985] Avg episode reward: [(0, '151.230')] [2024-06-15 13:39:23,275][1652491] Updated weights for policy 0, policy_version 172064 (0.0010) [2024-06-15 13:39:24,613][1652491] Updated weights for policy 0, policy_version 172112 (0.0014) [2024-06-15 13:39:25,391][1652491] Updated weights for policy 0, policy_version 172153 (0.0020) [2024-06-15 13:39:25,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 352583680. Throughput: 0: 11605.3. Samples: 88190976. Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:39:25,956][1648985] Avg episode reward: [(0, '131.480')] [2024-06-15 13:39:27,465][1652491] Updated weights for policy 0, policy_version 172224 (0.0013) [2024-06-15 13:39:30,799][1652491] Updated weights for policy 0, policy_version 172277 (0.0013) [2024-06-15 13:39:30,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45329.1, 300 sec: 46986.0). Total num frames: 352813056. Throughput: 0: 11764.6. Samples: 88263680. Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:39:30,955][1648985] Avg episode reward: [(0, '126.750')] [2024-06-15 13:39:34,426][1652491] Updated weights for policy 0, policy_version 172339 (0.0011) [2024-06-15 13:39:35,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46421.3, 300 sec: 47208.2). Total num frames: 353009664. Throughput: 0: 11798.8. Samples: 88334848. Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:39:35,956][1648985] Avg episode reward: [(0, '121.940')] [2024-06-15 13:39:36,675][1652491] Updated weights for policy 0, policy_version 172408 (0.0080) [2024-06-15 13:39:38,646][1652491] Updated weights for policy 0, policy_version 172464 (0.0012) [2024-06-15 13:39:40,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 46421.5, 300 sec: 46652.8). Total num frames: 353239040. Throughput: 0: 11719.2. Samples: 88359936. 
Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:39:40,955][1648985] Avg episode reward: [(0, '124.900')] [2024-06-15 13:39:41,960][1652491] Updated weights for policy 0, policy_version 172528 (0.0014) [2024-06-15 13:39:45,676][1652491] Updated weights for policy 0, policy_version 172564 (0.0012) [2024-06-15 13:39:45,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45875.2, 300 sec: 46875.0). Total num frames: 353435648. Throughput: 0: 11867.0. Samples: 88440320. Policy #0 lag: (min: 3.0, avg: 127.4, max: 259.0) [2024-06-15 13:39:45,955][1648985] Avg episode reward: [(0, '126.710')] [2024-06-15 13:39:47,424][1652491] Updated weights for policy 0, policy_version 172610 (0.0013) [2024-06-15 13:39:49,344][1652491] Updated weights for policy 0, policy_version 172688 (0.0012) [2024-06-15 13:39:50,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 353763328. Throughput: 0: 11525.7. Samples: 88495104. Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:39:50,956][1648985] Avg episode reward: [(0, '134.310')] [2024-06-15 13:39:53,029][1652491] Updated weights for policy 0, policy_version 172768 (0.0035) [2024-06-15 13:39:53,148][1651469] Signal inference workers to stop experience collection... (9050 times) [2024-06-15 13:39:53,219][1652491] InferenceWorker_p0-w0: stopping experience collection (9050 times) [2024-06-15 13:39:53,387][1651469] Signal inference workers to resume experience collection... (9050 times) [2024-06-15 13:39:53,388][1652491] InferenceWorker_p0-w0: resuming experience collection (9050 times) [2024-06-15 13:39:55,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 353894400. Throughput: 0: 11642.4. Samples: 88536064. 
Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:39:55,956][1648985] Avg episode reward: [(0, '138.170')] [2024-06-15 13:39:56,541][1652491] Updated weights for policy 0, policy_version 172805 (0.0012) [2024-06-15 13:39:59,751][1652491] Updated weights for policy 0, policy_version 172912 (0.0125) [2024-06-15 13:40:00,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48605.8, 300 sec: 46652.7). Total num frames: 354189312. Throughput: 0: 11685.0. Samples: 88604672. Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:40:00,956][1648985] Avg episode reward: [(0, '134.520')] [2024-06-15 13:40:04,127][1652491] Updated weights for policy 0, policy_version 172994 (0.0136) [2024-06-15 13:40:05,122][1652491] Updated weights for policy 0, policy_version 173056 (0.0010) [2024-06-15 13:40:05,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 45875.0, 300 sec: 46652.8). Total num frames: 354418688. Throughput: 0: 11616.6. Samples: 88675328. Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:40:05,956][1648985] Avg episode reward: [(0, '137.770')] [2024-06-15 13:40:10,132][1652491] Updated weights for policy 0, policy_version 173121 (0.0019) [2024-06-15 13:40:10,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 46976.1, 300 sec: 46763.8). Total num frames: 354615296. Throughput: 0: 11639.4. Samples: 88714752. Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:40:10,956][1648985] Avg episode reward: [(0, '150.550')] [2024-06-15 13:40:12,697][1652491] Updated weights for policy 0, policy_version 173217 (0.0098) [2024-06-15 13:40:15,955][1648985] Fps is (10 sec: 39322.8, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 354811904. Throughput: 0: 11252.6. Samples: 88770048. 
Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:40:15,956][1648985] Avg episode reward: [(0, '154.870')] [2024-06-15 13:40:16,486][1652491] Updated weights for policy 0, policy_version 173265 (0.0013) [2024-06-15 13:40:20,264][1652491] Updated weights for policy 0, policy_version 173314 (0.0016) [2024-06-15 13:40:20,955][1648985] Fps is (10 sec: 39322.8, 60 sec: 44783.0, 300 sec: 46763.9). Total num frames: 355008512. Throughput: 0: 11298.1. Samples: 88843264. Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:40:20,956][1648985] Avg episode reward: [(0, '135.020')] [2024-06-15 13:40:22,150][1652491] Updated weights for policy 0, policy_version 173392 (0.0015) [2024-06-15 13:40:24,068][1652491] Updated weights for policy 0, policy_version 173457 (0.0013) [2024-06-15 13:40:25,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 355336192. Throughput: 0: 11411.9. Samples: 88873472. Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:40:25,956][1648985] Avg episode reward: [(0, '138.960')] [2024-06-15 13:40:27,820][1652491] Updated weights for policy 0, policy_version 173521 (0.0035) [2024-06-15 13:40:30,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 44236.8, 300 sec: 46652.8). Total num frames: 355467264. Throughput: 0: 11252.6. Samples: 88946688. Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:40:30,956][1648985] Avg episode reward: [(0, '147.740')] [2024-06-15 13:40:32,057][1652491] Updated weights for policy 0, policy_version 173571 (0.0024) [2024-06-15 13:40:33,446][1652491] Updated weights for policy 0, policy_version 173635 (0.0017) [2024-06-15 13:40:35,700][1652491] Updated weights for policy 0, policy_version 173728 (0.0015) [2024-06-15 13:40:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46421.3, 300 sec: 46652.8). Total num frames: 355794944. Throughput: 0: 11355.0. Samples: 89006080. 
Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:40:35,956][1648985] Avg episode reward: [(0, '143.530')] [2024-06-15 13:40:38,687][1651469] Signal inference workers to stop experience collection... (9100 times) [2024-06-15 13:40:38,744][1652491] InferenceWorker_p0-w0: stopping experience collection (9100 times) [2024-06-15 13:40:38,890][1651469] Signal inference workers to resume experience collection... (9100 times) [2024-06-15 13:40:38,891][1652491] InferenceWorker_p0-w0: resuming experience collection (9100 times) [2024-06-15 13:40:38,893][1652491] Updated weights for policy 0, policy_version 173776 (0.0015) [2024-06-15 13:40:40,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 355991552. Throughput: 0: 11434.7. Samples: 89050624. Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:40:40,956][1648985] Avg episode reward: [(0, '147.730')] [2024-06-15 13:40:43,708][1652491] Updated weights for policy 0, policy_version 173840 (0.0018) [2024-06-15 13:40:45,471][1652491] Updated weights for policy 0, policy_version 173906 (0.0099) [2024-06-15 13:40:45,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 356188160. Throughput: 0: 11491.6. Samples: 89121792. Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:40:45,956][1648985] Avg episode reward: [(0, '138.290')] [2024-06-15 13:40:46,736][1652491] Updated weights for policy 0, policy_version 173970 (0.0012) [2024-06-15 13:40:47,768][1652491] Updated weights for policy 0, policy_version 174011 (0.0013) [2024-06-15 13:40:50,408][1652491] Updated weights for policy 0, policy_version 174055 (0.0013) [2024-06-15 13:40:50,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 356515840. Throughput: 0: 11503.0. Samples: 89192960. 
Policy #0 lag: (min: 67.0, avg: 198.5, max: 307.0) [2024-06-15 13:40:50,955][1648985] Avg episode reward: [(0, '145.810')] [2024-06-15 13:40:55,459][1652491] Updated weights for policy 0, policy_version 174128 (0.0125) [2024-06-15 13:40:55,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 356646912. Throughput: 0: 11537.1. Samples: 89233920. Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:40:55,956][1648985] Avg episode reward: [(0, '142.790')] [2024-06-15 13:40:56,387][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000174160_356679680.pth... [2024-06-15 13:40:56,526][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000168704_345505792.pth [2024-06-15 13:40:57,309][1652491] Updated weights for policy 0, policy_version 174208 (0.0098) [2024-06-15 13:40:58,774][1652491] Updated weights for policy 0, policy_version 174270 (0.0013) [2024-06-15 13:41:00,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45329.1, 300 sec: 46652.7). Total num frames: 356909056. Throughput: 0: 11548.4. Samples: 89289728. Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:41:00,956][1648985] Avg episode reward: [(0, '139.840')] [2024-06-15 13:41:02,169][1652491] Updated weights for policy 0, policy_version 174330 (0.0014) [2024-06-15 13:41:05,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 43690.8, 300 sec: 46652.7). Total num frames: 357040128. Throughput: 0: 11662.2. Samples: 89368064. 
Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:41:05,956][1648985] Avg episode reward: [(0, '124.630')] [2024-06-15 13:41:06,620][1652491] Updated weights for policy 0, policy_version 174369 (0.0014) [2024-06-15 13:41:08,430][1652491] Updated weights for policy 0, policy_version 174448 (0.0013) [2024-06-15 13:41:10,279][1652491] Updated weights for policy 0, policy_version 174516 (0.0013) [2024-06-15 13:41:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.7, 300 sec: 46874.9). Total num frames: 357433344. Throughput: 0: 11525.7. Samples: 89392128. Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:41:10,956][1648985] Avg episode reward: [(0, '126.930')] [2024-06-15 13:41:13,365][1652491] Updated weights for policy 0, policy_version 174560 (0.0014) [2024-06-15 13:41:14,030][1652491] Updated weights for policy 0, policy_version 174591 (0.0013) [2024-06-15 13:41:15,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 357564416. Throughput: 0: 11548.4. Samples: 89466368. Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:41:15,956][1648985] Avg episode reward: [(0, '136.980')] [2024-06-15 13:41:19,114][1652491] Updated weights for policy 0, policy_version 174658 (0.0108) [2024-06-15 13:41:19,454][1651469] Signal inference workers to stop experience collection... (9150 times) [2024-06-15 13:41:19,514][1652491] InferenceWorker_p0-w0: stopping experience collection (9150 times) [2024-06-15 13:41:19,787][1651469] Signal inference workers to resume experience collection... (9150 times) [2024-06-15 13:41:19,789][1652491] InferenceWorker_p0-w0: resuming experience collection (9150 times) [2024-06-15 13:41:20,380][1652491] Updated weights for policy 0, policy_version 174716 (0.0013) [2024-06-15 13:41:20,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 47513.5, 300 sec: 46874.9). Total num frames: 357859328. Throughput: 0: 11593.9. Samples: 89527808. 
Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:41:20,956][1648985] Avg episode reward: [(0, '119.190')] [2024-06-15 13:41:21,856][1652491] Updated weights for policy 0, policy_version 174778 (0.0012) [2024-06-15 13:41:25,608][1652491] Updated weights for policy 0, policy_version 174843 (0.0014) [2024-06-15 13:41:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 358088704. Throughput: 0: 11514.3. Samples: 89568768. Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:41:25,956][1648985] Avg episode reward: [(0, '128.340')] [2024-06-15 13:41:29,889][1652491] Updated weights for policy 0, policy_version 174896 (0.0089) [2024-06-15 13:41:30,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 358285312. Throughput: 0: 11468.8. Samples: 89637888. Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:41:30,956][1648985] Avg episode reward: [(0, '127.190')] [2024-06-15 13:41:31,412][1652491] Updated weights for policy 0, policy_version 174960 (0.0011) [2024-06-15 13:41:33,373][1652491] Updated weights for policy 0, policy_version 175034 (0.0018) [2024-06-15 13:41:35,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 358481920. Throughput: 0: 11275.4. Samples: 89700352. Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:41:35,956][1648985] Avg episode reward: [(0, '116.680')] [2024-06-15 13:41:36,880][1652491] Updated weights for policy 0, policy_version 175088 (0.0109) [2024-06-15 13:41:40,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 358678528. Throughput: 0: 11298.1. Samples: 89742336. 
Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:41:40,956][1648985] Avg episode reward: [(0, '126.980')] [2024-06-15 13:41:41,448][1652491] Updated weights for policy 0, policy_version 175152 (0.0142) [2024-06-15 13:41:42,441][1652491] Updated weights for policy 0, policy_version 175187 (0.0014) [2024-06-15 13:41:44,322][1652491] Updated weights for policy 0, policy_version 175264 (0.0016) [2024-06-15 13:41:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 359006208. Throughput: 0: 11320.9. Samples: 89799168. Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:41:45,956][1648985] Avg episode reward: [(0, '150.870')] [2024-06-15 13:41:47,564][1652491] Updated weights for policy 0, policy_version 175314 (0.0014) [2024-06-15 13:41:48,439][1652491] Updated weights for policy 0, policy_version 175356 (0.0018) [2024-06-15 13:41:50,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 43690.5, 300 sec: 46430.8). Total num frames: 359137280. Throughput: 0: 11320.9. Samples: 89877504. Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:41:50,956][1648985] Avg episode reward: [(0, '171.970')] [2024-06-15 13:41:50,957][1651469] Saving new best policy, reward=171.970! [2024-06-15 13:41:53,810][1652491] Updated weights for policy 0, policy_version 175426 (0.0012) [2024-06-15 13:41:55,149][1652491] Updated weights for policy 0, policy_version 175482 (0.0011) [2024-06-15 13:41:55,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 359432192. Throughput: 0: 11389.2. Samples: 89904640. Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:41:55,956][1648985] Avg episode reward: [(0, '170.970')] [2024-06-15 13:41:56,652][1652491] Updated weights for policy 0, policy_version 175536 (0.0119) [2024-06-15 13:41:59,906][1651469] Signal inference workers to stop experience collection... 
(9200 times) [2024-06-15 13:41:59,949][1652491] InferenceWorker_p0-w0: stopping experience collection (9200 times) [2024-06-15 13:41:59,950][1652491] Updated weights for policy 0, policy_version 175588 (0.0013) [2024-06-15 13:42:00,164][1651469] Signal inference workers to resume experience collection... (9200 times) [2024-06-15 13:42:00,165][1652491] InferenceWorker_p0-w0: resuming experience collection (9200 times) [2024-06-15 13:42:00,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 359661568. Throughput: 0: 11138.8. Samples: 89967616. Policy #0 lag: (min: 58.0, avg: 127.0, max: 314.0) [2024-06-15 13:42:00,956][1648985] Avg episode reward: [(0, '152.340')] [2024-06-15 13:42:04,764][1652491] Updated weights for policy 0, policy_version 175637 (0.0023) [2024-06-15 13:42:05,955][1648985] Fps is (10 sec: 36044.3, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 359792640. Throughput: 0: 11400.5. Samples: 90040832. Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:42:05,956][1648985] Avg episode reward: [(0, '133.840')] [2024-06-15 13:42:06,506][1652491] Updated weights for policy 0, policy_version 175712 (0.0013) [2024-06-15 13:42:08,566][1652491] Updated weights for policy 0, policy_version 175792 (0.0106) [2024-06-15 13:42:10,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 360054784. Throughput: 0: 11002.3. Samples: 90063872. Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:42:10,956][1648985] Avg episode reward: [(0, '147.460')] [2024-06-15 13:42:11,798][1652491] Updated weights for policy 0, policy_version 175847 (0.0014) [2024-06-15 13:42:15,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 43690.7, 300 sec: 46097.4). Total num frames: 360185856. Throughput: 0: 11218.5. Samples: 90142720. 
Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:42:15,955][1648985] Avg episode reward: [(0, '135.870')] [2024-06-15 13:42:16,374][1652491] Updated weights for policy 0, policy_version 175899 (0.0011) [2024-06-15 13:42:17,954][1652491] Updated weights for policy 0, policy_version 175969 (0.0015) [2024-06-15 13:42:19,925][1652491] Updated weights for policy 0, policy_version 176053 (0.0013) [2024-06-15 13:42:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 360579072. Throughput: 0: 11286.8. Samples: 90208256. Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:42:20,956][1648985] Avg episode reward: [(0, '141.790')] [2024-06-15 13:42:22,237][1652491] Updated weights for policy 0, policy_version 176112 (0.0014) [2024-06-15 13:42:25,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 360710144. Throughput: 0: 11116.0. Samples: 90242560. Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:42:25,956][1648985] Avg episode reward: [(0, '136.200')] [2024-06-15 13:42:28,019][1652491] Updated weights for policy 0, policy_version 176176 (0.0113) [2024-06-15 13:42:29,586][1652491] Updated weights for policy 0, policy_version 176240 (0.0014) [2024-06-15 13:42:30,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 361037824. Throughput: 0: 11423.3. Samples: 90313216. Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:42:30,956][1648985] Avg episode reward: [(0, '140.040')] [2024-06-15 13:42:32,924][1652491] Updated weights for policy 0, policy_version 176324 (0.0014) [2024-06-15 13:42:34,241][1652491] Updated weights for policy 0, policy_version 176384 (0.0014) [2024-06-15 13:42:35,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 361234432. Throughput: 0: 11070.6. Samples: 90375680. 
Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:42:35,956][1648985] Avg episode reward: [(0, '127.770')] [2024-06-15 13:42:40,444][1652491] Updated weights for policy 0, policy_version 176433 (0.0012) [2024-06-15 13:42:40,955][1648985] Fps is (10 sec: 32767.8, 60 sec: 44782.9, 300 sec: 45764.1). Total num frames: 361365504. Throughput: 0: 11446.0. Samples: 90419712. Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:42:40,956][1648985] Avg episode reward: [(0, '151.790')] [2024-06-15 13:42:41,826][1651469] Signal inference workers to stop experience collection... (9250 times) [2024-06-15 13:42:41,882][1652491] InferenceWorker_p0-w0: stopping experience collection (9250 times) [2024-06-15 13:42:41,885][1652491] Updated weights for policy 0, policy_version 176500 (0.0015) [2024-06-15 13:42:42,086][1651469] Signal inference workers to resume experience collection... (9250 times) [2024-06-15 13:42:42,087][1652491] InferenceWorker_p0-w0: resuming experience collection (9250 times) [2024-06-15 13:42:43,522][1652491] Updated weights for policy 0, policy_version 176570 (0.0115) [2024-06-15 13:42:45,289][1652491] Updated weights for policy 0, policy_version 176624 (0.0014) [2024-06-15 13:42:45,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 361758720. Throughput: 0: 11298.2. Samples: 90476032. Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:42:45,955][1648985] Avg episode reward: [(0, '155.140')] [2024-06-15 13:42:50,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44236.9, 300 sec: 45542.0). Total num frames: 361791488. Throughput: 0: 11446.1. Samples: 90555904. 
Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:42:50,956][1648985] Avg episode reward: [(0, '167.280')] [2024-06-15 13:42:51,154][1652491] Updated weights for policy 0, policy_version 176672 (0.0013) [2024-06-15 13:42:53,000][1652491] Updated weights for policy 0, policy_version 176761 (0.0013) [2024-06-15 13:42:54,485][1652491] Updated weights for policy 0, policy_version 176826 (0.0014) [2024-06-15 13:42:55,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45329.0, 300 sec: 46208.5). Total num frames: 362151936. Throughput: 0: 11525.7. Samples: 90582528. Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:42:55,956][1648985] Avg episode reward: [(0, '162.590')] [2024-06-15 13:42:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000176832_362151936.pth... [2024-06-15 13:42:56,147][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000171456_351141888.pth [2024-06-15 13:42:57,542][1652491] Updated weights for policy 0, policy_version 176896 (0.0015) [2024-06-15 13:43:00,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 43690.7, 300 sec: 45875.2). Total num frames: 362283008. Throughput: 0: 11298.1. Samples: 90651136. Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:43:00,956][1648985] Avg episode reward: [(0, '168.870')] [2024-06-15 13:43:03,984][1652491] Updated weights for policy 0, policy_version 176982 (0.0120) [2024-06-15 13:43:05,544][1652491] Updated weights for policy 0, policy_version 177042 (0.0014) [2024-06-15 13:43:05,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 362610688. Throughput: 0: 11275.4. Samples: 90715648. 
Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:43:05,956][1648985] Avg episode reward: [(0, '159.040')] [2024-06-15 13:43:06,481][1652491] Updated weights for policy 0, policy_version 177088 (0.0013) [2024-06-15 13:43:08,849][1652491] Updated weights for policy 0, policy_version 177136 (0.0131) [2024-06-15 13:43:10,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 362807296. Throughput: 0: 11298.2. Samples: 90750976. Policy #0 lag: (min: 10.0, avg: 85.8, max: 266.0) [2024-06-15 13:43:10,956][1648985] Avg episode reward: [(0, '162.330')] [2024-06-15 13:43:14,442][1652491] Updated weights for policy 0, policy_version 177184 (0.0013) [2024-06-15 13:43:15,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46967.5, 300 sec: 45542.0). Total num frames: 363003904. Throughput: 0: 11457.4. Samples: 90828800. Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:43:15,956][1648985] Avg episode reward: [(0, '142.090')] [2024-06-15 13:43:16,113][1652491] Updated weights for policy 0, policy_version 177253 (0.0013) [2024-06-15 13:43:18,023][1652491] Updated weights for policy 0, policy_version 177328 (0.0012) [2024-06-15 13:43:20,131][1652491] Updated weights for policy 0, policy_version 177370 (0.0013) [2024-06-15 13:43:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 363331584. Throughput: 0: 11423.3. Samples: 90889728. Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:43:20,956][1648985] Avg episode reward: [(0, '133.490')] [2024-06-15 13:43:25,204][1651469] Signal inference workers to stop experience collection... (9300 times) [2024-06-15 13:43:25,282][1652491] InferenceWorker_p0-w0: stopping experience collection (9300 times) [2024-06-15 13:43:25,380][1651469] Signal inference workers to resume experience collection... 
(9300 times) [2024-06-15 13:43:25,380][1652491] InferenceWorker_p0-w0: resuming experience collection (9300 times) [2024-06-15 13:43:25,382][1652491] Updated weights for policy 0, policy_version 177440 (0.0112) [2024-06-15 13:43:25,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45329.2, 300 sec: 45208.7). Total num frames: 363429888. Throughput: 0: 11332.3. Samples: 90929664. Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:43:25,956][1648985] Avg episode reward: [(0, '140.240')] [2024-06-15 13:43:27,104][1652491] Updated weights for policy 0, policy_version 177508 (0.0012) [2024-06-15 13:43:28,825][1652491] Updated weights for policy 0, policy_version 177589 (0.0117) [2024-06-15 13:43:30,966][1648985] Fps is (10 sec: 39276.8, 60 sec: 44774.4, 300 sec: 45762.4). Total num frames: 363724800. Throughput: 0: 11511.4. Samples: 90994176. Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:43:30,967][1648985] Avg episode reward: [(0, '138.600')] [2024-06-15 13:43:31,643][1652491] Updated weights for policy 0, policy_version 177632 (0.0014) [2024-06-15 13:43:32,293][1652491] Updated weights for policy 0, policy_version 177664 (0.0013) [2024-06-15 13:43:35,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 43690.8, 300 sec: 45430.9). Total num frames: 363855872. Throughput: 0: 11571.2. Samples: 91076608. Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:43:35,956][1648985] Avg episode reward: [(0, '133.760')] [2024-06-15 13:43:37,659][1652491] Updated weights for policy 0, policy_version 177730 (0.0124) [2024-06-15 13:43:39,325][1652491] Updated weights for policy 0, policy_version 177808 (0.0014) [2024-06-15 13:43:40,553][1652491] Updated weights for policy 0, policy_version 177854 (0.0012) [2024-06-15 13:43:40,955][1648985] Fps is (10 sec: 52488.2, 60 sec: 48059.7, 300 sec: 45986.3). Total num frames: 364249088. Throughput: 0: 11548.5. Samples: 91102208. 
Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:43:40,956][1648985] Avg episode reward: [(0, '134.410')] [2024-06-15 13:43:43,656][1652491] Updated weights for policy 0, policy_version 177911 (0.0013) [2024-06-15 13:43:45,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 43690.5, 300 sec: 45764.1). Total num frames: 364380160. Throughput: 0: 11559.8. Samples: 91171328. Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:43:45,956][1648985] Avg episode reward: [(0, '139.510')] [2024-06-15 13:43:48,367][1652491] Updated weights for policy 0, policy_version 177968 (0.0031) [2024-06-15 13:43:50,909][1652491] Updated weights for policy 0, policy_version 178065 (0.0012) [2024-06-15 13:43:50,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48059.8, 300 sec: 45875.2). Total num frames: 364675072. Throughput: 0: 11480.2. Samples: 91232256. Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:43:50,955][1648985] Avg episode reward: [(0, '149.950')] [2024-06-15 13:43:51,725][1652491] Updated weights for policy 0, policy_version 178107 (0.0012) [2024-06-15 13:43:54,901][1652491] Updated weights for policy 0, policy_version 178173 (0.0015) [2024-06-15 13:43:55,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 364904448. Throughput: 0: 11537.1. Samples: 91270144. Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:43:55,956][1648985] Avg episode reward: [(0, '167.690')] [2024-06-15 13:43:59,552][1652491] Updated weights for policy 0, policy_version 178210 (0.0017) [2024-06-15 13:44:00,956][1648985] Fps is (10 sec: 39316.8, 60 sec: 46420.4, 300 sec: 45430.7). Total num frames: 365068288. Throughput: 0: 11627.8. Samples: 91352064. 
Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:44:00,957][1648985] Avg episode reward: [(0, '158.950')] [2024-06-15 13:44:01,867][1652491] Updated weights for policy 0, policy_version 178304 (0.0014) [2024-06-15 13:44:02,410][1651469] Signal inference workers to stop experience collection... (9350 times) [2024-06-15 13:44:02,514][1652491] InferenceWorker_p0-w0: stopping experience collection (9350 times) [2024-06-15 13:44:02,662][1651469] Signal inference workers to resume experience collection... (9350 times) [2024-06-15 13:44:02,662][1652491] InferenceWorker_p0-w0: resuming experience collection (9350 times) [2024-06-15 13:44:05,500][1652491] Updated weights for policy 0, policy_version 178384 (0.0120) [2024-06-15 13:44:05,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 45988.0). Total num frames: 365363200. Throughput: 0: 11423.3. Samples: 91403776. Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:44:05,956][1648985] Avg episode reward: [(0, '150.370')] [2024-06-15 13:44:10,291][1652491] Updated weights for policy 0, policy_version 178437 (0.0014) [2024-06-15 13:44:10,955][1648985] Fps is (10 sec: 42603.5, 60 sec: 44782.9, 300 sec: 45319.8). Total num frames: 365494272. Throughput: 0: 11514.3. Samples: 91447808. Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:44:10,955][1648985] Avg episode reward: [(0, '129.210')] [2024-06-15 13:44:11,543][1652491] Updated weights for policy 0, policy_version 178492 (0.0012) [2024-06-15 13:44:13,096][1652491] Updated weights for policy 0, policy_version 178548 (0.0013) [2024-06-15 13:44:14,466][1652491] Updated weights for policy 0, policy_version 178617 (0.0017) [2024-06-15 13:44:15,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 365821952. Throughput: 0: 11494.5. Samples: 91511296. 
Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:44:15,956][1648985] Avg episode reward: [(0, '136.220')] [2024-06-15 13:44:17,700][1652491] Updated weights for policy 0, policy_version 178678 (0.0015) [2024-06-15 13:44:20,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 43690.5, 300 sec: 45319.8). Total num frames: 365953024. Throughput: 0: 11502.9. Samples: 91594240. Policy #0 lag: (min: 8.0, avg: 58.4, max: 232.0) [2024-06-15 13:44:20,956][1648985] Avg episode reward: [(0, '127.820')] [2024-06-15 13:44:21,498][1652491] Updated weights for policy 0, policy_version 178705 (0.0018) [2024-06-15 13:44:22,321][1652491] Updated weights for policy 0, policy_version 178750 (0.0111) [2024-06-15 13:44:24,005][1652491] Updated weights for policy 0, policy_version 178820 (0.0014) [2024-06-15 13:44:25,326][1652491] Updated weights for policy 0, policy_version 178876 (0.0012) [2024-06-15 13:44:25,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48605.9, 300 sec: 45875.2). Total num frames: 366346240. Throughput: 0: 11559.8. Samples: 91622400. Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:44:25,955][1648985] Avg episode reward: [(0, '135.810')] [2024-06-15 13:44:28,821][1652491] Updated weights for policy 0, policy_version 178936 (0.0013) [2024-06-15 13:44:30,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 45883.9, 300 sec: 45653.0). Total num frames: 366477312. Throughput: 0: 11412.0. Samples: 91684864. Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:44:30,956][1648985] Avg episode reward: [(0, '150.240')] [2024-06-15 13:44:33,698][1652491] Updated weights for policy 0, policy_version 178981 (0.0013) [2024-06-15 13:44:34,794][1652491] Updated weights for policy 0, policy_version 179028 (0.0012) [2024-06-15 13:44:35,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 48059.7, 300 sec: 45764.1). Total num frames: 366739456. Throughput: 0: 11730.5. Samples: 91760128. 
Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:44:35,956][1648985] Avg episode reward: [(0, '163.970')] [2024-06-15 13:44:36,056][1652491] Updated weights for policy 0, policy_version 179079 (0.0012) [2024-06-15 13:44:37,216][1652491] Updated weights for policy 0, policy_version 179136 (0.0098) [2024-06-15 13:44:39,954][1652491] Updated weights for policy 0, policy_version 179200 (0.0022) [2024-06-15 13:44:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 367001600. Throughput: 0: 11616.7. Samples: 91792896. Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:44:40,956][1648985] Avg episode reward: [(0, '145.010')] [2024-06-15 13:44:45,327][1651469] Signal inference workers to stop experience collection... (9400 times) [2024-06-15 13:44:45,388][1652491] InferenceWorker_p0-w0: stopping experience collection (9400 times) [2024-06-15 13:44:45,699][1651469] Signal inference workers to resume experience collection... (9400 times) [2024-06-15 13:44:45,700][1652491] InferenceWorker_p0-w0: resuming experience collection (9400 times) [2024-06-15 13:44:45,813][1652491] Updated weights for policy 0, policy_version 179264 (0.0124) [2024-06-15 13:44:45,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45875.4, 300 sec: 45319.8). Total num frames: 367132672. Throughput: 0: 11560.1. Samples: 91872256. Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:44:45,956][1648985] Avg episode reward: [(0, '149.200')] [2024-06-15 13:44:47,821][1652491] Updated weights for policy 0, policy_version 179351 (0.0013) [2024-06-15 13:44:49,756][1652491] Updated weights for policy 0, policy_version 179410 (0.0015) [2024-06-15 13:44:50,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 47513.5, 300 sec: 46208.4). Total num frames: 367525888. Throughput: 0: 11582.6. Samples: 91924992. 
Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:44:50,956][1648985] Avg episode reward: [(0, '153.410')] [2024-06-15 13:44:55,955][1648985] Fps is (10 sec: 39320.3, 60 sec: 43690.5, 300 sec: 45208.7). Total num frames: 367525888. Throughput: 0: 11525.6. Samples: 91966464. Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:44:55,956][1648985] Avg episode reward: [(0, '154.530')] [2024-06-15 13:44:56,642][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000179488_367591424.pth... [2024-06-15 13:44:56,643][1652491] Updated weights for policy 0, policy_version 179488 (0.0122) [2024-06-15 13:44:56,806][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000174160_356679680.pth [2024-06-15 13:44:58,443][1652491] Updated weights for policy 0, policy_version 179554 (0.0030) [2024-06-15 13:45:00,221][1652491] Updated weights for policy 0, policy_version 179648 (0.0013) [2024-06-15 13:45:00,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 47514.5, 300 sec: 45764.2). Total num frames: 367919104. Throughput: 0: 11446.0. Samples: 92026368. Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:45:00,956][1648985] Avg episode reward: [(0, '155.000')] [2024-06-15 13:45:02,024][1652491] Updated weights for policy 0, policy_version 179696 (0.0012) [2024-06-15 13:45:05,955][1648985] Fps is (10 sec: 52430.4, 60 sec: 44783.0, 300 sec: 45542.0). Total num frames: 368050176. Throughput: 0: 11343.7. Samples: 92104704. Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:45:05,956][1648985] Avg episode reward: [(0, '176.920')] [2024-06-15 13:45:05,957][1651469] Saving new best policy, reward=176.920! [2024-06-15 13:45:09,280][1652491] Updated weights for policy 0, policy_version 179761 (0.0012) [2024-06-15 13:45:10,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 46421.3, 300 sec: 45653.0). Total num frames: 368279552. Throughput: 0: 11434.7. Samples: 92136960. 
Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:45:10,956][1648985] Avg episode reward: [(0, '171.840')] [2024-06-15 13:45:11,346][1652491] Updated weights for policy 0, policy_version 179843 (0.0014) [2024-06-15 13:45:12,984][1652491] Updated weights for policy 0, policy_version 179920 (0.0013) [2024-06-15 13:45:13,886][1652491] Updated weights for policy 0, policy_version 179968 (0.0015) [2024-06-15 13:45:15,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 368574464. Throughput: 0: 11309.5. Samples: 92193792. Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:45:15,956][1648985] Avg episode reward: [(0, '150.870')] [2024-06-15 13:45:20,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45329.3, 300 sec: 45208.7). Total num frames: 368672768. Throughput: 0: 11423.3. Samples: 92274176. Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:45:20,955][1648985] Avg episode reward: [(0, '138.150')] [2024-06-15 13:45:21,130][1652491] Updated weights for policy 0, policy_version 180020 (0.0016) [2024-06-15 13:45:23,095][1652491] Updated weights for policy 0, policy_version 180101 (0.0275) [2024-06-15 13:45:23,869][1651469] Signal inference workers to stop experience collection... (9450 times) [2024-06-15 13:45:23,974][1652491] InferenceWorker_p0-w0: stopping experience collection (9450 times) [2024-06-15 13:45:24,090][1651469] Signal inference workers to resume experience collection... (9450 times) [2024-06-15 13:45:24,091][1652491] InferenceWorker_p0-w0: resuming experience collection (9450 times) [2024-06-15 13:45:24,594][1652491] Updated weights for policy 0, policy_version 180165 (0.0014) [2024-06-15 13:45:25,668][1652491] Updated weights for policy 0, policy_version 180222 (0.0015) [2024-06-15 13:45:25,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 369098752. Throughput: 0: 11150.2. Samples: 92294656. 
Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:45:25,956][1648985] Avg episode reward: [(0, '144.860')] [2024-06-15 13:45:30,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 43690.6, 300 sec: 45097.6). Total num frames: 369098752. Throughput: 0: 11184.3. Samples: 92375552. Policy #0 lag: (min: 15.0, avg: 90.1, max: 271.0) [2024-06-15 13:45:30,956][1648985] Avg episode reward: [(0, '142.170')] [2024-06-15 13:45:31,905][1652491] Updated weights for policy 0, policy_version 180262 (0.0085) [2024-06-15 13:45:34,598][1652491] Updated weights for policy 0, policy_version 180369 (0.0013) [2024-06-15 13:45:35,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 369491968. Throughput: 0: 11161.6. Samples: 92427264. Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:45:35,955][1648985] Avg episode reward: [(0, '131.350')] [2024-06-15 13:45:36,648][1652491] Updated weights for policy 0, policy_version 180438 (0.0017) [2024-06-15 13:45:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 45541.9). Total num frames: 369623040. Throughput: 0: 11059.3. Samples: 92464128. Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:45:40,956][1648985] Avg episode reward: [(0, '139.310')] [2024-06-15 13:45:43,379][1652491] Updated weights for policy 0, policy_version 180482 (0.0013) [2024-06-15 13:45:45,786][1652491] Updated weights for policy 0, policy_version 180578 (0.0107) [2024-06-15 13:45:45,955][1648985] Fps is (10 sec: 32767.4, 60 sec: 44782.8, 300 sec: 45097.6). Total num frames: 369819648. Throughput: 0: 11309.5. Samples: 92535296. 
Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:45:45,956][1648985] Avg episode reward: [(0, '161.600')] [2024-06-15 13:45:47,657][1652491] Updated weights for policy 0, policy_version 180665 (0.0015) [2024-06-15 13:45:49,333][1652491] Updated weights for policy 0, policy_version 180727 (0.0015) [2024-06-15 13:45:50,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 370147328. Throughput: 0: 10945.4. Samples: 92597248. Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:45:50,956][1648985] Avg episode reward: [(0, '150.850')] [2024-06-15 13:45:55,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 44783.1, 300 sec: 45097.7). Total num frames: 370212864. Throughput: 0: 11116.1. Samples: 92637184. Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:45:55,955][1648985] Avg episode reward: [(0, '134.690')] [2024-06-15 13:45:56,020][1652491] Updated weights for policy 0, policy_version 180771 (0.0095) [2024-06-15 13:45:57,678][1652491] Updated weights for policy 0, policy_version 180835 (0.0012) [2024-06-15 13:45:59,102][1652491] Updated weights for policy 0, policy_version 180897 (0.0011) [2024-06-15 13:46:00,944][1652491] Updated weights for policy 0, policy_version 180982 (0.0040) [2024-06-15 13:46:00,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 370638848. Throughput: 0: 11184.4. Samples: 92697088. Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:46:00,955][1648985] Avg episode reward: [(0, '137.670')] [2024-06-15 13:46:05,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 43690.7, 300 sec: 44875.5). Total num frames: 370671616. Throughput: 0: 11013.7. Samples: 92769792. 
Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:46:05,956][1648985] Avg episode reward: [(0, '155.140')] [2024-06-15 13:46:07,424][1652491] Updated weights for policy 0, policy_version 181011 (0.0011) [2024-06-15 13:46:07,755][1651469] Signal inference workers to stop experience collection... (9500 times) [2024-06-15 13:46:07,793][1652491] InferenceWorker_p0-w0: stopping experience collection (9500 times) [2024-06-15 13:46:07,975][1651469] Signal inference workers to resume experience collection... (9500 times) [2024-06-15 13:46:07,976][1652491] InferenceWorker_p0-w0: resuming experience collection (9500 times) [2024-06-15 13:46:09,249][1652491] Updated weights for policy 0, policy_version 181104 (0.0135) [2024-06-15 13:46:10,727][1652491] Updated weights for policy 0, policy_version 181155 (0.0011) [2024-06-15 13:46:10,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45875.1, 300 sec: 45653.0). Total num frames: 371032064. Throughput: 0: 11355.0. Samples: 92805632. Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:46:10,956][1648985] Avg episode reward: [(0, '144.090')] [2024-06-15 13:46:12,189][1652491] Updated weights for policy 0, policy_version 181216 (0.0010) [2024-06-15 13:46:15,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 45208.7). Total num frames: 371195904. Throughput: 0: 10945.4. Samples: 92868096. Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:46:15,956][1648985] Avg episode reward: [(0, '161.390')] [2024-06-15 13:46:18,564][1652491] Updated weights for policy 0, policy_version 181250 (0.0012) [2024-06-15 13:46:19,505][1652491] Updated weights for policy 0, policy_version 181296 (0.0013) [2024-06-15 13:46:20,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.1, 300 sec: 45208.7). Total num frames: 371425280. Throughput: 0: 11423.2. Samples: 92941312. 
Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:46:20,956][1648985] Avg episode reward: [(0, '147.280')] [2024-06-15 13:46:21,118][1652491] Updated weights for policy 0, policy_version 181364 (0.0014) [2024-06-15 13:46:22,438][1652491] Updated weights for policy 0, policy_version 181410 (0.0014) [2024-06-15 13:46:24,183][1652491] Updated weights for policy 0, policy_version 181476 (0.0012) [2024-06-15 13:46:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 45542.0). Total num frames: 371720192. Throughput: 0: 11093.4. Samples: 92963328. Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:46:25,955][1648985] Avg episode reward: [(0, '155.680')] [2024-06-15 13:46:30,265][1652491] Updated weights for policy 0, policy_version 181520 (0.0013) [2024-06-15 13:46:30,955][1648985] Fps is (10 sec: 39322.7, 60 sec: 45329.2, 300 sec: 45208.8). Total num frames: 371818496. Throughput: 0: 11264.1. Samples: 93042176. Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:46:30,955][1648985] Avg episode reward: [(0, '151.920')] [2024-06-15 13:46:31,617][1652491] Updated weights for policy 0, policy_version 181586 (0.0013) [2024-06-15 13:46:32,524][1652491] Updated weights for policy 0, policy_version 181629 (0.0013) [2024-06-15 13:46:34,115][1652491] Updated weights for policy 0, policy_version 181683 (0.0015) [2024-06-15 13:46:35,641][1652491] Updated weights for policy 0, policy_version 181759 (0.0020) [2024-06-15 13:46:35,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 372244480. Throughput: 0: 11195.7. Samples: 93101056. Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 13:46:35,956][1648985] Avg episode reward: [(0, '133.590')] [2024-06-15 13:46:40,955][1648985] Fps is (10 sec: 45873.8, 60 sec: 44236.7, 300 sec: 44986.6). Total num frames: 372277248. Throughput: 0: 11355.0. Samples: 93148160. 
Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:46:40,957][1648985] Avg episode reward: [(0, '115.570')] [2024-06-15 13:46:41,637][1652491] Updated weights for policy 0, policy_version 181822 (0.0119) [2024-06-15 13:46:43,508][1652491] Updated weights for policy 0, policy_version 181878 (0.0013) [2024-06-15 13:46:44,989][1652491] Updated weights for policy 0, policy_version 181920 (0.0028) [2024-06-15 13:46:45,140][1651469] Signal inference workers to stop experience collection... (9550 times) [2024-06-15 13:46:45,190][1652491] InferenceWorker_p0-w0: stopping experience collection (9550 times) [2024-06-15 13:46:45,400][1651469] Signal inference workers to resume experience collection... (9550 times) [2024-06-15 13:46:45,405][1652491] InferenceWorker_p0-w0: resuming experience collection (9550 times) [2024-06-15 13:46:45,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.6, 300 sec: 45764.1). Total num frames: 372637696. Throughput: 0: 11525.7. Samples: 93215744. Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:46:45,956][1648985] Avg episode reward: [(0, '113.240')] [2024-06-15 13:46:47,121][1652491] Updated weights for policy 0, policy_version 182006 (0.0245) [2024-06-15 13:46:50,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 43690.7, 300 sec: 45208.7). Total num frames: 372768768. Throughput: 0: 11343.6. Samples: 93280256. Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:46:50,956][1648985] Avg episode reward: [(0, '136.650')] [2024-06-15 13:46:53,405][1652491] Updated weights for policy 0, policy_version 182065 (0.0147) [2024-06-15 13:46:55,553][1652491] Updated weights for policy 0, policy_version 182128 (0.0071) [2024-06-15 13:46:55,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 46967.3, 300 sec: 45319.8). Total num frames: 373030912. Throughput: 0: 11320.9. Samples: 93315072. 
Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:46:55,956][1648985] Avg episode reward: [(0, '139.320')] [2024-06-15 13:46:55,965][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000182144_373030912.pth... [2024-06-15 13:46:56,122][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000176832_362151936.pth [2024-06-15 13:46:56,883][1652491] Updated weights for policy 0, policy_version 182176 (0.0095) [2024-06-15 13:46:58,861][1652491] Updated weights for policy 0, policy_version 182261 (0.0086) [2024-06-15 13:47:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44236.8, 300 sec: 45764.1). Total num frames: 373293056. Throughput: 0: 11161.6. Samples: 93370368. Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:47:00,956][1648985] Avg episode reward: [(0, '154.900')] [2024-06-15 13:47:05,057][1652491] Updated weights for policy 0, policy_version 182289 (0.0072) [2024-06-15 13:47:05,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 45875.2, 300 sec: 45319.8). Total num frames: 373424128. Throughput: 0: 11411.9. Samples: 93454848. Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:47:05,956][1648985] Avg episode reward: [(0, '163.520')] [2024-06-15 13:47:06,830][1652491] Updated weights for policy 0, policy_version 182370 (0.0012) [2024-06-15 13:47:08,783][1652491] Updated weights for policy 0, policy_version 182437 (0.0021) [2024-06-15 13:47:10,830][1652491] Updated weights for policy 0, policy_version 182524 (0.0017) [2024-06-15 13:47:10,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 373817344. Throughput: 0: 11468.7. Samples: 93479424. Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:47:10,956][1648985] Avg episode reward: [(0, '141.840')] [2024-06-15 13:47:15,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 44875.5). Total num frames: 373817344. Throughput: 0: 11343.6. Samples: 93552640. 
Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:47:15,955][1648985] Avg episode reward: [(0, '124.570')] [2024-06-15 13:47:17,315][1652491] Updated weights for policy 0, policy_version 182576 (0.0025) [2024-06-15 13:47:20,308][1652491] Updated weights for policy 0, policy_version 182673 (0.0017) [2024-06-15 13:47:20,955][1648985] Fps is (10 sec: 32768.5, 60 sec: 45329.2, 300 sec: 45542.0). Total num frames: 374145024. Throughput: 0: 11252.6. Samples: 93607424. Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:47:20,956][1648985] Avg episode reward: [(0, '113.330')] [2024-06-15 13:47:22,489][1652491] Updated weights for policy 0, policy_version 182768 (0.0013) [2024-06-15 13:47:25,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 43690.6, 300 sec: 45097.6). Total num frames: 374341632. Throughput: 0: 10911.3. Samples: 93639168. Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:47:25,956][1648985] Avg episode reward: [(0, '129.160')] [2024-06-15 13:47:29,496][1652491] Updated weights for policy 0, policy_version 182835 (0.0014) [2024-06-15 13:47:29,857][1651469] Signal inference workers to stop experience collection... (9600 times) [2024-06-15 13:47:29,894][1652491] InferenceWorker_p0-w0: stopping experience collection (9600 times) [2024-06-15 13:47:30,075][1651469] Signal inference workers to resume experience collection... (9600 times) [2024-06-15 13:47:30,077][1652491] InferenceWorker_p0-w0: resuming experience collection (9600 times) [2024-06-15 13:47:30,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45328.9, 300 sec: 45097.7). Total num frames: 374538240. Throughput: 0: 11093.3. Samples: 93714944. 
Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:47:30,956][1648985] Avg episode reward: [(0, '146.660')] [2024-06-15 13:47:31,137][1652491] Updated weights for policy 0, policy_version 182903 (0.0018) [2024-06-15 13:47:32,736][1652491] Updated weights for policy 0, policy_version 182961 (0.0011) [2024-06-15 13:47:33,995][1652491] Updated weights for policy 0, policy_version 183013 (0.0014) [2024-06-15 13:47:35,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 374865920. Throughput: 0: 11025.1. Samples: 93776384. Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:47:35,956][1648985] Avg episode reward: [(0, '163.930')] [2024-06-15 13:47:39,872][1652491] Updated weights for policy 0, policy_version 183056 (0.0013) [2024-06-15 13:47:40,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44783.1, 300 sec: 44764.4). Total num frames: 374964224. Throughput: 0: 11286.8. Samples: 93822976. Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:47:40,956][1648985] Avg episode reward: [(0, '139.900')] [2024-06-15 13:47:41,595][1652491] Updated weights for policy 0, policy_version 183128 (0.0018) [2024-06-15 13:47:42,846][1652491] Updated weights for policy 0, policy_version 183173 (0.0013) [2024-06-15 13:47:44,563][1652491] Updated weights for policy 0, policy_version 183249 (0.0014) [2024-06-15 13:47:45,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 45875.2, 300 sec: 46097.3). Total num frames: 375390208. Throughput: 0: 11332.2. Samples: 93880320. Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:47:45,956][1648985] Avg episode reward: [(0, '149.410')] [2024-06-15 13:47:50,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 43690.6, 300 sec: 44875.5). Total num frames: 375390208. Throughput: 0: 11264.0. Samples: 93961728. 
Policy #0 lag: (min: 2.0, avg: 63.9, max: 258.0) [2024-06-15 13:47:50,956][1648985] Avg episode reward: [(0, '143.430')] [2024-06-15 13:47:51,778][1652491] Updated weights for policy 0, policy_version 183313 (0.0011) [2024-06-15 13:47:52,773][1652491] Updated weights for policy 0, policy_version 183364 (0.0013) [2024-06-15 13:47:53,794][1652491] Updated weights for policy 0, policy_version 183411 (0.0025) [2024-06-15 13:47:55,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 45875.4, 300 sec: 45764.1). Total num frames: 375783424. Throughput: 0: 11389.2. Samples: 93991936. Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:47:55,955][1648985] Avg episode reward: [(0, '140.360')] [2024-06-15 13:47:56,195][1652491] Updated weights for policy 0, policy_version 183504 (0.0102) [2024-06-15 13:48:00,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 45097.7). Total num frames: 375914496. Throughput: 0: 11036.4. Samples: 94049280. Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:48:00,955][1648985] Avg episode reward: [(0, '143.360')] [2024-06-15 13:48:03,614][1652491] Updated weights for policy 0, policy_version 183600 (0.0094) [2024-06-15 13:48:04,559][1652491] Updated weights for policy 0, policy_version 183634 (0.0013) [2024-06-15 13:48:05,805][1652491] Updated weights for policy 0, policy_version 183686 (0.0109) [2024-06-15 13:48:05,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45875.2, 300 sec: 45319.8). Total num frames: 376176640. Throughput: 0: 11434.7. Samples: 94121984. Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:48:05,956][1648985] Avg episode reward: [(0, '151.190')] [2024-06-15 13:48:06,519][1651469] Signal inference workers to stop experience collection... (9650 times) [2024-06-15 13:48:06,555][1652491] InferenceWorker_p0-w0: stopping experience collection (9650 times) [2024-06-15 13:48:06,853][1651469] Signal inference workers to resume experience collection... 
(9650 times) [2024-06-15 13:48:06,854][1652491] InferenceWorker_p0-w0: resuming experience collection (9650 times) [2024-06-15 13:48:07,163][1652491] Updated weights for policy 0, policy_version 183742 (0.0013) [2024-06-15 13:48:08,847][1652491] Updated weights for policy 0, policy_version 183799 (0.0027) [2024-06-15 13:48:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 43690.8, 300 sec: 45542.0). Total num frames: 376438784. Throughput: 0: 11320.9. Samples: 94148608. Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:48:10,956][1648985] Avg episode reward: [(0, '139.540')] [2024-06-15 13:48:15,515][1652491] Updated weights for policy 0, policy_version 183859 (0.0014) [2024-06-15 13:48:15,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 44875.5). Total num frames: 376569856. Throughput: 0: 11502.9. Samples: 94232576. Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:48:15,956][1648985] Avg episode reward: [(0, '127.050')] [2024-06-15 13:48:17,290][1652491] Updated weights for policy 0, policy_version 183936 (0.0119) [2024-06-15 13:48:18,784][1652491] Updated weights for policy 0, policy_version 183991 (0.0013) [2024-06-15 13:48:20,390][1652491] Updated weights for policy 0, policy_version 184054 (0.0038) [2024-06-15 13:48:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 376963072. Throughput: 0: 11355.0. Samples: 94287360. Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:48:20,956][1648985] Avg episode reward: [(0, '157.020')] [2024-06-15 13:48:25,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 43690.7, 300 sec: 44877.2). Total num frames: 376963072. Throughput: 0: 11241.2. Samples: 94328832. 
Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:48:25,956][1648985] Avg episode reward: [(0, '165.120')] [2024-06-15 13:48:26,809][1652491] Updated weights for policy 0, policy_version 184112 (0.0105) [2024-06-15 13:48:28,596][1652491] Updated weights for policy 0, policy_version 184176 (0.0012) [2024-06-15 13:48:30,517][1652491] Updated weights for policy 0, policy_version 184254 (0.0013) [2024-06-15 13:48:30,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 377356288. Throughput: 0: 11286.8. Samples: 94388224. Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:48:30,955][1648985] Avg episode reward: [(0, '161.990')] [2024-06-15 13:48:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 44875.5). Total num frames: 377487360. Throughput: 0: 11138.8. Samples: 94462976. Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:48:35,956][1648985] Avg episode reward: [(0, '158.800')] [2024-06-15 13:48:37,436][1652491] Updated weights for policy 0, policy_version 184325 (0.0013) [2024-06-15 13:48:38,934][1652491] Updated weights for policy 0, policy_version 184385 (0.0012) [2024-06-15 13:48:40,572][1652491] Updated weights for policy 0, policy_version 184451 (0.0012) [2024-06-15 13:48:40,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46967.4, 300 sec: 45430.9). Total num frames: 377782272. Throughput: 0: 11298.1. Samples: 94500352. Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:48:40,956][1648985] Avg episode reward: [(0, '164.190')] [2024-06-15 13:48:42,119][1652491] Updated weights for policy 0, policy_version 184512 (0.0109) [2024-06-15 13:48:45,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 43690.6, 300 sec: 45208.7). Total num frames: 378011648. Throughput: 0: 11366.3. Samples: 94560768. 
Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:48:45,956][1648985] Avg episode reward: [(0, '158.730')] [2024-06-15 13:48:48,962][1652491] Updated weights for policy 0, policy_version 184582 (0.0011) [2024-06-15 13:48:49,726][1651469] Signal inference workers to stop experience collection... (9700 times) [2024-06-15 13:48:49,821][1652491] InferenceWorker_p0-w0: stopping experience collection (9700 times) [2024-06-15 13:48:50,014][1651469] Signal inference workers to resume experience collection... (9700 times) [2024-06-15 13:48:50,015][1652491] InferenceWorker_p0-w0: resuming experience collection (9700 times) [2024-06-15 13:48:50,427][1652491] Updated weights for policy 0, policy_version 184640 (0.0012) [2024-06-15 13:48:50,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46421.4, 300 sec: 44986.6). Total num frames: 378175488. Throughput: 0: 11423.3. Samples: 94636032. Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:48:50,956][1648985] Avg episode reward: [(0, '156.750')] [2024-06-15 13:48:52,124][1652491] Updated weights for policy 0, policy_version 184707 (0.0101) [2024-06-15 13:48:53,179][1652491] Updated weights for policy 0, policy_version 184757 (0.0015) [2024-06-15 13:48:54,685][1652491] Updated weights for policy 0, policy_version 184802 (0.0089) [2024-06-15 13:48:55,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.1, 300 sec: 45653.2). Total num frames: 378535936. Throughput: 0: 11502.9. Samples: 94666240. Policy #0 lag: (min: 12.0, avg: 60.5, max: 248.0) [2024-06-15 13:48:55,956][1648985] Avg episode reward: [(0, '141.690')] [2024-06-15 13:48:55,966][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000184832_378535936.pth... 
[2024-06-15 13:48:56,034][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000179488_367591424.pth [2024-06-15 13:49:00,521][1652491] Updated weights for policy 0, policy_version 184864 (0.0028) [2024-06-15 13:49:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45329.0, 300 sec: 44986.6). Total num frames: 378634240. Throughput: 0: 11446.0. Samples: 94747648. Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:49:00,956][1648985] Avg episode reward: [(0, '142.390')] [2024-06-15 13:49:02,071][1652491] Updated weights for policy 0, policy_version 184932 (0.0013) [2024-06-15 13:49:03,750][1652491] Updated weights for policy 0, policy_version 184993 (0.0013) [2024-06-15 13:49:05,126][1652491] Updated weights for policy 0, policy_version 185025 (0.0014) [2024-06-15 13:49:05,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 378994688. Throughput: 0: 11502.9. Samples: 94804992. Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:49:05,955][1648985] Avg episode reward: [(0, '154.100')] [2024-06-15 13:49:06,291][1652491] Updated weights for policy 0, policy_version 185084 (0.0012) [2024-06-15 13:49:10,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 43690.7, 300 sec: 44875.5). Total num frames: 379060224. Throughput: 0: 11434.7. Samples: 94843392. Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:49:10,955][1648985] Avg episode reward: [(0, '144.140')] [2024-06-15 13:49:12,478][1652491] Updated weights for policy 0, policy_version 185136 (0.0011) [2024-06-15 13:49:13,643][1652491] Updated weights for policy 0, policy_version 185186 (0.0014) [2024-06-15 13:49:15,628][1652491] Updated weights for policy 0, policy_version 185264 (0.0103) [2024-06-15 13:49:15,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 48059.7, 300 sec: 45764.2). Total num frames: 379453440. Throughput: 0: 11548.4. Samples: 94907904. 
Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:49:15,956][1648985] Avg episode reward: [(0, '148.900')] [2024-06-15 13:49:17,957][1652491] Updated weights for policy 0, policy_version 185316 (0.0022) [2024-06-15 13:49:20,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 44875.5). Total num frames: 379584512. Throughput: 0: 11389.2. Samples: 94975488. Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:49:20,955][1648985] Avg episode reward: [(0, '129.150')] [2024-06-15 13:49:23,572][1652491] Updated weights for policy 0, policy_version 185360 (0.0012) [2024-06-15 13:49:25,213][1652491] Updated weights for policy 0, policy_version 185428 (0.0188) [2024-06-15 13:49:25,955][1648985] Fps is (10 sec: 36044.3, 60 sec: 47513.5, 300 sec: 45208.7). Total num frames: 379813888. Throughput: 0: 11411.9. Samples: 95013888. Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:49:25,956][1648985] Avg episode reward: [(0, '119.940')] [2024-06-15 13:49:26,170][1652491] Updated weights for policy 0, policy_version 185469 (0.0013) [2024-06-15 13:49:27,131][1651469] Signal inference workers to stop experience collection... (9750 times) [2024-06-15 13:49:27,169][1652491] InferenceWorker_p0-w0: stopping experience collection (9750 times) [2024-06-15 13:49:27,182][1652491] Updated weights for policy 0, policy_version 185507 (0.0012) [2024-06-15 13:49:27,311][1651469] Signal inference workers to resume experience collection... (9750 times) [2024-06-15 13:49:27,311][1652491] InferenceWorker_p0-w0: resuming experience collection (9750 times) [2024-06-15 13:49:30,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.1, 300 sec: 45319.8). Total num frames: 380108800. Throughput: 0: 11309.5. Samples: 95069696. 
Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:49:30,956][1648985] Avg episode reward: [(0, '120.160')] [2024-06-15 13:49:34,954][1652491] Updated weights for policy 0, policy_version 185602 (0.0014) [2024-06-15 13:49:35,955][1648985] Fps is (10 sec: 36045.5, 60 sec: 44783.0, 300 sec: 44653.4). Total num frames: 380174336. Throughput: 0: 11320.9. Samples: 95145472. Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:49:35,956][1648985] Avg episode reward: [(0, '127.170')] [2024-06-15 13:49:36,302][1652491] Updated weights for policy 0, policy_version 185664 (0.0014) [2024-06-15 13:49:37,937][1652491] Updated weights for policy 0, policy_version 185724 (0.0014) [2024-06-15 13:49:39,615][1652491] Updated weights for policy 0, policy_version 185788 (0.0024) [2024-06-15 13:49:40,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46421.3, 300 sec: 45542.0). Total num frames: 380567552. Throughput: 0: 11309.5. Samples: 95175168. Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:49:40,956][1648985] Avg episode reward: [(0, '148.080')] [2024-06-15 13:49:41,141][1652491] Updated weights for policy 0, policy_version 185840 (0.0014) [2024-06-15 13:49:45,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 43690.8, 300 sec: 44431.2). Total num frames: 380633088. Throughput: 0: 11093.3. Samples: 95246848. Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:49:45,956][1648985] Avg episode reward: [(0, '166.880')] [2024-06-15 13:49:46,878][1652491] Updated weights for policy 0, policy_version 185904 (0.0015) [2024-06-15 13:49:47,770][1652491] Updated weights for policy 0, policy_version 185936 (0.0018) [2024-06-15 13:49:48,812][1652491] Updated weights for policy 0, policy_version 185974 (0.0012) [2024-06-15 13:49:50,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46421.3, 300 sec: 45542.0). Total num frames: 380960768. Throughput: 0: 11355.0. Samples: 95315968. 
Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:49:50,956][1648985] Avg episode reward: [(0, '177.040')] [2024-06-15 13:49:50,983][1652491] Updated weights for policy 0, policy_version 186032 (0.0013) [2024-06-15 13:49:51,306][1651469] Saving new best policy, reward=177.040! [2024-06-15 13:49:52,578][1652491] Updated weights for policy 0, policy_version 186111 (0.0014) [2024-06-15 13:49:55,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 44875.5). Total num frames: 381157376. Throughput: 0: 11229.9. Samples: 95348736. Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:49:55,956][1648985] Avg episode reward: [(0, '154.820')] [2024-06-15 13:49:58,059][1652491] Updated weights for policy 0, policy_version 186176 (0.0013) [2024-06-15 13:49:59,856][1652491] Updated weights for policy 0, policy_version 186239 (0.0012) [2024-06-15 13:50:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.3, 300 sec: 45319.8). Total num frames: 381419520. Throughput: 0: 11332.3. Samples: 95417856. Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:50:00,956][1648985] Avg episode reward: [(0, '130.610')] [2024-06-15 13:50:03,280][1652491] Updated weights for policy 0, policy_version 186304 (0.0014) [2024-06-15 13:50:05,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 44782.7, 300 sec: 45430.8). Total num frames: 381681664. Throughput: 0: 11411.8. Samples: 95489024. Policy #0 lag: (min: 1.0, avg: 62.4, max: 257.0) [2024-06-15 13:50:05,956][1648985] Avg episode reward: [(0, '148.870')] [2024-06-15 13:50:08,054][1652491] Updated weights for policy 0, policy_version 186369 (0.0014) [2024-06-15 13:50:09,369][1652491] Updated weights for policy 0, policy_version 186432 (0.0014) [2024-06-15 13:50:10,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.3, 300 sec: 44986.6). Total num frames: 381845504. Throughput: 0: 11355.1. Samples: 95524864. 
Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:50:10,956][1648985] Avg episode reward: [(0, '152.120')] [2024-06-15 13:50:11,877][1652491] Updated weights for policy 0, policy_version 186495 (0.0014) [2024-06-15 13:50:13,218][1651469] Signal inference workers to stop experience collection... (9800 times) [2024-06-15 13:50:13,322][1652491] InferenceWorker_p0-w0: stopping experience collection (9800 times) [2024-06-15 13:50:13,401][1651469] Signal inference workers to resume experience collection... (9800 times) [2024-06-15 13:50:13,402][1652491] InferenceWorker_p0-w0: resuming experience collection (9800 times) [2024-06-15 13:50:14,215][1652491] Updated weights for policy 0, policy_version 186548 (0.0012) [2024-06-15 13:50:15,680][1652491] Updated weights for policy 0, policy_version 186618 (0.0091) [2024-06-15 13:50:15,955][1648985] Fps is (10 sec: 52430.6, 60 sec: 45875.3, 300 sec: 45875.2). Total num frames: 382205952. Throughput: 0: 11525.7. Samples: 95588352. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:50:15,955][1648985] Avg episode reward: [(0, '137.730')] [2024-06-15 13:50:20,780][1652491] Updated weights for policy 0, policy_version 186672 (0.0013) [2024-06-15 13:50:20,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 45328.8, 300 sec: 44764.4). Total num frames: 382304256. Throughput: 0: 11423.2. Samples: 95659520. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:50:20,956][1648985] Avg episode reward: [(0, '142.760')] [2024-06-15 13:50:22,312][1652491] Updated weights for policy 0, policy_version 186706 (0.0011) [2024-06-15 13:50:23,442][1652491] Updated weights for policy 0, policy_version 186752 (0.0014) [2024-06-15 13:50:25,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46421.5, 300 sec: 45764.1). Total num frames: 382599168. Throughput: 0: 11548.5. Samples: 95694848. 
Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:50:25,955][1648985] Avg episode reward: [(0, '159.980')] [2024-06-15 13:50:26,271][1652491] Updated weights for policy 0, policy_version 186836 (0.0041) [2024-06-15 13:50:30,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 43690.7, 300 sec: 44875.5). Total num frames: 382730240. Throughput: 0: 11616.7. Samples: 95769600. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:50:30,956][1648985] Avg episode reward: [(0, '152.700')] [2024-06-15 13:50:31,416][1652491] Updated weights for policy 0, policy_version 186900 (0.0015) [2024-06-15 13:50:32,564][1652491] Updated weights for policy 0, policy_version 186943 (0.0028) [2024-06-15 13:50:34,856][1652491] Updated weights for policy 0, policy_version 186994 (0.0013) [2024-06-15 13:50:35,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.8, 300 sec: 45542.0). Total num frames: 383057920. Throughput: 0: 11480.2. Samples: 95832576. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:50:35,955][1648985] Avg episode reward: [(0, '146.640')] [2024-06-15 13:50:36,247][1652491] Updated weights for policy 0, policy_version 187060 (0.0014) [2024-06-15 13:50:37,439][1652491] Updated weights for policy 0, policy_version 187132 (0.0015) [2024-06-15 13:50:40,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 44782.7, 300 sec: 45541.9). Total num frames: 383254528. Throughput: 0: 11605.2. Samples: 95870976. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:50:40,956][1648985] Avg episode reward: [(0, '147.480')] [2024-06-15 13:50:43,723][1652491] Updated weights for policy 0, policy_version 187184 (0.0015) [2024-06-15 13:50:44,832][1652491] Updated weights for policy 0, policy_version 187216 (0.0012) [2024-06-15 13:50:45,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 47513.6, 300 sec: 45208.7). Total num frames: 383483904. Throughput: 0: 11696.4. Samples: 95944192. 
Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:50:45,956][1648985] Avg episode reward: [(0, '136.620')] [2024-06-15 13:50:46,683][1652491] Updated weights for policy 0, policy_version 187283 (0.0024) [2024-06-15 13:50:47,992][1652491] Updated weights for policy 0, policy_version 187344 (0.0013) [2024-06-15 13:50:49,029][1652491] Updated weights for policy 0, policy_version 187392 (0.0020) [2024-06-15 13:50:50,955][1648985] Fps is (10 sec: 52430.7, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 383778816. Throughput: 0: 11491.6. Samples: 96006144. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:50:50,956][1648985] Avg episode reward: [(0, '127.030')] [2024-06-15 13:50:55,218][1651469] Signal inference workers to stop experience collection... (9850 times) [2024-06-15 13:50:55,283][1652491] InferenceWorker_p0-w0: stopping experience collection (9850 times) [2024-06-15 13:50:55,544][1651469] Signal inference workers to resume experience collection... (9850 times) [2024-06-15 13:50:55,545][1652491] InferenceWorker_p0-w0: resuming experience collection (9850 times) [2024-06-15 13:50:55,911][1652491] Updated weights for policy 0, policy_version 187450 (0.0012) [2024-06-15 13:50:55,956][1648985] Fps is (10 sec: 39320.0, 60 sec: 45328.7, 300 sec: 44875.4). Total num frames: 383877120. Throughput: 0: 11741.7. Samples: 96053248. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:50:55,956][1648985] Avg episode reward: [(0, '151.850')] [2024-06-15 13:50:56,395][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000187472_383942656.pth... 
[2024-06-15 13:50:56,562][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000182144_373030912.pth [2024-06-15 13:50:56,572][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000187472_383942656.pth [2024-06-15 13:50:57,602][1652491] Updated weights for policy 0, policy_version 187520 (0.0013) [2024-06-15 13:50:59,452][1652491] Updated weights for policy 0, policy_version 187600 (0.0013) [2024-06-15 13:51:00,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 384303104. Throughput: 0: 11559.8. Samples: 96108544. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:51:00,956][1648985] Avg episode reward: [(0, '155.450')] [2024-06-15 13:51:05,860][1652491] Updated weights for policy 0, policy_version 187649 (0.0012) [2024-06-15 13:51:05,955][1648985] Fps is (10 sec: 42600.2, 60 sec: 43690.8, 300 sec: 44986.6). Total num frames: 384303104. Throughput: 0: 11798.8. Samples: 96190464. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:51:05,956][1648985] Avg episode reward: [(0, '151.770')] [2024-06-15 13:51:07,376][1652491] Updated weights for policy 0, policy_version 187708 (0.0014) [2024-06-15 13:51:09,192][1652491] Updated weights for policy 0, policy_version 187779 (0.0099) [2024-06-15 13:51:10,184][1652491] Updated weights for policy 0, policy_version 187832 (0.0016) [2024-06-15 13:51:10,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48059.7, 300 sec: 45875.2). Total num frames: 384729088. Throughput: 0: 11582.6. Samples: 96216064. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:51:10,956][1648985] Avg episode reward: [(0, '134.470')] [2024-06-15 13:51:11,877][1652491] Updated weights for policy 0, policy_version 187894 (0.0013) [2024-06-15 13:51:15,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 43690.5, 300 sec: 45430.9). Total num frames: 384827392. Throughput: 0: 11548.4. Samples: 96289280. 
Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 13:51:15,956][1648985] Avg episode reward: [(0, '132.930')] [2024-06-15 13:51:17,045][1652491] Updated weights for policy 0, policy_version 187923 (0.0092) [2024-06-15 13:51:17,750][1652491] Updated weights for policy 0, policy_version 187968 (0.0013) [2024-06-15 13:51:19,758][1652491] Updated weights for policy 0, policy_version 188032 (0.0028) [2024-06-15 13:51:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.9, 300 sec: 45653.0). Total num frames: 385187840. Throughput: 0: 11673.6. Samples: 96357888. Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:51:20,956][1648985] Avg episode reward: [(0, '148.640')] [2024-06-15 13:51:21,032][1652491] Updated weights for policy 0, policy_version 188093 (0.0017) [2024-06-15 13:51:22,843][1652491] Updated weights for policy 0, policy_version 188151 (0.0024) [2024-06-15 13:51:25,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 385351680. Throughput: 0: 11559.9. Samples: 96391168. Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:51:25,955][1648985] Avg episode reward: [(0, '163.440')] [2024-06-15 13:51:28,660][1652491] Updated weights for policy 0, policy_version 188194 (0.0027) [2024-06-15 13:51:30,438][1652491] Updated weights for policy 0, policy_version 188283 (0.0013) [2024-06-15 13:51:30,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48059.8, 300 sec: 45319.8). Total num frames: 385613824. Throughput: 0: 11673.6. Samples: 96469504. Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:51:30,956][1648985] Avg episode reward: [(0, '149.310')] [2024-06-15 13:51:32,473][1651469] Signal inference workers to stop experience collection... 
(9900 times) [2024-06-15 13:51:32,524][1652491] InferenceWorker_p0-w0: stopping experience collection (9900 times) [2024-06-15 13:51:32,525][1652491] Updated weights for policy 0, policy_version 188353 (0.0037) [2024-06-15 13:51:32,859][1651469] Signal inference workers to resume experience collection... (9900 times) [2024-06-15 13:51:32,859][1652491] InferenceWorker_p0-w0: resuming experience collection (9900 times) [2024-06-15 13:51:34,005][1652491] Updated weights for policy 0, policy_version 188414 (0.0064) [2024-06-15 13:51:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.5, 300 sec: 46097.4). Total num frames: 385875968. Throughput: 0: 11741.9. Samples: 96534528. Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:51:35,956][1648985] Avg episode reward: [(0, '139.240')] [2024-06-15 13:51:40,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.5, 300 sec: 45430.9). Total num frames: 386039808. Throughput: 0: 11742.0. Samples: 96581632. Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:51:40,956][1648985] Avg episode reward: [(0, '157.800')] [2024-06-15 13:51:40,994][1652491] Updated weights for policy 0, policy_version 188497 (0.0013) [2024-06-15 13:51:42,390][1652491] Updated weights for policy 0, policy_version 188565 (0.0110) [2024-06-15 13:51:43,906][1652491] Updated weights for policy 0, policy_version 188624 (0.0016) [2024-06-15 13:51:45,130][1652491] Updated weights for policy 0, policy_version 188667 (0.0011) [2024-06-15 13:51:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48605.9, 300 sec: 46208.4). Total num frames: 386400256. Throughput: 0: 11776.0. Samples: 96638464. Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:51:45,956][1648985] Avg episode reward: [(0, '147.310')] [2024-06-15 13:51:50,566][1652491] Updated weights for policy 0, policy_version 188710 (0.0029) [2024-06-15 13:51:50,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45329.1, 300 sec: 45653.1). Total num frames: 386498560. 
Throughput: 0: 11776.0. Samples: 96720384. Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:51:50,956][1648985] Avg episode reward: [(0, '122.850')] [2024-06-15 13:51:51,616][1652491] Updated weights for policy 0, policy_version 188752 (0.0011) [2024-06-15 13:51:53,010][1652491] Updated weights for policy 0, policy_version 188819 (0.0015) [2024-06-15 13:51:54,718][1652491] Updated weights for policy 0, policy_version 188868 (0.0018) [2024-06-15 13:51:55,826][1652491] Updated weights for policy 0, policy_version 188920 (0.0013) [2024-06-15 13:51:55,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 50244.6, 300 sec: 46097.3). Total num frames: 386891776. Throughput: 0: 11889.8. Samples: 96751104. Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:51:55,956][1648985] Avg episode reward: [(0, '140.790')] [2024-06-15 13:52:00,266][1652491] Updated weights for policy 0, policy_version 188948 (0.0020) [2024-06-15 13:52:00,947][1652491] Updated weights for policy 0, policy_version 188988 (0.0013) [2024-06-15 13:52:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 387022848. Throughput: 0: 12094.6. Samples: 96833536. Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:52:00,956][1648985] Avg episode reward: [(0, '136.300')] [2024-06-15 13:52:02,377][1652491] Updated weights for policy 0, policy_version 189040 (0.0011) [2024-06-15 13:52:04,290][1652491] Updated weights for policy 0, policy_version 189104 (0.0013) [2024-06-15 13:52:05,177][1652491] Updated weights for policy 0, policy_version 189138 (0.0012) [2024-06-15 13:52:05,959][1648985] Fps is (10 sec: 52408.0, 60 sec: 51879.2, 300 sec: 46096.8). Total num frames: 387416064. Throughput: 0: 12139.0. Samples: 96904192. 
Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:52:05,960][1648985] Avg episode reward: [(0, '132.890')] [2024-06-15 13:52:05,999][1652491] Updated weights for policy 0, policy_version 189179 (0.0019) [2024-06-15 13:52:10,430][1652491] Updated weights for policy 0, policy_version 189241 (0.0125) [2024-06-15 13:52:10,955][1648985] Fps is (10 sec: 55706.2, 60 sec: 47513.7, 300 sec: 46652.8). Total num frames: 387579904. Throughput: 0: 12356.3. Samples: 96947200. Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:52:10,955][1648985] Avg episode reward: [(0, '134.020')] [2024-06-15 13:52:12,518][1652491] Updated weights for policy 0, policy_version 189265 (0.0012) [2024-06-15 13:52:14,548][1652491] Updated weights for policy 0, policy_version 189315 (0.0015) [2024-06-15 13:52:15,147][1651469] Signal inference workers to stop experience collection... (9950 times) [2024-06-15 13:52:15,173][1652491] InferenceWorker_p0-w0: stopping experience collection (9950 times) [2024-06-15 13:52:15,319][1651469] Signal inference workers to resume experience collection... (9950 times) [2024-06-15 13:52:15,319][1652491] InferenceWorker_p0-w0: resuming experience collection (9950 times) [2024-06-15 13:52:15,753][1652491] Updated weights for policy 0, policy_version 189376 (0.0014) [2024-06-15 13:52:15,955][1648985] Fps is (10 sec: 42615.6, 60 sec: 50244.4, 300 sec: 46430.6). Total num frames: 387842048. Throughput: 0: 12231.1. Samples: 97019904. Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:52:15,956][1648985] Avg episode reward: [(0, '139.770')] [2024-06-15 13:52:17,071][1652491] Updated weights for policy 0, policy_version 189433 (0.0018) [2024-06-15 13:52:20,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 47513.5, 300 sec: 46430.6). Total num frames: 388038656. Throughput: 0: 12344.8. Samples: 97090048. 
Policy #0 lag: (min: 7.0, avg: 79.6, max: 263.0) [2024-06-15 13:52:20,957][1648985] Avg episode reward: [(0, '138.610')] [2024-06-15 13:52:21,036][1652491] Updated weights for policy 0, policy_version 189476 (0.0013) [2024-06-15 13:52:24,264][1652491] Updated weights for policy 0, policy_version 189552 (0.0014) [2024-06-15 13:52:25,900][1652491] Updated weights for policy 0, policy_version 189600 (0.0013) [2024-06-15 13:52:25,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 49151.9, 300 sec: 46652.7). Total num frames: 388300800. Throughput: 0: 12060.4. Samples: 97124352. Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:52:25,956][1648985] Avg episode reward: [(0, '147.050')] [2024-06-15 13:52:26,699][1652491] Updated weights for policy 0, policy_version 189632 (0.0014) [2024-06-15 13:52:28,375][1652491] Updated weights for policy 0, policy_version 189690 (0.0014) [2024-06-15 13:52:30,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 388497408. Throughput: 0: 12424.5. Samples: 97197568. Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:52:30,956][1648985] Avg episode reward: [(0, '148.440')] [2024-06-15 13:52:32,619][1652491] Updated weights for policy 0, policy_version 189753 (0.0020) [2024-06-15 13:52:34,121][1652491] Updated weights for policy 0, policy_version 189792 (0.0014) [2024-06-15 13:52:34,951][1652491] Updated weights for policy 0, policy_version 189824 (0.0037) [2024-06-15 13:52:35,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 388759552. Throughput: 0: 12231.1. Samples: 97270784. 
Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:52:35,956][1648985] Avg episode reward: [(0, '138.540')] [2024-06-15 13:52:36,680][1652491] Updated weights for policy 0, policy_version 189872 (0.0013) [2024-06-15 13:52:38,694][1652491] Updated weights for policy 0, policy_version 189920 (0.0013) [2024-06-15 13:52:39,465][1652491] Updated weights for policy 0, policy_version 189952 (0.0012) [2024-06-15 13:52:40,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 49698.1, 300 sec: 46208.4). Total num frames: 389021696. Throughput: 0: 12333.5. Samples: 97306112. Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:52:40,956][1648985] Avg episode reward: [(0, '137.210')] [2024-06-15 13:52:45,125][1652491] Updated weights for policy 0, policy_version 190033 (0.0139) [2024-06-15 13:52:45,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 389251072. Throughput: 0: 12162.8. Samples: 97380864. Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:52:45,956][1648985] Avg episode reward: [(0, '150.020')] [2024-06-15 13:52:46,203][1652491] Updated weights for policy 0, policy_version 190080 (0.0020) [2024-06-15 13:52:47,841][1652491] Updated weights for policy 0, policy_version 190142 (0.0014) [2024-06-15 13:52:50,681][1652491] Updated weights for policy 0, policy_version 190202 (0.0012) [2024-06-15 13:52:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 50790.3, 300 sec: 46652.7). Total num frames: 389545984. Throughput: 0: 11936.3. Samples: 97441280. Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:52:50,956][1648985] Avg episode reward: [(0, '157.120')] [2024-06-15 13:52:54,798][1652491] Updated weights for policy 0, policy_version 190267 (0.0014) [2024-06-15 13:52:55,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46421.4, 300 sec: 46652.7). Total num frames: 389677056. Throughput: 0: 11992.1. Samples: 97486848. 
Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:52:55,956][1648985] Avg episode reward: [(0, '152.760')] [2024-06-15 13:52:56,450][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000190288_389709824.pth... [2024-06-15 13:52:56,624][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000184832_378535936.pth [2024-06-15 13:52:57,329][1652491] Updated weights for policy 0, policy_version 190320 (0.0014) [2024-06-15 13:52:58,282][1651469] Signal inference workers to stop experience collection... (10000 times) [2024-06-15 13:52:58,326][1652491] Updated weights for policy 0, policy_version 190370 (0.0076) [2024-06-15 13:52:58,383][1652491] InferenceWorker_p0-w0: stopping experience collection (10000 times) [2024-06-15 13:52:58,515][1651469] Signal inference workers to resume experience collection... (10000 times) [2024-06-15 13:52:58,516][1652491] InferenceWorker_p0-w0: resuming experience collection (10000 times) [2024-06-15 13:53:00,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 49152.1, 300 sec: 46763.8). Total num frames: 389971968. Throughput: 0: 11821.5. Samples: 97551872. Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:53:00,955][1648985] Avg episode reward: [(0, '144.520')] [2024-06-15 13:53:01,272][1652491] Updated weights for policy 0, policy_version 190434 (0.0012) [2024-06-15 13:53:04,797][1652491] Updated weights for policy 0, policy_version 190468 (0.0030) [2024-06-15 13:53:05,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45878.3, 300 sec: 46541.7). Total num frames: 390168576. Throughput: 0: 11923.9. Samples: 97626624. 
Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:53:05,956][1648985] Avg episode reward: [(0, '143.540')] [2024-06-15 13:53:06,083][1652491] Updated weights for policy 0, policy_version 190525 (0.0134) [2024-06-15 13:53:08,488][1652491] Updated weights for policy 0, policy_version 190593 (0.0012) [2024-06-15 13:53:10,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 48059.6, 300 sec: 47097.1). Total num frames: 390463488. Throughput: 0: 11844.3. Samples: 97657344. Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:53:10,956][1648985] Avg episode reward: [(0, '124.930')] [2024-06-15 13:53:12,724][1652491] Updated weights for policy 0, policy_version 190688 (0.0014) [2024-06-15 13:53:15,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 390594560. Throughput: 0: 11707.8. Samples: 97724416. Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:53:15,955][1648985] Avg episode reward: [(0, '126.260')] [2024-06-15 13:53:16,714][1652491] Updated weights for policy 0, policy_version 190737 (0.0013) [2024-06-15 13:53:17,786][1652491] Updated weights for policy 0, policy_version 190781 (0.0010) [2024-06-15 13:53:19,951][1652491] Updated weights for policy 0, policy_version 190848 (0.0012) [2024-06-15 13:53:20,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 390922240. Throughput: 0: 11571.2. Samples: 97791488. Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:53:20,956][1648985] Avg episode reward: [(0, '147.350')] [2024-06-15 13:53:21,192][1652491] Updated weights for policy 0, policy_version 190908 (0.0013) [2024-06-15 13:53:24,720][1652491] Updated weights for policy 0, policy_version 190960 (0.0012) [2024-06-15 13:53:25,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 391118848. Throughput: 0: 11594.0. Samples: 97827840. 
Policy #0 lag: (min: 15.0, avg: 117.2, max: 271.0) [2024-06-15 13:53:25,957][1648985] Avg episode reward: [(0, '130.970')] [2024-06-15 13:53:28,382][1652491] Updated weights for policy 0, policy_version 191010 (0.0056) [2024-06-15 13:53:29,952][1652491] Updated weights for policy 0, policy_version 191072 (0.0036) [2024-06-15 13:53:30,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 48059.9, 300 sec: 47097.1). Total num frames: 391380992. Throughput: 0: 11616.7. Samples: 97903616. Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:53:30,956][1648985] Avg episode reward: [(0, '130.880')] [2024-06-15 13:53:32,329][1652491] Updated weights for policy 0, policy_version 191138 (0.0013) [2024-06-15 13:53:35,528][1652491] Updated weights for policy 0, policy_version 191206 (0.0096) [2024-06-15 13:53:35,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 47513.7, 300 sec: 46874.9). Total num frames: 391610368. Throughput: 0: 11673.6. Samples: 97966592. Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:53:35,956][1648985] Avg episode reward: [(0, '139.910')] [2024-06-15 13:53:39,050][1652491] Updated weights for policy 0, policy_version 191248 (0.0014) [2024-06-15 13:53:40,407][1652491] Updated weights for policy 0, policy_version 191296 (0.0028) [2024-06-15 13:53:40,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.4, 300 sec: 46763.9). Total num frames: 391806976. Throughput: 0: 11548.5. Samples: 98006528. Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:53:40,956][1648985] Avg episode reward: [(0, '138.810')] [2024-06-15 13:53:41,624][1652491] Updated weights for policy 0, policy_version 191356 (0.0013) [2024-06-15 13:53:43,667][1651469] Signal inference workers to stop experience collection... (10050 times) [2024-06-15 13:53:43,759][1652491] InferenceWorker_p0-w0: stopping experience collection (10050 times) [2024-06-15 13:53:43,945][1651469] Signal inference workers to resume experience collection... 
(10050 times) [2024-06-15 13:53:43,950][1652491] InferenceWorker_p0-w0: resuming experience collection (10050 times) [2024-06-15 13:53:44,276][1652491] Updated weights for policy 0, policy_version 191408 (0.0013) [2024-06-15 13:53:45,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 392036352. Throughput: 0: 11639.4. Samples: 98075648. Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:53:45,956][1648985] Avg episode reward: [(0, '128.590')] [2024-06-15 13:53:46,942][1652491] Updated weights for policy 0, policy_version 191444 (0.0055) [2024-06-15 13:53:50,196][1652491] Updated weights for policy 0, policy_version 191506 (0.0030) [2024-06-15 13:53:50,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 392265728. Throughput: 0: 11514.3. Samples: 98144768. Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:53:50,955][1648985] Avg episode reward: [(0, '127.390')] [2024-06-15 13:53:51,147][1652491] Updated weights for policy 0, policy_version 191552 (0.0017) [2024-06-15 13:53:52,806][1652491] Updated weights for policy 0, policy_version 191616 (0.0012) [2024-06-15 13:53:55,091][1652491] Updated weights for policy 0, policy_version 191678 (0.0013) [2024-06-15 13:53:55,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 47208.1). Total num frames: 392560640. Throughput: 0: 11662.2. Samples: 98182144. Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:53:55,956][1648985] Avg episode reward: [(0, '119.370')] [2024-06-15 13:53:58,548][1652491] Updated weights for policy 0, policy_version 191736 (0.0153) [2024-06-15 13:54:00,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 45328.9, 300 sec: 46430.6). Total num frames: 392691712. Throughput: 0: 11753.2. Samples: 98253312. 
Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:54:00,956][1648985] Avg episode reward: [(0, '152.590')] [2024-06-15 13:54:02,172][1652491] Updated weights for policy 0, policy_version 191779 (0.0013) [2024-06-15 13:54:03,679][1652491] Updated weights for policy 0, policy_version 191844 (0.0017) [2024-06-15 13:54:05,397][1652491] Updated weights for policy 0, policy_version 191889 (0.0012) [2024-06-15 13:54:05,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 47513.7, 300 sec: 47319.2). Total num frames: 393019392. Throughput: 0: 11696.4. Samples: 98317824. Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:54:05,956][1648985] Avg episode reward: [(0, '152.230')] [2024-06-15 13:54:09,691][1652491] Updated weights for policy 0, policy_version 191968 (0.0062) [2024-06-15 13:54:10,346][1652491] Updated weights for policy 0, policy_version 192000 (0.0011) [2024-06-15 13:54:10,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 393216000. Throughput: 0: 11844.2. Samples: 98360832. Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:54:10,956][1648985] Avg episode reward: [(0, '147.260')] [2024-06-15 13:54:13,474][1652491] Updated weights for policy 0, policy_version 192053 (0.0013) [2024-06-15 13:54:15,517][1652491] Updated weights for policy 0, policy_version 192112 (0.0014) [2024-06-15 13:54:15,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 393478144. Throughput: 0: 11571.2. Samples: 98424320. Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:54:15,956][1648985] Avg episode reward: [(0, '134.540')] [2024-06-15 13:54:16,667][1652491] Updated weights for policy 0, policy_version 192161 (0.0012) [2024-06-15 13:54:20,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 45329.1, 300 sec: 46874.9). Total num frames: 393641984. Throughput: 0: 11867.0. Samples: 98500608. 
Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:54:20,956][1648985] Avg episode reward: [(0, '135.740')] [2024-06-15 13:54:21,443][1652491] Updated weights for policy 0, policy_version 192240 (0.0014) [2024-06-15 13:54:24,281][1652491] Updated weights for policy 0, policy_version 192292 (0.0012) [2024-06-15 13:54:25,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 393904128. Throughput: 0: 11707.7. Samples: 98533376. Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:54:25,956][1648985] Avg episode reward: [(0, '145.630')] [2024-06-15 13:54:26,120][1652491] Updated weights for policy 0, policy_version 192343 (0.0014) [2024-06-15 13:54:27,687][1652491] Updated weights for policy 0, policy_version 192416 (0.0012) [2024-06-15 13:54:27,781][1651469] Signal inference workers to stop experience collection... (10100 times) [2024-06-15 13:54:27,886][1652491] InferenceWorker_p0-w0: stopping experience collection (10100 times) [2024-06-15 13:54:28,085][1651469] Signal inference workers to resume experience collection... (10100 times) [2024-06-15 13:54:28,087][1652491] InferenceWorker_p0-w0: resuming experience collection (10100 times) [2024-06-15 13:54:30,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 394133504. Throughput: 0: 11730.5. Samples: 98603520. Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:54:30,956][1648985] Avg episode reward: [(0, '139.420')] [2024-06-15 13:54:31,924][1652491] Updated weights for policy 0, policy_version 192468 (0.0012) [2024-06-15 13:54:34,713][1652491] Updated weights for policy 0, policy_version 192528 (0.0014) [2024-06-15 13:54:35,779][1652491] Updated weights for policy 0, policy_version 192576 (0.0014) [2024-06-15 13:54:35,960][1648985] Fps is (10 sec: 49130.5, 60 sec: 46417.9, 300 sec: 46874.2). Total num frames: 394395648. Throughput: 0: 11752.1. Samples: 98673664. 
Policy #0 lag: (min: 17.0, avg: 122.1, max: 273.0) [2024-06-15 13:54:35,960][1648985] Avg episode reward: [(0, '129.220')] [2024-06-15 13:54:37,396][1652491] Updated weights for policy 0, policy_version 192632 (0.0124) [2024-06-15 13:54:39,017][1652491] Updated weights for policy 0, policy_version 192695 (0.0013) [2024-06-15 13:54:40,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 47513.4, 300 sec: 47541.4). Total num frames: 394657792. Throughput: 0: 11548.4. Samples: 98701824. Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:54:40,956][1648985] Avg episode reward: [(0, '121.030')] [2024-06-15 13:54:43,872][1652491] Updated weights for policy 0, policy_version 192759 (0.0014) [2024-06-15 13:54:45,739][1652491] Updated weights for policy 0, policy_version 192790 (0.0014) [2024-06-15 13:54:45,955][1648985] Fps is (10 sec: 45894.7, 60 sec: 46967.4, 300 sec: 47097.0). Total num frames: 394854400. Throughput: 0: 11810.1. Samples: 98784768. Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:54:45,956][1648985] Avg episode reward: [(0, '118.670')] [2024-06-15 13:54:47,627][1652491] Updated weights for policy 0, policy_version 192864 (0.0095) [2024-06-15 13:54:48,714][1652491] Updated weights for policy 0, policy_version 192899 (0.0014) [2024-06-15 13:54:50,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 48605.8, 300 sec: 47541.4). Total num frames: 395182080. Throughput: 0: 11844.3. Samples: 98850816. Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:54:50,955][1648985] Avg episode reward: [(0, '118.530')] [2024-06-15 13:54:53,814][1652491] Updated weights for policy 0, policy_version 192961 (0.0013) [2024-06-15 13:54:54,996][1652491] Updated weights for policy 0, policy_version 193020 (0.0013) [2024-06-15 13:54:55,955][1648985] Fps is (10 sec: 52427.0, 60 sec: 46967.1, 300 sec: 47319.1). Total num frames: 395378688. Throughput: 0: 11992.1. Samples: 98900480. 
Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:54:55,956][1648985] Avg episode reward: [(0, '136.220')] [2024-06-15 13:54:56,291][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000193072_395411456.pth... [2024-06-15 13:54:56,293][1652491] Updated weights for policy 0, policy_version 193072 (0.0012) [2024-06-15 13:54:56,335][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000187472_383942656.pth [2024-06-15 13:54:57,801][1652491] Updated weights for policy 0, policy_version 193107 (0.0011) [2024-06-15 13:54:58,659][1652491] Updated weights for policy 0, policy_version 193152 (0.0030) [2024-06-15 13:54:59,951][1652491] Updated weights for policy 0, policy_version 193207 (0.0013) [2024-06-15 13:55:00,955][1648985] Fps is (10 sec: 52426.8, 60 sec: 50244.1, 300 sec: 47541.4). Total num frames: 395706368. Throughput: 0: 12105.9. Samples: 98969088. Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:55:00,956][1648985] Avg episode reward: [(0, '151.350')] [2024-06-15 13:55:05,603][1652491] Updated weights for policy 0, policy_version 193264 (0.0081) [2024-06-15 13:55:05,955][1648985] Fps is (10 sec: 42600.3, 60 sec: 46421.3, 300 sec: 47319.2). Total num frames: 395804672. Throughput: 0: 12060.5. Samples: 99043328. Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:55:05,956][1648985] Avg episode reward: [(0, '150.340')] [2024-06-15 13:55:06,850][1652491] Updated weights for policy 0, policy_version 193312 (0.0016) [2024-06-15 13:55:08,956][1652491] Updated weights for policy 0, policy_version 193376 (0.0013) [2024-06-15 13:55:10,655][1652491] Updated weights for policy 0, policy_version 193424 (0.0014) [2024-06-15 13:55:10,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 48606.0, 300 sec: 47208.1). Total num frames: 396132352. Throughput: 0: 12049.0. Samples: 99075584. 
Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:55:10,956][1648985] Avg episode reward: [(0, '126.580')] [2024-06-15 13:55:15,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.2, 300 sec: 47208.2). Total num frames: 396230656. Throughput: 0: 12071.8. Samples: 99146752. Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:55:15,956][1648985] Avg episode reward: [(0, '128.970')] [2024-06-15 13:55:16,044][1651469] Signal inference workers to stop experience collection... (10150 times) [2024-06-15 13:55:16,089][1652491] Updated weights for policy 0, policy_version 193474 (0.0015) [2024-06-15 13:55:16,131][1652491] InferenceWorker_p0-w0: stopping experience collection (10150 times) [2024-06-15 13:55:16,350][1651469] Signal inference workers to resume experience collection... (10150 times) [2024-06-15 13:55:16,351][1652491] InferenceWorker_p0-w0: resuming experience collection (10150 times) [2024-06-15 13:55:17,475][1652491] Updated weights for policy 0, policy_version 193530 (0.0012) [2024-06-15 13:55:18,841][1652491] Updated weights for policy 0, policy_version 193593 (0.0011) [2024-06-15 13:55:20,403][1652491] Updated weights for policy 0, policy_version 193633 (0.0013) [2024-06-15 13:55:20,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 49698.3, 300 sec: 47541.4). Total num frames: 396623872. Throughput: 0: 12050.3. Samples: 99215872. Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:55:20,956][1648985] Avg episode reward: [(0, '127.930')] [2024-06-15 13:55:21,557][1652491] Updated weights for policy 0, policy_version 193691 (0.0108) [2024-06-15 13:55:25,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 396754944. Throughput: 0: 12219.8. Samples: 99251712. 
Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:55:25,956][1648985] Avg episode reward: [(0, '148.520')] [2024-06-15 13:55:27,614][1652491] Updated weights for policy 0, policy_version 193746 (0.0012) [2024-06-15 13:55:28,444][1652491] Updated weights for policy 0, policy_version 193788 (0.0014) [2024-06-15 13:55:30,109][1652491] Updated weights for policy 0, policy_version 193841 (0.0012) [2024-06-15 13:55:30,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48605.9, 300 sec: 47430.3). Total num frames: 397049856. Throughput: 0: 12049.1. Samples: 99326976. Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:55:30,956][1648985] Avg episode reward: [(0, '137.600')] [2024-06-15 13:55:31,933][1652491] Updated weights for policy 0, policy_version 193915 (0.0011) [2024-06-15 13:55:33,475][1652491] Updated weights for policy 0, policy_version 193981 (0.0143) [2024-06-15 13:55:35,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48063.2, 300 sec: 47541.4). Total num frames: 397279232. Throughput: 0: 12071.8. Samples: 99394048. Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:55:35,956][1648985] Avg episode reward: [(0, '132.060')] [2024-06-15 13:55:38,798][1652491] Updated weights for policy 0, policy_version 194020 (0.0033) [2024-06-15 13:55:39,953][1652491] Updated weights for policy 0, policy_version 194050 (0.0013) [2024-06-15 13:55:40,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 397508608. Throughput: 0: 11753.4. Samples: 99429376. 
Policy #0 lag: (min: 1.0, avg: 119.4, max: 257.0) [2024-06-15 13:55:40,956][1648985] Avg episode reward: [(0, '123.610')] [2024-06-15 13:55:41,007][1652491] Updated weights for policy 0, policy_version 194111 (0.0082) [2024-06-15 13:55:42,532][1652491] Updated weights for policy 0, policy_version 194160 (0.0012) [2024-06-15 13:55:44,286][1652491] Updated weights for policy 0, policy_version 194224 (0.0013) [2024-06-15 13:55:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 49152.1, 300 sec: 47541.4). Total num frames: 397803520. Throughput: 0: 11776.1. Samples: 99499008. Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:55:45,956][1648985] Avg episode reward: [(0, '126.530')] [2024-06-15 13:55:49,249][1652491] Updated weights for policy 0, policy_version 194263 (0.0013) [2024-06-15 13:55:50,822][1652491] Updated weights for policy 0, policy_version 194336 (0.0014) [2024-06-15 13:55:50,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46967.5, 300 sec: 47874.7). Total num frames: 398000128. Throughput: 0: 11821.5. Samples: 99575296. Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:55:50,955][1648985] Avg episode reward: [(0, '131.840')] [2024-06-15 13:55:52,506][1652491] Updated weights for policy 0, policy_version 194384 (0.0034) [2024-06-15 13:55:54,923][1652491] Updated weights for policy 0, policy_version 194452 (0.0135) [2024-06-15 13:55:55,235][1651469] Signal inference workers to stop experience collection... (10200 times) [2024-06-15 13:55:55,297][1652491] InferenceWorker_p0-w0: stopping experience collection (10200 times) [2024-06-15 13:55:55,508][1651469] Signal inference workers to resume experience collection... (10200 times) [2024-06-15 13:55:55,530][1652491] InferenceWorker_p0-w0: resuming experience collection (10200 times) [2024-06-15 13:55:55,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 49152.2, 300 sec: 47541.3). Total num frames: 398327808. Throughput: 0: 11821.5. Samples: 99607552. 
Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:55:55,956][1648985] Avg episode reward: [(0, '143.620')] [2024-06-15 13:56:00,165][1652491] Updated weights for policy 0, policy_version 194528 (0.0017) [2024-06-15 13:56:00,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 45875.4, 300 sec: 47985.7). Total num frames: 398458880. Throughput: 0: 12151.5. Samples: 99693568. Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:56:00,956][1648985] Avg episode reward: [(0, '133.970')] [2024-06-15 13:56:01,857][1652491] Updated weights for policy 0, policy_version 194618 (0.0015) [2024-06-15 13:56:03,903][1652491] Updated weights for policy 0, policy_version 194672 (0.0013) [2024-06-15 13:56:05,955][1648985] Fps is (10 sec: 45876.6, 60 sec: 49698.2, 300 sec: 47652.5). Total num frames: 398786560. Throughput: 0: 11980.8. Samples: 99755008. Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:56:05,955][1648985] Avg episode reward: [(0, '134.160')] [2024-06-15 13:56:06,139][1652491] Updated weights for policy 0, policy_version 194745 (0.0015) [2024-06-15 13:56:10,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45875.2, 300 sec: 47652.5). Total num frames: 398884864. Throughput: 0: 12185.6. Samples: 99800064. Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:56:10,956][1648985] Avg episode reward: [(0, '117.760')] [2024-06-15 13:56:11,254][1652491] Updated weights for policy 0, policy_version 194787 (0.0015) [2024-06-15 13:56:12,345][1652491] Updated weights for policy 0, policy_version 194834 (0.0011) [2024-06-15 13:56:13,820][1652491] Updated weights for policy 0, policy_version 194881 (0.0019) [2024-06-15 13:56:15,215][1652491] Updated weights for policy 0, policy_version 194940 (0.0012) [2024-06-15 13:56:15,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 50244.2, 300 sec: 47652.4). Total num frames: 399245312. Throughput: 0: 11923.9. Samples: 99863552. 
Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:56:15,956][1648985] Avg episode reward: [(0, '117.600')] [2024-06-15 13:56:16,578][1652491] Updated weights for policy 0, policy_version 194983 (0.0014) [2024-06-15 13:56:20,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 45875.1, 300 sec: 47541.4). Total num frames: 399376384. Throughput: 0: 12219.7. Samples: 99943936. Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:56:20,956][1648985] Avg episode reward: [(0, '121.990')] [2024-06-15 13:56:21,935][1652491] Updated weights for policy 0, policy_version 195040 (0.0014) [2024-06-15 13:56:23,299][1652491] Updated weights for policy 0, policy_version 195089 (0.0012) [2024-06-15 13:56:25,832][1652491] Updated weights for policy 0, policy_version 195170 (0.0014) [2024-06-15 13:56:25,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 49151.8, 300 sec: 47763.5). Total num frames: 399704064. Throughput: 0: 12117.3. Samples: 99974656. Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:56:25,956][1648985] Avg episode reward: [(0, '142.600')] [2024-06-15 13:56:28,027][1652491] Updated weights for policy 0, policy_version 195233 (0.0014) [2024-06-15 13:56:30,955][1648985] Fps is (10 sec: 52427.1, 60 sec: 47513.4, 300 sec: 47541.3). Total num frames: 399900672. Throughput: 0: 12140.0. Samples: 100045312. Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:56:30,956][1648985] Avg episode reward: [(0, '140.440')] [2024-06-15 13:56:33,335][1652491] Updated weights for policy 0, policy_version 195296 (0.0021) [2024-06-15 13:56:34,716][1652491] Updated weights for policy 0, policy_version 195346 (0.0012) [2024-06-15 13:56:35,605][1652491] Updated weights for policy 0, policy_version 195390 (0.0022) [2024-06-15 13:56:35,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 400162816. Throughput: 0: 12049.1. Samples: 100117504. 
Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:56:35,956][1648985] Avg episode reward: [(0, '153.910')] [2024-06-15 13:56:37,249][1652491] Updated weights for policy 0, policy_version 195456 (0.0013) [2024-06-15 13:56:38,139][1651469] Signal inference workers to stop experience collection... (10250 times) [2024-06-15 13:56:38,235][1652491] InferenceWorker_p0-w0: stopping experience collection (10250 times) [2024-06-15 13:56:38,347][1651469] Signal inference workers to resume experience collection... (10250 times) [2024-06-15 13:56:38,348][1652491] InferenceWorker_p0-w0: resuming experience collection (10250 times) [2024-06-15 13:56:38,834][1652491] Updated weights for policy 0, policy_version 195511 (0.0051) [2024-06-15 13:56:40,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 400424960. Throughput: 0: 12015.0. Samples: 100148224. Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:56:40,956][1648985] Avg episode reward: [(0, '148.640')] [2024-06-15 13:56:44,824][1652491] Updated weights for policy 0, policy_version 195552 (0.0013) [2024-06-15 13:56:45,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46421.3, 300 sec: 47763.5). Total num frames: 400588800. Throughput: 0: 11821.5. Samples: 100225536. Policy #0 lag: (min: 15.0, avg: 168.2, max: 271.0) [2024-06-15 13:56:45,956][1648985] Avg episode reward: [(0, '130.980')] [2024-06-15 13:56:46,117][1652491] Updated weights for policy 0, policy_version 195606 (0.0012) [2024-06-15 13:56:47,179][1652491] Updated weights for policy 0, policy_version 195647 (0.0014) [2024-06-15 13:56:48,587][1652491] Updated weights for policy 0, policy_version 195696 (0.0036) [2024-06-15 13:56:49,810][1652491] Updated weights for policy 0, policy_version 195750 (0.0012) [2024-06-15 13:56:50,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 49152.0, 300 sec: 47652.5). Total num frames: 400949248. Throughput: 0: 11923.9. Samples: 100291584. 
Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:56:50,955][1648985] Avg episode reward: [(0, '123.220')] [2024-06-15 13:56:54,721][1652491] Updated weights for policy 0, policy_version 195780 (0.0012) [2024-06-15 13:56:55,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45329.1, 300 sec: 47541.3). Total num frames: 401047552. Throughput: 0: 11832.9. Samples: 100332544. Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:56:55,956][1648985] Avg episode reward: [(0, '137.300')] [2024-06-15 13:56:55,962][1652491] Updated weights for policy 0, policy_version 195838 (0.0019) [2024-06-15 13:56:55,981][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000195840_401080320.pth... [2024-06-15 13:56:56,043][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000190288_389709824.pth [2024-06-15 13:56:57,511][1652491] Updated weights for policy 0, policy_version 195902 (0.0026) [2024-06-15 13:56:59,784][1652491] Updated weights for policy 0, policy_version 195968 (0.0012) [2024-06-15 13:57:00,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 49152.1, 300 sec: 47430.9). Total num frames: 401408000. Throughput: 0: 11798.8. Samples: 100394496. Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:57:00,956][1648985] Avg episode reward: [(0, '145.590')] [2024-06-15 13:57:01,473][1652491] Updated weights for policy 0, policy_version 196030 (0.0013) [2024-06-15 13:57:05,989][1648985] Fps is (10 sec: 42456.6, 60 sec: 44757.8, 300 sec: 47091.7). Total num frames: 401473536. Throughput: 0: 11687.6. Samples: 100470272. 
Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:57:05,989][1648985] Avg episode reward: [(0, '126.900')] [2024-06-15 13:57:07,148][1652491] Updated weights for policy 0, policy_version 196096 (0.0091) [2024-06-15 13:57:08,930][1652491] Updated weights for policy 0, policy_version 196157 (0.0013) [2024-06-15 13:57:10,923][1652491] Updated weights for policy 0, policy_version 196197 (0.0023) [2024-06-15 13:57:10,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 48606.1, 300 sec: 47319.2). Total num frames: 401801216. Throughput: 0: 11605.4. Samples: 100496896. Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:57:10,955][1648985] Avg episode reward: [(0, '122.360')] [2024-06-15 13:57:12,899][1652491] Updated weights for policy 0, policy_version 196257 (0.0015) [2024-06-15 13:57:15,955][1648985] Fps is (10 sec: 52604.9, 60 sec: 45875.1, 300 sec: 47319.2). Total num frames: 401997824. Throughput: 0: 11741.9. Samples: 100573696. Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:57:15,956][1648985] Avg episode reward: [(0, '119.950')] [2024-06-15 13:57:17,722][1652491] Updated weights for policy 0, policy_version 196320 (0.0014) [2024-06-15 13:57:19,260][1652491] Updated weights for policy 0, policy_version 196384 (0.0206) [2024-06-15 13:57:20,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 402259968. Throughput: 0: 11628.1. Samples: 100640768. Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:57:20,955][1648985] Avg episode reward: [(0, '137.180')] [2024-06-15 13:57:21,880][1652491] Updated weights for policy 0, policy_version 196448 (0.0015) [2024-06-15 13:57:23,445][1651469] Signal inference workers to stop experience collection... 
(10300 times) [2024-06-15 13:57:23,532][1652491] InferenceWorker_p0-w0: stopping experience collection (10300 times) [2024-06-15 13:57:23,533][1652491] Updated weights for policy 0, policy_version 196501 (0.0012) [2024-06-15 13:57:23,771][1651469] Signal inference workers to resume experience collection... (10300 times) [2024-06-15 13:57:23,773][1652491] InferenceWorker_p0-w0: resuming experience collection (10300 times) [2024-06-15 13:57:25,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46967.6, 300 sec: 47541.4). Total num frames: 402522112. Throughput: 0: 11730.5. Samples: 100676096. Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:57:25,956][1648985] Avg episode reward: [(0, '154.320')] [2024-06-15 13:57:28,772][1652491] Updated weights for policy 0, policy_version 196564 (0.0013) [2024-06-15 13:57:29,817][1652491] Updated weights for policy 0, policy_version 196614 (0.0103) [2024-06-15 13:57:30,949][1652491] Updated weights for policy 0, policy_version 196669 (0.0014) [2024-06-15 13:57:30,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 47513.9, 300 sec: 47430.3). Total num frames: 402751488. Throughput: 0: 11673.6. Samples: 100750848. Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:57:30,956][1648985] Avg episode reward: [(0, '145.090')] [2024-06-15 13:57:33,440][1652491] Updated weights for policy 0, policy_version 196729 (0.0012) [2024-06-15 13:57:34,851][1652491] Updated weights for policy 0, policy_version 196770 (0.0012) [2024-06-15 13:57:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 403046400. Throughput: 0: 11719.1. Samples: 100818944. 
Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:57:35,956][1648985] Avg episode reward: [(0, '145.660')] [2024-06-15 13:57:39,161][1652491] Updated weights for policy 0, policy_version 196819 (0.0013) [2024-06-15 13:57:40,049][1652491] Updated weights for policy 0, policy_version 196862 (0.0028) [2024-06-15 13:57:40,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.4, 300 sec: 47319.2). Total num frames: 403210240. Throughput: 0: 11798.8. Samples: 100863488. Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:57:40,956][1648985] Avg episode reward: [(0, '136.230')] [2024-06-15 13:57:43,688][1652491] Updated weights for policy 0, policy_version 196948 (0.0015) [2024-06-15 13:57:45,512][1652491] Updated weights for policy 0, policy_version 197027 (0.0023) [2024-06-15 13:57:45,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 49151.9, 300 sec: 47430.3). Total num frames: 403537920. Throughput: 0: 11821.5. Samples: 100926464. Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:57:45,956][1648985] Avg episode reward: [(0, '107.260')] [2024-06-15 13:57:50,071][1652491] Updated weights for policy 0, policy_version 197073 (0.0021) [2024-06-15 13:57:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45329.1, 300 sec: 47430.3). Total num frames: 403668992. Throughput: 0: 11819.0. Samples: 101001728. Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:57:50,955][1648985] Avg episode reward: [(0, '99.300')] [2024-06-15 13:57:51,770][1652491] Updated weights for policy 0, policy_version 197124 (0.0032) [2024-06-15 13:57:54,740][1652491] Updated weights for policy 0, policy_version 197186 (0.0012) [2024-06-15 13:57:55,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 48059.9, 300 sec: 47319.2). Total num frames: 403931136. Throughput: 0: 11935.2. Samples: 101033984. 
Policy #0 lag: (min: 94.0, avg: 226.9, max: 319.0) [2024-06-15 13:57:55,956][1648985] Avg episode reward: [(0, '115.660')] [2024-06-15 13:57:56,197][1652491] Updated weights for policy 0, policy_version 197249 (0.0014) [2024-06-15 13:57:57,518][1652491] Updated weights for policy 0, policy_version 197312 (0.0122) [2024-06-15 13:58:00,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44783.0, 300 sec: 47208.2). Total num frames: 404094976. Throughput: 0: 11889.8. Samples: 101108736. Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:58:00,955][1648985] Avg episode reward: [(0, '138.050')] [2024-06-15 13:58:01,854][1652491] Updated weights for policy 0, policy_version 197376 (0.0033) [2024-06-15 13:58:04,029][1652491] Updated weights for policy 0, policy_version 197430 (0.0013) [2024-06-15 13:58:05,803][1652491] Updated weights for policy 0, policy_version 197459 (0.0014) [2024-06-15 13:58:05,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 49179.5, 300 sec: 47319.2). Total num frames: 404422656. Throughput: 0: 11958.0. Samples: 101178880. Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:58:05,956][1648985] Avg episode reward: [(0, '129.060')] [2024-06-15 13:58:06,803][1651469] Signal inference workers to stop experience collection... (10350 times) [2024-06-15 13:58:06,849][1652491] InferenceWorker_p0-w0: stopping experience collection (10350 times) [2024-06-15 13:58:06,851][1652491] Updated weights for policy 0, policy_version 197502 (0.0012) [2024-06-15 13:58:06,908][1651469] Signal inference workers to resume experience collection... (10350 times) [2024-06-15 13:58:06,926][1652491] InferenceWorker_p0-w0: resuming experience collection (10350 times) [2024-06-15 13:58:08,153][1652491] Updated weights for policy 0, policy_version 197556 (0.0012) [2024-06-15 13:58:10,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.4, 300 sec: 47541.4). Total num frames: 404619264. Throughput: 0: 11878.4. Samples: 101210624. 
Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:58:10,956][1648985] Avg episode reward: [(0, '118.480')] [2024-06-15 13:58:12,181][1652491] Updated weights for policy 0, policy_version 197624 (0.0116) [2024-06-15 13:58:15,625][1652491] Updated weights for policy 0, policy_version 197685 (0.0114) [2024-06-15 13:58:15,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 404881408. Throughput: 0: 11958.0. Samples: 101288960. Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:58:15,956][1648985] Avg episode reward: [(0, '117.290')] [2024-06-15 13:58:16,866][1652491] Updated weights for policy 0, policy_version 197728 (0.0015) [2024-06-15 13:58:17,704][1652491] Updated weights for policy 0, policy_version 197760 (0.0066) [2024-06-15 13:58:19,289][1652491] Updated weights for policy 0, policy_version 197817 (0.0015) [2024-06-15 13:58:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 405143552. Throughput: 0: 11935.3. Samples: 101356032. Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:58:20,956][1648985] Avg episode reward: [(0, '120.860')] [2024-06-15 13:58:23,404][1652491] Updated weights for policy 0, policy_version 197882 (0.0013) [2024-06-15 13:58:25,962][1648985] Fps is (10 sec: 42567.3, 60 sec: 46415.8, 300 sec: 47207.0). Total num frames: 405307392. Throughput: 0: 11717.2. Samples: 101390848. Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:58:25,963][1648985] Avg episode reward: [(0, '140.030')] [2024-06-15 13:58:26,555][1652491] Updated weights for policy 0, policy_version 197936 (0.0012) [2024-06-15 13:58:27,695][1652491] Updated weights for policy 0, policy_version 197970 (0.0013) [2024-06-15 13:58:29,899][1652491] Updated weights for policy 0, policy_version 198033 (0.0012) [2024-06-15 13:58:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 405667840. 
Throughput: 0: 11889.8. Samples: 101461504. Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:58:30,956][1648985] Avg episode reward: [(0, '133.750')] [2024-06-15 13:58:32,833][1652491] Updated weights for policy 0, policy_version 198096 (0.0013) [2024-06-15 13:58:35,955][1648985] Fps is (10 sec: 49188.2, 60 sec: 45875.3, 300 sec: 47430.3). Total num frames: 405798912. Throughput: 0: 11855.6. Samples: 101535232. Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:58:35,955][1648985] Avg episode reward: [(0, '122.490')] [2024-06-15 13:58:36,748][1652491] Updated weights for policy 0, policy_version 198145 (0.0047) [2024-06-15 13:58:39,204][1652491] Updated weights for policy 0, policy_version 198224 (0.0013) [2024-06-15 13:58:40,469][1652491] Updated weights for policy 0, policy_version 198272 (0.0116) [2024-06-15 13:58:40,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 47513.4, 300 sec: 47541.3). Total num frames: 406061056. Throughput: 0: 11946.6. Samples: 101571584. Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:58:40,956][1648985] Avg episode reward: [(0, '120.680')] [2024-06-15 13:58:41,970][1652491] Updated weights for policy 0, policy_version 198330 (0.0013) [2024-06-15 13:58:45,840][1652491] Updated weights for policy 0, policy_version 198392 (0.0014) [2024-06-15 13:58:45,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 406290432. Throughput: 0: 11639.4. Samples: 101632512. Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:58:45,955][1648985] Avg episode reward: [(0, '133.710')] [2024-06-15 13:58:49,431][1652491] Updated weights for policy 0, policy_version 198433 (0.0012) [2024-06-15 13:58:50,620][1652491] Updated weights for policy 0, policy_version 198480 (0.0011) [2024-06-15 13:58:50,955][1648985] Fps is (10 sec: 42600.1, 60 sec: 46967.5, 300 sec: 47208.2). Total num frames: 406487040. Throughput: 0: 11764.7. Samples: 101708288. 
Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:58:50,955][1648985] Avg episode reward: [(0, '135.040')] [2024-06-15 13:58:52,063][1651469] Signal inference workers to stop experience collection... (10400 times) [2024-06-15 13:58:52,118][1652491] InferenceWorker_p0-w0: stopping experience collection (10400 times) [2024-06-15 13:58:52,122][1652491] Updated weights for policy 0, policy_version 198530 (0.0014) [2024-06-15 13:58:52,342][1651469] Signal inference workers to resume experience collection... (10400 times) [2024-06-15 13:58:52,343][1652491] InferenceWorker_p0-w0: resuming experience collection (10400 times) [2024-06-15 13:58:53,521][1652491] Updated weights for policy 0, policy_version 198592 (0.0012) [2024-06-15 13:58:55,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.5, 300 sec: 47652.5). Total num frames: 406749184. Throughput: 0: 11628.1. Samples: 101733888. Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:58:55,956][1648985] Avg episode reward: [(0, '129.520')] [2024-06-15 13:58:56,732][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000198640_406814720.pth... [2024-06-15 13:58:56,781][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000193072_395411456.pth [2024-06-15 13:58:57,017][1652491] Updated weights for policy 0, policy_version 198649 (0.0104) [2024-06-15 13:59:00,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 46421.2, 300 sec: 46986.0). Total num frames: 406880256. Throughput: 0: 11537.1. Samples: 101808128. 
Policy #0 lag: (min: 93.0, avg: 200.1, max: 319.0) [2024-06-15 13:59:00,956][1648985] Avg episode reward: [(0, '113.250')] [2024-06-15 13:59:01,168][1652491] Updated weights for policy 0, policy_version 198688 (0.0012) [2024-06-15 13:59:02,598][1652491] Updated weights for policy 0, policy_version 198740 (0.0014) [2024-06-15 13:59:04,172][1652491] Updated weights for policy 0, policy_version 198803 (0.0029) [2024-06-15 13:59:05,957][1648985] Fps is (10 sec: 49141.4, 60 sec: 46965.8, 300 sec: 47541.1). Total num frames: 407240704. Throughput: 0: 11411.4. Samples: 101869568. Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 13:59:05,958][1648985] Avg episode reward: [(0, '119.500')] [2024-06-15 13:59:07,814][1652491] Updated weights for policy 0, policy_version 198865 (0.0016) [2024-06-15 13:59:09,044][1652491] Updated weights for policy 0, policy_version 198912 (0.0015) [2024-06-15 13:59:10,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 407371776. Throughput: 0: 11436.5. Samples: 101905408. Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 13:59:10,955][1648985] Avg episode reward: [(0, '128.510')] [2024-06-15 13:59:13,281][1652491] Updated weights for policy 0, policy_version 198964 (0.0012) [2024-06-15 13:59:15,025][1652491] Updated weights for policy 0, policy_version 199032 (0.0013) [2024-06-15 13:59:15,846][1652491] Updated weights for policy 0, policy_version 199059 (0.0012) [2024-06-15 13:59:15,955][1648985] Fps is (10 sec: 42608.1, 60 sec: 46421.4, 300 sec: 47541.4). Total num frames: 407666688. Throughput: 0: 11332.3. Samples: 101971456. 
Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 13:59:15,956][1648985] Avg episode reward: [(0, '141.300')] [2024-06-15 13:59:16,877][1652491] Updated weights for policy 0, policy_version 199104 (0.0014) [2024-06-15 13:59:20,182][1652491] Updated weights for policy 0, policy_version 199164 (0.0013) [2024-06-15 13:59:20,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 47430.3). Total num frames: 407896064. Throughput: 0: 11173.0. Samples: 102038016. Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 13:59:20,956][1648985] Avg episode reward: [(0, '140.460')] [2024-06-15 13:59:25,255][1652491] Updated weights for policy 0, policy_version 199216 (0.0011) [2024-06-15 13:59:25,955][1648985] Fps is (10 sec: 36044.1, 60 sec: 45334.5, 300 sec: 47097.0). Total num frames: 408027136. Throughput: 0: 11264.0. Samples: 102078464. Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 13:59:25,956][1648985] Avg episode reward: [(0, '147.310')] [2024-06-15 13:59:27,009][1652491] Updated weights for policy 0, policy_version 199293 (0.0011) [2024-06-15 13:59:29,081][1652491] Updated weights for policy 0, policy_version 199355 (0.0012) [2024-06-15 13:59:30,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 47208.8). Total num frames: 408322048. Throughput: 0: 11138.8. Samples: 102133760. Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 13:59:30,956][1648985] Avg episode reward: [(0, '137.530')] [2024-06-15 13:59:31,980][1652491] Updated weights for policy 0, policy_version 199424 (0.0119) [2024-06-15 13:59:35,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 43690.7, 300 sec: 46652.8). Total num frames: 408420352. Throughput: 0: 11070.6. Samples: 102206464. 
Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 13:59:35,956][1648985] Avg episode reward: [(0, '120.530')] [2024-06-15 13:59:37,834][1652491] Updated weights for policy 0, policy_version 199488 (0.0015) [2024-06-15 13:59:37,991][1651469] Signal inference workers to stop experience collection... (10450 times) [2024-06-15 13:59:38,032][1652491] InferenceWorker_p0-w0: stopping experience collection (10450 times) [2024-06-15 13:59:38,222][1651469] Signal inference workers to resume experience collection... (10450 times) [2024-06-15 13:59:38,234][1652491] InferenceWorker_p0-w0: resuming experience collection (10450 times) [2024-06-15 13:59:39,317][1652491] Updated weights for policy 0, policy_version 199543 (0.0017) [2024-06-15 13:59:40,900][1652491] Updated weights for policy 0, policy_version 199607 (0.0013) [2024-06-15 13:59:40,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45329.3, 300 sec: 47208.2). Total num frames: 408780800. Throughput: 0: 11184.4. Samples: 102237184. Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 13:59:40,955][1648985] Avg episode reward: [(0, '122.850')] [2024-06-15 13:59:43,778][1652491] Updated weights for policy 0, policy_version 199664 (0.0012) [2024-06-15 13:59:45,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 44236.8, 300 sec: 46652.7). Total num frames: 408944640. Throughput: 0: 10979.6. Samples: 102302208. Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 13:59:45,956][1648985] Avg episode reward: [(0, '121.510')] [2024-06-15 13:59:49,056][1652491] Updated weights for policy 0, policy_version 199728 (0.0012) [2024-06-15 13:59:50,566][1652491] Updated weights for policy 0, policy_version 199797 (0.0013) [2024-06-15 13:59:50,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45328.9, 300 sec: 46875.0). Total num frames: 409206784. Throughput: 0: 11150.8. Samples: 102371328. 
Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 13:59:50,956][1648985] Avg episode reward: [(0, '124.210')] [2024-06-15 13:59:52,222][1652491] Updated weights for policy 0, policy_version 199856 (0.0010) [2024-06-15 13:59:54,944][1652491] Updated weights for policy 0, policy_version 199936 (0.0016) [2024-06-15 13:59:55,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 45329.0, 300 sec: 46652.8). Total num frames: 409468928. Throughput: 0: 11138.8. Samples: 102406656. Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 13:59:55,956][1648985] Avg episode reward: [(0, '132.140')] [2024-06-15 14:00:00,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 44783.0, 300 sec: 46652.7). Total num frames: 409567232. Throughput: 0: 11332.2. Samples: 102481408. Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 14:00:00,956][1648985] Avg episode reward: [(0, '134.670')] [2024-06-15 14:00:01,450][1652491] Updated weights for policy 0, policy_version 200020 (0.0137) [2024-06-15 14:00:02,737][1652491] Updated weights for policy 0, policy_version 200065 (0.0062) [2024-06-15 14:00:04,052][1652491] Updated weights for policy 0, policy_version 200127 (0.0033) [2024-06-15 14:00:05,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 44784.6, 300 sec: 46763.9). Total num frames: 409927680. Throughput: 0: 11275.4. Samples: 102545408. Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 14:00:05,956][1648985] Avg episode reward: [(0, '142.760')] [2024-06-15 14:00:06,093][1652491] Updated weights for policy 0, policy_version 200177 (0.0140) [2024-06-15 14:00:10,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 44782.9, 300 sec: 46874.9). Total num frames: 410058752. Throughput: 0: 11275.4. Samples: 102585856. 
Policy #0 lag: (min: 78.0, avg: 189.5, max: 329.0) [2024-06-15 14:00:10,956][1648985] Avg episode reward: [(0, '149.570')] [2024-06-15 14:00:11,858][1652491] Updated weights for policy 0, policy_version 200259 (0.0013) [2024-06-15 14:00:13,104][1652491] Updated weights for policy 0, policy_version 200316 (0.0011) [2024-06-15 14:00:14,634][1652491] Updated weights for policy 0, policy_version 200380 (0.0013) [2024-06-15 14:00:15,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45329.0, 300 sec: 46652.7). Total num frames: 410386432. Throughput: 0: 11537.1. Samples: 102652928. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:00:15,956][1648985] Avg episode reward: [(0, '158.680')] [2024-06-15 14:00:17,171][1652491] Updated weights for policy 0, policy_version 200442 (0.0012) [2024-06-15 14:00:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 43690.6, 300 sec: 46652.7). Total num frames: 410517504. Throughput: 0: 11605.3. Samples: 102728704. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:00:20,956][1648985] Avg episode reward: [(0, '146.080')] [2024-06-15 14:00:21,211][1651469] Signal inference workers to stop experience collection... (10500 times) [2024-06-15 14:00:21,284][1652491] InferenceWorker_p0-w0: stopping experience collection (10500 times) [2024-06-15 14:00:21,383][1651469] Signal inference workers to resume experience collection... (10500 times) [2024-06-15 14:00:21,384][1652491] InferenceWorker_p0-w0: resuming experience collection (10500 times) [2024-06-15 14:00:22,003][1652491] Updated weights for policy 0, policy_version 200503 (0.0115) [2024-06-15 14:00:24,130][1652491] Updated weights for policy 0, policy_version 200564 (0.0012) [2024-06-15 14:00:25,134][1652491] Updated weights for policy 0, policy_version 200592 (0.0013) [2024-06-15 14:00:25,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.7, 300 sec: 46874.9). Total num frames: 410877952. Throughput: 0: 11605.3. Samples: 102759424. 
Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:00:25,956][1648985] Avg episode reward: [(0, '139.130')] [2024-06-15 14:00:26,156][1652491] Updated weights for policy 0, policy_version 200637 (0.0013) [2024-06-15 14:00:28,666][1652491] Updated weights for policy 0, policy_version 200704 (0.0119) [2024-06-15 14:00:30,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45329.1, 300 sec: 46652.7). Total num frames: 411041792. Throughput: 0: 11764.6. Samples: 102831616. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:00:30,956][1648985] Avg episode reward: [(0, '137.240')] [2024-06-15 14:00:32,782][1652491] Updated weights for policy 0, policy_version 200762 (0.0043) [2024-06-15 14:00:35,226][1652491] Updated weights for policy 0, policy_version 200802 (0.0028) [2024-06-15 14:00:35,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 411303936. Throughput: 0: 11810.2. Samples: 102902784. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:00:35,956][1648985] Avg episode reward: [(0, '120.140')] [2024-06-15 14:00:36,238][1652491] Updated weights for policy 0, policy_version 200837 (0.0015) [2024-06-15 14:00:37,466][1652491] Updated weights for policy 0, policy_version 200896 (0.0012) [2024-06-15 14:00:39,772][1652491] Updated weights for policy 0, policy_version 200958 (0.0115) [2024-06-15 14:00:40,955][1648985] Fps is (10 sec: 52426.9, 60 sec: 46421.0, 300 sec: 46652.7). Total num frames: 411566080. Throughput: 0: 11730.4. Samples: 102934528. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:00:40,956][1648985] Avg episode reward: [(0, '125.570')] [2024-06-15 14:00:44,309][1652491] Updated weights for policy 0, policy_version 201017 (0.0013) [2024-06-15 14:00:45,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 411762688. Throughput: 0: 11639.5. Samples: 103005184. 
Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:00:45,956][1648985] Avg episode reward: [(0, '131.060')] [2024-06-15 14:00:46,575][1652491] Updated weights for policy 0, policy_version 201077 (0.0013) [2024-06-15 14:00:47,862][1652491] Updated weights for policy 0, policy_version 201104 (0.0028) [2024-06-15 14:00:50,141][1652491] Updated weights for policy 0, policy_version 201153 (0.0013) [2024-06-15 14:00:50,955][1648985] Fps is (10 sec: 45876.9, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 412024832. Throughput: 0: 11707.7. Samples: 103072256. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:00:50,956][1648985] Avg episode reward: [(0, '125.080')] [2024-06-15 14:00:51,144][1652491] Updated weights for policy 0, policy_version 201201 (0.0028) [2024-06-15 14:00:55,326][1652491] Updated weights for policy 0, policy_version 201249 (0.0013) [2024-06-15 14:00:55,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 412221440. Throughput: 0: 11707.7. Samples: 103112704. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:00:55,956][1648985] Avg episode reward: [(0, '121.010')] [2024-06-15 14:00:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000201280_412221440.pth... [2024-06-15 14:00:56,040][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000195840_401080320.pth [2024-06-15 14:00:56,861][1652491] Updated weights for policy 0, policy_version 201296 (0.0012) [2024-06-15 14:00:58,349][1652491] Updated weights for policy 0, policy_version 201344 (0.0012) [2024-06-15 14:01:00,336][1652491] Updated weights for policy 0, policy_version 201402 (0.0015) [2024-06-15 14:01:00,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 48605.7, 300 sec: 46430.6). Total num frames: 412483584. Throughput: 0: 11593.9. Samples: 103174656. 
Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:01:00,956][1648985] Avg episode reward: [(0, '122.950')] [2024-06-15 14:01:01,836][1652491] Updated weights for policy 0, policy_version 201432 (0.0014) [2024-06-15 14:01:05,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 44782.9, 300 sec: 46541.7). Total num frames: 412614656. Throughput: 0: 11571.2. Samples: 103249408. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:01:05,956][1648985] Avg episode reward: [(0, '122.280')] [2024-06-15 14:01:06,582][1652491] Updated weights for policy 0, policy_version 201504 (0.0016) [2024-06-15 14:01:08,443][1651469] Signal inference workers to stop experience collection... (10550 times) [2024-06-15 14:01:08,549][1652491] InferenceWorker_p0-w0: stopping experience collection (10550 times) [2024-06-15 14:01:08,553][1652491] Updated weights for policy 0, policy_version 201543 (0.0013) [2024-06-15 14:01:08,724][1651469] Signal inference workers to resume experience collection... (10550 times) [2024-06-15 14:01:08,727][1652491] InferenceWorker_p0-w0: resuming experience collection (10550 times) [2024-06-15 14:01:09,992][1652491] Updated weights for policy 0, policy_version 201595 (0.0013) [2024-06-15 14:01:10,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 412909568. Throughput: 0: 11548.4. Samples: 103279104. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:01:10,956][1648985] Avg episode reward: [(0, '130.430')] [2024-06-15 14:01:11,804][1652491] Updated weights for policy 0, policy_version 201653 (0.0012) [2024-06-15 14:01:13,008][1652491] Updated weights for policy 0, policy_version 201680 (0.0012) [2024-06-15 14:01:13,947][1652491] Updated weights for policy 0, policy_version 201721 (0.0013) [2024-06-15 14:01:15,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 413138944. Throughput: 0: 11480.2. Samples: 103348224. 
Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 14:01:15,956][1648985] Avg episode reward: [(0, '151.070')] [2024-06-15 14:01:18,297][1652491] Updated weights for policy 0, policy_version 201784 (0.0012) [2024-06-15 14:01:20,539][1652491] Updated weights for policy 0, policy_version 201827 (0.0052) [2024-06-15 14:01:20,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 47513.6, 300 sec: 46319.6). Total num frames: 413368320. Throughput: 0: 11525.7. Samples: 103421440. Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:01:20,956][1648985] Avg episode reward: [(0, '143.160')] [2024-06-15 14:01:22,439][1652491] Updated weights for policy 0, policy_version 201880 (0.0014) [2024-06-15 14:01:24,882][1652491] Updated weights for policy 0, policy_version 201971 (0.0013) [2024-06-15 14:01:25,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46421.3, 300 sec: 46652.8). Total num frames: 413663232. Throughput: 0: 11537.2. Samples: 103453696. Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:01:25,956][1648985] Avg episode reward: [(0, '131.850')] [2024-06-15 14:01:28,212][1652491] Updated weights for policy 0, policy_version 202001 (0.0011) [2024-06-15 14:01:30,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 413794304. Throughput: 0: 11605.3. Samples: 103527424. Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:01:30,956][1648985] Avg episode reward: [(0, '119.150')] [2024-06-15 14:01:31,011][1652491] Updated weights for policy 0, policy_version 202052 (0.0015) [2024-06-15 14:01:32,380][1652491] Updated weights for policy 0, policy_version 202109 (0.0014) [2024-06-15 14:01:34,685][1652491] Updated weights for policy 0, policy_version 202176 (0.0092) [2024-06-15 14:01:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 414154752. Throughput: 0: 11468.8. Samples: 103588352. 
Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:01:35,956][1648985] Avg episode reward: [(0, '129.690')] [2024-06-15 14:01:36,216][1652491] Updated weights for policy 0, policy_version 202240 (0.0012) [2024-06-15 14:01:40,828][1652491] Updated weights for policy 0, policy_version 202296 (0.0014) [2024-06-15 14:01:40,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45329.3, 300 sec: 46430.6). Total num frames: 414285824. Throughput: 0: 11457.4. Samples: 103628288. Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:01:40,956][1648985] Avg episode reward: [(0, '146.150')] [2024-06-15 14:01:43,487][1652491] Updated weights for policy 0, policy_version 202336 (0.0012) [2024-06-15 14:01:45,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 414515200. Throughput: 0: 11628.1. Samples: 103697920. Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:01:45,956][1648985] Avg episode reward: [(0, '149.340')] [2024-06-15 14:01:45,961][1652491] Updated weights for policy 0, policy_version 202416 (0.0016) [2024-06-15 14:01:47,516][1652491] Updated weights for policy 0, policy_version 202480 (0.0012) [2024-06-15 14:01:50,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 414744576. Throughput: 0: 11616.7. Samples: 103772160. Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:01:50,955][1648985] Avg episode reward: [(0, '130.600')] [2024-06-15 14:01:51,812][1652491] Updated weights for policy 0, policy_version 202556 (0.0013) [2024-06-15 14:01:53,201][1651469] Signal inference workers to stop experience collection... (10600 times) [2024-06-15 14:01:53,237][1652491] InferenceWorker_p0-w0: stopping experience collection (10600 times) [2024-06-15 14:01:53,420][1651469] Signal inference workers to resume experience collection... 
(10600 times) [2024-06-15 14:01:53,420][1652491] InferenceWorker_p0-w0: resuming experience collection (10600 times) [2024-06-15 14:01:54,399][1652491] Updated weights for policy 0, policy_version 202624 (0.0149) [2024-06-15 14:01:55,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.3, 300 sec: 45986.3). Total num frames: 414973952. Throughput: 0: 11639.5. Samples: 103802880. Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:01:55,955][1648985] Avg episode reward: [(0, '116.140')] [2024-06-15 14:01:57,889][1652491] Updated weights for policy 0, policy_version 202694 (0.0013) [2024-06-15 14:01:58,789][1652491] Updated weights for policy 0, policy_version 202750 (0.0013) [2024-06-15 14:02:00,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 45875.2, 300 sec: 46658.0). Total num frames: 415236096. Throughput: 0: 11639.4. Samples: 103872000. Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:02:00,956][1648985] Avg episode reward: [(0, '129.060')] [2024-06-15 14:02:02,756][1652491] Updated weights for policy 0, policy_version 202816 (0.0012) [2024-06-15 14:02:05,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 415498240. Throughput: 0: 11707.7. Samples: 103948288. Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:02:05,956][1648985] Avg episode reward: [(0, '120.610')] [2024-06-15 14:02:07,121][1652491] Updated weights for policy 0, policy_version 202886 (0.0142) [2024-06-15 14:02:09,379][1652491] Updated weights for policy 0, policy_version 202976 (0.0212) [2024-06-15 14:02:10,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 415760384. Throughput: 0: 11696.4. Samples: 103980032. 
Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:02:10,956][1648985] Avg episode reward: [(0, '124.550')] [2024-06-15 14:02:13,124][1652491] Updated weights for policy 0, policy_version 203026 (0.0012) [2024-06-15 14:02:14,077][1652491] Updated weights for policy 0, policy_version 203067 (0.0015) [2024-06-15 14:02:15,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 415989760. Throughput: 0: 11776.0. Samples: 104057344. Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:02:15,956][1648985] Avg episode reward: [(0, '124.750')] [2024-06-15 14:02:16,026][1652491] Updated weights for policy 0, policy_version 203134 (0.0013) [2024-06-15 14:02:19,890][1652491] Updated weights for policy 0, policy_version 203203 (0.0015) [2024-06-15 14:02:20,956][1648985] Fps is (10 sec: 52422.3, 60 sec: 48604.8, 300 sec: 46652.6). Total num frames: 416284672. Throughput: 0: 11866.7. Samples: 104122368. Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:02:20,958][1648985] Avg episode reward: [(0, '137.180')] [2024-06-15 14:02:23,833][1652491] Updated weights for policy 0, policy_version 203280 (0.0013) [2024-06-15 14:02:24,945][1652491] Updated weights for policy 0, policy_version 203322 (0.0020) [2024-06-15 14:02:25,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 416415744. Throughput: 0: 11923.9. Samples: 104164864. Policy #0 lag: (min: 10.0, avg: 112.7, max: 266.0) [2024-06-15 14:02:25,956][1648985] Avg episode reward: [(0, '125.230')] [2024-06-15 14:02:26,885][1652491] Updated weights for policy 0, policy_version 203387 (0.0027) [2024-06-15 14:02:30,869][1652491] Updated weights for policy 0, policy_version 203461 (0.0081) [2024-06-15 14:02:30,955][1648985] Fps is (10 sec: 39326.5, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 416677888. Throughput: 0: 11969.4. Samples: 104236544. 
Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:02:30,956][1648985] Avg episode reward: [(0, '132.430')] [2024-06-15 14:02:34,545][1652491] Updated weights for policy 0, policy_version 203536 (0.0012) [2024-06-15 14:02:34,629][1651469] Signal inference workers to stop experience collection... (10650 times) [2024-06-15 14:02:34,680][1652491] InferenceWorker_p0-w0: stopping experience collection (10650 times) [2024-06-15 14:02:34,824][1651469] Signal inference workers to resume experience collection... (10650 times) [2024-06-15 14:02:34,825][1652491] InferenceWorker_p0-w0: resuming experience collection (10650 times) [2024-06-15 14:02:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 416940032. Throughput: 0: 11867.0. Samples: 104306176. Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:02:35,956][1648985] Avg episode reward: [(0, '134.080')] [2024-06-15 14:02:36,938][1652491] Updated weights for policy 0, policy_version 203600 (0.0023) [2024-06-15 14:02:40,732][1652491] Updated weights for policy 0, policy_version 203649 (0.0013) [2024-06-15 14:02:40,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 417103872. Throughput: 0: 12014.9. Samples: 104343552. Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:02:40,956][1648985] Avg episode reward: [(0, '117.650')] [2024-06-15 14:02:42,588][1652491] Updated weights for policy 0, policy_version 203744 (0.0151) [2024-06-15 14:02:44,898][1652491] Updated weights for policy 0, policy_version 203783 (0.0020) [2024-06-15 14:02:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 49151.9, 300 sec: 46763.8). Total num frames: 417464320. Throughput: 0: 12071.8. Samples: 104415232. 
Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:02:45,956][1648985] Avg episode reward: [(0, '121.900')] [2024-06-15 14:02:46,028][1652491] Updated weights for policy 0, policy_version 203840 (0.0092) [2024-06-15 14:02:49,074][1652491] Updated weights for policy 0, policy_version 203904 (0.0011) [2024-06-15 14:02:50,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 417595392. Throughput: 0: 11980.8. Samples: 104487424. Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:02:50,955][1648985] Avg episode reward: [(0, '142.590')] [2024-06-15 14:02:53,283][1652491] Updated weights for policy 0, policy_version 203984 (0.0012) [2024-06-15 14:02:54,272][1652491] Updated weights for policy 0, policy_version 204032 (0.0019) [2024-06-15 14:02:55,955][1648985] Fps is (10 sec: 39320.4, 60 sec: 48059.4, 300 sec: 46652.7). Total num frames: 417857536. Throughput: 0: 11912.4. Samples: 104516096. Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:02:55,956][1648985] Avg episode reward: [(0, '154.910')] [2024-06-15 14:02:56,600][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000204064_417923072.pth... [2024-06-15 14:02:56,751][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000198640_406814720.pth [2024-06-15 14:02:57,225][1652491] Updated weights for policy 0, policy_version 204088 (0.0016) [2024-06-15 14:03:00,822][1652491] Updated weights for policy 0, policy_version 204144 (0.0014) [2024-06-15 14:03:00,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 47513.7, 300 sec: 46319.5). Total num frames: 418086912. Throughput: 0: 11878.4. Samples: 104591872. 
Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:03:00,956][1648985] Avg episode reward: [(0, '141.310')] [2024-06-15 14:03:03,033][1652491] Updated weights for policy 0, policy_version 204178 (0.0021) [2024-06-15 14:03:04,049][1652491] Updated weights for policy 0, policy_version 204224 (0.0014) [2024-06-15 14:03:05,268][1652491] Updated weights for policy 0, policy_version 204286 (0.0019) [2024-06-15 14:03:05,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 48059.6, 300 sec: 46652.7). Total num frames: 418381824. Throughput: 0: 11924.2. Samples: 104658944. Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:03:05,957][1648985] Avg episode reward: [(0, '120.450')] [2024-06-15 14:03:07,403][1652491] Updated weights for policy 0, policy_version 204323 (0.0013) [2024-06-15 14:03:10,780][1652491] Updated weights for policy 0, policy_version 204368 (0.0011) [2024-06-15 14:03:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 418545664. Throughput: 0: 11787.4. Samples: 104695296. Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:03:10,956][1648985] Avg episode reward: [(0, '132.230')] [2024-06-15 14:03:12,026][1652491] Updated weights for policy 0, policy_version 204414 (0.0013) [2024-06-15 14:03:15,199][1652491] Updated weights for policy 0, policy_version 204480 (0.0012) [2024-06-15 14:03:15,919][1651469] Signal inference workers to stop experience collection... (10700 times) [2024-06-15 14:03:15,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 47513.6, 300 sec: 46430.6). Total num frames: 418840576. Throughput: 0: 11776.0. Samples: 104766464. Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:03:15,956][1648985] Avg episode reward: [(0, '152.040')] [2024-06-15 14:03:16,012][1652491] InferenceWorker_p0-w0: stopping experience collection (10700 times) [2024-06-15 14:03:16,116][1651469] Signal inference workers to resume experience collection... 
(10700 times) [2024-06-15 14:03:16,116][1652491] InferenceWorker_p0-w0: resuming experience collection (10700 times) [2024-06-15 14:03:16,319][1652491] Updated weights for policy 0, policy_version 204541 (0.0014) [2024-06-15 14:03:19,221][1652491] Updated weights for policy 0, policy_version 204597 (0.0017) [2024-06-15 14:03:20,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 45876.1, 300 sec: 46542.8). Total num frames: 419037184. Throughput: 0: 11867.0. Samples: 104840192. Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:03:20,956][1648985] Avg episode reward: [(0, '157.210')] [2024-06-15 14:03:22,199][1652491] Updated weights for policy 0, policy_version 204666 (0.0015) [2024-06-15 14:03:25,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 46967.4, 300 sec: 45986.3). Total num frames: 419233792. Throughput: 0: 11844.2. Samples: 104876544. Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:03:25,956][1648985] Avg episode reward: [(0, '159.970')] [2024-06-15 14:03:26,420][1652491] Updated weights for policy 0, policy_version 204721 (0.0014) [2024-06-15 14:03:27,912][1652491] Updated weights for policy 0, policy_version 204795 (0.0013) [2024-06-15 14:03:29,909][1652491] Updated weights for policy 0, policy_version 204848 (0.0055) [2024-06-15 14:03:30,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 419561472. Throughput: 0: 11707.8. Samples: 104942080. Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:03:30,956][1648985] Avg episode reward: [(0, '162.480')] [2024-06-15 14:03:32,870][1652491] Updated weights for policy 0, policy_version 204887 (0.0015) [2024-06-15 14:03:35,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 419692544. Throughput: 0: 11923.9. Samples: 105024000. 
Policy #0 lag: (min: 1.0, avg: 97.3, max: 257.0) [2024-06-15 14:03:35,955][1648985] Avg episode reward: [(0, '156.400')] [2024-06-15 14:03:36,474][1652491] Updated weights for policy 0, policy_version 204944 (0.0012) [2024-06-15 14:03:37,581][1652491] Updated weights for policy 0, policy_version 204983 (0.0012) [2024-06-15 14:03:39,178][1652491] Updated weights for policy 0, policy_version 205055 (0.0076) [2024-06-15 14:03:40,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 49698.1, 300 sec: 46763.8). Total num frames: 420085760. Throughput: 0: 11867.1. Samples: 105050112. Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:03:40,957][1648985] Avg episode reward: [(0, '130.510')] [2024-06-15 14:03:40,963][1652491] Updated weights for policy 0, policy_version 205120 (0.0013) [2024-06-15 14:03:44,897][1652491] Updated weights for policy 0, policy_version 205179 (0.0015) [2024-06-15 14:03:45,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.3, 300 sec: 46541.7). Total num frames: 420216832. Throughput: 0: 11741.9. Samples: 105120256. Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:03:45,955][1648985] Avg episode reward: [(0, '123.230')] [2024-06-15 14:03:48,513][1652491] Updated weights for policy 0, policy_version 205235 (0.0014) [2024-06-15 14:03:49,985][1652491] Updated weights for policy 0, policy_version 205296 (0.0011) [2024-06-15 14:03:50,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 48605.9, 300 sec: 46652.8). Total num frames: 420511744. Throughput: 0: 11776.1. Samples: 105188864. Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:03:50,955][1648985] Avg episode reward: [(0, '117.910')] [2024-06-15 14:03:51,200][1652491] Updated weights for policy 0, policy_version 205347 (0.0014) [2024-06-15 14:03:55,014][1652491] Updated weights for policy 0, policy_version 205380 (0.0013) [2024-06-15 14:03:55,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 47513.8, 300 sec: 46874.9). Total num frames: 420708352. 
Throughput: 0: 11821.5. Samples: 105227264. Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:03:55,956][1648985] Avg episode reward: [(0, '137.890')] [2024-06-15 14:03:58,472][1652491] Updated weights for policy 0, policy_version 205442 (0.0096) [2024-06-15 14:04:00,302][1651469] Signal inference workers to stop experience collection... (10750 times) [2024-06-15 14:04:00,368][1652491] InferenceWorker_p0-w0: stopping experience collection (10750 times) [2024-06-15 14:04:00,370][1652491] Updated weights for policy 0, policy_version 205506 (0.0012) [2024-06-15 14:04:00,616][1651469] Signal inference workers to resume experience collection... (10750 times) [2024-06-15 14:04:00,617][1652491] InferenceWorker_p0-w0: resuming experience collection (10750 times) [2024-06-15 14:04:00,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 46967.4, 300 sec: 46319.9). Total num frames: 420904960. Throughput: 0: 11821.5. Samples: 105298432. Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:04:00,956][1648985] Avg episode reward: [(0, '145.850')] [2024-06-15 14:04:01,996][1652491] Updated weights for policy 0, policy_version 205572 (0.0012) [2024-06-15 14:04:02,985][1652491] Updated weights for policy 0, policy_version 205627 (0.0026) [2024-06-15 14:04:05,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46421.5, 300 sec: 46763.8). Total num frames: 421167104. Throughput: 0: 11832.9. Samples: 105372672. Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:04:05,956][1648985] Avg episode reward: [(0, '135.260')] [2024-06-15 14:04:06,712][1652491] Updated weights for policy 0, policy_version 205692 (0.0014) [2024-06-15 14:04:10,684][1652491] Updated weights for policy 0, policy_version 205750 (0.0014) [2024-06-15 14:04:10,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 421396480. Throughput: 0: 11832.9. Samples: 105409024. 
Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:04:10,956][1648985] Avg episode reward: [(0, '131.820')] [2024-06-15 14:04:11,675][1652491] Updated weights for policy 0, policy_version 205778 (0.0011) [2024-06-15 14:04:13,527][1652491] Updated weights for policy 0, policy_version 205843 (0.0013) [2024-06-15 14:04:15,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 421658624. Throughput: 0: 11707.7. Samples: 105468928. Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:04:15,956][1648985] Avg episode reward: [(0, '136.080')] [2024-06-15 14:04:17,797][1652491] Updated weights for policy 0, policy_version 205907 (0.0015) [2024-06-15 14:04:20,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 421789696. Throughput: 0: 11593.9. Samples: 105545728. Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:04:20,956][1648985] Avg episode reward: [(0, '127.690')] [2024-06-15 14:04:21,599][1652491] Updated weights for policy 0, policy_version 205968 (0.0114) [2024-06-15 14:04:22,659][1652491] Updated weights for policy 0, policy_version 206015 (0.0013) [2024-06-15 14:04:23,976][1652491] Updated weights for policy 0, policy_version 206064 (0.0012) [2024-06-15 14:04:25,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48606.0, 300 sec: 46874.9). Total num frames: 422150144. Throughput: 0: 11707.7. Samples: 105576960. Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:04:25,956][1648985] Avg episode reward: [(0, '122.090')] [2024-06-15 14:04:29,360][1652491] Updated weights for policy 0, policy_version 206157 (0.0015) [2024-06-15 14:04:30,273][1652491] Updated weights for policy 0, policy_version 206200 (0.0014) [2024-06-15 14:04:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.0, 300 sec: 47097.0). Total num frames: 422313984. Throughput: 0: 11628.0. Samples: 105643520. 
Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:04:30,956][1648985] Avg episode reward: [(0, '127.580')] [2024-06-15 14:04:33,310][1652491] Updated weights for policy 0, policy_version 206240 (0.0013) [2024-06-15 14:04:34,717][1652491] Updated weights for policy 0, policy_version 206304 (0.0012) [2024-06-15 14:04:35,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 422576128. Throughput: 0: 11616.7. Samples: 105711616. Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:04:35,955][1648985] Avg episode reward: [(0, '136.050')] [2024-06-15 14:04:37,079][1652491] Updated weights for policy 0, policy_version 206393 (0.0098) [2024-06-15 14:04:40,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 44236.8, 300 sec: 46763.8). Total num frames: 422739968. Throughput: 0: 11514.3. Samples: 105745408. Policy #0 lag: (min: 31.0, avg: 106.3, max: 271.0) [2024-06-15 14:04:40,956][1648985] Avg episode reward: [(0, '137.760')] [2024-06-15 14:04:41,628][1652491] Updated weights for policy 0, policy_version 206460 (0.0013) [2024-06-15 14:04:44,430][1651469] Signal inference workers to stop experience collection... (10800 times) [2024-06-15 14:04:44,485][1652491] InferenceWorker_p0-w0: stopping experience collection (10800 times) [2024-06-15 14:04:44,673][1651469] Signal inference workers to resume experience collection... (10800 times) [2024-06-15 14:04:44,674][1652491] InferenceWorker_p0-w0: resuming experience collection (10800 times) [2024-06-15 14:04:45,458][1652491] Updated weights for policy 0, policy_version 206512 (0.0129) [2024-06-15 14:04:45,955][1648985] Fps is (10 sec: 39320.0, 60 sec: 45874.9, 300 sec: 46652.7). Total num frames: 422969344. Throughput: 0: 11639.4. Samples: 105822208. 
Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:04:45,957][1648985] Avg episode reward: [(0, '140.290')] [2024-06-15 14:04:46,750][1652491] Updated weights for policy 0, policy_version 206562 (0.0010) [2024-06-15 14:04:48,950][1652491] Updated weights for policy 0, policy_version 206646 (0.0023) [2024-06-15 14:04:50,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 45328.8, 300 sec: 46652.7). Total num frames: 423231488. Throughput: 0: 11355.0. Samples: 105883648. Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:04:50,956][1648985] Avg episode reward: [(0, '149.920')] [2024-06-15 14:04:52,412][1652491] Updated weights for policy 0, policy_version 206688 (0.0013) [2024-06-15 14:04:55,809][1652491] Updated weights for policy 0, policy_version 206725 (0.0013) [2024-06-15 14:04:55,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 44236.6, 300 sec: 46763.8). Total num frames: 423362560. Throughput: 0: 11354.9. Samples: 105920000. Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:04:55,956][1648985] Avg episode reward: [(0, '146.110')] [2024-06-15 14:04:56,554][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000206752_423428096.pth... [2024-06-15 14:04:56,695][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000201280_412221440.pth [2024-06-15 14:04:57,898][1652491] Updated weights for policy 0, policy_version 206801 (0.0013) [2024-06-15 14:04:59,311][1652491] Updated weights for policy 0, policy_version 206864 (0.0013) [2024-06-15 14:05:00,528][1652491] Updated weights for policy 0, policy_version 206909 (0.0025) [2024-06-15 14:05:00,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 47513.5, 300 sec: 46874.9). Total num frames: 423755776. Throughput: 0: 11434.6. Samples: 105983488. 
Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:05:00,956][1648985] Avg episode reward: [(0, '142.610')] [2024-06-15 14:05:04,520][1652491] Updated weights for policy 0, policy_version 206972 (0.0014) [2024-06-15 14:05:05,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 45329.0, 300 sec: 46874.9). Total num frames: 423886848. Throughput: 0: 11389.2. Samples: 106058240. Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:05:05,956][1648985] Avg episode reward: [(0, '135.320')] [2024-06-15 14:05:08,498][1652491] Updated weights for policy 0, policy_version 207040 (0.0014) [2024-06-15 14:05:10,030][1652491] Updated weights for policy 0, policy_version 207102 (0.0012) [2024-06-15 14:05:10,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 424148992. Throughput: 0: 11366.4. Samples: 106088448. Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:05:10,956][1648985] Avg episode reward: [(0, '141.890')] [2024-06-15 14:05:12,587][1652491] Updated weights for policy 0, policy_version 207162 (0.0139) [2024-06-15 14:05:15,816][1652491] Updated weights for policy 0, policy_version 207201 (0.0019) [2024-06-15 14:05:15,960][1648985] Fps is (10 sec: 45851.6, 60 sec: 44779.1, 300 sec: 46874.1). Total num frames: 424345600. Throughput: 0: 11467.5. Samples: 106159616. Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:05:15,961][1648985] Avg episode reward: [(0, '143.270')] [2024-06-15 14:05:19,069][1652491] Updated weights for policy 0, policy_version 207264 (0.0013) [2024-06-15 14:05:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 424607744. Throughput: 0: 11366.4. Samples: 106223104. 
Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:05:20,956][1648985] Avg episode reward: [(0, '158.320')] [2024-06-15 14:05:21,097][1652491] Updated weights for policy 0, policy_version 207344 (0.0118) [2024-06-15 14:05:23,204][1652491] Updated weights for policy 0, policy_version 207392 (0.0011) [2024-06-15 14:05:25,955][1648985] Fps is (10 sec: 45900.0, 60 sec: 44237.0, 300 sec: 46652.8). Total num frames: 424804352. Throughput: 0: 11309.6. Samples: 106254336. Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:05:25,955][1648985] Avg episode reward: [(0, '136.520')] [2024-06-15 14:05:26,515][1651469] Signal inference workers to stop experience collection... (10850 times) [2024-06-15 14:05:26,624][1652491] InferenceWorker_p0-w0: stopping experience collection (10850 times) [2024-06-15 14:05:26,716][1651469] Signal inference workers to resume experience collection... (10850 times) [2024-06-15 14:05:26,722][1652491] InferenceWorker_p0-w0: resuming experience collection (10850 times) [2024-06-15 14:05:27,358][1652491] Updated weights for policy 0, policy_version 207472 (0.0012) [2024-06-15 14:05:30,077][1652491] Updated weights for policy 0, policy_version 207520 (0.0013) [2024-06-15 14:05:30,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 425066496. Throughput: 0: 11423.4. Samples: 106336256. Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:05:30,956][1648985] Avg episode reward: [(0, '136.450')] [2024-06-15 14:05:31,632][1652491] Updated weights for policy 0, policy_version 207584 (0.0035) [2024-06-15 14:05:32,388][1652491] Updated weights for policy 0, policy_version 207616 (0.0013) [2024-06-15 14:05:35,291][1652491] Updated weights for policy 0, policy_version 207677 (0.0013) [2024-06-15 14:05:35,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 425328640. Throughput: 0: 11434.7. Samples: 106398208. 
Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:05:35,955][1648985] Avg episode reward: [(0, '125.940')] [2024-06-15 14:05:38,820][1652491] Updated weights for policy 0, policy_version 207728 (0.0012) [2024-06-15 14:05:40,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 45875.1, 300 sec: 46541.6). Total num frames: 425492480. Throughput: 0: 11514.3. Samples: 106438144. Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:05:40,956][1648985] Avg episode reward: [(0, '127.390')] [2024-06-15 14:05:41,844][1652491] Updated weights for policy 0, policy_version 207800 (0.0013) [2024-06-15 14:05:42,796][1652491] Updated weights for policy 0, policy_version 207844 (0.0012) [2024-06-15 14:05:45,459][1652491] Updated weights for policy 0, policy_version 207890 (0.0015) [2024-06-15 14:05:45,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.8, 300 sec: 46652.8). Total num frames: 425787392. Throughput: 0: 11673.7. Samples: 106508800. Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:05:45,955][1648985] Avg episode reward: [(0, '126.400')] [2024-06-15 14:05:46,264][1652491] Updated weights for policy 0, policy_version 207935 (0.0030) [2024-06-15 14:05:49,771][1652491] Updated weights for policy 0, policy_version 207987 (0.0013) [2024-06-15 14:05:50,955][1648985] Fps is (10 sec: 49153.2, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 425984000. Throughput: 0: 11719.1. Samples: 106585600. Policy #0 lag: (min: 7.0, avg: 85.2, max: 263.0) [2024-06-15 14:05:50,956][1648985] Avg episode reward: [(0, '137.120')] [2024-06-15 14:05:52,007][1652491] Updated weights for policy 0, policy_version 208032 (0.0022) [2024-06-15 14:05:53,647][1652491] Updated weights for policy 0, policy_version 208098 (0.0013) [2024-06-15 14:05:55,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 48060.0, 300 sec: 46652.8). Total num frames: 426246144. Throughput: 0: 11673.6. Samples: 106613760. 
Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:05:55,956][1648985] Avg episode reward: [(0, '151.920')] [2024-06-15 14:05:56,200][1652491] Updated weights for policy 0, policy_version 208129 (0.0059) [2024-06-15 14:05:57,403][1652491] Updated weights for policy 0, policy_version 208192 (0.0113) [2024-06-15 14:06:00,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 44783.0, 300 sec: 46874.9). Total num frames: 426442752. Throughput: 0: 11822.9. Samples: 106691584. Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:06:00,956][1648985] Avg episode reward: [(0, '145.440')] [2024-06-15 14:06:01,353][1652491] Updated weights for policy 0, policy_version 208256 (0.0013) [2024-06-15 14:06:04,729][1652491] Updated weights for policy 0, policy_version 208352 (0.0013) [2024-06-15 14:06:05,388][1652491] Updated weights for policy 0, policy_version 208383 (0.0012) [2024-06-15 14:06:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 426770432. Throughput: 0: 11844.3. Samples: 106756096. Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:06:05,956][1648985] Avg episode reward: [(0, '143.210')] [2024-06-15 14:06:07,779][1651469] Signal inference workers to stop experience collection... (10900 times) [2024-06-15 14:06:07,824][1652491] InferenceWorker_p0-w0: stopping experience collection (10900 times) [2024-06-15 14:06:08,024][1651469] Signal inference workers to resume experience collection... (10900 times) [2024-06-15 14:06:08,025][1652491] InferenceWorker_p0-w0: resuming experience collection (10900 times) [2024-06-15 14:06:08,469][1652491] Updated weights for policy 0, policy_version 208433 (0.0033) [2024-06-15 14:06:10,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 426901504. Throughput: 0: 11901.1. Samples: 106789888. 
Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:06:10,956][1648985] Avg episode reward: [(0, '147.770')] [2024-06-15 14:06:11,846][1652491] Updated weights for policy 0, policy_version 208480 (0.0014) [2024-06-15 14:06:14,408][1652491] Updated weights for policy 0, policy_version 208532 (0.0012) [2024-06-15 14:06:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 47517.8, 300 sec: 46874.9). Total num frames: 427196416. Throughput: 0: 11685.0. Samples: 106862080. Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:06:15,956][1648985] Avg episode reward: [(0, '144.770')] [2024-06-15 14:06:16,315][1652491] Updated weights for policy 0, policy_version 208614 (0.0012) [2024-06-15 14:06:18,172][1652491] Updated weights for policy 0, policy_version 208645 (0.0022) [2024-06-15 14:06:19,303][1652491] Updated weights for policy 0, policy_version 208694 (0.0017) [2024-06-15 14:06:20,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 427425792. Throughput: 0: 11958.0. Samples: 106936320. Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:06:20,956][1648985] Avg episode reward: [(0, '133.700')] [2024-06-15 14:06:22,366][1652491] Updated weights for policy 0, policy_version 208724 (0.0012) [2024-06-15 14:06:25,512][1652491] Updated weights for policy 0, policy_version 208787 (0.0014) [2024-06-15 14:06:25,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 46967.2, 300 sec: 46874.9). Total num frames: 427622400. Throughput: 0: 11832.9. Samples: 106970624. Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:06:25,956][1648985] Avg episode reward: [(0, '133.350')] [2024-06-15 14:06:27,449][1652491] Updated weights for policy 0, policy_version 208864 (0.0128) [2024-06-15 14:06:29,626][1652491] Updated weights for policy 0, policy_version 208912 (0.0013) [2024-06-15 14:06:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 427950080. 
Throughput: 0: 11764.6. Samples: 107038208. Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:06:30,956][1648985] Avg episode reward: [(0, '130.820')] [2024-06-15 14:06:33,695][1652491] Updated weights for policy 0, policy_version 208962 (0.0013) [2024-06-15 14:06:34,904][1652491] Updated weights for policy 0, policy_version 209024 (0.0012) [2024-06-15 14:06:35,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 428081152. Throughput: 0: 11639.5. Samples: 107109376. Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:06:35,956][1648985] Avg episode reward: [(0, '139.420')] [2024-06-15 14:06:38,197][1652491] Updated weights for policy 0, policy_version 209092 (0.0011) [2024-06-15 14:06:39,642][1652491] Updated weights for policy 0, policy_version 209152 (0.0125) [2024-06-15 14:06:40,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 47513.8, 300 sec: 46874.9). Total num frames: 428343296. Throughput: 0: 11605.3. Samples: 107136000. Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:06:40,955][1648985] Avg episode reward: [(0, '139.400')] [2024-06-15 14:06:41,913][1652491] Updated weights for policy 0, policy_version 209212 (0.0013) [2024-06-15 14:06:45,239][1652491] Updated weights for policy 0, policy_version 209251 (0.0016) [2024-06-15 14:06:45,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 46967.3, 300 sec: 46986.0). Total num frames: 428605440. Throughput: 0: 11628.1. Samples: 107214848. Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:06:45,956][1648985] Avg episode reward: [(0, '133.070')] [2024-06-15 14:06:48,364][1652491] Updated weights for policy 0, policy_version 209313 (0.0101) [2024-06-15 14:06:49,702][1651469] Signal inference workers to stop experience collection... 
(10950 times) [2024-06-15 14:06:49,763][1652491] Updated weights for policy 0, policy_version 209378 (0.0039) [2024-06-15 14:06:49,792][1652491] InferenceWorker_p0-w0: stopping experience collection (10950 times) [2024-06-15 14:06:49,933][1651469] Signal inference workers to resume experience collection... (10950 times) [2024-06-15 14:06:49,934][1652491] InferenceWorker_p0-w0: resuming experience collection (10950 times) [2024-06-15 14:06:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 428867584. Throughput: 0: 11707.7. Samples: 107282944. Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:06:50,955][1648985] Avg episode reward: [(0, '127.960')] [2024-06-15 14:06:51,837][1652491] Updated weights for policy 0, policy_version 209410 (0.0017) [2024-06-15 14:06:53,182][1652491] Updated weights for policy 0, policy_version 209470 (0.0013) [2024-06-15 14:06:55,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45875.1, 300 sec: 46652.8). Total num frames: 428998656. Throughput: 0: 11764.6. Samples: 107319296. Policy #0 lag: (min: 38.0, avg: 134.1, max: 294.0) [2024-06-15 14:06:55,956][1648985] Avg episode reward: [(0, '141.390')] [2024-06-15 14:06:56,546][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000209504_429064192.pth... [2024-06-15 14:06:56,754][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000204064_417923072.pth [2024-06-15 14:06:57,216][1652491] Updated weights for policy 0, policy_version 209525 (0.0013) [2024-06-15 14:06:59,821][1652491] Updated weights for policy 0, policy_version 209571 (0.0012) [2024-06-15 14:07:00,955][1648985] Fps is (10 sec: 42597.0, 60 sec: 47513.4, 300 sec: 46763.8). Total num frames: 429293568. Throughput: 0: 11753.2. Samples: 107390976. 
Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:07:00,956][1648985] Avg episode reward: [(0, '147.670')] [2024-06-15 14:07:01,074][1652491] Updated weights for policy 0, policy_version 209632 (0.0013) [2024-06-15 14:07:02,950][1652491] Updated weights for policy 0, policy_version 209665 (0.0010) [2024-06-15 14:07:04,400][1652491] Updated weights for policy 0, policy_version 209728 (0.0013) [2024-06-15 14:07:05,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 429522944. Throughput: 0: 11707.8. Samples: 107463168. Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:07:05,955][1648985] Avg episode reward: [(0, '146.500')] [2024-06-15 14:07:09,613][1652491] Updated weights for policy 0, policy_version 209794 (0.0014) [2024-06-15 14:07:10,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 429752320. Throughput: 0: 11764.6. Samples: 107500032. Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:07:10,956][1648985] Avg episode reward: [(0, '136.280')] [2024-06-15 14:07:11,417][1652491] Updated weights for policy 0, policy_version 209872 (0.0106) [2024-06-15 14:07:12,402][1652491] Updated weights for policy 0, policy_version 209917 (0.0132) [2024-06-15 14:07:15,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 46967.3, 300 sec: 46541.8). Total num frames: 430014464. Throughput: 0: 11776.0. Samples: 107568128. Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:07:15,956][1648985] Avg episode reward: [(0, '109.480')] [2024-06-15 14:07:16,131][1652491] Updated weights for policy 0, policy_version 209984 (0.0141) [2024-06-15 14:07:19,480][1652491] Updated weights for policy 0, policy_version 210040 (0.0012) [2024-06-15 14:07:20,891][1652491] Updated weights for policy 0, policy_version 210080 (0.0109) [2024-06-15 14:07:20,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 430243840. 
Throughput: 0: 11719.1. Samples: 107636736. Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:07:20,956][1648985] Avg episode reward: [(0, '125.050')] [2024-06-15 14:07:22,629][1652491] Updated weights for policy 0, policy_version 210160 (0.0018) [2024-06-15 14:07:25,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 46967.6, 300 sec: 46652.7). Total num frames: 430440448. Throughput: 0: 11798.8. Samples: 107666944. Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:07:25,956][1648985] Avg episode reward: [(0, '147.940')] [2024-06-15 14:07:26,851][1652491] Updated weights for policy 0, policy_version 210224 (0.0013) [2024-06-15 14:07:30,686][1652491] Updated weights for policy 0, policy_version 210288 (0.0012) [2024-06-15 14:07:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 430669824. Throughput: 0: 11776.0. Samples: 107744768. Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:07:30,956][1648985] Avg episode reward: [(0, '142.680')] [2024-06-15 14:07:32,507][1652491] Updated weights for policy 0, policy_version 210327 (0.0025) [2024-06-15 14:07:33,309][1651469] Signal inference workers to stop experience collection... (11000 times) [2024-06-15 14:07:33,367][1652491] InferenceWorker_p0-w0: stopping experience collection (11000 times) [2024-06-15 14:07:33,562][1651469] Signal inference workers to resume experience collection... (11000 times) [2024-06-15 14:07:33,563][1652491] InferenceWorker_p0-w0: resuming experience collection (11000 times) [2024-06-15 14:07:33,758][1652491] Updated weights for policy 0, policy_version 210391 (0.0012) [2024-06-15 14:07:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 430964736. Throughput: 0: 11719.1. Samples: 107810304. 
Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:07:35,956][1648985] Avg episode reward: [(0, '134.340')] [2024-06-15 14:07:37,420][1652491] Updated weights for policy 0, policy_version 210434 (0.0014) [2024-06-15 14:07:38,636][1652491] Updated weights for policy 0, policy_version 210487 (0.0012) [2024-06-15 14:07:40,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 431095808. Throughput: 0: 11707.7. Samples: 107846144. Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:07:40,956][1648985] Avg episode reward: [(0, '131.070')] [2024-06-15 14:07:42,085][1652491] Updated weights for policy 0, policy_version 210528 (0.0092) [2024-06-15 14:07:42,945][1652491] Updated weights for policy 0, policy_version 210556 (0.0036) [2024-06-15 14:07:44,543][1652491] Updated weights for policy 0, policy_version 210610 (0.0035) [2024-06-15 14:07:45,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.7, 300 sec: 46986.0). Total num frames: 431456256. Throughput: 0: 11457.5. Samples: 107906560. Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:07:45,956][1648985] Avg episode reward: [(0, '128.480')] [2024-06-15 14:07:46,033][1652491] Updated weights for policy 0, policy_version 210679 (0.0114) [2024-06-15 14:07:50,357][1652491] Updated weights for policy 0, policy_version 210745 (0.0016) [2024-06-15 14:07:50,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45875.0, 300 sec: 46652.8). Total num frames: 431620096. Throughput: 0: 11423.2. Samples: 107977216. Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:07:50,956][1648985] Avg episode reward: [(0, '127.210')] [2024-06-15 14:07:53,925][1652491] Updated weights for policy 0, policy_version 210787 (0.0012) [2024-06-15 14:07:55,668][1652491] Updated weights for policy 0, policy_version 210848 (0.0013) [2024-06-15 14:07:55,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 431816704. 
Throughput: 0: 11400.5. Samples: 108013056. Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:07:55,956][1648985] Avg episode reward: [(0, '131.870')] [2024-06-15 14:07:57,143][1652491] Updated weights for policy 0, policy_version 210912 (0.0014) [2024-06-15 14:08:00,682][1652491] Updated weights for policy 0, policy_version 210962 (0.0013) [2024-06-15 14:08:00,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 46421.6, 300 sec: 46430.6). Total num frames: 432078848. Throughput: 0: 11423.3. Samples: 108082176. Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:08:00,955][1648985] Avg episode reward: [(0, '142.750')] [2024-06-15 14:08:01,603][1652491] Updated weights for policy 0, policy_version 211008 (0.0018) [2024-06-15 14:08:05,532][1652491] Updated weights for policy 0, policy_version 211064 (0.0013) [2024-06-15 14:08:05,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.1, 300 sec: 46541.7). Total num frames: 432275456. Throughput: 0: 11548.5. Samples: 108156416. Policy #0 lag: (min: 47.0, avg: 140.2, max: 303.0) [2024-06-15 14:08:05,956][1648985] Avg episode reward: [(0, '144.630')] [2024-06-15 14:08:07,007][1652491] Updated weights for policy 0, policy_version 211122 (0.0012) [2024-06-15 14:08:08,501][1652491] Updated weights for policy 0, policy_version 211184 (0.0013) [2024-06-15 14:08:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.4, 300 sec: 46430.6). Total num frames: 432537600. Throughput: 0: 11457.4. Samples: 108182528. Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:08:10,956][1648985] Avg episode reward: [(0, '150.740')] [2024-06-15 14:08:13,129][1652491] Updated weights for policy 0, policy_version 211258 (0.0018) [2024-06-15 14:08:15,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 44783.1, 300 sec: 46319.5). Total num frames: 432701440. Throughput: 0: 11355.0. Samples: 108255744. 
Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:08:15,955][1648985] Avg episode reward: [(0, '137.770')] [2024-06-15 14:08:16,703][1652491] Updated weights for policy 0, policy_version 211314 (0.0014) [2024-06-15 14:08:17,133][1651469] Signal inference workers to stop experience collection... (11050 times) [2024-06-15 14:08:17,207][1652491] InferenceWorker_p0-w0: stopping experience collection (11050 times) [2024-06-15 14:08:17,330][1651469] Signal inference workers to resume experience collection... (11050 times) [2024-06-15 14:08:17,330][1652491] InferenceWorker_p0-w0: resuming experience collection (11050 times) [2024-06-15 14:08:17,332][1652491] Updated weights for policy 0, policy_version 211344 (0.0012) [2024-06-15 14:08:18,802][1652491] Updated weights for policy 0, policy_version 211400 (0.0012) [2024-06-15 14:08:20,956][1648985] Fps is (10 sec: 52426.2, 60 sec: 46967.1, 300 sec: 46874.8). Total num frames: 433061888. Throughput: 0: 11377.7. Samples: 108322304. Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:08:20,957][1648985] Avg episode reward: [(0, '130.600')] [2024-06-15 14:08:23,422][1652491] Updated weights for policy 0, policy_version 211457 (0.0020) [2024-06-15 14:08:25,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 433192960. Throughput: 0: 11423.3. Samples: 108360192. Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:08:25,956][1648985] Avg episode reward: [(0, '140.530')] [2024-06-15 14:08:27,109][1652491] Updated weights for policy 0, policy_version 211521 (0.0014) [2024-06-15 14:08:28,153][1652491] Updated weights for policy 0, policy_version 211572 (0.0013) [2024-06-15 14:08:29,343][1652491] Updated weights for policy 0, policy_version 211632 (0.0113) [2024-06-15 14:08:30,955][1648985] Fps is (10 sec: 45877.8, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 433520640. Throughput: 0: 11616.7. Samples: 108429312. 
Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:08:30,955][1648985] Avg episode reward: [(0, '155.040')] [2024-06-15 14:08:31,044][1652491] Updated weights for policy 0, policy_version 211696 (0.0087) [2024-06-15 14:08:35,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 433618944. Throughput: 0: 11514.4. Samples: 108495360. Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:08:35,956][1648985] Avg episode reward: [(0, '156.610')] [2024-06-15 14:08:36,502][1652491] Updated weights for policy 0, policy_version 211767 (0.0012) [2024-06-15 14:08:39,908][1652491] Updated weights for policy 0, policy_version 211824 (0.0014) [2024-06-15 14:08:40,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 433881088. Throughput: 0: 11650.9. Samples: 108537344. Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:08:40,956][1648985] Avg episode reward: [(0, '133.510')] [2024-06-15 14:08:41,156][1652491] Updated weights for policy 0, policy_version 211874 (0.0012) [2024-06-15 14:08:43,314][1652491] Updated weights for policy 0, policy_version 211963 (0.0018) [2024-06-15 14:08:45,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 44236.7, 300 sec: 46097.3). Total num frames: 434110464. Throughput: 0: 11423.3. Samples: 108596224. Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:08:45,956][1648985] Avg episode reward: [(0, '124.350')] [2024-06-15 14:08:48,283][1652491] Updated weights for policy 0, policy_version 212022 (0.0134) [2024-06-15 14:08:50,853][1652491] Updated weights for policy 0, policy_version 212064 (0.0010) [2024-06-15 14:08:50,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44783.1, 300 sec: 46097.4). Total num frames: 434307072. Throughput: 0: 11514.3. Samples: 108674560. 
Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:08:50,956][1648985] Avg episode reward: [(0, '125.130')] [2024-06-15 14:08:52,682][1652491] Updated weights for policy 0, policy_version 212144 (0.0100) [2024-06-15 14:08:54,466][1652491] Updated weights for policy 0, policy_version 212209 (0.0108) [2024-06-15 14:08:55,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 46967.3, 300 sec: 46541.6). Total num frames: 434634752. Throughput: 0: 11446.0. Samples: 108697600. Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:08:55,956][1648985] Avg episode reward: [(0, '129.650')] [2024-06-15 14:08:55,983][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000212224_434634752.pth... [2024-06-15 14:08:56,032][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000206752_423428096.pth [2024-06-15 14:08:59,587][1651469] Signal inference workers to stop experience collection... (11100 times) [2024-06-15 14:08:59,634][1652491] InferenceWorker_p0-w0: stopping experience collection (11100 times) [2024-06-15 14:08:59,846][1651469] Signal inference workers to resume experience collection... (11100 times) [2024-06-15 14:08:59,846][1652491] InferenceWorker_p0-w0: resuming experience collection (11100 times) [2024-06-15 14:09:00,192][1652491] Updated weights for policy 0, policy_version 212272 (0.0157) [2024-06-15 14:09:00,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 46097.3). Total num frames: 434765824. Throughput: 0: 11423.3. Samples: 108769792. 
Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:09:00,956][1648985] Avg episode reward: [(0, '132.210')] [2024-06-15 14:09:02,354][1652491] Updated weights for policy 0, policy_version 212323 (0.0011) [2024-06-15 14:09:03,502][1652491] Updated weights for policy 0, policy_version 212384 (0.0013) [2024-06-15 14:09:04,560][1652491] Updated weights for policy 0, policy_version 212417 (0.0014) [2024-06-15 14:09:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.5, 300 sec: 46652.7). Total num frames: 435159040. Throughput: 0: 11298.2. Samples: 108830720. Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:09:05,956][1648985] Avg episode reward: [(0, '149.160')] [2024-06-15 14:09:10,846][1652491] Updated weights for policy 0, policy_version 212481 (0.0013) [2024-06-15 14:09:10,964][1648985] Fps is (10 sec: 39285.6, 60 sec: 43684.0, 300 sec: 45762.7). Total num frames: 435159040. Throughput: 0: 11421.0. Samples: 108874240. Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:09:10,966][1648985] Avg episode reward: [(0, '142.910')] [2024-06-15 14:09:11,914][1652491] Updated weights for policy 0, policy_version 212537 (0.0013) [2024-06-15 14:09:13,315][1652491] Updated weights for policy 0, policy_version 212592 (0.0014) [2024-06-15 14:09:14,767][1652491] Updated weights for policy 0, policy_version 212664 (0.0131) [2024-06-15 14:09:15,908][1652491] Updated weights for policy 0, policy_version 212705 (0.0014) [2024-06-15 14:09:15,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 48605.8, 300 sec: 46874.9). Total num frames: 435617792. Throughput: 0: 11411.9. Samples: 108942848. Policy #0 lag: (min: 92.0, avg: 192.7, max: 330.0) [2024-06-15 14:09:15,956][1648985] Avg episode reward: [(0, '123.100')] [2024-06-15 14:09:20,955][1648985] Fps is (10 sec: 52477.1, 60 sec: 43691.0, 300 sec: 45875.2). Total num frames: 435683328. Throughput: 0: 11696.4. Samples: 109021696. 
Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:09:20,956][1648985] Avg episode reward: [(0, '135.000')] [2024-06-15 14:09:21,737][1652491] Updated weights for policy 0, policy_version 212752 (0.0013) [2024-06-15 14:09:24,012][1652491] Updated weights for policy 0, policy_version 212817 (0.0034) [2024-06-15 14:09:25,325][1652491] Updated weights for policy 0, policy_version 212885 (0.0013) [2024-06-15 14:09:25,958][1648985] Fps is (10 sec: 42592.9, 60 sec: 47512.6, 300 sec: 46541.5). Total num frames: 436043776. Throughput: 0: 11582.2. Samples: 109058560. Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:09:25,959][1648985] Avg episode reward: [(0, '143.970')] [2024-06-15 14:09:27,178][1652491] Updated weights for policy 0, policy_version 212960 (0.0015) [2024-06-15 14:09:30,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 436207616. Throughput: 0: 11855.7. Samples: 109129728. Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:09:30,956][1648985] Avg episode reward: [(0, '129.880')] [2024-06-15 14:09:33,389][1652491] Updated weights for policy 0, policy_version 213024 (0.0168) [2024-06-15 14:09:35,122][1652491] Updated weights for policy 0, policy_version 213088 (0.0012) [2024-06-15 14:09:35,955][1648985] Fps is (10 sec: 42604.0, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 436469760. Throughput: 0: 11582.6. Samples: 109195776. Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:09:35,956][1648985] Avg episode reward: [(0, '128.110')] [2024-06-15 14:09:36,556][1651469] Signal inference workers to stop experience collection... (11150 times) [2024-06-15 14:09:36,610][1652491] InferenceWorker_p0-w0: stopping experience collection (11150 times) [2024-06-15 14:09:36,761][1651469] Signal inference workers to resume experience collection... 
(11150 times) [2024-06-15 14:09:36,762][1652491] InferenceWorker_p0-w0: resuming experience collection (11150 times) [2024-06-15 14:09:36,887][1652491] Updated weights for policy 0, policy_version 213169 (0.0013) [2024-06-15 14:09:38,307][1652491] Updated weights for policy 0, policy_version 213232 (0.0012) [2024-06-15 14:09:40,957][1648985] Fps is (10 sec: 52417.3, 60 sec: 47511.9, 300 sec: 46652.5). Total num frames: 436731904. Throughput: 0: 11673.1. Samples: 109222912. Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:09:40,958][1648985] Avg episode reward: [(0, '127.180')] [2024-06-15 14:09:44,728][1652491] Updated weights for policy 0, policy_version 213265 (0.0013) [2024-06-15 14:09:45,634][1652491] Updated weights for policy 0, policy_version 213312 (0.0012) [2024-06-15 14:09:45,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 436862976. Throughput: 0: 11923.9. Samples: 109306368. Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:09:45,956][1648985] Avg episode reward: [(0, '132.570')] [2024-06-15 14:09:47,532][1652491] Updated weights for policy 0, policy_version 213376 (0.0141) [2024-06-15 14:09:48,709][1652491] Updated weights for policy 0, policy_version 213426 (0.0013) [2024-06-15 14:09:50,092][1652491] Updated weights for policy 0, policy_version 213491 (0.0012) [2024-06-15 14:09:50,955][1648985] Fps is (10 sec: 52440.3, 60 sec: 49152.0, 300 sec: 47097.1). Total num frames: 437256192. Throughput: 0: 11889.9. Samples: 109365760. Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:09:50,956][1648985] Avg episode reward: [(0, '144.990')] [2024-06-15 14:09:55,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 44237.0, 300 sec: 45875.2). Total num frames: 437288960. Throughput: 0: 11880.8. Samples: 109408768. 
Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:09:55,956][1648985] Avg episode reward: [(0, '163.520')] [2024-06-15 14:09:56,086][1652491] Updated weights for policy 0, policy_version 213536 (0.0011) [2024-06-15 14:09:57,360][1652491] Updated weights for policy 0, policy_version 213584 (0.0060) [2024-06-15 14:09:59,195][1652491] Updated weights for policy 0, policy_version 213650 (0.0011) [2024-06-15 14:10:00,305][1652491] Updated weights for policy 0, policy_version 213712 (0.0011) [2024-06-15 14:10:00,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 49698.1, 300 sec: 46986.0). Total num frames: 437747712. Throughput: 0: 11798.7. Samples: 109473792. Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:10:00,956][1648985] Avg episode reward: [(0, '167.960')] [2024-06-15 14:10:05,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 43690.9, 300 sec: 46208.4). Total num frames: 437780480. Throughput: 0: 11912.5. Samples: 109557760. Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:10:05,956][1648985] Avg episode reward: [(0, '153.900')] [2024-06-15 14:10:06,696][1652491] Updated weights for policy 0, policy_version 213783 (0.0013) [2024-06-15 14:10:08,505][1652491] Updated weights for policy 0, policy_version 213856 (0.0012) [2024-06-15 14:10:09,750][1652491] Updated weights for policy 0, policy_version 213906 (0.0012) [2024-06-15 14:10:10,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 50798.2, 300 sec: 46986.8). Total num frames: 438206464. Throughput: 0: 11651.2. Samples: 109582848. Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:10:10,955][1648985] Avg episode reward: [(0, '125.770')] [2024-06-15 14:10:11,109][1652491] Updated weights for policy 0, policy_version 213974 (0.0013) [2024-06-15 14:10:15,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 44782.8, 300 sec: 46430.6). Total num frames: 438304768. Throughput: 0: 11684.9. Samples: 109655552. 
Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:10:15,956][1648985] Avg episode reward: [(0, '142.750')] [2024-06-15 14:10:17,740][1652491] Updated weights for policy 0, policy_version 214048 (0.0013) [2024-06-15 14:10:17,843][1651469] Signal inference workers to stop experience collection... (11200 times) [2024-06-15 14:10:17,915][1652491] InferenceWorker_p0-w0: stopping experience collection (11200 times) [2024-06-15 14:10:18,180][1651469] Signal inference workers to resume experience collection... (11200 times) [2024-06-15 14:10:18,181][1652491] InferenceWorker_p0-w0: resuming experience collection (11200 times) [2024-06-15 14:10:19,605][1652491] Updated weights for policy 0, policy_version 214113 (0.0165) [2024-06-15 14:10:20,729][1652491] Updated weights for policy 0, policy_version 214165 (0.0013) [2024-06-15 14:10:20,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 49152.0, 300 sec: 46874.9). Total num frames: 438632448. Throughput: 0: 11753.2. Samples: 109724672. Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:10:20,956][1648985] Avg episode reward: [(0, '148.670')] [2024-06-15 14:10:21,827][1652491] Updated weights for policy 0, policy_version 214210 (0.0091) [2024-06-15 14:10:25,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46422.3, 300 sec: 46652.7). Total num frames: 438829056. Throughput: 0: 11867.6. Samples: 109756928. Policy #0 lag: (min: 42.0, avg: 202.6, max: 298.0) [2024-06-15 14:10:25,956][1648985] Avg episode reward: [(0, '146.900')] [2024-06-15 14:10:28,424][1652491] Updated weights for policy 0, policy_version 214275 (0.0050) [2024-06-15 14:10:29,927][1652491] Updated weights for policy 0, policy_version 214339 (0.0013) [2024-06-15 14:10:30,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 47513.4, 300 sec: 46541.6). Total num frames: 439058432. Throughput: 0: 11753.2. Samples: 109835264. 
Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:10:30,956][1648985] Avg episode reward: [(0, '134.040')] [2024-06-15 14:10:31,625][1652491] Updated weights for policy 0, policy_version 214421 (0.0014) [2024-06-15 14:10:33,783][1652491] Updated weights for policy 0, policy_version 214512 (0.0014) [2024-06-15 14:10:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 439353344. Throughput: 0: 11810.1. Samples: 109897216. Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:10:35,956][1648985] Avg episode reward: [(0, '131.930')] [2024-06-15 14:10:40,853][1652491] Updated weights for policy 0, policy_version 214582 (0.0011) [2024-06-15 14:10:40,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 45330.7, 300 sec: 46319.5). Total num frames: 439451648. Throughput: 0: 11844.3. Samples: 109941760. Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:10:40,956][1648985] Avg episode reward: [(0, '143.420')] [2024-06-15 14:10:42,932][1652491] Updated weights for policy 0, policy_version 214645 (0.0011) [2024-06-15 14:10:45,197][1652491] Updated weights for policy 0, policy_version 214736 (0.0159) [2024-06-15 14:10:45,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 49151.9, 300 sec: 46874.9). Total num frames: 439812096. Throughput: 0: 11525.7. Samples: 109992448. Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:10:45,956][1648985] Avg episode reward: [(0, '144.770')] [2024-06-15 14:10:50,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 439877632. Throughput: 0: 11309.5. Samples: 110066688. 
Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:10:50,956][1648985] Avg episode reward: [(0, '144.130')] [2024-06-15 14:10:52,575][1652491] Updated weights for policy 0, policy_version 214803 (0.0015) [2024-06-15 14:10:53,433][1652491] Updated weights for policy 0, policy_version 214848 (0.0038) [2024-06-15 14:10:54,361][1652491] Updated weights for policy 0, policy_version 214893 (0.0015) [2024-06-15 14:10:55,695][1651469] Signal inference workers to stop experience collection... (11250 times) [2024-06-15 14:10:55,772][1652491] InferenceWorker_p0-w0: stopping experience collection (11250 times) [2024-06-15 14:10:55,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 48605.6, 300 sec: 46652.7). Total num frames: 440205312. Throughput: 0: 11548.3. Samples: 110102528. Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:10:55,956][1648985] Avg episode reward: [(0, '143.220')] [2024-06-15 14:10:56,024][1651469] Signal inference workers to resume experience collection... (11250 times) [2024-06-15 14:10:56,025][1652491] InferenceWorker_p0-w0: resuming experience collection (11250 times) [2024-06-15 14:10:56,026][1652491] Updated weights for policy 0, policy_version 214960 (0.0155) [2024-06-15 14:10:56,368][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000214976_440270848.pth... [2024-06-15 14:10:56,489][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000209504_429064192.pth [2024-06-15 14:10:56,495][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000214976_440270848.pth [2024-06-15 14:10:57,795][1652491] Updated weights for policy 0, policy_version 215027 (0.0015) [2024-06-15 14:11:00,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 440401920. Throughput: 0: 11400.5. Samples: 110168576. 
Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:11:00,956][1648985] Avg episode reward: [(0, '150.540')] [2024-06-15 14:11:02,831][1652491] Updated weights for policy 0, policy_version 215056 (0.0098) [2024-06-15 14:11:05,345][1652491] Updated weights for policy 0, policy_version 215127 (0.0012) [2024-06-15 14:11:05,955][1648985] Fps is (10 sec: 42600.5, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 440631296. Throughput: 0: 11571.2. Samples: 110245376. Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:11:05,955][1648985] Avg episode reward: [(0, '150.080')] [2024-06-15 14:11:06,570][1652491] Updated weights for policy 0, policy_version 215184 (0.0013) [2024-06-15 14:11:08,384][1652491] Updated weights for policy 0, policy_version 215251 (0.0013) [2024-06-15 14:11:10,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45329.0, 300 sec: 46541.7). Total num frames: 440926208. Throughput: 0: 11400.5. Samples: 110269952. Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:11:10,956][1648985] Avg episode reward: [(0, '160.800')] [2024-06-15 14:11:13,874][1652491] Updated weights for policy 0, policy_version 215298 (0.0018) [2024-06-15 14:11:15,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 441057280. Throughput: 0: 11377.8. Samples: 110347264. Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:11:15,956][1648985] Avg episode reward: [(0, '160.640')] [2024-06-15 14:11:16,464][1652491] Updated weights for policy 0, policy_version 215376 (0.0013) [2024-06-15 14:11:18,375][1652491] Updated weights for policy 0, policy_version 215460 (0.0014) [2024-06-15 14:11:20,326][1652491] Updated weights for policy 0, policy_version 215539 (0.0013) [2024-06-15 14:11:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 441450496. Throughput: 0: 11309.5. Samples: 110406144. 
Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:11:20,956][1648985] Avg episode reward: [(0, '157.850')] [2024-06-15 14:11:25,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 441450496. Throughput: 0: 11275.4. Samples: 110449152. Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:11:25,956][1648985] Avg episode reward: [(0, '141.550')] [2024-06-15 14:11:26,106][1652491] Updated weights for policy 0, policy_version 215568 (0.0011) [2024-06-15 14:11:27,511][1652491] Updated weights for policy 0, policy_version 215617 (0.0015) [2024-06-15 14:11:29,028][1652491] Updated weights for policy 0, policy_version 215684 (0.0013) [2024-06-15 14:11:30,760][1652491] Updated weights for policy 0, policy_version 215747 (0.0012) [2024-06-15 14:11:30,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 46421.6, 300 sec: 46652.8). Total num frames: 441843712. Throughput: 0: 11764.7. Samples: 110521856. Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:11:30,955][1648985] Avg episode reward: [(0, '128.270')] [2024-06-15 14:11:32,048][1652491] Updated weights for policy 0, policy_version 215795 (0.0012) [2024-06-15 14:11:35,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 441974784. Throughput: 0: 11719.1. Samples: 110594048. Policy #0 lag: (min: 15.0, avg: 67.2, max: 271.0) [2024-06-15 14:11:35,957][1648985] Avg episode reward: [(0, '132.540')] [2024-06-15 14:11:36,890][1652491] Updated weights for policy 0, policy_version 215824 (0.0022) [2024-06-15 14:11:38,429][1651469] Signal inference workers to stop experience collection... (11300 times) [2024-06-15 14:11:38,486][1652491] InferenceWorker_p0-w0: stopping experience collection (11300 times) [2024-06-15 14:11:38,488][1652491] Updated weights for policy 0, policy_version 215882 (0.0013) [2024-06-15 14:11:38,550][1651469] Signal inference workers to resume experience collection... 
(11300 times) [2024-06-15 14:11:38,551][1652491] InferenceWorker_p0-w0: resuming experience collection (11300 times) [2024-06-15 14:11:39,754][1652491] Updated weights for policy 0, policy_version 215938 (0.0141) [2024-06-15 14:11:40,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 442335232. Throughput: 0: 11798.8. Samples: 110633472. Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:11:40,956][1648985] Avg episode reward: [(0, '155.320')] [2024-06-15 14:11:41,255][1652491] Updated weights for policy 0, policy_version 216002 (0.0029) [2024-06-15 14:11:45,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 44783.1, 300 sec: 46208.4). Total num frames: 442499072. Throughput: 0: 11741.9. Samples: 110696960. Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:11:45,955][1648985] Avg episode reward: [(0, '169.990')] [2024-06-15 14:11:47,616][1652491] Updated weights for policy 0, policy_version 216066 (0.0013) [2024-06-15 14:11:48,882][1652491] Updated weights for policy 0, policy_version 216128 (0.0012) [2024-06-15 14:11:50,734][1652491] Updated weights for policy 0, policy_version 216192 (0.0013) [2024-06-15 14:11:50,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 442761216. Throughput: 0: 11730.5. Samples: 110773248. Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:11:50,956][1648985] Avg episode reward: [(0, '162.370')] [2024-06-15 14:11:53,170][1652491] Updated weights for policy 0, policy_version 216288 (0.0012) [2024-06-15 14:11:55,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 46967.7, 300 sec: 46541.7). Total num frames: 443023360. Throughput: 0: 11685.0. Samples: 110795776. 
Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:11:55,956][1648985] Avg episode reward: [(0, '157.140')] [2024-06-15 14:11:59,393][1652491] Updated weights for policy 0, policy_version 216354 (0.0084) [2024-06-15 14:12:00,504][1652491] Updated weights for policy 0, policy_version 216406 (0.0013) [2024-06-15 14:12:00,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 443219968. Throughput: 0: 11935.3. Samples: 110884352. Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:12:00,957][1648985] Avg episode reward: [(0, '154.450')] [2024-06-15 14:12:02,204][1652491] Updated weights for policy 0, policy_version 216465 (0.0013) [2024-06-15 14:12:03,858][1652491] Updated weights for policy 0, policy_version 216529 (0.0011) [2024-06-15 14:12:04,945][1652491] Updated weights for policy 0, policy_version 216576 (0.0011) [2024-06-15 14:12:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48605.8, 300 sec: 46763.8). Total num frames: 443547648. Throughput: 0: 11958.1. Samples: 110944256. Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:12:05,956][1648985] Avg episode reward: [(0, '146.360')] [2024-06-15 14:12:10,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 443645952. Throughput: 0: 11980.8. Samples: 110988288. Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:12:10,956][1648985] Avg episode reward: [(0, '152.430')] [2024-06-15 14:12:12,056][1652491] Updated weights for policy 0, policy_version 216661 (0.0158) [2024-06-15 14:12:13,708][1652491] Updated weights for policy 0, policy_version 216723 (0.0013) [2024-06-15 14:12:15,014][1651469] Signal inference workers to stop experience collection... (11350 times) [2024-06-15 14:12:15,060][1652491] InferenceWorker_p0-w0: stopping experience collection (11350 times) [2024-06-15 14:12:15,233][1651469] Signal inference workers to resume experience collection... 
(11350 times) [2024-06-15 14:12:15,234][1652491] InferenceWorker_p0-w0: resuming experience collection (11350 times) [2024-06-15 14:12:15,415][1652491] Updated weights for policy 0, policy_version 216788 (0.0012) [2024-06-15 14:12:15,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 49698.2, 300 sec: 46763.8). Total num frames: 444039168. Throughput: 0: 11628.1. Samples: 111045120. Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:12:15,955][1648985] Avg episode reward: [(0, '163.550')] [2024-06-15 14:12:16,254][1652491] Updated weights for policy 0, policy_version 216832 (0.0012) [2024-06-15 14:12:20,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 444071936. Throughput: 0: 11810.2. Samples: 111125504. Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:12:20,955][1648985] Avg episode reward: [(0, '164.420')] [2024-06-15 14:12:22,149][1652491] Updated weights for policy 0, policy_version 216887 (0.0013) [2024-06-15 14:12:22,946][1652491] Updated weights for policy 0, policy_version 216925 (0.0012) [2024-06-15 14:12:25,477][1652491] Updated weights for policy 0, policy_version 217012 (0.0140) [2024-06-15 14:12:25,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 50244.1, 300 sec: 46763.8). Total num frames: 444465152. Throughput: 0: 11628.1. Samples: 111156736. Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:12:25,956][1648985] Avg episode reward: [(0, '161.050')] [2024-06-15 14:12:26,785][1652491] Updated weights for policy 0, policy_version 217041 (0.0015) [2024-06-15 14:12:30,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 444596224. Throughput: 0: 11673.6. Samples: 111222272. 
Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:12:30,956][1648985] Avg episode reward: [(0, '157.180')] [2024-06-15 14:12:32,352][1652491] Updated weights for policy 0, policy_version 217095 (0.0017) [2024-06-15 14:12:33,540][1652491] Updated weights for policy 0, policy_version 217152 (0.0026) [2024-06-15 14:12:35,595][1652491] Updated weights for policy 0, policy_version 217232 (0.0169) [2024-06-15 14:12:35,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 49152.1, 300 sec: 46874.9). Total num frames: 444923904. Throughput: 0: 11537.1. Samples: 111292416. Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:12:35,956][1648985] Avg episode reward: [(0, '165.890')] [2024-06-15 14:12:36,808][1652491] Updated weights for policy 0, policy_version 217279 (0.0036) [2024-06-15 14:12:39,061][1652491] Updated weights for policy 0, policy_version 217335 (0.0013) [2024-06-15 14:12:40,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 445120512. Throughput: 0: 11673.6. Samples: 111321088. Policy #0 lag: (min: 1.0, avg: 71.5, max: 257.0) [2024-06-15 14:12:40,956][1648985] Avg episode reward: [(0, '167.720')] [2024-06-15 14:12:44,343][1652491] Updated weights for policy 0, policy_version 217363 (0.0014) [2024-06-15 14:12:45,336][1652491] Updated weights for policy 0, policy_version 217411 (0.0013) [2024-06-15 14:12:45,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 445284352. Throughput: 0: 11525.7. Samples: 111403008. Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:12:45,956][1648985] Avg episode reward: [(0, '161.090')] [2024-06-15 14:12:47,101][1652491] Updated weights for policy 0, policy_version 217474 (0.0016) [2024-06-15 14:12:49,494][1652491] Updated weights for policy 0, policy_version 217552 (0.0013) [2024-06-15 14:12:50,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 445644800. 
Throughput: 0: 11366.4. Samples: 111455744. Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:12:50,955][1648985] Avg episode reward: [(0, '166.160')] [2024-06-15 14:12:55,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 445644800. Throughput: 0: 11400.5. Samples: 111501312. Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:12:55,956][1648985] Avg episode reward: [(0, '145.530')] [2024-06-15 14:12:56,383][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000217632_445710336.pth... [2024-06-15 14:12:56,607][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000212224_434634752.pth [2024-06-15 14:12:56,982][1652491] Updated weights for policy 0, policy_version 217648 (0.0028) [2024-06-15 14:12:58,013][1651469] Signal inference workers to stop experience collection... (11400 times) [2024-06-15 14:12:58,046][1652491] InferenceWorker_p0-w0: stopping experience collection (11400 times) [2024-06-15 14:12:58,203][1651469] Signal inference workers to resume experience collection... (11400 times) [2024-06-15 14:12:58,204][1652491] InferenceWorker_p0-w0: resuming experience collection (11400 times) [2024-06-15 14:12:58,426][1652491] Updated weights for policy 0, policy_version 217700 (0.0013) [2024-06-15 14:13:00,283][1652491] Updated weights for policy 0, policy_version 217781 (0.0102) [2024-06-15 14:13:00,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 446038016. Throughput: 0: 11400.5. Samples: 111558144. Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:13:00,956][1648985] Avg episode reward: [(0, '135.820')] [2024-06-15 14:13:01,793][1652491] Updated weights for policy 0, policy_version 217826 (0.0011) [2024-06-15 14:13:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 446169088. Throughput: 0: 11434.7. Samples: 111640064. 
Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:13:05,956][1648985] Avg episode reward: [(0, '132.660')] [2024-06-15 14:13:07,025][1652491] Updated weights for policy 0, policy_version 217861 (0.0011) [2024-06-15 14:13:08,123][1652491] Updated weights for policy 0, policy_version 217918 (0.0013) [2024-06-15 14:13:09,936][1652491] Updated weights for policy 0, policy_version 217982 (0.0013) [2024-06-15 14:13:10,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 47513.6, 300 sec: 46763.8). Total num frames: 446496768. Throughput: 0: 11434.7. Samples: 111671296. Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:13:10,956][1648985] Avg episode reward: [(0, '127.300')] [2024-06-15 14:13:10,945][1652491] Updated weights for policy 0, policy_version 218021 (0.0021) [2024-06-15 14:13:12,475][1652491] Updated weights for policy 0, policy_version 218050 (0.0012) [2024-06-15 14:13:13,751][1652491] Updated weights for policy 0, policy_version 218110 (0.0020) [2024-06-15 14:13:15,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 44236.6, 300 sec: 46208.5). Total num frames: 446693376. Throughput: 0: 11502.9. Samples: 111739904. Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:13:15,956][1648985] Avg episode reward: [(0, '142.480')] [2024-06-15 14:13:18,211][1652491] Updated weights for policy 0, policy_version 218153 (0.0017) [2024-06-15 14:13:20,293][1652491] Updated weights for policy 0, policy_version 218209 (0.0128) [2024-06-15 14:13:20,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 446955520. Throughput: 0: 11548.5. Samples: 111812096. 
Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:13:20,955][1648985] Avg episode reward: [(0, '156.140')] [2024-06-15 14:13:21,912][1652491] Updated weights for policy 0, policy_version 218275 (0.0015) [2024-06-15 14:13:23,832][1652491] Updated weights for policy 0, policy_version 218327 (0.0012) [2024-06-15 14:13:25,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 447217664. Throughput: 0: 11559.8. Samples: 111841280. Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:13:25,956][1648985] Avg episode reward: [(0, '153.870')] [2024-06-15 14:13:28,462][1652491] Updated weights for policy 0, policy_version 218369 (0.0026) [2024-06-15 14:13:29,548][1652491] Updated weights for policy 0, policy_version 218424 (0.0012) [2024-06-15 14:13:30,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 447348736. Throughput: 0: 11468.8. Samples: 111919104. Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:13:30,956][1648985] Avg episode reward: [(0, '131.280')] [2024-06-15 14:13:31,754][1652491] Updated weights for policy 0, policy_version 218467 (0.0017) [2024-06-15 14:13:33,720][1652491] Updated weights for policy 0, policy_version 218544 (0.0020) [2024-06-15 14:13:35,652][1652491] Updated weights for policy 0, policy_version 218596 (0.0015) [2024-06-15 14:13:35,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 46421.1, 300 sec: 46874.9). Total num frames: 447709184. Throughput: 0: 11662.2. Samples: 111980544. Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:13:35,956][1648985] Avg episode reward: [(0, '123.100')] [2024-06-15 14:13:39,253][1652491] Updated weights for policy 0, policy_version 218631 (0.0024) [2024-06-15 14:13:39,484][1651469] Signal inference workers to stop experience collection... 
(11450 times) [2024-06-15 14:13:39,518][1652491] InferenceWorker_p0-w0: stopping experience collection (11450 times) [2024-06-15 14:13:39,615][1651469] Signal inference workers to resume experience collection... (11450 times) [2024-06-15 14:13:39,616][1652491] InferenceWorker_p0-w0: resuming experience collection (11450 times) [2024-06-15 14:13:40,129][1652491] Updated weights for policy 0, policy_version 218685 (0.0013) [2024-06-15 14:13:40,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 447873024. Throughput: 0: 11719.1. Samples: 112028672. Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:13:40,956][1648985] Avg episode reward: [(0, '137.420')] [2024-06-15 14:13:43,300][1652491] Updated weights for policy 0, policy_version 218752 (0.0019) [2024-06-15 14:13:44,746][1652491] Updated weights for policy 0, policy_version 218807 (0.0013) [2024-06-15 14:13:45,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 448135168. Throughput: 0: 11878.4. Samples: 112092672. Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:13:45,956][1648985] Avg episode reward: [(0, '154.740')] [2024-06-15 14:13:46,543][1652491] Updated weights for policy 0, policy_version 218854 (0.0013) [2024-06-15 14:13:49,722][1652491] Updated weights for policy 0, policy_version 218899 (0.0013) [2024-06-15 14:13:50,523][1652491] Updated weights for policy 0, policy_version 218939 (0.0011) [2024-06-15 14:13:50,955][1648985] Fps is (10 sec: 52427.2, 60 sec: 45874.9, 300 sec: 46652.7). Total num frames: 448397312. Throughput: 0: 11901.1. Samples: 112175616. Policy #0 lag: (min: 33.0, avg: 93.4, max: 289.0) [2024-06-15 14:13:50,956][1648985] Avg episode reward: [(0, '165.740')] [2024-06-15 14:13:54,421][1652491] Updated weights for policy 0, policy_version 219024 (0.0014) [2024-06-15 14:13:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 448659456. 
Throughput: 0: 11969.4. Samples: 112209920. Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:13:55,956][1648985] Avg episode reward: [(0, '154.230')] [2024-06-15 14:13:57,472][1652491] Updated weights for policy 0, policy_version 219106 (0.0088) [2024-06-15 14:13:59,986][1652491] Updated weights for policy 0, policy_version 219152 (0.0012) [2024-06-15 14:14:00,955][1648985] Fps is (10 sec: 49154.1, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 448888832. Throughput: 0: 12106.0. Samples: 112284672. Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:14:00,956][1648985] Avg episode reward: [(0, '148.860')] [2024-06-15 14:14:03,715][1652491] Updated weights for policy 0, policy_version 219216 (0.0014) [2024-06-15 14:14:05,804][1652491] Updated weights for policy 0, policy_version 219296 (0.0100) [2024-06-15 14:14:05,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 47320.7). Total num frames: 449118208. Throughput: 0: 11923.9. Samples: 112348672. Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:14:05,956][1648985] Avg episode reward: [(0, '152.900')] [2024-06-15 14:14:06,552][1652491] Updated weights for policy 0, policy_version 219328 (0.0015) [2024-06-15 14:14:08,297][1652491] Updated weights for policy 0, policy_version 219387 (0.0014) [2024-06-15 14:14:10,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 449314816. Throughput: 0: 12128.7. Samples: 112387072. Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:14:10,955][1648985] Avg episode reward: [(0, '150.780')] [2024-06-15 14:14:11,574][1652491] Updated weights for policy 0, policy_version 219428 (0.0011) [2024-06-15 14:14:15,017][1652491] Updated weights for policy 0, policy_version 219488 (0.0014) [2024-06-15 14:14:15,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 48060.0, 300 sec: 47097.1). Total num frames: 449576960. Throughput: 0: 12231.1. Samples: 112469504. 
Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:14:15,955][1648985] Avg episode reward: [(0, '143.110')] [2024-06-15 14:14:16,540][1652491] Updated weights for policy 0, policy_version 219552 (0.0013) [2024-06-15 14:14:17,962][1652491] Updated weights for policy 0, policy_version 219600 (0.0013) [2024-06-15 14:14:18,483][1651469] Signal inference workers to stop experience collection... (11500 times) [2024-06-15 14:14:18,515][1652491] InferenceWorker_p0-w0: stopping experience collection (11500 times) [2024-06-15 14:14:18,669][1651469] Signal inference workers to resume experience collection... (11500 times) [2024-06-15 14:14:18,670][1652491] InferenceWorker_p0-w0: resuming experience collection (11500 times) [2024-06-15 14:14:18,923][1652491] Updated weights for policy 0, policy_version 219646 (0.0014) [2024-06-15 14:14:20,957][1648985] Fps is (10 sec: 52417.2, 60 sec: 48057.9, 300 sec: 46763.7). Total num frames: 449839104. Throughput: 0: 12344.3. Samples: 112536064. Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:14:20,958][1648985] Avg episode reward: [(0, '149.890')] [2024-06-15 14:14:22,725][1652491] Updated weights for policy 0, policy_version 219708 (0.0015) [2024-06-15 14:14:25,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 450002944. Throughput: 0: 12162.9. Samples: 112576000. Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:14:25,956][1648985] Avg episode reward: [(0, '155.860')] [2024-06-15 14:14:26,517][1652491] Updated weights for policy 0, policy_version 219760 (0.0014) [2024-06-15 14:14:28,447][1652491] Updated weights for policy 0, policy_version 219836 (0.0128) [2024-06-15 14:14:30,061][1652491] Updated weights for policy 0, policy_version 219888 (0.0014) [2024-06-15 14:14:30,955][1648985] Fps is (10 sec: 52440.5, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 450363392. Throughput: 0: 12071.8. Samples: 112635904. 
Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:14:30,955][1648985] Avg episode reward: [(0, '160.110')] [2024-06-15 14:14:33,374][1652491] Updated weights for policy 0, policy_version 219920 (0.0023) [2024-06-15 14:14:34,307][1652491] Updated weights for policy 0, policy_version 219966 (0.0013) [2024-06-15 14:14:35,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 46421.3, 300 sec: 46653.0). Total num frames: 450494464. Throughput: 0: 11992.2. Samples: 112715264. Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:14:35,956][1648985] Avg episode reward: [(0, '139.970')] [2024-06-15 14:14:37,804][1652491] Updated weights for policy 0, policy_version 220037 (0.0012) [2024-06-15 14:14:39,070][1652491] Updated weights for policy 0, policy_version 220093 (0.0012) [2024-06-15 14:14:40,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 450789376. Throughput: 0: 11832.9. Samples: 112742400. Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:14:40,955][1648985] Avg episode reward: [(0, '134.430')] [2024-06-15 14:14:41,820][1652491] Updated weights for policy 0, policy_version 220152 (0.0041) [2024-06-15 14:14:45,471][1652491] Updated weights for policy 0, policy_version 220218 (0.0046) [2024-06-15 14:14:45,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 451018752. Throughput: 0: 11855.6. Samples: 112818176. Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:14:45,956][1648985] Avg episode reward: [(0, '131.310')] [2024-06-15 14:14:47,757][1652491] Updated weights for policy 0, policy_version 220256 (0.0014) [2024-06-15 14:14:49,776][1652491] Updated weights for policy 0, policy_version 220336 (0.0014) [2024-06-15 14:14:50,308][1652491] Updated weights for policy 0, policy_version 220352 (0.0035) [2024-06-15 14:14:50,959][1648985] Fps is (10 sec: 49130.9, 60 sec: 48056.6, 300 sec: 47429.6). Total num frames: 451280896. 
Throughput: 0: 11911.4. Samples: 112884736. Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:14:50,960][1648985] Avg episode reward: [(0, '140.120')] [2024-06-15 14:14:53,197][1652491] Updated weights for policy 0, policy_version 220415 (0.0014) [2024-06-15 14:14:55,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 451411968. Throughput: 0: 11901.1. Samples: 112922624. Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:14:55,956][1648985] Avg episode reward: [(0, '138.370')] [2024-06-15 14:14:56,281][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000220448_451477504.pth... [2024-06-15 14:14:56,432][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000214976_440270848.pth [2024-06-15 14:14:57,032][1652491] Updated weights for policy 0, policy_version 220474 (0.0014) [2024-06-15 14:14:59,744][1652491] Updated weights for policy 0, policy_version 220538 (0.0015) [2024-06-15 14:15:00,785][1652491] Updated weights for policy 0, policy_version 220576 (0.0017) [2024-06-15 14:15:00,955][1648985] Fps is (10 sec: 45894.5, 60 sec: 47513.5, 300 sec: 47319.2). Total num frames: 451739648. Throughput: 0: 11548.4. Samples: 112989184. Policy #0 lag: (min: 15.0, avg: 92.1, max: 251.0) [2024-06-15 14:15:00,956][1648985] Avg episode reward: [(0, '140.300')] [2024-06-15 14:15:03,461][1652491] Updated weights for policy 0, policy_version 220609 (0.0011) [2024-06-15 14:15:04,268][1651469] Signal inference workers to stop experience collection... (11550 times) [2024-06-15 14:15:04,325][1652491] InferenceWorker_p0-w0: stopping experience collection (11550 times) [2024-06-15 14:15:04,687][1651469] Signal inference workers to resume experience collection... 
(11550 times) [2024-06-15 14:15:04,688][1652491] InferenceWorker_p0-w0: resuming experience collection (11550 times) [2024-06-15 14:15:04,969][1652491] Updated weights for policy 0, policy_version 220669 (0.0039) [2024-06-15 14:15:05,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 451936256. Throughput: 0: 11662.8. Samples: 113060864. Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:15:05,955][1648985] Avg episode reward: [(0, '166.610')] [2024-06-15 14:15:08,514][1652491] Updated weights for policy 0, policy_version 220726 (0.0014) [2024-06-15 14:15:09,815][1652491] Updated weights for policy 0, policy_version 220755 (0.0013) [2024-06-15 14:15:10,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 452165632. Throughput: 0: 11548.5. Samples: 113095680. Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:15:10,955][1648985] Avg episode reward: [(0, '161.220')] [2024-06-15 14:15:11,667][1652491] Updated weights for policy 0, policy_version 220817 (0.0015) [2024-06-15 14:15:15,151][1652491] Updated weights for policy 0, policy_version 220880 (0.0012) [2024-06-15 14:15:15,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 452395008. Throughput: 0: 11753.2. Samples: 113164800. Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:15:15,956][1648985] Avg episode reward: [(0, '149.680')] [2024-06-15 14:15:16,350][1652491] Updated weights for policy 0, policy_version 220922 (0.0018) [2024-06-15 14:15:20,023][1652491] Updated weights for policy 0, policy_version 220961 (0.0041) [2024-06-15 14:15:20,746][1652491] Updated weights for policy 0, policy_version 220989 (0.0013) [2024-06-15 14:15:20,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45876.8, 300 sec: 46652.7). Total num frames: 452591616. Throughput: 0: 11468.9. Samples: 113231360. 
Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:15:20,956][1648985] Avg episode reward: [(0, '164.070')] [2024-06-15 14:15:22,570][1652491] Updated weights for policy 0, policy_version 221042 (0.0026) [2024-06-15 14:15:24,126][1652491] Updated weights for policy 0, policy_version 221115 (0.0131) [2024-06-15 14:15:25,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 47513.6, 300 sec: 46763.9). Total num frames: 452853760. Throughput: 0: 11514.3. Samples: 113260544. Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:15:25,956][1648985] Avg episode reward: [(0, '171.110')] [2024-06-15 14:15:26,834][1652491] Updated weights for policy 0, policy_version 221168 (0.0016) [2024-06-15 14:15:30,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 453050368. Throughput: 0: 11571.2. Samples: 113338880. Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:15:30,956][1648985] Avg episode reward: [(0, '157.240')] [2024-06-15 14:15:31,293][1652491] Updated weights for policy 0, policy_version 221241 (0.0013) [2024-06-15 14:15:33,722][1652491] Updated weights for policy 0, policy_version 221296 (0.0020) [2024-06-15 14:15:35,095][1652491] Updated weights for policy 0, policy_version 221344 (0.0012) [2024-06-15 14:15:35,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48060.0, 300 sec: 47208.2). Total num frames: 453378048. Throughput: 0: 11504.0. Samples: 113402368. Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:15:35,955][1648985] Avg episode reward: [(0, '150.670')] [2024-06-15 14:15:37,308][1652491] Updated weights for policy 0, policy_version 221392 (0.0024) [2024-06-15 14:15:40,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 453509120. Throughput: 0: 11468.8. Samples: 113438720. 
Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:15:40,955][1648985] Avg episode reward: [(0, '145.720')] [2024-06-15 14:15:41,772][1652491] Updated weights for policy 0, policy_version 221456 (0.0105) [2024-06-15 14:15:44,176][1652491] Updated weights for policy 0, policy_version 221507 (0.0014) [2024-06-15 14:15:45,857][1652491] Updated weights for policy 0, policy_version 221584 (0.0096) [2024-06-15 14:15:45,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46421.4, 300 sec: 47208.2). Total num frames: 453804032. Throughput: 0: 11605.4. Samples: 113511424. Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:15:45,956][1648985] Avg episode reward: [(0, '149.400')] [2024-06-15 14:15:49,003][1652491] Updated weights for policy 0, policy_version 221648 (0.0037) [2024-06-15 14:15:49,159][1651469] Signal inference workers to stop experience collection... (11600 times) [2024-06-15 14:15:49,245][1652491] InferenceWorker_p0-w0: stopping experience collection (11600 times) [2024-06-15 14:15:49,466][1651469] Signal inference workers to resume experience collection... (11600 times) [2024-06-15 14:15:49,467][1652491] InferenceWorker_p0-w0: resuming experience collection (11600 times) [2024-06-15 14:15:50,347][1652491] Updated weights for policy 0, policy_version 221696 (0.0012) [2024-06-15 14:15:50,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45878.5, 300 sec: 46875.0). Total num frames: 454033408. Throughput: 0: 11252.6. Samples: 113567232. Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:15:50,956][1648985] Avg episode reward: [(0, '151.840')] [2024-06-15 14:15:54,534][1652491] Updated weights for policy 0, policy_version 221744 (0.0025) [2024-06-15 14:15:55,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 454164480. Throughput: 0: 11377.8. Samples: 113607680. 
Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:15:55,956][1648985] Avg episode reward: [(0, '146.260')] [2024-06-15 14:15:56,279][1652491] Updated weights for policy 0, policy_version 221776 (0.0012) [2024-06-15 14:15:58,275][1652491] Updated weights for policy 0, policy_version 221856 (0.0012) [2024-06-15 14:16:00,180][1652491] Updated weights for policy 0, policy_version 221892 (0.0013) [2024-06-15 14:16:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 454492160. Throughput: 0: 11309.5. Samples: 113673728. Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:16:00,956][1648985] Avg episode reward: [(0, '138.930')] [2024-06-15 14:16:05,564][1652491] Updated weights for policy 0, policy_version 221969 (0.0012) [2024-06-15 14:16:05,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 454623232. Throughput: 0: 11309.5. Samples: 113740288. Policy #0 lag: (min: 68.0, avg: 201.4, max: 335.0) [2024-06-15 14:16:05,956][1648985] Avg episode reward: [(0, '123.140')] [2024-06-15 14:16:06,315][1652491] Updated weights for policy 0, policy_version 222013 (0.0012) [2024-06-15 14:16:08,689][1652491] Updated weights for policy 0, policy_version 222064 (0.0021) [2024-06-15 14:16:09,852][1652491] Updated weights for policy 0, policy_version 222100 (0.0018) [2024-06-15 14:16:10,708][1652491] Updated weights for policy 0, policy_version 222142 (0.0015) [2024-06-15 14:16:10,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 454950912. Throughput: 0: 11457.4. Samples: 113776128. Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:16:10,956][1648985] Avg episode reward: [(0, '139.330')] [2024-06-15 14:16:12,536][1652491] Updated weights for policy 0, policy_version 222192 (0.0015) [2024-06-15 14:16:15,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 44782.7, 300 sec: 46208.4). Total num frames: 455081984. 
Throughput: 0: 11468.7. Samples: 113854976. Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:16:15,956][1648985] Avg episode reward: [(0, '166.220')] [2024-06-15 14:16:15,992][1652491] Updated weights for policy 0, policy_version 222224 (0.0014) [2024-06-15 14:16:17,096][1652491] Updated weights for policy 0, policy_version 222266 (0.0012) [2024-06-15 14:16:19,244][1652491] Updated weights for policy 0, policy_version 222304 (0.0013) [2024-06-15 14:16:20,307][1652491] Updated weights for policy 0, policy_version 222337 (0.0026) [2024-06-15 14:16:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46967.6, 300 sec: 47319.2). Total num frames: 455409664. Throughput: 0: 11480.2. Samples: 113918976. Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:16:20,955][1648985] Avg episode reward: [(0, '158.320')] [2024-06-15 14:16:21,560][1652491] Updated weights for policy 0, policy_version 222400 (0.0062) [2024-06-15 14:16:23,865][1652491] Updated weights for policy 0, policy_version 222457 (0.0016) [2024-06-15 14:16:25,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 455606272. Throughput: 0: 11389.1. Samples: 113951232. Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:16:25,956][1648985] Avg episode reward: [(0, '157.510')] [2024-06-15 14:16:27,481][1652491] Updated weights for policy 0, policy_version 222512 (0.0021) [2024-06-15 14:16:30,468][1652491] Updated weights for policy 0, policy_version 222563 (0.0015) [2024-06-15 14:16:30,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 455835648. Throughput: 0: 11662.2. Samples: 114036224. 
Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:16:30,956][1648985] Avg episode reward: [(0, '164.510')] [2024-06-15 14:16:31,301][1652491] Updated weights for policy 0, policy_version 222596 (0.0012) [2024-06-15 14:16:32,654][1652491] Updated weights for policy 0, policy_version 222649 (0.0012) [2024-06-15 14:16:32,814][1651469] Signal inference workers to stop experience collection... (11650 times) [2024-06-15 14:16:32,832][1652491] InferenceWorker_p0-w0: stopping experience collection (11650 times) [2024-06-15 14:16:32,914][1651469] Signal inference workers to resume experience collection... (11650 times) [2024-06-15 14:16:32,915][1652491] InferenceWorker_p0-w0: resuming experience collection (11650 times) [2024-06-15 14:16:33,759][1652491] Updated weights for policy 0, policy_version 222692 (0.0013) [2024-06-15 14:16:35,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.2, 300 sec: 46763.9). Total num frames: 456130560. Throughput: 0: 11958.1. Samples: 114105344. Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:16:35,956][1648985] Avg episode reward: [(0, '156.440')] [2024-06-15 14:16:37,709][1652491] Updated weights for policy 0, policy_version 222728 (0.0012) [2024-06-15 14:16:38,753][1652491] Updated weights for policy 0, policy_version 222780 (0.0013) [2024-06-15 14:16:40,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 456327168. Throughput: 0: 11912.5. Samples: 114143744. Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:16:40,956][1648985] Avg episode reward: [(0, '119.040')] [2024-06-15 14:16:41,558][1652491] Updated weights for policy 0, policy_version 222848 (0.0013) [2024-06-15 14:16:42,924][1652491] Updated weights for policy 0, policy_version 222911 (0.0086) [2024-06-15 14:16:45,177][1652491] Updated weights for policy 0, policy_version 222966 (0.0012) [2024-06-15 14:16:45,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 47513.6, 300 sec: 47097.1). 
Total num frames: 456654848. Throughput: 0: 11958.1. Samples: 114211840. Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:16:45,956][1648985] Avg episode reward: [(0, '111.990')] [2024-06-15 14:16:49,701][1652491] Updated weights for policy 0, policy_version 223010 (0.0013) [2024-06-15 14:16:50,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 456785920. Throughput: 0: 12117.3. Samples: 114285568. Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:16:50,956][1648985] Avg episode reward: [(0, '118.730')] [2024-06-15 14:16:51,674][1652491] Updated weights for policy 0, policy_version 223072 (0.0012) [2024-06-15 14:16:52,642][1652491] Updated weights for policy 0, policy_version 223110 (0.0013) [2024-06-15 14:16:53,813][1652491] Updated weights for policy 0, policy_version 223162 (0.0016) [2024-06-15 14:16:55,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 457048064. Throughput: 0: 12014.9. Samples: 114316800. Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:16:55,955][1648985] Avg episode reward: [(0, '140.090')] [2024-06-15 14:16:56,353][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000223200_457113600.pth... [2024-06-15 14:16:56,484][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000217632_445710336.pth [2024-06-15 14:16:56,996][1652491] Updated weights for policy 0, policy_version 223220 (0.0019) [2024-06-15 14:17:00,301][1652491] Updated weights for policy 0, policy_version 223252 (0.0017) [2024-06-15 14:17:00,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 457277440. Throughput: 0: 11901.2. Samples: 114390528. 
Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:17:00,955][1648985] Avg episode reward: [(0, '126.210')] [2024-06-15 14:17:02,444][1652491] Updated weights for policy 0, policy_version 223312 (0.0025) [2024-06-15 14:17:03,771][1652491] Updated weights for policy 0, policy_version 223358 (0.0012) [2024-06-15 14:17:05,011][1652491] Updated weights for policy 0, policy_version 223415 (0.0109) [2024-06-15 14:17:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 49152.1, 300 sec: 47208.2). Total num frames: 457572352. Throughput: 0: 11889.8. Samples: 114454016. Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:17:05,955][1648985] Avg episode reward: [(0, '118.680')] [2024-06-15 14:17:08,552][1652491] Updated weights for policy 0, policy_version 223485 (0.0036) [2024-06-15 14:17:10,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 457703424. Throughput: 0: 12014.9. Samples: 114491904. Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:17:10,956][1648985] Avg episode reward: [(0, '123.420')] [2024-06-15 14:17:12,482][1652491] Updated weights for policy 0, policy_version 223549 (0.0013) [2024-06-15 14:17:14,919][1652491] Updated weights for policy 0, policy_version 223600 (0.0012) [2024-06-15 14:17:15,897][1651469] Signal inference workers to stop experience collection... (11700 times) [2024-06-15 14:17:15,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48606.1, 300 sec: 47208.1). Total num frames: 457998336. Throughput: 0: 11730.5. Samples: 114564096. Policy #0 lag: (min: 31.0, avg: 161.0, max: 303.0) [2024-06-15 14:17:15,956][1648985] Avg episode reward: [(0, '136.800')] [2024-06-15 14:17:16,025][1652491] InferenceWorker_p0-w0: stopping experience collection (11700 times) [2024-06-15 14:17:16,179][1651469] Signal inference workers to resume experience collection... 
(11700 times) [2024-06-15 14:17:16,186][1652491] InferenceWorker_p0-w0: resuming experience collection (11700 times) [2024-06-15 14:17:16,586][1652491] Updated weights for policy 0, policy_version 223664 (0.0014) [2024-06-15 14:17:19,100][1652491] Updated weights for policy 0, policy_version 223698 (0.0013) [2024-06-15 14:17:20,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 458227712. Throughput: 0: 11685.0. Samples: 114631168. Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:17:20,955][1648985] Avg episode reward: [(0, '138.500')] [2024-06-15 14:17:22,477][1652491] Updated weights for policy 0, policy_version 223761 (0.0014) [2024-06-15 14:17:25,533][1652491] Updated weights for policy 0, policy_version 223828 (0.0012) [2024-06-15 14:17:25,955][1648985] Fps is (10 sec: 42597.2, 60 sec: 46967.3, 300 sec: 46874.9). Total num frames: 458424320. Throughput: 0: 11628.0. Samples: 114667008. Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:17:25,956][1648985] Avg episode reward: [(0, '139.900')] [2024-06-15 14:17:27,219][1652491] Updated weights for policy 0, policy_version 223892 (0.0012) [2024-06-15 14:17:30,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 458620928. Throughput: 0: 11491.5. Samples: 114728960. Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:17:30,956][1648985] Avg episode reward: [(0, '135.130')] [2024-06-15 14:17:31,396][1652491] Updated weights for policy 0, policy_version 223968 (0.0013) [2024-06-15 14:17:35,158][1652491] Updated weights for policy 0, policy_version 224033 (0.0023) [2024-06-15 14:17:35,653][1652491] Updated weights for policy 0, policy_version 224063 (0.0014) [2024-06-15 14:17:35,955][1648985] Fps is (10 sec: 45876.6, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 458883072. Throughput: 0: 11400.6. Samples: 114798592. 
Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:17:35,955][1648985] Avg episode reward: [(0, '110.090')] [2024-06-15 14:17:38,136][1652491] Updated weights for policy 0, policy_version 224128 (0.0096) [2024-06-15 14:17:39,788][1652491] Updated weights for policy 0, policy_version 224187 (0.0013) [2024-06-15 14:17:40,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 459145216. Throughput: 0: 11332.3. Samples: 114826752. Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:17:40,956][1648985] Avg episode reward: [(0, '122.790')] [2024-06-15 14:17:44,026][1652491] Updated weights for policy 0, policy_version 224245 (0.0016) [2024-06-15 14:17:45,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 459276288. Throughput: 0: 11241.2. Samples: 114896384. Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:17:45,956][1648985] Avg episode reward: [(0, '131.060')] [2024-06-15 14:17:47,077][1652491] Updated weights for policy 0, policy_version 224290 (0.0013) [2024-06-15 14:17:48,846][1652491] Updated weights for policy 0, policy_version 224341 (0.0014) [2024-06-15 14:17:50,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.4, 300 sec: 47208.2). Total num frames: 459571200. Throughput: 0: 11241.2. Samples: 114959872. Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:17:50,955][1648985] Avg episode reward: [(0, '136.250')] [2024-06-15 14:17:50,958][1652491] Updated weights for policy 0, policy_version 224416 (0.0013) [2024-06-15 14:17:54,930][1652491] Updated weights for policy 0, policy_version 224450 (0.0011) [2024-06-15 14:17:55,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 45329.0, 300 sec: 46541.7). Total num frames: 459767808. Throughput: 0: 11150.2. Samples: 114993664. 
Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:17:55,956][1648985] Avg episode reward: [(0, '129.510')] [2024-06-15 14:17:58,737][1652491] Updated weights for policy 0, policy_version 224528 (0.0014) [2024-06-15 14:18:00,698][1652491] Updated weights for policy 0, policy_version 224592 (0.0156) [2024-06-15 14:18:00,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 44782.8, 300 sec: 46763.8). Total num frames: 459964416. Throughput: 0: 11161.6. Samples: 115066368. Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:18:00,956][1648985] Avg episode reward: [(0, '129.000')] [2024-06-15 14:18:01,574][1651469] Signal inference workers to stop experience collection... (11750 times) [2024-06-15 14:18:01,608][1652491] InferenceWorker_p0-w0: stopping experience collection (11750 times) [2024-06-15 14:18:01,897][1651469] Signal inference workers to resume experience collection... (11750 times) [2024-06-15 14:18:01,899][1652491] InferenceWorker_p0-w0: resuming experience collection (11750 times) [2024-06-15 14:18:02,564][1652491] Updated weights for policy 0, policy_version 224657 (0.0017) [2024-06-15 14:18:03,661][1652491] Updated weights for policy 0, policy_version 224702 (0.0012) [2024-06-15 14:18:05,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 43690.7, 300 sec: 46430.6). Total num frames: 460193792. Throughput: 0: 11070.6. Samples: 115129344. Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:18:05,955][1648985] Avg episode reward: [(0, '126.620')] [2024-06-15 14:18:10,820][1652491] Updated weights for policy 0, policy_version 224784 (0.0117) [2024-06-15 14:18:10,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 460357632. Throughput: 0: 11036.5. Samples: 115163648. 
Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:18:10,956][1648985] Avg episode reward: [(0, '148.670')] [2024-06-15 14:18:12,108][1652491] Updated weights for policy 0, policy_version 224835 (0.0021) [2024-06-15 14:18:13,844][1652491] Updated weights for policy 0, policy_version 224912 (0.0013) [2024-06-15 14:18:15,059][1652491] Updated weights for policy 0, policy_version 224960 (0.0021) [2024-06-15 14:18:15,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45329.0, 300 sec: 46652.7). Total num frames: 460718080. Throughput: 0: 11082.0. Samples: 115227648. Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:18:15,956][1648985] Avg episode reward: [(0, '158.030')] [2024-06-15 14:18:18,761][1652491] Updated weights for policy 0, policy_version 225018 (0.0014) [2024-06-15 14:18:20,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 460849152. Throughput: 0: 11252.6. Samples: 115304960. Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:18:20,956][1648985] Avg episode reward: [(0, '160.990')] [2024-06-15 14:18:22,946][1652491] Updated weights for policy 0, policy_version 225072 (0.0012) [2024-06-15 14:18:24,662][1652491] Updated weights for policy 0, policy_version 225155 (0.0131) [2024-06-15 14:18:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.7, 300 sec: 47097.1). Total num frames: 461242368. Throughput: 0: 11355.0. Samples: 115337728. Policy #0 lag: (min: 15.0, avg: 142.9, max: 271.0) [2024-06-15 14:18:25,956][1648985] Avg episode reward: [(0, '157.050')] [2024-06-15 14:18:29,585][1652491] Updated weights for policy 0, policy_version 225248 (0.0086) [2024-06-15 14:18:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 461373440. Throughput: 0: 11332.3. Samples: 115406336. 
Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:18:30,956][1648985] Avg episode reward: [(0, '153.410')] [2024-06-15 14:18:32,425][1652491] Updated weights for policy 0, policy_version 225281 (0.0018) [2024-06-15 14:18:33,813][1652491] Updated weights for policy 0, policy_version 225330 (0.0012) [2024-06-15 14:18:34,952][1652491] Updated weights for policy 0, policy_version 225376 (0.0012) [2024-06-15 14:18:35,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 461635584. Throughput: 0: 11468.8. Samples: 115475968. Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:18:35,956][1648985] Avg episode reward: [(0, '133.930')] [2024-06-15 14:18:37,050][1652491] Updated weights for policy 0, policy_version 225465 (0.0014) [2024-06-15 14:18:40,954][1652491] Updated weights for policy 0, policy_version 225504 (0.0013) [2024-06-15 14:18:40,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 461832192. Throughput: 0: 11491.5. Samples: 115510784. Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:18:40,956][1648985] Avg episode reward: [(0, '142.560')] [2024-06-15 14:18:44,063][1652491] Updated weights for policy 0, policy_version 225537 (0.0015) [2024-06-15 14:18:44,716][1651469] Signal inference workers to stop experience collection... (11800 times) [2024-06-15 14:18:44,761][1652491] InferenceWorker_p0-w0: stopping experience collection (11800 times) [2024-06-15 14:18:44,919][1651469] Signal inference workers to resume experience collection... (11800 times) [2024-06-15 14:18:44,919][1652491] InferenceWorker_p0-w0: resuming experience collection (11800 times) [2024-06-15 14:18:45,097][1652491] Updated weights for policy 0, policy_version 225591 (0.0011) [2024-06-15 14:18:45,927][1652491] Updated weights for policy 0, policy_version 225616 (0.0011) [2024-06-15 14:18:45,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 46319.6). 
Total num frames: 462061568. Throughput: 0: 11525.7. Samples: 115585024. Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:18:45,956][1648985] Avg episode reward: [(0, '163.940')] [2024-06-15 14:18:47,990][1652491] Updated weights for policy 0, policy_version 225696 (0.0012) [2024-06-15 14:18:50,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 462290944. Throughput: 0: 11730.5. Samples: 115657216. Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:18:50,956][1648985] Avg episode reward: [(0, '157.780')] [2024-06-15 14:18:51,368][1652491] Updated weights for policy 0, policy_version 225748 (0.0025) [2024-06-15 14:18:55,369][1652491] Updated weights for policy 0, policy_version 225808 (0.0094) [2024-06-15 14:18:55,955][1648985] Fps is (10 sec: 42597.1, 60 sec: 45328.9, 300 sec: 46097.3). Total num frames: 462487552. Throughput: 0: 11787.3. Samples: 115694080. Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:18:55,956][1648985] Avg episode reward: [(0, '136.960')] [2024-06-15 14:18:56,355][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000225856_462553088.pth... [2024-06-15 14:18:56,581][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000220448_451477504.pth [2024-06-15 14:18:57,442][1652491] Updated weights for policy 0, policy_version 225889 (0.0013) [2024-06-15 14:18:58,902][1652491] Updated weights for policy 0, policy_version 225939 (0.0012) [2024-06-15 14:19:00,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 47513.7, 300 sec: 46430.6). Total num frames: 462815232. Throughput: 0: 11741.9. Samples: 115756032. 
Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:19:00,955][1648985] Avg episode reward: [(0, '130.780')] [2024-06-15 14:19:02,367][1652491] Updated weights for policy 0, policy_version 225986 (0.0013) [2024-06-15 14:19:03,539][1652491] Updated weights for policy 0, policy_version 226038 (0.0012) [2024-06-15 14:19:05,955][1648985] Fps is (10 sec: 45876.9, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 462946304. Throughput: 0: 11730.5. Samples: 115832832. Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:19:05,955][1648985] Avg episode reward: [(0, '151.810')] [2024-06-15 14:19:07,458][1652491] Updated weights for policy 0, policy_version 226082 (0.0011) [2024-06-15 14:19:08,836][1652491] Updated weights for policy 0, policy_version 226147 (0.0085) [2024-06-15 14:19:10,494][1652491] Updated weights for policy 0, policy_version 226209 (0.0011) [2024-06-15 14:19:10,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 49698.0, 300 sec: 46652.7). Total num frames: 463339520. Throughput: 0: 11719.0. Samples: 115865088. Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:19:10,956][1648985] Avg episode reward: [(0, '151.570')] [2024-06-15 14:19:13,811][1652491] Updated weights for policy 0, policy_version 226258 (0.0016) [2024-06-15 14:19:15,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.2, 300 sec: 46208.8). Total num frames: 463470592. Throughput: 0: 11685.0. Samples: 115932160. Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:19:15,956][1648985] Avg episode reward: [(0, '140.360')] [2024-06-15 14:19:18,318][1652491] Updated weights for policy 0, policy_version 226320 (0.0015) [2024-06-15 14:19:20,090][1652491] Updated weights for policy 0, policy_version 226400 (0.0084) [2024-06-15 14:19:20,955][1648985] Fps is (10 sec: 39322.9, 60 sec: 48059.9, 300 sec: 46541.7). Total num frames: 463732736. Throughput: 0: 11776.0. Samples: 116005888. 
Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:19:20,955][1648985] Avg episode reward: [(0, '154.510')] [2024-06-15 14:19:22,326][1652491] Updated weights for policy 0, policy_version 226489 (0.0186) [2024-06-15 14:19:25,092][1651469] Signal inference workers to stop experience collection... (11850 times) [2024-06-15 14:19:25,141][1652491] InferenceWorker_p0-w0: stopping experience collection (11850 times) [2024-06-15 14:19:25,293][1651469] Signal inference workers to resume experience collection... (11850 times) [2024-06-15 14:19:25,293][1652491] InferenceWorker_p0-w0: resuming experience collection (11850 times) [2024-06-15 14:19:25,502][1652491] Updated weights for policy 0, policy_version 226532 (0.0050) [2024-06-15 14:19:25,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 45329.0, 300 sec: 46097.3). Total num frames: 463962112. Throughput: 0: 11673.6. Samples: 116036096. Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:19:25,956][1648985] Avg episode reward: [(0, '165.340')] [2024-06-15 14:19:29,566][1652491] Updated weights for policy 0, policy_version 226576 (0.0011) [2024-06-15 14:19:30,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 464125952. Throughput: 0: 11798.8. Samples: 116115968. Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:19:30,955][1648985] Avg episode reward: [(0, '173.930')] [2024-06-15 14:19:31,014][1652491] Updated weights for policy 0, policy_version 226640 (0.0014) [2024-06-15 14:19:33,028][1652491] Updated weights for policy 0, policy_version 226723 (0.0012) [2024-06-15 14:19:35,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 464388096. Throughput: 0: 11446.1. Samples: 116172288. 
Policy #0 lag: (min: 15.0, avg: 130.2, max: 271.0) [2024-06-15 14:19:35,955][1648985] Avg episode reward: [(0, '161.450')] [2024-06-15 14:19:37,337][1652491] Updated weights for policy 0, policy_version 226770 (0.0022) [2024-06-15 14:19:40,957][1648985] Fps is (10 sec: 39313.2, 60 sec: 44781.4, 300 sec: 45763.8). Total num frames: 464519168. Throughput: 0: 11297.7. Samples: 116202496. Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:19:40,958][1648985] Avg episode reward: [(0, '171.620')] [2024-06-15 14:19:41,649][1652491] Updated weights for policy 0, policy_version 226819 (0.0013) [2024-06-15 14:19:43,097][1652491] Updated weights for policy 0, policy_version 226883 (0.0012) [2024-06-15 14:19:44,742][1652491] Updated weights for policy 0, policy_version 226945 (0.0013) [2024-06-15 14:19:45,881][1652491] Updated weights for policy 0, policy_version 227006 (0.0013) [2024-06-15 14:19:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 47513.6, 300 sec: 46209.1). Total num frames: 464912384. Throughput: 0: 11491.5. Samples: 116273152. Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:19:45,955][1648985] Avg episode reward: [(0, '169.880')] [2024-06-15 14:19:49,720][1652491] Updated weights for policy 0, policy_version 227043 (0.0011) [2024-06-15 14:19:50,955][1648985] Fps is (10 sec: 52439.7, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 465043456. Throughput: 0: 11309.5. Samples: 116341760. Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:19:50,956][1648985] Avg episode reward: [(0, '167.380')] [2024-06-15 14:19:53,024][1652491] Updated weights for policy 0, policy_version 227088 (0.0067) [2024-06-15 14:19:54,683][1652491] Updated weights for policy 0, policy_version 227168 (0.0130) [2024-06-15 14:19:55,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 47513.8, 300 sec: 46097.4). Total num frames: 465338368. Throughput: 0: 11434.7. Samples: 116379648. 
Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:19:55,956][1648985] Avg episode reward: [(0, '153.820')] [2024-06-15 14:19:56,753][1652491] Updated weights for policy 0, policy_version 227251 (0.0016) [2024-06-15 14:20:00,792][1652491] Updated weights for policy 0, policy_version 227282 (0.0012) [2024-06-15 14:20:00,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 465469440. Throughput: 0: 11400.6. Samples: 116445184. Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:20:00,956][1648985] Avg episode reward: [(0, '150.750')] [2024-06-15 14:20:04,976][1652491] Updated weights for policy 0, policy_version 227344 (0.0015) [2024-06-15 14:20:05,955][1648985] Fps is (10 sec: 32768.4, 60 sec: 45329.1, 300 sec: 45764.1). Total num frames: 465666048. Throughput: 0: 11298.1. Samples: 116514304. Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:20:05,955][1648985] Avg episode reward: [(0, '147.220')] [2024-06-15 14:20:06,426][1652491] Updated weights for policy 0, policy_version 227408 (0.0013) [2024-06-15 14:20:06,868][1651469] Signal inference workers to stop experience collection... (11900 times) [2024-06-15 14:20:06,894][1652491] InferenceWorker_p0-w0: stopping experience collection (11900 times) [2024-06-15 14:20:07,041][1651469] Signal inference workers to resume experience collection... (11900 times) [2024-06-15 14:20:07,045][1652491] InferenceWorker_p0-w0: resuming experience collection (11900 times) [2024-06-15 14:20:07,587][1652491] Updated weights for policy 0, policy_version 227459 (0.0014) [2024-06-15 14:20:09,007][1652491] Updated weights for policy 0, policy_version 227519 (0.0035) [2024-06-15 14:20:10,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 43690.8, 300 sec: 45986.3). Total num frames: 465960960. Throughput: 0: 11252.6. Samples: 116542464. 
Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:20:10,956][1648985] Avg episode reward: [(0, '150.600')] [2024-06-15 14:20:15,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 466092032. Throughput: 0: 11218.5. Samples: 116620800. Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:20:15,955][1648985] Avg episode reward: [(0, '141.340')] [2024-06-15 14:20:16,097][1652491] Updated weights for policy 0, policy_version 227600 (0.0013) [2024-06-15 14:20:17,649][1652491] Updated weights for policy 0, policy_version 227665 (0.0012) [2024-06-15 14:20:18,934][1652491] Updated weights for policy 0, policy_version 227728 (0.0013) [2024-06-15 14:20:20,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 466485248. Throughput: 0: 11343.6. Samples: 116682752. Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:20:20,956][1648985] Avg episode reward: [(0, '143.350')] [2024-06-15 14:20:23,317][1652491] Updated weights for policy 0, policy_version 227795 (0.0018) [2024-06-15 14:20:24,078][1652491] Updated weights for policy 0, policy_version 227834 (0.0021) [2024-06-15 14:20:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 44236.9, 300 sec: 45986.3). Total num frames: 466616320. Throughput: 0: 11571.7. Samples: 116723200. Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:20:25,956][1648985] Avg episode reward: [(0, '137.310')] [2024-06-15 14:20:27,568][1652491] Updated weights for policy 0, policy_version 227876 (0.0013) [2024-06-15 14:20:28,860][1652491] Updated weights for policy 0, policy_version 227941 (0.0015) [2024-06-15 14:20:30,418][1652491] Updated weights for policy 0, policy_version 228016 (0.0013) [2024-06-15 14:20:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 467009536. Throughput: 0: 11616.7. Samples: 116795904. 
Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:20:30,956][1648985] Avg episode reward: [(0, '129.370')] [2024-06-15 14:20:34,660][1652491] Updated weights for policy 0, policy_version 228064 (0.0013) [2024-06-15 14:20:35,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 467140608. Throughput: 0: 11650.8. Samples: 116866048. Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:20:35,956][1648985] Avg episode reward: [(0, '141.680')] [2024-06-15 14:20:38,206][1652491] Updated weights for policy 0, policy_version 228114 (0.0012) [2024-06-15 14:20:39,909][1652491] Updated weights for policy 0, policy_version 228192 (0.0013) [2024-06-15 14:20:40,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48607.5, 300 sec: 46208.4). Total num frames: 467435520. Throughput: 0: 11696.4. Samples: 116905984. Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:20:40,956][1648985] Avg episode reward: [(0, '154.270')] [2024-06-15 14:20:41,761][1652491] Updated weights for policy 0, policy_version 228284 (0.0102) [2024-06-15 14:20:45,777][1652491] Updated weights for policy 0, policy_version 228324 (0.0014) [2024-06-15 14:20:45,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45329.0, 300 sec: 46097.4). Total num frames: 467632128. Throughput: 0: 11776.0. Samples: 116975104. Policy #0 lag: (min: 47.0, avg: 168.9, max: 303.0) [2024-06-15 14:20:45,956][1648985] Avg episode reward: [(0, '147.840')] [2024-06-15 14:20:48,138][1651469] Signal inference workers to stop experience collection... (11950 times) [2024-06-15 14:20:48,186][1652491] InferenceWorker_p0-w0: stopping experience collection (11950 times) [2024-06-15 14:20:48,377][1651469] Signal inference workers to resume experience collection... 
(11950 times) [2024-06-15 14:20:48,378][1652491] InferenceWorker_p0-w0: resuming experience collection (11950 times) [2024-06-15 14:20:48,521][1652491] Updated weights for policy 0, policy_version 228369 (0.0015) [2024-06-15 14:20:50,060][1652491] Updated weights for policy 0, policy_version 228432 (0.0013) [2024-06-15 14:20:50,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 467894272. Throughput: 0: 11889.8. Samples: 117049344. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:20:50,955][1648985] Avg episode reward: [(0, '136.840')] [2024-06-15 14:20:52,145][1652491] Updated weights for policy 0, policy_version 228529 (0.0151) [2024-06-15 14:20:55,955][1648985] Fps is (10 sec: 45873.7, 60 sec: 45875.0, 300 sec: 46097.3). Total num frames: 468090880. Throughput: 0: 12014.9. Samples: 117083136. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:20:55,956][1648985] Avg episode reward: [(0, '131.240')] [2024-06-15 14:20:56,091][1652491] Updated weights for policy 0, policy_version 228563 (0.0013) [2024-06-15 14:20:56,512][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000228592_468156416.pth... [2024-06-15 14:20:56,603][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000223200_457113600.pth [2024-06-15 14:20:58,994][1652491] Updated weights for policy 0, policy_version 228614 (0.0013) [2024-06-15 14:21:00,025][1652491] Updated weights for policy 0, policy_version 228665 (0.0101) [2024-06-15 14:21:00,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 468353024. Throughput: 0: 11935.3. Samples: 117157888. 
Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:21:00,956][1648985] Avg episode reward: [(0, '143.760')] [2024-06-15 14:21:01,118][1652491] Updated weights for policy 0, policy_version 228704 (0.0012) [2024-06-15 14:21:02,652][1652491] Updated weights for policy 0, policy_version 228770 (0.0113) [2024-06-15 14:21:05,955][1648985] Fps is (10 sec: 49153.7, 60 sec: 48605.8, 300 sec: 46208.4). Total num frames: 468582400. Throughput: 0: 12162.9. Samples: 117230080. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:21:05,956][1648985] Avg episode reward: [(0, '156.470')] [2024-06-15 14:21:06,488][1652491] Updated weights for policy 0, policy_version 228806 (0.0014) [2024-06-15 14:21:07,541][1652491] Updated weights for policy 0, policy_version 228860 (0.0016) [2024-06-15 14:21:10,546][1652491] Updated weights for policy 0, policy_version 228917 (0.0013) [2024-06-15 14:21:10,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 468844544. Throughput: 0: 12083.2. Samples: 117266944. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:21:10,955][1648985] Avg episode reward: [(0, '163.550')] [2024-06-15 14:21:12,684][1652491] Updated weights for policy 0, policy_version 228976 (0.0013) [2024-06-15 14:21:14,497][1652491] Updated weights for policy 0, policy_version 229045 (0.0015) [2024-06-15 14:21:15,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 50244.2, 300 sec: 46430.6). Total num frames: 469106688. Throughput: 0: 11901.2. Samples: 117331456. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:21:15,956][1648985] Avg episode reward: [(0, '164.430')] [2024-06-15 14:21:17,797][1652491] Updated weights for policy 0, policy_version 229088 (0.0013) [2024-06-15 14:21:20,794][1652491] Updated weights for policy 0, policy_version 229136 (0.0023) [2024-06-15 14:21:20,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 469270528. 
Throughput: 0: 12083.2. Samples: 117409792. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:21:20,956][1648985] Avg episode reward: [(0, '138.030')] [2024-06-15 14:21:23,018][1652491] Updated weights for policy 0, policy_version 229200 (0.0014) [2024-06-15 14:21:25,355][1652491] Updated weights for policy 0, policy_version 229281 (0.0013) [2024-06-15 14:21:25,364][1651469] Signal inference workers to stop experience collection... (12000 times) [2024-06-15 14:21:25,433][1652491] InferenceWorker_p0-w0: stopping experience collection (12000 times) [2024-06-15 14:21:25,694][1651469] Signal inference workers to resume experience collection... (12000 times) [2024-06-15 14:21:25,695][1652491] InferenceWorker_p0-w0: resuming experience collection (12000 times) [2024-06-15 14:21:25,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 49698.1, 300 sec: 46652.7). Total num frames: 469598208. Throughput: 0: 11832.9. Samples: 117438464. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:21:25,956][1648985] Avg episode reward: [(0, '135.930')] [2024-06-15 14:21:29,103][1652491] Updated weights for policy 0, policy_version 229328 (0.0014) [2024-06-15 14:21:30,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 469762048. Throughput: 0: 11810.1. Samples: 117506560. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:21:30,956][1648985] Avg episode reward: [(0, '153.470')] [2024-06-15 14:21:33,080][1652491] Updated weights for policy 0, policy_version 229410 (0.0094) [2024-06-15 14:21:35,117][1652491] Updated weights for policy 0, policy_version 229460 (0.0014) [2024-06-15 14:21:35,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 47513.7, 300 sec: 46319.5). Total num frames: 469991424. Throughput: 0: 11685.0. Samples: 117575168. 
Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:21:35,956][1648985] Avg episode reward: [(0, '160.690')] [2024-06-15 14:21:37,242][1652491] Updated weights for policy 0, policy_version 229537 (0.0013) [2024-06-15 14:21:40,696][1652491] Updated weights for policy 0, policy_version 229584 (0.0019) [2024-06-15 14:21:40,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 470188032. Throughput: 0: 11571.3. Samples: 117603840. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:21:40,956][1648985] Avg episode reward: [(0, '132.340')] [2024-06-15 14:21:45,255][1652491] Updated weights for policy 0, policy_version 229651 (0.0128) [2024-06-15 14:21:45,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 470384640. Throughput: 0: 11628.1. Samples: 117681152. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:21:45,955][1648985] Avg episode reward: [(0, '144.380')] [2024-06-15 14:21:47,520][1652491] Updated weights for policy 0, policy_version 229744 (0.0018) [2024-06-15 14:21:48,933][1652491] Updated weights for policy 0, policy_version 229794 (0.0012) [2024-06-15 14:21:50,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 470679552. Throughput: 0: 11252.6. Samples: 117736448. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:21:50,956][1648985] Avg episode reward: [(0, '158.450')] [2024-06-15 14:21:52,624][1652491] Updated weights for policy 0, policy_version 229844 (0.0024) [2024-06-15 14:21:53,379][1652491] Updated weights for policy 0, policy_version 229886 (0.0022) [2024-06-15 14:21:55,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45329.3, 300 sec: 45875.2). Total num frames: 470810624. Throughput: 0: 11309.5. Samples: 117775872. 
Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 14:21:55,956][1648985] Avg episode reward: [(0, '159.540')] [2024-06-15 14:21:57,697][1652491] Updated weights for policy 0, policy_version 229938 (0.0014) [2024-06-15 14:21:59,373][1652491] Updated weights for policy 0, policy_version 230001 (0.0119) [2024-06-15 14:22:00,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 471138304. Throughput: 0: 11298.1. Samples: 117839872. Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:22:00,956][1648985] Avg episode reward: [(0, '149.430')] [2024-06-15 14:22:01,255][1652491] Updated weights for policy 0, policy_version 230067 (0.0017) [2024-06-15 14:22:03,975][1652491] Updated weights for policy 0, policy_version 230112 (0.0128) [2024-06-15 14:22:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 471334912. Throughput: 0: 11059.2. Samples: 117907456. Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:22:05,956][1648985] Avg episode reward: [(0, '129.180')] [2024-06-15 14:22:09,062][1652491] Updated weights for policy 0, policy_version 230163 (0.0013) [2024-06-15 14:22:10,721][1651469] Signal inference workers to stop experience collection... (12050 times) [2024-06-15 14:22:10,746][1652491] Updated weights for policy 0, policy_version 230225 (0.0037) [2024-06-15 14:22:10,767][1652491] InferenceWorker_p0-w0: stopping experience collection (12050 times) [2024-06-15 14:22:10,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 44236.8, 300 sec: 45764.1). Total num frames: 471498752. Throughput: 0: 11264.0. Samples: 117945344. Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:22:10,956][1648985] Avg episode reward: [(0, '146.870')] [2024-06-15 14:22:11,076][1651469] Signal inference workers to resume experience collection... 
(12050 times) [2024-06-15 14:22:11,077][1652491] InferenceWorker_p0-w0: resuming experience collection (12050 times) [2024-06-15 14:22:12,149][1652491] Updated weights for policy 0, policy_version 230278 (0.0011) [2024-06-15 14:22:13,391][1652491] Updated weights for policy 0, policy_version 230335 (0.0013) [2024-06-15 14:22:15,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 44783.0, 300 sec: 45986.3). Total num frames: 471793664. Throughput: 0: 11025.1. Samples: 118002688. Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:22:15,955][1648985] Avg episode reward: [(0, '159.990')] [2024-06-15 14:22:16,391][1652491] Updated weights for policy 0, policy_version 230394 (0.0010) [2024-06-15 14:22:20,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 43144.6, 300 sec: 45542.0). Total num frames: 471859200. Throughput: 0: 11184.4. Samples: 118078464. Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:22:20,955][1648985] Avg episode reward: [(0, '150.850')] [2024-06-15 14:22:22,769][1652491] Updated weights for policy 0, policy_version 230464 (0.0012) [2024-06-15 14:22:24,080][1652491] Updated weights for policy 0, policy_version 230528 (0.0077) [2024-06-15 14:22:25,733][1652491] Updated weights for policy 0, policy_version 230588 (0.0011) [2024-06-15 14:22:25,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 472252416. Throughput: 0: 11161.6. Samples: 118106112. Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:22:25,956][1648985] Avg episode reward: [(0, '145.130')] [2024-06-15 14:22:28,092][1652491] Updated weights for policy 0, policy_version 230646 (0.0013) [2024-06-15 14:22:30,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 472383488. Throughput: 0: 10888.5. Samples: 118171136. 
Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:22:30,956][1648985] Avg episode reward: [(0, '147.170')] [2024-06-15 14:22:33,512][1652491] Updated weights for policy 0, policy_version 230675 (0.0011) [2024-06-15 14:22:34,940][1652491] Updated weights for policy 0, policy_version 230744 (0.0012) [2024-06-15 14:22:35,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 44236.8, 300 sec: 45764.1). Total num frames: 472645632. Throughput: 0: 11184.4. Samples: 118239744. Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:22:35,956][1648985] Avg episode reward: [(0, '146.180')] [2024-06-15 14:22:36,602][1652491] Updated weights for policy 0, policy_version 230803 (0.0013) [2024-06-15 14:22:39,056][1652491] Updated weights for policy 0, policy_version 230864 (0.0018) [2024-06-15 14:22:40,218][1652491] Updated weights for policy 0, policy_version 230908 (0.0013) [2024-06-15 14:22:40,956][1648985] Fps is (10 sec: 52425.8, 60 sec: 45328.6, 300 sec: 46208.4). Total num frames: 472907776. Throughput: 0: 11081.8. Samples: 118274560. Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:22:40,957][1648985] Avg episode reward: [(0, '144.100')] [2024-06-15 14:22:45,463][1652491] Updated weights for policy 0, policy_version 230965 (0.0098) [2024-06-15 14:22:45,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 44236.8, 300 sec: 45653.0). Total num frames: 473038848. Throughput: 0: 11218.5. Samples: 118344704. Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:22:45,956][1648985] Avg episode reward: [(0, '133.230')] [2024-06-15 14:22:47,051][1652491] Updated weights for policy 0, policy_version 231027 (0.0013) [2024-06-15 14:22:48,775][1652491] Updated weights for policy 0, policy_version 231101 (0.0015) [2024-06-15 14:22:50,955][1648985] Fps is (10 sec: 39322.9, 60 sec: 43690.5, 300 sec: 45875.2). Total num frames: 473300992. Throughput: 0: 11195.7. Samples: 118411264. 
Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:22:50,956][1648985] Avg episode reward: [(0, '146.090')] [2024-06-15 14:22:51,903][1651469] Signal inference workers to stop experience collection... (12100 times) [2024-06-15 14:22:51,945][1652491] InferenceWorker_p0-w0: stopping experience collection (12100 times) [2024-06-15 14:22:52,174][1651469] Signal inference workers to resume experience collection... (12100 times) [2024-06-15 14:22:52,175][1652491] InferenceWorker_p0-w0: resuming experience collection (12100 times) [2024-06-15 14:22:52,365][1652491] Updated weights for policy 0, policy_version 231160 (0.0015) [2024-06-15 14:22:55,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 44782.9, 300 sec: 45875.2). Total num frames: 473497600. Throughput: 0: 11093.3. Samples: 118444544. Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:22:55,956][1648985] Avg episode reward: [(0, '162.920')] [2024-06-15 14:22:56,546][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000231232_473563136.pth... [2024-06-15 14:22:56,546][1652491] Updated weights for policy 0, policy_version 231232 (0.0125) [2024-06-15 14:22:56,716][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000225856_462553088.pth [2024-06-15 14:22:58,284][1652491] Updated weights for policy 0, policy_version 231296 (0.0014) [2024-06-15 14:22:59,875][1652491] Updated weights for policy 0, policy_version 231351 (0.0017) [2024-06-15 14:23:00,955][1648985] Fps is (10 sec: 52430.6, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 473825280. Throughput: 0: 11070.6. Samples: 118500864. Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:23:00,956][1648985] Avg episode reward: [(0, '155.450')] [2024-06-15 14:23:03,887][1652491] Updated weights for policy 0, policy_version 231415 (0.0016) [2024-06-15 14:23:05,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 43690.7, 300 sec: 46097.4). Total num frames: 473956352. 
Throughput: 0: 11218.5. Samples: 118583296. Policy #0 lag: (min: 40.0, avg: 114.4, max: 296.0) [2024-06-15 14:23:05,956][1648985] Avg episode reward: [(0, '146.200')] [2024-06-15 14:23:07,617][1652491] Updated weights for policy 0, policy_version 231472 (0.0015) [2024-06-15 14:23:09,659][1652491] Updated weights for policy 0, policy_version 231546 (0.0140) [2024-06-15 14:23:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 474284032. Throughput: 0: 11207.1. Samples: 118610432. Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:23:10,955][1648985] Avg episode reward: [(0, '130.890')] [2024-06-15 14:23:11,256][1652491] Updated weights for policy 0, policy_version 231610 (0.0039) [2024-06-15 14:23:14,754][1652491] Updated weights for policy 0, policy_version 231664 (0.0014) [2024-06-15 14:23:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 474480640. Throughput: 0: 11207.1. Samples: 118675456. Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:23:15,956][1648985] Avg episode reward: [(0, '126.750')] [2024-06-15 14:23:18,571][1652491] Updated weights for policy 0, policy_version 231697 (0.0015) [2024-06-15 14:23:19,888][1652491] Updated weights for policy 0, policy_version 231760 (0.0014) [2024-06-15 14:23:20,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 47513.4, 300 sec: 45653.0). Total num frames: 474710016. Throughput: 0: 11400.5. Samples: 118752768. Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:23:20,956][1648985] Avg episode reward: [(0, '127.320')] [2024-06-15 14:23:21,398][1652491] Updated weights for policy 0, policy_version 231824 (0.0014) [2024-06-15 14:23:24,884][1652491] Updated weights for policy 0, policy_version 231873 (0.0013) [2024-06-15 14:23:25,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 474939392. Throughput: 0: 11468.9. Samples: 118790656. 
Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:23:25,956][1648985] Avg episode reward: [(0, '136.470')] [2024-06-15 14:23:26,286][1652491] Updated weights for policy 0, policy_version 231932 (0.0011) [2024-06-15 14:23:29,899][1652491] Updated weights for policy 0, policy_version 231984 (0.0053) [2024-06-15 14:23:30,955][1648985] Fps is (10 sec: 49153.2, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 475201536. Throughput: 0: 11616.7. Samples: 118867456. Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:23:30,956][1648985] Avg episode reward: [(0, '150.400')] [2024-06-15 14:23:31,446][1652491] Updated weights for policy 0, policy_version 232055 (0.0013) [2024-06-15 14:23:32,660][1651469] Signal inference workers to stop experience collection... (12150 times) [2024-06-15 14:23:32,703][1652491] InferenceWorker_p0-w0: stopping experience collection (12150 times) [2024-06-15 14:23:32,929][1651469] Signal inference workers to resume experience collection... (12150 times) [2024-06-15 14:23:32,931][1652491] InferenceWorker_p0-w0: resuming experience collection (12150 times) [2024-06-15 14:23:33,119][1652491] Updated weights for policy 0, policy_version 232115 (0.0013) [2024-06-15 14:23:35,595][1652491] Updated weights for policy 0, policy_version 232160 (0.0018) [2024-06-15 14:23:35,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 46967.5, 300 sec: 46208.5). Total num frames: 475463680. Throughput: 0: 11537.1. Samples: 118930432. Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:23:35,955][1648985] Avg episode reward: [(0, '131.460')] [2024-06-15 14:23:39,869][1652491] Updated weights for policy 0, policy_version 232195 (0.0012) [2024-06-15 14:23:40,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45329.5, 300 sec: 45986.3). Total num frames: 475627520. Throughput: 0: 11787.4. Samples: 118974976. 
Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:23:40,956][1648985] Avg episode reward: [(0, '103.340')] [2024-06-15 14:23:42,126][1652491] Updated weights for policy 0, policy_version 232288 (0.0013) [2024-06-15 14:23:44,624][1652491] Updated weights for policy 0, policy_version 232368 (0.0026) [2024-06-15 14:23:45,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 475922432. Throughput: 0: 11798.7. Samples: 119031808. Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:23:45,956][1648985] Avg episode reward: [(0, '121.140')] [2024-06-15 14:23:47,564][1652491] Updated weights for policy 0, policy_version 232422 (0.0107) [2024-06-15 14:23:50,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.3, 300 sec: 45986.3). Total num frames: 476053504. Throughput: 0: 11741.8. Samples: 119111680. Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:23:50,956][1648985] Avg episode reward: [(0, '147.610')] [2024-06-15 14:23:51,697][1652491] Updated weights for policy 0, policy_version 232465 (0.0029) [2024-06-15 14:23:53,279][1652491] Updated weights for policy 0, policy_version 232529 (0.0012) [2024-06-15 14:23:55,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 47513.7, 300 sec: 45875.2). Total num frames: 476348416. Throughput: 0: 11707.7. Samples: 119137280. Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:23:55,955][1648985] Avg episode reward: [(0, '167.820')] [2024-06-15 14:23:56,068][1652491] Updated weights for policy 0, policy_version 232608 (0.0210) [2024-06-15 14:23:58,575][1652491] Updated weights for policy 0, policy_version 232647 (0.0012) [2024-06-15 14:24:00,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 476577792. Throughput: 0: 11787.4. Samples: 119205888. 
Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:24:00,955][1648985] Avg episode reward: [(0, '147.130')] [2024-06-15 14:24:03,162][1652491] Updated weights for policy 0, policy_version 232720 (0.0013) [2024-06-15 14:24:04,942][1652491] Updated weights for policy 0, policy_version 232785 (0.0015) [2024-06-15 14:24:05,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48059.8, 300 sec: 45764.2). Total num frames: 476839936. Throughput: 0: 11605.4. Samples: 119275008. Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:24:05,955][1648985] Avg episode reward: [(0, '139.070')] [2024-06-15 14:24:07,069][1652491] Updated weights for policy 0, policy_version 232851 (0.0014) [2024-06-15 14:24:09,997][1652491] Updated weights for policy 0, policy_version 232901 (0.0016) [2024-06-15 14:24:10,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 477102080. Throughput: 0: 11491.6. Samples: 119307776. Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:24:10,956][1648985] Avg episode reward: [(0, '137.000')] [2024-06-15 14:24:14,192][1652491] Updated weights for policy 0, policy_version 232968 (0.0014) [2024-06-15 14:24:15,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 477233152. Throughput: 0: 11514.3. Samples: 119385600. Policy #0 lag: (min: 5.0, avg: 79.6, max: 261.0) [2024-06-15 14:24:15,956][1648985] Avg episode reward: [(0, '153.200')] [2024-06-15 14:24:16,086][1652491] Updated weights for policy 0, policy_version 233040 (0.0013) [2024-06-15 14:24:17,848][1651469] Signal inference workers to stop experience collection... (12200 times) [2024-06-15 14:24:17,889][1652491] InferenceWorker_p0-w0: stopping experience collection (12200 times) [2024-06-15 14:24:18,087][1651469] Signal inference workers to resume experience collection... 
(12200 times) [2024-06-15 14:24:18,087][1652491] InferenceWorker_p0-w0: resuming experience collection (12200 times) [2024-06-15 14:24:18,282][1652491] Updated weights for policy 0, policy_version 233110 (0.0016) [2024-06-15 14:24:20,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46421.5, 300 sec: 45875.2). Total num frames: 477495296. Throughput: 0: 11502.9. Samples: 119448064. Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:24:20,955][1648985] Avg episode reward: [(0, '157.720')] [2024-06-15 14:24:21,373][1652491] Updated weights for policy 0, policy_version 233153 (0.0014) [2024-06-15 14:24:22,347][1652491] Updated weights for policy 0, policy_version 233207 (0.0013) [2024-06-15 14:24:25,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45329.2, 300 sec: 45875.2). Total num frames: 477659136. Throughput: 0: 11423.3. Samples: 119489024. Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:24:25,956][1648985] Avg episode reward: [(0, '132.970')] [2024-06-15 14:24:26,544][1652491] Updated weights for policy 0, policy_version 233251 (0.0012) [2024-06-15 14:24:28,218][1652491] Updated weights for policy 0, policy_version 233317 (0.0120) [2024-06-15 14:24:29,016][1652491] Updated weights for policy 0, policy_version 233360 (0.0018) [2024-06-15 14:24:30,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 478019584. Throughput: 0: 11639.5. Samples: 119555584. Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:24:30,956][1648985] Avg episode reward: [(0, '132.640')] [2024-06-15 14:24:32,460][1652491] Updated weights for policy 0, policy_version 233424 (0.0013) [2024-06-15 14:24:33,341][1652491] Updated weights for policy 0, policy_version 233472 (0.0013) [2024-06-15 14:24:35,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 44782.9, 300 sec: 46208.8). Total num frames: 478150656. Throughput: 0: 11685.0. Samples: 119637504. 
Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:24:35,955][1648985] Avg episode reward: [(0, '133.260')] [2024-06-15 14:24:37,573][1652491] Updated weights for policy 0, policy_version 233520 (0.0079) [2024-06-15 14:24:39,178][1652491] Updated weights for policy 0, policy_version 233584 (0.0144) [2024-06-15 14:24:40,282][1652491] Updated weights for policy 0, policy_version 233632 (0.0125) [2024-06-15 14:24:40,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48605.8, 300 sec: 46208.4). Total num frames: 478543872. Throughput: 0: 11832.9. Samples: 119669760. Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:24:40,956][1648985] Avg episode reward: [(0, '117.550')] [2024-06-15 14:24:42,976][1652491] Updated weights for policy 0, policy_version 233680 (0.0096) [2024-06-15 14:24:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 478674944. Throughput: 0: 11673.6. Samples: 119731200. Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:24:45,955][1648985] Avg episode reward: [(0, '124.150')] [2024-06-15 14:24:47,798][1652491] Updated weights for policy 0, policy_version 233731 (0.0012) [2024-06-15 14:24:49,237][1652491] Updated weights for policy 0, policy_version 233796 (0.0013) [2024-06-15 14:24:50,458][1652491] Updated weights for policy 0, policy_version 233851 (0.0011) [2024-06-15 14:24:50,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 48059.8, 300 sec: 46097.4). Total num frames: 478937088. Throughput: 0: 11753.2. Samples: 119803904. Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:24:50,956][1648985] Avg episode reward: [(0, '121.840')] [2024-06-15 14:24:52,090][1652491] Updated weights for policy 0, policy_version 233905 (0.0013) [2024-06-15 14:24:55,289][1652491] Updated weights for policy 0, policy_version 233942 (0.0014) [2024-06-15 14:24:55,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 46967.3, 300 sec: 46430.5). Total num frames: 479166464. 
Throughput: 0: 11832.8. Samples: 119840256. Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:24:55,956][1648985] Avg episode reward: [(0, '147.760')] [2024-06-15 14:24:56,075][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000233984_479199232.pth... [2024-06-15 14:24:56,159][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000228592_468156416.pth [2024-06-15 14:24:58,569][1652491] Updated weights for policy 0, policy_version 234000 (0.0016) [2024-06-15 14:24:59,589][1652491] Updated weights for policy 0, policy_version 234041 (0.0011) [2024-06-15 14:25:00,047][1651469] Signal inference workers to stop experience collection... (12250 times) [2024-06-15 14:25:00,101][1652491] InferenceWorker_p0-w0: stopping experience collection (12250 times) [2024-06-15 14:25:00,299][1651469] Signal inference workers to resume experience collection... (12250 times) [2024-06-15 14:25:00,300][1652491] InferenceWorker_p0-w0: resuming experience collection (12250 times) [2024-06-15 14:25:00,886][1652491] Updated weights for policy 0, policy_version 234082 (0.0033) [2024-06-15 14:25:00,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46967.3, 300 sec: 46541.6). Total num frames: 479395840. Throughput: 0: 11753.2. Samples: 119914496. Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:25:00,956][1648985] Avg episode reward: [(0, '162.170')] [2024-06-15 14:25:02,972][1652491] Updated weights for policy 0, policy_version 234144 (0.0015) [2024-06-15 14:25:05,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 479592448. Throughput: 0: 11867.0. Samples: 119982080. 
Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:25:05,956][1648985] Avg episode reward: [(0, '134.960')] [2024-06-15 14:25:09,426][1652491] Updated weights for policy 0, policy_version 234243 (0.0109) [2024-06-15 14:25:10,955][1652491] Updated weights for policy 0, policy_version 234300 (0.0013) [2024-06-15 14:25:10,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45328.9, 300 sec: 46541.6). Total num frames: 479821824. Throughput: 0: 11753.2. Samples: 120017920. Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:25:10,956][1648985] Avg episode reward: [(0, '119.950')] [2024-06-15 14:25:12,911][1652491] Updated weights for policy 0, policy_version 234368 (0.0012) [2024-06-15 14:25:15,447][1652491] Updated weights for policy 0, policy_version 234432 (0.0015) [2024-06-15 14:25:15,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 48059.5, 300 sec: 46208.4). Total num frames: 480116736. Throughput: 0: 11605.3. Samples: 120077824. Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:25:15,956][1648985] Avg episode reward: [(0, '127.670')] [2024-06-15 14:25:19,407][1652491] Updated weights for policy 0, policy_version 234491 (0.0013) [2024-06-15 14:25:20,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 480247808. Throughput: 0: 11400.5. Samples: 120150528. Policy #0 lag: (min: 79.0, avg: 145.7, max: 319.0) [2024-06-15 14:25:20,956][1648985] Avg episode reward: [(0, '130.940')] [2024-06-15 14:25:21,929][1652491] Updated weights for policy 0, policy_version 234548 (0.0013) [2024-06-15 14:25:24,044][1652491] Updated weights for policy 0, policy_version 234592 (0.0014) [2024-06-15 14:25:25,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 48059.7, 300 sec: 45875.2). Total num frames: 480542720. Throughput: 0: 11468.8. Samples: 120185856. 
Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:25:25,955][1648985] Avg episode reward: [(0, '143.480')] [2024-06-15 14:25:26,533][1652491] Updated weights for policy 0, policy_version 234672 (0.0016) [2024-06-15 14:25:30,127][1652491] Updated weights for policy 0, policy_version 234720 (0.0014) [2024-06-15 14:25:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 480772096. Throughput: 0: 11639.5. Samples: 120254976. Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:25:30,956][1648985] Avg episode reward: [(0, '155.810')] [2024-06-15 14:25:31,668][1652491] Updated weights for policy 0, policy_version 234753 (0.0015) [2024-06-15 14:25:32,811][1652491] Updated weights for policy 0, policy_version 234811 (0.0015) [2024-06-15 14:25:35,726][1652491] Updated weights for policy 0, policy_version 234864 (0.0013) [2024-06-15 14:25:35,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 47513.6, 300 sec: 45986.3). Total num frames: 481001472. Throughput: 0: 11662.2. Samples: 120328704. Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:25:35,956][1648985] Avg episode reward: [(0, '164.680')] [2024-06-15 14:25:36,270][1652491] Updated weights for policy 0, policy_version 234884 (0.0013) [2024-06-15 14:25:37,443][1652491] Updated weights for policy 0, policy_version 234935 (0.0013) [2024-06-15 14:25:40,637][1652491] Updated weights for policy 0, policy_version 234979 (0.0013) [2024-06-15 14:25:40,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45329.2, 300 sec: 46208.4). Total num frames: 481263616. Throughput: 0: 11685.0. Samples: 120366080. 
Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:25:40,956][1648985] Avg episode reward: [(0, '143.120')] [2024-06-15 14:25:42,962][1652491] Updated weights for policy 0, policy_version 235029 (0.0014) [2024-06-15 14:25:43,631][1652491] Updated weights for policy 0, policy_version 235067 (0.0017) [2024-06-15 14:25:45,304][1651469] Signal inference workers to stop experience collection... (12300 times) [2024-06-15 14:25:45,356][1652491] InferenceWorker_p0-w0: stopping experience collection (12300 times) [2024-06-15 14:25:45,607][1651469] Signal inference workers to resume experience collection... (12300 times) [2024-06-15 14:25:45,608][1652491] InferenceWorker_p0-w0: resuming experience collection (12300 times) [2024-06-15 14:25:45,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 46967.5, 300 sec: 46097.4). Total num frames: 481492992. Throughput: 0: 11753.3. Samples: 120443392. Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:25:45,955][1648985] Avg episode reward: [(0, '136.030')] [2024-06-15 14:25:46,014][1652491] Updated weights for policy 0, policy_version 235120 (0.0013) [2024-06-15 14:25:47,708][1652491] Updated weights for policy 0, policy_version 235184 (0.0015) [2024-06-15 14:25:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46421.4, 300 sec: 46208.5). Total num frames: 481722368. Throughput: 0: 11764.6. Samples: 120511488. Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:25:50,956][1648985] Avg episode reward: [(0, '133.070')] [2024-06-15 14:25:51,195][1652491] Updated weights for policy 0, policy_version 235233 (0.0014) [2024-06-15 14:25:55,250][1652491] Updated weights for policy 0, policy_version 235323 (0.0013) [2024-06-15 14:25:55,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 46421.4, 300 sec: 46097.3). Total num frames: 481951744. Throughput: 0: 11787.4. Samples: 120548352. 
Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:25:55,956][1648985] Avg episode reward: [(0, '134.850')] [2024-06-15 14:25:57,734][1652491] Updated weights for policy 0, policy_version 235361 (0.0013) [2024-06-15 14:25:59,923][1652491] Updated weights for policy 0, policy_version 235449 (0.0017) [2024-06-15 14:26:00,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 482213888. Throughput: 0: 11616.8. Samples: 120600576. Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:26:00,956][1648985] Avg episode reward: [(0, '121.200')] [2024-06-15 14:26:03,313][1652491] Updated weights for policy 0, policy_version 235509 (0.0103) [2024-06-15 14:26:05,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 482344960. Throughput: 0: 11810.1. Samples: 120681984. Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:26:05,955][1648985] Avg episode reward: [(0, '130.810')] [2024-06-15 14:26:07,244][1652491] Updated weights for policy 0, policy_version 235577 (0.0074) [2024-06-15 14:26:09,668][1652491] Updated weights for policy 0, policy_version 235632 (0.0017) [2024-06-15 14:26:10,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46967.7, 300 sec: 45875.2). Total num frames: 482639872. Throughput: 0: 11707.7. Samples: 120712704. Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:26:10,955][1648985] Avg episode reward: [(0, '137.400')] [2024-06-15 14:26:11,767][1652491] Updated weights for policy 0, policy_version 235704 (0.0014) [2024-06-15 14:26:14,604][1652491] Updated weights for policy 0, policy_version 235732 (0.0013) [2024-06-15 14:26:15,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 45875.3, 300 sec: 46097.3). Total num frames: 482869248. Throughput: 0: 11537.0. Samples: 120774144. 
Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:26:15,956][1648985] Avg episode reward: [(0, '131.450')] [2024-06-15 14:26:18,898][1652491] Updated weights for policy 0, policy_version 235808 (0.0013) [2024-06-15 14:26:20,761][1652491] Updated weights for policy 0, policy_version 235872 (0.0012) [2024-06-15 14:26:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.5, 300 sec: 45653.1). Total num frames: 483065856. Throughput: 0: 11411.9. Samples: 120842240. Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:26:20,956][1648985] Avg episode reward: [(0, '132.520')] [2024-06-15 14:26:22,850][1652491] Updated weights for policy 0, policy_version 235959 (0.0074) [2024-06-15 14:26:25,792][1652491] Updated weights for policy 0, policy_version 235984 (0.0012) [2024-06-15 14:26:25,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 483295232. Throughput: 0: 11241.2. Samples: 120871936. Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:26:25,956][1648985] Avg episode reward: [(0, '140.920')] [2024-06-15 14:26:30,467][1652491] Updated weights for policy 0, policy_version 236067 (0.0015) [2024-06-15 14:26:30,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45329.1, 300 sec: 45764.1). Total num frames: 483491840. Throughput: 0: 11264.0. Samples: 120950272. Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:26:30,956][1648985] Avg episode reward: [(0, '147.740')] [2024-06-15 14:26:31,423][1651469] Signal inference workers to stop experience collection... (12350 times) [2024-06-15 14:26:31,477][1652491] InferenceWorker_p0-w0: stopping experience collection (12350 times) [2024-06-15 14:26:31,783][1651469] Signal inference workers to resume experience collection... 
(12350 times) [2024-06-15 14:26:31,784][1652491] InferenceWorker_p0-w0: resuming experience collection (12350 times) [2024-06-15 14:26:32,146][1652491] Updated weights for policy 0, policy_version 236128 (0.0013) [2024-06-15 14:26:32,805][1652491] Updated weights for policy 0, policy_version 236154 (0.0013) [2024-06-15 14:26:33,655][1652491] Updated weights for policy 0, policy_version 236192 (0.0124) [2024-06-15 14:26:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46421.3, 300 sec: 46097.4). Total num frames: 483786752. Throughput: 0: 11264.0. Samples: 121018368. Policy #0 lag: (min: 15.0, avg: 141.6, max: 271.0) [2024-06-15 14:26:35,956][1648985] Avg episode reward: [(0, '148.680')] [2024-06-15 14:26:36,944][1652491] Updated weights for policy 0, policy_version 236241 (0.0014) [2024-06-15 14:26:40,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 44782.8, 300 sec: 45986.2). Total num frames: 483950592. Throughput: 0: 11309.5. Samples: 121057280. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:26:40,956][1648985] Avg episode reward: [(0, '142.030')] [2024-06-15 14:26:41,113][1652491] Updated weights for policy 0, policy_version 236306 (0.0014) [2024-06-15 14:26:43,053][1652491] Updated weights for policy 0, policy_version 236384 (0.0013) [2024-06-15 14:26:43,993][1652491] Updated weights for policy 0, policy_version 236420 (0.0017) [2024-06-15 14:26:45,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 484311040. Throughput: 0: 11559.8. Samples: 121120768. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:26:45,955][1648985] Avg episode reward: [(0, '137.240')] [2024-06-15 14:26:47,666][1652491] Updated weights for policy 0, policy_version 236489 (0.0047) [2024-06-15 14:26:50,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 484442112. Throughput: 0: 11480.1. Samples: 121198592. 
Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:26:50,956][1648985] Avg episode reward: [(0, '152.210')] [2024-06-15 14:26:52,244][1652491] Updated weights for policy 0, policy_version 236576 (0.0075) [2024-06-15 14:26:54,916][1652491] Updated weights for policy 0, policy_version 236670 (0.0154) [2024-06-15 14:26:55,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 45875.1, 300 sec: 45986.3). Total num frames: 484704256. Throughput: 0: 11571.1. Samples: 121233408. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:26:55,956][1648985] Avg episode reward: [(0, '152.130')] [2024-06-15 14:26:56,347][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000236704_484769792.pth... [2024-06-15 14:26:56,479][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000231232_473563136.pth [2024-06-15 14:26:56,910][1652491] Updated weights for policy 0, policy_version 236727 (0.0040) [2024-06-15 14:26:59,283][1652491] Updated weights for policy 0, policy_version 236770 (0.0013) [2024-06-15 14:27:00,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 484966400. Throughput: 0: 11616.8. Samples: 121296896. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:27:00,955][1648985] Avg episode reward: [(0, '156.840')] [2024-06-15 14:27:04,565][1652491] Updated weights for policy 0, policy_version 236854 (0.0034) [2024-06-15 14:27:05,923][1652491] Updated weights for policy 0, policy_version 236912 (0.0013) [2024-06-15 14:27:05,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 47513.5, 300 sec: 46430.6). Total num frames: 485195776. Throughput: 0: 11650.8. Samples: 121366528. 
Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:27:05,956][1648985] Avg episode reward: [(0, '147.800')] [2024-06-15 14:27:10,373][1652491] Updated weights for policy 0, policy_version 236999 (0.0015) [2024-06-15 14:27:10,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 485425152. Throughput: 0: 11594.0. Samples: 121393664. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:27:10,956][1648985] Avg episode reward: [(0, '136.940')] [2024-06-15 14:27:15,040][1652491] Updated weights for policy 0, policy_version 237060 (0.0013) [2024-06-15 14:27:15,353][1651469] Signal inference workers to stop experience collection... (12400 times) [2024-06-15 14:27:15,391][1652491] InferenceWorker_p0-w0: stopping experience collection (12400 times) [2024-06-15 14:27:15,622][1651469] Signal inference workers to resume experience collection... (12400 times) [2024-06-15 14:27:15,623][1652491] InferenceWorker_p0-w0: resuming experience collection (12400 times) [2024-06-15 14:27:15,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 485556224. Throughput: 0: 11696.3. Samples: 121476608. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:27:15,956][1648985] Avg episode reward: [(0, '128.560')] [2024-06-15 14:27:16,958][1652491] Updated weights for policy 0, policy_version 237136 (0.0014) [2024-06-15 14:27:17,839][1652491] Updated weights for policy 0, policy_version 237183 (0.0016) [2024-06-15 14:27:20,205][1652491] Updated weights for policy 0, policy_version 237245 (0.0013) [2024-06-15 14:27:20,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 46967.3, 300 sec: 46208.4). Total num frames: 485883904. Throughput: 0: 11491.5. Samples: 121535488. 
Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:27:20,956][1648985] Avg episode reward: [(0, '127.400')] [2024-06-15 14:27:22,284][1652491] Updated weights for policy 0, policy_version 237298 (0.0014) [2024-06-15 14:27:25,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 486014976. Throughput: 0: 11503.0. Samples: 121574912. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:27:25,955][1648985] Avg episode reward: [(0, '127.630')] [2024-06-15 14:27:27,416][1652491] Updated weights for policy 0, policy_version 237367 (0.0111) [2024-06-15 14:27:28,906][1652491] Updated weights for policy 0, policy_version 237436 (0.0013) [2024-06-15 14:27:30,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46421.2, 300 sec: 46208.4). Total num frames: 486277120. Throughput: 0: 11582.5. Samples: 121641984. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:27:30,956][1648985] Avg episode reward: [(0, '148.080')] [2024-06-15 14:27:31,894][1652491] Updated weights for policy 0, policy_version 237499 (0.0126) [2024-06-15 14:27:33,602][1652491] Updated weights for policy 0, policy_version 237560 (0.0012) [2024-06-15 14:27:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 486539264. Throughput: 0: 11434.7. Samples: 121713152. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:27:35,956][1648985] Avg episode reward: [(0, '133.570')] [2024-06-15 14:27:38,986][1652491] Updated weights for policy 0, policy_version 237616 (0.0012) [2024-06-15 14:27:40,107][1652491] Updated weights for policy 0, policy_version 237668 (0.0013) [2024-06-15 14:27:40,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 47513.7, 300 sec: 46652.7). Total num frames: 486801408. Throughput: 0: 11423.3. Samples: 121747456. 
Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:27:40,956][1648985] Avg episode reward: [(0, '111.570')] [2024-06-15 14:27:42,634][1652491] Updated weights for policy 0, policy_version 237728 (0.0016) [2024-06-15 14:27:44,089][1652491] Updated weights for policy 0, policy_version 237792 (0.0020) [2024-06-15 14:27:44,746][1652491] Updated weights for policy 0, policy_version 237822 (0.0011) [2024-06-15 14:27:45,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 487063552. Throughput: 0: 11514.3. Samples: 121815040. Policy #0 lag: (min: 15.0, avg: 127.7, max: 271.0) [2024-06-15 14:27:45,956][1648985] Avg episode reward: [(0, '108.450')] [2024-06-15 14:27:49,979][1652491] Updated weights for policy 0, policy_version 237880 (0.0014) [2024-06-15 14:27:50,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 487260160. Throughput: 0: 11685.0. Samples: 121892352. Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:27:50,956][1648985] Avg episode reward: [(0, '114.940')] [2024-06-15 14:27:51,286][1652491] Updated weights for policy 0, policy_version 237942 (0.0013) [2024-06-15 14:27:52,804][1652491] Updated weights for policy 0, policy_version 237984 (0.0013) [2024-06-15 14:27:55,393][1651469] Signal inference workers to stop experience collection... (12450 times) [2024-06-15 14:27:55,447][1652491] InferenceWorker_p0-w0: stopping experience collection (12450 times) [2024-06-15 14:27:55,472][1652491] Updated weights for policy 0, policy_version 238049 (0.0012) [2024-06-15 14:27:55,784][1651469] Signal inference workers to resume experience collection... (12450 times) [2024-06-15 14:27:55,785][1652491] InferenceWorker_p0-w0: resuming experience collection (12450 times) [2024-06-15 14:27:55,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 47513.6, 300 sec: 46541.6). Total num frames: 487555072. Throughput: 0: 11787.3. Samples: 121924096. 
Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:27:55,956][1648985] Avg episode reward: [(0, '130.670')] [2024-06-15 14:28:00,418][1652491] Updated weights for policy 0, policy_version 238096 (0.0041) [2024-06-15 14:28:00,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 44782.6, 300 sec: 46430.6). Total num frames: 487653376. Throughput: 0: 11582.5. Samples: 121997824. Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:28:00,956][1648985] Avg episode reward: [(0, '157.800')] [2024-06-15 14:28:01,712][1652491] Updated weights for policy 0, policy_version 238160 (0.0014) [2024-06-15 14:28:02,611][1652491] Updated weights for policy 0, policy_version 238207 (0.0014) [2024-06-15 14:28:05,011][1652491] Updated weights for policy 0, policy_version 238267 (0.0013) [2024-06-15 14:28:05,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 46967.6, 300 sec: 46541.7). Total num frames: 488013824. Throughput: 0: 11844.3. Samples: 122068480. Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:28:05,955][1648985] Avg episode reward: [(0, '150.090')] [2024-06-15 14:28:06,777][1652491] Updated weights for policy 0, policy_version 238327 (0.0014) [2024-06-15 14:28:10,955][1648985] Fps is (10 sec: 45876.5, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 488112128. Throughput: 0: 11832.9. Samples: 122107392. Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:28:10,956][1648985] Avg episode reward: [(0, '142.940')] [2024-06-15 14:28:12,288][1652491] Updated weights for policy 0, policy_version 238388 (0.0018) [2024-06-15 14:28:15,309][1652491] Updated weights for policy 0, policy_version 238468 (0.0013) [2024-06-15 14:28:15,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 488439808. Throughput: 0: 11753.3. Samples: 122170880. 
Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:28:15,956][1648985] Avg episode reward: [(0, '135.100')] [2024-06-15 14:28:16,393][1652491] Updated weights for policy 0, policy_version 238524 (0.0011) [2024-06-15 14:28:18,772][1652491] Updated weights for policy 0, policy_version 238582 (0.0015) [2024-06-15 14:28:20,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.4, 300 sec: 46430.6). Total num frames: 488636416. Throughput: 0: 11753.3. Samples: 122242048. Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:28:20,955][1648985] Avg episode reward: [(0, '130.700')] [2024-06-15 14:28:23,143][1652491] Updated weights for policy 0, policy_version 238640 (0.0110) [2024-06-15 14:28:24,040][1652491] Updated weights for policy 0, policy_version 238675 (0.0019) [2024-06-15 14:28:24,708][1652491] Updated weights for policy 0, policy_version 238720 (0.0090) [2024-06-15 14:28:25,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 48059.6, 300 sec: 46430.6). Total num frames: 488898560. Throughput: 0: 11741.8. Samples: 122275840. Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:28:25,956][1648985] Avg episode reward: [(0, '134.050')] [2024-06-15 14:28:27,605][1652491] Updated weights for policy 0, policy_version 238784 (0.0014) [2024-06-15 14:28:29,904][1652491] Updated weights for policy 0, policy_version 238844 (0.0013) [2024-06-15 14:28:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.9, 300 sec: 46430.6). Total num frames: 489160704. Throughput: 0: 11764.6. Samples: 122344448. Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:28:30,955][1648985] Avg episode reward: [(0, '140.540')] [2024-06-15 14:28:34,442][1652491] Updated weights for policy 0, policy_version 238903 (0.0020) [2024-06-15 14:28:35,551][1652491] Updated weights for policy 0, policy_version 238948 (0.0098) [2024-06-15 14:28:35,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 489422848. 
Throughput: 0: 11616.7. Samples: 122415104. Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:28:35,956][1648985] Avg episode reward: [(0, '153.810')] [2024-06-15 14:28:38,664][1652491] Updated weights for policy 0, policy_version 239033 (0.0012) [2024-06-15 14:28:40,316][1651469] Signal inference workers to stop experience collection... (12500 times) [2024-06-15 14:28:40,342][1652491] Updated weights for policy 0, policy_version 239074 (0.0013) [2024-06-15 14:28:40,363][1652491] InferenceWorker_p0-w0: stopping experience collection (12500 times) [2024-06-15 14:28:40,597][1651469] Signal inference workers to resume experience collection... (12500 times) [2024-06-15 14:28:40,598][1652491] InferenceWorker_p0-w0: resuming experience collection (12500 times) [2024-06-15 14:28:40,961][1648985] Fps is (10 sec: 52398.0, 60 sec: 48055.1, 300 sec: 46651.8). Total num frames: 489684992. Throughput: 0: 11751.8. Samples: 122452992. Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:28:40,962][1648985] Avg episode reward: [(0, '151.640')] [2024-06-15 14:28:44,835][1652491] Updated weights for policy 0, policy_version 239122 (0.0013) [2024-06-15 14:28:45,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.1, 300 sec: 46652.8). Total num frames: 489816064. Throughput: 0: 11798.8. Samples: 122528768. Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:28:45,956][1648985] Avg episode reward: [(0, '142.250')] [2024-06-15 14:28:46,115][1652491] Updated weights for policy 0, policy_version 239173 (0.0015) [2024-06-15 14:28:47,221][1652491] Updated weights for policy 0, policy_version 239232 (0.0013) [2024-06-15 14:28:49,655][1652491] Updated weights for policy 0, policy_version 239290 (0.0014) [2024-06-15 14:28:50,955][1648985] Fps is (10 sec: 42623.2, 60 sec: 47513.7, 300 sec: 46652.7). Total num frames: 490110976. Throughput: 0: 11719.1. Samples: 122595840. 
Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:28:50,956][1648985] Avg episode reward: [(0, '139.920')] [2024-06-15 14:28:51,212][1652491] Updated weights for policy 0, policy_version 239331 (0.0014) [2024-06-15 14:28:55,611][1652491] Updated weights for policy 0, policy_version 239376 (0.0012) [2024-06-15 14:28:55,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 44783.0, 300 sec: 46319.5). Total num frames: 490242048. Throughput: 0: 11730.5. Samples: 122635264. Policy #0 lag: (min: 9.0, avg: 86.3, max: 265.0) [2024-06-15 14:28:55,956][1648985] Avg episode reward: [(0, '135.080')] [2024-06-15 14:28:56,447][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000239408_490307584.pth... [2024-06-15 14:28:56,562][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000233984_479199232.pth [2024-06-15 14:28:57,794][1652491] Updated weights for policy 0, policy_version 239456 (0.0143) [2024-06-15 14:29:00,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48059.9, 300 sec: 46430.6). Total num frames: 490536960. Throughput: 0: 11662.2. Samples: 122695680. Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:29:00,956][1648985] Avg episode reward: [(0, '132.990')] [2024-06-15 14:29:01,004][1652491] Updated weights for policy 0, policy_version 239525 (0.0022) [2024-06-15 14:29:01,584][1652491] Updated weights for policy 0, policy_version 239552 (0.0012) [2024-06-15 14:29:03,130][1652491] Updated weights for policy 0, policy_version 239605 (0.0013) [2024-06-15 14:29:05,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 490733568. Throughput: 0: 11810.1. Samples: 122773504. 
Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:29:05,955][1648985] Avg episode reward: [(0, '150.840')] [2024-06-15 14:29:07,753][1652491] Updated weights for policy 0, policy_version 239648 (0.0015) [2024-06-15 14:29:09,130][1652491] Updated weights for policy 0, policy_version 239700 (0.0014) [2024-06-15 14:29:10,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 48059.5, 300 sec: 46652.7). Total num frames: 490995712. Throughput: 0: 11753.2. Samples: 122804736. Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:29:10,956][1648985] Avg episode reward: [(0, '163.990')] [2024-06-15 14:29:11,437][1652491] Updated weights for policy 0, policy_version 239747 (0.0012) [2024-06-15 14:29:12,586][1652491] Updated weights for policy 0, policy_version 239799 (0.0015) [2024-06-15 14:29:14,625][1652491] Updated weights for policy 0, policy_version 239856 (0.0015) [2024-06-15 14:29:15,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.6, 300 sec: 46652.7). Total num frames: 491257856. Throughput: 0: 11525.7. Samples: 122863104. Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:29:15,955][1648985] Avg episode reward: [(0, '145.760')] [2024-06-15 14:29:19,014][1652491] Updated weights for policy 0, policy_version 239888 (0.0013) [2024-06-15 14:29:20,598][1652491] Updated weights for policy 0, policy_version 239952 (0.0129) [2024-06-15 14:29:20,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 46421.3, 300 sec: 46652.8). Total num frames: 491421696. Throughput: 0: 11639.5. Samples: 122938880. Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:29:20,956][1648985] Avg episode reward: [(0, '133.240')] [2024-06-15 14:29:23,532][1652491] Updated weights for policy 0, policy_version 240032 (0.0094) [2024-06-15 14:29:24,908][1651469] Signal inference workers to stop experience collection... 
(12550 times) [2024-06-15 14:29:24,954][1652491] InferenceWorker_p0-w0: stopping experience collection (12550 times) [2024-06-15 14:29:25,236][1651469] Signal inference workers to resume experience collection... (12550 times) [2024-06-15 14:29:25,239][1652491] InferenceWorker_p0-w0: resuming experience collection (12550 times) [2024-06-15 14:29:25,409][1652491] Updated weights for policy 0, policy_version 240086 (0.0014) [2024-06-15 14:29:25,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.6, 300 sec: 46430.6). Total num frames: 491716608. Throughput: 0: 11379.2. Samples: 122964992. Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:29:25,956][1648985] Avg episode reward: [(0, '128.550')] [2024-06-15 14:29:30,940][1652491] Updated weights for policy 0, policy_version 240144 (0.0011) [2024-06-15 14:29:30,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 491814912. Throughput: 0: 11434.7. Samples: 123043328. Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:29:30,956][1648985] Avg episode reward: [(0, '155.210')] [2024-06-15 14:29:32,422][1652491] Updated weights for policy 0, policy_version 240208 (0.0015) [2024-06-15 14:29:33,504][1652491] Updated weights for policy 0, policy_version 240254 (0.0015) [2024-06-15 14:29:35,126][1652491] Updated weights for policy 0, policy_version 240307 (0.0016) [2024-06-15 14:29:35,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 492208128. Throughput: 0: 11264.0. Samples: 123102720. Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:29:35,956][1648985] Avg episode reward: [(0, '155.270')] [2024-06-15 14:29:36,739][1652491] Updated weights for policy 0, policy_version 240384 (0.0088) [2024-06-15 14:29:40,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 43694.9, 300 sec: 46208.4). Total num frames: 492306432. Throughput: 0: 11286.8. Samples: 123143168. 
Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:29:40,955][1648985] Avg episode reward: [(0, '145.810')] [2024-06-15 14:29:43,620][1652491] Updated weights for policy 0, policy_version 240455 (0.0015) [2024-06-15 14:29:44,741][1652491] Updated weights for policy 0, policy_version 240507 (0.0023) [2024-06-15 14:29:45,964][1648985] Fps is (10 sec: 42562.4, 60 sec: 46960.9, 300 sec: 46429.3). Total num frames: 492634112. Throughput: 0: 11546.3. Samples: 123215360. Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:29:45,965][1648985] Avg episode reward: [(0, '129.470')] [2024-06-15 14:29:46,199][1652491] Updated weights for policy 0, policy_version 240576 (0.0011) [2024-06-15 14:29:47,759][1652491] Updated weights for policy 0, policy_version 240637 (0.0012) [2024-06-15 14:29:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45329.1, 300 sec: 46319.6). Total num frames: 492830720. Throughput: 0: 11457.4. Samples: 123289088. Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:29:50,956][1648985] Avg episode reward: [(0, '120.980')] [2024-06-15 14:29:54,806][1652491] Updated weights for policy 0, policy_version 240720 (0.0015) [2024-06-15 14:29:55,955][1648985] Fps is (10 sec: 42634.0, 60 sec: 46967.5, 300 sec: 46319.5). Total num frames: 493060096. Throughput: 0: 11696.4. Samples: 123331072. Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:29:55,956][1648985] Avg episode reward: [(0, '124.800')] [2024-06-15 14:29:56,708][1652491] Updated weights for policy 0, policy_version 240773 (0.0014) [2024-06-15 14:29:58,267][1652491] Updated weights for policy 0, policy_version 240851 (0.0013) [2024-06-15 14:30:00,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 493355008. Throughput: 0: 11628.1. Samples: 123386368. 
Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:30:00,956][1648985] Avg episode reward: [(0, '143.170')] [2024-06-15 14:30:04,776][1652491] Updated weights for policy 0, policy_version 240898 (0.0013) [2024-06-15 14:30:05,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45329.0, 300 sec: 46208.5). Total num frames: 493453312. Throughput: 0: 11764.6. Samples: 123468288. Policy #0 lag: (min: 15.0, avg: 131.2, max: 271.0) [2024-06-15 14:30:05,956][1648985] Avg episode reward: [(0, '151.090')] [2024-06-15 14:30:06,345][1652491] Updated weights for policy 0, policy_version 240963 (0.0013) [2024-06-15 14:30:07,019][1651469] Signal inference workers to stop experience collection... (12600 times) [2024-06-15 14:30:07,051][1652491] InferenceWorker_p0-w0: stopping experience collection (12600 times) [2024-06-15 14:30:07,200][1651469] Signal inference workers to resume experience collection... (12600 times) [2024-06-15 14:30:07,201][1652491] InferenceWorker_p0-w0: resuming experience collection (12600 times) [2024-06-15 14:30:07,455][1652491] Updated weights for policy 0, policy_version 241021 (0.0025) [2024-06-15 14:30:08,985][1652491] Updated weights for policy 0, policy_version 241075 (0.0013) [2024-06-15 14:30:10,336][1652491] Updated weights for policy 0, policy_version 241151 (0.0015) [2024-06-15 14:30:10,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 493879296. Throughput: 0: 11719.1. Samples: 123492352. Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:30:10,956][1648985] Avg episode reward: [(0, '139.460')] [2024-06-15 14:30:15,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 493879296. Throughput: 0: 11673.5. Samples: 123568640. 
Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:30:15,956][1648985] Avg episode reward: [(0, '125.160')] [2024-06-15 14:30:18,236][1652491] Updated weights for policy 0, policy_version 241232 (0.0021) [2024-06-15 14:30:20,274][1652491] Updated weights for policy 0, policy_version 241312 (0.0011) [2024-06-15 14:30:20,955][1648985] Fps is (10 sec: 36045.2, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 494239744. Throughput: 0: 11628.1. Samples: 123625984. Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:30:20,956][1648985] Avg episode reward: [(0, '121.730')] [2024-06-15 14:30:22,186][1652491] Updated weights for policy 0, policy_version 241392 (0.0028) [2024-06-15 14:30:25,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 494403584. Throughput: 0: 11457.4. Samples: 123658752. Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:30:25,956][1648985] Avg episode reward: [(0, '130.230')] [2024-06-15 14:30:28,894][1652491] Updated weights for policy 0, policy_version 241441 (0.0012) [2024-06-15 14:30:30,424][1652491] Updated weights for policy 0, policy_version 241506 (0.0014) [2024-06-15 14:30:30,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 494632960. Throughput: 0: 11505.1. Samples: 123732992. Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:30:30,956][1648985] Avg episode reward: [(0, '128.740')] [2024-06-15 14:30:31,251][1652491] Updated weights for policy 0, policy_version 241536 (0.0012) [2024-06-15 14:30:32,727][1652491] Updated weights for policy 0, policy_version 241587 (0.0014) [2024-06-15 14:30:34,104][1652491] Updated weights for policy 0, policy_version 241660 (0.0019) [2024-06-15 14:30:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45329.0, 300 sec: 46319.5). Total num frames: 494927872. Throughput: 0: 11252.6. Samples: 123795456. 
Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:30:35,956][1648985] Avg episode reward: [(0, '135.330')] [2024-06-15 14:30:40,717][1652491] Updated weights for policy 0, policy_version 241712 (0.0011) [2024-06-15 14:30:40,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 495026176. Throughput: 0: 11275.4. Samples: 123838464. Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:30:40,956][1648985] Avg episode reward: [(0, '148.990')] [2024-06-15 14:30:42,481][1652491] Updated weights for policy 0, policy_version 241781 (0.0012) [2024-06-15 14:30:43,445][1652491] Updated weights for policy 0, policy_version 241812 (0.0015) [2024-06-15 14:30:44,227][1651469] Signal inference workers to stop experience collection... (12650 times) [2024-06-15 14:30:44,296][1652491] InferenceWorker_p0-w0: stopping experience collection (12650 times) [2024-06-15 14:30:44,420][1651469] Signal inference workers to resume experience collection... (12650 times) [2024-06-15 14:30:44,420][1652491] InferenceWorker_p0-w0: resuming experience collection (12650 times) [2024-06-15 14:30:44,744][1652491] Updated weights for policy 0, policy_version 241872 (0.0013) [2024-06-15 14:30:45,557][1652491] Updated weights for policy 0, policy_version 241917 (0.0015) [2024-06-15 14:30:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46974.0, 300 sec: 46541.7). Total num frames: 495452160. Throughput: 0: 11377.8. Samples: 123898368. Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:30:45,956][1648985] Avg episode reward: [(0, '152.090')] [2024-06-15 14:30:50,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 495452160. Throughput: 0: 11343.6. Samples: 123978752. 
Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:30:50,956][1648985] Avg episode reward: [(0, '147.510')] [2024-06-15 14:30:52,852][1652491] Updated weights for policy 0, policy_version 242000 (0.0012) [2024-06-15 14:30:54,061][1652491] Updated weights for policy 0, policy_version 242048 (0.0030) [2024-06-15 14:30:55,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 495845376. Throughput: 0: 11355.0. Samples: 124003328. Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:30:55,956][1648985] Avg episode reward: [(0, '145.830')] [2024-06-15 14:30:56,268][1652491] Updated weights for policy 0, policy_version 242128 (0.0014) [2024-06-15 14:30:56,675][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000242144_495910912.pth... [2024-06-15 14:30:56,813][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000236704_484769792.pth [2024-06-15 14:30:56,818][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000242144_495910912.pth [2024-06-15 14:31:00,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 495976448. Throughput: 0: 11047.9. Samples: 124065792. Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:31:00,956][1648985] Avg episode reward: [(0, '145.080')] [2024-06-15 14:31:03,408][1652491] Updated weights for policy 0, policy_version 242208 (0.0026) [2024-06-15 14:31:05,351][1652491] Updated weights for policy 0, policy_version 242275 (0.0012) [2024-06-15 14:31:05,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 46421.4, 300 sec: 46097.4). Total num frames: 496238592. Throughput: 0: 11161.7. Samples: 124128256. 
Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:31:05,955][1648985] Avg episode reward: [(0, '146.430')] [2024-06-15 14:31:07,802][1652491] Updated weights for policy 0, policy_version 242336 (0.0014) [2024-06-15 14:31:10,162][1652491] Updated weights for policy 0, policy_version 242425 (0.0136) [2024-06-15 14:31:10,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 496500736. Throughput: 0: 11081.9. Samples: 124157440. Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:31:10,956][1648985] Avg episode reward: [(0, '146.180')] [2024-06-15 14:31:15,955][1648985] Fps is (10 sec: 36043.9, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 496599040. Throughput: 0: 11252.6. Samples: 124239360. Policy #0 lag: (min: 113.0, avg: 233.0, max: 335.0) [2024-06-15 14:31:15,956][1648985] Avg episode reward: [(0, '157.980')] [2024-06-15 14:31:16,068][1652491] Updated weights for policy 0, policy_version 242482 (0.0013) [2024-06-15 14:31:17,189][1652491] Updated weights for policy 0, policy_version 242533 (0.0014) [2024-06-15 14:31:19,917][1652491] Updated weights for policy 0, policy_version 242608 (0.0012) [2024-06-15 14:31:20,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 496926720. Throughput: 0: 11104.7. Samples: 124295168. Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:31:20,956][1648985] Avg episode reward: [(0, '144.030')] [2024-06-15 14:31:21,350][1652491] Updated weights for policy 0, policy_version 242660 (0.0017) [2024-06-15 14:31:25,557][1652491] Updated weights for policy 0, policy_version 242689 (0.0013) [2024-06-15 14:31:25,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 44236.9, 300 sec: 45986.3). Total num frames: 497057792. Throughput: 0: 11104.7. Samples: 124338176. 
Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:31:25,955][1648985] Avg episode reward: [(0, '129.700')] [2024-06-15 14:31:26,813][1652491] Updated weights for policy 0, policy_version 242741 (0.0011) [2024-06-15 14:31:27,139][1651469] Signal inference workers to stop experience collection... (12700 times) [2024-06-15 14:31:27,180][1652491] InferenceWorker_p0-w0: stopping experience collection (12700 times) [2024-06-15 14:31:27,396][1651469] Signal inference workers to resume experience collection... (12700 times) [2024-06-15 14:31:27,397][1652491] InferenceWorker_p0-w0: resuming experience collection (12700 times) [2024-06-15 14:31:28,316][1652491] Updated weights for policy 0, policy_version 242810 (0.0014) [2024-06-15 14:31:30,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45329.1, 300 sec: 45986.3). Total num frames: 497352704. Throughput: 0: 11355.0. Samples: 124409344. Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:31:30,956][1648985] Avg episode reward: [(0, '139.660')] [2024-06-15 14:31:31,723][1652491] Updated weights for policy 0, policy_version 242882 (0.0032) [2024-06-15 14:31:32,882][1652491] Updated weights for policy 0, policy_version 242933 (0.0123) [2024-06-15 14:31:35,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 46097.4). Total num frames: 497549312. Throughput: 0: 11173.0. Samples: 124481536. Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:31:35,956][1648985] Avg episode reward: [(0, '142.450')] [2024-06-15 14:31:37,279][1652491] Updated weights for policy 0, policy_version 242965 (0.0015) [2024-06-15 14:31:39,309][1652491] Updated weights for policy 0, policy_version 243040 (0.0011) [2024-06-15 14:31:40,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46421.4, 300 sec: 45764.1). Total num frames: 497811456. Throughput: 0: 11366.4. Samples: 124514816. 
Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:31:40,956][1648985] Avg episode reward: [(0, '134.020')] [2024-06-15 14:31:42,331][1652491] Updated weights for policy 0, policy_version 243120 (0.0014) [2024-06-15 14:31:44,165][1652491] Updated weights for policy 0, policy_version 243190 (0.0013) [2024-06-15 14:31:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 498073600. Throughput: 0: 11195.8. Samples: 124569600. Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:31:45,956][1648985] Avg episode reward: [(0, '129.960')] [2024-06-15 14:31:49,236][1652491] Updated weights for policy 0, policy_version 243232 (0.0012) [2024-06-15 14:31:50,388][1652491] Updated weights for policy 0, policy_version 243283 (0.0013) [2024-06-15 14:31:50,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.7, 300 sec: 46097.4). Total num frames: 498302976. Throughput: 0: 11537.0. Samples: 124647424. Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:31:50,955][1648985] Avg episode reward: [(0, '126.540')] [2024-06-15 14:31:53,000][1652491] Updated weights for policy 0, policy_version 243345 (0.0102) [2024-06-15 14:31:54,859][1652491] Updated weights for policy 0, policy_version 243410 (0.0105) [2024-06-15 14:31:55,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 498597888. Throughput: 0: 11537.1. Samples: 124676608. Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:31:55,956][1648985] Avg episode reward: [(0, '130.730')] [2024-06-15 14:32:00,094][1652491] Updated weights for policy 0, policy_version 243488 (0.0014) [2024-06-15 14:32:00,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 498728960. Throughput: 0: 11355.0. Samples: 124750336. 
Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:32:00,956][1648985] Avg episode reward: [(0, '138.960')] [2024-06-15 14:32:01,980][1652491] Updated weights for policy 0, policy_version 243552 (0.0090) [2024-06-15 14:32:02,731][1652491] Updated weights for policy 0, policy_version 243584 (0.0011) [2024-06-15 14:32:05,782][1652491] Updated weights for policy 0, policy_version 243640 (0.0014) [2024-06-15 14:32:05,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 45875.1, 300 sec: 45986.3). Total num frames: 498991104. Throughput: 0: 11468.8. Samples: 124811264. Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:32:05,956][1648985] Avg episode reward: [(0, '147.950')] [2024-06-15 14:32:07,480][1652491] Updated weights for policy 0, policy_version 243700 (0.0013) [2024-06-15 14:32:10,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 499122176. Throughput: 0: 11309.5. Samples: 124847104. Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:32:10,956][1648985] Avg episode reward: [(0, '131.980')] [2024-06-15 14:32:11,722][1651469] Signal inference workers to stop experience collection... (12750 times) [2024-06-15 14:32:11,778][1652491] InferenceWorker_p0-w0: stopping experience collection (12750 times) [2024-06-15 14:32:11,888][1651469] Signal inference workers to resume experience collection... (12750 times) [2024-06-15 14:32:11,889][1652491] InferenceWorker_p0-w0: resuming experience collection (12750 times) [2024-06-15 14:32:12,030][1652491] Updated weights for policy 0, policy_version 243746 (0.0014) [2024-06-15 14:32:13,093][1652491] Updated weights for policy 0, policy_version 243796 (0.0043) [2024-06-15 14:32:14,008][1652491] Updated weights for policy 0, policy_version 243840 (0.0014) [2024-06-15 14:32:15,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 499417088. Throughput: 0: 11423.3. Samples: 124923392. 
Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:32:15,956][1648985] Avg episode reward: [(0, '125.380')] [2024-06-15 14:32:16,629][1652491] Updated weights for policy 0, policy_version 243903 (0.0015) [2024-06-15 14:32:18,119][1652491] Updated weights for policy 0, policy_version 243954 (0.0014) [2024-06-15 14:32:18,543][1652491] Updated weights for policy 0, policy_version 243968 (0.0012) [2024-06-15 14:32:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 499646464. Throughput: 0: 11628.1. Samples: 125004800. Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:32:20,956][1648985] Avg episode reward: [(0, '138.680')] [2024-06-15 14:32:22,758][1652491] Updated weights for policy 0, policy_version 244036 (0.0040) [2024-06-15 14:32:25,677][1652491] Updated weights for policy 0, policy_version 244112 (0.0013) [2024-06-15 14:32:25,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 48059.6, 300 sec: 46319.5). Total num frames: 499941376. Throughput: 0: 11616.7. Samples: 125037568. Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:32:25,956][1648985] Avg episode reward: [(0, '122.260')] [2024-06-15 14:32:27,903][1652491] Updated weights for policy 0, policy_version 244176 (0.0011) [2024-06-15 14:32:29,037][1652491] Updated weights for policy 0, policy_version 244221 (0.0011) [2024-06-15 14:32:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 500170752. Throughput: 0: 11992.2. Samples: 125109248. Policy #0 lag: (min: 63.0, avg: 172.1, max: 303.0) [2024-06-15 14:32:30,956][1648985] Avg episode reward: [(0, '133.490')] [2024-06-15 14:32:33,424][1652491] Updated weights for policy 0, policy_version 244280 (0.0014) [2024-06-15 14:32:34,359][1652491] Updated weights for policy 0, policy_version 244320 (0.0032) [2024-06-15 14:32:35,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 500432896. 
Throughput: 0: 11878.4. Samples: 125181952. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:32:35,956][1648985] Avg episode reward: [(0, '152.400')] [2024-06-15 14:32:36,846][1652491] Updated weights for policy 0, policy_version 244368 (0.0012) [2024-06-15 14:32:37,731][1652491] Updated weights for policy 0, policy_version 244411 (0.0014) [2024-06-15 14:32:39,617][1652491] Updated weights for policy 0, policy_version 244474 (0.0014) [2024-06-15 14:32:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 500695040. Throughput: 0: 12026.3. Samples: 125217792. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:32:40,955][1648985] Avg episode reward: [(0, '138.200')] [2024-06-15 14:32:43,993][1652491] Updated weights for policy 0, policy_version 244528 (0.0014) [2024-06-15 14:32:45,625][1652491] Updated weights for policy 0, policy_version 244578 (0.0015) [2024-06-15 14:32:45,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 47513.5, 300 sec: 46319.5). Total num frames: 500924416. Throughput: 0: 11969.4. Samples: 125288960. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:32:45,956][1648985] Avg episode reward: [(0, '129.190')] [2024-06-15 14:32:46,309][1652491] Updated weights for policy 0, policy_version 244608 (0.0011) [2024-06-15 14:32:49,281][1652491] Updated weights for policy 0, policy_version 244662 (0.0018) [2024-06-15 14:32:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 47513.5, 300 sec: 46097.4). Total num frames: 501153792. Throughput: 0: 11992.2. Samples: 125350912. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:32:50,956][1648985] Avg episode reward: [(0, '139.610')] [2024-06-15 14:32:51,155][1652491] Updated weights for policy 0, policy_version 244731 (0.0016) [2024-06-15 14:32:54,977][1651469] Signal inference workers to stop experience collection... 
(12800 times) [2024-06-15 14:32:55,012][1652491] InferenceWorker_p0-w0: stopping experience collection (12800 times) [2024-06-15 14:32:55,192][1651469] Signal inference workers to resume experience collection... (12800 times) [2024-06-15 14:32:55,193][1652491] InferenceWorker_p0-w0: resuming experience collection (12800 times) [2024-06-15 14:32:55,678][1652491] Updated weights for policy 0, policy_version 244791 (0.0012) [2024-06-15 14:32:55,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 501350400. Throughput: 0: 12140.1. Samples: 125393408. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:32:55,956][1648985] Avg episode reward: [(0, '149.260')] [2024-06-15 14:32:56,368][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000244816_501383168.pth... [2024-06-15 14:32:56,548][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000239408_490307584.pth [2024-06-15 14:32:57,368][1652491] Updated weights for policy 0, policy_version 244848 (0.0013) [2024-06-15 14:32:59,496][1652491] Updated weights for policy 0, policy_version 244883 (0.0037) [2024-06-15 14:33:00,642][1652491] Updated weights for policy 0, policy_version 244927 (0.0037) [2024-06-15 14:33:00,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.7, 300 sec: 46097.3). Total num frames: 501612544. Throughput: 0: 11946.7. Samples: 125460992. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:33:00,956][1648985] Avg episode reward: [(0, '152.410')] [2024-06-15 14:33:02,441][1652491] Updated weights for policy 0, policy_version 244989 (0.0012) [2024-06-15 14:33:05,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 501743616. Throughput: 0: 11798.8. Samples: 125535744. 
Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:33:05,956][1648985] Avg episode reward: [(0, '150.230')] [2024-06-15 14:33:07,233][1652491] Updated weights for policy 0, policy_version 245041 (0.0012) [2024-06-15 14:33:08,797][1652491] Updated weights for policy 0, policy_version 245108 (0.0011) [2024-06-15 14:33:10,831][1652491] Updated weights for policy 0, policy_version 245142 (0.0012) [2024-06-15 14:33:10,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48606.0, 300 sec: 46097.4). Total num frames: 502038528. Throughput: 0: 11707.8. Samples: 125564416. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:33:10,955][1648985] Avg episode reward: [(0, '141.420')] [2024-06-15 14:33:12,984][1652491] Updated weights for policy 0, policy_version 245200 (0.0013) [2024-06-15 14:33:15,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 47513.5, 300 sec: 46208.4). Total num frames: 502267904. Throughput: 0: 11719.1. Samples: 125636608. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:33:15,956][1648985] Avg episode reward: [(0, '134.220')] [2024-06-15 14:33:17,595][1652491] Updated weights for policy 0, policy_version 245264 (0.0098) [2024-06-15 14:33:19,034][1652491] Updated weights for policy 0, policy_version 245331 (0.0014) [2024-06-15 14:33:20,077][1652491] Updated weights for policy 0, policy_version 245376 (0.0014) [2024-06-15 14:33:20,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 46208.5). Total num frames: 502530048. Throughput: 0: 11639.5. Samples: 125705728. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:33:20,956][1648985] Avg episode reward: [(0, '140.920')] [2024-06-15 14:33:22,311][1652491] Updated weights for policy 0, policy_version 245424 (0.0017) [2024-06-15 14:33:24,442][1652491] Updated weights for policy 0, policy_version 245456 (0.0012) [2024-06-15 14:33:25,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 47513.7, 300 sec: 46208.4). Total num frames: 502792192. 
Throughput: 0: 11730.5. Samples: 125745664. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:33:25,956][1648985] Avg episode reward: [(0, '156.330')] [2024-06-15 14:33:28,336][1652491] Updated weights for policy 0, policy_version 245507 (0.0014) [2024-06-15 14:33:29,400][1652491] Updated weights for policy 0, policy_version 245564 (0.0014) [2024-06-15 14:33:30,414][1652491] Updated weights for policy 0, policy_version 245601 (0.0092) [2024-06-15 14:33:30,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 503054336. Throughput: 0: 11662.3. Samples: 125813760. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:33:30,956][1648985] Avg episode reward: [(0, '142.530')] [2024-06-15 14:33:32,104][1652491] Updated weights for policy 0, policy_version 245635 (0.0015) [2024-06-15 14:33:33,683][1652491] Updated weights for policy 0, policy_version 245689 (0.0017) [2024-06-15 14:33:35,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.3, 300 sec: 45987.2). Total num frames: 503250944. Throughput: 0: 11878.4. Samples: 125885440. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:33:35,956][1648985] Avg episode reward: [(0, '139.960')] [2024-06-15 14:33:36,560][1652491] Updated weights for policy 0, policy_version 245753 (0.0030) [2024-06-15 14:33:39,033][1651469] Signal inference workers to stop experience collection... (12850 times) [2024-06-15 14:33:39,113][1652491] InferenceWorker_p0-w0: stopping experience collection (12850 times) [2024-06-15 14:33:39,233][1651469] Signal inference workers to resume experience collection... (12850 times) [2024-06-15 14:33:39,233][1652491] InferenceWorker_p0-w0: resuming experience collection (12850 times) [2024-06-15 14:33:39,779][1652491] Updated weights for policy 0, policy_version 245795 (0.0013) [2024-06-15 14:33:40,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 503480320. Throughput: 0: 11753.3. 
Samples: 125922304. Policy #0 lag: (min: 15.0, avg: 99.5, max: 271.0) [2024-06-15 14:33:40,956][1648985] Avg episode reward: [(0, '140.920')] [2024-06-15 14:33:41,353][1652491] Updated weights for policy 0, policy_version 245856 (0.0020) [2024-06-15 14:33:44,700][1652491] Updated weights for policy 0, policy_version 245924 (0.0016) [2024-06-15 14:33:45,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 46421.5, 300 sec: 46097.4). Total num frames: 503709696. Throughput: 0: 11582.6. Samples: 125982208. Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:33:45,955][1648985] Avg episode reward: [(0, '161.180')] [2024-06-15 14:33:48,416][1652491] Updated weights for policy 0, policy_version 246003 (0.0014) [2024-06-15 14:33:50,778][1652491] Updated weights for policy 0, policy_version 246020 (0.0013) [2024-06-15 14:33:50,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 45328.9, 300 sec: 46208.4). Total num frames: 503873536. Throughput: 0: 11650.8. Samples: 126060032. Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:33:50,957][1648985] Avg episode reward: [(0, '159.640')] [2024-06-15 14:33:52,096][1652491] Updated weights for policy 0, policy_version 246080 (0.0013) [2024-06-15 14:33:53,599][1652491] Updated weights for policy 0, policy_version 246142 (0.0108) [2024-06-15 14:33:55,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 504168448. Throughput: 0: 11559.8. Samples: 126084608. Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:33:55,956][1648985] Avg episode reward: [(0, '150.570')] [2024-06-15 14:33:56,165][1652491] Updated weights for policy 0, policy_version 246196 (0.0012) [2024-06-15 14:33:59,748][1652491] Updated weights for policy 0, policy_version 246264 (0.0014) [2024-06-15 14:34:00,955][1648985] Fps is (10 sec: 49153.2, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 504365056. Throughput: 0: 11571.2. Samples: 126157312. 
Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:34:00,956][1648985] Avg episode reward: [(0, '148.720')] [2024-06-15 14:34:02,926][1652491] Updated weights for policy 0, policy_version 246308 (0.0184) [2024-06-15 14:34:03,714][1652491] Updated weights for policy 0, policy_version 246345 (0.0141) [2024-06-15 14:34:05,098][1652491] Updated weights for policy 0, policy_version 246397 (0.0012) [2024-06-15 14:34:05,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 48059.7, 300 sec: 46208.5). Total num frames: 504627200. Throughput: 0: 11594.0. Samples: 126227456. Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:34:05,955][1648985] Avg episode reward: [(0, '151.080')] [2024-06-15 14:34:07,095][1652491] Updated weights for policy 0, policy_version 246448 (0.0014) [2024-06-15 14:34:10,579][1652491] Updated weights for policy 0, policy_version 246498 (0.0014) [2024-06-15 14:34:10,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 46967.3, 300 sec: 46097.3). Total num frames: 504856576. Throughput: 0: 11548.4. Samples: 126265344. Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:34:10,956][1648985] Avg episode reward: [(0, '155.420')] [2024-06-15 14:34:13,276][1652491] Updated weights for policy 0, policy_version 246545 (0.0048) [2024-06-15 14:34:14,196][1652491] Updated weights for policy 0, policy_version 246592 (0.0014) [2024-06-15 14:34:15,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 48059.7, 300 sec: 46541.6). Total num frames: 505151488. Throughput: 0: 11673.6. Samples: 126339072. Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:34:15,957][1648985] Avg episode reward: [(0, '168.780')] [2024-06-15 14:34:17,511][1652491] Updated weights for policy 0, policy_version 246673 (0.0014) [2024-06-15 14:34:18,522][1652491] Updated weights for policy 0, policy_version 246717 (0.0025) [2024-06-15 14:34:20,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45875.1, 300 sec: 45986.3). Total num frames: 505282560. 
Throughput: 0: 11582.6. Samples: 126406656. Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:34:20,956][1648985] Avg episode reward: [(0, '165.020')] [2024-06-15 14:34:21,978][1651469] Signal inference workers to stop experience collection... (12900 times) [2024-06-15 14:34:21,995][1652491] InferenceWorker_p0-w0: stopping experience collection (12900 times) [2024-06-15 14:34:22,176][1651469] Signal inference workers to resume experience collection... (12900 times) [2024-06-15 14:34:22,176][1652491] InferenceWorker_p0-w0: resuming experience collection (12900 times) [2024-06-15 14:34:22,383][1652491] Updated weights for policy 0, policy_version 246775 (0.0022) [2024-06-15 14:34:25,074][1652491] Updated weights for policy 0, policy_version 246816 (0.0011) [2024-06-15 14:34:25,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 505544704. Throughput: 0: 11525.7. Samples: 126440960. Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:34:25,956][1648985] Avg episode reward: [(0, '156.600')] [2024-06-15 14:34:26,525][1652491] Updated weights for policy 0, policy_version 246880 (0.0014) [2024-06-15 14:34:30,078][1652491] Updated weights for policy 0, policy_version 246965 (0.0144) [2024-06-15 14:34:30,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 505806848. Throughput: 0: 11639.5. Samples: 126505984. Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:34:30,955][1648985] Avg episode reward: [(0, '145.890')] [2024-06-15 14:34:33,470][1652491] Updated weights for policy 0, policy_version 247013 (0.0010) [2024-06-15 14:34:35,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44783.1, 300 sec: 46208.4). Total num frames: 505937920. Throughput: 0: 11628.2. Samples: 126583296. 
Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:34:35,956][1648985] Avg episode reward: [(0, '152.480')] [2024-06-15 14:34:36,330][1652491] Updated weights for policy 0, policy_version 247062 (0.0012) [2024-06-15 14:34:38,460][1652491] Updated weights for policy 0, policy_version 247152 (0.0116) [2024-06-15 14:34:40,712][1652491] Updated weights for policy 0, policy_version 247184 (0.0045) [2024-06-15 14:34:40,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 46098.7). Total num frames: 506232832. Throughput: 0: 11707.8. Samples: 126611456. Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:34:40,955][1648985] Avg episode reward: [(0, '144.380')] [2024-06-15 14:34:41,806][1652491] Updated weights for policy 0, policy_version 247232 (0.0012) [2024-06-15 14:34:44,918][1652491] Updated weights for policy 0, policy_version 247291 (0.0138) [2024-06-15 14:34:45,956][1648985] Fps is (10 sec: 52427.3, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 506462208. Throughput: 0: 11696.3. Samples: 126683648. Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:34:45,958][1648985] Avg episode reward: [(0, '136.170')] [2024-06-15 14:34:47,353][1652491] Updated weights for policy 0, policy_version 247329 (0.0029) [2024-06-15 14:34:49,028][1652491] Updated weights for policy 0, policy_version 247412 (0.0028) [2024-06-15 14:34:50,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 47513.7, 300 sec: 46319.5). Total num frames: 506724352. Throughput: 0: 11878.3. Samples: 126761984. Policy #0 lag: (min: 2.0, avg: 127.7, max: 258.0) [2024-06-15 14:34:50,956][1648985] Avg episode reward: [(0, '119.850')] [2024-06-15 14:34:51,723][1652491] Updated weights for policy 0, policy_version 247460 (0.0029) [2024-06-15 14:34:55,113][1652491] Updated weights for policy 0, policy_version 247505 (0.0014) [2024-06-15 14:34:55,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 46421.3, 300 sec: 46097.4). Total num frames: 506953728. 
Throughput: 0: 11889.8. Samples: 126800384. Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:34:55,956][1648985] Avg episode reward: [(0, '124.400')] [2024-06-15 14:34:56,045][1652491] Updated weights for policy 0, policy_version 247551 (0.0018) [2024-06-15 14:34:56,055][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000247552_506986496.pth... [2024-06-15 14:34:56,100][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000242144_495910912.pth [2024-06-15 14:34:58,492][1652491] Updated weights for policy 0, policy_version 247616 (0.0013) [2024-06-15 14:34:59,989][1652491] Updated weights for policy 0, policy_version 247673 (0.0031) [2024-06-15 14:35:00,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 507248640. Throughput: 0: 11559.8. Samples: 126859264. Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:35:00,956][1648985] Avg episode reward: [(0, '131.340')] [2024-06-15 14:35:02,973][1652491] Updated weights for policy 0, policy_version 247715 (0.0013) [2024-06-15 14:35:05,700][1651469] Signal inference workers to stop experience collection... (12950 times) [2024-06-15 14:35:05,763][1652491] InferenceWorker_p0-w0: stopping experience collection (12950 times) [2024-06-15 14:35:05,893][1651469] Signal inference workers to resume experience collection... (12950 times) [2024-06-15 14:35:05,894][1652491] InferenceWorker_p0-w0: resuming experience collection (12950 times) [2024-06-15 14:35:05,896][1652491] Updated weights for policy 0, policy_version 247760 (0.0021) [2024-06-15 14:35:05,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 507412480. Throughput: 0: 11867.0. Samples: 126940672. 
Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:35:05,956][1648985] Avg episode reward: [(0, '152.690')] [2024-06-15 14:35:07,939][1652491] Updated weights for policy 0, policy_version 247809 (0.0014) [2024-06-15 14:35:09,464][1652491] Updated weights for policy 0, policy_version 247873 (0.0096) [2024-06-15 14:35:10,639][1652491] Updated weights for policy 0, policy_version 247926 (0.0013) [2024-06-15 14:35:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48605.9, 300 sec: 47097.1). Total num frames: 507772928. Throughput: 0: 11912.5. Samples: 126977024. Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:35:10,956][1648985] Avg episode reward: [(0, '155.540')] [2024-06-15 14:35:14,103][1652491] Updated weights for policy 0, policy_version 247976 (0.0013) [2024-06-15 14:35:15,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45875.3, 300 sec: 46319.5). Total num frames: 507904000. Throughput: 0: 12049.1. Samples: 127048192. Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:35:15,956][1648985] Avg episode reward: [(0, '164.790')] [2024-06-15 14:35:17,046][1652491] Updated weights for policy 0, policy_version 248016 (0.0012) [2024-06-15 14:35:19,296][1652491] Updated weights for policy 0, policy_version 248080 (0.0013) [2024-06-15 14:35:20,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 48059.9, 300 sec: 46652.8). Total num frames: 508166144. Throughput: 0: 11810.1. Samples: 127114752. Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:35:20,955][1648985] Avg episode reward: [(0, '151.310')] [2024-06-15 14:35:21,548][1652491] Updated weights for policy 0, policy_version 248160 (0.0011) [2024-06-15 14:35:22,278][1652491] Updated weights for policy 0, policy_version 248187 (0.0031) [2024-06-15 14:35:25,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 508362752. Throughput: 0: 11969.4. Samples: 127150080. 
Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:35:25,955][1648985] Avg episode reward: [(0, '155.210')] [2024-06-15 14:35:26,216][1652491] Updated weights for policy 0, policy_version 248240 (0.0122) [2024-06-15 14:35:29,459][1652491] Updated weights for policy 0, policy_version 248304 (0.0017) [2024-06-15 14:35:30,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 508624896. Throughput: 0: 11969.5. Samples: 127222272. Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:35:30,956][1648985] Avg episode reward: [(0, '144.390')] [2024-06-15 14:35:30,988][1652491] Updated weights for policy 0, policy_version 248356 (0.0189) [2024-06-15 14:35:32,258][1652491] Updated weights for policy 0, policy_version 248401 (0.0020) [2024-06-15 14:35:35,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 508821504. Throughput: 0: 11889.8. Samples: 127297024. Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:35:35,956][1648985] Avg episode reward: [(0, '135.520')] [2024-06-15 14:35:36,507][1652491] Updated weights for policy 0, policy_version 248449 (0.0017) [2024-06-15 14:35:39,092][1652491] Updated weights for policy 0, policy_version 248513 (0.0126) [2024-06-15 14:35:40,789][1652491] Updated weights for policy 0, policy_version 248576 (0.0039) [2024-06-15 14:35:40,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 509083648. Throughput: 0: 11855.7. Samples: 127333888. Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:35:40,956][1648985] Avg episode reward: [(0, '130.190')] [2024-06-15 14:35:42,455][1652491] Updated weights for policy 0, policy_version 248628 (0.0018) [2024-06-15 14:35:44,153][1652491] Updated weights for policy 0, policy_version 248696 (0.0134) [2024-06-15 14:35:45,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.9, 300 sec: 47097.1). Total num frames: 509345792. 
Throughput: 0: 11764.6. Samples: 127388672. Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:35:45,956][1648985] Avg episode reward: [(0, '147.950')] [2024-06-15 14:35:48,595][1651469] Signal inference workers to stop experience collection... (13000 times) [2024-06-15 14:35:48,681][1652491] InferenceWorker_p0-w0: stopping experience collection (13000 times) [2024-06-15 14:35:48,814][1651469] Signal inference workers to resume experience collection... (13000 times) [2024-06-15 14:35:48,815][1652491] InferenceWorker_p0-w0: resuming experience collection (13000 times) [2024-06-15 14:35:49,005][1652491] Updated weights for policy 0, policy_version 248740 (0.0012) [2024-06-15 14:35:50,331][1652491] Updated weights for policy 0, policy_version 248774 (0.0022) [2024-06-15 14:35:50,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 46967.8, 300 sec: 46430.6). Total num frames: 509542400. Throughput: 0: 11821.6. Samples: 127472640. Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:35:50,955][1648985] Avg episode reward: [(0, '147.630')] [2024-06-15 14:35:51,735][1652491] Updated weights for policy 0, policy_version 248825 (0.0103) [2024-06-15 14:35:53,149][1652491] Updated weights for policy 0, policy_version 248883 (0.0011) [2024-06-15 14:35:54,060][1652491] Updated weights for policy 0, policy_version 248930 (0.0012) [2024-06-15 14:35:55,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48605.8, 300 sec: 47097.0). Total num frames: 509870080. Throughput: 0: 11673.6. Samples: 127502336. Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:35:55,956][1648985] Avg episode reward: [(0, '144.050')] [2024-06-15 14:35:59,127][1652491] Updated weights for policy 0, policy_version 248982 (0.0025) [2024-06-15 14:35:59,831][1652491] Updated weights for policy 0, policy_version 249024 (0.0013) [2024-06-15 14:36:00,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 510001152. Throughput: 0: 11821.5. 
Samples: 127580160. Policy #0 lag: (min: 3.0, avg: 106.8, max: 259.0) [2024-06-15 14:36:00,956][1648985] Avg episode reward: [(0, '150.090')] [2024-06-15 14:36:02,714][1652491] Updated weights for policy 0, policy_version 249089 (0.0013) [2024-06-15 14:36:04,097][1652491] Updated weights for policy 0, policy_version 249152 (0.0012) [2024-06-15 14:36:05,297][1652491] Updated weights for policy 0, policy_version 249215 (0.0012) [2024-06-15 14:36:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 49698.1, 300 sec: 47097.1). Total num frames: 510394368. Throughput: 0: 11923.9. Samples: 127651328. Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:36:05,956][1648985] Avg episode reward: [(0, '163.530')] [2024-06-15 14:36:10,284][1652491] Updated weights for policy 0, policy_version 249274 (0.0021) [2024-06-15 14:36:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 510525440. Throughput: 0: 12105.9. Samples: 127694848. Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:36:10,956][1648985] Avg episode reward: [(0, '140.150')] [2024-06-15 14:36:12,362][1652491] Updated weights for policy 0, policy_version 249328 (0.0079) [2024-06-15 14:36:14,910][1652491] Updated weights for policy 0, policy_version 249408 (0.0013) [2024-06-15 14:36:15,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 49698.1, 300 sec: 47319.2). Total num frames: 510885888. Throughput: 0: 12003.6. Samples: 127762432. Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:36:15,956][1648985] Avg episode reward: [(0, '130.400')] [2024-06-15 14:36:16,090][1652491] Updated weights for policy 0, policy_version 249467 (0.0012) [2024-06-15 14:36:20,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 45875.0, 300 sec: 46985.9). Total num frames: 510918656. Throughput: 0: 12014.9. Samples: 127837696. 
Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:36:20,956][1648985] Avg episode reward: [(0, '119.310')] [2024-06-15 14:36:21,739][1652491] Updated weights for policy 0, policy_version 249507 (0.0011) [2024-06-15 14:36:23,729][1652491] Updated weights for policy 0, policy_version 249554 (0.0012) [2024-06-15 14:36:25,649][1652491] Updated weights for policy 0, policy_version 249635 (0.0014) [2024-06-15 14:36:25,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 48605.8, 300 sec: 47208.1). Total num frames: 511279104. Throughput: 0: 11901.2. Samples: 127869440. Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:36:25,956][1648985] Avg episode reward: [(0, '129.950')] [2024-06-15 14:36:25,959][1651469] Signal inference workers to stop experience collection... (13050 times) [2024-06-15 14:36:26,026][1652491] InferenceWorker_p0-w0: stopping experience collection (13050 times) [2024-06-15 14:36:26,198][1651469] Signal inference workers to resume experience collection... (13050 times) [2024-06-15 14:36:26,200][1652491] InferenceWorker_p0-w0: resuming experience collection (13050 times) [2024-06-15 14:36:27,506][1652491] Updated weights for policy 0, policy_version 249712 (0.0012) [2024-06-15 14:36:30,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 46967.5, 300 sec: 47097.0). Total num frames: 511442944. Throughput: 0: 12014.9. Samples: 127929344. Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:36:30,956][1648985] Avg episode reward: [(0, '130.270')] [2024-06-15 14:36:32,760][1652491] Updated weights for policy 0, policy_version 249744 (0.0012) [2024-06-15 14:36:33,906][1652491] Updated weights for policy 0, policy_version 249790 (0.0014) [2024-06-15 14:36:35,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 511672320. Throughput: 0: 11775.9. Samples: 128002560. 
Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:36:35,956][1648985] Avg episode reward: [(0, '134.810')] [2024-06-15 14:36:36,150][1652491] Updated weights for policy 0, policy_version 249856 (0.0015) [2024-06-15 14:36:37,919][1652491] Updated weights for policy 0, policy_version 249921 (0.0109) [2024-06-15 14:36:39,377][1652491] Updated weights for policy 0, policy_version 249983 (0.0011) [2024-06-15 14:36:40,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 511967232. Throughput: 0: 11787.4. Samples: 128032768. Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:36:40,956][1648985] Avg episode reward: [(0, '127.780')] [2024-06-15 14:36:44,576][1652491] Updated weights for policy 0, policy_version 250032 (0.0040) [2024-06-15 14:36:45,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 512098304. Throughput: 0: 11628.1. Samples: 128103424. Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:36:45,956][1648985] Avg episode reward: [(0, '138.520')] [2024-06-15 14:36:46,976][1652491] Updated weights for policy 0, policy_version 250083 (0.0014) [2024-06-15 14:36:48,201][1652491] Updated weights for policy 0, policy_version 250144 (0.0014) [2024-06-15 14:36:49,264][1652491] Updated weights for policy 0, policy_version 250192 (0.0014) [2024-06-15 14:36:50,211][1652491] Updated weights for policy 0, policy_version 250235 (0.0027) [2024-06-15 14:36:50,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 49151.8, 300 sec: 47097.1). Total num frames: 512491520. Throughput: 0: 11571.2. Samples: 128172032. Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:36:50,955][1648985] Avg episode reward: [(0, '156.020')] [2024-06-15 14:36:55,958][1648985] Fps is (10 sec: 49135.9, 60 sec: 45326.7, 300 sec: 46985.5). Total num frames: 512589824. Throughput: 0: 11547.6. Samples: 128214528. 
Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:36:55,959][1648985] Avg episode reward: [(0, '145.660')] [2024-06-15 14:36:55,978][1652491] Updated weights for policy 0, policy_version 250296 (0.0024) [2024-06-15 14:36:56,075][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000250304_512622592.pth... [2024-06-15 14:36:56,203][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000244816_501383168.pth [2024-06-15 14:36:58,990][1652491] Updated weights for policy 0, policy_version 250384 (0.0085) [2024-06-15 14:37:00,103][1652491] Updated weights for policy 0, policy_version 250435 (0.0016) [2024-06-15 14:37:00,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 49698.1, 300 sec: 47430.3). Total num frames: 512983040. Throughput: 0: 11468.8. Samples: 128278528. Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:37:00,956][1648985] Avg episode reward: [(0, '150.350')] [2024-06-15 14:37:05,418][1652491] Updated weights for policy 0, policy_version 250498 (0.0014) [2024-06-15 14:37:05,955][1648985] Fps is (10 sec: 49167.9, 60 sec: 44782.9, 300 sec: 47319.2). Total num frames: 513081344. Throughput: 0: 11548.5. Samples: 128357376. Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:37:05,956][1648985] Avg episode reward: [(0, '141.850')] [2024-06-15 14:37:06,518][1652491] Updated weights for policy 0, policy_version 250554 (0.0014) [2024-06-15 14:37:09,855][1652491] Updated weights for policy 0, policy_version 250608 (0.0099) [2024-06-15 14:37:09,889][1651469] Signal inference workers to stop experience collection... (13100 times) [2024-06-15 14:37:09,972][1652491] InferenceWorker_p0-w0: stopping experience collection (13100 times) [2024-06-15 14:37:10,175][1651469] Signal inference workers to resume experience collection... 
(13100 times) [2024-06-15 14:37:10,176][1652491] InferenceWorker_p0-w0: resuming experience collection (13100 times) [2024-06-15 14:37:10,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 46967.5, 300 sec: 47208.1). Total num frames: 513343488. Throughput: 0: 11719.1. Samples: 128396800. Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:37:10,955][1648985] Avg episode reward: [(0, '144.660')] [2024-06-15 14:37:11,588][1652491] Updated weights for policy 0, policy_version 250674 (0.0013) [2024-06-15 14:37:13,119][1652491] Updated weights for policy 0, policy_version 250748 (0.0013) [2024-06-15 14:37:15,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 44236.8, 300 sec: 47097.1). Total num frames: 513540096. Throughput: 0: 11605.3. Samples: 128451584. Policy #0 lag: (min: 111.0, avg: 221.4, max: 335.0) [2024-06-15 14:37:15,956][1648985] Avg episode reward: [(0, '145.920')] [2024-06-15 14:37:17,973][1652491] Updated weights for policy 0, policy_version 250802 (0.0013) [2024-06-15 14:37:18,224][1652491] Updated weights for policy 0, policy_version 250816 (0.0017) [2024-06-15 14:37:20,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46967.6, 300 sec: 46763.8). Total num frames: 513736704. Throughput: 0: 11696.4. Samples: 128528896. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:37:20,956][1648985] Avg episode reward: [(0, '146.680')] [2024-06-15 14:37:22,604][1652491] Updated weights for policy 0, policy_version 250913 (0.0015) [2024-06-15 14:37:23,972][1652491] Updated weights for policy 0, policy_version 250964 (0.0012) [2024-06-15 14:37:25,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 514064384. Throughput: 0: 11525.8. Samples: 128551424. 
Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:37:25,955][1648985] Avg episode reward: [(0, '146.510')] [2024-06-15 14:37:28,500][1652491] Updated weights for policy 0, policy_version 251010 (0.0024) [2024-06-15 14:37:29,652][1652491] Updated weights for policy 0, policy_version 251065 (0.0014) [2024-06-15 14:37:30,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 514195456. Throughput: 0: 11650.9. Samples: 128627712. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:37:30,956][1648985] Avg episode reward: [(0, '138.520')] [2024-06-15 14:37:32,348][1652491] Updated weights for policy 0, policy_version 251110 (0.0037) [2024-06-15 14:37:33,992][1652491] Updated weights for policy 0, policy_version 251170 (0.0073) [2024-06-15 14:37:35,707][1652491] Updated weights for policy 0, policy_version 251238 (0.0013) [2024-06-15 14:37:35,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 514555904. Throughput: 0: 11423.3. Samples: 128686080. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:37:35,956][1648985] Avg episode reward: [(0, '137.320')] [2024-06-15 14:37:40,368][1652491] Updated weights for policy 0, policy_version 251298 (0.0024) [2024-06-15 14:37:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.4, 300 sec: 46763.9). Total num frames: 514719744. Throughput: 0: 11515.2. Samples: 128732672. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:37:40,956][1648985] Avg episode reward: [(0, '142.790')] [2024-06-15 14:37:42,355][1652491] Updated weights for policy 0, policy_version 251332 (0.0049) [2024-06-15 14:37:43,779][1652491] Updated weights for policy 0, policy_version 251392 (0.0030) [2024-06-15 14:37:45,815][1652491] Updated weights for policy 0, policy_version 251456 (0.0080) [2024-06-15 14:37:45,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 514981888. 
Throughput: 0: 11525.7. Samples: 128797184. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:37:45,956][1648985] Avg episode reward: [(0, '155.250')] [2024-06-15 14:37:50,794][1652491] Updated weights for policy 0, policy_version 251521 (0.0012) [2024-06-15 14:37:50,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 46652.8). Total num frames: 515112960. Throughput: 0: 11491.6. Samples: 128874496. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:37:50,956][1648985] Avg episode reward: [(0, '158.040')] [2024-06-15 14:37:51,507][1651469] Signal inference workers to stop experience collection... (13150 times) [2024-06-15 14:37:51,587][1652491] InferenceWorker_p0-w0: stopping experience collection (13150 times) [2024-06-15 14:37:51,730][1651469] Signal inference workers to resume experience collection... (13150 times) [2024-06-15 14:37:51,731][1652491] InferenceWorker_p0-w0: resuming experience collection (13150 times) [2024-06-15 14:37:51,885][1652491] Updated weights for policy 0, policy_version 251576 (0.0013) [2024-06-15 14:37:55,038][1652491] Updated weights for policy 0, policy_version 251632 (0.0016) [2024-06-15 14:37:55,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46423.9, 300 sec: 46652.8). Total num frames: 515375104. Throughput: 0: 11411.9. Samples: 128910336. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:37:55,955][1648985] Avg episode reward: [(0, '148.950')] [2024-06-15 14:37:55,969][1652491] Updated weights for policy 0, policy_version 251664 (0.0048) [2024-06-15 14:37:57,828][1652491] Updated weights for policy 0, policy_version 251729 (0.0015) [2024-06-15 14:37:58,791][1652491] Updated weights for policy 0, policy_version 251776 (0.0013) [2024-06-15 14:38:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 44236.8, 300 sec: 47097.0). Total num frames: 515637248. Throughput: 0: 11548.4. Samples: 128971264. 
Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:38:00,956][1648985] Avg episode reward: [(0, '127.720')] [2024-06-15 14:38:05,198][1652491] Updated weights for policy 0, policy_version 251841 (0.0012) [2024-06-15 14:38:05,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.3, 300 sec: 46763.8). Total num frames: 515833856. Throughput: 0: 11514.3. Samples: 129047040. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:38:05,956][1648985] Avg episode reward: [(0, '132.590')] [2024-06-15 14:38:06,398][1652491] Updated weights for policy 0, policy_version 251900 (0.0012) [2024-06-15 14:38:07,942][1652491] Updated weights for policy 0, policy_version 251941 (0.0018) [2024-06-15 14:38:09,744][1652491] Updated weights for policy 0, policy_version 252032 (0.0014) [2024-06-15 14:38:10,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 516161536. Throughput: 0: 11639.5. Samples: 129075200. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:38:10,956][1648985] Avg episode reward: [(0, '130.890')] [2024-06-15 14:38:14,597][1652491] Updated weights for policy 0, policy_version 252096 (0.0013) [2024-06-15 14:38:15,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 516292608. Throughput: 0: 11639.5. Samples: 129151488. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:38:15,956][1648985] Avg episode reward: [(0, '150.700')] [2024-06-15 14:38:18,765][1652491] Updated weights for policy 0, policy_version 252176 (0.0100) [2024-06-15 14:38:20,770][1652491] Updated weights for policy 0, policy_version 252246 (0.0012) [2024-06-15 14:38:20,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 47513.6, 300 sec: 46763.8). Total num frames: 516587520. Throughput: 0: 11639.5. Samples: 129209856. 
Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:38:20,956][1648985] Avg episode reward: [(0, '157.610')] [2024-06-15 14:38:25,653][1652491] Updated weights for policy 0, policy_version 252304 (0.0012) [2024-06-15 14:38:25,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 44236.7, 300 sec: 46319.5). Total num frames: 516718592. Throughput: 0: 11480.2. Samples: 129249280. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 14:38:25,956][1648985] Avg episode reward: [(0, '144.880')] [2024-06-15 14:38:28,110][1652491] Updated weights for policy 0, policy_version 252357 (0.0016) [2024-06-15 14:38:29,433][1652491] Updated weights for policy 0, policy_version 252413 (0.0029) [2024-06-15 14:38:30,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.2, 300 sec: 46541.7). Total num frames: 516980736. Throughput: 0: 11491.6. Samples: 129314304. Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:38:30,956][1648985] Avg episode reward: [(0, '129.480')] [2024-06-15 14:38:32,048][1652491] Updated weights for policy 0, policy_version 252481 (0.0014) [2024-06-15 14:38:32,942][1651469] Signal inference workers to stop experience collection... (13200 times) [2024-06-15 14:38:32,992][1652491] InferenceWorker_p0-w0: stopping experience collection (13200 times) [2024-06-15 14:38:33,198][1651469] Signal inference workers to resume experience collection... (13200 times) [2024-06-15 14:38:33,199][1652491] InferenceWorker_p0-w0: resuming experience collection (13200 times) [2024-06-15 14:38:33,435][1652491] Updated weights for policy 0, policy_version 252536 (0.0012) [2024-06-15 14:38:35,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 44236.7, 300 sec: 46541.7). Total num frames: 517210112. Throughput: 0: 11389.1. Samples: 129387008. 
Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:38:35,956][1648985] Avg episode reward: [(0, '136.280')] [2024-06-15 14:38:37,661][1652491] Updated weights for policy 0, policy_version 252600 (0.0029) [2024-06-15 14:38:40,712][1652491] Updated weights for policy 0, policy_version 252656 (0.0012) [2024-06-15 14:38:40,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 517472256. Throughput: 0: 11332.3. Samples: 129420288. Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:38:40,956][1648985] Avg episode reward: [(0, '132.630')] [2024-06-15 14:38:42,673][1652491] Updated weights for policy 0, policy_version 252723 (0.0014) [2024-06-15 14:38:45,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45875.3, 300 sec: 46986.0). Total num frames: 517734400. Throughput: 0: 11229.9. Samples: 129476608. Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:38:45,955][1648985] Avg episode reward: [(0, '123.770')] [2024-06-15 14:38:48,545][1652491] Updated weights for policy 0, policy_version 252801 (0.0012) [2024-06-15 14:38:50,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 517865472. Throughput: 0: 11241.3. Samples: 129552896. Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:38:50,956][1648985] Avg episode reward: [(0, '127.360')] [2024-06-15 14:38:51,903][1652491] Updated weights for policy 0, policy_version 252867 (0.0013) [2024-06-15 14:38:53,041][1652491] Updated weights for policy 0, policy_version 252927 (0.0037) [2024-06-15 14:38:54,781][1652491] Updated weights for policy 0, policy_version 252977 (0.0013) [2024-06-15 14:38:55,955][1648985] Fps is (10 sec: 45873.7, 60 sec: 46967.3, 300 sec: 46874.9). Total num frames: 518193152. Throughput: 0: 11354.9. Samples: 129586176. 
Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:38:55,956][1648985] Avg episode reward: [(0, '135.430')] [2024-06-15 14:38:56,345][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000253040_518225920.pth... [2024-06-15 14:38:56,398][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000247552_506986496.pth [2024-06-15 14:38:56,515][1652491] Updated weights for policy 0, policy_version 253045 (0.0107) [2024-06-15 14:39:00,271][1652491] Updated weights for policy 0, policy_version 253088 (0.0012) [2024-06-15 14:39:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 518389760. Throughput: 0: 11195.7. Samples: 129655296. Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:39:00,955][1648985] Avg episode reward: [(0, '140.490')] [2024-06-15 14:39:04,057][1652491] Updated weights for policy 0, policy_version 253168 (0.0014) [2024-06-15 14:39:05,900][1652491] Updated weights for policy 0, policy_version 253202 (0.0013) [2024-06-15 14:39:05,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 45328.9, 300 sec: 46430.6). Total num frames: 518553600. Throughput: 0: 11480.1. Samples: 129726464. Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:39:05,956][1648985] Avg episode reward: [(0, '143.340')] [2024-06-15 14:39:08,207][1652491] Updated weights for policy 0, policy_version 253286 (0.0015) [2024-06-15 14:39:10,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 518782976. Throughput: 0: 11070.6. Samples: 129747456. 
Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:39:10,955][1648985] Avg episode reward: [(0, '130.610')] [2024-06-15 14:39:12,896][1652491] Updated weights for policy 0, policy_version 253344 (0.0024) [2024-06-15 14:39:15,742][1652491] Updated weights for policy 0, policy_version 253408 (0.0013) [2024-06-15 14:39:15,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 518979584. Throughput: 0: 11309.5. Samples: 129823232. Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:39:15,956][1648985] Avg episode reward: [(0, '140.210')] [2024-06-15 14:39:16,964][1652491] Updated weights for policy 0, policy_version 253444 (0.0011) [2024-06-15 14:39:18,442][1651469] Signal inference workers to stop experience collection... (13250 times) [2024-06-15 14:39:18,487][1652491] InferenceWorker_p0-w0: stopping experience collection (13250 times) [2024-06-15 14:39:18,718][1651469] Signal inference workers to resume experience collection... (13250 times) [2024-06-15 14:39:18,719][1652491] InferenceWorker_p0-w0: resuming experience collection (13250 times) [2024-06-15 14:39:19,048][1652491] Updated weights for policy 0, policy_version 253536 (0.0013) [2024-06-15 14:39:20,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45329.1, 300 sec: 46652.7). Total num frames: 519307264. Throughput: 0: 11025.1. Samples: 129883136. Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:39:20,956][1648985] Avg episode reward: [(0, '148.330')] [2024-06-15 14:39:24,846][1652491] Updated weights for policy 0, policy_version 253616 (0.0045) [2024-06-15 14:39:25,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45329.2, 300 sec: 46208.4). Total num frames: 519438336. Throughput: 0: 11286.8. Samples: 129928192. 
Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:39:25,956][1648985] Avg episode reward: [(0, '149.600')] [2024-06-15 14:39:27,145][1652491] Updated weights for policy 0, policy_version 253670 (0.0019) [2024-06-15 14:39:28,217][1652491] Updated weights for policy 0, policy_version 253707 (0.0013) [2024-06-15 14:39:30,127][1652491] Updated weights for policy 0, policy_version 253792 (0.0012) [2024-06-15 14:39:30,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 47513.7, 300 sec: 47097.1). Total num frames: 519831552. Throughput: 0: 11411.9. Samples: 129990144. Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:39:30,956][1648985] Avg episode reward: [(0, '136.380')] [2024-06-15 14:39:35,407][1652491] Updated weights for policy 0, policy_version 253840 (0.0013) [2024-06-15 14:39:35,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 44783.0, 300 sec: 46319.5). Total num frames: 519897088. Throughput: 0: 11389.1. Samples: 130065408. Policy #0 lag: (min: 15.0, avg: 115.1, max: 271.0) [2024-06-15 14:39:35,956][1648985] Avg episode reward: [(0, '137.900')] [2024-06-15 14:39:37,557][1652491] Updated weights for policy 0, policy_version 253904 (0.0016) [2024-06-15 14:39:39,521][1652491] Updated weights for policy 0, policy_version 253955 (0.0017) [2024-06-15 14:39:40,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 45875.0, 300 sec: 46652.8). Total num frames: 520224768. Throughput: 0: 11377.8. Samples: 130098176. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:39:40,956][1648985] Avg episode reward: [(0, '135.290')] [2024-06-15 14:39:41,235][1652491] Updated weights for policy 0, policy_version 254037 (0.0012) [2024-06-15 14:39:41,939][1652491] Updated weights for policy 0, policy_version 254079 (0.0031) [2024-06-15 14:39:45,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 44236.7, 300 sec: 46319.5). Total num frames: 520388608. Throughput: 0: 11457.4. Samples: 130170880. 
Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:39:45,956][1648985] Avg episode reward: [(0, '129.640')] [2024-06-15 14:39:46,871][1652491] Updated weights for policy 0, policy_version 254137 (0.0017) [2024-06-15 14:39:49,108][1652491] Updated weights for policy 0, policy_version 254179 (0.0012) [2024-06-15 14:39:50,896][1652491] Updated weights for policy 0, policy_version 254224 (0.0012) [2024-06-15 14:39:50,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 520650752. Throughput: 0: 11503.0. Samples: 130244096. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:39:50,956][1648985] Avg episode reward: [(0, '128.210')] [2024-06-15 14:39:52,164][1652491] Updated weights for policy 0, policy_version 254276 (0.0033) [2024-06-15 14:39:55,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 520880128. Throughput: 0: 11684.9. Samples: 130273280. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:39:55,956][1648985] Avg episode reward: [(0, '124.830')] [2024-06-15 14:39:56,733][1652491] Updated weights for policy 0, policy_version 254338 (0.0012) [2024-06-15 14:39:58,077][1652491] Updated weights for policy 0, policy_version 254395 (0.0011) [2024-06-15 14:40:00,445][1652491] Updated weights for policy 0, policy_version 254448 (0.0012) [2024-06-15 14:40:00,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45875.1, 300 sec: 46541.7). Total num frames: 521142272. Throughput: 0: 11707.7. Samples: 130350080. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:40:00,956][1648985] Avg episode reward: [(0, '146.830')] [2024-06-15 14:40:01,552][1652491] Updated weights for policy 0, policy_version 254480 (0.0009) [2024-06-15 14:40:02,033][1651469] Signal inference workers to stop experience collection... 
(13300 times) [2024-06-15 14:40:02,085][1652491] InferenceWorker_p0-w0: stopping experience collection (13300 times) [2024-06-15 14:40:02,282][1651469] Signal inference workers to resume experience collection... (13300 times) [2024-06-15 14:40:02,282][1652491] InferenceWorker_p0-w0: resuming experience collection (13300 times) [2024-06-15 14:40:02,663][1652491] Updated weights for policy 0, policy_version 254528 (0.0013) [2024-06-15 14:40:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47513.7, 300 sec: 46208.4). Total num frames: 521404416. Throughput: 0: 11867.0. Samples: 130417152. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:40:05,956][1648985] Avg episode reward: [(0, '149.250')] [2024-06-15 14:40:07,533][1652491] Updated weights for policy 0, policy_version 254598 (0.0016) [2024-06-15 14:40:08,744][1652491] Updated weights for policy 0, policy_version 254655 (0.0015) [2024-06-15 14:40:10,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 521568256. Throughput: 0: 11650.8. Samples: 130452480. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:40:10,956][1648985] Avg episode reward: [(0, '149.460')] [2024-06-15 14:40:11,878][1652491] Updated weights for policy 0, policy_version 254710 (0.0037) [2024-06-15 14:40:13,228][1652491] Updated weights for policy 0, policy_version 254740 (0.0015) [2024-06-15 14:40:14,939][1652491] Updated weights for policy 0, policy_version 254789 (0.0025) [2024-06-15 14:40:15,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 49151.9, 300 sec: 46652.7). Total num frames: 521928704. Throughput: 0: 11821.5. Samples: 130522112. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:40:15,956][1648985] Avg episode reward: [(0, '141.310')] [2024-06-15 14:40:19,290][1652491] Updated weights for policy 0, policy_version 254867 (0.0016) [2024-06-15 14:40:20,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45875.3, 300 sec: 46430.6). 
Total num frames: 522059776. Throughput: 0: 11707.8. Samples: 130592256. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:40:20,956][1648985] Avg episode reward: [(0, '152.740')] [2024-06-15 14:40:22,428][1652491] Updated weights for policy 0, policy_version 254928 (0.0012) [2024-06-15 14:40:23,588][1652491] Updated weights for policy 0, policy_version 254973 (0.0013) [2024-06-15 14:40:25,819][1652491] Updated weights for policy 0, policy_version 255040 (0.0012) [2024-06-15 14:40:25,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 48059.6, 300 sec: 46430.6). Total num frames: 522321920. Throughput: 0: 11696.4. Samples: 130624512. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:40:25,956][1648985] Avg episode reward: [(0, '141.530')] [2024-06-15 14:40:30,464][1652491] Updated weights for policy 0, policy_version 255106 (0.0013) [2024-06-15 14:40:30,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 44782.8, 300 sec: 46430.6). Total num frames: 522518528. Throughput: 0: 11662.2. Samples: 130695680. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:40:30,956][1648985] Avg episode reward: [(0, '146.810')] [2024-06-15 14:40:31,496][1652491] Updated weights for policy 0, policy_version 255166 (0.0014) [2024-06-15 14:40:34,442][1652491] Updated weights for policy 0, policy_version 255224 (0.0014) [2024-06-15 14:40:35,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 522715136. Throughput: 0: 11605.3. Samples: 130766336. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:40:35,956][1648985] Avg episode reward: [(0, '154.350')] [2024-06-15 14:40:37,255][1652491] Updated weights for policy 0, policy_version 255288 (0.0016) [2024-06-15 14:40:38,525][1652491] Updated weights for policy 0, policy_version 255330 (0.0011) [2024-06-15 14:40:40,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 522977280. Throughput: 0: 11491.6. 
Samples: 130790400. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:40:40,956][1648985] Avg episode reward: [(0, '155.980')] [2024-06-15 14:40:42,909][1652491] Updated weights for policy 0, policy_version 255376 (0.0013) [2024-06-15 14:40:44,134][1652491] Updated weights for policy 0, policy_version 255424 (0.0014) [2024-06-15 14:40:45,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46967.4, 300 sec: 46319.5). Total num frames: 523206656. Throughput: 0: 11434.7. Samples: 130864640. Policy #0 lag: (min: 13.0, avg: 131.8, max: 269.0) [2024-06-15 14:40:45,956][1648985] Avg episode reward: [(0, '139.540')] [2024-06-15 14:40:46,100][1652491] Updated weights for policy 0, policy_version 255488 (0.0013) [2024-06-15 14:40:49,332][1651469] Signal inference workers to stop experience collection... (13350 times) [2024-06-15 14:40:49,375][1652491] InferenceWorker_p0-w0: stopping experience collection (13350 times) [2024-06-15 14:40:49,655][1651469] Signal inference workers to resume experience collection... (13350 times) [2024-06-15 14:40:49,657][1652491] InferenceWorker_p0-w0: resuming experience collection (13350 times) [2024-06-15 14:40:49,801][1652491] Updated weights for policy 0, policy_version 255554 (0.0049) [2024-06-15 14:40:50,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 46421.4, 300 sec: 45986.3). Total num frames: 523436032. Throughput: 0: 11264.0. Samples: 130924032. Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:40:50,956][1648985] Avg episode reward: [(0, '148.580')] [2024-06-15 14:40:55,158][1652491] Updated weights for policy 0, policy_version 255619 (0.0102) [2024-06-15 14:40:55,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 44783.0, 300 sec: 45986.3). Total num frames: 523567104. Throughput: 0: 11343.6. Samples: 130962944. 
Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:40:55,956][1648985] Avg episode reward: [(0, '136.650')] [2024-06-15 14:40:56,269][1652491] Updated weights for policy 0, policy_version 255670 (0.0015) [2024-06-15 14:40:56,439][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000255680_523632640.pth... [2024-06-15 14:40:56,554][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000250304_512622592.pth [2024-06-15 14:40:57,868][1652491] Updated weights for policy 0, policy_version 255741 (0.0029) [2024-06-15 14:41:00,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45329.1, 300 sec: 45653.1). Total num frames: 523862016. Throughput: 0: 11332.3. Samples: 131032064. Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:41:00,956][1648985] Avg episode reward: [(0, '145.460')] [2024-06-15 14:41:01,038][1652491] Updated weights for policy 0, policy_version 255793 (0.0012) [2024-06-15 14:41:02,644][1652491] Updated weights for policy 0, policy_version 255859 (0.0012) [2024-06-15 14:41:05,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 524025856. Throughput: 0: 11264.0. Samples: 131099136. Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:41:05,956][1648985] Avg episode reward: [(0, '130.650')] [2024-06-15 14:41:06,806][1652491] Updated weights for policy 0, policy_version 255873 (0.0011) [2024-06-15 14:41:08,542][1652491] Updated weights for policy 0, policy_version 255952 (0.0014) [2024-06-15 14:41:10,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45329.1, 300 sec: 45430.9). Total num frames: 524288000. Throughput: 0: 11184.4. Samples: 131127808. 
Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:41:10,955][1648985] Avg episode reward: [(0, '132.810')] [2024-06-15 14:41:12,483][1652491] Updated weights for policy 0, policy_version 256035 (0.0016) [2024-06-15 14:41:14,429][1652491] Updated weights for policy 0, policy_version 256112 (0.0014) [2024-06-15 14:41:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.8, 300 sec: 46208.5). Total num frames: 524550144. Throughput: 0: 10854.4. Samples: 131184128. Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:41:15,956][1648985] Avg episode reward: [(0, '125.020')] [2024-06-15 14:41:19,733][1652491] Updated weights for policy 0, policy_version 256146 (0.0013) [2024-06-15 14:41:20,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 45430.9). Total num frames: 524681216. Throughput: 0: 10911.3. Samples: 131257344. Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:41:20,956][1648985] Avg episode reward: [(0, '119.390')] [2024-06-15 14:41:21,192][1652491] Updated weights for policy 0, policy_version 256216 (0.0020) [2024-06-15 14:41:22,175][1652491] Updated weights for policy 0, policy_version 256256 (0.0015) [2024-06-15 14:41:25,632][1652491] Updated weights for policy 0, policy_version 256323 (0.0015) [2024-06-15 14:41:25,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 44236.7, 300 sec: 45875.2). Total num frames: 524976128. Throughput: 0: 11116.1. Samples: 131290624. Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:41:25,957][1648985] Avg episode reward: [(0, '124.160')] [2024-06-15 14:41:26,950][1652491] Updated weights for policy 0, policy_version 256376 (0.0013) [2024-06-15 14:41:30,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 42598.4, 300 sec: 45430.9). Total num frames: 525074432. Throughput: 0: 11059.2. Samples: 131362304. 
Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:41:30,956][1648985] Avg episode reward: [(0, '132.600')] [2024-06-15 14:41:32,046][1652491] Updated weights for policy 0, policy_version 256432 (0.0014) [2024-06-15 14:41:32,743][1651469] Signal inference workers to stop experience collection... (13400 times) [2024-06-15 14:41:32,821][1652491] InferenceWorker_p0-w0: stopping experience collection (13400 times) [2024-06-15 14:41:32,951][1651469] Signal inference workers to resume experience collection... (13400 times) [2024-06-15 14:41:32,959][1652491] InferenceWorker_p0-w0: resuming experience collection (13400 times) [2024-06-15 14:41:33,115][1652491] Updated weights for policy 0, policy_version 256482 (0.0013) [2024-06-15 14:41:34,541][1652491] Updated weights for policy 0, policy_version 256529 (0.0031) [2024-06-15 14:41:35,664][1652491] Updated weights for policy 0, policy_version 256576 (0.0012) [2024-06-15 14:41:35,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 45875.3, 300 sec: 45764.2). Total num frames: 525467648. Throughput: 0: 11138.8. Samples: 131425280. Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:41:35,955][1648985] Avg episode reward: [(0, '140.430')] [2024-06-15 14:41:38,488][1652491] Updated weights for policy 0, policy_version 256637 (0.0013) [2024-06-15 14:41:40,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 43690.5, 300 sec: 45764.1). Total num frames: 525598720. Throughput: 0: 11047.8. Samples: 131460096. Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:41:40,956][1648985] Avg episode reward: [(0, '157.740')] [2024-06-15 14:41:43,528][1652491] Updated weights for policy 0, policy_version 256689 (0.0013) [2024-06-15 14:41:44,906][1652491] Updated weights for policy 0, policy_version 256752 (0.0013) [2024-06-15 14:41:45,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 44782.9, 300 sec: 45430.9). Total num frames: 525893632. Throughput: 0: 11150.2. Samples: 131533824. 
Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:41:45,956][1648985] Avg episode reward: [(0, '147.080')] [2024-06-15 14:41:46,828][1652491] Updated weights for policy 0, policy_version 256832 (0.0100) [2024-06-15 14:41:50,264][1652491] Updated weights for policy 0, policy_version 256890 (0.0014) [2024-06-15 14:41:50,955][1648985] Fps is (10 sec: 52430.5, 60 sec: 44782.9, 300 sec: 45875.7). Total num frames: 526123008. Throughput: 0: 11104.7. Samples: 131598848. Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:41:50,956][1648985] Avg episode reward: [(0, '140.520')] [2024-06-15 14:41:54,222][1652491] Updated weights for policy 0, policy_version 256916 (0.0013) [2024-06-15 14:41:55,917][1652491] Updated weights for policy 0, policy_version 256992 (0.0126) [2024-06-15 14:41:55,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 45875.1, 300 sec: 45208.7). Total num frames: 526319616. Throughput: 0: 11423.2. Samples: 131641856. Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:41:55,957][1648985] Avg episode reward: [(0, '144.660')] [2024-06-15 14:41:57,657][1652491] Updated weights for policy 0, policy_version 257072 (0.0026) [2024-06-15 14:42:00,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45329.1, 300 sec: 45764.1). Total num frames: 526581760. Throughput: 0: 11571.2. Samples: 131704832. Policy #0 lag: (min: 68.0, avg: 171.6, max: 284.0) [2024-06-15 14:42:00,955][1648985] Avg episode reward: [(0, '167.020')] [2024-06-15 14:42:01,052][1652491] Updated weights for policy 0, policy_version 257125 (0.0020) [2024-06-15 14:42:05,616][1652491] Updated weights for policy 0, policy_version 257168 (0.0013) [2024-06-15 14:42:05,955][1648985] Fps is (10 sec: 36045.3, 60 sec: 44236.8, 300 sec: 45208.7). Total num frames: 526680064. Throughput: 0: 11514.3. Samples: 131775488. 
Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:42:05,956][1648985] Avg episode reward: [(0, '166.980')] [2024-06-15 14:42:07,274][1652491] Updated weights for policy 0, policy_version 257237 (0.0014) [2024-06-15 14:42:08,923][1652491] Updated weights for policy 0, policy_version 257314 (0.0013) [2024-06-15 14:42:10,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 527040512. Throughput: 0: 11355.0. Samples: 131801600. Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:42:10,956][1648985] Avg episode reward: [(0, '148.060')] [2024-06-15 14:42:11,544][1652491] Updated weights for policy 0, policy_version 257349 (0.0013) [2024-06-15 14:42:12,363][1651469] Signal inference workers to stop experience collection... (13450 times) [2024-06-15 14:42:12,392][1652491] InferenceWorker_p0-w0: stopping experience collection (13450 times) [2024-06-15 14:42:12,764][1651469] Signal inference workers to resume experience collection... (13450 times) [2024-06-15 14:42:12,778][1652491] InferenceWorker_p0-w0: resuming experience collection (13450 times) [2024-06-15 14:42:15,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 43690.7, 300 sec: 45542.0). Total num frames: 527171584. Throughput: 0: 11320.9. Samples: 131871744. Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:42:15,955][1648985] Avg episode reward: [(0, '140.310')] [2024-06-15 14:42:17,394][1652491] Updated weights for policy 0, policy_version 257410 (0.0013) [2024-06-15 14:42:19,350][1652491] Updated weights for policy 0, policy_version 257504 (0.0109) [2024-06-15 14:42:20,540][1652491] Updated weights for policy 0, policy_version 257555 (0.0016) [2024-06-15 14:42:20,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.5, 300 sec: 45542.0). Total num frames: 527499264. Throughput: 0: 11400.5. Samples: 131938304. 
Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:42:20,956][1648985] Avg episode reward: [(0, '159.730')] [2024-06-15 14:42:21,600][1652491] Updated weights for policy 0, policy_version 257600 (0.0012) [2024-06-15 14:42:24,399][1652491] Updated weights for policy 0, policy_version 257655 (0.0043) [2024-06-15 14:42:25,955][1648985] Fps is (10 sec: 52426.6, 60 sec: 45328.9, 300 sec: 45764.1). Total num frames: 527695872. Throughput: 0: 11366.4. Samples: 131971584. Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:42:25,956][1648985] Avg episode reward: [(0, '156.680')] [2024-06-15 14:42:28,819][1652491] Updated weights for policy 0, policy_version 257696 (0.0031) [2024-06-15 14:42:30,376][1652491] Updated weights for policy 0, policy_version 257760 (0.0072) [2024-06-15 14:42:30,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 47513.5, 300 sec: 45319.8). Total num frames: 527925248. Throughput: 0: 11446.0. Samples: 132048896. Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:42:30,956][1648985] Avg episode reward: [(0, '148.030')] [2024-06-15 14:42:32,304][1652491] Updated weights for policy 0, policy_version 257843 (0.0013) [2024-06-15 14:42:35,771][1652491] Updated weights for policy 0, policy_version 257888 (0.0015) [2024-06-15 14:42:35,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 44782.7, 300 sec: 45541.9). Total num frames: 528154624. Throughput: 0: 11411.8. Samples: 132112384. Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:42:35,956][1648985] Avg episode reward: [(0, '144.820')] [2024-06-15 14:42:36,564][1652491] Updated weights for policy 0, policy_version 257920 (0.0012) [2024-06-15 14:42:40,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 45329.2, 300 sec: 45208.7). Total num frames: 528318464. Throughput: 0: 11332.3. Samples: 132151808. 
Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:42:40,956][1648985] Avg episode reward: [(0, '134.870')] [2024-06-15 14:42:41,486][1652491] Updated weights for policy 0, policy_version 258004 (0.0013) [2024-06-15 14:42:43,192][1652491] Updated weights for policy 0, policy_version 258080 (0.0013) [2024-06-15 14:42:45,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 45329.1, 300 sec: 45764.1). Total num frames: 528613376. Throughput: 0: 11298.1. Samples: 132213248. Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:42:45,956][1648985] Avg episode reward: [(0, '135.310')] [2024-06-15 14:42:47,527][1652491] Updated weights for policy 0, policy_version 258168 (0.0014) [2024-06-15 14:42:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 44236.6, 300 sec: 45430.9). Total num frames: 528777216. Throughput: 0: 11548.4. Samples: 132295168. Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:42:50,956][1648985] Avg episode reward: [(0, '135.800')] [2024-06-15 14:42:51,451][1652491] Updated weights for policy 0, policy_version 258224 (0.0013) [2024-06-15 14:42:52,624][1652491] Updated weights for policy 0, policy_version 258288 (0.0038) [2024-06-15 14:42:52,825][1651469] Signal inference workers to stop experience collection... (13500 times) [2024-06-15 14:42:52,900][1652491] InferenceWorker_p0-w0: stopping experience collection (13500 times) [2024-06-15 14:42:53,094][1651469] Signal inference workers to resume experience collection... (13500 times) [2024-06-15 14:42:53,095][1652491] InferenceWorker_p0-w0: resuming experience collection (13500 times) [2024-06-15 14:42:54,414][1652491] Updated weights for policy 0, policy_version 258361 (0.0128) [2024-06-15 14:42:55,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.6, 300 sec: 45764.1). Total num frames: 529137664. Throughput: 0: 11662.2. Samples: 132326400. 
Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:42:55,955][1648985] Avg episode reward: [(0, '144.660')] [2024-06-15 14:42:55,976][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000258368_529137664.pth... [2024-06-15 14:42:56,077][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000253040_518225920.pth [2024-06-15 14:43:00,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 44782.8, 300 sec: 45542.0). Total num frames: 529268736. Throughput: 0: 11582.5. Samples: 132392960. Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:43:00,956][1648985] Avg episode reward: [(0, '148.600')] [2024-06-15 14:43:02,473][1652491] Updated weights for policy 0, policy_version 258435 (0.0015) [2024-06-15 14:43:04,074][1652491] Updated weights for policy 0, policy_version 258514 (0.0014) [2024-06-15 14:43:05,521][1652491] Updated weights for policy 0, policy_version 258581 (0.0013) [2024-06-15 14:43:05,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48606.0, 300 sec: 45542.0). Total num frames: 529596416. Throughput: 0: 11525.7. Samples: 132456960. Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:43:05,956][1648985] Avg episode reward: [(0, '139.070')] [2024-06-15 14:43:06,412][1652491] Updated weights for policy 0, policy_version 258620 (0.0013) [2024-06-15 14:43:10,100][1652491] Updated weights for policy 0, policy_version 258659 (0.0030) [2024-06-15 14:43:10,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 529793024. Throughput: 0: 11662.3. Samples: 132496384. Policy #0 lag: (min: 37.0, avg: 167.1, max: 293.0) [2024-06-15 14:43:10,956][1648985] Avg episode reward: [(0, '132.920')] [2024-06-15 14:43:14,260][1652491] Updated weights for policy 0, policy_version 258708 (0.0012) [2024-06-15 14:43:15,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.4, 300 sec: 45430.9). Total num frames: 529989632. Throughput: 0: 11605.4. 
Samples: 132571136. Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:43:15,956][1648985] Avg episode reward: [(0, '146.120')] [2024-06-15 14:43:16,053][1652491] Updated weights for policy 0, policy_version 258787 (0.0013) [2024-06-15 14:43:17,861][1652491] Updated weights for policy 0, policy_version 258853 (0.0151) [2024-06-15 14:43:20,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 45329.1, 300 sec: 45764.1). Total num frames: 530219008. Throughput: 0: 11571.3. Samples: 132633088. Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:43:20,955][1648985] Avg episode reward: [(0, '169.070')] [2024-06-15 14:43:21,757][1652491] Updated weights for policy 0, policy_version 258928 (0.0014) [2024-06-15 14:43:25,955][1648985] Fps is (10 sec: 36043.9, 60 sec: 44236.9, 300 sec: 45319.8). Total num frames: 530350080. Throughput: 0: 11411.9. Samples: 132665344. Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:43:25,956][1648985] Avg episode reward: [(0, '149.890')] [2024-06-15 14:43:25,983][1652491] Updated weights for policy 0, policy_version 258963 (0.0015) [2024-06-15 14:43:27,038][1652491] Updated weights for policy 0, policy_version 259013 (0.0014) [2024-06-15 14:43:29,118][1652491] Updated weights for policy 0, policy_version 259104 (0.0015) [2024-06-15 14:43:30,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 46421.5, 300 sec: 45764.1). Total num frames: 530710528. Throughput: 0: 11491.5. Samples: 132730368. Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:43:30,956][1648985] Avg episode reward: [(0, '140.650')] [2024-06-15 14:43:32,856][1652491] Updated weights for policy 0, policy_version 259184 (0.0030) [2024-06-15 14:43:35,955][1648985] Fps is (10 sec: 49153.3, 60 sec: 44783.1, 300 sec: 45319.8). Total num frames: 530841600. Throughput: 0: 11320.9. Samples: 132804608. 
Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:43:35,956][1648985] Avg episode reward: [(0, '127.930')] [2024-06-15 14:43:37,370][1651469] Signal inference workers to stop experience collection... (13550 times) [2024-06-15 14:43:37,419][1652491] InferenceWorker_p0-w0: stopping experience collection (13550 times) [2024-06-15 14:43:37,607][1651469] Signal inference workers to resume experience collection... (13550 times) [2024-06-15 14:43:37,608][1652491] InferenceWorker_p0-w0: resuming experience collection (13550 times) [2024-06-15 14:43:37,611][1652491] Updated weights for policy 0, policy_version 259232 (0.0013) [2024-06-15 14:43:39,145][1652491] Updated weights for policy 0, policy_version 259299 (0.0012) [2024-06-15 14:43:40,846][1652491] Updated weights for policy 0, policy_version 259376 (0.0128) [2024-06-15 14:43:40,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48059.8, 300 sec: 45653.0). Total num frames: 531202048. Throughput: 0: 11389.2. Samples: 132838912. Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:43:40,955][1648985] Avg episode reward: [(0, '125.450')] [2024-06-15 14:43:43,211][1652491] Updated weights for policy 0, policy_version 259414 (0.0014) [2024-06-15 14:43:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 531365888. Throughput: 0: 11423.3. Samples: 132907008. Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:43:45,956][1648985] Avg episode reward: [(0, '122.790')] [2024-06-15 14:43:47,925][1652491] Updated weights for policy 0, policy_version 259457 (0.0014) [2024-06-15 14:43:49,381][1652491] Updated weights for policy 0, policy_version 259522 (0.0011) [2024-06-15 14:43:50,819][1652491] Updated weights for policy 0, policy_version 259600 (0.0013) [2024-06-15 14:43:50,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 48060.0, 300 sec: 45653.1). Total num frames: 531660800. Throughput: 0: 11628.1. Samples: 132980224. 
Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:43:50,955][1648985] Avg episode reward: [(0, '106.670')] [2024-06-15 14:43:54,222][1652491] Updated weights for policy 0, policy_version 259664 (0.0017) [2024-06-15 14:43:55,360][1652491] Updated weights for policy 0, policy_version 259710 (0.0011) [2024-06-15 14:43:55,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 531890176. Throughput: 0: 11514.3. Samples: 133014528. Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:43:55,956][1648985] Avg episode reward: [(0, '113.100')] [2024-06-15 14:44:00,833][1652491] Updated weights for policy 0, policy_version 259780 (0.0013) [2024-06-15 14:44:00,955][1648985] Fps is (10 sec: 36043.8, 60 sec: 45875.2, 300 sec: 45653.1). Total num frames: 532021248. Throughput: 0: 11525.7. Samples: 133089792. Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:44:00,956][1648985] Avg episode reward: [(0, '114.490')] [2024-06-15 14:44:02,389][1652491] Updated weights for policy 0, policy_version 259841 (0.0013) [2024-06-15 14:44:03,553][1652491] Updated weights for policy 0, policy_version 259903 (0.0014) [2024-06-15 14:44:05,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 532316160. Throughput: 0: 11320.9. Samples: 133142528. Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:44:05,956][1648985] Avg episode reward: [(0, '131.510')] [2024-06-15 14:44:06,711][1652491] Updated weights for policy 0, policy_version 259965 (0.0019) [2024-06-15 14:44:10,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 43690.8, 300 sec: 45542.0). Total num frames: 532414464. Throughput: 0: 11525.7. Samples: 133184000. 
Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:44:10,956][1648985] Avg episode reward: [(0, '136.040')] [2024-06-15 14:44:13,388][1652491] Updated weights for policy 0, policy_version 260048 (0.0148) [2024-06-15 14:44:14,825][1651469] Signal inference workers to stop experience collection... (13600 times) [2024-06-15 14:44:14,859][1652491] InferenceWorker_p0-w0: stopping experience collection (13600 times) [2024-06-15 14:44:15,026][1651469] Signal inference workers to resume experience collection... (13600 times) [2024-06-15 14:44:15,027][1652491] InferenceWorker_p0-w0: resuming experience collection (13600 times) [2024-06-15 14:44:15,029][1652491] Updated weights for policy 0, policy_version 260128 (0.0013) [2024-06-15 14:44:15,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 532807680. Throughput: 0: 11377.8. Samples: 133242368. Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:44:15,956][1648985] Avg episode reward: [(0, '147.150')] [2024-06-15 14:44:17,426][1652491] Updated weights for policy 0, policy_version 260163 (0.0014) [2024-06-15 14:44:18,519][1652491] Updated weights for policy 0, policy_version 260221 (0.0018) [2024-06-15 14:44:20,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45328.9, 300 sec: 45764.1). Total num frames: 532938752. Throughput: 0: 11411.9. Samples: 133318144. Policy #0 lag: (min: 15.0, avg: 84.4, max: 271.0) [2024-06-15 14:44:20,956][1648985] Avg episode reward: [(0, '157.260')] [2024-06-15 14:44:24,349][1652491] Updated weights for policy 0, policy_version 260288 (0.0126) [2024-06-15 14:44:25,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 47513.8, 300 sec: 45319.8). Total num frames: 533200896. Throughput: 0: 11457.4. Samples: 133354496. 
Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:44:25,956][1648985] Avg episode reward: [(0, '148.350')] [2024-06-15 14:44:26,500][1652491] Updated weights for policy 0, policy_version 260372 (0.0011) [2024-06-15 14:44:30,253][1652491] Updated weights for policy 0, policy_version 260449 (0.0015) [2024-06-15 14:44:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 533463040. Throughput: 0: 11207.1. Samples: 133411328. Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:44:30,956][1648985] Avg episode reward: [(0, '150.350')] [2024-06-15 14:44:35,075][1652491] Updated weights for policy 0, policy_version 260481 (0.0013) [2024-06-15 14:44:35,958][1648985] Fps is (10 sec: 36032.3, 60 sec: 45326.4, 300 sec: 45208.2). Total num frames: 533561344. Throughput: 0: 11240.3. Samples: 133486080. Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:44:35,959][1648985] Avg episode reward: [(0, '129.330')] [2024-06-15 14:44:36,135][1652491] Updated weights for policy 0, policy_version 260535 (0.0012) [2024-06-15 14:44:37,648][1652491] Updated weights for policy 0, policy_version 260594 (0.0013) [2024-06-15 14:44:39,102][1652491] Updated weights for policy 0, policy_version 260665 (0.0090) [2024-06-15 14:44:40,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 44236.6, 300 sec: 45653.0). Total num frames: 533856256. Throughput: 0: 11070.5. Samples: 133512704. Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:44:40,956][1648985] Avg episode reward: [(0, '132.730')] [2024-06-15 14:44:42,019][1652491] Updated weights for policy 0, policy_version 260735 (0.0012) [2024-06-15 14:44:45,955][1648985] Fps is (10 sec: 42612.9, 60 sec: 43690.7, 300 sec: 45208.7). Total num frames: 533987328. Throughput: 0: 11047.8. Samples: 133586944. 
Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:44:45,956][1648985] Avg episode reward: [(0, '154.650')] [2024-06-15 14:44:48,464][1652491] Updated weights for policy 0, policy_version 260848 (0.0013) [2024-06-15 14:44:50,125][1652491] Updated weights for policy 0, policy_version 260912 (0.0017) [2024-06-15 14:44:50,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 45329.0, 300 sec: 45764.2). Total num frames: 534380544. Throughput: 0: 11229.9. Samples: 133647872. Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:44:50,956][1648985] Avg episode reward: [(0, '164.020')] [2024-06-15 14:44:52,920][1652491] Updated weights for policy 0, policy_version 260944 (0.0021) [2024-06-15 14:44:55,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 43690.5, 300 sec: 45319.8). Total num frames: 534511616. Throughput: 0: 11081.9. Samples: 133682688. Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:44:55,956][1648985] Avg episode reward: [(0, '136.260')] [2024-06-15 14:44:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000260992_534511616.pth... [2024-06-15 14:44:56,037][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000255680_523632640.pth [2024-06-15 14:44:57,820][1652491] Updated weights for policy 0, policy_version 260995 (0.0012) [2024-06-15 14:44:58,934][1651469] Signal inference workers to stop experience collection... (13650 times) [2024-06-15 14:44:58,973][1652491] InferenceWorker_p0-w0: stopping experience collection (13650 times) [2024-06-15 14:44:59,277][1651469] Signal inference workers to resume experience collection... (13650 times) [2024-06-15 14:44:59,278][1652491] InferenceWorker_p0-w0: resuming experience collection (13650 times) [2024-06-15 14:45:00,110][1652491] Updated weights for policy 0, policy_version 261088 (0.0021) [2024-06-15 14:45:00,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 45875.2, 300 sec: 45319.8). Total num frames: 534773760. 
Throughput: 0: 11309.5. Samples: 133751296. Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:45:00,956][1648985] Avg episode reward: [(0, '136.770')] [2024-06-15 14:45:01,455][1652491] Updated weights for policy 0, policy_version 261140 (0.0012) [2024-06-15 14:45:05,034][1652491] Updated weights for policy 0, policy_version 261202 (0.0012) [2024-06-15 14:45:05,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 44782.9, 300 sec: 45542.0). Total num frames: 535003136. Throughput: 0: 11161.6. Samples: 133820416. Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:45:05,956][1648985] Avg episode reward: [(0, '129.000')] [2024-06-15 14:45:06,183][1652491] Updated weights for policy 0, policy_version 261248 (0.0024) [2024-06-15 14:45:10,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 45875.2, 300 sec: 44875.5). Total num frames: 535166976. Throughput: 0: 11275.4. Samples: 133861888. Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:45:10,956][1648985] Avg episode reward: [(0, '132.330')] [2024-06-15 14:45:11,719][1652491] Updated weights for policy 0, policy_version 261344 (0.0025) [2024-06-15 14:45:13,064][1652491] Updated weights for policy 0, policy_version 261408 (0.0022) [2024-06-15 14:45:15,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 45319.8). Total num frames: 535429120. Throughput: 0: 11332.3. Samples: 133921280. Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:45:15,956][1648985] Avg episode reward: [(0, '130.930')] [2024-06-15 14:45:17,112][1652491] Updated weights for policy 0, policy_version 261457 (0.0026) [2024-06-15 14:45:18,007][1652491] Updated weights for policy 0, policy_version 261503 (0.0010) [2024-06-15 14:45:20,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 44236.7, 300 sec: 44986.6). Total num frames: 535592960. Throughput: 0: 11344.5. Samples: 133996544. 
Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:45:20,956][1648985] Avg episode reward: [(0, '113.130')] [2024-06-15 14:45:21,640][1652491] Updated weights for policy 0, policy_version 261552 (0.0124) [2024-06-15 14:45:23,303][1652491] Updated weights for policy 0, policy_version 261616 (0.0089) [2024-06-15 14:45:24,960][1652491] Updated weights for policy 0, policy_version 261691 (0.0257) [2024-06-15 14:45:25,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.2, 300 sec: 45542.0). Total num frames: 535953408. Throughput: 0: 11355.1. Samples: 134023680. Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:45:25,955][1648985] Avg episode reward: [(0, '127.290')] [2024-06-15 14:45:29,790][1652491] Updated weights for policy 0, policy_version 261754 (0.0013) [2024-06-15 14:45:30,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 43690.6, 300 sec: 45319.8). Total num frames: 536084480. Throughput: 0: 11241.2. Samples: 134092800. Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:45:30,956][1648985] Avg episode reward: [(0, '137.470')] [2024-06-15 14:45:33,664][1652491] Updated weights for policy 0, policy_version 261825 (0.0014) [2024-06-15 14:45:34,913][1652491] Updated weights for policy 0, policy_version 261878 (0.0012) [2024-06-15 14:45:35,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46970.1, 300 sec: 45430.9). Total num frames: 536379392. Throughput: 0: 11332.3. Samples: 134157824. Policy #0 lag: (min: 72.0, avg: 138.1, max: 319.0) [2024-06-15 14:45:35,956][1648985] Avg episode reward: [(0, '134.670')] [2024-06-15 14:45:36,204][1652491] Updated weights for policy 0, policy_version 261921 (0.0013) [2024-06-15 14:45:36,879][1652491] Updated weights for policy 0, policy_version 261952 (0.0015) [2024-06-15 14:45:40,786][1651469] Signal inference workers to stop experience collection... 
(13700 times) [2024-06-15 14:45:40,854][1652491] InferenceWorker_p0-w0: stopping experience collection (13700 times) [2024-06-15 14:45:40,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 44783.1, 300 sec: 45208.7). Total num frames: 536543232. Throughput: 0: 11491.6. Samples: 134199808. Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:45:40,956][1648985] Avg episode reward: [(0, '132.550')] [2024-06-15 14:45:41,120][1651469] Signal inference workers to resume experience collection... (13700 times) [2024-06-15 14:45:41,124][1652491] InferenceWorker_p0-w0: resuming experience collection (13700 times) [2024-06-15 14:45:43,620][1652491] Updated weights for policy 0, policy_version 262019 (0.0012) [2024-06-15 14:45:44,826][1652491] Updated weights for policy 0, policy_version 262070 (0.0015) [2024-06-15 14:45:45,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 46967.4, 300 sec: 45319.8). Total num frames: 536805376. Throughput: 0: 11468.8. Samples: 134267392. Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:45:45,956][1648985] Avg episode reward: [(0, '144.090')] [2024-06-15 14:45:46,150][1652491] Updated weights for policy 0, policy_version 262128 (0.0038) [2024-06-15 14:45:47,297][1652491] Updated weights for policy 0, policy_version 262176 (0.0012) [2024-06-15 14:45:50,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 43690.6, 300 sec: 45542.0). Total num frames: 537001984. Throughput: 0: 11582.6. Samples: 134341632. Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:45:50,956][1648985] Avg episode reward: [(0, '147.410')] [2024-06-15 14:45:51,756][1652491] Updated weights for policy 0, policy_version 262240 (0.0100) [2024-06-15 14:45:52,565][1652491] Updated weights for policy 0, policy_version 262272 (0.0017) [2024-06-15 14:45:55,462][1652491] Updated weights for policy 0, policy_version 262324 (0.0030) [2024-06-15 14:45:55,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 45875.4, 300 sec: 45430.9). 
Total num frames: 537264128. Throughput: 0: 11434.7. Samples: 134376448. Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:45:55,956][1648985] Avg episode reward: [(0, '156.510')] [2024-06-15 14:45:56,850][1652491] Updated weights for policy 0, policy_version 262390 (0.0014) [2024-06-15 14:45:58,774][1652491] Updated weights for policy 0, policy_version 262452 (0.0015) [2024-06-15 14:46:00,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 537526272. Throughput: 0: 11537.1. Samples: 134440448. Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:46:00,956][1648985] Avg episode reward: [(0, '147.690')] [2024-06-15 14:46:03,256][1652491] Updated weights for policy 0, policy_version 262480 (0.0098) [2024-06-15 14:46:04,222][1652491] Updated weights for policy 0, policy_version 262528 (0.0011) [2024-06-15 14:46:05,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45329.1, 300 sec: 45542.0). Total num frames: 537722880. Throughput: 0: 11594.0. Samples: 134518272. Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:46:05,955][1648985] Avg episode reward: [(0, '133.610')] [2024-06-15 14:46:06,477][1652491] Updated weights for policy 0, policy_version 262581 (0.0013) [2024-06-15 14:46:09,909][1652491] Updated weights for policy 0, policy_version 262672 (0.0013) [2024-06-15 14:46:10,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 47513.6, 300 sec: 45653.0). Total num frames: 538017792. Throughput: 0: 11616.7. Samples: 134546432. Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:46:10,955][1648985] Avg episode reward: [(0, '136.470')] [2024-06-15 14:46:14,371][1652491] Updated weights for policy 0, policy_version 262736 (0.0029) [2024-06-15 14:46:15,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 538181632. Throughput: 0: 11810.1. Samples: 134624256. 
Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:46:15,956][1648985] Avg episode reward: [(0, '145.960')] [2024-06-15 14:46:16,027][1652491] Updated weights for policy 0, policy_version 262785 (0.0013) [2024-06-15 14:46:17,906][1652491] Updated weights for policy 0, policy_version 262868 (0.0014) [2024-06-15 14:46:20,947][1652491] Updated weights for policy 0, policy_version 262928 (0.0039) [2024-06-15 14:46:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.9, 300 sec: 45764.2). Total num frames: 538476544. Throughput: 0: 11855.7. Samples: 134691328. Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:46:20,955][1648985] Avg episode reward: [(0, '152.270')] [2024-06-15 14:46:21,463][1651469] Signal inference workers to stop experience collection... (13750 times) [2024-06-15 14:46:21,512][1652491] InferenceWorker_p0-w0: stopping experience collection (13750 times) [2024-06-15 14:46:21,714][1651469] Signal inference workers to resume experience collection... (13750 times) [2024-06-15 14:46:21,715][1652491] InferenceWorker_p0-w0: resuming experience collection (13750 times) [2024-06-15 14:46:21,984][1652491] Updated weights for policy 0, policy_version 262970 (0.0014) [2024-06-15 14:46:25,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 45329.0, 300 sec: 46097.4). Total num frames: 538673152. Throughput: 0: 11855.6. Samples: 134733312. Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:46:25,956][1648985] Avg episode reward: [(0, '142.880')] [2024-06-15 14:46:26,057][1652491] Updated weights for policy 0, policy_version 263025 (0.0126) [2024-06-15 14:46:27,544][1652491] Updated weights for policy 0, policy_version 263095 (0.0107) [2024-06-15 14:46:29,221][1652491] Updated weights for policy 0, policy_version 263159 (0.0011) [2024-06-15 14:46:30,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 48059.8, 300 sec: 45764.1). Total num frames: 538968064. Throughput: 0: 11696.4. Samples: 134793728. 
Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:46:30,956][1648985] Avg episode reward: [(0, '142.870')] [2024-06-15 14:46:33,128][1652491] Updated weights for policy 0, policy_version 263209 (0.0013) [2024-06-15 14:46:35,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45329.0, 300 sec: 45764.2). Total num frames: 539099136. Throughput: 0: 11832.9. Samples: 134874112. Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:46:35,956][1648985] Avg episode reward: [(0, '136.630')] [2024-06-15 14:46:36,298][1652491] Updated weights for policy 0, policy_version 263239 (0.0021) [2024-06-15 14:46:37,441][1652491] Updated weights for policy 0, policy_version 263297 (0.0013) [2024-06-15 14:46:38,614][1652491] Updated weights for policy 0, policy_version 263347 (0.0013) [2024-06-15 14:46:40,130][1652491] Updated weights for policy 0, policy_version 263408 (0.0012) [2024-06-15 14:46:40,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 49152.0, 300 sec: 46097.4). Total num frames: 539492352. Throughput: 0: 11741.9. Samples: 134904832. Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:46:40,956][1648985] Avg episode reward: [(0, '135.150')] [2024-06-15 14:46:44,075][1652491] Updated weights for policy 0, policy_version 263443 (0.0032) [2024-06-15 14:46:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 539623424. Throughput: 0: 11821.5. Samples: 134972416. Policy #0 lag: (min: 41.0, avg: 188.0, max: 303.0) [2024-06-15 14:46:45,956][1648985] Avg episode reward: [(0, '150.460')] [2024-06-15 14:46:48,550][1652491] Updated weights for policy 0, policy_version 263525 (0.0113) [2024-06-15 14:46:50,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 48059.8, 300 sec: 45986.3). Total num frames: 539885568. Throughput: 0: 11571.2. Samples: 135038976. 
Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:46:50,956][1648985] Avg episode reward: [(0, '141.760')] [2024-06-15 14:46:51,002][1652491] Updated weights for policy 0, policy_version 263617 (0.0106) [2024-06-15 14:46:52,246][1652491] Updated weights for policy 0, policy_version 263675 (0.0014) [2024-06-15 14:46:55,956][1648985] Fps is (10 sec: 45872.9, 60 sec: 46967.0, 300 sec: 45764.0). Total num frames: 540082176. Throughput: 0: 11810.0. Samples: 135077888. Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:46:55,956][1648985] Avg episode reward: [(0, '148.370')] [2024-06-15 14:46:56,007][1652491] Updated weights for policy 0, policy_version 263728 (0.0013) [2024-06-15 14:46:56,285][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000263744_540147712.pth... [2024-06-15 14:46:56,359][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000258368_529137664.pth [2024-06-15 14:47:00,298][1652491] Updated weights for policy 0, policy_version 263796 (0.0013) [2024-06-15 14:47:00,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 540311552. Throughput: 0: 11616.8. Samples: 135147008. Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:47:00,955][1648985] Avg episode reward: [(0, '140.770')] [2024-06-15 14:47:01,795][1651469] Signal inference workers to stop experience collection... (13800 times) [2024-06-15 14:47:01,824][1652491] InferenceWorker_p0-w0: stopping experience collection (13800 times) [2024-06-15 14:47:01,842][1652491] Updated weights for policy 0, policy_version 263858 (0.0011) [2024-06-15 14:47:02,049][1651469] Signal inference workers to resume experience collection... 
(13800 times) [2024-06-15 14:47:02,050][1652491] InferenceWorker_p0-w0: resuming experience collection (13800 times) [2024-06-15 14:47:03,178][1652491] Updated weights for policy 0, policy_version 263922 (0.0012) [2024-06-15 14:47:05,955][1648985] Fps is (10 sec: 45877.5, 60 sec: 46967.3, 300 sec: 45764.1). Total num frames: 540540928. Throughput: 0: 11719.1. Samples: 135218688. Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:47:05,956][1648985] Avg episode reward: [(0, '131.390')] [2024-06-15 14:47:07,549][1652491] Updated weights for policy 0, policy_version 263968 (0.0013) [2024-06-15 14:47:08,392][1652491] Updated weights for policy 0, policy_version 264000 (0.0043) [2024-06-15 14:47:10,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 45328.9, 300 sec: 45986.2). Total num frames: 540737536. Throughput: 0: 11480.1. Samples: 135249920. Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:47:10,956][1648985] Avg episode reward: [(0, '119.210')] [2024-06-15 14:47:12,556][1652491] Updated weights for policy 0, policy_version 264084 (0.0013) [2024-06-15 14:47:13,681][1652491] Updated weights for policy 0, policy_version 264136 (0.0012) [2024-06-15 14:47:14,900][1652491] Updated weights for policy 0, policy_version 264190 (0.0015) [2024-06-15 14:47:15,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48059.7, 300 sec: 45986.2). Total num frames: 541065216. Throughput: 0: 11582.5. Samples: 135314944. Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:47:15,956][1648985] Avg episode reward: [(0, '118.880')] [2024-06-15 14:47:19,540][1652491] Updated weights for policy 0, policy_version 264256 (0.0014) [2024-06-15 14:47:20,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 541229056. Throughput: 0: 11457.4. Samples: 135389696. 
Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:47:20,956][1648985] Avg episode reward: [(0, '136.890')] [2024-06-15 14:47:21,879][1652491] Updated weights for policy 0, policy_version 264318 (0.0012) [2024-06-15 14:47:23,779][1652491] Updated weights for policy 0, policy_version 264368 (0.0013) [2024-06-15 14:47:24,611][1652491] Updated weights for policy 0, policy_version 264410 (0.0011) [2024-06-15 14:47:25,435][1652491] Updated weights for policy 0, policy_version 264445 (0.0013) [2024-06-15 14:47:25,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48605.8, 300 sec: 46319.5). Total num frames: 541589504. Throughput: 0: 11548.4. Samples: 135424512. Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:47:25,956][1648985] Avg episode reward: [(0, '155.010')] [2024-06-15 14:47:30,198][1652491] Updated weights for policy 0, policy_version 264503 (0.0011) [2024-06-15 14:47:30,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 45875.1, 300 sec: 45986.3). Total num frames: 541720576. Throughput: 0: 11912.5. Samples: 135508480. Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:47:30,956][1648985] Avg episode reward: [(0, '147.790')] [2024-06-15 14:47:31,890][1652491] Updated weights for policy 0, policy_version 264562 (0.0012) [2024-06-15 14:47:34,490][1652491] Updated weights for policy 0, policy_version 264624 (0.0012) [2024-06-15 14:47:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 46541.7). Total num frames: 542048256. Throughput: 0: 11650.8. Samples: 135563264. Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:47:35,956][1648985] Avg episode reward: [(0, '151.130')] [2024-06-15 14:47:36,363][1652491] Updated weights for policy 0, policy_version 264699 (0.0012) [2024-06-15 14:47:40,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 542113792. Throughput: 0: 11730.6. Samples: 135605760. 
Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:47:40,956][1648985] Avg episode reward: [(0, '159.700')] [2024-06-15 14:47:42,006][1652491] Updated weights for policy 0, policy_version 264768 (0.0046) [2024-06-15 14:47:43,750][1652491] Updated weights for policy 0, policy_version 264819 (0.0015) [2024-06-15 14:47:45,301][1651469] Signal inference workers to stop experience collection... (13850 times) [2024-06-15 14:47:45,382][1652491] InferenceWorker_p0-w0: stopping experience collection (13850 times) [2024-06-15 14:47:45,540][1651469] Signal inference workers to resume experience collection... (13850 times) [2024-06-15 14:47:45,540][1652491] InferenceWorker_p0-w0: resuming experience collection (13850 times) [2024-06-15 14:47:45,690][1652491] Updated weights for policy 0, policy_version 264865 (0.0110) [2024-06-15 14:47:45,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 47513.7, 300 sec: 46430.6). Total num frames: 542474240. Throughput: 0: 11798.8. Samples: 135677952. Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:47:45,956][1648985] Avg episode reward: [(0, '156.530')] [2024-06-15 14:47:47,180][1652491] Updated weights for policy 0, policy_version 264929 (0.0036) [2024-06-15 14:47:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 542638080. Throughput: 0: 11798.8. Samples: 135749632. Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:47:50,956][1648985] Avg episode reward: [(0, '151.700')] [2024-06-15 14:47:52,668][1652491] Updated weights for policy 0, policy_version 264976 (0.0036) [2024-06-15 14:47:54,207][1652491] Updated weights for policy 0, policy_version 265043 (0.0017) [2024-06-15 14:47:55,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 47514.1, 300 sec: 46319.5). Total num frames: 542932992. Throughput: 0: 11912.6. Samples: 135785984. 
Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:47:55,956][1648985] Avg episode reward: [(0, '143.130')] [2024-06-15 14:47:56,112][1652491] Updated weights for policy 0, policy_version 265120 (0.0058) [2024-06-15 14:47:58,227][1652491] Updated weights for policy 0, policy_version 265210 (0.0014) [2024-06-15 14:48:00,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 47513.4, 300 sec: 45986.2). Total num frames: 543162368. Throughput: 0: 11935.3. Samples: 135852032. Policy #0 lag: (min: 5.0, avg: 77.4, max: 261.0) [2024-06-15 14:48:00,957][1648985] Avg episode reward: [(0, '149.570')] [2024-06-15 14:48:04,932][1652491] Updated weights for policy 0, policy_version 265269 (0.0012) [2024-06-15 14:48:05,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 543358976. Throughput: 0: 11901.2. Samples: 135925248. Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:48:05,956][1648985] Avg episode reward: [(0, '155.020')] [2024-06-15 14:48:06,289][1652491] Updated weights for policy 0, policy_version 265343 (0.0013) [2024-06-15 14:48:08,029][1652491] Updated weights for policy 0, policy_version 265408 (0.0012) [2024-06-15 14:48:09,444][1652491] Updated weights for policy 0, policy_version 265467 (0.0014) [2024-06-15 14:48:10,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 49152.2, 300 sec: 46430.6). Total num frames: 543686656. Throughput: 0: 11741.9. Samples: 135952896. Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:48:10,956][1648985] Avg episode reward: [(0, '148.360')] [2024-06-15 14:48:15,489][1652491] Updated weights for policy 0, policy_version 265504 (0.0011) [2024-06-15 14:48:15,955][1648985] Fps is (10 sec: 42597.2, 60 sec: 45329.0, 300 sec: 45986.2). Total num frames: 543784960. Throughput: 0: 11832.9. Samples: 136040960. 
Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:48:15,956][1648985] Avg episode reward: [(0, '140.390')] [2024-06-15 14:48:17,679][1652491] Updated weights for policy 0, policy_version 265597 (0.0014) [2024-06-15 14:48:19,446][1652491] Updated weights for policy 0, policy_version 265666 (0.0107) [2024-06-15 14:48:20,639][1652491] Updated weights for policy 0, policy_version 265724 (0.0012) [2024-06-15 14:48:20,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 49698.2, 300 sec: 46986.0). Total num frames: 544210944. Throughput: 0: 11662.3. Samples: 136088064. Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:48:20,955][1648985] Avg episode reward: [(0, '133.480')] [2024-06-15 14:48:25,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 544210944. Throughput: 0: 11741.9. Samples: 136134144. Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:48:25,957][1648985] Avg episode reward: [(0, '140.860')] [2024-06-15 14:48:26,717][1651469] Signal inference workers to stop experience collection... (13900 times) [2024-06-15 14:48:26,770][1652491] InferenceWorker_p0-w0: stopping experience collection (13900 times) [2024-06-15 14:48:26,969][1651469] Signal inference workers to resume experience collection... (13900 times) [2024-06-15 14:48:26,970][1652491] InferenceWorker_p0-w0: resuming experience collection (13900 times) [2024-06-15 14:48:27,452][1652491] Updated weights for policy 0, policy_version 265785 (0.0089) [2024-06-15 14:48:29,077][1652491] Updated weights for policy 0, policy_version 265855 (0.0012) [2024-06-15 14:48:30,955][1648985] Fps is (10 sec: 36043.7, 60 sec: 47513.5, 300 sec: 46541.6). Total num frames: 544571392. Throughput: 0: 11537.0. Samples: 136197120. 
Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:48:30,956][1648985] Avg episode reward: [(0, '160.670')] [2024-06-15 14:48:31,671][1652491] Updated weights for policy 0, policy_version 265937 (0.0104) [2024-06-15 14:48:35,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 44783.0, 300 sec: 45875.2). Total num frames: 544735232. Throughput: 0: 11559.8. Samples: 136269824. Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:48:35,956][1648985] Avg episode reward: [(0, '156.690')] [2024-06-15 14:48:38,080][1652491] Updated weights for policy 0, policy_version 265985 (0.0013) [2024-06-15 14:48:39,721][1652491] Updated weights for policy 0, policy_version 266064 (0.0043) [2024-06-15 14:48:40,955][1648985] Fps is (10 sec: 39322.7, 60 sec: 47513.6, 300 sec: 46097.4). Total num frames: 544964608. Throughput: 0: 11605.3. Samples: 136308224. Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:48:40,955][1648985] Avg episode reward: [(0, '173.260')] [2024-06-15 14:48:41,276][1652491] Updated weights for policy 0, policy_version 266114 (0.0013) [2024-06-15 14:48:43,316][1652491] Updated weights for policy 0, policy_version 266208 (0.0079) [2024-06-15 14:48:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46421.4, 300 sec: 46097.3). Total num frames: 545259520. Throughput: 0: 11412.0. Samples: 136365568. Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:48:45,955][1648985] Avg episode reward: [(0, '159.300')] [2024-06-15 14:48:49,394][1652491] Updated weights for policy 0, policy_version 266261 (0.0014) [2024-06-15 14:48:50,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 545423360. Throughput: 0: 11559.8. Samples: 136445440. 
Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:48:50,956][1648985] Avg episode reward: [(0, '159.650')] [2024-06-15 14:48:51,365][1652491] Updated weights for policy 0, policy_version 266339 (0.0108) [2024-06-15 14:48:52,087][1652491] Updated weights for policy 0, policy_version 266368 (0.0010) [2024-06-15 14:48:53,867][1652491] Updated weights for policy 0, policy_version 266436 (0.0012) [2024-06-15 14:48:55,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 47513.5, 300 sec: 46652.8). Total num frames: 545783808. Throughput: 0: 11559.8. Samples: 136473088. Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:48:55,956][1648985] Avg episode reward: [(0, '153.320')] [2024-06-15 14:48:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000266496_545783808.pth... [2024-06-15 14:48:56,014][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000260992_534511616.pth [2024-06-15 14:49:00,101][1652491] Updated weights for policy 0, policy_version 266499 (0.0012) [2024-06-15 14:49:00,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 44783.2, 300 sec: 45875.2). Total num frames: 545849344. Throughput: 0: 11332.4. Samples: 136550912. Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:49:00,955][1648985] Avg episode reward: [(0, '136.410')] [2024-06-15 14:49:01,201][1652491] Updated weights for policy 0, policy_version 266546 (0.0012) [2024-06-15 14:49:02,596][1652491] Updated weights for policy 0, policy_version 266608 (0.0101) [2024-06-15 14:49:04,262][1652491] Updated weights for policy 0, policy_version 266657 (0.0095) [2024-06-15 14:49:04,277][1651469] Signal inference workers to stop experience collection... (13950 times) [2024-06-15 14:49:04,390][1652491] InferenceWorker_p0-w0: stopping experience collection (13950 times) [2024-06-15 14:49:04,573][1651469] Signal inference workers to resume experience collection... 
(13950 times) [2024-06-15 14:49:04,574][1652491] InferenceWorker_p0-w0: resuming experience collection (13950 times) [2024-06-15 14:49:05,664][1652491] Updated weights for policy 0, policy_version 266705 (0.0012) [2024-06-15 14:49:05,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 546242560. Throughput: 0: 11673.6. Samples: 136613376. Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:49:05,956][1648985] Avg episode reward: [(0, '130.410')] [2024-06-15 14:49:10,955][1648985] Fps is (10 sec: 45873.8, 60 sec: 43690.5, 300 sec: 45764.1). Total num frames: 546308096. Throughput: 0: 11491.5. Samples: 136651264. Policy #0 lag: (min: 14.0, avg: 67.2, max: 270.0) [2024-06-15 14:49:10,956][1648985] Avg episode reward: [(0, '111.170')] [2024-06-15 14:49:11,070][1652491] Updated weights for policy 0, policy_version 266768 (0.0011) [2024-06-15 14:49:12,096][1652491] Updated weights for policy 0, policy_version 266816 (0.0013) [2024-06-15 14:49:14,099][1652491] Updated weights for policy 0, policy_version 266874 (0.0012) [2024-06-15 14:49:15,686][1652491] Updated weights for policy 0, policy_version 266928 (0.0012) [2024-06-15 14:49:15,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48606.1, 300 sec: 46652.8). Total num frames: 546701312. Throughput: 0: 11662.3. Samples: 136721920. Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:49:15,956][1648985] Avg episode reward: [(0, '132.490')] [2024-06-15 14:49:17,458][1652491] Updated weights for policy 0, policy_version 266976 (0.0023) [2024-06-15 14:49:20,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 43690.4, 300 sec: 46208.4). Total num frames: 546832384. Throughput: 0: 11537.0. Samples: 136788992. 
Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:49:20,956][1648985] Avg episode reward: [(0, '132.890')] [2024-06-15 14:49:23,574][1652491] Updated weights for policy 0, policy_version 267040 (0.0013) [2024-06-15 14:49:24,212][1652491] Updated weights for policy 0, policy_version 267072 (0.0012) [2024-06-15 14:49:25,955][1648985] Fps is (10 sec: 36045.3, 60 sec: 47513.8, 300 sec: 46097.4). Total num frames: 547061760. Throughput: 0: 11434.7. Samples: 136822784. Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:49:25,955][1648985] Avg episode reward: [(0, '140.270')] [2024-06-15 14:49:26,036][1652491] Updated weights for policy 0, policy_version 267133 (0.0012) [2024-06-15 14:49:27,823][1652491] Updated weights for policy 0, policy_version 267190 (0.0023) [2024-06-15 14:49:29,423][1652491] Updated weights for policy 0, policy_version 267258 (0.0025) [2024-06-15 14:49:30,955][1648985] Fps is (10 sec: 52430.5, 60 sec: 46421.5, 300 sec: 46764.4). Total num frames: 547356672. Throughput: 0: 11491.5. Samples: 136882688. Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:49:30,956][1648985] Avg episode reward: [(0, '154.760')] [2024-06-15 14:49:35,049][1652491] Updated weights for policy 0, policy_version 267325 (0.0015) [2024-06-15 14:49:35,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 547487744. Throughput: 0: 11525.7. Samples: 136964096. Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:49:35,955][1648985] Avg episode reward: [(0, '168.760')] [2024-06-15 14:49:37,494][1652491] Updated weights for policy 0, policy_version 267392 (0.0013) [2024-06-15 14:49:39,621][1652491] Updated weights for policy 0, policy_version 267456 (0.0013) [2024-06-15 14:49:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 547848192. Throughput: 0: 11548.4. Samples: 136992768. 
Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:49:40,956][1648985] Avg episode reward: [(0, '155.640')] [2024-06-15 14:49:41,174][1652491] Updated weights for policy 0, policy_version 267520 (0.0013) [2024-06-15 14:49:45,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 547880960. Throughput: 0: 11298.1. Samples: 137059328. Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:49:45,956][1648985] Avg episode reward: [(0, '133.180')] [2024-06-15 14:49:47,587][1652491] Updated weights for policy 0, policy_version 267583 (0.0015) [2024-06-15 14:49:49,488][1652491] Updated weights for policy 0, policy_version 267640 (0.0015) [2024-06-15 14:49:50,820][1651469] Signal inference workers to stop experience collection... (14000 times) [2024-06-15 14:49:50,898][1652491] InferenceWorker_p0-w0: stopping experience collection (14000 times) [2024-06-15 14:49:50,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 46421.4, 300 sec: 46430.6). Total num frames: 548208640. Throughput: 0: 11411.9. Samples: 137126912. Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:49:50,956][1648985] Avg episode reward: [(0, '140.180')] [2024-06-15 14:49:51,086][1651469] Signal inference workers to resume experience collection... (14000 times) [2024-06-15 14:49:51,087][1652491] InferenceWorker_p0-w0: resuming experience collection (14000 times) [2024-06-15 14:49:51,262][1652491] Updated weights for policy 0, policy_version 267697 (0.0100) [2024-06-15 14:49:52,328][1652491] Updated weights for policy 0, policy_version 267748 (0.0014) [2024-06-15 14:49:55,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 548405248. Throughput: 0: 11309.5. Samples: 137160192. 
Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:49:55,956][1648985] Avg episode reward: [(0, '158.600')] [2024-06-15 14:49:57,372][1652491] Updated weights for policy 0, policy_version 267792 (0.0014) [2024-06-15 14:49:59,782][1652491] Updated weights for policy 0, policy_version 267844 (0.0013) [2024-06-15 14:50:00,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 548634624. Throughput: 0: 11434.7. Samples: 137236480. Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:50:00,955][1648985] Avg episode reward: [(0, '135.180')] [2024-06-15 14:50:01,222][1652491] Updated weights for policy 0, policy_version 267904 (0.0012) [2024-06-15 14:50:02,433][1652491] Updated weights for policy 0, policy_version 267952 (0.0012) [2024-06-15 14:50:03,685][1652491] Updated weights for policy 0, policy_version 268002 (0.0018) [2024-06-15 14:50:05,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 44782.8, 300 sec: 46652.7). Total num frames: 548929536. Throughput: 0: 11309.5. Samples: 137297920. Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:50:05,956][1648985] Avg episode reward: [(0, '132.820')] [2024-06-15 14:50:08,751][1652491] Updated weights for policy 0, policy_version 268041 (0.0011) [2024-06-15 14:50:09,807][1652491] Updated weights for policy 0, policy_version 268090 (0.0064) [2024-06-15 14:50:10,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 549060608. Throughput: 0: 11400.5. Samples: 137335808. 
Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:50:10,956][1648985] Avg episode reward: [(0, '137.740')] [2024-06-15 14:50:12,358][1652491] Updated weights for policy 0, policy_version 268145 (0.0013) [2024-06-15 14:50:13,807][1652491] Updated weights for policy 0, policy_version 268208 (0.0078) [2024-06-15 14:50:15,030][1652491] Updated weights for policy 0, policy_version 268259 (0.0012) [2024-06-15 14:50:15,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45874.9, 300 sec: 46985.9). Total num frames: 549453824. Throughput: 0: 11548.4. Samples: 137402368. Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:50:15,956][1648985] Avg episode reward: [(0, '133.220')] [2024-06-15 14:50:20,313][1652491] Updated weights for policy 0, policy_version 268321 (0.0015) [2024-06-15 14:50:20,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.5, 300 sec: 46208.4). Total num frames: 549584896. Throughput: 0: 11343.6. Samples: 137474560. Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:50:20,955][1648985] Avg episode reward: [(0, '122.520')] [2024-06-15 14:50:22,802][1652491] Updated weights for policy 0, policy_version 268370 (0.0013) [2024-06-15 14:50:23,748][1652491] Updated weights for policy 0, policy_version 268412 (0.0012) [2024-06-15 14:50:25,022][1652491] Updated weights for policy 0, policy_version 268451 (0.0109) [2024-06-15 14:50:25,955][1648985] Fps is (10 sec: 39322.8, 60 sec: 46421.2, 300 sec: 46652.8). Total num frames: 549847040. Throughput: 0: 11548.4. Samples: 137512448. Policy #0 lag: (min: 23.0, avg: 146.4, max: 274.0) [2024-06-15 14:50:25,956][1648985] Avg episode reward: [(0, '119.370')] [2024-06-15 14:50:26,663][1652491] Updated weights for policy 0, policy_version 268514 (0.0012) [2024-06-15 14:50:30,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 43690.7, 300 sec: 46097.4). Total num frames: 549978112. Throughput: 0: 11559.8. Samples: 137579520. 
Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:50:30,956][1648985] Avg episode reward: [(0, '114.630')] [2024-06-15 14:50:31,265][1652491] Updated weights for policy 0, policy_version 268561 (0.0015) [2024-06-15 14:50:32,146][1652491] Updated weights for policy 0, policy_version 268608 (0.0020) [2024-06-15 14:50:33,489][1651469] Signal inference workers to stop experience collection... (14050 times) [2024-06-15 14:50:33,565][1652491] InferenceWorker_p0-w0: stopping experience collection (14050 times) [2024-06-15 14:50:33,679][1651469] Signal inference workers to resume experience collection... (14050 times) [2024-06-15 14:50:33,680][1652491] InferenceWorker_p0-w0: resuming experience collection (14050 times) [2024-06-15 14:50:34,576][1652491] Updated weights for policy 0, policy_version 268672 (0.0108) [2024-06-15 14:50:35,962][1648985] Fps is (10 sec: 45875.8, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 550305792. Throughput: 0: 11673.6. Samples: 137652224. Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:50:35,962][1648985] Avg episode reward: [(0, '113.750')] [2024-06-15 14:50:36,352][1652491] Updated weights for policy 0, policy_version 268720 (0.0011) [2024-06-15 14:50:37,924][1652491] Updated weights for policy 0, policy_version 268784 (0.0013) [2024-06-15 14:50:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 44236.8, 300 sec: 46430.6). Total num frames: 550502400. Throughput: 0: 11548.5. Samples: 137679872. Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:50:40,956][1648985] Avg episode reward: [(0, '122.110')] [2024-06-15 14:50:43,265][1652491] Updated weights for policy 0, policy_version 268848 (0.0012) [2024-06-15 14:50:45,414][1652491] Updated weights for policy 0, policy_version 268912 (0.0015) [2024-06-15 14:50:45,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 550764544. Throughput: 0: 11537.0. Samples: 137755648. 
Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:50:45,956][1648985] Avg episode reward: [(0, '124.580')] [2024-06-15 14:50:47,509][1652491] Updated weights for policy 0, policy_version 268961 (0.0013) [2024-06-15 14:50:49,230][1652491] Updated weights for policy 0, policy_version 269027 (0.0011) [2024-06-15 14:50:50,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 551026688. Throughput: 0: 11594.0. Samples: 137819648. Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:50:50,955][1648985] Avg episode reward: [(0, '132.550')] [2024-06-15 14:50:53,990][1652491] Updated weights for policy 0, policy_version 269077 (0.0013) [2024-06-15 14:50:55,741][1652491] Updated weights for policy 0, policy_version 269136 (0.0021) [2024-06-15 14:50:55,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 551190528. Throughput: 0: 11696.3. Samples: 137862144. Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:50:55,956][1648985] Avg episode reward: [(0, '133.430')] [2024-06-15 14:50:56,407][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000269168_551256064.pth... [2024-06-15 14:50:56,460][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000263744_540147712.pth [2024-06-15 14:50:56,469][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000269168_551256064.pth [2024-06-15 14:50:56,776][1652491] Updated weights for policy 0, policy_version 269179 (0.0028) [2024-06-15 14:50:58,430][1652491] Updated weights for policy 0, policy_version 269232 (0.0142) [2024-06-15 14:51:00,049][1652491] Updated weights for policy 0, policy_version 269304 (0.0013) [2024-06-15 14:51:00,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48605.8, 300 sec: 46874.9). Total num frames: 551550976. Throughput: 0: 11559.9. Samples: 137922560. 
Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:51:00,956][1648985] Avg episode reward: [(0, '122.920')] [2024-06-15 14:51:05,135][1652491] Updated weights for policy 0, policy_version 269360 (0.0013) [2024-06-15 14:51:05,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45875.4, 300 sec: 46319.5). Total num frames: 551682048. Throughput: 0: 11935.3. Samples: 138011648. Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:51:05,956][1648985] Avg episode reward: [(0, '132.900')] [2024-06-15 14:51:06,325][1652491] Updated weights for policy 0, policy_version 269393 (0.0013) [2024-06-15 14:51:07,577][1652491] Updated weights for policy 0, policy_version 269441 (0.0016) [2024-06-15 14:51:09,100][1652491] Updated weights for policy 0, policy_version 269504 (0.0013) [2024-06-15 14:51:10,748][1652491] Updated weights for policy 0, policy_version 269566 (0.0012) [2024-06-15 14:51:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 50244.2, 300 sec: 47097.1). Total num frames: 552075264. Throughput: 0: 11776.0. Samples: 138042368. Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:51:10,956][1648985] Avg episode reward: [(0, '125.820')] [2024-06-15 14:51:15,845][1651469] Signal inference workers to stop experience collection... (14100 times) [2024-06-15 14:51:15,879][1652491] InferenceWorker_p0-w0: stopping experience collection (14100 times) [2024-06-15 14:51:15,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 44783.2, 300 sec: 46319.5). Total num frames: 552140800. Throughput: 0: 12049.1. Samples: 138121728. Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:51:15,956][1648985] Avg episode reward: [(0, '131.650')] [2024-06-15 14:51:16,079][1651469] Signal inference workers to resume experience collection... 
(14100 times) [2024-06-15 14:51:16,080][1652491] InferenceWorker_p0-w0: resuming experience collection (14100 times) [2024-06-15 14:51:16,286][1652491] Updated weights for policy 0, policy_version 269626 (0.0013) [2024-06-15 14:51:17,726][1652491] Updated weights for policy 0, policy_version 269687 (0.0019) [2024-06-15 14:51:19,855][1652491] Updated weights for policy 0, policy_version 269752 (0.0135) [2024-06-15 14:51:20,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 48605.9, 300 sec: 46874.9). Total num frames: 552501248. Throughput: 0: 11764.6. Samples: 138181632. Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:51:20,956][1648985] Avg episode reward: [(0, '146.450')] [2024-06-15 14:51:21,325][1652491] Updated weights for policy 0, policy_version 269794 (0.0013) [2024-06-15 14:51:25,956][1648985] Fps is (10 sec: 45872.7, 60 sec: 45874.8, 300 sec: 46208.4). Total num frames: 552599552. Throughput: 0: 12003.4. Samples: 138220032. Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:51:25,957][1648985] Avg episode reward: [(0, '139.810')] [2024-06-15 14:51:27,061][1652491] Updated weights for policy 0, policy_version 269856 (0.0022) [2024-06-15 14:51:29,372][1652491] Updated weights for policy 0, policy_version 269951 (0.0096) [2024-06-15 14:51:30,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 48605.9, 300 sec: 46763.8). Total num frames: 552894464. Throughput: 0: 11901.2. Samples: 138291200. Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:51:30,955][1648985] Avg episode reward: [(0, '129.560')] [2024-06-15 14:51:32,155][1652491] Updated weights for policy 0, policy_version 270018 (0.0155) [2024-06-15 14:51:33,308][1652491] Updated weights for policy 0, policy_version 270072 (0.0019) [2024-06-15 14:51:35,958][1648985] Fps is (10 sec: 52414.6, 60 sec: 46964.9, 300 sec: 46207.9). Total num frames: 553123840. Throughput: 0: 12014.1. Samples: 138360320. 
Policy #0 lag: (min: 114.0, avg: 221.9, max: 351.0) [2024-06-15 14:51:35,959][1648985] Avg episode reward: [(0, '136.100')] [2024-06-15 14:51:38,668][1652491] Updated weights for policy 0, policy_version 270128 (0.0013) [2024-06-15 14:51:40,334][1652491] Updated weights for policy 0, policy_version 270180 (0.0011) [2024-06-15 14:51:40,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 553385984. Throughput: 0: 11901.2. Samples: 138397696. Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:51:40,956][1648985] Avg episode reward: [(0, '134.510')] [2024-06-15 14:51:42,545][1652491] Updated weights for policy 0, policy_version 270240 (0.0013) [2024-06-15 14:51:44,613][1652491] Updated weights for policy 0, policy_version 270327 (0.0013) [2024-06-15 14:51:45,955][1648985] Fps is (10 sec: 52445.8, 60 sec: 48059.9, 300 sec: 46652.7). Total num frames: 553648128. Throughput: 0: 11878.4. Samples: 138457088. Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:51:45,956][1648985] Avg episode reward: [(0, '134.050')] [2024-06-15 14:51:50,815][1652491] Updated weights for policy 0, policy_version 270385 (0.0015) [2024-06-15 14:51:50,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 45329.0, 300 sec: 46319.6). Total num frames: 553746432. Throughput: 0: 11616.7. Samples: 138534400. Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:51:50,956][1648985] Avg episode reward: [(0, '126.770')] [2024-06-15 14:51:52,221][1652491] Updated weights for policy 0, policy_version 270437 (0.0016) [2024-06-15 14:51:53,987][1652491] Updated weights for policy 0, policy_version 270480 (0.0013) [2024-06-15 14:51:55,665][1652491] Updated weights for policy 0, policy_version 270545 (0.0012) [2024-06-15 14:51:55,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 554074112. Throughput: 0: 11616.7. Samples: 138565120. 
Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:51:55,956][1648985] Avg episode reward: [(0, '121.710')] [2024-06-15 14:51:56,131][1651469] Signal inference workers to stop experience collection... (14150 times) [2024-06-15 14:51:56,179][1652491] InferenceWorker_p0-w0: stopping experience collection (14150 times) [2024-06-15 14:51:56,381][1651469] Signal inference workers to resume experience collection... (14150 times) [2024-06-15 14:51:56,382][1652491] InferenceWorker_p0-w0: resuming experience collection (14150 times) [2024-06-15 14:52:00,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 554172416. Throughput: 0: 11343.6. Samples: 138632192. Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:52:00,956][1648985] Avg episode reward: [(0, '119.160')] [2024-06-15 14:52:01,224][1652491] Updated weights for policy 0, policy_version 270594 (0.0014) [2024-06-15 14:52:03,026][1652491] Updated weights for policy 0, policy_version 270657 (0.0013) [2024-06-15 14:52:04,497][1652491] Updated weights for policy 0, policy_version 270710 (0.0012) [2024-06-15 14:52:05,955][1648985] Fps is (10 sec: 36044.3, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 554434560. Throughput: 0: 11434.6. Samples: 138696192. Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:52:05,956][1648985] Avg episode reward: [(0, '130.150')] [2024-06-15 14:52:06,633][1652491] Updated weights for policy 0, policy_version 270758 (0.0151) [2024-06-15 14:52:08,187][1652491] Updated weights for policy 0, policy_version 270817 (0.0035) [2024-06-15 14:52:10,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 554696704. Throughput: 0: 11161.7. Samples: 138722304. 
Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:52:10,956][1648985] Avg episode reward: [(0, '132.680')] [2024-06-15 14:52:13,651][1652491] Updated weights for policy 0, policy_version 270883 (0.0013) [2024-06-15 14:52:15,463][1652491] Updated weights for policy 0, policy_version 270929 (0.0011) [2024-06-15 14:52:15,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 554893312. Throughput: 0: 11298.1. Samples: 138799616. Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:52:15,956][1648985] Avg episode reward: [(0, '138.860')] [2024-06-15 14:52:16,396][1652491] Updated weights for policy 0, policy_version 270976 (0.0013) [2024-06-15 14:52:17,700][1652491] Updated weights for policy 0, policy_version 271031 (0.0014) [2024-06-15 14:52:19,497][1652491] Updated weights for policy 0, policy_version 271103 (0.0024) [2024-06-15 14:52:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 555220992. Throughput: 0: 11139.6. Samples: 138861568. Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:52:20,956][1648985] Avg episode reward: [(0, '140.100')] [2024-06-15 14:52:25,453][1652491] Updated weights for policy 0, policy_version 271156 (0.0012) [2024-06-15 14:52:25,956][1648985] Fps is (10 sec: 45870.9, 60 sec: 45874.8, 300 sec: 46208.3). Total num frames: 555352064. Throughput: 0: 11241.0. Samples: 138903552. Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:52:25,957][1648985] Avg episode reward: [(0, '148.180')] [2024-06-15 14:52:26,634][1652491] Updated weights for policy 0, policy_version 271184 (0.0010) [2024-06-15 14:52:27,642][1652491] Updated weights for policy 0, policy_version 271232 (0.0063) [2024-06-15 14:52:29,703][1652491] Updated weights for policy 0, policy_version 271312 (0.0035) [2024-06-15 14:52:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 46430.6). Total num frames: 555745280. 
Throughput: 0: 11377.8. Samples: 138969088. Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:52:30,955][1648985] Avg episode reward: [(0, '143.030')] [2024-06-15 14:52:35,955][1648985] Fps is (10 sec: 39324.9, 60 sec: 43692.9, 300 sec: 46208.4). Total num frames: 555745280. Throughput: 0: 11286.7. Samples: 139042304. Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:52:35,956][1648985] Avg episode reward: [(0, '146.760')] [2024-06-15 14:52:36,548][1652491] Updated weights for policy 0, policy_version 271378 (0.0046) [2024-06-15 14:52:37,654][1652491] Updated weights for policy 0, policy_version 271424 (0.0012) [2024-06-15 14:52:39,456][1652491] Updated weights for policy 0, policy_version 271478 (0.0013) [2024-06-15 14:52:40,955][1648985] Fps is (10 sec: 32767.3, 60 sec: 44782.8, 300 sec: 46097.3). Total num frames: 556072960. Throughput: 0: 11275.4. Samples: 139072512. Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:52:40,956][1648985] Avg episode reward: [(0, '128.460')] [2024-06-15 14:52:41,095][1651469] Signal inference workers to stop experience collection... (14200 times) [2024-06-15 14:52:41,154][1652491] InferenceWorker_p0-w0: stopping experience collection (14200 times) [2024-06-15 14:52:41,280][1651469] Signal inference workers to resume experience collection... (14200 times) [2024-06-15 14:52:41,281][1652491] InferenceWorker_p0-w0: resuming experience collection (14200 times) [2024-06-15 14:52:41,650][1652491] Updated weights for policy 0, policy_version 271568 (0.0121) [2024-06-15 14:52:42,717][1652491] Updated weights for policy 0, policy_version 271616 (0.0011) [2024-06-15 14:52:45,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 556269568. Throughput: 0: 11252.6. Samples: 139138560. 
Policy #0 lag: (min: 42.0, avg: 117.7, max: 298.0) [2024-06-15 14:52:45,955][1648985] Avg episode reward: [(0, '113.380')] [2024-06-15 14:52:48,476][1652491] Updated weights for policy 0, policy_version 271665 (0.0013) [2024-06-15 14:52:49,865][1652491] Updated weights for policy 0, policy_version 271684 (0.0025) [2024-06-15 14:52:50,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 556466176. Throughput: 0: 11446.1. Samples: 139211264. Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:52:50,956][1648985] Avg episode reward: [(0, '133.900')] [2024-06-15 14:52:51,954][1652491] Updated weights for policy 0, policy_version 271763 (0.0083) [2024-06-15 14:52:52,879][1652491] Updated weights for policy 0, policy_version 271809 (0.0033) [2024-06-15 14:52:55,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45329.1, 300 sec: 46208.5). Total num frames: 556793856. Throughput: 0: 11491.5. Samples: 139239424. Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:52:55,956][1648985] Avg episode reward: [(0, '142.660')] [2024-06-15 14:52:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000271872_556793856.pth... [2024-06-15 14:52:56,022][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000266496_545783808.pth [2024-06-15 14:52:58,412][1652491] Updated weights for policy 0, policy_version 271875 (0.0019) [2024-06-15 14:52:59,383][1652491] Updated weights for policy 0, policy_version 271927 (0.0053) [2024-06-15 14:53:00,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 556924928. Throughput: 0: 11582.6. Samples: 139320832. 
Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:53:00,956][1648985] Avg episode reward: [(0, '147.370')] [2024-06-15 14:53:02,040][1652491] Updated weights for policy 0, policy_version 271971 (0.0041) [2024-06-15 14:53:03,891][1652491] Updated weights for policy 0, policy_version 272048 (0.0013) [2024-06-15 14:53:05,581][1652491] Updated weights for policy 0, policy_version 272119 (0.0162) [2024-06-15 14:53:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.9, 300 sec: 46208.4). Total num frames: 557318144. Throughput: 0: 11480.2. Samples: 139378176. Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:53:05,956][1648985] Avg episode reward: [(0, '150.840')] [2024-06-15 14:53:10,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44236.7, 300 sec: 45986.3). Total num frames: 557350912. Throughput: 0: 11423.5. Samples: 139417600. Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:53:10,956][1648985] Avg episode reward: [(0, '145.460')] [2024-06-15 14:53:11,098][1652491] Updated weights for policy 0, policy_version 272161 (0.0016) [2024-06-15 14:53:12,782][1652491] Updated weights for policy 0, policy_version 272194 (0.0026) [2024-06-15 14:53:14,557][1652491] Updated weights for policy 0, policy_version 272257 (0.0017) [2024-06-15 14:53:15,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 46967.6, 300 sec: 45764.1). Total num frames: 557711360. Throughput: 0: 11411.9. Samples: 139482624. Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:53:15,956][1648985] Avg episode reward: [(0, '150.520')] [2024-06-15 14:53:16,179][1652491] Updated weights for policy 0, policy_version 272324 (0.0024) [2024-06-15 14:53:17,195][1652491] Updated weights for policy 0, policy_version 272373 (0.0015) [2024-06-15 14:53:20,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 557842432. Throughput: 0: 11480.2. Samples: 139558912. 
Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:53:20,956][1648985] Avg episode reward: [(0, '144.870')] [2024-06-15 14:53:22,018][1652491] Updated weights for policy 0, policy_version 272432 (0.0025) [2024-06-15 14:53:23,937][1651469] Signal inference workers to stop experience collection... (14250 times) [2024-06-15 14:53:23,975][1652491] InferenceWorker_p0-w0: stopping experience collection (14250 times) [2024-06-15 14:53:24,251][1651469] Signal inference workers to resume experience collection... (14250 times) [2024-06-15 14:53:24,251][1652491] InferenceWorker_p0-w0: resuming experience collection (14250 times) [2024-06-15 14:53:24,444][1652491] Updated weights for policy 0, policy_version 272485 (0.0013) [2024-06-15 14:53:25,966][1648985] Fps is (10 sec: 45823.7, 60 sec: 46959.5, 300 sec: 46095.7). Total num frames: 558170112. Throughput: 0: 11625.3. Samples: 139595776. Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:53:25,967][1648985] Avg episode reward: [(0, '130.160')] [2024-06-15 14:53:26,025][1652491] Updated weights for policy 0, policy_version 272544 (0.0011) [2024-06-15 14:53:27,864][1652491] Updated weights for policy 0, policy_version 272638 (0.0012) [2024-06-15 14:53:30,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 558366720. Throughput: 0: 11468.8. Samples: 139654656. Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:53:30,956][1648985] Avg episode reward: [(0, '145.620')] [2024-06-15 14:53:33,224][1652491] Updated weights for policy 0, policy_version 272694 (0.0013) [2024-06-15 14:53:35,300][1652491] Updated weights for policy 0, policy_version 272736 (0.0016) [2024-06-15 14:53:35,955][1648985] Fps is (10 sec: 42646.3, 60 sec: 47513.8, 300 sec: 46208.4). Total num frames: 558596096. Throughput: 0: 11525.7. Samples: 139729920. 
Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:53:35,955][1648985] Avg episode reward: [(0, '136.240')] [2024-06-15 14:53:36,533][1652491] Updated weights for policy 0, policy_version 272770 (0.0012) [2024-06-15 14:53:37,753][1652491] Updated weights for policy 0, policy_version 272823 (0.0011) [2024-06-15 14:53:39,038][1652491] Updated weights for policy 0, policy_version 272886 (0.0013) [2024-06-15 14:53:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 558891008. Throughput: 0: 11594.0. Samples: 139761152. Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:53:40,956][1648985] Avg episode reward: [(0, '134.470')] [2024-06-15 14:53:43,837][1652491] Updated weights for policy 0, policy_version 272921 (0.0119) [2024-06-15 14:53:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 559022080. Throughput: 0: 11446.1. Samples: 139835904. Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:53:45,956][1648985] Avg episode reward: [(0, '134.930')] [2024-06-15 14:53:47,374][1652491] Updated weights for policy 0, policy_version 273008 (0.0013) [2024-06-15 14:53:49,474][1652491] Updated weights for policy 0, policy_version 273073 (0.0097) [2024-06-15 14:53:50,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48605.9, 300 sec: 46097.4). Total num frames: 559382528. Throughput: 0: 11343.6. Samples: 139888640. Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:53:50,956][1648985] Avg episode reward: [(0, '142.310')] [2024-06-15 14:53:55,642][1652491] Updated weights for policy 0, policy_version 273168 (0.0013) [2024-06-15 14:53:55,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 46097.3). Total num frames: 559448064. Throughput: 0: 11446.1. Samples: 139932672. 
Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:53:55,956][1648985] Avg episode reward: [(0, '141.620')] [2024-06-15 14:53:56,734][1652491] Updated weights for policy 0, policy_version 273214 (0.0014) [2024-06-15 14:54:00,033][1652491] Updated weights for policy 0, policy_version 273285 (0.0068) [2024-06-15 14:54:00,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 47513.5, 300 sec: 45875.2). Total num frames: 559775744. Throughput: 0: 11480.1. Samples: 139999232. Policy #0 lag: (min: 27.0, avg: 120.2, max: 283.0) [2024-06-15 14:54:00,956][1648985] Avg episode reward: [(0, '138.770')] [2024-06-15 14:54:01,973][1652491] Updated weights for policy 0, policy_version 273376 (0.0011) [2024-06-15 14:54:02,111][1651469] Signal inference workers to stop experience collection... (14300 times) [2024-06-15 14:54:02,150][1652491] InferenceWorker_p0-w0: stopping experience collection (14300 times) [2024-06-15 14:54:02,436][1651469] Signal inference workers to resume experience collection... (14300 times) [2024-06-15 14:54:02,437][1652491] InferenceWorker_p0-w0: resuming experience collection (14300 times) [2024-06-15 14:54:05,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 559939584. Throughput: 0: 11332.3. Samples: 140068864. Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:54:05,956][1648985] Avg episode reward: [(0, '147.640')] [2024-06-15 14:54:07,913][1652491] Updated weights for policy 0, policy_version 273456 (0.0013) [2024-06-15 14:54:10,955][1648985] Fps is (10 sec: 36045.5, 60 sec: 46421.4, 300 sec: 45542.0). Total num frames: 560136192. Throughput: 0: 11266.8. Samples: 140102656. 
Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:54:10,956][1648985] Avg episode reward: [(0, '138.900')] [2024-06-15 14:54:11,839][1652491] Updated weights for policy 0, policy_version 273536 (0.0122) [2024-06-15 14:54:13,933][1652491] Updated weights for policy 0, policy_version 273617 (0.0014) [2024-06-15 14:54:15,001][1652491] Updated weights for policy 0, policy_version 273662 (0.0012) [2024-06-15 14:54:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.1, 300 sec: 46208.5). Total num frames: 560463872. Throughput: 0: 11047.8. Samples: 140151808. Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:54:15,956][1648985] Avg episode reward: [(0, '123.840')] [2024-06-15 14:54:20,519][1652491] Updated weights for policy 0, policy_version 273717 (0.0013) [2024-06-15 14:54:20,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 560594944. Throughput: 0: 11218.5. Samples: 140234752. Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:54:20,956][1648985] Avg episode reward: [(0, '130.210')] [2024-06-15 14:54:22,580][1652491] Updated weights for policy 0, policy_version 273747 (0.0020) [2024-06-15 14:54:24,209][1652491] Updated weights for policy 0, policy_version 273810 (0.0013) [2024-06-15 14:54:25,859][1652491] Updated weights for policy 0, policy_version 273874 (0.0012) [2024-06-15 14:54:25,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 45337.4, 300 sec: 45875.2). Total num frames: 560889856. Throughput: 0: 11184.3. Samples: 140264448. Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:54:25,956][1648985] Avg episode reward: [(0, '131.990')] [2024-06-15 14:54:26,863][1652491] Updated weights for policy 0, policy_version 273920 (0.0012) [2024-06-15 14:54:30,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 560988160. Throughput: 0: 11104.7. Samples: 140335616. 
Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:54:30,956][1648985] Avg episode reward: [(0, '130.670')] [2024-06-15 14:54:32,108][1652491] Updated weights for policy 0, policy_version 273978 (0.0013) [2024-06-15 14:54:34,559][1652491] Updated weights for policy 0, policy_version 274033 (0.0013) [2024-06-15 14:54:35,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 45329.1, 300 sec: 45653.1). Total num frames: 561315840. Throughput: 0: 11298.2. Samples: 140397056. Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:54:35,955][1648985] Avg episode reward: [(0, '134.990')] [2024-06-15 14:54:36,034][1652491] Updated weights for policy 0, policy_version 274087 (0.0013) [2024-06-15 14:54:37,083][1652491] Updated weights for policy 0, policy_version 274128 (0.0013) [2024-06-15 14:54:40,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 561512448. Throughput: 0: 11116.1. Samples: 140432896. Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:54:40,956][1648985] Avg episode reward: [(0, '136.220')] [2024-06-15 14:54:42,141][1652491] Updated weights for policy 0, policy_version 274178 (0.0011) [2024-06-15 14:54:43,403][1652491] Updated weights for policy 0, policy_version 274240 (0.0020) [2024-06-15 14:54:45,858][1652491] Updated weights for policy 0, policy_version 274290 (0.0013) [2024-06-15 14:54:45,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 561741824. Throughput: 0: 11377.9. Samples: 140511232. Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:54:45,955][1648985] Avg episode reward: [(0, '130.030')] [2024-06-15 14:54:46,306][1651469] Signal inference workers to stop experience collection... (14350 times) [2024-06-15 14:54:46,343][1652491] InferenceWorker_p0-w0: stopping experience collection (14350 times) [2024-06-15 14:54:46,532][1651469] Signal inference workers to resume experience collection... 
(14350 times) [2024-06-15 14:54:46,533][1652491] InferenceWorker_p0-w0: resuming experience collection (14350 times) [2024-06-15 14:54:47,463][1652491] Updated weights for policy 0, policy_version 274358 (0.0014) [2024-06-15 14:54:49,011][1652491] Updated weights for policy 0, policy_version 274401 (0.0011) [2024-06-15 14:54:50,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 562036736. Throughput: 0: 11309.5. Samples: 140577792. Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:54:50,956][1648985] Avg episode reward: [(0, '128.900')] [2024-06-15 14:54:53,467][1652491] Updated weights for policy 0, policy_version 274450 (0.0011) [2024-06-15 14:54:55,780][1652491] Updated weights for policy 0, policy_version 274498 (0.0014) [2024-06-15 14:54:55,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 562167808. Throughput: 0: 11537.1. Samples: 140621824. Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:54:55,956][1648985] Avg episode reward: [(0, '127.090')] [2024-06-15 14:54:56,423][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000274528_562233344.pth... [2024-06-15 14:54:56,586][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000269168_551256064.pth [2024-06-15 14:54:57,836][1652491] Updated weights for policy 0, policy_version 274576 (0.0012) [2024-06-15 14:54:58,979][1652491] Updated weights for policy 0, policy_version 274621 (0.0036) [2024-06-15 14:55:00,671][1652491] Updated weights for policy 0, policy_version 274681 (0.0014) [2024-06-15 14:55:00,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46421.5, 300 sec: 46208.5). Total num frames: 562561024. Throughput: 0: 11685.0. Samples: 140677632. 
Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:55:00,956][1648985] Avg episode reward: [(0, '147.750')] [2024-06-15 14:55:04,976][1652491] Updated weights for policy 0, policy_version 274740 (0.0013) [2024-06-15 14:55:05,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 562692096. Throughput: 0: 11730.5. Samples: 140762624. Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:55:05,956][1648985] Avg episode reward: [(0, '153.490')] [2024-06-15 14:55:07,729][1652491] Updated weights for policy 0, policy_version 274805 (0.0023) [2024-06-15 14:55:09,343][1652491] Updated weights for policy 0, policy_version 274877 (0.0012) [2024-06-15 14:55:10,852][1652491] Updated weights for policy 0, policy_version 274916 (0.0013) [2024-06-15 14:55:10,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.8, 300 sec: 45986.3). Total num frames: 563019776. Throughput: 0: 11696.4. Samples: 140790784. Policy #0 lag: (min: 71.0, avg: 209.0, max: 331.0) [2024-06-15 14:55:10,956][1648985] Avg episode reward: [(0, '143.760')] [2024-06-15 14:55:15,587][1652491] Updated weights for policy 0, policy_version 274962 (0.0012) [2024-06-15 14:55:15,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 44782.8, 300 sec: 45986.2). Total num frames: 563150848. Throughput: 0: 12026.3. Samples: 140876800. Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:55:15,956][1648985] Avg episode reward: [(0, '134.690')] [2024-06-15 14:55:17,532][1652491] Updated weights for policy 0, policy_version 275024 (0.0013) [2024-06-15 14:55:19,264][1652491] Updated weights for policy 0, policy_version 275104 (0.0021) [2024-06-15 14:55:20,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 48605.8, 300 sec: 46319.5). Total num frames: 563511296. Throughput: 0: 11958.0. Samples: 140935168. 
Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:55:20,956][1648985] Avg episode reward: [(0, '141.450')] [2024-06-15 14:55:21,047][1652491] Updated weights for policy 0, policy_version 275154 (0.0013) [2024-06-15 14:55:25,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 45329.2, 300 sec: 46208.4). Total num frames: 563609600. Throughput: 0: 12037.7. Samples: 140974592. Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:55:25,956][1648985] Avg episode reward: [(0, '149.360')] [2024-06-15 14:55:26,150][1652491] Updated weights for policy 0, policy_version 275205 (0.0014) [2024-06-15 14:55:28,056][1652491] Updated weights for policy 0, policy_version 275280 (0.0107) [2024-06-15 14:55:28,208][1651469] Signal inference workers to stop experience collection... (14400 times) [2024-06-15 14:55:28,265][1652491] InferenceWorker_p0-w0: stopping experience collection (14400 times) [2024-06-15 14:55:28,503][1651469] Signal inference workers to resume experience collection... (14400 times) [2024-06-15 14:55:28,503][1652491] InferenceWorker_p0-w0: resuming experience collection (14400 times) [2024-06-15 14:55:29,106][1652491] Updated weights for policy 0, policy_version 275328 (0.0011) [2024-06-15 14:55:30,413][1652491] Updated weights for policy 0, policy_version 275381 (0.0013) [2024-06-15 14:55:30,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 50244.4, 300 sec: 46430.6). Total num frames: 564002816. Throughput: 0: 11946.7. Samples: 141048832. Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:55:30,956][1648985] Avg episode reward: [(0, '142.810')] [2024-06-15 14:55:32,177][1652491] Updated weights for policy 0, policy_version 275450 (0.0013) [2024-06-15 14:55:35,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 46967.2, 300 sec: 46208.4). Total num frames: 564133888. Throughput: 0: 12276.6. Samples: 141130240. 
Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:55:35,956][1648985] Avg episode reward: [(0, '152.050')] [2024-06-15 14:55:37,747][1652491] Updated weights for policy 0, policy_version 275504 (0.0096) [2024-06-15 14:55:39,381][1652491] Updated weights for policy 0, policy_version 275552 (0.0013) [2024-06-15 14:55:40,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 49152.0, 300 sec: 46430.6). Total num frames: 564461568. Throughput: 0: 12174.2. Samples: 141169664. Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:55:40,956][1648985] Avg episode reward: [(0, '156.260')] [2024-06-15 14:55:41,068][1652491] Updated weights for policy 0, policy_version 275619 (0.0014) [2024-06-15 14:55:42,850][1652491] Updated weights for policy 0, policy_version 275706 (0.0171) [2024-06-15 14:55:45,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 48605.8, 300 sec: 46208.4). Total num frames: 564658176. Throughput: 0: 12219.7. Samples: 141227520. Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:55:45,956][1648985] Avg episode reward: [(0, '150.600')] [2024-06-15 14:55:48,633][1652491] Updated weights for policy 0, policy_version 275745 (0.0014) [2024-06-15 14:55:50,353][1652491] Updated weights for policy 0, policy_version 275808 (0.0039) [2024-06-15 14:55:50,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 47513.6, 300 sec: 46430.6). Total num frames: 564887552. Throughput: 0: 12037.7. Samples: 141304320. Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:55:50,956][1648985] Avg episode reward: [(0, '132.340')] [2024-06-15 14:55:52,233][1652491] Updated weights for policy 0, policy_version 275873 (0.0015) [2024-06-15 14:55:53,825][1652491] Updated weights for policy 0, policy_version 275952 (0.0219) [2024-06-15 14:55:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 46208.4). Total num frames: 565182464. Throughput: 0: 11923.9. Samples: 141327360. 
Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:55:55,956][1648985] Avg episode reward: [(0, '125.500')] [2024-06-15 14:56:00,126][1652491] Updated weights for policy 0, policy_version 275984 (0.0011) [2024-06-15 14:56:00,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 565248000. Throughput: 0: 11787.4. Samples: 141407232. Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:56:00,956][1648985] Avg episode reward: [(0, '137.010')] [2024-06-15 14:56:02,213][1652491] Updated weights for policy 0, policy_version 276051 (0.0022) [2024-06-15 14:56:04,587][1652491] Updated weights for policy 0, policy_version 276160 (0.0016) [2024-06-15 14:56:05,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 49698.2, 300 sec: 46097.4). Total num frames: 565673984. Throughput: 0: 11616.7. Samples: 141457920. Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:56:05,956][1648985] Avg episode reward: [(0, '134.390')] [2024-06-15 14:56:06,055][1652491] Updated weights for policy 0, policy_version 276215 (0.0012) [2024-06-15 14:56:10,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 45986.3). Total num frames: 565706752. Throughput: 0: 11662.2. Samples: 141499392. Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:56:10,955][1648985] Avg episode reward: [(0, '136.340')] [2024-06-15 14:56:12,059][1651469] Signal inference workers to stop experience collection... (14450 times) [2024-06-15 14:56:12,106][1652491] InferenceWorker_p0-w0: stopping experience collection (14450 times) [2024-06-15 14:56:12,299][1651469] Signal inference workers to resume experience collection... 
(14450 times) [2024-06-15 14:56:12,299][1652491] InferenceWorker_p0-w0: resuming experience collection (14450 times) [2024-06-15 14:56:13,226][1652491] Updated weights for policy 0, policy_version 276276 (0.0014) [2024-06-15 14:56:14,297][1652491] Updated weights for policy 0, policy_version 276304 (0.0013) [2024-06-15 14:56:15,955][1648985] Fps is (10 sec: 32767.4, 60 sec: 47513.7, 300 sec: 45764.1). Total num frames: 566001664. Throughput: 0: 11559.8. Samples: 141569024. Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:56:15,956][1648985] Avg episode reward: [(0, '134.500')] [2024-06-15 14:56:16,482][1652491] Updated weights for policy 0, policy_version 276389 (0.0013) [2024-06-15 14:56:18,218][1652491] Updated weights for policy 0, policy_version 276451 (0.0111) [2024-06-15 14:56:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45329.2, 300 sec: 46208.5). Total num frames: 566231040. Throughput: 0: 11093.4. Samples: 141629440. Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:56:20,955][1648985] Avg episode reward: [(0, '151.140')] [2024-06-15 14:56:23,924][1652491] Updated weights for policy 0, policy_version 276496 (0.0011) [2024-06-15 14:56:25,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 45875.2, 300 sec: 45653.0). Total num frames: 566362112. Throughput: 0: 11059.2. Samples: 141667328. Policy #0 lag: (min: 15.0, avg: 95.2, max: 271.0) [2024-06-15 14:56:25,956][1648985] Avg episode reward: [(0, '140.430')] [2024-06-15 14:56:26,663][1652491] Updated weights for policy 0, policy_version 276560 (0.0030) [2024-06-15 14:56:29,305][1652491] Updated weights for policy 0, policy_version 276656 (0.0013) [2024-06-15 14:56:30,882][1652491] Updated weights for policy 0, policy_version 276720 (0.0020) [2024-06-15 14:56:30,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45329.1, 300 sec: 46097.9). Total num frames: 566722560. Throughput: 0: 10843.0. Samples: 141715456. 
Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:56:30,955][1648985] Avg episode reward: [(0, '157.420')] [2024-06-15 14:56:35,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 43690.8, 300 sec: 45319.8). Total num frames: 566755328. Throughput: 0: 10831.6. Samples: 141791744. Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:56:35,956][1648985] Avg episode reward: [(0, '162.100')] [2024-06-15 14:56:37,349][1652491] Updated weights for policy 0, policy_version 276757 (0.0013) [2024-06-15 14:56:38,584][1652491] Updated weights for policy 0, policy_version 276816 (0.0012) [2024-06-15 14:56:39,730][1652491] Updated weights for policy 0, policy_version 276862 (0.0014) [2024-06-15 14:56:40,955][1648985] Fps is (10 sec: 36044.2, 60 sec: 43690.6, 300 sec: 45541.9). Total num frames: 567083008. Throughput: 0: 11047.8. Samples: 141824512. Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:56:40,956][1648985] Avg episode reward: [(0, '158.320')] [2024-06-15 14:56:41,225][1652491] Updated weights for policy 0, policy_version 276914 (0.0010) [2024-06-15 14:56:42,935][1652491] Updated weights for policy 0, policy_version 276983 (0.0011) [2024-06-15 14:56:45,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 45875.2). Total num frames: 567279616. Throughput: 0: 10774.8. Samples: 141892096. Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:56:45,956][1648985] Avg episode reward: [(0, '169.730')] [2024-06-15 14:56:49,191][1652491] Updated weights for policy 0, policy_version 277040 (0.0013) [2024-06-15 14:56:50,893][1652491] Updated weights for policy 0, policy_version 277119 (0.0014) [2024-06-15 14:56:50,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 44236.8, 300 sec: 45653.1). Total num frames: 567541760. Throughput: 0: 11116.1. Samples: 141958144. 
Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:56:50,956][1648985] Avg episode reward: [(0, '165.740')] [2024-06-15 14:56:51,178][1651469] Signal inference workers to stop experience collection... (14500 times) [2024-06-15 14:56:51,273][1652491] InferenceWorker_p0-w0: stopping experience collection (14500 times) [2024-06-15 14:56:51,496][1651469] Signal inference workers to resume experience collection... (14500 times) [2024-06-15 14:56:51,497][1652491] InferenceWorker_p0-w0: resuming experience collection (14500 times) [2024-06-15 14:56:52,337][1652491] Updated weights for policy 0, policy_version 277168 (0.0023) [2024-06-15 14:56:53,934][1652491] Updated weights for policy 0, policy_version 277234 (0.0013) [2024-06-15 14:56:55,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 567803904. Throughput: 0: 10763.4. Samples: 141983744. Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:56:55,956][1648985] Avg episode reward: [(0, '163.520')] [2024-06-15 14:56:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000277248_567803904.pth... [2024-06-15 14:56:56,026][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000271872_556793856.pth [2024-06-15 14:57:00,315][1652491] Updated weights for policy 0, policy_version 277271 (0.0011) [2024-06-15 14:57:00,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 44236.8, 300 sec: 45653.1). Total num frames: 567902208. Throughput: 0: 11070.6. Samples: 142067200. 
Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:57:00,955][1648985] Avg episode reward: [(0, '146.140')] [2024-06-15 14:57:01,267][1652491] Updated weights for policy 0, policy_version 277318 (0.0020) [2024-06-15 14:57:02,073][1652491] Updated weights for policy 0, policy_version 277360 (0.0016) [2024-06-15 14:57:03,083][1652491] Updated weights for policy 0, policy_version 277393 (0.0012) [2024-06-15 14:57:04,807][1652491] Updated weights for policy 0, policy_version 277458 (0.0011) [2024-06-15 14:57:05,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 44236.7, 300 sec: 46208.4). Total num frames: 568328192. Throughput: 0: 11081.9. Samples: 142128128. Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:57:05,956][1648985] Avg episode reward: [(0, '140.080')] [2024-06-15 14:57:10,629][1652491] Updated weights for policy 0, policy_version 277507 (0.0030) [2024-06-15 14:57:10,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 44236.6, 300 sec: 45653.0). Total num frames: 568360960. Throughput: 0: 11389.1. Samples: 142179840. Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:57:10,956][1648985] Avg episode reward: [(0, '157.410')] [2024-06-15 14:57:11,878][1652491] Updated weights for policy 0, policy_version 277568 (0.0036) [2024-06-15 14:57:13,379][1652491] Updated weights for policy 0, policy_version 277632 (0.0013) [2024-06-15 14:57:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45875.3, 300 sec: 45875.2). Total num frames: 568754176. Throughput: 0: 11582.6. Samples: 142236672. Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:57:15,956][1648985] Avg episode reward: [(0, '160.500')] [2024-06-15 14:57:16,475][1652491] Updated weights for policy 0, policy_version 277731 (0.0327) [2024-06-15 14:57:20,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 43690.6, 300 sec: 45764.3). Total num frames: 568852480. Throughput: 0: 11582.6. Samples: 142312960. 
Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:57:20,956][1648985] Avg episode reward: [(0, '161.780')] [2024-06-15 14:57:22,183][1652491] Updated weights for policy 0, policy_version 277765 (0.0013) [2024-06-15 14:57:23,695][1652491] Updated weights for policy 0, policy_version 277840 (0.0012) [2024-06-15 14:57:25,569][1652491] Updated weights for policy 0, policy_version 277920 (0.0013) [2024-06-15 14:57:25,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 47513.5, 300 sec: 45653.0). Total num frames: 569212928. Throughput: 0: 11593.9. Samples: 142346240. Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:57:25,956][1648985] Avg episode reward: [(0, '146.220')] [2024-06-15 14:57:27,291][1651469] Signal inference workers to stop experience collection... (14550 times) [2024-06-15 14:57:27,309][1652491] Updated weights for policy 0, policy_version 277985 (0.0013) [2024-06-15 14:57:27,349][1652491] InferenceWorker_p0-w0: stopping experience collection (14550 times) [2024-06-15 14:57:27,444][1651469] Signal inference workers to resume experience collection... (14550 times) [2024-06-15 14:57:27,445][1652491] InferenceWorker_p0-w0: resuming experience collection (14550 times) [2024-06-15 14:57:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 44236.7, 300 sec: 46208.4). Total num frames: 569376768. Throughput: 0: 11639.4. Samples: 142415872. Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:57:30,956][1648985] Avg episode reward: [(0, '151.770')] [2024-06-15 14:57:32,331][1652491] Updated weights for policy 0, policy_version 278019 (0.0014) [2024-06-15 14:57:33,545][1652491] Updated weights for policy 0, policy_version 278078 (0.0025) [2024-06-15 14:57:35,257][1652491] Updated weights for policy 0, policy_version 278138 (0.0106) [2024-06-15 14:57:35,955][1648985] Fps is (10 sec: 45876.6, 60 sec: 48606.0, 300 sec: 46097.4). Total num frames: 569671680. Throughput: 0: 11753.3. Samples: 142487040. 
Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:57:35,955][1648985] Avg episode reward: [(0, '152.400')] [2024-06-15 14:57:37,265][1652491] Updated weights for policy 0, policy_version 278209 (0.0132) [2024-06-15 14:57:40,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 569901056. Throughput: 0: 11855.7. Samples: 142517248. Policy #0 lag: (min: 35.0, avg: 127.3, max: 291.0) [2024-06-15 14:57:40,956][1648985] Avg episode reward: [(0, '139.580')] [2024-06-15 14:57:43,169][1652491] Updated weights for policy 0, policy_version 278275 (0.0013) [2024-06-15 14:57:44,450][1652491] Updated weights for policy 0, policy_version 278332 (0.0011) [2024-06-15 14:57:45,957][1648985] Fps is (10 sec: 42588.8, 60 sec: 46965.8, 300 sec: 46208.1). Total num frames: 570097664. Throughput: 0: 11866.4. Samples: 142601216. Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:57:45,958][1648985] Avg episode reward: [(0, '136.720')] [2024-06-15 14:57:46,276][1652491] Updated weights for policy 0, policy_version 278384 (0.0028) [2024-06-15 14:57:47,666][1652491] Updated weights for policy 0, policy_version 278448 (0.0013) [2024-06-15 14:57:49,030][1652491] Updated weights for policy 0, policy_version 278512 (0.0027) [2024-06-15 14:57:50,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 570425344. Throughput: 0: 12128.7. Samples: 142673920. Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:57:50,956][1648985] Avg episode reward: [(0, '138.040')] [2024-06-15 14:57:53,250][1652491] Updated weights for policy 0, policy_version 278545 (0.0097) [2024-06-15 14:57:54,046][1652491] Updated weights for policy 0, policy_version 278591 (0.0013) [2024-06-15 14:57:55,955][1648985] Fps is (10 sec: 49162.5, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 570589184. Throughput: 0: 11878.4. Samples: 142714368. 
Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:57:55,955][1648985] Avg episode reward: [(0, '141.410')] [2024-06-15 14:57:57,059][1652491] Updated weights for policy 0, policy_version 278646 (0.0032) [2024-06-15 14:57:58,621][1652491] Updated weights for policy 0, policy_version 278717 (0.0013) [2024-06-15 14:57:59,928][1652491] Updated weights for policy 0, policy_version 278770 (0.0076) [2024-06-15 14:58:00,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 50790.4, 300 sec: 46208.4). Total num frames: 570949632. Throughput: 0: 11958.1. Samples: 142774784. Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:58:00,956][1648985] Avg episode reward: [(0, '141.440')] [2024-06-15 14:58:04,820][1652491] Updated weights for policy 0, policy_version 278818 (0.0019) [2024-06-15 14:58:05,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 45875.4, 300 sec: 46541.7). Total num frames: 571080704. Throughput: 0: 12174.3. Samples: 142860800. Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:58:05,955][1648985] Avg episode reward: [(0, '131.920')] [2024-06-15 14:58:06,884][1652491] Updated weights for policy 0, policy_version 278880 (0.0013) [2024-06-15 14:58:07,765][1652491] Updated weights for policy 0, policy_version 278912 (0.0012) [2024-06-15 14:58:09,588][1652491] Updated weights for policy 0, policy_version 278972 (0.0013) [2024-06-15 14:58:09,768][1651469] Signal inference workers to stop experience collection... (14600 times) [2024-06-15 14:58:09,864][1652491] InferenceWorker_p0-w0: stopping experience collection (14600 times) [2024-06-15 14:58:09,980][1651469] Signal inference workers to resume experience collection... (14600 times) [2024-06-15 14:58:09,981][1652491] InferenceWorker_p0-w0: resuming experience collection (14600 times) [2024-06-15 14:58:10,629][1652491] Updated weights for policy 0, policy_version 279024 (0.0014) [2024-06-15 14:58:10,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 51882.8, 300 sec: 46652.7). 
Total num frames: 571473920. Throughput: 0: 12060.5. Samples: 142888960. Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:58:10,956][1648985] Avg episode reward: [(0, '121.620')] [2024-06-15 14:58:15,955][1648985] Fps is (10 sec: 42597.2, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 571506688. Throughput: 0: 12265.2. Samples: 142967808. Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:58:15,956][1648985] Avg episode reward: [(0, '127.590')] [2024-06-15 14:58:16,155][1652491] Updated weights for policy 0, policy_version 279075 (0.0040) [2024-06-15 14:58:17,956][1652491] Updated weights for policy 0, policy_version 279122 (0.0032) [2024-06-15 14:58:19,256][1652491] Updated weights for policy 0, policy_version 279171 (0.0011) [2024-06-15 14:58:20,727][1652491] Updated weights for policy 0, policy_version 279232 (0.0152) [2024-06-15 14:58:20,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 50244.4, 300 sec: 46432.4). Total num frames: 571867136. Throughput: 0: 12071.8. Samples: 143030272. Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:58:20,955][1648985] Avg episode reward: [(0, '122.620')] [2024-06-15 14:58:21,914][1652491] Updated weights for policy 0, policy_version 279294 (0.0014) [2024-06-15 14:58:25,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 46421.5, 300 sec: 46208.4). Total num frames: 571998208. Throughput: 0: 12242.5. Samples: 143068160. Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:58:25,955][1648985] Avg episode reward: [(0, '136.620')] [2024-06-15 14:58:27,675][1652491] Updated weights for policy 0, policy_version 279356 (0.0014) [2024-06-15 14:58:29,517][1652491] Updated weights for policy 0, policy_version 279415 (0.0014) [2024-06-15 14:58:30,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 48605.9, 300 sec: 46430.6). Total num frames: 572293120. Throughput: 0: 11970.0. Samples: 143139840. 
Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:58:30,956][1648985] Avg episode reward: [(0, '135.160')] [2024-06-15 14:58:31,433][1652491] Updated weights for policy 0, policy_version 279472 (0.0012) [2024-06-15 14:58:32,870][1652491] Updated weights for policy 0, policy_version 279550 (0.0011) [2024-06-15 14:58:35,955][1648985] Fps is (10 sec: 52427.1, 60 sec: 47513.3, 300 sec: 46208.4). Total num frames: 572522496. Throughput: 0: 11935.2. Samples: 143211008. Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:58:35,956][1648985] Avg episode reward: [(0, '127.910')] [2024-06-15 14:58:38,663][1652491] Updated weights for policy 0, policy_version 279612 (0.0013) [2024-06-15 14:58:40,291][1652491] Updated weights for policy 0, policy_version 279651 (0.0013) [2024-06-15 14:58:40,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 572784640. Throughput: 0: 11798.8. Samples: 143245312. Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:58:40,955][1648985] Avg episode reward: [(0, '146.030')] [2024-06-15 14:58:42,819][1652491] Updated weights for policy 0, policy_version 279738 (0.0015) [2024-06-15 14:58:44,071][1652491] Updated weights for policy 0, policy_version 279808 (0.0014) [2024-06-15 14:58:45,955][1648985] Fps is (10 sec: 52430.6, 60 sec: 49153.8, 300 sec: 46319.5). Total num frames: 573046784. Throughput: 0: 12083.2. Samples: 143318528. Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:58:45,955][1648985] Avg episode reward: [(0, '149.480')] [2024-06-15 14:58:50,671][1652491] Updated weights for policy 0, policy_version 279890 (0.0030) [2024-06-15 14:58:50,960][1648985] Fps is (10 sec: 45854.0, 60 sec: 46963.9, 300 sec: 46763.1). Total num frames: 573243392. Throughput: 0: 11797.5. Samples: 143391744. 
Policy #0 lag: (min: 10.0, avg: 71.0, max: 266.0) [2024-06-15 14:58:50,960][1648985] Avg episode reward: [(0, '153.930')] [2024-06-15 14:58:53,086][1652491] Updated weights for policy 0, policy_version 279952 (0.0069) [2024-06-15 14:58:53,147][1651469] Signal inference workers to stop experience collection... (14650 times) [2024-06-15 14:58:53,181][1652491] InferenceWorker_p0-w0: stopping experience collection (14650 times) [2024-06-15 14:58:53,385][1651469] Signal inference workers to resume experience collection... (14650 times) [2024-06-15 14:58:53,386][1652491] InferenceWorker_p0-w0: resuming experience collection (14650 times) [2024-06-15 14:58:54,517][1652491] Updated weights for policy 0, policy_version 280016 (0.0013) [2024-06-15 14:58:55,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 49698.1, 300 sec: 46763.9). Total num frames: 573571072. Throughput: 0: 12037.7. Samples: 143430656. Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:58:55,956][1648985] Avg episode reward: [(0, '136.340')] [2024-06-15 14:58:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000280064_573571072.pth... [2024-06-15 14:58:56,009][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000274528_562233344.pth [2024-06-15 14:58:59,078][1652491] Updated weights for policy 0, policy_version 280102 (0.0014) [2024-06-15 14:59:00,955][1648985] Fps is (10 sec: 45896.3, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 573702144. Throughput: 0: 11844.3. Samples: 143500800. 
Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:59:00,956][1648985] Avg episode reward: [(0, '121.620')] [2024-06-15 14:59:01,723][1652491] Updated weights for policy 0, policy_version 280145 (0.0037) [2024-06-15 14:59:02,540][1652491] Updated weights for policy 0, policy_version 280184 (0.0044) [2024-06-15 14:59:04,770][1652491] Updated weights for policy 0, policy_version 280240 (0.0012) [2024-06-15 14:59:05,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 49151.9, 300 sec: 47097.1). Total num frames: 574029824. Throughput: 0: 11867.0. Samples: 143564288. Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:59:05,956][1648985] Avg episode reward: [(0, '136.120')] [2024-06-15 14:59:06,477][1652491] Updated weights for policy 0, policy_version 280316 (0.0016) [2024-06-15 14:59:10,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 574160896. Throughput: 0: 11889.8. Samples: 143603200. Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:59:10,956][1648985] Avg episode reward: [(0, '138.120')] [2024-06-15 14:59:11,376][1652491] Updated weights for policy 0, policy_version 280379 (0.0013) [2024-06-15 14:59:14,075][1652491] Updated weights for policy 0, policy_version 280442 (0.0013) [2024-06-15 14:59:15,955][1648985] Fps is (10 sec: 36045.2, 60 sec: 48059.9, 300 sec: 46763.9). Total num frames: 574390272. Throughput: 0: 11753.3. Samples: 143668736. Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:59:15,955][1648985] Avg episode reward: [(0, '157.080')] [2024-06-15 14:59:16,835][1652491] Updated weights for policy 0, policy_version 280496 (0.0013) [2024-06-15 14:59:20,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.1, 300 sec: 46541.7). Total num frames: 574619648. Throughput: 0: 11628.2. Samples: 143734272. 
Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:59:20,956][1648985] Avg episode reward: [(0, '134.550')] [2024-06-15 14:59:22,141][1652491] Updated weights for policy 0, policy_version 280592 (0.0016) [2024-06-15 14:59:23,363][1652491] Updated weights for policy 0, policy_version 280635 (0.0015) [2024-06-15 14:59:25,895][1652491] Updated weights for policy 0, policy_version 280697 (0.0013) [2024-06-15 14:59:25,955][1648985] Fps is (10 sec: 45873.8, 60 sec: 47513.4, 300 sec: 46986.0). Total num frames: 574849024. Throughput: 0: 11593.9. Samples: 143767040. Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:59:25,956][1648985] Avg episode reward: [(0, '136.660')] [2024-06-15 14:59:28,284][1652491] Updated weights for policy 0, policy_version 280744 (0.0017) [2024-06-15 14:59:29,824][1652491] Updated weights for policy 0, policy_version 280816 (0.0011) [2024-06-15 14:59:30,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 47513.5, 300 sec: 46874.9). Total num frames: 575143936. Throughput: 0: 11525.7. Samples: 143837184. Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:59:30,956][1648985] Avg episode reward: [(0, '142.240')] [2024-06-15 14:59:34,269][1652491] Updated weights for policy 0, policy_version 280895 (0.0012) [2024-06-15 14:59:35,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 575275008. Throughput: 0: 11549.6. Samples: 143911424. Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:59:35,956][1648985] Avg episode reward: [(0, '164.140')] [2024-06-15 14:59:36,686][1651469] Signal inference workers to stop experience collection... (14700 times) [2024-06-15 14:59:36,733][1652491] InferenceWorker_p0-w0: stopping experience collection (14700 times) [2024-06-15 14:59:36,927][1651469] Signal inference workers to resume experience collection... 
(14700 times) [2024-06-15 14:59:36,928][1652491] InferenceWorker_p0-w0: resuming experience collection (14700 times) [2024-06-15 14:59:37,096][1652491] Updated weights for policy 0, policy_version 280954 (0.0017) [2024-06-15 14:59:38,655][1652491] Updated weights for policy 0, policy_version 280977 (0.0020) [2024-06-15 14:59:40,024][1652491] Updated weights for policy 0, policy_version 281044 (0.0020) [2024-06-15 14:59:40,824][1652491] Updated weights for policy 0, policy_version 281084 (0.0012) [2024-06-15 14:59:40,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.6, 300 sec: 47208.1). Total num frames: 575668224. Throughput: 0: 11559.8. Samples: 143950848. Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:59:40,956][1648985] Avg episode reward: [(0, '171.090')] [2024-06-15 14:59:44,864][1652491] Updated weights for policy 0, policy_version 281142 (0.0014) [2024-06-15 14:59:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 575799296. Throughput: 0: 11559.8. Samples: 144020992. Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:59:45,956][1648985] Avg episode reward: [(0, '180.960')] [2024-06-15 14:59:45,957][1651469] Saving new best policy, reward=180.960! [2024-06-15 14:59:47,892][1652491] Updated weights for policy 0, policy_version 281200 (0.0012) [2024-06-15 14:59:49,706][1652491] Updated weights for policy 0, policy_version 281238 (0.0012) [2024-06-15 14:59:50,901][1652491] Updated weights for policy 0, policy_version 281296 (0.0013) [2024-06-15 14:59:50,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 47517.2, 300 sec: 47208.1). Total num frames: 576094208. Throughput: 0: 11628.1. Samples: 144087552. 
Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:59:50,956][1648985] Avg episode reward: [(0, '170.020')] [2024-06-15 14:59:51,955][1652491] Updated weights for policy 0, policy_version 281338 (0.0020) [2024-06-15 14:59:55,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 44236.7, 300 sec: 46319.5). Total num frames: 576225280. Throughput: 0: 11559.8. Samples: 144123392. Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 14:59:55,956][1648985] Avg episode reward: [(0, '141.340')] [2024-06-15 14:59:56,368][1652491] Updated weights for policy 0, policy_version 281392 (0.0012) [2024-06-15 14:59:59,175][1652491] Updated weights for policy 0, policy_version 281444 (0.0017) [2024-06-15 15:00:00,825][1652491] Updated weights for policy 0, policy_version 281488 (0.0012) [2024-06-15 15:00:00,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 576487424. Throughput: 0: 11559.8. Samples: 144188928. Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 15:00:00,956][1648985] Avg episode reward: [(0, '155.100')] [2024-06-15 15:00:02,660][1652491] Updated weights for policy 0, policy_version 281552 (0.0011) [2024-06-15 15:00:03,914][1652491] Updated weights for policy 0, policy_version 281598 (0.0012) [2024-06-15 15:00:05,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 576716800. Throughput: 0: 11707.7. Samples: 144261120. Policy #0 lag: (min: 113.0, avg: 207.8, max: 369.0) [2024-06-15 15:00:05,956][1648985] Avg episode reward: [(0, '154.310')] [2024-06-15 15:00:07,671][1652491] Updated weights for policy 0, policy_version 281657 (0.0014) [2024-06-15 15:00:10,486][1652491] Updated weights for policy 0, policy_version 281696 (0.0013) [2024-06-15 15:00:10,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46421.3, 300 sec: 46763.9). Total num frames: 576946176. Throughput: 0: 11719.1. Samples: 144294400. 
Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:00:10,956][1648985] Avg episode reward: [(0, '145.530')] [2024-06-15 15:00:12,232][1652491] Updated weights for policy 0, policy_version 281745 (0.0018) [2024-06-15 15:00:14,245][1652491] Updated weights for policy 0, policy_version 281824 (0.0089) [2024-06-15 15:00:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 47513.5, 300 sec: 46541.7). Total num frames: 577241088. Throughput: 0: 11673.6. Samples: 144362496. Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:00:15,956][1648985] Avg episode reward: [(0, '124.490')] [2024-06-15 15:00:17,911][1652491] Updated weights for policy 0, policy_version 281891 (0.0141) [2024-06-15 15:00:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 577404928. Throughput: 0: 11719.1. Samples: 144438784. Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:00:20,956][1648985] Avg episode reward: [(0, '149.890')] [2024-06-15 15:00:21,289][1652491] Updated weights for policy 0, policy_version 281952 (0.0017) [2024-06-15 15:00:21,413][1651469] Signal inference workers to stop experience collection... (14750 times) [2024-06-15 15:00:21,497][1652491] InferenceWorker_p0-w0: stopping experience collection (14750 times) [2024-06-15 15:00:21,605][1651469] Signal inference workers to resume experience collection... (14750 times) [2024-06-15 15:00:21,607][1652491] InferenceWorker_p0-w0: resuming experience collection (14750 times) [2024-06-15 15:00:23,634][1652491] Updated weights for policy 0, policy_version 282000 (0.0016) [2024-06-15 15:00:25,724][1652491] Updated weights for policy 0, policy_version 282080 (0.0012) [2024-06-15 15:00:25,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 47513.7, 300 sec: 46430.6). Total num frames: 577699840. Throughput: 0: 11616.7. Samples: 144473600. 
Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:00:25,956][1648985] Avg episode reward: [(0, '164.680')] [2024-06-15 15:00:28,009][1652491] Updated weights for policy 0, policy_version 282128 (0.0011) [2024-06-15 15:00:30,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 577896448. Throughput: 0: 11605.3. Samples: 144543232. Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:00:30,956][1648985] Avg episode reward: [(0, '148.710')] [2024-06-15 15:00:31,915][1652491] Updated weights for policy 0, policy_version 282192 (0.0012) [2024-06-15 15:00:34,799][1652491] Updated weights for policy 0, policy_version 282244 (0.0011) [2024-06-15 15:00:35,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 47513.7, 300 sec: 46319.5). Total num frames: 578125824. Throughput: 0: 11616.8. Samples: 144610304. Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:00:35,955][1648985] Avg episode reward: [(0, '145.570')] [2024-06-15 15:00:36,083][1652491] Updated weights for policy 0, policy_version 282300 (0.0012) [2024-06-15 15:00:37,108][1652491] Updated weights for policy 0, policy_version 282337 (0.0011) [2024-06-15 15:00:39,644][1652491] Updated weights for policy 0, policy_version 282420 (0.0078) [2024-06-15 15:00:40,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 578420736. Throughput: 0: 11707.8. Samples: 144650240. Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:00:40,955][1648985] Avg episode reward: [(0, '126.080')] [2024-06-15 15:00:44,012][1652491] Updated weights for policy 0, policy_version 282486 (0.0016) [2024-06-15 15:00:45,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 46967.3, 300 sec: 46541.6). Total num frames: 578617344. Throughput: 0: 11810.1. Samples: 144720384. 
Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:00:45,956][1648985] Avg episode reward: [(0, '122.290')] [2024-06-15 15:00:46,307][1652491] Updated weights for policy 0, policy_version 282550 (0.0012) [2024-06-15 15:00:48,672][1652491] Updated weights for policy 0, policy_version 282613 (0.0014) [2024-06-15 15:00:50,510][1652491] Updated weights for policy 0, policy_version 282642 (0.0013) [2024-06-15 15:00:50,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.4, 300 sec: 46430.6). Total num frames: 578879488. Throughput: 0: 11776.0. Samples: 144791040. Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:00:50,956][1648985] Avg episode reward: [(0, '125.470')] [2024-06-15 15:00:51,441][1652491] Updated weights for policy 0, policy_version 282681 (0.0026) [2024-06-15 15:00:54,731][1652491] Updated weights for policy 0, policy_version 282720 (0.0164) [2024-06-15 15:00:55,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 47513.7, 300 sec: 46874.9). Total num frames: 579076096. Throughput: 0: 11889.8. Samples: 144829440. Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:00:55,956][1648985] Avg episode reward: [(0, '133.370')] [2024-06-15 15:00:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000282752_579076096.pth... [2024-06-15 15:00:56,140][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000277248_567803904.pth [2024-06-15 15:00:57,265][1652491] Updated weights for policy 0, policy_version 282800 (0.0074) [2024-06-15 15:00:59,444][1652491] Updated weights for policy 0, policy_version 282848 (0.0011) [2024-06-15 15:01:00,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 47513.5, 300 sec: 46319.5). Total num frames: 579338240. Throughput: 0: 11753.2. Samples: 144891392. 
Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:01:00,956][1648985] Avg episode reward: [(0, '140.120')] [2024-06-15 15:01:01,646][1652491] Updated weights for policy 0, policy_version 282881 (0.0013) [2024-06-15 15:01:02,918][1652491] Updated weights for policy 0, policy_version 282943 (0.0012) [2024-06-15 15:01:05,815][1651469] Signal inference workers to stop experience collection... (14800 times) [2024-06-15 15:01:05,870][1652491] InferenceWorker_p0-w0: stopping experience collection (14800 times) [2024-06-15 15:01:05,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 46967.3, 300 sec: 46874.8). Total num frames: 579534848. Throughput: 0: 11696.3. Samples: 144965120. Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:01:05,956][1648985] Avg episode reward: [(0, '129.600')] [2024-06-15 15:01:06,073][1651469] Signal inference workers to resume experience collection... (14800 times) [2024-06-15 15:01:06,074][1652491] InferenceWorker_p0-w0: resuming experience collection (14800 times) [2024-06-15 15:01:06,251][1652491] Updated weights for policy 0, policy_version 283000 (0.0014) [2024-06-15 15:01:08,710][1652491] Updated weights for policy 0, policy_version 283059 (0.0016) [2024-06-15 15:01:10,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 579764224. Throughput: 0: 11571.2. Samples: 144994304. Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:01:10,955][1648985] Avg episode reward: [(0, '128.990')] [2024-06-15 15:01:11,038][1652491] Updated weights for policy 0, policy_version 283104 (0.0038) [2024-06-15 15:01:11,780][1652491] Updated weights for policy 0, policy_version 283135 (0.0012) [2024-06-15 15:01:13,896][1652491] Updated weights for policy 0, policy_version 283184 (0.0011) [2024-06-15 15:01:15,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 579993600. Throughput: 0: 11662.2. Samples: 145068032. 
Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:01:15,956][1648985] Avg episode reward: [(0, '152.240')] [2024-06-15 15:01:17,510][1652491] Updated weights for policy 0, policy_version 283237 (0.0155) [2024-06-15 15:01:19,760][1652491] Updated weights for policy 0, policy_version 283301 (0.0013) [2024-06-15 15:01:20,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 580255744. Throughput: 0: 11741.8. Samples: 145138688. Policy #0 lag: (min: 11.0, avg: 105.6, max: 267.0) [2024-06-15 15:01:20,956][1648985] Avg episode reward: [(0, '160.100')] [2024-06-15 15:01:22,042][1652491] Updated weights for policy 0, policy_version 283360 (0.0012) [2024-06-15 15:01:24,391][1652491] Updated weights for policy 0, policy_version 283412 (0.0013) [2024-06-15 15:01:25,375][1652491] Updated weights for policy 0, policy_version 283453 (0.0011) [2024-06-15 15:01:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 580517888. Throughput: 0: 11707.7. Samples: 145177088. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:01:25,956][1648985] Avg episode reward: [(0, '150.450')] [2024-06-15 15:01:28,465][1652491] Updated weights for policy 0, policy_version 283506 (0.0135) [2024-06-15 15:01:30,956][1648985] Fps is (10 sec: 45869.5, 60 sec: 46966.5, 300 sec: 47319.0). Total num frames: 580714496. Throughput: 0: 11684.7. Samples: 145246208. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:01:30,957][1648985] Avg episode reward: [(0, '137.820')] [2024-06-15 15:01:31,282][1652491] Updated weights for policy 0, policy_version 283574 (0.0012) [2024-06-15 15:01:33,257][1652491] Updated weights for policy 0, policy_version 283618 (0.0014) [2024-06-15 15:01:35,947][1652491] Updated weights for policy 0, policy_version 283680 (0.0013) [2024-06-15 15:01:35,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 580976640. 
Throughput: 0: 11650.8. Samples: 145315328. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:01:35,956][1648985] Avg episode reward: [(0, '167.070')] [2024-06-15 15:01:38,098][1652491] Updated weights for policy 0, policy_version 283713 (0.0011) [2024-06-15 15:01:40,956][1648985] Fps is (10 sec: 45879.8, 60 sec: 45875.0, 300 sec: 47097.0). Total num frames: 581173248. Throughput: 0: 11628.0. Samples: 145352704. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:01:40,957][1648985] Avg episode reward: [(0, '162.320')] [2024-06-15 15:01:41,566][1652491] Updated weights for policy 0, policy_version 283798 (0.0028) [2024-06-15 15:01:43,871][1652491] Updated weights for policy 0, policy_version 283841 (0.0057) [2024-06-15 15:01:45,106][1652491] Updated weights for policy 0, policy_version 283903 (0.0014) [2024-06-15 15:01:45,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.7, 300 sec: 47097.1). Total num frames: 581435392. Throughput: 0: 11776.1. Samples: 145421312. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:01:45,955][1648985] Avg episode reward: [(0, '140.030')] [2024-06-15 15:01:46,864][1652491] Updated weights for policy 0, policy_version 283941 (0.0013) [2024-06-15 15:01:48,554][1652491] Updated weights for policy 0, policy_version 283969 (0.0026) [2024-06-15 15:01:50,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 581697536. Throughput: 0: 11878.5. Samples: 145499648. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:01:50,956][1648985] Avg episode reward: [(0, '133.020')] [2024-06-15 15:01:51,674][1652491] Updated weights for policy 0, policy_version 284034 (0.0012) [2024-06-15 15:01:52,098][1651469] Signal inference workers to stop experience collection... 
(14850 times) [2024-06-15 15:01:52,145][1652491] InferenceWorker_p0-w0: stopping experience collection (14850 times) [2024-06-15 15:01:52,321][1651469] Signal inference workers to resume experience collection... (14850 times) [2024-06-15 15:01:52,323][1652491] InferenceWorker_p0-w0: resuming experience collection (14850 times) [2024-06-15 15:01:52,886][1652491] Updated weights for policy 0, policy_version 284084 (0.0023) [2024-06-15 15:01:55,727][1652491] Updated weights for policy 0, policy_version 284153 (0.0137) [2024-06-15 15:01:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 47652.5). Total num frames: 581959680. Throughput: 0: 12037.7. Samples: 145536000. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:01:55,955][1648985] Avg episode reward: [(0, '136.470')] [2024-06-15 15:01:58,106][1652491] Updated weights for policy 0, policy_version 284221 (0.0015) [2024-06-15 15:02:00,722][1652491] Updated weights for policy 0, policy_version 284263 (0.0012) [2024-06-15 15:02:00,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 582189056. Throughput: 0: 12014.9. Samples: 145608704. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:02:00,956][1648985] Avg episode reward: [(0, '145.320')] [2024-06-15 15:02:03,984][1652491] Updated weights for policy 0, policy_version 284336 (0.0016) [2024-06-15 15:02:05,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 582385664. Throughput: 0: 11912.5. Samples: 145674752. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:02:05,956][1648985] Avg episode reward: [(0, '146.140')] [2024-06-15 15:02:06,263][1652491] Updated weights for policy 0, policy_version 284386 (0.0015) [2024-06-15 15:02:08,049][1652491] Updated weights for policy 0, policy_version 284450 (0.0013) [2024-06-15 15:02:10,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 47513.6, 300 sec: 46986.0). 
Total num frames: 582615040. Throughput: 0: 11867.0. Samples: 145711104. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:02:10,956][1648985] Avg episode reward: [(0, '149.130')] [2024-06-15 15:02:11,265][1652491] Updated weights for policy 0, policy_version 284496 (0.0011) [2024-06-15 15:02:14,265][1652491] Updated weights for policy 0, policy_version 284560 (0.0012) [2024-06-15 15:02:15,176][1652491] Updated weights for policy 0, policy_version 284600 (0.0012) [2024-06-15 15:02:15,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 582877184. Throughput: 0: 12094.9. Samples: 145790464. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:02:15,956][1648985] Avg episode reward: [(0, '129.860')] [2024-06-15 15:02:18,019][1652491] Updated weights for policy 0, policy_version 284673 (0.0093) [2024-06-15 15:02:19,231][1652491] Updated weights for policy 0, policy_version 284736 (0.0011) [2024-06-15 15:02:20,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48059.6, 300 sec: 47208.1). Total num frames: 583139328. Throughput: 0: 12083.2. Samples: 145859072. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:02:20,956][1648985] Avg episode reward: [(0, '115.100')] [2024-06-15 15:02:22,574][1652491] Updated weights for policy 0, policy_version 284795 (0.0012) [2024-06-15 15:02:25,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 583335936. Throughput: 0: 12083.3. Samples: 145896448. 
Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:02:25,956][1648985] Avg episode reward: [(0, '110.500')] [2024-06-15 15:02:26,050][1652491] Updated weights for policy 0, policy_version 284838 (0.0015) [2024-06-15 15:02:28,068][1652491] Updated weights for policy 0, policy_version 284897 (0.0014) [2024-06-15 15:02:29,794][1652491] Updated weights for policy 0, policy_version 284976 (0.0031) [2024-06-15 15:02:30,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 49153.1, 300 sec: 47430.3). Total num frames: 583663616. Throughput: 0: 12026.3. Samples: 145962496. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:02:30,955][1648985] Avg episode reward: [(0, '128.890')] [2024-06-15 15:02:32,713][1652491] Updated weights for policy 0, policy_version 285024 (0.0013) [2024-06-15 15:02:35,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 583794688. Throughput: 0: 12003.6. Samples: 146039808. Policy #0 lag: (min: 63.0, avg: 183.0, max: 319.0) [2024-06-15 15:02:35,956][1648985] Avg episode reward: [(0, '135.090')] [2024-06-15 15:02:37,326][1651469] Signal inference workers to stop experience collection... (14900 times) [2024-06-15 15:02:37,372][1652491] InferenceWorker_p0-w0: stopping experience collection (14900 times) [2024-06-15 15:02:37,552][1651469] Signal inference workers to resume experience collection... (14900 times) [2024-06-15 15:02:37,553][1652491] InferenceWorker_p0-w0: resuming experience collection (14900 times) [2024-06-15 15:02:38,205][1652491] Updated weights for policy 0, policy_version 285114 (0.0014) [2024-06-15 15:02:40,138][1652491] Updated weights for policy 0, policy_version 285184 (0.0013) [2024-06-15 15:02:40,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48606.1, 300 sec: 47430.6). Total num frames: 584089600. Throughput: 0: 11980.8. Samples: 146075136. 
Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:02:40,955][1648985] Avg episode reward: [(0, '105.600')] [2024-06-15 15:02:41,488][1652491] Updated weights for policy 0, policy_version 285245 (0.0012) [2024-06-15 15:02:44,419][1652491] Updated weights for policy 0, policy_version 285283 (0.0023) [2024-06-15 15:02:45,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 48059.5, 300 sec: 47097.0). Total num frames: 584318976. Throughput: 0: 11810.1. Samples: 146140160. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:02:45,956][1648985] Avg episode reward: [(0, '126.660')] [2024-06-15 15:02:49,119][1652491] Updated weights for policy 0, policy_version 285344 (0.0013) [2024-06-15 15:02:50,766][1652491] Updated weights for policy 0, policy_version 285408 (0.0014) [2024-06-15 15:02:50,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46967.4, 300 sec: 47208.1). Total num frames: 584515584. Throughput: 0: 11901.1. Samples: 146210304. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:02:50,956][1648985] Avg episode reward: [(0, '146.010')] [2024-06-15 15:02:52,341][1652491] Updated weights for policy 0, policy_version 285459 (0.0012) [2024-06-15 15:02:53,259][1652491] Updated weights for policy 0, policy_version 285504 (0.0015) [2024-06-15 15:02:55,589][1652491] Updated weights for policy 0, policy_version 285568 (0.0013) [2024-06-15 15:02:55,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.5, 300 sec: 47097.0). Total num frames: 584843264. Throughput: 0: 11832.8. Samples: 146243584. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:02:55,956][1648985] Avg episode reward: [(0, '146.050')] [2024-06-15 15:02:55,982][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000285568_584843264.pth... 
[2024-06-15 15:02:56,040][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000280064_573571072.pth [2024-06-15 15:03:00,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 45875.3, 300 sec: 46985.9). Total num frames: 584941568. Throughput: 0: 11832.9. Samples: 146322944. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:03:00,956][1648985] Avg episode reward: [(0, '146.510')] [2024-06-15 15:03:01,257][1652491] Updated weights for policy 0, policy_version 285638 (0.0013) [2024-06-15 15:03:02,713][1652491] Updated weights for policy 0, policy_version 285697 (0.0013) [2024-06-15 15:03:04,052][1652491] Updated weights for policy 0, policy_version 285757 (0.0012) [2024-06-15 15:03:05,955][1648985] Fps is (10 sec: 42599.6, 60 sec: 48059.9, 300 sec: 46763.8). Total num frames: 585269248. Throughput: 0: 11650.9. Samples: 146383360. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:03:05,955][1648985] Avg episode reward: [(0, '167.760')] [2024-06-15 15:03:10,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 585367552. Throughput: 0: 11650.9. Samples: 146420736. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:03:10,955][1648985] Avg episode reward: [(0, '169.230')] [2024-06-15 15:03:11,501][1652491] Updated weights for policy 0, policy_version 285826 (0.0011) [2024-06-15 15:03:12,988][1652491] Updated weights for policy 0, policy_version 285892 (0.0015) [2024-06-15 15:03:14,500][1652491] Updated weights for policy 0, policy_version 285955 (0.0012) [2024-06-15 15:03:15,206][1651469] Signal inference workers to stop experience collection... (14950 times) [2024-06-15 15:03:15,269][1652491] InferenceWorker_p0-w0: stopping experience collection (14950 times) [2024-06-15 15:03:15,575][1651469] Signal inference workers to resume experience collection... 
(14950 times) [2024-06-15 15:03:15,575][1652491] InferenceWorker_p0-w0: resuming experience collection (14950 times) [2024-06-15 15:03:15,797][1652491] Updated weights for policy 0, policy_version 286009 (0.0013) [2024-06-15 15:03:15,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 48059.8, 300 sec: 47097.0). Total num frames: 585760768. Throughput: 0: 11662.2. Samples: 146487296. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:03:15,956][1648985] Avg episode reward: [(0, '154.230')] [2024-06-15 15:03:17,816][1652491] Updated weights for policy 0, policy_version 286064 (0.0012) [2024-06-15 15:03:20,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 585891840. Throughput: 0: 11491.6. Samples: 146556928. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:03:20,956][1648985] Avg episode reward: [(0, '137.900')] [2024-06-15 15:03:24,609][1652491] Updated weights for policy 0, policy_version 286128 (0.0012) [2024-06-15 15:03:25,955][1648985] Fps is (10 sec: 32767.8, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 586088448. Throughput: 0: 11571.2. Samples: 146595840. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:03:25,956][1648985] Avg episode reward: [(0, '149.820')] [2024-06-15 15:03:26,776][1652491] Updated weights for policy 0, policy_version 286209 (0.0101) [2024-06-15 15:03:28,963][1652491] Updated weights for policy 0, policy_version 286274 (0.0019) [2024-06-15 15:03:30,394][1652491] Updated weights for policy 0, policy_version 286332 (0.0014) [2024-06-15 15:03:30,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 45875.0, 300 sec: 47097.1). Total num frames: 586416128. Throughput: 0: 11275.4. Samples: 146647552. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:03:30,956][1648985] Avg episode reward: [(0, '144.910')] [2024-06-15 15:03:35,955][1648985] Fps is (10 sec: 32768.6, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 586416128. 
Throughput: 0: 11446.1. Samples: 146725376. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:03:35,955][1648985] Avg episode reward: [(0, '150.380')] [2024-06-15 15:03:37,209][1652491] Updated weights for policy 0, policy_version 286392 (0.0012) [2024-06-15 15:03:38,626][1652491] Updated weights for policy 0, policy_version 286452 (0.0014) [2024-06-15 15:03:40,030][1652491] Updated weights for policy 0, policy_version 286519 (0.0015) [2024-06-15 15:03:40,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.1, 300 sec: 46763.8). Total num frames: 586842112. Throughput: 0: 11241.3. Samples: 146749440. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:03:40,956][1648985] Avg episode reward: [(0, '150.740')] [2024-06-15 15:03:41,732][1652491] Updated weights for policy 0, policy_version 286576 (0.0012) [2024-06-15 15:03:45,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 43690.8, 300 sec: 46431.3). Total num frames: 586940416. Throughput: 0: 11002.3. Samples: 146818048. Policy #0 lag: (min: 10.0, avg: 91.1, max: 266.0) [2024-06-15 15:03:45,956][1648985] Avg episode reward: [(0, '148.210')] [2024-06-15 15:03:48,319][1652491] Updated weights for policy 0, policy_version 286610 (0.0013) [2024-06-15 15:03:50,544][1652491] Updated weights for policy 0, policy_version 286704 (0.0110) [2024-06-15 15:03:50,955][1648985] Fps is (10 sec: 32767.8, 60 sec: 44236.8, 300 sec: 46097.3). Total num frames: 587169792. Throughput: 0: 11059.1. Samples: 146881024. Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:03:50,957][1648985] Avg episode reward: [(0, '144.620')] [2024-06-15 15:03:52,295][1652491] Updated weights for policy 0, policy_version 286781 (0.0011) [2024-06-15 15:03:53,668][1652491] Updated weights for policy 0, policy_version 286818 (0.0014) [2024-06-15 15:03:55,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.8, 300 sec: 46652.7). Total num frames: 587464704. Throughput: 0: 10808.9. Samples: 146907136. 
Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:03:55,956][1648985] Avg episode reward: [(0, '141.980')] [2024-06-15 15:03:59,061][1652491] Updated weights for policy 0, policy_version 286849 (0.0014) [2024-06-15 15:04:00,437][1652491] Updated weights for policy 0, policy_version 286912 (0.0033) [2024-06-15 15:04:00,551][1651469] Signal inference workers to stop experience collection... (15000 times) [2024-06-15 15:04:00,581][1652491] InferenceWorker_p0-w0: stopping experience collection (15000 times) [2024-06-15 15:04:00,753][1651469] Signal inference workers to resume experience collection... (15000 times) [2024-06-15 15:04:00,754][1652491] InferenceWorker_p0-w0: resuming experience collection (15000 times) [2024-06-15 15:04:00,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 44782.8, 300 sec: 46097.3). Total num frames: 587628544. Throughput: 0: 11081.9. Samples: 146985984. Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:04:00,956][1648985] Avg episode reward: [(0, '154.370')] [2024-06-15 15:04:02,100][1652491] Updated weights for policy 0, policy_version 286977 (0.0013) [2024-06-15 15:04:03,507][1652491] Updated weights for policy 0, policy_version 287033 (0.0013) [2024-06-15 15:04:05,192][1652491] Updated weights for policy 0, policy_version 287088 (0.0097) [2024-06-15 15:04:05,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45329.0, 300 sec: 46874.9). Total num frames: 587988992. Throughput: 0: 10899.9. Samples: 147047424. Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:04:05,956][1648985] Avg episode reward: [(0, '148.290')] [2024-06-15 15:04:10,507][1652491] Updated weights for policy 0, policy_version 287120 (0.0012) [2024-06-15 15:04:10,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 588054528. Throughput: 0: 10968.2. Samples: 147089408. 
Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:04:10,956][1648985] Avg episode reward: [(0, '134.300')] [2024-06-15 15:04:12,085][1652491] Updated weights for policy 0, policy_version 287184 (0.0013) [2024-06-15 15:04:13,383][1652491] Updated weights for policy 0, policy_version 287237 (0.0012) [2024-06-15 15:04:14,742][1652491] Updated weights for policy 0, policy_version 287290 (0.0019) [2024-06-15 15:04:15,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 44782.9, 300 sec: 46874.9). Total num frames: 588447744. Throughput: 0: 11184.4. Samples: 147150848. Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:04:15,956][1648985] Avg episode reward: [(0, '139.820')] [2024-06-15 15:04:16,547][1652491] Updated weights for policy 0, policy_version 287354 (0.0012) [2024-06-15 15:04:20,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 46319.5). Total num frames: 588513280. Throughput: 0: 11127.5. Samples: 147226112. Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:04:20,956][1648985] Avg episode reward: [(0, '132.350')] [2024-06-15 15:04:22,777][1652491] Updated weights for policy 0, policy_version 287408 (0.0012) [2024-06-15 15:04:23,771][1652491] Updated weights for policy 0, policy_version 287456 (0.0012) [2024-06-15 15:04:25,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 46421.5, 300 sec: 46541.7). Total num frames: 588873728. Throughput: 0: 11343.7. Samples: 147259904. Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:04:25,955][1648985] Avg episode reward: [(0, '143.040')] [2024-06-15 15:04:26,196][1652491] Updated weights for policy 0, policy_version 287547 (0.0028) [2024-06-15 15:04:28,132][1652491] Updated weights for policy 0, policy_version 287613 (0.0012) [2024-06-15 15:04:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 46652.7). Total num frames: 589037568. Throughput: 0: 11161.6. Samples: 147320320. 
Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:04:30,956][1648985] Avg episode reward: [(0, '160.650')] [2024-06-15 15:04:34,908][1652491] Updated weights for policy 0, policy_version 287681 (0.0099) [2024-06-15 15:04:35,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 46967.4, 300 sec: 45986.3). Total num frames: 589234176. Throughput: 0: 11525.7. Samples: 147399680. Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:04:35,956][1648985] Avg episode reward: [(0, '162.910')] [2024-06-15 15:04:36,862][1652491] Updated weights for policy 0, policy_version 287760 (0.0012) [2024-06-15 15:04:38,432][1652491] Updated weights for policy 0, policy_version 287810 (0.0012) [2024-06-15 15:04:39,222][1651469] Signal inference workers to stop experience collection... (15050 times) [2024-06-15 15:04:39,258][1652491] InferenceWorker_p0-w0: stopping experience collection (15050 times) [2024-06-15 15:04:39,392][1651469] Signal inference workers to resume experience collection... (15050 times) [2024-06-15 15:04:39,393][1652491] InferenceWorker_p0-w0: resuming experience collection (15050 times) [2024-06-15 15:04:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45329.1, 300 sec: 46652.7). Total num frames: 589561856. Throughput: 0: 11537.1. Samples: 147426304. Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:04:40,956][1648985] Avg episode reward: [(0, '155.910')] [2024-06-15 15:04:44,367][1652491] Updated weights for policy 0, policy_version 287875 (0.0133) [2024-06-15 15:04:45,332][1652491] Updated weights for policy 0, policy_version 287934 (0.0023) [2024-06-15 15:04:45,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 45875.0, 300 sec: 46097.3). Total num frames: 589692928. Throughput: 0: 11514.3. Samples: 147504128. 
Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:04:45,956][1648985] Avg episode reward: [(0, '152.780')] [2024-06-15 15:04:47,683][1652491] Updated weights for policy 0, policy_version 288003 (0.0125) [2024-06-15 15:04:49,283][1652491] Updated weights for policy 0, policy_version 288064 (0.0012) [2024-06-15 15:04:50,882][1652491] Updated weights for policy 0, policy_version 288112 (0.0027) [2024-06-15 15:04:50,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 590053376. Throughput: 0: 11434.7. Samples: 147561984. Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:04:50,956][1648985] Avg episode reward: [(0, '154.390')] [2024-06-15 15:04:55,958][1648985] Fps is (10 sec: 42585.7, 60 sec: 44234.4, 300 sec: 46207.9). Total num frames: 590118912. Throughput: 0: 11456.6. Samples: 147604992. Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:04:55,959][1648985] Avg episode reward: [(0, '169.190')] [2024-06-15 15:04:56,136][1652491] Updated weights for policy 0, policy_version 288160 (0.0069) [2024-06-15 15:04:56,407][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000288176_590184448.pth... [2024-06-15 15:04:56,469][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000282752_579076096.pth [2024-06-15 15:04:57,682][1652491] Updated weights for policy 0, policy_version 288211 (0.0014) [2024-06-15 15:04:59,252][1652491] Updated weights for policy 0, policy_version 288274 (0.0015) [2024-06-15 15:05:00,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 47513.7, 300 sec: 46652.8). Total num frames: 590479360. Throughput: 0: 11548.5. Samples: 147670528. 
Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:05:00,956][1648985] Avg episode reward: [(0, '151.090')] [2024-06-15 15:05:01,071][1652491] Updated weights for policy 0, policy_version 288327 (0.0040) [2024-06-15 15:05:02,257][1652491] Updated weights for policy 0, policy_version 288375 (0.0012) [2024-06-15 15:05:05,955][1648985] Fps is (10 sec: 49167.4, 60 sec: 43690.6, 300 sec: 46319.5). Total num frames: 590610432. Throughput: 0: 11764.6. Samples: 147755520. Policy #0 lag: (min: 63.0, avg: 126.8, max: 319.0) [2024-06-15 15:05:05,956][1648985] Avg episode reward: [(0, '135.390')] [2024-06-15 15:05:07,094][1652491] Updated weights for policy 0, policy_version 288432 (0.0012) [2024-06-15 15:05:08,371][1652491] Updated weights for policy 0, policy_version 288480 (0.0014) [2024-06-15 15:05:10,025][1652491] Updated weights for policy 0, policy_version 288547 (0.0012) [2024-06-15 15:05:10,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 49152.0, 300 sec: 46652.7). Total num frames: 591003648. Throughput: 0: 11707.7. Samples: 147786752. Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:05:10,956][1648985] Avg episode reward: [(0, '144.400')] [2024-06-15 15:05:11,474][1652491] Updated weights for policy 0, policy_version 288597 (0.0033) [2024-06-15 15:05:15,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 44783.0, 300 sec: 46541.7). Total num frames: 591134720. Throughput: 0: 11992.2. Samples: 147859968. Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:05:15,956][1648985] Avg episode reward: [(0, '163.900')] [2024-06-15 15:05:17,419][1652491] Updated weights for policy 0, policy_version 288661 (0.0013) [2024-06-15 15:05:19,264][1652491] Updated weights for policy 0, policy_version 288736 (0.0092) [2024-06-15 15:05:20,162][1652491] Updated weights for policy 0, policy_version 288768 (0.0029) [2024-06-15 15:05:20,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 48605.7, 300 sec: 46541.6). Total num frames: 591429632. 
Throughput: 0: 11810.1. Samples: 147931136. Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:05:20,956][1648985] Avg episode reward: [(0, '158.100')] [2024-06-15 15:05:21,674][1652491] Updated weights for policy 0, policy_version 288826 (0.0013) [2024-06-15 15:05:22,359][1651469] Signal inference workers to stop experience collection... (15100 times) [2024-06-15 15:05:22,404][1652491] InferenceWorker_p0-w0: stopping experience collection (15100 times) [2024-06-15 15:05:22,538][1651469] Signal inference workers to resume experience collection... (15100 times) [2024-06-15 15:05:22,539][1652491] InferenceWorker_p0-w0: resuming experience collection (15100 times) [2024-06-15 15:05:23,014][1652491] Updated weights for policy 0, policy_version 288893 (0.0014) [2024-06-15 15:05:25,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46421.2, 300 sec: 46652.7). Total num frames: 591659008. Throughput: 0: 11923.9. Samples: 147962880. Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:05:25,956][1648985] Avg episode reward: [(0, '146.650')] [2024-06-15 15:05:29,353][1652491] Updated weights for policy 0, policy_version 288944 (0.0013) [2024-06-15 15:05:30,747][1652491] Updated weights for policy 0, policy_version 288996 (0.0022) [2024-06-15 15:05:30,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 591855616. Throughput: 0: 11969.5. Samples: 148042752. Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:05:30,956][1648985] Avg episode reward: [(0, '136.120')] [2024-06-15 15:05:32,521][1652491] Updated weights for policy 0, policy_version 289059 (0.0012) [2024-06-15 15:05:34,456][1652491] Updated weights for policy 0, policy_version 289125 (0.0112) [2024-06-15 15:05:35,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 49152.0, 300 sec: 46652.7). Total num frames: 592183296. Throughput: 0: 12015.0. Samples: 148102656. 
Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:05:35,956][1648985] Avg episode reward: [(0, '150.950')] [2024-06-15 15:05:40,332][1652491] Updated weights for policy 0, policy_version 289169 (0.0014) [2024-06-15 15:05:40,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45329.1, 300 sec: 46319.5). Total num frames: 592281600. Throughput: 0: 12038.6. Samples: 148146688. Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:05:40,956][1648985] Avg episode reward: [(0, '130.670')] [2024-06-15 15:05:42,205][1652491] Updated weights for policy 0, policy_version 289248 (0.0132) [2024-06-15 15:05:43,464][1652491] Updated weights for policy 0, policy_version 289296 (0.0013) [2024-06-15 15:05:44,672][1652491] Updated weights for policy 0, policy_version 289344 (0.0014) [2024-06-15 15:05:45,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48606.1, 300 sec: 46541.7). Total num frames: 592609280. Throughput: 0: 11821.5. Samples: 148202496. Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:05:45,955][1648985] Avg episode reward: [(0, '119.890')] [2024-06-15 15:05:46,684][1652491] Updated weights for policy 0, policy_version 289396 (0.0017) [2024-06-15 15:05:50,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 592707584. Throughput: 0: 11707.7. Samples: 148282368. Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:05:50,956][1648985] Avg episode reward: [(0, '135.830')] [2024-06-15 15:05:52,595][1652491] Updated weights for policy 0, policy_version 289456 (0.0015) [2024-06-15 15:05:54,411][1652491] Updated weights for policy 0, policy_version 289522 (0.0012) [2024-06-15 15:05:55,917][1652491] Updated weights for policy 0, policy_version 289593 (0.0011) [2024-06-15 15:05:55,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 49154.7, 300 sec: 46541.7). Total num frames: 593068032. Throughput: 0: 11582.6. Samples: 148307968. 
Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:05:55,956][1648985] Avg episode reward: [(0, '153.500')] [2024-06-15 15:05:58,129][1652491] Updated weights for policy 0, policy_version 289648 (0.0018) [2024-06-15 15:06:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 593231872. Throughput: 0: 11491.5. Samples: 148377088. Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:06:00,956][1648985] Avg episode reward: [(0, '161.520')] [2024-06-15 15:06:03,224][1652491] Updated weights for policy 0, policy_version 289680 (0.0032) [2024-06-15 15:06:05,384][1651469] Signal inference workers to stop experience collection... (15150 times) [2024-06-15 15:06:05,413][1652491] Updated weights for policy 0, policy_version 289763 (0.0014) [2024-06-15 15:06:05,460][1652491] InferenceWorker_p0-w0: stopping experience collection (15150 times) [2024-06-15 15:06:05,627][1651469] Signal inference workers to resume experience collection... (15150 times) [2024-06-15 15:06:05,627][1652491] InferenceWorker_p0-w0: resuming experience collection (15150 times) [2024-06-15 15:06:05,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 47513.7, 300 sec: 46430.6). Total num frames: 593461248. Throughput: 0: 11355.1. Samples: 148442112. Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:06:05,956][1648985] Avg episode reward: [(0, '162.200')] [2024-06-15 15:06:06,924][1652491] Updated weights for policy 0, policy_version 289827 (0.0012) [2024-06-15 15:06:09,412][1652491] Updated weights for policy 0, policy_version 289875 (0.0011) [2024-06-15 15:06:10,281][1652491] Updated weights for policy 0, policy_version 289914 (0.0045) [2024-06-15 15:06:10,958][1648985] Fps is (10 sec: 52413.2, 60 sec: 45872.9, 300 sec: 46652.3). Total num frames: 593756160. Throughput: 0: 11388.4. Samples: 148475392. 
Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:06:10,959][1648985] Avg episode reward: [(0, '160.210')] [2024-06-15 15:06:15,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 44782.8, 300 sec: 45986.3). Total num frames: 593821696. Throughput: 0: 11320.9. Samples: 148552192. Policy #0 lag: (min: 9.0, avg: 78.6, max: 265.0) [2024-06-15 15:06:15,956][1648985] Avg episode reward: [(0, '173.650')] [2024-06-15 15:06:16,527][1652491] Updated weights for policy 0, policy_version 289984 (0.0013) [2024-06-15 15:06:17,928][1652491] Updated weights for policy 0, policy_version 290034 (0.0011) [2024-06-15 15:06:19,463][1652491] Updated weights for policy 0, policy_version 290112 (0.0012) [2024-06-15 15:06:20,955][1648985] Fps is (10 sec: 45889.2, 60 sec: 46421.4, 300 sec: 46430.6). Total num frames: 594214912. Throughput: 0: 11207.1. Samples: 148606976. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:06:20,956][1648985] Avg episode reward: [(0, '176.760')] [2024-06-15 15:06:21,505][1652491] Updated weights for policy 0, policy_version 290176 (0.0012) [2024-06-15 15:06:25,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 45986.5). Total num frames: 594280448. Throughput: 0: 11138.8. Samples: 148647936. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:06:25,956][1648985] Avg episode reward: [(0, '203.240')] [2024-06-15 15:06:25,962][1651469] Saving new best policy, reward=203.240! [2024-06-15 15:06:29,293][1652491] Updated weights for policy 0, policy_version 290272 (0.0012) [2024-06-15 15:06:30,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 594608128. Throughput: 0: 11298.1. Samples: 148710912. 
Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:06:30,956][1648985] Avg episode reward: [(0, '188.060')] [2024-06-15 15:06:31,054][1652491] Updated weights for policy 0, policy_version 290339 (0.0012) [2024-06-15 15:06:32,881][1652491] Updated weights for policy 0, policy_version 290384 (0.0013) [2024-06-15 15:06:33,927][1652491] Updated weights for policy 0, policy_version 290429 (0.0010) [2024-06-15 15:06:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 46208.5). Total num frames: 594804736. Throughput: 0: 10968.2. Samples: 148775936. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:06:35,956][1648985] Avg episode reward: [(0, '174.470')] [2024-06-15 15:06:39,781][1652491] Updated weights for policy 0, policy_version 290481 (0.0015) [2024-06-15 15:06:40,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 45328.8, 300 sec: 45986.2). Total num frames: 595001344. Throughput: 0: 11343.5. Samples: 148818432. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:06:40,957][1648985] Avg episode reward: [(0, '177.780')] [2024-06-15 15:06:41,203][1652491] Updated weights for policy 0, policy_version 290545 (0.0014) [2024-06-15 15:06:42,990][1652491] Updated weights for policy 0, policy_version 290618 (0.0022) [2024-06-15 15:06:45,896][1652491] Updated weights for policy 0, policy_version 290680 (0.0015) [2024-06-15 15:06:45,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 44782.8, 300 sec: 46097.3). Total num frames: 595296256. Throughput: 0: 11161.6. Samples: 148879360. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:06:45,956][1648985] Avg episode reward: [(0, '171.050')] [2024-06-15 15:06:49,955][1651469] Signal inference workers to stop experience collection... (15200 times) [2024-06-15 15:06:50,013][1652491] InferenceWorker_p0-w0: stopping experience collection (15200 times) [2024-06-15 15:06:50,186][1651469] Signal inference workers to resume experience collection... 
(15200 times) [2024-06-15 15:06:50,187][1652491] InferenceWorker_p0-w0: resuming experience collection (15200 times) [2024-06-15 15:06:50,574][1652491] Updated weights for policy 0, policy_version 290720 (0.0013) [2024-06-15 15:06:50,955][1648985] Fps is (10 sec: 42600.1, 60 sec: 45329.2, 300 sec: 45653.0). Total num frames: 595427328. Throughput: 0: 11377.8. Samples: 148954112. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:06:50,956][1648985] Avg episode reward: [(0, '185.140')] [2024-06-15 15:06:52,283][1652491] Updated weights for policy 0, policy_version 290791 (0.0123) [2024-06-15 15:06:54,276][1652491] Updated weights for policy 0, policy_version 290877 (0.0012) [2024-06-15 15:06:55,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44236.7, 300 sec: 45875.2). Total num frames: 595722240. Throughput: 0: 11173.7. Samples: 148978176. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:06:55,956][1648985] Avg episode reward: [(0, '151.720')] [2024-06-15 15:06:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000290880_595722240.pth... [2024-06-15 15:06:56,140][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000285568_584843264.pth [2024-06-15 15:06:57,324][1652491] Updated weights for policy 0, policy_version 290939 (0.0013) [2024-06-15 15:07:00,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 43690.6, 300 sec: 45653.0). Total num frames: 595853312. Throughput: 0: 11184.3. Samples: 149055488. 
Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:07:00,956][1648985] Avg episode reward: [(0, '116.410')] [2024-06-15 15:07:02,149][1652491] Updated weights for policy 0, policy_version 290976 (0.0016) [2024-06-15 15:07:03,620][1652491] Updated weights for policy 0, policy_version 291025 (0.0012) [2024-06-15 15:07:05,152][1652491] Updated weights for policy 0, policy_version 291104 (0.0114) [2024-06-15 15:07:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 596246528. Throughput: 0: 11309.5. Samples: 149115904. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:07:05,956][1648985] Avg episode reward: [(0, '113.100')] [2024-06-15 15:07:07,605][1652491] Updated weights for policy 0, policy_version 291157 (0.0013) [2024-06-15 15:07:10,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 43692.9, 300 sec: 45764.1). Total num frames: 596377600. Throughput: 0: 11127.5. Samples: 149148672. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:07:10,956][1648985] Avg episode reward: [(0, '135.060')] [2024-06-15 15:07:12,670][1652491] Updated weights for policy 0, policy_version 291202 (0.0022) [2024-06-15 15:07:13,861][1652491] Updated weights for policy 0, policy_version 291260 (0.0014) [2024-06-15 15:07:15,728][1652491] Updated weights for policy 0, policy_version 291328 (0.0017) [2024-06-15 15:07:15,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 596639744. Throughput: 0: 11480.2. Samples: 149227520. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:07:15,956][1648985] Avg episode reward: [(0, '143.020')] [2024-06-15 15:07:18,562][1652491] Updated weights for policy 0, policy_version 291397 (0.0151) [2024-06-15 15:07:20,064][1652491] Updated weights for policy 0, policy_version 291456 (0.0013) [2024-06-15 15:07:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 44783.0, 300 sec: 45986.3). Total num frames: 596901888. 
Throughput: 0: 11434.7. Samples: 149290496. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:07:20,956][1648985] Avg episode reward: [(0, '135.370')] [2024-06-15 15:07:24,914][1652491] Updated weights for policy 0, policy_version 291515 (0.0013) [2024-06-15 15:07:25,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 45542.0). Total num frames: 597098496. Throughput: 0: 11491.7. Samples: 149335552. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:07:25,956][1648985] Avg episode reward: [(0, '147.110')] [2024-06-15 15:07:26,260][1652491] Updated weights for policy 0, policy_version 291568 (0.0012) [2024-06-15 15:07:27,384][1651469] Signal inference workers to stop experience collection... (15250 times) [2024-06-15 15:07:27,454][1652491] InferenceWorker_p0-w0: stopping experience collection (15250 times) [2024-06-15 15:07:27,601][1651469] Signal inference workers to resume experience collection... (15250 times) [2024-06-15 15:07:27,602][1652491] InferenceWorker_p0-w0: resuming experience collection (15250 times) [2024-06-15 15:07:27,768][1652491] Updated weights for policy 0, policy_version 291640 (0.0013) [2024-06-15 15:07:30,705][1652491] Updated weights for policy 0, policy_version 291696 (0.0024) [2024-06-15 15:07:30,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46421.4, 300 sec: 46097.3). Total num frames: 597393408. Throughput: 0: 11525.7. Samples: 149398016. Policy #0 lag: (min: 123.0, avg: 196.6, max: 315.0) [2024-06-15 15:07:30,956][1648985] Avg episode reward: [(0, '156.040')] [2024-06-15 15:07:35,572][1652491] Updated weights for policy 0, policy_version 291760 (0.0013) [2024-06-15 15:07:35,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 45875.1, 300 sec: 45653.0). Total num frames: 597557248. Throughput: 0: 11628.1. Samples: 149477376. 
Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:07:35,956][1648985] Avg episode reward: [(0, '155.830')] [2024-06-15 15:07:37,097][1652491] Updated weights for policy 0, policy_version 291824 (0.0049) [2024-06-15 15:07:38,579][1652491] Updated weights for policy 0, policy_version 291904 (0.0160) [2024-06-15 15:07:40,955][1648985] Fps is (10 sec: 55705.3, 60 sec: 49152.2, 300 sec: 46208.5). Total num frames: 597950464. Throughput: 0: 11753.2. Samples: 149507072. Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:07:40,956][1648985] Avg episode reward: [(0, '137.840')] [2024-06-15 15:07:45,824][1652491] Updated weights for policy 0, policy_version 291970 (0.0013) [2024-06-15 15:07:45,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 44236.8, 300 sec: 45542.0). Total num frames: 597950464. Throughput: 0: 11912.6. Samples: 149591552. Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:07:45,956][1648985] Avg episode reward: [(0, '118.190')] [2024-06-15 15:07:47,624][1652491] Updated weights for policy 0, policy_version 292033 (0.0012) [2024-06-15 15:07:48,929][1652491] Updated weights for policy 0, policy_version 292098 (0.0012) [2024-06-15 15:07:50,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 48605.9, 300 sec: 45764.2). Total num frames: 598343680. Throughput: 0: 12014.9. Samples: 149656576. Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:07:50,955][1648985] Avg episode reward: [(0, '131.840')] [2024-06-15 15:07:51,273][1652491] Updated weights for policy 0, policy_version 292163 (0.0114) [2024-06-15 15:07:52,400][1652491] Updated weights for policy 0, policy_version 292217 (0.0013) [2024-06-15 15:07:55,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 598474752. Throughput: 0: 12094.6. Samples: 149692928. 
Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:07:55,956][1648985] Avg episode reward: [(0, '146.990')] [2024-06-15 15:07:57,948][1652491] Updated weights for policy 0, policy_version 292274 (0.0011) [2024-06-15 15:07:59,893][1652491] Updated weights for policy 0, policy_version 292352 (0.0012) [2024-06-15 15:08:00,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 49152.0, 300 sec: 45875.2). Total num frames: 598802432. Throughput: 0: 11764.6. Samples: 149756928. Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:08:00,956][1648985] Avg episode reward: [(0, '148.500')] [2024-06-15 15:08:01,222][1652491] Updated weights for policy 0, policy_version 292416 (0.0013) [2024-06-15 15:08:05,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 598999040. Throughput: 0: 11810.2. Samples: 149821952. Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:08:05,955][1648985] Avg episode reward: [(0, '128.310')] [2024-06-15 15:08:08,542][1652491] Updated weights for policy 0, policy_version 292484 (0.0016) [2024-06-15 15:08:10,836][1651469] Signal inference workers to stop experience collection... (15300 times) [2024-06-15 15:08:10,884][1652491] InferenceWorker_p0-w0: stopping experience collection (15300 times) [2024-06-15 15:08:10,885][1652491] Updated weights for policy 0, policy_version 292565 (0.0012) [2024-06-15 15:08:10,955][1648985] Fps is (10 sec: 36045.4, 60 sec: 46421.4, 300 sec: 45430.9). Total num frames: 599162880. Throughput: 0: 11810.1. Samples: 149867008. Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:08:10,956][1648985] Avg episode reward: [(0, '108.920')] [2024-06-15 15:08:11,005][1651469] Signal inference workers to resume experience collection... 
(15300 times) [2024-06-15 15:08:11,006][1652491] InferenceWorker_p0-w0: resuming experience collection (15300 times) [2024-06-15 15:08:12,970][1652491] Updated weights for policy 0, policy_version 292663 (0.0017) [2024-06-15 15:08:15,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 599425024. Throughput: 0: 11616.7. Samples: 149920768. Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:08:15,955][1648985] Avg episode reward: [(0, '118.280')] [2024-06-15 15:08:16,960][1652491] Updated weights for policy 0, policy_version 292736 (0.0013) [2024-06-15 15:08:20,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 44782.8, 300 sec: 45764.1). Total num frames: 599588864. Throughput: 0: 11480.2. Samples: 149993984. Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:08:20,956][1648985] Avg episode reward: [(0, '141.040')] [2024-06-15 15:08:21,725][1652491] Updated weights for policy 0, policy_version 292807 (0.0093) [2024-06-15 15:08:23,280][1652491] Updated weights for policy 0, policy_version 292880 (0.0012) [2024-06-15 15:08:25,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 46967.6, 300 sec: 45764.2). Total num frames: 599916544. Throughput: 0: 11366.5. Samples: 150018560. Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:08:25,955][1648985] Avg episode reward: [(0, '156.480')] [2024-06-15 15:08:28,079][1652491] Updated weights for policy 0, policy_version 292946 (0.0012) [2024-06-15 15:08:30,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 600047616. Throughput: 0: 11002.3. Samples: 150086656. 
Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:08:30,956][1648985] Avg episode reward: [(0, '154.840')] [2024-06-15 15:08:32,747][1652491] Updated weights for policy 0, policy_version 293026 (0.0015) [2024-06-15 15:08:34,228][1652491] Updated weights for policy 0, policy_version 293104 (0.0136) [2024-06-15 15:08:35,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 47513.7, 300 sec: 45986.3). Total num frames: 600408064. Throughput: 0: 11138.8. Samples: 150157824. Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:08:35,956][1648985] Avg episode reward: [(0, '149.350')] [2024-06-15 15:08:35,972][1652491] Updated weights for policy 0, policy_version 293181 (0.0014) [2024-06-15 15:08:40,390][1652491] Updated weights for policy 0, policy_version 293248 (0.0019) [2024-06-15 15:08:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 600571904. Throughput: 0: 11184.4. Samples: 150196224. Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:08:40,955][1648985] Avg episode reward: [(0, '154.100')] [2024-06-15 15:08:44,482][1652491] Updated weights for policy 0, policy_version 293297 (0.0013) [2024-06-15 15:08:45,554][1652491] Updated weights for policy 0, policy_version 293351 (0.0015) [2024-06-15 15:08:45,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48059.9, 300 sec: 46319.5). Total num frames: 600834048. Throughput: 0: 11389.2. Samples: 150269440. Policy #0 lag: (min: 15.0, avg: 80.9, max: 271.0) [2024-06-15 15:08:45,955][1648985] Avg episode reward: [(0, '135.650')] [2024-06-15 15:08:46,839][1652491] Updated weights for policy 0, policy_version 293415 (0.0013) [2024-06-15 15:08:49,613][1652491] Updated weights for policy 0, policy_version 293456 (0.0025) [2024-06-15 15:08:50,124][1651469] Signal inference workers to stop experience collection... 
(15350 times) [2024-06-15 15:08:50,199][1652491] InferenceWorker_p0-w0: stopping experience collection (15350 times) [2024-06-15 15:08:50,379][1651469] Signal inference workers to resume experience collection... (15350 times) [2024-06-15 15:08:50,380][1652491] InferenceWorker_p0-w0: resuming experience collection (15350 times) [2024-06-15 15:08:50,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 601096192. Throughput: 0: 11400.5. Samples: 150334976. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:08:50,956][1648985] Avg episode reward: [(0, '122.380')] [2024-06-15 15:08:54,892][1652491] Updated weights for policy 0, policy_version 293521 (0.0013) [2024-06-15 15:08:55,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 601227264. Throughput: 0: 11320.9. Samples: 150376448. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:08:55,956][1648985] Avg episode reward: [(0, '118.350')] [2024-06-15 15:08:56,233][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000293584_601260032.pth... [2024-06-15 15:08:56,233][1652491] Updated weights for policy 0, policy_version 293584 (0.0010) [2024-06-15 15:08:56,395][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000288176_590184448.pth [2024-06-15 15:08:57,859][1652491] Updated weights for policy 0, policy_version 293650 (0.0013) [2024-06-15 15:09:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 44783.0, 300 sec: 45764.1). Total num frames: 601489408. Throughput: 0: 11548.4. Samples: 150440448. 
Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:09:00,956][1648985] Avg episode reward: [(0, '134.630')] [2024-06-15 15:09:01,019][1652491] Updated weights for policy 0, policy_version 293699 (0.0013) [2024-06-15 15:09:02,098][1652491] Updated weights for policy 0, policy_version 293758 (0.0013) [2024-06-15 15:09:05,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 43690.6, 300 sec: 45986.3). Total num frames: 601620480. Throughput: 0: 11582.6. Samples: 150515200. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:09:05,955][1648985] Avg episode reward: [(0, '137.310')] [2024-06-15 15:09:07,232][1652491] Updated weights for policy 0, policy_version 293827 (0.0012) [2024-06-15 15:09:08,778][1652491] Updated weights for policy 0, policy_version 293889 (0.0012) [2024-06-15 15:09:10,228][1652491] Updated weights for policy 0, policy_version 293948 (0.0069) [2024-06-15 15:09:10,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 47513.5, 300 sec: 45986.3). Total num frames: 602013696. Throughput: 0: 11593.9. Samples: 150540288. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:09:10,956][1648985] Avg episode reward: [(0, '153.750')] [2024-06-15 15:09:13,170][1652491] Updated weights for policy 0, policy_version 294008 (0.0014) [2024-06-15 15:09:15,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 602144768. Throughput: 0: 11696.3. Samples: 150612992. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:09:15,956][1648985] Avg episode reward: [(0, '145.050')] [2024-06-15 15:09:17,179][1652491] Updated weights for policy 0, policy_version 294041 (0.0133) [2024-06-15 15:09:18,659][1652491] Updated weights for policy 0, policy_version 294096 (0.0013) [2024-06-15 15:09:19,758][1652491] Updated weights for policy 0, policy_version 294143 (0.0017) [2024-06-15 15:09:20,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 48059.8, 300 sec: 46097.3). Total num frames: 602472448. 
Throughput: 0: 11593.9. Samples: 150679552. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:09:20,956][1648985] Avg episode reward: [(0, '147.700')] [2024-06-15 15:09:21,264][1652491] Updated weights for policy 0, policy_version 294198 (0.0022) [2024-06-15 15:09:24,055][1652491] Updated weights for policy 0, policy_version 294244 (0.0012) [2024-06-15 15:09:25,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 602669056. Throughput: 0: 11605.3. Samples: 150718464. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:09:25,956][1648985] Avg episode reward: [(0, '146.280')] [2024-06-15 15:09:28,835][1652491] Updated weights for policy 0, policy_version 294305 (0.0015) [2024-06-15 15:09:29,984][1652491] Updated weights for policy 0, policy_version 294352 (0.0022) [2024-06-15 15:09:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 602898432. Throughput: 0: 11650.8. Samples: 150793728. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:09:30,956][1648985] Avg episode reward: [(0, '152.820')] [2024-06-15 15:09:31,209][1652491] Updated weights for policy 0, policy_version 294400 (0.0011) [2024-06-15 15:09:31,742][1651469] Signal inference workers to stop experience collection... (15400 times) [2024-06-15 15:09:31,820][1652491] InferenceWorker_p0-w0: stopping experience collection (15400 times) [2024-06-15 15:09:32,023][1651469] Signal inference workers to resume experience collection... (15400 times) [2024-06-15 15:09:32,023][1652491] InferenceWorker_p0-w0: resuming experience collection (15400 times) [2024-06-15 15:09:32,699][1652491] Updated weights for policy 0, policy_version 294457 (0.0013) [2024-06-15 15:09:35,543][1652491] Updated weights for policy 0, policy_version 294512 (0.0108) [2024-06-15 15:09:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 603193344. Throughput: 0: 11628.1. 
Samples: 150858240. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:09:35,956][1648985] Avg episode reward: [(0, '157.350')] [2024-06-15 15:09:39,826][1652491] Updated weights for policy 0, policy_version 294560 (0.0024) [2024-06-15 15:09:40,923][1652491] Updated weights for policy 0, policy_version 294608 (0.0013) [2024-06-15 15:09:40,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 46421.2, 300 sec: 46319.5). Total num frames: 603357184. Throughput: 0: 11514.3. Samples: 150894592. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:09:40,956][1648985] Avg episode reward: [(0, '170.800')] [2024-06-15 15:09:42,359][1652491] Updated weights for policy 0, policy_version 294657 (0.0012) [2024-06-15 15:09:43,729][1652491] Updated weights for policy 0, policy_version 294717 (0.0027) [2024-06-15 15:09:45,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 603619328. Throughput: 0: 11696.4. Samples: 150966784. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:09:45,956][1648985] Avg episode reward: [(0, '157.180')] [2024-06-15 15:09:46,808][1652491] Updated weights for policy 0, policy_version 294772 (0.0013) [2024-06-15 15:09:50,833][1652491] Updated weights for policy 0, policy_version 294816 (0.0015) [2024-06-15 15:09:50,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 44782.9, 300 sec: 46320.0). Total num frames: 603783168. Throughput: 0: 11582.5. Samples: 151036416. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:09:50,956][1648985] Avg episode reward: [(0, '148.510')] [2024-06-15 15:09:51,878][1652491] Updated weights for policy 0, policy_version 294849 (0.0013) [2024-06-15 15:09:53,552][1652491] Updated weights for policy 0, policy_version 294928 (0.0097) [2024-06-15 15:09:55,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 604110848. Throughput: 0: 11730.5. Samples: 151068160. 
Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:09:55,956][1648985] Avg episode reward: [(0, '125.820')] [2024-06-15 15:09:57,887][1652491] Updated weights for policy 0, policy_version 295008 (0.0013) [2024-06-15 15:09:58,706][1652491] Updated weights for policy 0, policy_version 295040 (0.0011) [2024-06-15 15:10:00,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 604241920. Throughput: 0: 11730.5. Samples: 151140864. Policy #0 lag: (min: 14.0, avg: 140.6, max: 270.0) [2024-06-15 15:10:00,956][1648985] Avg episode reward: [(0, '136.800')] [2024-06-15 15:10:03,242][1652491] Updated weights for policy 0, policy_version 295105 (0.0012) [2024-06-15 15:10:04,708][1652491] Updated weights for policy 0, policy_version 295173 (0.0017) [2024-06-15 15:10:05,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 49698.1, 300 sec: 46097.4). Total num frames: 604602368. Throughput: 0: 11719.1. Samples: 151206912. Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:10:05,955][1648985] Avg episode reward: [(0, '140.760')] [2024-06-15 15:10:05,994][1652491] Updated weights for policy 0, policy_version 295226 (0.0089) [2024-06-15 15:10:09,565][1652491] Updated weights for policy 0, policy_version 295290 (0.0013) [2024-06-15 15:10:10,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.4, 300 sec: 46208.4). Total num frames: 604766208. Throughput: 0: 11719.1. Samples: 151245824. Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:10:10,955][1648985] Avg episode reward: [(0, '140.120')] [2024-06-15 15:10:14,142][1652491] Updated weights for policy 0, policy_version 295330 (0.0012) [2024-06-15 15:10:15,096][1651469] Signal inference workers to stop experience collection... (15450 times) [2024-06-15 15:10:15,144][1652491] InferenceWorker_p0-w0: stopping experience collection (15450 times) [2024-06-15 15:10:15,272][1651469] Signal inference workers to resume experience collection... 
(15450 times) [2024-06-15 15:10:15,272][1652491] InferenceWorker_p0-w0: resuming experience collection (15450 times) [2024-06-15 15:10:15,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 47513.5, 300 sec: 45986.3). Total num frames: 604995584. Throughput: 0: 11628.1. Samples: 151316992. Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:10:15,956][1648985] Avg episode reward: [(0, '139.470')] [2024-06-15 15:10:16,264][1652491] Updated weights for policy 0, policy_version 295426 (0.0018) [2024-06-15 15:10:17,617][1652491] Updated weights for policy 0, policy_version 295478 (0.0013) [2024-06-15 15:10:20,038][1652491] Updated weights for policy 0, policy_version 295504 (0.0011) [2024-06-15 15:10:20,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 46421.3, 300 sec: 46097.4). Total num frames: 605257728. Throughput: 0: 11593.9. Samples: 151379968. Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:10:20,956][1648985] Avg episode reward: [(0, '141.790')] [2024-06-15 15:10:21,010][1652491] Updated weights for policy 0, policy_version 295549 (0.0015) [2024-06-15 15:10:24,935][1652491] Updated weights for policy 0, policy_version 295600 (0.0028) [2024-06-15 15:10:25,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46421.3, 300 sec: 46097.3). Total num frames: 605454336. Throughput: 0: 11821.5. Samples: 151426560. Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:10:25,956][1648985] Avg episode reward: [(0, '155.150')] [2024-06-15 15:10:26,336][1652491] Updated weights for policy 0, policy_version 295664 (0.0014) [2024-06-15 15:10:27,568][1652491] Updated weights for policy 0, policy_version 295714 (0.0013) [2024-06-15 15:10:30,897][1652491] Updated weights for policy 0, policy_version 295762 (0.0014) [2024-06-15 15:10:30,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 605716480. Throughput: 0: 11696.3. Samples: 151493120. 
Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:10:30,956][1648985] Avg episode reward: [(0, '151.550')] [2024-06-15 15:10:35,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 44236.7, 300 sec: 45986.3). Total num frames: 605847552. Throughput: 0: 11787.4. Samples: 151566848. Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:10:35,956][1648985] Avg episode reward: [(0, '126.720')] [2024-06-15 15:10:36,236][1652491] Updated weights for policy 0, policy_version 295841 (0.0012) [2024-06-15 15:10:38,141][1652491] Updated weights for policy 0, policy_version 295927 (0.0013) [2024-06-15 15:10:40,260][1652491] Updated weights for policy 0, policy_version 295994 (0.0018) [2024-06-15 15:10:40,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.8, 300 sec: 46097.3). Total num frames: 606208000. Throughput: 0: 11719.1. Samples: 151595520. Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:10:40,956][1648985] Avg episode reward: [(0, '133.840')] [2024-06-15 15:10:42,998][1652491] Updated weights for policy 0, policy_version 296052 (0.0014) [2024-06-15 15:10:45,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 45329.1, 300 sec: 46208.5). Total num frames: 606339072. Throughput: 0: 11537.1. Samples: 151660032. Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:10:45,955][1648985] Avg episode reward: [(0, '134.050')] [2024-06-15 15:10:47,672][1652491] Updated weights for policy 0, policy_version 296086 (0.0013) [2024-06-15 15:10:49,218][1652491] Updated weights for policy 0, policy_version 296145 (0.0013) [2024-06-15 15:10:50,438][1652491] Updated weights for policy 0, policy_version 296198 (0.0015) [2024-06-15 15:10:50,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.8, 300 sec: 46097.4). Total num frames: 606666752. Throughput: 0: 11593.9. Samples: 151728640. 
Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:10:50,956][1648985] Avg episode reward: [(0, '135.060')] [2024-06-15 15:10:51,629][1652491] Updated weights for policy 0, policy_version 296248 (0.0027) [2024-06-15 15:10:54,258][1652491] Updated weights for policy 0, policy_version 296289 (0.0012) [2024-06-15 15:10:55,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 606863360. Throughput: 0: 11537.0. Samples: 151764992. Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:10:55,956][1648985] Avg episode reward: [(0, '154.840')] [2024-06-15 15:10:55,981][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000296320_606863360.pth... [2024-06-15 15:10:56,038][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000290880_595722240.pth [2024-06-15 15:10:56,042][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000296320_606863360.pth [2024-06-15 15:10:58,059][1652491] Updated weights for policy 0, policy_version 296336 (0.0011) [2024-06-15 15:10:58,173][1651469] Signal inference workers to stop experience collection... (15500 times) [2024-06-15 15:10:58,237][1652491] InferenceWorker_p0-w0: stopping experience collection (15500 times) [2024-06-15 15:10:58,405][1651469] Signal inference workers to resume experience collection... (15500 times) [2024-06-15 15:10:58,405][1652491] InferenceWorker_p0-w0: resuming experience collection (15500 times) [2024-06-15 15:10:59,485][1652491] Updated weights for policy 0, policy_version 296388 (0.0011) [2024-06-15 15:11:00,879][1652491] Updated weights for policy 0, policy_version 296449 (0.0015) [2024-06-15 15:11:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.7, 300 sec: 46319.5). Total num frames: 607125504. Throughput: 0: 11673.6. Samples: 151842304. 
Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:11:00,956][1648985] Avg episode reward: [(0, '142.740')] [2024-06-15 15:11:02,175][1652491] Updated weights for policy 0, policy_version 296510 (0.0012) [2024-06-15 15:11:05,181][1652491] Updated weights for policy 0, policy_version 296560 (0.0012) [2024-06-15 15:11:05,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.2, 300 sec: 46208.9). Total num frames: 607387648. Throughput: 0: 11798.7. Samples: 151910912. Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:11:05,956][1648985] Avg episode reward: [(0, '142.810')] [2024-06-15 15:11:08,229][1652491] Updated weights for policy 0, policy_version 296592 (0.0019) [2024-06-15 15:11:09,302][1652491] Updated weights for policy 0, policy_version 296640 (0.0012) [2024-06-15 15:11:10,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 47513.6, 300 sec: 46763.9). Total num frames: 607617024. Throughput: 0: 11787.4. Samples: 151956992. Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:11:10,955][1648985] Avg episode reward: [(0, '165.080')] [2024-06-15 15:11:11,120][1652491] Updated weights for policy 0, policy_version 296702 (0.0013) [2024-06-15 15:11:12,858][1652491] Updated weights for policy 0, policy_version 296764 (0.0015) [2024-06-15 15:11:15,709][1652491] Updated weights for policy 0, policy_version 296824 (0.0012) [2024-06-15 15:11:15,958][1648985] Fps is (10 sec: 52412.3, 60 sec: 48603.3, 300 sec: 46430.1). Total num frames: 607911936. Throughput: 0: 11820.7. Samples: 152025088. Policy #0 lag: (min: 38.0, avg: 115.6, max: 294.0) [2024-06-15 15:11:15,959][1648985] Avg episode reward: [(0, '138.590')] [2024-06-15 15:11:19,533][1652491] Updated weights for policy 0, policy_version 296880 (0.0013) [2024-06-15 15:11:20,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 46421.3, 300 sec: 46652.7). Total num frames: 608043008. Throughput: 0: 11844.3. Samples: 152099840. 
Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:11:20,956][1648985] Avg episode reward: [(0, '149.480')] [2024-06-15 15:11:22,369][1652491] Updated weights for policy 0, policy_version 296946 (0.0013) [2024-06-15 15:11:23,744][1652491] Updated weights for policy 0, policy_version 297008 (0.0014) [2024-06-15 15:11:25,955][1648985] Fps is (10 sec: 42611.9, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 608337920. Throughput: 0: 11889.7. Samples: 152130560. Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:11:25,956][1648985] Avg episode reward: [(0, '146.690')] [2024-06-15 15:11:26,543][1652491] Updated weights for policy 0, policy_version 297079 (0.0132) [2024-06-15 15:11:30,256][1652491] Updated weights for policy 0, policy_version 297136 (0.0014) [2024-06-15 15:11:30,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 608567296. Throughput: 0: 12242.5. Samples: 152210944. Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:11:30,956][1648985] Avg episode reward: [(0, '139.230')] [2024-06-15 15:11:32,411][1652491] Updated weights for policy 0, policy_version 297186 (0.0024) [2024-06-15 15:11:33,984][1652491] Updated weights for policy 0, policy_version 297252 (0.0115) [2024-06-15 15:11:35,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 49698.3, 300 sec: 46875.0). Total num frames: 608829440. Throughput: 0: 12288.0. Samples: 152281600. Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:11:35,955][1648985] Avg episode reward: [(0, '149.040')] [2024-06-15 15:11:36,894][1652491] Updated weights for policy 0, policy_version 297322 (0.0014) [2024-06-15 15:11:40,803][1651469] Signal inference workers to stop experience collection... (15550 times) [2024-06-15 15:11:40,841][1652491] InferenceWorker_p0-w0: stopping experience collection (15550 times) [2024-06-15 15:11:40,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 608993280. 
Throughput: 0: 12242.5. Samples: 152315904. Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:11:40,956][1648985] Avg episode reward: [(0, '154.080')] [2024-06-15 15:11:41,046][1651469] Signal inference workers to resume experience collection... (15550 times) [2024-06-15 15:11:41,047][1652491] InferenceWorker_p0-w0: resuming experience collection (15550 times) [2024-06-15 15:11:41,362][1652491] Updated weights for policy 0, policy_version 297392 (0.0014) [2024-06-15 15:11:43,198][1652491] Updated weights for policy 0, policy_version 297427 (0.0012) [2024-06-15 15:11:45,028][1652491] Updated weights for policy 0, policy_version 297504 (0.0014) [2024-06-15 15:11:45,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 50244.2, 300 sec: 47208.1). Total num frames: 609353728. Throughput: 0: 12128.7. Samples: 152388096. Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:11:45,956][1648985] Avg episode reward: [(0, '140.740')] [2024-06-15 15:11:47,252][1652491] Updated weights for policy 0, policy_version 297538 (0.0015) [2024-06-15 15:11:48,507][1652491] Updated weights for policy 0, policy_version 297600 (0.0015) [2024-06-15 15:11:50,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 609484800. Throughput: 0: 12162.9. Samples: 152458240. Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:11:50,956][1648985] Avg episode reward: [(0, '150.480')] [2024-06-15 15:11:52,685][1652491] Updated weights for policy 0, policy_version 297655 (0.0014) [2024-06-15 15:11:54,804][1652491] Updated weights for policy 0, policy_version 297682 (0.0017) [2024-06-15 15:11:55,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 47513.8, 300 sec: 46986.0). Total num frames: 609714176. Throughput: 0: 11992.2. Samples: 152496640. 
Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:11:55,955][1648985] Avg episode reward: [(0, '145.840')] [2024-06-15 15:11:57,478][1652491] Updated weights for policy 0, policy_version 297776 (0.0012) [2024-06-15 15:11:59,985][1652491] Updated weights for policy 0, policy_version 297824 (0.0021) [2024-06-15 15:12:00,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 610009088. Throughput: 0: 11742.7. Samples: 152553472. Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:12:00,956][1648985] Avg episode reward: [(0, '152.440')] [2024-06-15 15:12:03,071][1652491] Updated weights for policy 0, policy_version 297858 (0.0013) [2024-06-15 15:12:05,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 610140160. Throughput: 0: 11798.8. Samples: 152630784. Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:12:05,956][1648985] Avg episode reward: [(0, '164.940')] [2024-06-15 15:12:06,435][1652491] Updated weights for policy 0, policy_version 297936 (0.0012) [2024-06-15 15:12:09,239][1652491] Updated weights for policy 0, policy_version 298045 (0.0119) [2024-06-15 15:12:10,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.2, 300 sec: 46652.7). Total num frames: 610402304. Throughput: 0: 11594.0. Samples: 152652288. Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:12:10,956][1648985] Avg episode reward: [(0, '170.170')] [2024-06-15 15:12:12,591][1652491] Updated weights for policy 0, policy_version 298102 (0.0058) [2024-06-15 15:12:15,368][1652491] Updated weights for policy 0, policy_version 298144 (0.0014) [2024-06-15 15:12:15,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45877.7, 300 sec: 46652.7). Total num frames: 610664448. Throughput: 0: 11389.2. Samples: 152723456. 
Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:12:15,956][1648985] Avg episode reward: [(0, '156.480')] [2024-06-15 15:12:18,246][1652491] Updated weights for policy 0, policy_version 298208 (0.0014) [2024-06-15 15:12:20,346][1652491] Updated weights for policy 0, policy_version 298288 (0.0025) [2024-06-15 15:12:20,970][1648985] Fps is (10 sec: 52349.7, 60 sec: 48047.7, 300 sec: 46872.5). Total num frames: 610926592. Throughput: 0: 11135.1. Samples: 152782848. Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:12:20,971][1648985] Avg episode reward: [(0, '150.690')] [2024-06-15 15:12:24,488][1651469] Signal inference workers to stop experience collection... (15600 times) [2024-06-15 15:12:24,532][1652491] InferenceWorker_p0-w0: stopping experience collection (15600 times) [2024-06-15 15:12:24,862][1651469] Signal inference workers to resume experience collection... (15600 times) [2024-06-15 15:12:24,863][1652491] InferenceWorker_p0-w0: resuming experience collection (15600 times) [2024-06-15 15:12:25,053][1652491] Updated weights for policy 0, policy_version 298358 (0.0083) [2024-06-15 15:12:25,956][1648985] Fps is (10 sec: 39319.6, 60 sec: 45328.8, 300 sec: 46319.4). Total num frames: 611057664. Throughput: 0: 11275.3. Samples: 152823296. Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:12:25,957][1648985] Avg episode reward: [(0, '136.270')] [2024-06-15 15:12:28,136][1652491] Updated weights for policy 0, policy_version 298400 (0.0029) [2024-06-15 15:12:29,323][1652491] Updated weights for policy 0, policy_version 298436 (0.0018) [2024-06-15 15:12:30,955][1648985] Fps is (10 sec: 39381.4, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 611319808. Throughput: 0: 11207.1. Samples: 152892416. 
Policy #0 lag: (min: 9.0, avg: 99.5, max: 265.0) [2024-06-15 15:12:30,956][1648985] Avg episode reward: [(0, '121.580')] [2024-06-15 15:12:31,872][1652491] Updated weights for policy 0, policy_version 298531 (0.0013) [2024-06-15 15:12:35,955][1648985] Fps is (10 sec: 42600.6, 60 sec: 44236.7, 300 sec: 45875.2). Total num frames: 611483648. Throughput: 0: 11036.5. Samples: 152954880. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:12:35,955][1648985] Avg episode reward: [(0, '156.910')] [2024-06-15 15:12:36,410][1652491] Updated weights for policy 0, policy_version 298593 (0.0012) [2024-06-15 15:12:39,674][1652491] Updated weights for policy 0, policy_version 298640 (0.0018) [2024-06-15 15:12:40,918][1652491] Updated weights for policy 0, policy_version 298688 (0.0014) [2024-06-15 15:12:40,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45329.1, 300 sec: 46652.8). Total num frames: 611713024. Throughput: 0: 10990.9. Samples: 152991232. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:12:40,956][1648985] Avg episode reward: [(0, '155.750')] [2024-06-15 15:12:42,442][1652491] Updated weights for policy 0, policy_version 298737 (0.0012) [2024-06-15 15:12:44,136][1652491] Updated weights for policy 0, policy_version 298808 (0.0013) [2024-06-15 15:12:45,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 611975168. Throughput: 0: 11047.8. Samples: 153050624. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:12:45,956][1648985] Avg episode reward: [(0, '141.780')] [2024-06-15 15:12:48,407][1652491] Updated weights for policy 0, policy_version 298861 (0.0218) [2024-06-15 15:12:50,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 612106240. Throughput: 0: 10990.9. Samples: 153125376. 
Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:12:50,956][1648985] Avg episode reward: [(0, '153.020')] [2024-06-15 15:12:51,613][1652491] Updated weights for policy 0, policy_version 298896 (0.0013) [2024-06-15 15:12:53,034][1652491] Updated weights for policy 0, policy_version 298948 (0.0014) [2024-06-15 15:12:54,697][1652491] Updated weights for policy 0, policy_version 299024 (0.0013) [2024-06-15 15:12:55,955][1648985] Fps is (10 sec: 52427.2, 60 sec: 46421.0, 300 sec: 46430.6). Total num frames: 612499456. Throughput: 0: 11150.2. Samples: 153154048. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:12:55,956][1648985] Avg episode reward: [(0, '146.920')] [2024-06-15 15:12:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000299072_612499456.pth... [2024-06-15 15:12:56,001][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000293584_601260032.pth [2024-06-15 15:13:00,167][1652491] Updated weights for policy 0, policy_version 299090 (0.0020) [2024-06-15 15:13:00,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 43144.5, 300 sec: 46097.3). Total num frames: 612597760. Throughput: 0: 11047.8. Samples: 153220608. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:13:00,956][1648985] Avg episode reward: [(0, '155.240')] [2024-06-15 15:13:03,603][1652491] Updated weights for policy 0, policy_version 299152 (0.0015) [2024-06-15 15:13:05,170][1652491] Updated weights for policy 0, policy_version 299206 (0.0014) [2024-06-15 15:13:05,955][1648985] Fps is (10 sec: 36046.2, 60 sec: 45329.2, 300 sec: 46430.6). Total num frames: 612859904. Throughput: 0: 11199.5. Samples: 153286656. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:13:05,955][1648985] Avg episode reward: [(0, '148.400')] [2024-06-15 15:13:06,759][1651469] Signal inference workers to stop experience collection... 
(15650 times) [2024-06-15 15:13:06,820][1652491] InferenceWorker_p0-w0: stopping experience collection (15650 times) [2024-06-15 15:13:07,058][1651469] Signal inference workers to resume experience collection... (15650 times) [2024-06-15 15:13:07,059][1652491] InferenceWorker_p0-w0: resuming experience collection (15650 times) [2024-06-15 15:13:07,217][1652491] Updated weights for policy 0, policy_version 299297 (0.0015) [2024-06-15 15:13:07,803][1652491] Updated weights for policy 0, policy_version 299328 (0.0011) [2024-06-15 15:13:10,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 46097.3). Total num frames: 613023744. Throughput: 0: 11047.9. Samples: 153320448. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:13:10,956][1648985] Avg episode reward: [(0, '171.850')] [2024-06-15 15:13:12,735][1652491] Updated weights for policy 0, policy_version 299392 (0.0013) [2024-06-15 15:13:15,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 42598.4, 300 sec: 46208.5). Total num frames: 613220352. Throughput: 0: 11173.0. Samples: 153395200. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:13:15,956][1648985] Avg episode reward: [(0, '169.140')] [2024-06-15 15:13:16,906][1652491] Updated weights for policy 0, policy_version 299472 (0.0041) [2024-06-15 15:13:18,241][1652491] Updated weights for policy 0, policy_version 299525 (0.0013) [2024-06-15 15:13:19,452][1652491] Updated weights for policy 0, policy_version 299580 (0.0012) [2024-06-15 15:13:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 43701.7, 300 sec: 46208.4). Total num frames: 613548032. Throughput: 0: 11093.3. Samples: 153454080. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:13:20,955][1648985] Avg episode reward: [(0, '152.610')] [2024-06-15 15:13:23,938][1652491] Updated weights for policy 0, policy_version 299632 (0.0013) [2024-06-15 15:13:25,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 43690.8, 300 sec: 46208.4). 
Total num frames: 613679104. Throughput: 0: 11081.9. Samples: 153489920. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:13:25,956][1648985] Avg episode reward: [(0, '160.530')] [2024-06-15 15:13:27,537][1652491] Updated weights for policy 0, policy_version 299687 (0.0017) [2024-06-15 15:13:29,421][1652491] Updated weights for policy 0, policy_version 299760 (0.0012) [2024-06-15 15:13:30,768][1652491] Updated weights for policy 0, policy_version 299818 (0.0106) [2024-06-15 15:13:30,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 45328.9, 300 sec: 46208.4). Total num frames: 614039552. Throughput: 0: 11172.9. Samples: 153553408. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:13:30,956][1648985] Avg episode reward: [(0, '176.580')] [2024-06-15 15:13:34,119][1652491] Updated weights for policy 0, policy_version 299844 (0.0050) [2024-06-15 15:13:35,127][1652491] Updated weights for policy 0, policy_version 299904 (0.0013) [2024-06-15 15:13:35,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 614203392. Throughput: 0: 11252.6. Samples: 153631744. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:13:35,956][1648985] Avg episode reward: [(0, '174.150')] [2024-06-15 15:13:39,350][1652491] Updated weights for policy 0, policy_version 299984 (0.0029) [2024-06-15 15:13:40,955][1648985] Fps is (10 sec: 42599.6, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 614465536. Throughput: 0: 11343.7. Samples: 153664512. Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:13:40,955][1648985] Avg episode reward: [(0, '149.230')] [2024-06-15 15:13:41,504][1652491] Updated weights for policy 0, policy_version 300064 (0.0014) [2024-06-15 15:13:45,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 614596608. Throughput: 0: 11309.5. Samples: 153729536. 
Policy #0 lag: (min: 127.0, avg: 227.3, max: 351.0) [2024-06-15 15:13:45,956][1648985] Avg episode reward: [(0, '145.480')] [2024-06-15 15:13:46,256][1652491] Updated weights for policy 0, policy_version 300098 (0.0012) [2024-06-15 15:13:49,967][1652491] Updated weights for policy 0, policy_version 300164 (0.0016) [2024-06-15 15:13:50,955][1648985] Fps is (10 sec: 32767.5, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 614793216. Throughput: 0: 11389.1. Samples: 153799168. Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:13:50,956][1648985] Avg episode reward: [(0, '155.190')] [2024-06-15 15:13:51,197][1651469] Signal inference workers to stop experience collection... (15700 times) [2024-06-15 15:13:51,268][1652491] InferenceWorker_p0-w0: stopping experience collection (15700 times) [2024-06-15 15:13:51,557][1651469] Signal inference workers to resume experience collection... (15700 times) [2024-06-15 15:13:51,558][1652491] InferenceWorker_p0-w0: resuming experience collection (15700 times) [2024-06-15 15:13:51,559][1652491] Updated weights for policy 0, policy_version 300224 (0.0070) [2024-06-15 15:13:53,301][1652491] Updated weights for policy 0, policy_version 300290 (0.0014) [2024-06-15 15:13:54,789][1652491] Updated weights for policy 0, policy_version 300349 (0.0011) [2024-06-15 15:13:55,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 43690.9, 300 sec: 46208.4). Total num frames: 615120896. Throughput: 0: 11104.7. Samples: 153820160. Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:13:55,956][1648985] Avg episode reward: [(0, '162.350')] [2024-06-15 15:13:59,291][1652491] Updated weights for policy 0, policy_version 300405 (0.0018) [2024-06-15 15:14:00,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 615251968. Throughput: 0: 11047.8. Samples: 153892352. 
Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:14:00,956][1648985] Avg episode reward: [(0, '142.970')] [2024-06-15 15:14:02,550][1652491] Updated weights for policy 0, policy_version 300449 (0.0037) [2024-06-15 15:14:04,300][1652491] Updated weights for policy 0, policy_version 300528 (0.0011) [2024-06-15 15:14:05,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 615612416. Throughput: 0: 11082.0. Samples: 153952768. Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:14:05,956][1648985] Avg episode reward: [(0, '141.640')] [2024-06-15 15:14:06,122][1652491] Updated weights for policy 0, policy_version 300603 (0.0011) [2024-06-15 15:14:09,991][1652491] Updated weights for policy 0, policy_version 300666 (0.0133) [2024-06-15 15:14:10,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 615776256. Throughput: 0: 11218.5. Samples: 153994752. Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:14:10,956][1648985] Avg episode reward: [(0, '145.070')] [2024-06-15 15:14:13,906][1652491] Updated weights for policy 0, policy_version 300735 (0.0015) [2024-06-15 15:14:15,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 616005632. Throughput: 0: 11400.6. Samples: 154066432. Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:14:15,955][1648985] Avg episode reward: [(0, '158.280')] [2024-06-15 15:14:16,669][1652491] Updated weights for policy 0, policy_version 300816 (0.0126) [2024-06-15 15:14:17,762][1652491] Updated weights for policy 0, policy_version 300864 (0.0013) [2024-06-15 15:14:20,955][1648985] Fps is (10 sec: 49153.3, 60 sec: 45329.0, 300 sec: 46097.4). Total num frames: 616267776. Throughput: 0: 11059.2. Samples: 154129408. 
Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:14:20,956][1648985] Avg episode reward: [(0, '169.610')] [2024-06-15 15:14:24,446][1652491] Updated weights for policy 0, policy_version 300944 (0.0014) [2024-06-15 15:14:25,276][1652491] Updated weights for policy 0, policy_version 300988 (0.0014) [2024-06-15 15:14:25,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45875.3, 300 sec: 45875.2). Total num frames: 616431616. Throughput: 0: 11195.7. Samples: 154168320. Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:14:25,956][1648985] Avg episode reward: [(0, '164.240')] [2024-06-15 15:14:27,831][1652491] Updated weights for policy 0, policy_version 301050 (0.0098) [2024-06-15 15:14:29,557][1652491] Updated weights for policy 0, policy_version 301120 (0.0011) [2024-06-15 15:14:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44237.0, 300 sec: 45764.1). Total num frames: 616693760. Throughput: 0: 11104.7. Samples: 154229248. Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:14:30,955][1648985] Avg episode reward: [(0, '151.700')] [2024-06-15 15:14:31,857][1651469] Signal inference workers to stop experience collection... (15750 times) [2024-06-15 15:14:31,903][1652491] InferenceWorker_p0-w0: stopping experience collection (15750 times) [2024-06-15 15:14:32,120][1651469] Signal inference workers to resume experience collection... (15750 times) [2024-06-15 15:14:32,122][1652491] InferenceWorker_p0-w0: resuming experience collection (15750 times) [2024-06-15 15:14:32,395][1652491] Updated weights for policy 0, policy_version 301184 (0.0011) [2024-06-15 15:14:35,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 44236.8, 300 sec: 45764.2). Total num frames: 616857600. Throughput: 0: 11309.5. Samples: 154308096. 
Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:14:35,955][1648985] Avg episode reward: [(0, '146.280')] [2024-06-15 15:14:36,785][1652491] Updated weights for policy 0, policy_version 301245 (0.0014) [2024-06-15 15:14:39,328][1652491] Updated weights for policy 0, policy_version 301314 (0.0013) [2024-06-15 15:14:40,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 45875.1, 300 sec: 46097.3). Total num frames: 617218048. Throughput: 0: 11582.5. Samples: 154341376. Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:14:40,956][1648985] Avg episode reward: [(0, '136.610')] [2024-06-15 15:14:42,712][1652491] Updated weights for policy 0, policy_version 301392 (0.0115) [2024-06-15 15:14:43,914][1652491] Updated weights for policy 0, policy_version 301440 (0.0013) [2024-06-15 15:14:45,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 45875.1, 300 sec: 45986.3). Total num frames: 617349120. Throughput: 0: 11673.6. Samples: 154417664. Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:14:45,956][1648985] Avg episode reward: [(0, '137.550')] [2024-06-15 15:14:47,604][1652491] Updated weights for policy 0, policy_version 301498 (0.0014) [2024-06-15 15:14:49,635][1652491] Updated weights for policy 0, policy_version 301568 (0.0014) [2024-06-15 15:14:50,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 49152.0, 300 sec: 46208.4). Total num frames: 617742336. Throughput: 0: 11832.9. Samples: 154485248. Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:14:50,956][1648985] Avg episode reward: [(0, '132.870')] [2024-06-15 15:14:53,630][1652491] Updated weights for policy 0, policy_version 301648 (0.0014) [2024-06-15 15:14:55,956][1648985] Fps is (10 sec: 52424.2, 60 sec: 45874.5, 300 sec: 46208.3). Total num frames: 617873408. Throughput: 0: 11821.3. Samples: 154526720. 
Policy #0 lag: (min: 13.0, avg: 106.2, max: 269.0) [2024-06-15 15:14:55,956][1648985] Avg episode reward: [(0, '120.230')] [2024-06-15 15:14:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000301696_617873408.pth... [2024-06-15 15:14:56,027][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000296320_606863360.pth [2024-06-15 15:14:57,312][1652491] Updated weights for policy 0, policy_version 301698 (0.0022) [2024-06-15 15:14:59,179][1652491] Updated weights for policy 0, policy_version 301761 (0.0014) [2024-06-15 15:15:00,590][1652491] Updated weights for policy 0, policy_version 301824 (0.0072) [2024-06-15 15:15:00,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 48059.8, 300 sec: 45875.2). Total num frames: 618135552. Throughput: 0: 11628.1. Samples: 154589696. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:15:00,956][1648985] Avg episode reward: [(0, '130.980')] [2024-06-15 15:15:05,340][1652491] Updated weights for policy 0, policy_version 301891 (0.0031) [2024-06-15 15:15:05,955][1648985] Fps is (10 sec: 42602.4, 60 sec: 44782.9, 300 sec: 45875.2). Total num frames: 618299392. Throughput: 0: 11878.4. Samples: 154663936. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:15:05,956][1648985] Avg episode reward: [(0, '152.810')] [2024-06-15 15:15:06,778][1652491] Updated weights for policy 0, policy_version 301949 (0.0094) [2024-06-15 15:15:09,994][1652491] Updated weights for policy 0, policy_version 302007 (0.0014) [2024-06-15 15:15:10,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.5, 300 sec: 45986.3). Total num frames: 618561536. Throughput: 0: 11821.5. Samples: 154700288. 
Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:15:10,956][1648985] Avg episode reward: [(0, '147.000')] [2024-06-15 15:15:11,259][1652491] Updated weights for policy 0, policy_version 302048 (0.0020) [2024-06-15 15:15:13,034][1652491] Updated weights for policy 0, policy_version 302128 (0.0014) [2024-06-15 15:15:15,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 46421.2, 300 sec: 45875.2). Total num frames: 618790912. Throughput: 0: 11878.4. Samples: 154763776. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:15:15,956][1648985] Avg episode reward: [(0, '137.880')] [2024-06-15 15:15:17,196][1652491] Updated weights for policy 0, policy_version 302160 (0.0022) [2024-06-15 15:15:17,332][1651469] Signal inference workers to stop experience collection... (15800 times) [2024-06-15 15:15:17,362][1652491] InferenceWorker_p0-w0: stopping experience collection (15800 times) [2024-06-15 15:15:17,670][1651469] Signal inference workers to resume experience collection... (15800 times) [2024-06-15 15:15:17,671][1652491] InferenceWorker_p0-w0: resuming experience collection (15800 times) [2024-06-15 15:15:18,439][1652491] Updated weights for policy 0, policy_version 302206 (0.0020) [2024-06-15 15:15:20,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 618987520. Throughput: 0: 11741.9. Samples: 154836480. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:15:20,956][1648985] Avg episode reward: [(0, '121.290')] [2024-06-15 15:15:21,231][1652491] Updated weights for policy 0, policy_version 302256 (0.0017) [2024-06-15 15:15:22,288][1652491] Updated weights for policy 0, policy_version 302288 (0.0011) [2024-06-15 15:15:23,577][1652491] Updated weights for policy 0, policy_version 302338 (0.0013) [2024-06-15 15:15:24,590][1652491] Updated weights for policy 0, policy_version 302394 (0.0014) [2024-06-15 15:15:25,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.8, 300 sec: 46097.4). 
Total num frames: 619315200. Throughput: 0: 11730.5. Samples: 154869248. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:15:25,955][1648985] Avg episode reward: [(0, '114.580')] [2024-06-15 15:15:28,364][1652491] Updated weights for policy 0, policy_version 302436 (0.0013) [2024-06-15 15:15:30,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 45875.1, 300 sec: 46097.4). Total num frames: 619446272. Throughput: 0: 11685.0. Samples: 154943488. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:15:30,956][1648985] Avg episode reward: [(0, '129.160')] [2024-06-15 15:15:31,770][1652491] Updated weights for policy 0, policy_version 302480 (0.0012) [2024-06-15 15:15:33,683][1652491] Updated weights for policy 0, policy_version 302560 (0.0129) [2024-06-15 15:15:35,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 49698.0, 300 sec: 46208.4). Total num frames: 619839488. Throughput: 0: 11411.9. Samples: 154998784. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:15:35,956][1648985] Avg episode reward: [(0, '127.190')] [2024-06-15 15:15:39,319][1652491] Updated weights for policy 0, policy_version 302657 (0.0017) [2024-06-15 15:15:40,725][1652491] Updated weights for policy 0, policy_version 302716 (0.0013) [2024-06-15 15:15:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 619970560. Throughput: 0: 11423.5. Samples: 155040768. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:15:40,956][1648985] Avg episode reward: [(0, '136.910')] [2024-06-15 15:15:44,321][1652491] Updated weights for policy 0, policy_version 302768 (0.0012) [2024-06-15 15:15:45,543][1652491] Updated weights for policy 0, policy_version 302816 (0.0015) [2024-06-15 15:15:45,955][1648985] Fps is (10 sec: 36045.6, 60 sec: 47513.7, 300 sec: 45875.2). Total num frames: 620199936. Throughput: 0: 11594.0. Samples: 155111424. 
Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:15:45,955][1648985] Avg episode reward: [(0, '128.020')] [2024-06-15 15:15:47,426][1652491] Updated weights for policy 0, policy_version 302880 (0.0013) [2024-06-15 15:15:50,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 43690.7, 300 sec: 45764.2). Total num frames: 620363776. Throughput: 0: 11457.4. Samples: 155179520. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:15:50,955][1648985] Avg episode reward: [(0, '159.500')] [2024-06-15 15:15:51,094][1652491] Updated weights for policy 0, policy_version 302928 (0.0014) [2024-06-15 15:15:54,618][1652491] Updated weights for policy 0, policy_version 302992 (0.0015) [2024-06-15 15:15:55,955][1648985] Fps is (10 sec: 42596.9, 60 sec: 45875.7, 300 sec: 45764.1). Total num frames: 620625920. Throughput: 0: 11434.6. Samples: 155214848. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:15:55,956][1648985] Avg episode reward: [(0, '157.640')] [2024-06-15 15:15:56,972][1652491] Updated weights for policy 0, policy_version 303058 (0.0014) [2024-06-15 15:15:58,119][1651469] Signal inference workers to stop experience collection... (15850 times) [2024-06-15 15:15:58,190][1652491] InferenceWorker_p0-w0: stopping experience collection (15850 times) [2024-06-15 15:15:58,412][1651469] Signal inference workers to resume experience collection... (15850 times) [2024-06-15 15:15:58,413][1652491] InferenceWorker_p0-w0: resuming experience collection (15850 times) [2024-06-15 15:15:58,415][1652491] Updated weights for policy 0, policy_version 303120 (0.0111) [2024-06-15 15:16:00,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.3, 300 sec: 45764.2). Total num frames: 620888064. Throughput: 0: 11377.8. Samples: 155275776. 
Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:16:00,956][1648985] Avg episode reward: [(0, '145.570')] [2024-06-15 15:16:01,988][1652491] Updated weights for policy 0, policy_version 303171 (0.0012) [2024-06-15 15:16:03,259][1652491] Updated weights for policy 0, policy_version 303232 (0.0013) [2024-06-15 15:16:05,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45875.1, 300 sec: 45541.9). Total num frames: 621051904. Throughput: 0: 11525.6. Samples: 155355136. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:16:05,956][1648985] Avg episode reward: [(0, '135.520')] [2024-06-15 15:16:06,483][1652491] Updated weights for policy 0, policy_version 303292 (0.0015) [2024-06-15 15:16:09,271][1652491] Updated weights for policy 0, policy_version 303360 (0.0017) [2024-06-15 15:16:10,349][1652491] Updated weights for policy 0, policy_version 303395 (0.0011) [2024-06-15 15:16:10,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 47513.6, 300 sec: 45764.6). Total num frames: 621412352. Throughput: 0: 11571.2. Samples: 155389952. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:16:10,956][1648985] Avg episode reward: [(0, '144.670')] [2024-06-15 15:16:12,811][1652491] Updated weights for policy 0, policy_version 303456 (0.0012) [2024-06-15 15:16:15,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 621543424. Throughput: 0: 11468.8. Samples: 155459584. Policy #0 lag: (min: 39.0, avg: 152.7, max: 295.0) [2024-06-15 15:16:15,956][1648985] Avg episode reward: [(0, '162.720')] [2024-06-15 15:16:16,446][1652491] Updated weights for policy 0, policy_version 303491 (0.0040) [2024-06-15 15:16:17,539][1652491] Updated weights for policy 0, policy_version 303544 (0.0015) [2024-06-15 15:16:19,841][1652491] Updated weights for policy 0, policy_version 303586 (0.0013) [2024-06-15 15:16:20,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46967.4, 300 sec: 45653.1). Total num frames: 621805568. 
Throughput: 0: 11924.0. Samples: 155535360. Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:16:20,955][1648985] Avg episode reward: [(0, '151.740')] [2024-06-15 15:16:21,440][1652491] Updated weights for policy 0, policy_version 303648 (0.0019) [2024-06-15 15:16:22,986][1652491] Updated weights for policy 0, policy_version 303696 (0.0012) [2024-06-15 15:16:24,224][1652491] Updated weights for policy 0, policy_version 303744 (0.0025) [2024-06-15 15:16:25,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 45875.0, 300 sec: 45764.1). Total num frames: 622067712. Throughput: 0: 11605.3. Samples: 155563008. Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:16:25,956][1648985] Avg episode reward: [(0, '137.200')] [2024-06-15 15:16:28,639][1652491] Updated weights for policy 0, policy_version 303806 (0.0016) [2024-06-15 15:16:30,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.4, 300 sec: 45430.9). Total num frames: 622231552. Throughput: 0: 11707.7. Samples: 155638272. Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:16:30,956][1648985] Avg episode reward: [(0, '146.400')] [2024-06-15 15:16:32,371][1652491] Updated weights for policy 0, policy_version 303872 (0.0013) [2024-06-15 15:16:33,801][1652491] Updated weights for policy 0, policy_version 303931 (0.0012) [2024-06-15 15:16:34,826][1652491] Updated weights for policy 0, policy_version 303972 (0.0116) [2024-06-15 15:16:35,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 45875.3, 300 sec: 46097.4). Total num frames: 622592000. Throughput: 0: 11502.9. Samples: 155697152. Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:16:35,956][1648985] Avg episode reward: [(0, '163.430')] [2024-06-15 15:16:40,003][1652491] Updated weights for policy 0, policy_version 304036 (0.0016) [2024-06-15 15:16:40,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 45875.1, 300 sec: 45319.8). Total num frames: 622723072. Throughput: 0: 11616.7. Samples: 155737600. 
Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:16:40,956][1648985] Avg episode reward: [(0, '169.080')] [2024-06-15 15:16:42,969][1651469] Signal inference workers to stop experience collection... (15900 times) [2024-06-15 15:16:43,002][1652491] Updated weights for policy 0, policy_version 304082 (0.0012) [2024-06-15 15:16:43,053][1652491] InferenceWorker_p0-w0: stopping experience collection (15900 times) [2024-06-15 15:16:43,292][1651469] Signal inference workers to resume experience collection... (15900 times) [2024-06-15 15:16:43,294][1652491] InferenceWorker_p0-w0: resuming experience collection (15900 times) [2024-06-15 15:16:44,976][1652491] Updated weights for policy 0, policy_version 304160 (0.0013) [2024-06-15 15:16:45,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46421.3, 300 sec: 45764.1). Total num frames: 622985216. Throughput: 0: 11821.5. Samples: 155807744. Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:16:45,955][1648985] Avg episode reward: [(0, '157.390')] [2024-06-15 15:16:46,980][1652491] Updated weights for policy 0, policy_version 304247 (0.0015) [2024-06-15 15:16:50,872][1652491] Updated weights for policy 0, policy_version 304288 (0.0017) [2024-06-15 15:16:50,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 46967.4, 300 sec: 45653.0). Total num frames: 623181824. Throughput: 0: 11491.6. Samples: 155872256. Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:16:50,956][1648985] Avg episode reward: [(0, '147.890')] [2024-06-15 15:16:54,671][1652491] Updated weights for policy 0, policy_version 304338 (0.0012) [2024-06-15 15:16:55,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.4, 300 sec: 45319.8). Total num frames: 623378432. Throughput: 0: 11650.9. Samples: 155914240. 
Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:16:55,956][1648985] Avg episode reward: [(0, '131.770')] [2024-06-15 15:16:56,274][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000304400_623411200.pth... [2024-06-15 15:16:56,478][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000299072_612499456.pth [2024-06-15 15:16:56,932][1652491] Updated weights for policy 0, policy_version 304417 (0.0013) [2024-06-15 15:16:58,173][1652491] Updated weights for policy 0, policy_version 304467 (0.0021) [2024-06-15 15:17:00,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 623640576. Throughput: 0: 11343.6. Samples: 155970048. Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:17:00,956][1648985] Avg episode reward: [(0, '137.090')] [2024-06-15 15:17:02,249][1652491] Updated weights for policy 0, policy_version 304520 (0.0090) [2024-06-15 15:17:05,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45329.2, 300 sec: 45319.8). Total num frames: 623771648. Throughput: 0: 11366.4. Samples: 156046848. Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:17:05,956][1648985] Avg episode reward: [(0, '131.820')] [2024-06-15 15:17:06,949][1652491] Updated weights for policy 0, policy_version 304608 (0.0015) [2024-06-15 15:17:08,055][1652491] Updated weights for policy 0, policy_version 304657 (0.0017) [2024-06-15 15:17:10,073][1652491] Updated weights for policy 0, policy_version 304736 (0.0014) [2024-06-15 15:17:10,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 624164864. Throughput: 0: 11480.2. Samples: 156079616. 
Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:17:10,955][1648985] Avg episode reward: [(0, '137.280')] [2024-06-15 15:17:14,001][1652491] Updated weights for policy 0, policy_version 304805 (0.0013) [2024-06-15 15:17:15,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 45322.1). Total num frames: 624295936. Throughput: 0: 11298.1. Samples: 156146688. Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:17:15,956][1648985] Avg episode reward: [(0, '138.310')] [2024-06-15 15:17:18,106][1652491] Updated weights for policy 0, policy_version 304864 (0.0014) [2024-06-15 15:17:19,565][1652491] Updated weights for policy 0, policy_version 304915 (0.0011) [2024-06-15 15:17:20,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 45875.1, 300 sec: 45764.2). Total num frames: 624558080. Throughput: 0: 11537.0. Samples: 156216320. Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:17:20,956][1648985] Avg episode reward: [(0, '138.540')] [2024-06-15 15:17:21,292][1651469] Signal inference workers to stop experience collection... (15950 times) [2024-06-15 15:17:21,351][1652491] InferenceWorker_p0-w0: stopping experience collection (15950 times) [2024-06-15 15:17:21,546][1651469] Signal inference workers to resume experience collection... (15950 times) [2024-06-15 15:17:21,547][1652491] InferenceWorker_p0-w0: resuming experience collection (15950 times) [2024-06-15 15:17:21,549][1652491] Updated weights for policy 0, policy_version 304992 (0.0012) [2024-06-15 15:17:25,173][1652491] Updated weights for policy 0, policy_version 305043 (0.0024) [2024-06-15 15:17:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.4, 300 sec: 45764.1). Total num frames: 624820224. Throughput: 0: 11389.2. Samples: 156250112. 
Policy #0 lag: (min: 15.0, avg: 111.7, max: 271.0) [2024-06-15 15:17:25,956][1648985] Avg episode reward: [(0, '146.660')] [2024-06-15 15:17:29,465][1652491] Updated weights for policy 0, policy_version 305120 (0.0014) [2024-06-15 15:17:30,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 624984064. Throughput: 0: 11525.7. Samples: 156326400. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:17:30,956][1648985] Avg episode reward: [(0, '136.100')] [2024-06-15 15:17:31,613][1652491] Updated weights for policy 0, policy_version 305202 (0.0043) [2024-06-15 15:17:33,185][1652491] Updated weights for policy 0, policy_version 305271 (0.0016) [2024-06-15 15:17:35,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 625213440. Throughput: 0: 11514.3. Samples: 156390400. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:17:35,956][1648985] Avg episode reward: [(0, '131.840')] [2024-06-15 15:17:37,091][1652491] Updated weights for policy 0, policy_version 305328 (0.0025) [2024-06-15 15:17:40,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 43690.8, 300 sec: 45319.8). Total num frames: 625344512. Throughput: 0: 11309.5. Samples: 156423168. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:17:40,956][1648985] Avg episode reward: [(0, '139.790')] [2024-06-15 15:17:42,282][1652491] Updated weights for policy 0, policy_version 305409 (0.0014) [2024-06-15 15:17:43,225][1652491] Updated weights for policy 0, policy_version 305461 (0.0012) [2024-06-15 15:17:44,736][1652491] Updated weights for policy 0, policy_version 305524 (0.0015) [2024-06-15 15:17:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 625737728. Throughput: 0: 11491.6. Samples: 156487168. 
Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:17:45,956][1648985] Avg episode reward: [(0, '146.380')] [2024-06-15 15:17:47,231][1652491] Updated weights for policy 0, policy_version 305540 (0.0028) [2024-06-15 15:17:48,282][1652491] Updated weights for policy 0, policy_version 305591 (0.0012) [2024-06-15 15:17:50,966][1648985] Fps is (10 sec: 52370.8, 60 sec: 44774.7, 300 sec: 45318.2). Total num frames: 625868800. Throughput: 0: 11602.5. Samples: 156569088. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:17:50,967][1648985] Avg episode reward: [(0, '156.630')] [2024-06-15 15:17:52,201][1652491] Updated weights for policy 0, policy_version 305632 (0.0013) [2024-06-15 15:17:53,884][1652491] Updated weights for policy 0, policy_version 305702 (0.0013) [2024-06-15 15:17:55,317][1652491] Updated weights for policy 0, policy_version 305761 (0.0015) [2024-06-15 15:17:55,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.7, 300 sec: 46208.5). Total num frames: 626229248. Throughput: 0: 11502.9. Samples: 156597248. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:17:55,955][1648985] Avg episode reward: [(0, '154.210')] [2024-06-15 15:17:56,024][1652491] Updated weights for policy 0, policy_version 305791 (0.0032) [2024-06-15 15:17:59,470][1652491] Updated weights for policy 0, policy_version 305845 (0.0022) [2024-06-15 15:18:00,955][1648985] Fps is (10 sec: 52487.1, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 626393088. Throughput: 0: 11650.9. Samples: 156670976. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:18:00,956][1648985] Avg episode reward: [(0, '142.310')] [2024-06-15 15:18:02,972][1652491] Updated weights for policy 0, policy_version 305888 (0.0010) [2024-06-15 15:18:04,086][1651469] Signal inference workers to stop experience collection... 
(16000 times) [2024-06-15 15:18:04,157][1652491] InferenceWorker_p0-w0: stopping experience collection (16000 times) [2024-06-15 15:18:04,403][1651469] Signal inference workers to resume experience collection... (16000 times) [2024-06-15 15:18:04,404][1652491] InferenceWorker_p0-w0: resuming experience collection (16000 times) [2024-06-15 15:18:05,192][1652491] Updated weights for policy 0, policy_version 305968 (0.0027) [2024-06-15 15:18:05,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48605.9, 300 sec: 46319.5). Total num frames: 626688000. Throughput: 0: 11525.7. Samples: 156734976. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:18:05,955][1648985] Avg episode reward: [(0, '143.030')] [2024-06-15 15:18:06,850][1652491] Updated weights for policy 0, policy_version 306047 (0.0014) [2024-06-15 15:18:10,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45329.0, 300 sec: 46319.5). Total num frames: 626884608. Throughput: 0: 11548.5. Samples: 156769792. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:18:10,955][1648985] Avg episode reward: [(0, '148.400')] [2024-06-15 15:18:10,974][1652491] Updated weights for policy 0, policy_version 306108 (0.0011) [2024-06-15 15:18:14,582][1652491] Updated weights for policy 0, policy_version 306132 (0.0021) [2024-06-15 15:18:15,490][1652491] Updated weights for policy 0, policy_version 306176 (0.0021) [2024-06-15 15:18:15,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 627048448. Throughput: 0: 11548.5. Samples: 156846080. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:18:15,956][1648985] Avg episode reward: [(0, '133.000')] [2024-06-15 15:18:17,419][1652491] Updated weights for policy 0, policy_version 306256 (0.0013) [2024-06-15 15:18:20,536][1652491] Updated weights for policy 0, policy_version 306323 (0.0014) [2024-06-15 15:18:20,955][1648985] Fps is (10 sec: 49150.5, 60 sec: 46967.3, 300 sec: 46430.6). 
Total num frames: 627376128. Throughput: 0: 11548.4. Samples: 156910080. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:18:20,956][1648985] Avg episode reward: [(0, '140.010')] [2024-06-15 15:18:21,364][1652491] Updated weights for policy 0, policy_version 306361 (0.0011) [2024-06-15 15:18:24,757][1652491] Updated weights for policy 0, policy_version 306386 (0.0012) [2024-06-15 15:18:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 627572736. Throughput: 0: 11901.2. Samples: 156958720. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:18:25,956][1648985] Avg episode reward: [(0, '147.940')] [2024-06-15 15:18:27,560][1652491] Updated weights for policy 0, policy_version 306480 (0.0013) [2024-06-15 15:18:29,355][1652491] Updated weights for policy 0, policy_version 306553 (0.0013) [2024-06-15 15:18:30,955][1648985] Fps is (10 sec: 45877.1, 60 sec: 47513.7, 300 sec: 46208.5). Total num frames: 627834880. Throughput: 0: 11798.8. Samples: 157018112. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:18:30,955][1648985] Avg episode reward: [(0, '159.180')] [2024-06-15 15:18:32,038][1652491] Updated weights for policy 0, policy_version 306613 (0.0014) [2024-06-15 15:18:35,926][1652491] Updated weights for policy 0, policy_version 306662 (0.0012) [2024-06-15 15:18:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 628031488. Throughput: 0: 11710.6. Samples: 157095936. 
Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:18:35,956][1648985] Avg episode reward: [(0, '179.060')] [2024-06-15 15:18:38,632][1652491] Updated weights for policy 0, policy_version 306705 (0.0032) [2024-06-15 15:18:40,016][1652491] Updated weights for policy 0, policy_version 306773 (0.0012) [2024-06-15 15:18:40,828][1652491] Updated weights for policy 0, policy_version 306816 (0.0012) [2024-06-15 15:18:40,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 50244.3, 300 sec: 46652.7). Total num frames: 628359168. Throughput: 0: 11878.4. Samples: 157131776. Policy #0 lag: (min: 79.0, avg: 156.1, max: 335.0) [2024-06-15 15:18:40,956][1648985] Avg episode reward: [(0, '152.730')] [2024-06-15 15:18:42,524][1652491] Updated weights for policy 0, policy_version 306876 (0.0015) [2024-06-15 15:18:45,441][1651469] Signal inference workers to stop experience collection... (16050 times) [2024-06-15 15:18:45,495][1652491] InferenceWorker_p0-w0: stopping experience collection (16050 times) [2024-06-15 15:18:45,689][1651469] Signal inference workers to resume experience collection... (16050 times) [2024-06-15 15:18:45,689][1652491] InferenceWorker_p0-w0: resuming experience collection (16050 times) [2024-06-15 15:18:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 628555776. Throughput: 0: 12026.3. Samples: 157212160. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:18:45,956][1648985] Avg episode reward: [(0, '121.170')] [2024-06-15 15:18:46,112][1652491] Updated weights for policy 0, policy_version 306928 (0.0013) [2024-06-15 15:18:49,073][1652491] Updated weights for policy 0, policy_version 306966 (0.0012) [2024-06-15 15:18:50,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 49707.4, 300 sec: 46541.7). Total num frames: 628850688. Throughput: 0: 12014.9. Samples: 157275648. 
Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:18:50,956][1648985] Avg episode reward: [(0, '122.820')] [2024-06-15 15:18:51,114][1652491] Updated weights for policy 0, policy_version 307064 (0.0149) [2024-06-15 15:18:53,099][1652491] Updated weights for policy 0, policy_version 307120 (0.0101) [2024-06-15 15:18:55,955][1648985] Fps is (10 sec: 45873.3, 60 sec: 46421.0, 300 sec: 46652.7). Total num frames: 629014528. Throughput: 0: 11969.3. Samples: 157308416. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:18:55,956][1648985] Avg episode reward: [(0, '136.120')] [2024-06-15 15:18:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000307136_629014528.pth... [2024-06-15 15:18:56,166][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000301696_617873408.pth [2024-06-15 15:18:56,875][1652491] Updated weights for policy 0, policy_version 307168 (0.0012) [2024-06-15 15:19:00,169][1652491] Updated weights for policy 0, policy_version 307232 (0.0083) [2024-06-15 15:19:00,955][1648985] Fps is (10 sec: 42597.0, 60 sec: 48059.5, 300 sec: 46319.5). Total num frames: 629276672. Throughput: 0: 12026.2. Samples: 157387264. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:19:00,956][1648985] Avg episode reward: [(0, '142.680')] [2024-06-15 15:19:00,971][1652491] Updated weights for policy 0, policy_version 307270 (0.0013) [2024-06-15 15:19:02,109][1652491] Updated weights for policy 0, policy_version 307328 (0.0146) [2024-06-15 15:19:04,590][1652491] Updated weights for policy 0, policy_version 307392 (0.0015) [2024-06-15 15:19:05,955][1648985] Fps is (10 sec: 52430.8, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 629538816. Throughput: 0: 12117.4. Samples: 157455360. 
Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:19:05,956][1648985] Avg episode reward: [(0, '142.030')] [2024-06-15 15:19:08,282][1652491] Updated weights for policy 0, policy_version 307446 (0.0012) [2024-06-15 15:19:10,955][1648985] Fps is (10 sec: 42599.8, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 629702656. Throughput: 0: 11889.8. Samples: 157493760. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:19:10,955][1648985] Avg episode reward: [(0, '128.420')] [2024-06-15 15:19:10,985][1652491] Updated weights for policy 0, policy_version 307478 (0.0012) [2024-06-15 15:19:13,104][1652491] Updated weights for policy 0, policy_version 307581 (0.0134) [2024-06-15 15:19:15,629][1652491] Updated weights for policy 0, policy_version 307642 (0.0014) [2024-06-15 15:19:15,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 50244.1, 300 sec: 46763.8). Total num frames: 630063104. Throughput: 0: 12162.8. Samples: 157565440. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:19:15,956][1648985] Avg episode reward: [(0, '141.000')] [2024-06-15 15:19:19,474][1652491] Updated weights for policy 0, policy_version 307704 (0.0088) [2024-06-15 15:19:20,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46967.7, 300 sec: 46652.8). Total num frames: 630194176. Throughput: 0: 12037.7. Samples: 157637632. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:19:20,955][1648985] Avg episode reward: [(0, '149.530')] [2024-06-15 15:19:22,182][1652491] Updated weights for policy 0, policy_version 307746 (0.0028) [2024-06-15 15:19:23,695][1652491] Updated weights for policy 0, policy_version 307811 (0.0103) [2024-06-15 15:19:25,707][1652491] Updated weights for policy 0, policy_version 307841 (0.0013) [2024-06-15 15:19:25,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48605.8, 300 sec: 46763.8). Total num frames: 630489088. Throughput: 0: 11855.6. Samples: 157665280. 
Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:19:25,956][1648985] Avg episode reward: [(0, '151.340')] [2024-06-15 15:19:26,477][1651469] Signal inference workers to stop experience collection... (16100 times) [2024-06-15 15:19:26,524][1652491] InferenceWorker_p0-w0: stopping experience collection (16100 times) [2024-06-15 15:19:26,707][1651469] Signal inference workers to resume experience collection... (16100 times) [2024-06-15 15:19:26,708][1652491] InferenceWorker_p0-w0: resuming experience collection (16100 times) [2024-06-15 15:19:27,006][1652491] Updated weights for policy 0, policy_version 307899 (0.0014) [2024-06-15 15:19:30,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 47513.5, 300 sec: 46874.9). Total num frames: 630685696. Throughput: 0: 11832.9. Samples: 157744640. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:19:30,956][1648985] Avg episode reward: [(0, '149.440')] [2024-06-15 15:19:30,969][1652491] Updated weights for policy 0, policy_version 307963 (0.0021) [2024-06-15 15:19:33,597][1652491] Updated weights for policy 0, policy_version 308016 (0.0115) [2024-06-15 15:19:35,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 49151.7, 300 sec: 46652.7). Total num frames: 630980608. Throughput: 0: 11571.1. Samples: 157796352. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:19:35,956][1648985] Avg episode reward: [(0, '154.390')] [2024-06-15 15:19:37,932][1652491] Updated weights for policy 0, policy_version 308097 (0.0012) [2024-06-15 15:19:40,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 631111680. Throughput: 0: 11673.7. Samples: 157833728. 
Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:19:40,956][1648985] Avg episode reward: [(0, '147.330')] [2024-06-15 15:19:42,071][1652491] Updated weights for policy 0, policy_version 308195 (0.0014) [2024-06-15 15:19:45,233][1652491] Updated weights for policy 0, policy_version 308272 (0.0021) [2024-06-15 15:19:45,955][1648985] Fps is (10 sec: 42599.9, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 631406592. Throughput: 0: 11650.9. Samples: 157911552. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:19:45,955][1648985] Avg episode reward: [(0, '141.540')] [2024-06-15 15:19:46,703][1652491] Updated weights for policy 0, policy_version 308348 (0.0013) [2024-06-15 15:19:49,363][1652491] Updated weights for policy 0, policy_version 308386 (0.0014) [2024-06-15 15:19:50,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 46421.2, 300 sec: 46652.9). Total num frames: 631635968. Throughput: 0: 11753.2. Samples: 157984256. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:19:50,956][1648985] Avg episode reward: [(0, '134.240')] [2024-06-15 15:19:52,497][1652491] Updated weights for policy 0, policy_version 308448 (0.0015) [2024-06-15 15:19:53,015][1652491] Updated weights for policy 0, policy_version 308478 (0.0013) [2024-06-15 15:19:55,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46967.7, 300 sec: 46430.6). Total num frames: 631832576. Throughput: 0: 11798.7. Samples: 158024704. Policy #0 lag: (min: 15.0, avg: 115.7, max: 271.0) [2024-06-15 15:19:55,956][1648985] Avg episode reward: [(0, '129.890')] [2024-06-15 15:19:56,200][1652491] Updated weights for policy 0, policy_version 308528 (0.0013) [2024-06-15 15:19:58,112][1652491] Updated weights for policy 0, policy_version 308600 (0.0015) [2024-06-15 15:20:00,747][1652491] Updated weights for policy 0, policy_version 308643 (0.0013) [2024-06-15 15:20:00,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 47513.9, 300 sec: 46874.9). Total num frames: 632127488. 
Throughput: 0: 11616.7. Samples: 158088192. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:20:00,956][1648985] Avg episode reward: [(0, '135.170')] [2024-06-15 15:20:03,466][1652491] Updated weights for policy 0, policy_version 308688 (0.0159) [2024-06-15 15:20:04,453][1652491] Updated weights for policy 0, policy_version 308736 (0.0015) [2024-06-15 15:20:05,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 632291328. Throughput: 0: 11639.5. Samples: 158161408. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:20:05,956][1648985] Avg episode reward: [(0, '140.650')] [2024-06-15 15:20:08,241][1652491] Updated weights for policy 0, policy_version 308804 (0.0071) [2024-06-15 15:20:09,260][1652491] Updated weights for policy 0, policy_version 308856 (0.0020) [2024-06-15 15:20:10,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 632553472. Throughput: 0: 11696.4. Samples: 158191616. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:20:10,956][1648985] Avg episode reward: [(0, '139.540')] [2024-06-15 15:20:11,170][1651469] Signal inference workers to stop experience collection... (16150 times) [2024-06-15 15:20:11,227][1652491] InferenceWorker_p0-w0: stopping experience collection (16150 times) [2024-06-15 15:20:11,350][1651469] Signal inference workers to resume experience collection... (16150 times) [2024-06-15 15:20:11,351][1652491] InferenceWorker_p0-w0: resuming experience collection (16150 times) [2024-06-15 15:20:11,965][1652491] Updated weights for policy 0, policy_version 308897 (0.0050) [2024-06-15 15:20:14,072][1652491] Updated weights for policy 0, policy_version 308932 (0.0014) [2024-06-15 15:20:15,459][1652491] Updated weights for policy 0, policy_version 308989 (0.0011) [2024-06-15 15:20:15,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 632815616. Throughput: 0: 11537.1. 
Samples: 158263808. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:20:15,955][1648985] Avg episode reward: [(0, '145.360')] [2024-06-15 15:20:18,912][1652491] Updated weights for policy 0, policy_version 309041 (0.0014) [2024-06-15 15:20:20,412][1652491] Updated weights for policy 0, policy_version 309104 (0.0014) [2024-06-15 15:20:20,956][1648985] Fps is (10 sec: 52425.5, 60 sec: 48059.2, 300 sec: 46652.6). Total num frames: 633077760. Throughput: 0: 11889.7. Samples: 158331392. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:20:20,956][1648985] Avg episode reward: [(0, '145.820')] [2024-06-15 15:20:23,131][1652491] Updated weights for policy 0, policy_version 309177 (0.0012) [2024-06-15 15:20:25,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 633274368. Throughput: 0: 11969.4. Samples: 158372352. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:20:25,956][1648985] Avg episode reward: [(0, '138.020')] [2024-06-15 15:20:26,273][1652491] Updated weights for policy 0, policy_version 309237 (0.0016) [2024-06-15 15:20:28,967][1652491] Updated weights for policy 0, policy_version 309268 (0.0013) [2024-06-15 15:20:30,857][1652491] Updated weights for policy 0, policy_version 309373 (0.0103) [2024-06-15 15:20:30,955][1648985] Fps is (10 sec: 52431.8, 60 sec: 48605.8, 300 sec: 46652.8). Total num frames: 633602048. Throughput: 0: 11753.2. Samples: 158440448. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:20:30,956][1648985] Avg episode reward: [(0, '147.740')] [2024-06-15 15:20:35,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 633733120. Throughput: 0: 11650.9. Samples: 158508544. 
Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:20:35,956][1648985] Avg episode reward: [(0, '140.870')] [2024-06-15 15:20:36,644][1652491] Updated weights for policy 0, policy_version 309445 (0.0013) [2024-06-15 15:20:37,977][1652491] Updated weights for policy 0, policy_version 309493 (0.0012) [2024-06-15 15:20:40,955][1648985] Fps is (10 sec: 29490.8, 60 sec: 46421.2, 300 sec: 46430.6). Total num frames: 633896960. Throughput: 0: 11593.9. Samples: 158546432. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:20:40,956][1648985] Avg episode reward: [(0, '116.340')] [2024-06-15 15:20:41,348][1652491] Updated weights for policy 0, policy_version 309542 (0.0011) [2024-06-15 15:20:42,976][1652491] Updated weights for policy 0, policy_version 309625 (0.0026) [2024-06-15 15:20:45,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 634191872. Throughput: 0: 11719.1. Samples: 158615552. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:20:45,955][1648985] Avg episode reward: [(0, '131.950')] [2024-06-15 15:20:46,343][1652491] Updated weights for policy 0, policy_version 309686 (0.0116) [2024-06-15 15:20:48,414][1652491] Updated weights for policy 0, policy_version 309734 (0.0012) [2024-06-15 15:20:50,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 634388480. Throughput: 0: 11741.9. Samples: 158689792. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:20:50,955][1648985] Avg episode reward: [(0, '134.790')] [2024-06-15 15:20:51,807][1652491] Updated weights for policy 0, policy_version 309780 (0.0012) [2024-06-15 15:20:53,105][1651469] Signal inference workers to stop experience collection... (16200 times) [2024-06-15 15:20:53,151][1652491] InferenceWorker_p0-w0: stopping experience collection (16200 times) [2024-06-15 15:20:53,314][1651469] Signal inference workers to resume experience collection... 
(16200 times) [2024-06-15 15:20:53,315][1652491] InferenceWorker_p0-w0: resuming experience collection (16200 times) [2024-06-15 15:20:53,808][1652491] Updated weights for policy 0, policy_version 309884 (0.0125) [2024-06-15 15:20:55,955][1648985] Fps is (10 sec: 45873.6, 60 sec: 46967.3, 300 sec: 46652.7). Total num frames: 634650624. Throughput: 0: 11639.4. Samples: 158715392. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:20:55,956][1648985] Avg episode reward: [(0, '147.570')] [2024-06-15 15:20:55,968][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000309888_634650624.pth... [2024-06-15 15:20:56,024][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000304400_623411200.pth [2024-06-15 15:20:57,184][1652491] Updated weights for policy 0, policy_version 309925 (0.0012) [2024-06-15 15:20:59,445][1652491] Updated weights for policy 0, policy_version 309973 (0.0013) [2024-06-15 15:21:00,425][1652491] Updated weights for policy 0, policy_version 310016 (0.0011) [2024-06-15 15:21:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 634912768. Throughput: 0: 11753.2. Samples: 158792704. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:21:00,955][1648985] Avg episode reward: [(0, '158.950')] [2024-06-15 15:21:04,233][1652491] Updated weights for policy 0, policy_version 310096 (0.0014) [2024-06-15 15:21:05,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 635174912. Throughput: 0: 11719.3. Samples: 158858752. 
Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:21:05,956][1648985] Avg episode reward: [(0, '153.660')] [2024-06-15 15:21:07,515][1652491] Updated weights for policy 0, policy_version 310146 (0.0017) [2024-06-15 15:21:08,965][1652491] Updated weights for policy 0, policy_version 310199 (0.0045) [2024-06-15 15:21:10,805][1652491] Updated weights for policy 0, policy_version 310241 (0.0014) [2024-06-15 15:21:10,958][1648985] Fps is (10 sec: 45860.1, 60 sec: 46964.9, 300 sec: 46874.4). Total num frames: 635371520. Throughput: 0: 11627.3. Samples: 158895616. Policy #0 lag: (min: 0.0, avg: 129.2, max: 256.0) [2024-06-15 15:21:10,959][1648985] Avg episode reward: [(0, '142.210')] [2024-06-15 15:21:14,072][1652491] Updated weights for policy 0, policy_version 310291 (0.0038) [2024-06-15 15:21:15,819][1652491] Updated weights for policy 0, policy_version 310354 (0.0013) [2024-06-15 15:21:15,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 635600896. Throughput: 0: 11685.0. Samples: 158966272. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:21:15,956][1648985] Avg episode reward: [(0, '153.560')] [2024-06-15 15:21:18,864][1652491] Updated weights for policy 0, policy_version 310407 (0.0022) [2024-06-15 15:21:19,800][1652491] Updated weights for policy 0, policy_version 310458 (0.0014) [2024-06-15 15:21:20,955][1648985] Fps is (10 sec: 45890.0, 60 sec: 45875.6, 300 sec: 46652.8). Total num frames: 635830272. Throughput: 0: 11764.6. Samples: 159037952. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:21:20,956][1648985] Avg episode reward: [(0, '162.460')] [2024-06-15 15:21:22,620][1652491] Updated weights for policy 0, policy_version 310512 (0.0012) [2024-06-15 15:21:25,646][1652491] Updated weights for policy 0, policy_version 310561 (0.0013) [2024-06-15 15:21:25,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 636059648. 
Throughput: 0: 11696.4. Samples: 159072768. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:21:25,956][1648985] Avg episode reward: [(0, '139.470')] [2024-06-15 15:21:27,489][1652491] Updated weights for policy 0, policy_version 310651 (0.0076) [2024-06-15 15:21:30,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 636321792. Throughput: 0: 11616.7. Samples: 159138304. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:21:30,955][1648985] Avg episode reward: [(0, '137.230')] [2024-06-15 15:21:31,049][1652491] Updated weights for policy 0, policy_version 310715 (0.0014) [2024-06-15 15:21:34,211][1652491] Updated weights for policy 0, policy_version 310776 (0.0017) [2024-06-15 15:21:35,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 636485632. Throughput: 0: 11571.2. Samples: 159210496. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:21:35,955][1648985] Avg episode reward: [(0, '133.440')] [2024-06-15 15:21:37,396][1652491] Updated weights for policy 0, policy_version 310816 (0.0013) [2024-06-15 15:21:37,743][1651469] Signal inference workers to stop experience collection... (16250 times) [2024-06-15 15:21:37,783][1652491] InferenceWorker_p0-w0: stopping experience collection (16250 times) [2024-06-15 15:21:37,945][1651469] Signal inference workers to resume experience collection... (16250 times) [2024-06-15 15:21:37,955][1652491] InferenceWorker_p0-w0: resuming experience collection (16250 times) [2024-06-15 15:21:38,520][1652491] Updated weights for policy 0, policy_version 310880 (0.0013) [2024-06-15 15:21:40,798][1652491] Updated weights for policy 0, policy_version 310913 (0.0014) [2024-06-15 15:21:40,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 636747776. Throughput: 0: 11707.8. Samples: 159242240. 
Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:21:40,956][1648985] Avg episode reward: [(0, '119.720')] [2024-06-15 15:21:43,993][1652491] Updated weights for policy 0, policy_version 310980 (0.0013) [2024-06-15 15:21:45,402][1652491] Updated weights for policy 0, policy_version 311040 (0.0013) [2024-06-15 15:21:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 637009920. Throughput: 0: 11628.1. Samples: 159315968. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:21:45,956][1648985] Avg episode reward: [(0, '117.800')] [2024-06-15 15:21:49,788][1652491] Updated weights for policy 0, policy_version 311136 (0.0014) [2024-06-15 15:21:50,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 637272064. Throughput: 0: 11593.9. Samples: 159380480. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:21:50,956][1648985] Avg episode reward: [(0, '126.740')] [2024-06-15 15:21:52,681][1652491] Updated weights for policy 0, policy_version 311185 (0.0012) [2024-06-15 15:21:55,479][1652491] Updated weights for policy 0, policy_version 311236 (0.0027) [2024-06-15 15:21:55,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.5, 300 sec: 46763.8). Total num frames: 637435904. Throughput: 0: 11674.4. Samples: 159420928. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:21:55,955][1648985] Avg episode reward: [(0, '130.990')] [2024-06-15 15:21:58,887][1652491] Updated weights for policy 0, policy_version 311302 (0.0014) [2024-06-15 15:22:00,014][1652491] Updated weights for policy 0, policy_version 311360 (0.0013) [2024-06-15 15:22:00,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 637763584. Throughput: 0: 11628.1. Samples: 159489536. 
Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:22:00,956][1648985] Avg episode reward: [(0, '135.660')] [2024-06-15 15:22:01,077][1652491] Updated weights for policy 0, policy_version 311413 (0.0089) [2024-06-15 15:22:03,850][1652491] Updated weights for policy 0, policy_version 311459 (0.0016) [2024-06-15 15:22:05,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 637927424. Throughput: 0: 11741.9. Samples: 159566336. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:22:05,956][1648985] Avg episode reward: [(0, '127.010')] [2024-06-15 15:22:07,037][1652491] Updated weights for policy 0, policy_version 311504 (0.0018) [2024-06-15 15:22:08,092][1652491] Updated weights for policy 0, policy_version 311549 (0.0012) [2024-06-15 15:22:10,627][1652491] Updated weights for policy 0, policy_version 311632 (0.0014) [2024-06-15 15:22:10,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 47516.1, 300 sec: 47208.1). Total num frames: 638222336. Throughput: 0: 11787.4. Samples: 159603200. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:22:10,956][1648985] Avg episode reward: [(0, '119.910')] [2024-06-15 15:22:11,692][1652491] Updated weights for policy 0, policy_version 311680 (0.0013) [2024-06-15 15:22:15,511][1652491] Updated weights for policy 0, policy_version 311739 (0.0012) [2024-06-15 15:22:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 47513.7, 300 sec: 47097.1). Total num frames: 638451712. Throughput: 0: 11855.6. Samples: 159671808. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:22:15,956][1648985] Avg episode reward: [(0, '125.580')] [2024-06-15 15:22:18,490][1652491] Updated weights for policy 0, policy_version 311776 (0.0012) [2024-06-15 15:22:20,909][1651469] Signal inference workers to stop experience collection... (16300 times) [2024-06-15 15:22:20,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 638615552. 
Throughput: 0: 11787.4. Samples: 159740928. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:22:20,955][1648985] Avg episode reward: [(0, '121.330')] [2024-06-15 15:22:21,024][1652491] InferenceWorker_p0-w0: stopping experience collection (16300 times) [2024-06-15 15:22:21,138][1651469] Signal inference workers to resume experience collection... (16300 times) [2024-06-15 15:22:21,138][1652491] InferenceWorker_p0-w0: resuming experience collection (16300 times) [2024-06-15 15:22:21,140][1652491] Updated weights for policy 0, policy_version 311840 (0.0011) [2024-06-15 15:22:22,856][1652491] Updated weights for policy 0, policy_version 311920 (0.0020) [2024-06-15 15:22:25,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 46421.3, 300 sec: 46985.9). Total num frames: 638844928. Throughput: 0: 11764.6. Samples: 159771648. Policy #0 lag: (min: 31.0, avg: 128.9, max: 287.0) [2024-06-15 15:22:25,956][1648985] Avg episode reward: [(0, '141.490')] [2024-06-15 15:22:26,531][1652491] Updated weights for policy 0, policy_version 311968 (0.0014) [2024-06-15 15:22:29,545][1652491] Updated weights for policy 0, policy_version 312022 (0.0017) [2024-06-15 15:22:30,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 639107072. Throughput: 0: 11696.4. Samples: 159842304. Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:22:30,956][1648985] Avg episode reward: [(0, '129.330')] [2024-06-15 15:22:32,058][1652491] Updated weights for policy 0, policy_version 312068 (0.0014) [2024-06-15 15:22:33,435][1652491] Updated weights for policy 0, policy_version 312128 (0.0013) [2024-06-15 15:22:34,779][1652491] Updated weights for policy 0, policy_version 312187 (0.0019) [2024-06-15 15:22:35,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 639369216. Throughput: 0: 11776.0. Samples: 159910400. 
Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:22:35,956][1648985] Avg episode reward: [(0, '119.940')] [2024-06-15 15:22:40,749][1652491] Updated weights for policy 0, policy_version 312272 (0.0014) [2024-06-15 15:22:40,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 639533056. Throughput: 0: 11593.9. Samples: 159942656. Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:22:40,956][1648985] Avg episode reward: [(0, '119.980')] [2024-06-15 15:22:43,372][1652491] Updated weights for policy 0, policy_version 312322 (0.0060) [2024-06-15 15:22:44,849][1652491] Updated weights for policy 0, policy_version 312385 (0.0012) [2024-06-15 15:22:45,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46967.5, 300 sec: 47321.0). Total num frames: 639827968. Throughput: 0: 11696.4. Samples: 160015872. Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:22:45,956][1648985] Avg episode reward: [(0, '120.710')] [2024-06-15 15:22:46,141][1652491] Updated weights for policy 0, policy_version 312440 (0.0015) [2024-06-15 15:22:49,628][1652491] Updated weights for policy 0, policy_version 312503 (0.0013) [2024-06-15 15:22:50,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 45875.3, 300 sec: 46763.8). Total num frames: 640024576. Throughput: 0: 11605.3. Samples: 160088576. Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:22:50,955][1648985] Avg episode reward: [(0, '141.970')] [2024-06-15 15:22:52,808][1652491] Updated weights for policy 0, policy_version 312573 (0.0087) [2024-06-15 15:22:55,292][1652491] Updated weights for policy 0, policy_version 312633 (0.0014) [2024-06-15 15:22:55,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 640286720. Throughput: 0: 11559.8. Samples: 160123392. 
Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:22:55,955][1648985] Avg episode reward: [(0, '158.060')] [2024-06-15 15:22:56,466][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000312672_640352256.pth... [2024-06-15 15:22:56,594][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000307136_629014528.pth [2024-06-15 15:22:57,026][1652491] Updated weights for policy 0, policy_version 312698 (0.0027) [2024-06-15 15:23:00,640][1652491] Updated weights for policy 0, policy_version 312760 (0.0012) [2024-06-15 15:23:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46421.4, 300 sec: 46986.0). Total num frames: 640548864. Throughput: 0: 11559.8. Samples: 160192000. Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:23:00,956][1648985] Avg episode reward: [(0, '138.150')] [2024-06-15 15:23:04,202][1652491] Updated weights for policy 0, policy_version 312817 (0.0063) [2024-06-15 15:23:04,711][1651469] Signal inference workers to stop experience collection... (16350 times) [2024-06-15 15:23:04,787][1652491] InferenceWorker_p0-w0: stopping experience collection (16350 times) [2024-06-15 15:23:04,952][1651469] Signal inference workers to resume experience collection... (16350 times) [2024-06-15 15:23:04,953][1652491] InferenceWorker_p0-w0: resuming experience collection (16350 times) [2024-06-15 15:23:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 640811008. Throughput: 0: 11525.7. Samples: 160259584. Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:23:05,956][1648985] Avg episode reward: [(0, '138.730')] [2024-06-15 15:23:07,281][1652491] Updated weights for policy 0, policy_version 312912 (0.0160) [2024-06-15 15:23:08,077][1652491] Updated weights for policy 0, policy_version 312955 (0.0014) [2024-06-15 15:23:10,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45329.1, 300 sec: 47097.0). Total num frames: 640942080. 
Throughput: 0: 11696.4. Samples: 160297984. Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:23:10,956][1648985] Avg episode reward: [(0, '144.510')] [2024-06-15 15:23:11,585][1652491] Updated weights for policy 0, policy_version 312993 (0.0015) [2024-06-15 15:23:12,231][1652491] Updated weights for policy 0, policy_version 313024 (0.0009) [2024-06-15 15:23:14,724][1652491] Updated weights for policy 0, policy_version 313078 (0.0012) [2024-06-15 15:23:15,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 641269760. Throughput: 0: 11889.7. Samples: 160377344. Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:23:15,956][1648985] Avg episode reward: [(0, '133.510')] [2024-06-15 15:23:16,391][1652491] Updated weights for policy 0, policy_version 313143 (0.0014) [2024-06-15 15:23:18,500][1652491] Updated weights for policy 0, policy_version 313186 (0.0013) [2024-06-15 15:23:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 641466368. Throughput: 0: 11878.4. Samples: 160444928. Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:23:20,956][1648985] Avg episode reward: [(0, '136.080')] [2024-06-15 15:23:22,135][1652491] Updated weights for policy 0, policy_version 313237 (0.0014) [2024-06-15 15:23:22,979][1652491] Updated weights for policy 0, policy_version 313278 (0.0026) [2024-06-15 15:23:25,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 47513.8, 300 sec: 46986.0). Total num frames: 641695744. Throughput: 0: 12151.5. Samples: 160489472. 
Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:23:25,956][1648985] Avg episode reward: [(0, '140.650')] [2024-06-15 15:23:26,464][1652491] Updated weights for policy 0, policy_version 313348 (0.0017) [2024-06-15 15:23:28,861][1652491] Updated weights for policy 0, policy_version 313424 (0.0136) [2024-06-15 15:23:30,225][1652491] Updated weights for policy 0, policy_version 313472 (0.0022) [2024-06-15 15:23:30,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48059.6, 300 sec: 47319.2). Total num frames: 641990656. Throughput: 0: 11844.2. Samples: 160548864. Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:23:30,956][1648985] Avg episode reward: [(0, '142.790')] [2024-06-15 15:23:34,068][1652491] Updated weights for policy 0, policy_version 313536 (0.0014) [2024-06-15 15:23:35,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 642121728. Throughput: 0: 11969.5. Samples: 160627200. Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:23:35,955][1648985] Avg episode reward: [(0, '153.630')] [2024-06-15 15:23:37,920][1652491] Updated weights for policy 0, policy_version 313616 (0.0015) [2024-06-15 15:23:38,977][1652491] Updated weights for policy 0, policy_version 313664 (0.0011) [2024-06-15 15:23:40,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 48605.9, 300 sec: 47097.0). Total num frames: 642449408. Throughput: 0: 11776.0. Samples: 160653312. Policy #0 lag: (min: 8.0, avg: 113.0, max: 264.0) [2024-06-15 15:23:40,956][1648985] Avg episode reward: [(0, '166.440')] [2024-06-15 15:23:41,288][1652491] Updated weights for policy 0, policy_version 313720 (0.0021) [2024-06-15 15:23:44,989][1652491] Updated weights for policy 0, policy_version 313763 (0.0014) [2024-06-15 15:23:45,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 642646016. Throughput: 0: 11844.3. Samples: 160724992. 
Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:23:45,955][1648985] Avg episode reward: [(0, '164.020')] [2024-06-15 15:23:48,096][1651469] Signal inference workers to stop experience collection... (16400 times) [2024-06-15 15:23:48,143][1652491] InferenceWorker_p0-w0: stopping experience collection (16400 times) [2024-06-15 15:23:48,157][1652491] Updated weights for policy 0, policy_version 313832 (0.0041) [2024-06-15 15:23:48,274][1651469] Signal inference workers to resume experience collection... (16400 times) [2024-06-15 15:23:48,275][1652491] InferenceWorker_p0-w0: resuming experience collection (16400 times) [2024-06-15 15:23:48,868][1652491] Updated weights for policy 0, policy_version 313859 (0.0020) [2024-06-15 15:23:50,075][1652491] Updated weights for policy 0, policy_version 313920 (0.0016) [2024-06-15 15:23:50,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 642908160. Throughput: 0: 11935.3. Samples: 160796672. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:23:50,955][1648985] Avg episode reward: [(0, '158.510')] [2024-06-15 15:23:52,549][1652491] Updated weights for policy 0, policy_version 313984 (0.0013) [2024-06-15 15:23:55,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 643104768. Throughput: 0: 11878.4. Samples: 160832512. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:23:55,956][1648985] Avg episode reward: [(0, '141.580')] [2024-06-15 15:23:58,470][1652491] Updated weights for policy 0, policy_version 314051 (0.0098) [2024-06-15 15:23:59,447][1652491] Updated weights for policy 0, policy_version 314098 (0.0079) [2024-06-15 15:24:00,940][1652491] Updated weights for policy 0, policy_version 314172 (0.0014) [2024-06-15 15:24:00,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 643399680. Throughput: 0: 11776.0. Samples: 160907264. 
Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:24:00,955][1648985] Avg episode reward: [(0, '137.230')] [2024-06-15 15:24:03,584][1652491] Updated weights for policy 0, policy_version 314229 (0.0042) [2024-06-15 15:24:05,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 643563520. Throughput: 0: 11821.5. Samples: 160976896. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:24:05,956][1648985] Avg episode reward: [(0, '134.810')] [2024-06-15 15:24:07,297][1652491] Updated weights for policy 0, policy_version 314272 (0.0037) [2024-06-15 15:24:07,861][1652491] Updated weights for policy 0, policy_version 314304 (0.0012) [2024-06-15 15:24:10,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 643825664. Throughput: 0: 11707.7. Samples: 161016320. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:24:10,956][1648985] Avg episode reward: [(0, '147.340')] [2024-06-15 15:24:10,994][1652491] Updated weights for policy 0, policy_version 314378 (0.0032) [2024-06-15 15:24:11,836][1652491] Updated weights for policy 0, policy_version 314420 (0.0011) [2024-06-15 15:24:13,541][1652491] Updated weights for policy 0, policy_version 314487 (0.0118) [2024-06-15 15:24:15,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.5, 300 sec: 47097.0). Total num frames: 644087808. Throughput: 0: 11855.7. Samples: 161082368. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:24:15,956][1648985] Avg episode reward: [(0, '134.360')] [2024-06-15 15:24:18,592][1652491] Updated weights for policy 0, policy_version 314530 (0.0013) [2024-06-15 15:24:20,815][1652491] Updated weights for policy 0, policy_version 314577 (0.0013) [2024-06-15 15:24:20,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46421.3, 300 sec: 46652.8). Total num frames: 644251648. Throughput: 0: 11878.4. Samples: 161161728. 
Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:24:20,956][1648985] Avg episode reward: [(0, '127.900')] [2024-06-15 15:24:22,356][1652491] Updated weights for policy 0, policy_version 314641 (0.0139) [2024-06-15 15:24:23,153][1652491] Updated weights for policy 0, policy_version 314682 (0.0021) [2024-06-15 15:24:24,584][1652491] Updated weights for policy 0, policy_version 314736 (0.0015) [2024-06-15 15:24:25,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 644612096. Throughput: 0: 11901.2. Samples: 161188864. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:24:25,955][1648985] Avg episode reward: [(0, '141.490')] [2024-06-15 15:24:29,068][1651469] Signal inference workers to stop experience collection... (16450 times) [2024-06-15 15:24:29,106][1652491] InferenceWorker_p0-w0: stopping experience collection (16450 times) [2024-06-15 15:24:29,334][1651469] Signal inference workers to resume experience collection... (16450 times) [2024-06-15 15:24:29,335][1652491] InferenceWorker_p0-w0: resuming experience collection (16450 times) [2024-06-15 15:24:29,336][1652491] Updated weights for policy 0, policy_version 314784 (0.0016) [2024-06-15 15:24:30,144][1652491] Updated weights for policy 0, policy_version 314814 (0.0014) [2024-06-15 15:24:30,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 644743168. Throughput: 0: 11969.4. Samples: 161263616. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:24:30,956][1648985] Avg episode reward: [(0, '137.290')] [2024-06-15 15:24:32,587][1652491] Updated weights for policy 0, policy_version 314875 (0.0014) [2024-06-15 15:24:33,927][1652491] Updated weights for policy 0, policy_version 314941 (0.0095) [2024-06-15 15:24:35,382][1652491] Updated weights for policy 0, policy_version 314999 (0.0012) [2024-06-15 15:24:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 50244.1, 300 sec: 47541.4). 
Total num frames: 645136384. Throughput: 0: 11764.6. Samples: 161326080. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:24:35,956][1648985] Avg episode reward: [(0, '138.430')] [2024-06-15 15:24:40,572][1652491] Updated weights for policy 0, policy_version 315041 (0.0011) [2024-06-15 15:24:40,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 645234688. Throughput: 0: 12037.7. Samples: 161374208. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:24:40,956][1648985] Avg episode reward: [(0, '141.230')] [2024-06-15 15:24:41,155][1652491] Updated weights for policy 0, policy_version 315072 (0.0031) [2024-06-15 15:24:43,471][1652491] Updated weights for policy 0, policy_version 315136 (0.0016) [2024-06-15 15:24:44,893][1652491] Updated weights for policy 0, policy_version 315200 (0.0013) [2024-06-15 15:24:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 47319.2). Total num frames: 645595136. Throughput: 0: 11901.1. Samples: 161442816. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:24:45,956][1648985] Avg episode reward: [(0, '166.890')] [2024-06-15 15:24:46,441][1652491] Updated weights for policy 0, policy_version 315259 (0.0014) [2024-06-15 15:24:50,961][1648985] Fps is (10 sec: 49124.0, 60 sec: 46962.9, 300 sec: 47096.2). Total num frames: 645726208. Throughput: 0: 12127.2. Samples: 161522688. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:24:50,962][1648985] Avg episode reward: [(0, '166.890')] [2024-06-15 15:24:51,519][1652491] Updated weights for policy 0, policy_version 315322 (0.0044) [2024-06-15 15:24:55,210][1652491] Updated weights for policy 0, policy_version 315392 (0.0020) [2024-06-15 15:24:55,955][1648985] Fps is (10 sec: 36043.7, 60 sec: 47513.4, 300 sec: 46874.9). Total num frames: 645955584. Throughput: 0: 12037.6. Samples: 161558016. 
Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:24:55,956][1648985] Avg episode reward: [(0, '158.310')] [2024-06-15 15:24:56,424][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000315440_646021120.pth... [2024-06-15 15:24:56,552][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000309888_634650624.pth [2024-06-15 15:24:57,054][1652491] Updated weights for policy 0, policy_version 315460 (0.0012) [2024-06-15 15:25:00,955][1648985] Fps is (10 sec: 45901.3, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 646184960. Throughput: 0: 11844.3. Samples: 161615360. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 15:25:00,956][1648985] Avg episode reward: [(0, '152.020')] [2024-06-15 15:25:02,072][1652491] Updated weights for policy 0, policy_version 315536 (0.0130) [2024-06-15 15:25:05,466][1652491] Updated weights for policy 0, policy_version 315587 (0.0046) [2024-06-15 15:25:05,955][1648985] Fps is (10 sec: 39322.7, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 646348800. Throughput: 0: 11821.5. Samples: 161693696. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:25:05,956][1648985] Avg episode reward: [(0, '151.910')] [2024-06-15 15:25:07,147][1652491] Updated weights for policy 0, policy_version 315664 (0.0067) [2024-06-15 15:25:07,601][1651469] Signal inference workers to stop experience collection... (16500 times) [2024-06-15 15:25:07,663][1652491] InferenceWorker_p0-w0: stopping experience collection (16500 times) [2024-06-15 15:25:07,847][1651469] Signal inference workers to resume experience collection... 
(16500 times) [2024-06-15 15:25:07,848][1652491] InferenceWorker_p0-w0: resuming experience collection (16500 times) [2024-06-15 15:25:08,638][1652491] Updated weights for policy 0, policy_version 315728 (0.0044) [2024-06-15 15:25:09,824][1652491] Updated weights for policy 0, policy_version 315776 (0.0012) [2024-06-15 15:25:10,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 646709248. Throughput: 0: 11753.2. Samples: 161717760. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:25:10,956][1648985] Avg episode reward: [(0, '142.460')] [2024-06-15 15:25:14,021][1652491] Updated weights for policy 0, policy_version 315829 (0.0012) [2024-06-15 15:25:15,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 646840320. Throughput: 0: 11855.7. Samples: 161797120. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:25:15,956][1648985] Avg episode reward: [(0, '153.140')] [2024-06-15 15:25:17,679][1652491] Updated weights for policy 0, policy_version 315889 (0.0014) [2024-06-15 15:25:19,926][1652491] Updated weights for policy 0, policy_version 316000 (0.0015) [2024-06-15 15:25:20,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 49698.1, 300 sec: 47319.2). Total num frames: 647233536. Throughput: 0: 11832.9. Samples: 161858560. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:25:20,956][1648985] Avg episode reward: [(0, '142.700')] [2024-06-15 15:25:23,990][1652491] Updated weights for policy 0, policy_version 316048 (0.0015) [2024-06-15 15:25:25,981][1648985] Fps is (10 sec: 52293.0, 60 sec: 45855.4, 300 sec: 46648.6). Total num frames: 647364608. Throughput: 0: 11871.6. Samples: 161908736. 
Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:25:25,982][1648985] Avg episode reward: [(0, '152.230')] [2024-06-15 15:25:26,901][1652491] Updated weights for policy 0, policy_version 316097 (0.0055) [2024-06-15 15:25:28,552][1652491] Updated weights for policy 0, policy_version 316165 (0.0013) [2024-06-15 15:25:30,129][1652491] Updated weights for policy 0, policy_version 316229 (0.0015) [2024-06-15 15:25:30,954][1652491] Updated weights for policy 0, policy_version 316285 (0.0027) [2024-06-15 15:25:30,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 49698.2, 300 sec: 47430.3). Total num frames: 647725056. Throughput: 0: 11730.5. Samples: 161970688. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:25:30,956][1648985] Avg episode reward: [(0, '155.650')] [2024-06-15 15:25:35,955][1648985] Fps is (10 sec: 49279.6, 60 sec: 45329.0, 300 sec: 47319.2). Total num frames: 647856128. Throughput: 0: 11697.8. Samples: 162049024. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:25:35,956][1648985] Avg episode reward: [(0, '154.690')] [2024-06-15 15:25:37,228][1652491] Updated weights for policy 0, policy_version 316356 (0.0015) [2024-06-15 15:25:38,297][1652491] Updated weights for policy 0, policy_version 316415 (0.0012) [2024-06-15 15:25:40,965][1648985] Fps is (10 sec: 42555.8, 60 sec: 48597.8, 300 sec: 47317.6). Total num frames: 648151040. Throughput: 0: 11716.6. Samples: 162085376. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:25:40,966][1648985] Avg episode reward: [(0, '149.620')] [2024-06-15 15:25:41,504][1652491] Updated weights for policy 0, policy_version 316512 (0.0110) [2024-06-15 15:25:42,244][1652491] Updated weights for policy 0, policy_version 316544 (0.0012) [2024-06-15 15:25:45,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 44782.9, 300 sec: 47097.1). Total num frames: 648282112. Throughput: 0: 11901.2. Samples: 162150912. 
Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:25:45,956][1648985] Avg episode reward: [(0, '152.080')] [2024-06-15 15:25:47,057][1652491] Updated weights for policy 0, policy_version 316592 (0.0012) [2024-06-15 15:25:48,673][1651469] Signal inference workers to stop experience collection... (16550 times) [2024-06-15 15:25:48,707][1652491] InferenceWorker_p0-w0: stopping experience collection (16550 times) [2024-06-15 15:25:48,719][1652491] Updated weights for policy 0, policy_version 316627 (0.0012) [2024-06-15 15:25:48,934][1651469] Signal inference workers to resume experience collection... (16550 times) [2024-06-15 15:25:48,935][1652491] InferenceWorker_p0-w0: resuming experience collection (16550 times) [2024-06-15 15:25:50,939][1652491] Updated weights for policy 0, policy_version 316690 (0.0014) [2024-06-15 15:25:50,989][1648985] Fps is (10 sec: 42496.7, 60 sec: 47491.2, 300 sec: 47202.7). Total num frames: 648577024. Throughput: 0: 11755.8. Samples: 162223104. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:25:50,990][1648985] Avg episode reward: [(0, '131.540')] [2024-06-15 15:25:52,849][1652491] Updated weights for policy 0, policy_version 316768 (0.0015) [2024-06-15 15:25:55,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.9, 300 sec: 47097.1). Total num frames: 648806400. Throughput: 0: 11810.2. Samples: 162249216. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:25:55,955][1648985] Avg episode reward: [(0, '147.260')] [2024-06-15 15:25:58,012][1652491] Updated weights for policy 0, policy_version 316816 (0.0012) [2024-06-15 15:25:59,120][1652491] Updated weights for policy 0, policy_version 316855 (0.0011) [2024-06-15 15:26:00,955][1648985] Fps is (10 sec: 42743.7, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 649003008. Throughput: 0: 11753.3. Samples: 162326016. 
Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:26:00,956][1648985] Avg episode reward: [(0, '137.830')] [2024-06-15 15:26:01,562][1652491] Updated weights for policy 0, policy_version 316925 (0.0013) [2024-06-15 15:26:03,320][1652491] Updated weights for policy 0, policy_version 316992 (0.0013) [2024-06-15 15:26:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 49698.2, 300 sec: 47319.7). Total num frames: 649330688. Throughput: 0: 11650.9. Samples: 162382848. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:26:05,955][1648985] Avg episode reward: [(0, '152.120')] [2024-06-15 15:26:09,887][1652491] Updated weights for policy 0, policy_version 317072 (0.0013) [2024-06-15 15:26:10,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45329.1, 300 sec: 46874.9). Total num frames: 649428992. Throughput: 0: 11532.3. Samples: 162427392. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:26:10,956][1648985] Avg episode reward: [(0, '143.590')] [2024-06-15 15:26:12,580][1652491] Updated weights for policy 0, policy_version 317127 (0.0016) [2024-06-15 15:26:13,759][1652491] Updated weights for policy 0, policy_version 317184 (0.0022) [2024-06-15 15:26:15,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48605.9, 300 sec: 47208.2). Total num frames: 649756672. Throughput: 0: 11502.9. Samples: 162488320. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 15:26:15,955][1648985] Avg episode reward: [(0, '153.700')] [2024-06-15 15:26:15,979][1652491] Updated weights for policy 0, policy_version 317267 (0.0014) [2024-06-15 15:26:16,836][1652491] Updated weights for policy 0, policy_version 317308 (0.0020) [2024-06-15 15:26:20,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 43690.6, 300 sec: 46763.8). Total num frames: 649854976. Throughput: 0: 11468.8. Samples: 162565120. 
Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:26:20,956][1648985] Avg episode reward: [(0, '158.290')] [2024-06-15 15:26:22,735][1652491] Updated weights for policy 0, policy_version 317360 (0.0136) [2024-06-15 15:26:24,423][1652491] Updated weights for policy 0, policy_version 317424 (0.0021) [2024-06-15 15:26:25,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46987.7, 300 sec: 46986.0). Total num frames: 650182656. Throughput: 0: 11437.2. Samples: 162599936. Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:26:25,956][1648985] Avg episode reward: [(0, '162.970')] [2024-06-15 15:26:26,735][1652491] Updated weights for policy 0, policy_version 317520 (0.0120) [2024-06-15 15:26:27,885][1652491] Updated weights for policy 0, policy_version 317568 (0.0095) [2024-06-15 15:26:30,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 44236.8, 300 sec: 47097.1). Total num frames: 650379264. Throughput: 0: 11366.4. Samples: 162662400. Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:26:30,956][1648985] Avg episode reward: [(0, '137.720')] [2024-06-15 15:26:33,389][1651469] Signal inference workers to stop experience collection... (16600 times) [2024-06-15 15:26:33,483][1652491] InferenceWorker_p0-w0: stopping experience collection (16600 times) [2024-06-15 15:26:33,676][1651469] Signal inference workers to resume experience collection... (16600 times) [2024-06-15 15:26:33,677][1652491] InferenceWorker_p0-w0: resuming experience collection (16600 times) [2024-06-15 15:26:34,752][1652491] Updated weights for policy 0, policy_version 317628 (0.0016) [2024-06-15 15:26:35,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 650608640. Throughput: 0: 11420.5. Samples: 162736640. 
Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:26:35,956][1648985] Avg episode reward: [(0, '129.380')] [2024-06-15 15:26:36,309][1652491] Updated weights for policy 0, policy_version 317697 (0.0012) [2024-06-15 15:26:38,011][1652491] Updated weights for policy 0, policy_version 317761 (0.0012) [2024-06-15 15:26:39,538][1652491] Updated weights for policy 0, policy_version 317822 (0.0011) [2024-06-15 15:26:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45882.9, 300 sec: 47097.1). Total num frames: 650903552. Throughput: 0: 11400.5. Samples: 162762240. Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:26:40,956][1648985] Avg episode reward: [(0, '115.640')] [2024-06-15 15:26:45,704][1652491] Updated weights for policy 0, policy_version 317875 (0.0025) [2024-06-15 15:26:45,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 45328.8, 300 sec: 46541.6). Total num frames: 651001856. Throughput: 0: 11468.7. Samples: 162842112. Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:26:45,956][1648985] Avg episode reward: [(0, '146.280')] [2024-06-15 15:26:46,970][1652491] Updated weights for policy 0, policy_version 317904 (0.0013) [2024-06-15 15:26:48,690][1652491] Updated weights for policy 0, policy_version 317984 (0.0013) [2024-06-15 15:26:50,295][1652491] Updated weights for policy 0, policy_version 318033 (0.0028) [2024-06-15 15:26:50,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46447.7, 300 sec: 47208.1). Total num frames: 651362304. Throughput: 0: 11502.9. Samples: 162900480. Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:26:50,955][1648985] Avg episode reward: [(0, '153.900')] [2024-06-15 15:26:55,955][1648985] Fps is (10 sec: 42599.6, 60 sec: 43690.6, 300 sec: 46319.5). Total num frames: 651427840. Throughput: 0: 11502.9. Samples: 162945024. 
Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:26:55,956][1648985] Avg episode reward: [(0, '168.170')] [2024-06-15 15:26:56,177][1652491] Updated weights for policy 0, policy_version 318096 (0.0012) [2024-06-15 15:26:56,577][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000318112_651493376.pth... [2024-06-15 15:26:56,715][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000312672_640352256.pth [2024-06-15 15:26:57,338][1652491] Updated weights for policy 0, policy_version 318144 (0.0015) [2024-06-15 15:26:59,190][1652491] Updated weights for policy 0, policy_version 318193 (0.0010) [2024-06-15 15:27:00,887][1652491] Updated weights for policy 0, policy_version 318272 (0.0018) [2024-06-15 15:27:00,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.4, 300 sec: 47097.0). Total num frames: 651821056. Throughput: 0: 11491.5. Samples: 163005440. Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:27:00,956][1648985] Avg episode reward: [(0, '160.980')] [2024-06-15 15:27:02,128][1652491] Updated weights for policy 0, policy_version 318328 (0.0020) [2024-06-15 15:27:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 46541.7). Total num frames: 651952128. Throughput: 0: 11480.2. Samples: 163081728. Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:27:05,955][1648985] Avg episode reward: [(0, '133.590')] [2024-06-15 15:27:08,057][1652491] Updated weights for policy 0, policy_version 318355 (0.0011) [2024-06-15 15:27:09,964][1652491] Updated weights for policy 0, policy_version 318417 (0.0011) [2024-06-15 15:27:10,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 45875.2, 300 sec: 46541.6). Total num frames: 652181504. Throughput: 0: 11434.7. Samples: 163114496. 
Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:27:10,956][1648985] Avg episode reward: [(0, '119.300')] [2024-06-15 15:27:11,253][1651469] Signal inference workers to stop experience collection... (16650 times) [2024-06-15 15:27:11,281][1652491] InferenceWorker_p0-w0: stopping experience collection (16650 times) [2024-06-15 15:27:11,553][1651469] Signal inference workers to resume experience collection... (16650 times) [2024-06-15 15:27:11,554][1652491] InferenceWorker_p0-w0: resuming experience collection (16650 times) [2024-06-15 15:27:12,486][1652491] Updated weights for policy 0, policy_version 318516 (0.0013) [2024-06-15 15:27:15,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45329.0, 300 sec: 46986.0). Total num frames: 652476416. Throughput: 0: 11275.4. Samples: 163169792. Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:27:15,956][1648985] Avg episode reward: [(0, '133.080')] [2024-06-15 15:27:20,249][1652491] Updated weights for policy 0, policy_version 318593 (0.0015) [2024-06-15 15:27:20,955][1648985] Fps is (10 sec: 36045.2, 60 sec: 44783.1, 300 sec: 46430.6). Total num frames: 652541952. Throughput: 0: 11320.9. Samples: 163246080. Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:27:20,956][1648985] Avg episode reward: [(0, '132.390')] [2024-06-15 15:27:21,597][1652491] Updated weights for policy 0, policy_version 318656 (0.0015) [2024-06-15 15:27:23,241][1652491] Updated weights for policy 0, policy_version 318706 (0.0011) [2024-06-15 15:27:24,698][1652491] Updated weights for policy 0, policy_version 318768 (0.0011) [2024-06-15 15:27:25,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 652935168. Throughput: 0: 11355.0. Samples: 163273216. 
Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:27:25,956][1648985] Avg episode reward: [(0, '133.690')] [2024-06-15 15:27:26,488][1652491] Updated weights for policy 0, policy_version 318840 (0.0013) [2024-06-15 15:27:30,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 653000704. Throughput: 0: 11127.5. Samples: 163342848. Policy #0 lag: (min: 159.0, avg: 246.2, max: 351.0) [2024-06-15 15:27:30,956][1648985] Avg episode reward: [(0, '143.500')] [2024-06-15 15:27:32,322][1652491] Updated weights for policy 0, policy_version 318867 (0.0012) [2024-06-15 15:27:33,276][1652491] Updated weights for policy 0, policy_version 318912 (0.0012) [2024-06-15 15:27:34,933][1652491] Updated weights for policy 0, policy_version 318980 (0.0014) [2024-06-15 15:27:35,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 653361152. Throughput: 0: 11286.7. Samples: 163408384. Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:27:35,956][1648985] Avg episode reward: [(0, '154.190')] [2024-06-15 15:27:36,493][1652491] Updated weights for policy 0, policy_version 319056 (0.0013) [2024-06-15 15:27:37,607][1652491] Updated weights for policy 0, policy_version 319104 (0.0013) [2024-06-15 15:27:40,956][1648985] Fps is (10 sec: 52426.9, 60 sec: 43690.2, 300 sec: 46430.5). Total num frames: 653524992. Throughput: 0: 11093.2. Samples: 163444224. Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:27:40,957][1648985] Avg episode reward: [(0, '169.950')] [2024-06-15 15:27:44,070][1652491] Updated weights for policy 0, policy_version 319152 (0.0016) [2024-06-15 15:27:45,581][1652491] Updated weights for policy 0, policy_version 319203 (0.0080) [2024-06-15 15:27:45,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.5, 300 sec: 46541.7). Total num frames: 653754368. Throughput: 0: 11480.2. Samples: 163522048. 
Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:27:45,956][1648985] Avg episode reward: [(0, '157.550')] [2024-06-15 15:27:47,746][1652491] Updated weights for policy 0, policy_version 319298 (0.0015) [2024-06-15 15:27:49,162][1652491] Updated weights for policy 0, policy_version 319358 (0.0013) [2024-06-15 15:27:50,955][1648985] Fps is (10 sec: 52431.7, 60 sec: 44782.9, 300 sec: 46652.7). Total num frames: 654049280. Throughput: 0: 11070.6. Samples: 163579904. Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:27:50,956][1648985] Avg episode reward: [(0, '149.410')] [2024-06-15 15:27:54,172][1651469] Signal inference workers to stop experience collection... (16700 times) [2024-06-15 15:27:54,211][1652491] InferenceWorker_p0-w0: stopping experience collection (16700 times) [2024-06-15 15:27:54,375][1651469] Signal inference workers to resume experience collection... (16700 times) [2024-06-15 15:27:54,376][1652491] InferenceWorker_p0-w0: resuming experience collection (16700 times) [2024-06-15 15:27:54,965][1652491] Updated weights for policy 0, policy_version 319396 (0.0014) [2024-06-15 15:27:55,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 45875.4, 300 sec: 46208.5). Total num frames: 654180352. Throughput: 0: 11218.6. Samples: 163619328. Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:27:55,955][1648985] Avg episode reward: [(0, '149.120')] [2024-06-15 15:27:57,132][1652491] Updated weights for policy 0, policy_version 319445 (0.0032) [2024-06-15 15:27:58,316][1652491] Updated weights for policy 0, policy_version 319504 (0.0154) [2024-06-15 15:28:00,447][1652491] Updated weights for policy 0, policy_version 319585 (0.0112) [2024-06-15 15:28:00,956][1648985] Fps is (10 sec: 49151.6, 60 sec: 45329.0, 300 sec: 46541.7). Total num frames: 654540800. Throughput: 0: 11377.7. Samples: 163681792. 
Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:28:00,958][1648985] Avg episode reward: [(0, '137.190')] [2024-06-15 15:28:05,429][1652491] Updated weights for policy 0, policy_version 319632 (0.0011) [2024-06-15 15:28:05,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 654639104. Throughput: 0: 11332.3. Samples: 163756032. Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:28:05,956][1648985] Avg episode reward: [(0, '151.290')] [2024-06-15 15:28:06,302][1652491] Updated weights for policy 0, policy_version 319674 (0.0016) [2024-06-15 15:28:08,487][1652491] Updated weights for policy 0, policy_version 319728 (0.0011) [2024-06-15 15:28:09,817][1652491] Updated weights for policy 0, policy_version 319792 (0.0012) [2024-06-15 15:28:10,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 46967.3, 300 sec: 46541.6). Total num frames: 654999552. Throughput: 0: 11502.9. Samples: 163790848. Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:28:10,956][1648985] Avg episode reward: [(0, '147.740')] [2024-06-15 15:28:11,535][1652491] Updated weights for policy 0, policy_version 319863 (0.0013) [2024-06-15 15:28:15,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 655097856. Throughput: 0: 11548.5. Samples: 163862528. Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:28:15,956][1648985] Avg episode reward: [(0, '157.050')] [2024-06-15 15:28:17,079][1652491] Updated weights for policy 0, policy_version 319904 (0.0016) [2024-06-15 15:28:19,100][1652491] Updated weights for policy 0, policy_version 319954 (0.0015) [2024-06-15 15:28:20,941][1652491] Updated weights for policy 0, policy_version 320036 (0.0038) [2024-06-15 15:28:20,955][1648985] Fps is (10 sec: 42599.9, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 655425536. Throughput: 0: 11605.3. Samples: 163930624. 
Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:28:20,956][1648985] Avg episode reward: [(0, '127.930')] [2024-06-15 15:28:22,500][1652491] Updated weights for policy 0, policy_version 320100 (0.0021) [2024-06-15 15:28:25,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 655622144. Throughput: 0: 11480.3. Samples: 163960832. Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:28:25,957][1648985] Avg episode reward: [(0, '122.190')] [2024-06-15 15:28:28,451][1652491] Updated weights for policy 0, policy_version 320160 (0.0015) [2024-06-15 15:28:30,117][1652491] Updated weights for policy 0, policy_version 320193 (0.0012) [2024-06-15 15:28:30,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 46967.5, 300 sec: 46430.5). Total num frames: 655818752. Throughput: 0: 11491.5. Samples: 164039168. Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:28:30,956][1648985] Avg episode reward: [(0, '120.040')] [2024-06-15 15:28:31,585][1652491] Updated weights for policy 0, policy_version 320272 (0.0013) [2024-06-15 15:28:31,688][1651469] Signal inference workers to stop experience collection... (16750 times) [2024-06-15 15:28:31,777][1652491] InferenceWorker_p0-w0: stopping experience collection (16750 times) [2024-06-15 15:28:31,940][1651469] Signal inference workers to resume experience collection... (16750 times) [2024-06-15 15:28:31,941][1652491] InferenceWorker_p0-w0: resuming experience collection (16750 times) [2024-06-15 15:28:32,693][1652491] Updated weights for policy 0, policy_version 320321 (0.0012) [2024-06-15 15:28:34,035][1652491] Updated weights for policy 0, policy_version 320384 (0.0012) [2024-06-15 15:28:35,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 656146432. Throughput: 0: 11764.6. Samples: 164109312. 
Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:28:35,956][1648985] Avg episode reward: [(0, '143.230')] [2024-06-15 15:28:40,056][1652491] Updated weights for policy 0, policy_version 320444 (0.0018) [2024-06-15 15:28:40,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 45875.6, 300 sec: 46208.4). Total num frames: 656277504. Throughput: 0: 11753.2. Samples: 164148224. Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:28:40,956][1648985] Avg episode reward: [(0, '168.220')] [2024-06-15 15:28:42,042][1652491] Updated weights for policy 0, policy_version 320496 (0.0014) [2024-06-15 15:28:44,007][1652491] Updated weights for policy 0, policy_version 320576 (0.0084) [2024-06-15 15:28:45,291][1652491] Updated weights for policy 0, policy_version 320634 (0.0013) [2024-06-15 15:28:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48605.8, 300 sec: 46652.7). Total num frames: 656670720. Throughput: 0: 11616.7. Samples: 164204544. Policy #0 lag: (min: 4.0, avg: 69.5, max: 260.0) [2024-06-15 15:28:45,956][1648985] Avg episode reward: [(0, '153.150')] [2024-06-15 15:28:50,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 44783.0, 300 sec: 46208.5). Total num frames: 656736256. Throughput: 0: 11696.4. Samples: 164282368. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:28:50,955][1648985] Avg episode reward: [(0, '132.360')] [2024-06-15 15:28:51,539][1652491] Updated weights for policy 0, policy_version 320695 (0.0032) [2024-06-15 15:28:53,249][1652491] Updated weights for policy 0, policy_version 320752 (0.0015) [2024-06-15 15:28:54,736][1652491] Updated weights for policy 0, policy_version 320821 (0.0023) [2024-06-15 15:28:55,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 49151.6, 300 sec: 46541.6). Total num frames: 657129472. Throughput: 0: 11594.0. Samples: 164312576. 
Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:28:55,956][1648985] Avg episode reward: [(0, '125.400')] [2024-06-15 15:28:56,575][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000320896_657195008.pth... [2024-06-15 15:28:56,598][1652491] Updated weights for policy 0, policy_version 320896 (0.0012) [2024-06-15 15:28:56,615][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000315440_646021120.pth [2024-06-15 15:29:00,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 44236.9, 300 sec: 46208.4). Total num frames: 657195008. Throughput: 0: 11468.8. Samples: 164378624. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:29:00,956][1648985] Avg episode reward: [(0, '137.200')] [2024-06-15 15:29:02,969][1652491] Updated weights for policy 0, policy_version 320955 (0.0023) [2024-06-15 15:29:05,286][1652491] Updated weights for policy 0, policy_version 321029 (0.0013) [2024-06-15 15:29:05,956][1648985] Fps is (10 sec: 39322.4, 60 sec: 48059.6, 300 sec: 46430.6). Total num frames: 657522688. Throughput: 0: 11446.0. Samples: 164445696. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:29:05,957][1648985] Avg episode reward: [(0, '141.990')] [2024-06-15 15:29:06,350][1652491] Updated weights for policy 0, policy_version 321081 (0.0016) [2024-06-15 15:29:07,524][1652491] Updated weights for policy 0, policy_version 321122 (0.0012) [2024-06-15 15:29:10,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45329.3, 300 sec: 46208.4). Total num frames: 657719296. Throughput: 0: 11571.2. Samples: 164481536. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:29:10,956][1648985] Avg episode reward: [(0, '156.100')] [2024-06-15 15:29:13,323][1652491] Updated weights for policy 0, policy_version 321168 (0.0060) [2024-06-15 15:29:14,580][1651469] Signal inference workers to stop experience collection... 
(16800 times) [2024-06-15 15:29:14,627][1652491] InferenceWorker_p0-w0: stopping experience collection (16800 times) [2024-06-15 15:29:14,879][1651469] Signal inference workers to resume experience collection... (16800 times) [2024-06-15 15:29:14,880][1652491] InferenceWorker_p0-w0: resuming experience collection (16800 times) [2024-06-15 15:29:15,023][1652491] Updated weights for policy 0, policy_version 321233 (0.0015) [2024-06-15 15:29:15,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 47513.6, 300 sec: 46430.6). Total num frames: 657948672. Throughput: 0: 11537.1. Samples: 164558336. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:29:15,956][1648985] Avg episode reward: [(0, '157.440')] [2024-06-15 15:29:16,674][1652491] Updated weights for policy 0, policy_version 321296 (0.0020) [2024-06-15 15:29:17,504][1652491] Updated weights for policy 0, policy_version 321339 (0.0015) [2024-06-15 15:29:18,637][1652491] Updated weights for policy 0, policy_version 321377 (0.0098) [2024-06-15 15:29:20,955][1648985] Fps is (10 sec: 52426.9, 60 sec: 46967.2, 300 sec: 46208.4). Total num frames: 658243584. Throughput: 0: 11491.5. Samples: 164626432. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:29:20,957][1648985] Avg episode reward: [(0, '156.320')] [2024-06-15 15:29:24,225][1652491] Updated weights for policy 0, policy_version 321411 (0.0011) [2024-06-15 15:29:25,946][1652491] Updated weights for policy 0, policy_version 321488 (0.0015) [2024-06-15 15:29:25,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 658407424. Throughput: 0: 11616.7. Samples: 164670976. 
Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:29:25,956][1648985] Avg episode reward: [(0, '133.770')] [2024-06-15 15:29:28,321][1652491] Updated weights for policy 0, policy_version 321591 (0.0015) [2024-06-15 15:29:30,334][1652491] Updated weights for policy 0, policy_version 321653 (0.0018) [2024-06-15 15:29:30,955][1648985] Fps is (10 sec: 52430.9, 60 sec: 49152.1, 300 sec: 46208.4). Total num frames: 658767872. Throughput: 0: 11594.0. Samples: 164726272. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:29:30,955][1648985] Avg episode reward: [(0, '127.740')] [2024-06-15 15:29:35,634][1652491] Updated weights for policy 0, policy_version 321669 (0.0020) [2024-06-15 15:29:35,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 44236.7, 300 sec: 45986.3). Total num frames: 658800640. Throughput: 0: 11684.9. Samples: 164808192. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:29:35,956][1648985] Avg episode reward: [(0, '144.690')] [2024-06-15 15:29:37,177][1652491] Updated weights for policy 0, policy_version 321731 (0.0111) [2024-06-15 15:29:39,366][1652491] Updated weights for policy 0, policy_version 321824 (0.0014) [2024-06-15 15:29:40,955][1648985] Fps is (10 sec: 39320.2, 60 sec: 48059.5, 300 sec: 45986.2). Total num frames: 659161088. Throughput: 0: 11559.8. Samples: 164832768. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:29:40,956][1648985] Avg episode reward: [(0, '158.660')] [2024-06-15 15:29:41,801][1652491] Updated weights for policy 0, policy_version 321888 (0.0112) [2024-06-15 15:29:45,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 43690.7, 300 sec: 45987.2). Total num frames: 659292160. Throughput: 0: 11616.7. Samples: 164901376. 
Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:29:45,956][1648985] Avg episode reward: [(0, '162.350')] [2024-06-15 15:29:47,855][1652491] Updated weights for policy 0, policy_version 321954 (0.0014) [2024-06-15 15:29:49,547][1652491] Updated weights for policy 0, policy_version 322016 (0.0076) [2024-06-15 15:29:50,955][1648985] Fps is (10 sec: 42599.8, 60 sec: 47513.6, 300 sec: 46208.5). Total num frames: 659587072. Throughput: 0: 11537.1. Samples: 164964864. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:29:50,956][1648985] Avg episode reward: [(0, '146.800')] [2024-06-15 15:29:51,681][1652491] Updated weights for policy 0, policy_version 322099 (0.0014) [2024-06-15 15:29:53,327][1652491] Updated weights for policy 0, policy_version 322144 (0.0012) [2024-06-15 15:29:53,431][1651469] Signal inference workers to stop experience collection... (16850 times) [2024-06-15 15:29:53,476][1652491] InferenceWorker_p0-w0: stopping experience collection (16850 times) [2024-06-15 15:29:53,656][1651469] Signal inference workers to resume experience collection... (16850 times) [2024-06-15 15:29:53,657][1652491] InferenceWorker_p0-w0: resuming experience collection (16850 times) [2024-06-15 15:29:55,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 44783.1, 300 sec: 46208.4). Total num frames: 659816448. Throughput: 0: 11411.9. Samples: 164995072. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:29:55,956][1648985] Avg episode reward: [(0, '135.430')] [2024-06-15 15:29:59,858][1652491] Updated weights for policy 0, policy_version 322232 (0.0015) [2024-06-15 15:30:00,783][1652491] Updated weights for policy 0, policy_version 322263 (0.0011) [2024-06-15 15:30:00,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 46967.3, 300 sec: 46319.5). Total num frames: 660013056. Throughput: 0: 11423.3. Samples: 165072384. 
Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:30:00,956][1648985] Avg episode reward: [(0, '136.330')] [2024-06-15 15:30:02,368][1652491] Updated weights for policy 0, policy_version 322322 (0.0012) [2024-06-15 15:30:04,593][1652491] Updated weights for policy 0, policy_version 322384 (0.0021) [2024-06-15 15:30:05,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46967.6, 300 sec: 46208.5). Total num frames: 660340736. Throughput: 0: 11355.1. Samples: 165137408. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 15:30:05,956][1648985] Avg episode reward: [(0, '137.700')] [2024-06-15 15:30:09,842][1652491] Updated weights for policy 0, policy_version 322434 (0.0012) [2024-06-15 15:30:10,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45329.0, 300 sec: 46097.3). Total num frames: 660439040. Throughput: 0: 11434.6. Samples: 165185536. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:30:10,956][1648985] Avg episode reward: [(0, '157.460')] [2024-06-15 15:30:12,274][1652491] Updated weights for policy 0, policy_version 322544 (0.0117) [2024-06-15 15:30:13,548][1652491] Updated weights for policy 0, policy_version 322592 (0.0012) [2024-06-15 15:30:14,271][1652491] Updated weights for policy 0, policy_version 322620 (0.0012) [2024-06-15 15:30:15,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 660766720. Throughput: 0: 11571.2. Samples: 165246976. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:30:15,956][1648985] Avg episode reward: [(0, '150.610')] [2024-06-15 15:30:16,715][1652491] Updated weights for policy 0, policy_version 322673 (0.0014) [2024-06-15 15:30:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 44237.0, 300 sec: 45879.2). Total num frames: 660897792. Throughput: 0: 11537.1. Samples: 165327360. 
Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:30:20,956][1648985] Avg episode reward: [(0, '160.910')] [2024-06-15 15:30:21,450][1652491] Updated weights for policy 0, policy_version 322736 (0.0101) [2024-06-15 15:30:22,923][1652491] Updated weights for policy 0, policy_version 322810 (0.0147) [2024-06-15 15:30:24,999][1652491] Updated weights for policy 0, policy_version 322864 (0.0016) [2024-06-15 15:30:25,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 47513.4, 300 sec: 45875.2). Total num frames: 661258240. Throughput: 0: 11707.8. Samples: 165359616. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:30:25,956][1648985] Avg episode reward: [(0, '152.480')] [2024-06-15 15:30:26,732][1652491] Updated weights for policy 0, policy_version 322896 (0.0024) [2024-06-15 15:30:27,731][1652491] Updated weights for policy 0, policy_version 322941 (0.0011) [2024-06-15 15:30:30,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 43690.6, 300 sec: 45875.2). Total num frames: 661389312. Throughput: 0: 11719.1. Samples: 165428736. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:30:30,956][1648985] Avg episode reward: [(0, '172.200')] [2024-06-15 15:30:33,029][1652491] Updated weights for policy 0, policy_version 323012 (0.0114) [2024-06-15 15:30:34,002][1652491] Updated weights for policy 0, policy_version 323064 (0.0013) [2024-06-15 15:30:35,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 49152.1, 300 sec: 46098.9). Total num frames: 661749760. Throughput: 0: 11719.1. Samples: 165492224. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:30:35,956][1648985] Avg episode reward: [(0, '167.650')] [2024-06-15 15:30:36,031][1652491] Updated weights for policy 0, policy_version 323128 (0.0017) [2024-06-15 15:30:38,262][1651469] Signal inference workers to stop experience collection... 
(16900 times) [2024-06-15 15:30:38,303][1652491] InferenceWorker_p0-w0: stopping experience collection (16900 times) [2024-06-15 15:30:38,516][1651469] Signal inference workers to resume experience collection... (16900 times) [2024-06-15 15:30:38,517][1652491] InferenceWorker_p0-w0: resuming experience collection (16900 times) [2024-06-15 15:30:39,408][1652491] Updated weights for policy 0, policy_version 323191 (0.0015) [2024-06-15 15:30:40,964][1648985] Fps is (10 sec: 52382.7, 60 sec: 45868.7, 300 sec: 46207.0). Total num frames: 661913600. Throughput: 0: 11807.8. Samples: 165526528. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:30:40,965][1648985] Avg episode reward: [(0, '130.040')] [2024-06-15 15:30:44,650][1652491] Updated weights for policy 0, policy_version 323269 (0.0030) [2024-06-15 15:30:45,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 47513.6, 300 sec: 45991.6). Total num frames: 662142976. Throughput: 0: 11787.4. Samples: 165602816. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:30:45,956][1648985] Avg episode reward: [(0, '110.980')] [2024-06-15 15:30:45,999][1652491] Updated weights for policy 0, policy_version 323328 (0.0106) [2024-06-15 15:30:47,510][1652491] Updated weights for policy 0, policy_version 323390 (0.0013) [2024-06-15 15:30:50,800][1652491] Updated weights for policy 0, policy_version 323449 (0.0014) [2024-06-15 15:30:50,955][1648985] Fps is (10 sec: 52475.5, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 662437888. Throughput: 0: 11730.5. Samples: 165665280. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:30:50,955][1648985] Avg episode reward: [(0, '113.210')] [2024-06-15 15:30:55,936][1652491] Updated weights for policy 0, policy_version 323508 (0.0013) [2024-06-15 15:30:55,955][1648985] Fps is (10 sec: 39320.3, 60 sec: 45328.9, 300 sec: 45875.1). Total num frames: 662536192. Throughput: 0: 11605.3. Samples: 165707776. 
Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:30:55,956][1648985] Avg episode reward: [(0, '135.220')] [2024-06-15 15:30:56,426][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000323536_662601728.pth... [2024-06-15 15:30:56,573][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000318112_651493376.pth [2024-06-15 15:30:56,577][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000323536_662601728.pth [2024-06-15 15:30:57,747][1652491] Updated weights for policy 0, policy_version 323584 (0.0013) [2024-06-15 15:30:59,327][1652491] Updated weights for policy 0, policy_version 323645 (0.0013) [2024-06-15 15:31:00,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46967.6, 300 sec: 45764.1). Total num frames: 662831104. Throughput: 0: 11559.8. Samples: 165767168. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:31:00,956][1648985] Avg episode reward: [(0, '138.600')] [2024-06-15 15:31:02,059][1652491] Updated weights for policy 0, policy_version 323705 (0.0094) [2024-06-15 15:31:05,955][1648985] Fps is (10 sec: 42599.8, 60 sec: 43690.7, 300 sec: 45875.2). Total num frames: 662962176. Throughput: 0: 11559.8. Samples: 165847552. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:31:05,956][1648985] Avg episode reward: [(0, '137.180')] [2024-06-15 15:31:07,412][1652491] Updated weights for policy 0, policy_version 323776 (0.0014) [2024-06-15 15:31:09,131][1652491] Updated weights for policy 0, policy_version 323841 (0.0013) [2024-06-15 15:31:10,579][1652491] Updated weights for policy 0, policy_version 323896 (0.0018) [2024-06-15 15:31:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48605.9, 300 sec: 46097.3). Total num frames: 663355392. Throughput: 0: 11389.2. Samples: 165872128. 
Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:31:10,956][1648985] Avg episode reward: [(0, '136.810')] [2024-06-15 15:31:13,592][1652491] Updated weights for policy 0, policy_version 323960 (0.0109) [2024-06-15 15:31:15,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 663486464. Throughput: 0: 11389.2. Samples: 165941248. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:31:15,956][1648985] Avg episode reward: [(0, '154.100')] [2024-06-15 15:31:18,965][1652491] Updated weights for policy 0, policy_version 324016 (0.0012) [2024-06-15 15:31:20,026][1651469] Signal inference workers to stop experience collection... (16950 times) [2024-06-15 15:31:20,091][1652491] InferenceWorker_p0-w0: stopping experience collection (16950 times) [2024-06-15 15:31:20,289][1651469] Signal inference workers to resume experience collection... (16950 times) [2024-06-15 15:31:20,290][1652491] InferenceWorker_p0-w0: resuming experience collection (16950 times) [2024-06-15 15:31:20,457][1652491] Updated weights for policy 0, policy_version 324065 (0.0023) [2024-06-15 15:31:20,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 663715840. Throughput: 0: 11423.3. Samples: 166006272. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 15:31:20,956][1648985] Avg episode reward: [(0, '156.810')] [2024-06-15 15:31:22,515][1652491] Updated weights for policy 0, policy_version 324157 (0.0014) [2024-06-15 15:31:25,681][1652491] Updated weights for policy 0, policy_version 324216 (0.0030) [2024-06-15 15:31:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.4, 300 sec: 46208.4). Total num frames: 664010752. Throughput: 0: 11391.4. Samples: 166039040. 
Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:31:25,956][1648985] Avg episode reward: [(0, '159.110')] [2024-06-15 15:31:30,624][1652491] Updated weights for policy 0, policy_version 324246 (0.0013) [2024-06-15 15:31:30,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 44783.0, 300 sec: 45653.1). Total num frames: 664076288. Throughput: 0: 11286.8. Samples: 166110720. Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:31:30,956][1648985] Avg episode reward: [(0, '156.430')] [2024-06-15 15:31:33,173][1652491] Updated weights for policy 0, policy_version 324352 (0.0075) [2024-06-15 15:31:34,542][1652491] Updated weights for policy 0, policy_version 324411 (0.0019) [2024-06-15 15:31:35,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 44236.6, 300 sec: 45764.1). Total num frames: 664403968. Throughput: 0: 11093.3. Samples: 166164480. Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:31:35,956][1648985] Avg episode reward: [(0, '164.300')] [2024-06-15 15:31:37,500][1652491] Updated weights for policy 0, policy_version 324438 (0.0017) [2024-06-15 15:31:38,319][1652491] Updated weights for policy 0, policy_version 324480 (0.0020) [2024-06-15 15:31:40,955][1648985] Fps is (10 sec: 45873.9, 60 sec: 43696.9, 300 sec: 45875.2). Total num frames: 664535040. Throughput: 0: 11013.7. Samples: 166203392. Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:31:40,956][1648985] Avg episode reward: [(0, '160.540')] [2024-06-15 15:31:43,400][1652491] Updated weights for policy 0, policy_version 324532 (0.0013) [2024-06-15 15:31:45,376][1652491] Updated weights for policy 0, policy_version 324624 (0.0012) [2024-06-15 15:31:45,955][1648985] Fps is (10 sec: 45876.5, 60 sec: 45329.1, 300 sec: 45764.1). Total num frames: 664862720. Throughput: 0: 11286.8. Samples: 166275072. 
Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:31:45,955][1648985] Avg episode reward: [(0, '151.990')] [2024-06-15 15:31:46,544][1652491] Updated weights for policy 0, policy_version 324672 (0.0015) [2024-06-15 15:31:49,111][1652491] Updated weights for policy 0, policy_version 324736 (0.0013) [2024-06-15 15:31:50,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 665059328. Throughput: 0: 10956.8. Samples: 166340608. Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:31:50,956][1648985] Avg episode reward: [(0, '146.980')] [2024-06-15 15:31:54,979][1652491] Updated weights for policy 0, policy_version 324801 (0.0092) [2024-06-15 15:31:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45875.5, 300 sec: 45653.0). Total num frames: 665288704. Throughput: 0: 11298.1. Samples: 166380544. Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:31:55,956][1648985] Avg episode reward: [(0, '140.890')] [2024-06-15 15:31:56,157][1652491] Updated weights for policy 0, policy_version 324864 (0.0013) [2024-06-15 15:31:57,622][1652491] Updated weights for policy 0, policy_version 324918 (0.0030) [2024-06-15 15:32:00,339][1651469] Signal inference workers to stop experience collection... (17000 times) [2024-06-15 15:32:00,431][1652491] InferenceWorker_p0-w0: stopping experience collection (17000 times) [2024-06-15 15:32:00,552][1651469] Signal inference workers to resume experience collection... (17000 times) [2024-06-15 15:32:00,553][1652491] InferenceWorker_p0-w0: resuming experience collection (17000 times) [2024-06-15 15:32:00,776][1652491] Updated weights for policy 0, policy_version 324988 (0.0012) [2024-06-15 15:32:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 665583616. Throughput: 0: 11264.0. Samples: 166448128. 
Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:32:00,956][1648985] Avg episode reward: [(0, '131.440')] [2024-06-15 15:32:05,340][1652491] Updated weights for policy 0, policy_version 325044 (0.0013) [2024-06-15 15:32:05,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45875.1, 300 sec: 45875.2). Total num frames: 665714688. Throughput: 0: 11377.8. Samples: 166518272. Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:32:05,956][1648985] Avg episode reward: [(0, '145.250')] [2024-06-15 15:32:06,537][1652491] Updated weights for policy 0, policy_version 325090 (0.0014) [2024-06-15 15:32:07,757][1652491] Updated weights for policy 0, policy_version 325156 (0.0012) [2024-06-15 15:32:10,846][1652491] Updated weights for policy 0, policy_version 325201 (0.0013) [2024-06-15 15:32:10,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 666009600. Throughput: 0: 11446.0. Samples: 166554112. Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:32:10,956][1648985] Avg episode reward: [(0, '128.870')] [2024-06-15 15:32:14,728][1652491] Updated weights for policy 0, policy_version 325251 (0.0014) [2024-06-15 15:32:15,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45329.0, 300 sec: 46319.5). Total num frames: 666206208. Throughput: 0: 11650.8. Samples: 166635008. Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:32:15,956][1648985] Avg episode reward: [(0, '137.810')] [2024-06-15 15:32:15,992][1652491] Updated weights for policy 0, policy_version 325312 (0.0013) [2024-06-15 15:32:17,597][1652491] Updated weights for policy 0, policy_version 325381 (0.0130) [2024-06-15 15:32:18,677][1652491] Updated weights for policy 0, policy_version 325440 (0.0013) [2024-06-15 15:32:20,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 46421.4, 300 sec: 45986.3). Total num frames: 666501120. Throughput: 0: 11855.7. Samples: 166697984. 
Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:32:20,955][1648985] Avg episode reward: [(0, '169.630')] [2024-06-15 15:32:22,840][1652491] Updated weights for policy 0, policy_version 325497 (0.0011) [2024-06-15 15:32:25,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 666632192. Throughput: 0: 11753.3. Samples: 166732288. Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:32:25,955][1648985] Avg episode reward: [(0, '164.440')] [2024-06-15 15:32:26,859][1652491] Updated weights for policy 0, policy_version 325536 (0.0013) [2024-06-15 15:32:28,885][1652491] Updated weights for policy 0, policy_version 325618 (0.0021) [2024-06-15 15:32:30,362][1652491] Updated weights for policy 0, policy_version 325696 (0.0135) [2024-06-15 15:32:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 49152.0, 300 sec: 46319.5). Total num frames: 667025408. Throughput: 0: 11514.3. Samples: 166793216. Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:32:30,955][1648985] Avg episode reward: [(0, '151.620')] [2024-06-15 15:32:34,872][1652491] Updated weights for policy 0, policy_version 325757 (0.0076) [2024-06-15 15:32:35,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.4, 300 sec: 46208.5). Total num frames: 667156480. Throughput: 0: 11639.5. Samples: 166864384. Policy #0 lag: (min: 31.0, avg: 166.9, max: 287.0) [2024-06-15 15:32:35,956][1648985] Avg episode reward: [(0, '136.260')] [2024-06-15 15:32:39,196][1652491] Updated weights for policy 0, policy_version 325808 (0.0010) [2024-06-15 15:32:40,725][1652491] Updated weights for policy 0, policy_version 325872 (0.0020) [2024-06-15 15:32:40,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 47513.8, 300 sec: 46208.4). Total num frames: 667385856. Throughput: 0: 11730.5. Samples: 166908416. 
Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:32:40,956][1648985] Avg episode reward: [(0, '158.340')] [2024-06-15 15:32:41,137][1651469] Signal inference workers to stop experience collection... (17050 times) [2024-06-15 15:32:41,172][1652491] InferenceWorker_p0-w0: stopping experience collection (17050 times) [2024-06-15 15:32:41,347][1651469] Signal inference workers to resume experience collection... (17050 times) [2024-06-15 15:32:41,349][1652491] InferenceWorker_p0-w0: resuming experience collection (17050 times) [2024-06-15 15:32:42,496][1652491] Updated weights for policy 0, policy_version 325952 (0.0136) [2024-06-15 15:32:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.1, 300 sec: 45986.3). Total num frames: 667615232. Throughput: 0: 11537.1. Samples: 166967296. Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:32:45,956][1648985] Avg episode reward: [(0, '152.970')] [2024-06-15 15:32:46,455][1652491] Updated weights for policy 0, policy_version 326008 (0.0107) [2024-06-15 15:32:49,892][1652491] Updated weights for policy 0, policy_version 326064 (0.0015) [2024-06-15 15:32:50,760][1652491] Updated weights for policy 0, policy_version 326096 (0.0012) [2024-06-15 15:32:50,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 667844608. Throughput: 0: 11707.8. Samples: 167045120. Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:32:50,956][1648985] Avg episode reward: [(0, '157.990')] [2024-06-15 15:32:52,128][1652491] Updated weights for policy 0, policy_version 326149 (0.0013) [2024-06-15 15:32:53,124][1652491] Updated weights for policy 0, policy_version 326203 (0.0015) [2024-06-15 15:32:55,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 46421.1, 300 sec: 45875.2). Total num frames: 668073984. Throughput: 0: 11616.7. Samples: 167076864. 
Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:32:55,956][1648985] Avg episode reward: [(0, '158.870')] [2024-06-15 15:32:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000326208_668073984.pth... [2024-06-15 15:32:56,016][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000320896_657195008.pth [2024-06-15 15:32:57,196][1652491] Updated weights for policy 0, policy_version 326256 (0.0012) [2024-06-15 15:33:00,525][1652491] Updated weights for policy 0, policy_version 326305 (0.0032) [2024-06-15 15:33:00,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 45329.1, 300 sec: 46319.5). Total num frames: 668303360. Throughput: 0: 11559.9. Samples: 167155200. Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:33:00,956][1648985] Avg episode reward: [(0, '137.770')] [2024-06-15 15:33:02,079][1652491] Updated weights for policy 0, policy_version 326368 (0.0011) [2024-06-15 15:33:03,804][1652491] Updated weights for policy 0, policy_version 326457 (0.0014) [2024-06-15 15:33:05,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48059.8, 300 sec: 46097.4). Total num frames: 668598272. Throughput: 0: 11537.0. Samples: 167217152. Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:33:05,956][1648985] Avg episode reward: [(0, '107.470')] [2024-06-15 15:33:08,565][1652491] Updated weights for policy 0, policy_version 326521 (0.0016) [2024-06-15 15:33:10,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.3, 300 sec: 46319.5). Total num frames: 668762112. Throughput: 0: 11639.5. Samples: 167256064. 
Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:33:10,955][1648985] Avg episode reward: [(0, '116.910')] [2024-06-15 15:33:11,375][1652491] Updated weights for policy 0, policy_version 326562 (0.0011) [2024-06-15 15:33:13,118][1652491] Updated weights for policy 0, policy_version 326640 (0.0010) [2024-06-15 15:33:14,366][1652491] Updated weights for policy 0, policy_version 326688 (0.0013) [2024-06-15 15:33:15,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 48606.1, 300 sec: 46430.6). Total num frames: 669122560. Throughput: 0: 11798.8. Samples: 167324160. Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:33:15,955][1648985] Avg episode reward: [(0, '120.480')] [2024-06-15 15:33:18,990][1652491] Updated weights for policy 0, policy_version 326736 (0.0021) [2024-06-15 15:33:20,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 669253632. Throughput: 0: 11855.7. Samples: 167397888. Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:33:20,955][1648985] Avg episode reward: [(0, '151.900')] [2024-06-15 15:33:22,404][1652491] Updated weights for policy 0, policy_version 326800 (0.0013) [2024-06-15 15:33:23,150][1651469] Signal inference workers to stop experience collection... (17100 times) [2024-06-15 15:33:23,225][1652491] InferenceWorker_p0-w0: stopping experience collection (17100 times) [2024-06-15 15:33:23,354][1651469] Signal inference workers to resume experience collection... (17100 times) [2024-06-15 15:33:23,355][1652491] InferenceWorker_p0-w0: resuming experience collection (17100 times) [2024-06-15 15:33:23,743][1652491] Updated weights for policy 0, policy_version 326864 (0.0013) [2024-06-15 15:33:25,733][1652491] Updated weights for policy 0, policy_version 326944 (0.0014) [2024-06-15 15:33:25,955][1648985] Fps is (10 sec: 45873.5, 60 sec: 49151.8, 300 sec: 46652.7). Total num frames: 669581312. Throughput: 0: 11764.6. Samples: 167437824. 
Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:33:25,956][1648985] Avg episode reward: [(0, '157.160')] [2024-06-15 15:33:30,557][1652491] Updated weights for policy 0, policy_version 327009 (0.0014) [2024-06-15 15:33:30,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 669745152. Throughput: 0: 11867.0. Samples: 167501312. Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:33:30,955][1648985] Avg episode reward: [(0, '154.990')] [2024-06-15 15:33:33,693][1652491] Updated weights for policy 0, policy_version 327057 (0.0013) [2024-06-15 15:33:34,921][1652491] Updated weights for policy 0, policy_version 327120 (0.0011) [2024-06-15 15:33:35,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 670007296. Throughput: 0: 11662.2. Samples: 167569920. Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:33:35,956][1648985] Avg episode reward: [(0, '137.200')] [2024-06-15 15:33:36,758][1652491] Updated weights for policy 0, policy_version 327184 (0.0017) [2024-06-15 15:33:37,923][1652491] Updated weights for policy 0, policy_version 327231 (0.0019) [2024-06-15 15:33:40,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46421.3, 300 sec: 45764.1). Total num frames: 670171136. Throughput: 0: 11696.4. Samples: 167603200. Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:33:40,956][1648985] Avg episode reward: [(0, '132.650')] [2024-06-15 15:33:43,011][1652491] Updated weights for policy 0, policy_version 327280 (0.0016) [2024-06-15 15:33:45,279][1652491] Updated weights for policy 0, policy_version 327313 (0.0013) [2024-06-15 15:33:45,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 670400512. Throughput: 0: 11673.6. Samples: 167680512. 
Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:33:45,955][1648985] Avg episode reward: [(0, '120.170')] [2024-06-15 15:33:46,910][1652491] Updated weights for policy 0, policy_version 327392 (0.0013) [2024-06-15 15:33:48,288][1652491] Updated weights for policy 0, policy_version 327443 (0.0012) [2024-06-15 15:33:49,114][1652491] Updated weights for policy 0, policy_version 327488 (0.0017) [2024-06-15 15:33:50,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 47513.6, 300 sec: 45986.3). Total num frames: 670695424. Throughput: 0: 11764.7. Samples: 167746560. Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:33:50,956][1648985] Avg episode reward: [(0, '123.980')] [2024-06-15 15:33:54,082][1652491] Updated weights for policy 0, policy_version 327541 (0.0027) [2024-06-15 15:33:55,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 670859264. Throughput: 0: 11684.9. Samples: 167781888. Policy #0 lag: (min: 63.0, avg: 144.2, max: 255.0) [2024-06-15 15:33:55,956][1648985] Avg episode reward: [(0, '140.250')] [2024-06-15 15:33:56,402][1652491] Updated weights for policy 0, policy_version 327584 (0.0011) [2024-06-15 15:33:58,212][1652491] Updated weights for policy 0, policy_version 327664 (0.0012) [2024-06-15 15:33:59,314][1652491] Updated weights for policy 0, policy_version 327696 (0.0012) [2024-06-15 15:34:00,332][1652491] Updated weights for policy 0, policy_version 327741 (0.0085) [2024-06-15 15:34:00,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48605.8, 300 sec: 46430.6). Total num frames: 671219712. Throughput: 0: 11616.7. Samples: 167846912. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:34:00,956][1648985] Avg episode reward: [(0, '152.390')] [2024-06-15 15:34:04,832][1651469] Signal inference workers to stop experience collection... 
(17150 times) [2024-06-15 15:34:04,884][1652491] InferenceWorker_p0-w0: stopping experience collection (17150 times) [2024-06-15 15:34:05,100][1651469] Signal inference workers to resume experience collection... (17150 times) [2024-06-15 15:34:05,101][1652491] InferenceWorker_p0-w0: resuming experience collection (17150 times) [2024-06-15 15:34:05,442][1652491] Updated weights for policy 0, policy_version 327808 (0.0017) [2024-06-15 15:34:05,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 671350784. Throughput: 0: 11548.4. Samples: 167917568. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:34:05,956][1648985] Avg episode reward: [(0, '157.310')] [2024-06-15 15:34:08,647][1652491] Updated weights for policy 0, policy_version 327872 (0.0030) [2024-06-15 15:34:10,056][1652491] Updated weights for policy 0, policy_version 327933 (0.0028) [2024-06-15 15:34:10,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 47513.5, 300 sec: 46319.5). Total num frames: 671612928. Throughput: 0: 11389.2. Samples: 167950336. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:34:10,956][1648985] Avg episode reward: [(0, '152.770')] [2024-06-15 15:34:11,664][1652491] Updated weights for policy 0, policy_version 327984 (0.0011) [2024-06-15 15:34:15,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 44236.7, 300 sec: 45875.2). Total num frames: 671776768. Throughput: 0: 11593.9. Samples: 168023040. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:34:15,956][1648985] Avg episode reward: [(0, '147.110')] [2024-06-15 15:34:16,032][1652491] Updated weights for policy 0, policy_version 328032 (0.0012) [2024-06-15 15:34:19,501][1652491] Updated weights for policy 0, policy_version 328112 (0.0040) [2024-06-15 15:34:20,726][1652491] Updated weights for policy 0, policy_version 328162 (0.0019) [2024-06-15 15:34:20,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.4, 300 sec: 46319.5). 
Total num frames: 672071680. Throughput: 0: 11525.7. Samples: 168088576. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:34:20,956][1648985] Avg episode reward: [(0, '143.560')] [2024-06-15 15:34:22,890][1652491] Updated weights for policy 0, policy_version 328210 (0.0019) [2024-06-15 15:34:25,978][1648985] Fps is (10 sec: 49037.9, 60 sec: 44765.7, 300 sec: 45760.5). Total num frames: 672268288. Throughput: 0: 11622.1. Samples: 168126464. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:34:25,979][1648985] Avg episode reward: [(0, '162.000')] [2024-06-15 15:34:26,819][1652491] Updated weights for policy 0, policy_version 328261 (0.0011) [2024-06-15 15:34:28,208][1652491] Updated weights for policy 0, policy_version 328319 (0.0121) [2024-06-15 15:34:30,436][1652491] Updated weights for policy 0, policy_version 328387 (0.0013) [2024-06-15 15:34:30,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 46967.4, 300 sec: 46652.8). Total num frames: 672563200. Throughput: 0: 11514.3. Samples: 168198656. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:34:30,956][1648985] Avg episode reward: [(0, '161.790')] [2024-06-15 15:34:31,797][1652491] Updated weights for policy 0, policy_version 328441 (0.0013) [2024-06-15 15:34:34,960][1652491] Updated weights for policy 0, policy_version 328481 (0.0010) [2024-06-15 15:34:35,955][1648985] Fps is (10 sec: 52551.3, 60 sec: 46421.3, 300 sec: 46208.5). Total num frames: 672792576. Throughput: 0: 11628.1. Samples: 168269824. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:34:35,956][1648985] Avg episode reward: [(0, '153.520')] [2024-06-15 15:34:38,900][1652491] Updated weights for policy 0, policy_version 328546 (0.0013) [2024-06-15 15:34:40,749][1652491] Updated weights for policy 0, policy_version 328618 (0.0126) [2024-06-15 15:34:40,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 673021952. Throughput: 0: 11662.3. 
Samples: 168306688. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:34:40,956][1648985] Avg episode reward: [(0, '139.930')] [2024-06-15 15:34:41,554][1652491] Updated weights for policy 0, policy_version 328643 (0.0012) [2024-06-15 15:34:42,807][1652491] Updated weights for policy 0, policy_version 328702 (0.0013) [2024-06-15 15:34:45,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 673251328. Throughput: 0: 11685.0. Samples: 168372736. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:34:45,956][1648985] Avg episode reward: [(0, '148.770')] [2024-06-15 15:34:46,450][1652491] Updated weights for policy 0, policy_version 328759 (0.0036) [2024-06-15 15:34:50,258][1651469] Signal inference workers to stop experience collection... (17200 times) [2024-06-15 15:34:50,291][1652491] InferenceWorker_p0-w0: stopping experience collection (17200 times) [2024-06-15 15:34:50,525][1651469] Signal inference workers to resume experience collection... (17200 times) [2024-06-15 15:34:50,526][1652491] InferenceWorker_p0-w0: resuming experience collection (17200 times) [2024-06-15 15:34:50,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 673415168. Throughput: 0: 11776.0. Samples: 168447488. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:34:50,956][1648985] Avg episode reward: [(0, '152.850')] [2024-06-15 15:34:51,156][1652491] Updated weights for policy 0, policy_version 328822 (0.0037) [2024-06-15 15:34:53,198][1652491] Updated weights for policy 0, policy_version 328912 (0.0011) [2024-06-15 15:34:55,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 47513.7, 300 sec: 46430.6). Total num frames: 673710080. Throughput: 0: 11628.1. Samples: 168473600. 
Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:34:55,956][1648985] Avg episode reward: [(0, '146.380')] [2024-06-15 15:34:55,973][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000328960_673710080.pth... [2024-06-15 15:34:56,044][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000323536_662601728.pth [2024-06-15 15:34:56,450][1652491] Updated weights for policy 0, policy_version 328961 (0.0012) [2024-06-15 15:34:57,815][1652491] Updated weights for policy 0, policy_version 329024 (0.0013) [2024-06-15 15:35:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 673841152. Throughput: 0: 11764.6. Samples: 168552448. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:35:00,956][1648985] Avg episode reward: [(0, '179.330')] [2024-06-15 15:35:02,175][1652491] Updated weights for policy 0, policy_version 329072 (0.0018) [2024-06-15 15:35:04,340][1652491] Updated weights for policy 0, policy_version 329154 (0.0012) [2024-06-15 15:35:05,593][1652491] Updated weights for policy 0, policy_version 329216 (0.0013) [2024-06-15 15:35:05,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 674234368. Throughput: 0: 11639.4. Samples: 168612352. Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:35:05,956][1648985] Avg episode reward: [(0, '168.950')] [2024-06-15 15:35:09,332][1652491] Updated weights for policy 0, policy_version 329269 (0.0011) [2024-06-15 15:35:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 46097.4). Total num frames: 674365440. Throughput: 0: 11702.4. Samples: 168652800. 
Policy #0 lag: (min: 23.0, avg: 112.4, max: 279.0) [2024-06-15 15:35:10,956][1648985] Avg episode reward: [(0, '164.830')] [2024-06-15 15:35:13,055][1652491] Updated weights for policy 0, policy_version 329315 (0.0012) [2024-06-15 15:35:14,704][1652491] Updated weights for policy 0, policy_version 329392 (0.0015) [2024-06-15 15:35:15,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 48605.8, 300 sec: 46763.8). Total num frames: 674693120. Throughput: 0: 11719.1. Samples: 168726016. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:35:15,956][1648985] Avg episode reward: [(0, '151.190')] [2024-06-15 15:35:16,431][1652491] Updated weights for policy 0, policy_version 329456 (0.0018) [2024-06-15 15:35:20,062][1652491] Updated weights for policy 0, policy_version 329506 (0.0013) [2024-06-15 15:35:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.4, 300 sec: 46208.5). Total num frames: 674889728. Throughput: 0: 11662.2. Samples: 168794624. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:35:20,956][1648985] Avg episode reward: [(0, '151.420')] [2024-06-15 15:35:24,105][1652491] Updated weights for policy 0, policy_version 329552 (0.0020) [2024-06-15 15:35:25,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 46439.3, 300 sec: 46319.5). Total num frames: 675053568. Throughput: 0: 11753.2. Samples: 168835584. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:35:25,956][1648985] Avg episode reward: [(0, '157.860')] [2024-06-15 15:35:26,074][1652491] Updated weights for policy 0, policy_version 329632 (0.0014) [2024-06-15 15:35:27,382][1651469] Signal inference workers to stop experience collection... (17250 times) [2024-06-15 15:35:27,432][1652491] InferenceWorker_p0-w0: stopping experience collection (17250 times) [2024-06-15 15:35:27,733][1651469] Signal inference workers to resume experience collection... 
(17250 times) [2024-06-15 15:35:27,734][1652491] InferenceWorker_p0-w0: resuming experience collection (17250 times) [2024-06-15 15:35:27,938][1652491] Updated weights for policy 0, policy_version 329699 (0.0012) [2024-06-15 15:35:30,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 675282944. Throughput: 0: 11502.9. Samples: 168890368. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:35:30,962][1648985] Avg episode reward: [(0, '149.680')] [2024-06-15 15:35:30,953][1652491] Updated weights for policy 0, policy_version 329732 (0.0012) [2024-06-15 15:35:35,863][1652491] Updated weights for policy 0, policy_version 329793 (0.0038) [2024-06-15 15:35:35,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 43690.6, 300 sec: 45765.5). Total num frames: 675414016. Throughput: 0: 11548.4. Samples: 168967168. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:35:35,956][1648985] Avg episode reward: [(0, '155.680')] [2024-06-15 15:35:38,084][1652491] Updated weights for policy 0, policy_version 329888 (0.0095) [2024-06-15 15:35:39,564][1652491] Updated weights for policy 0, policy_version 329952 (0.0013) [2024-06-15 15:35:40,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 46421.2, 300 sec: 46319.5). Total num frames: 675807232. Throughput: 0: 11457.4. Samples: 168989184. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:35:40,957][1648985] Avg episode reward: [(0, '138.340')] [2024-06-15 15:35:43,120][1652491] Updated weights for policy 0, policy_version 330016 (0.0154) [2024-06-15 15:35:45,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 44782.9, 300 sec: 45764.1). Total num frames: 675938304. Throughput: 0: 11298.1. Samples: 169060864. 
Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:35:45,956][1648985] Avg episode reward: [(0, '131.500')] [2024-06-15 15:35:47,751][1652491] Updated weights for policy 0, policy_version 330064 (0.0032) [2024-06-15 15:35:49,569][1652491] Updated weights for policy 0, policy_version 330132 (0.0012) [2024-06-15 15:35:50,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 46967.5, 300 sec: 46430.7). Total num frames: 676233216. Throughput: 0: 11389.2. Samples: 169124864. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:35:50,956][1648985] Avg episode reward: [(0, '133.730')] [2024-06-15 15:35:51,528][1652491] Updated weights for policy 0, policy_version 330238 (0.0012) [2024-06-15 15:35:54,901][1652491] Updated weights for policy 0, policy_version 330295 (0.0015) [2024-06-15 15:35:55,955][1648985] Fps is (10 sec: 52427.2, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 676462592. Throughput: 0: 11354.9. Samples: 169163776. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:35:55,956][1648985] Avg episode reward: [(0, '164.650')] [2024-06-15 15:35:59,405][1652491] Updated weights for policy 0, policy_version 330330 (0.0013) [2024-06-15 15:36:00,896][1652491] Updated weights for policy 0, policy_version 330388 (0.0013) [2024-06-15 15:36:00,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 676626432. Throughput: 0: 11480.2. Samples: 169242624. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:36:00,956][1648985] Avg episode reward: [(0, '150.840')] [2024-06-15 15:36:01,969][1652491] Updated weights for policy 0, policy_version 330448 (0.0012) [2024-06-15 15:36:02,939][1652491] Updated weights for policy 0, policy_version 330494 (0.0013) [2024-06-15 15:36:04,835][1652491] Updated weights for policy 0, policy_version 330529 (0.0095) [2024-06-15 15:36:05,955][1648985] Fps is (10 sec: 52430.6, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 676986880. 
Throughput: 0: 11389.2. Samples: 169307136. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:36:05,955][1648985] Avg episode reward: [(0, '126.610')] [2024-06-15 15:36:10,671][1652491] Updated weights for policy 0, policy_version 330608 (0.0136) [2024-06-15 15:36:10,786][1651469] Signal inference workers to stop experience collection... (17300 times) [2024-06-15 15:36:10,837][1652491] InferenceWorker_p0-w0: stopping experience collection (17300 times) [2024-06-15 15:36:10,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 677085184. Throughput: 0: 11446.1. Samples: 169350656. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:36:10,956][1648985] Avg episode reward: [(0, '135.590')] [2024-06-15 15:36:11,135][1651469] Signal inference workers to resume experience collection... (17300 times) [2024-06-15 15:36:11,136][1652491] InferenceWorker_p0-w0: resuming experience collection (17300 times) [2024-06-15 15:36:12,782][1652491] Updated weights for policy 0, policy_version 330692 (0.0019) [2024-06-15 15:36:13,768][1652491] Updated weights for policy 0, policy_version 330750 (0.0021) [2024-06-15 15:36:15,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 677412864. Throughput: 0: 11491.5. Samples: 169407488. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:36:15,957][1648985] Avg episode reward: [(0, '138.150')] [2024-06-15 15:36:16,886][1652491] Updated weights for policy 0, policy_version 330809 (0.0149) [2024-06-15 15:36:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 677511168. Throughput: 0: 11571.2. Samples: 169487872. 
Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:36:20,956][1648985] Avg episode reward: [(0, '144.100')] [2024-06-15 15:36:22,700][1652491] Updated weights for policy 0, policy_version 330864 (0.0014) [2024-06-15 15:36:24,646][1652491] Updated weights for policy 0, policy_version 330929 (0.0017) [2024-06-15 15:36:25,551][1652491] Updated weights for policy 0, policy_version 330976 (0.0012) [2024-06-15 15:36:25,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 46652.7). Total num frames: 677838848. Throughput: 0: 11673.6. Samples: 169514496. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 15:36:25,956][1648985] Avg episode reward: [(0, '143.820')] [2024-06-15 15:36:28,524][1652491] Updated weights for policy 0, policy_version 331056 (0.0025) [2024-06-15 15:36:30,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 678035456. Throughput: 0: 11502.9. Samples: 169578496. Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:36:30,956][1648985] Avg episode reward: [(0, '132.900')] [2024-06-15 15:36:34,079][1652491] Updated weights for policy 0, policy_version 331120 (0.0011) [2024-06-15 15:36:35,797][1652491] Updated weights for policy 0, policy_version 331184 (0.0014) [2024-06-15 15:36:35,966][1648985] Fps is (10 sec: 42551.3, 60 sec: 47504.9, 300 sec: 46540.0). Total num frames: 678264832. Throughput: 0: 11522.8. Samples: 169643520. Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:36:35,967][1648985] Avg episode reward: [(0, '128.690')] [2024-06-15 15:36:37,342][1652491] Updated weights for policy 0, policy_version 331234 (0.0013) [2024-06-15 15:36:40,505][1652491] Updated weights for policy 0, policy_version 331298 (0.0012) [2024-06-15 15:36:40,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 678559744. Throughput: 0: 11468.9. Samples: 169679872. 
Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:36:40,955][1648985] Avg episode reward: [(0, '132.150')] [2024-06-15 15:36:45,027][1652491] Updated weights for policy 0, policy_version 331362 (0.0025) [2024-06-15 15:36:45,955][1648985] Fps is (10 sec: 42645.1, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 678690816. Throughput: 0: 11298.1. Samples: 169751040. Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:36:45,956][1648985] Avg episode reward: [(0, '135.860')] [2024-06-15 15:36:46,352][1652491] Updated weights for policy 0, policy_version 331413 (0.0014) [2024-06-15 15:36:47,875][1652491] Updated weights for policy 0, policy_version 331473 (0.0016) [2024-06-15 15:36:50,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45329.0, 300 sec: 46319.5). Total num frames: 678952960. Throughput: 0: 11389.2. Samples: 169819648. Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:36:50,956][1648985] Avg episode reward: [(0, '131.380')] [2024-06-15 15:36:51,390][1651469] Signal inference workers to stop experience collection... (17350 times) [2024-06-15 15:36:51,467][1652491] InferenceWorker_p0-w0: stopping experience collection (17350 times) [2024-06-15 15:36:51,591][1651469] Signal inference workers to resume experience collection... (17350 times) [2024-06-15 15:36:51,601][1652491] InferenceWorker_p0-w0: resuming experience collection (17350 times) [2024-06-15 15:36:51,604][1652491] Updated weights for policy 0, policy_version 331536 (0.0014) [2024-06-15 15:36:52,449][1652491] Updated weights for policy 0, policy_version 331581 (0.0033) [2024-06-15 15:36:55,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44237.0, 300 sec: 45875.2). Total num frames: 679116800. Throughput: 0: 11207.1. Samples: 169854976. 
Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:36:55,955][1648985] Avg episode reward: [(0, '142.010')] [2024-06-15 15:36:56,283][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000331616_679149568.pth... [2024-06-15 15:36:56,470][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000326208_668073984.pth [2024-06-15 15:36:57,857][1652491] Updated weights for policy 0, policy_version 331667 (0.0038) [2024-06-15 15:36:59,008][1652491] Updated weights for policy 0, policy_version 331712 (0.0130) [2024-06-15 15:37:00,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 46967.6, 300 sec: 46541.7). Total num frames: 679444480. Throughput: 0: 11150.3. Samples: 169909248. Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:37:00,955][1648985] Avg episode reward: [(0, '155.640')] [2024-06-15 15:37:00,965][1652491] Updated weights for policy 0, policy_version 331771 (0.0013) [2024-06-15 15:37:04,838][1652491] Updated weights for policy 0, policy_version 331833 (0.0076) [2024-06-15 15:37:05,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 43690.7, 300 sec: 46097.4). Total num frames: 679608320. Throughput: 0: 10979.6. Samples: 169981952. Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:37:05,956][1648985] Avg episode reward: [(0, '149.350')] [2024-06-15 15:37:08,093][1652491] Updated weights for policy 0, policy_version 331872 (0.0014) [2024-06-15 15:37:09,967][1652491] Updated weights for policy 0, policy_version 331952 (0.0013) [2024-06-15 15:37:10,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 679870464. Throughput: 0: 11127.5. Samples: 170015232. 
Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:37:10,956][1648985] Avg episode reward: [(0, '131.370')] [2024-06-15 15:37:12,196][1652491] Updated weights for policy 0, policy_version 332002 (0.0015) [2024-06-15 15:37:14,831][1652491] Updated weights for policy 0, policy_version 332052 (0.0012) [2024-06-15 15:37:15,777][1652491] Updated weights for policy 0, policy_version 332096 (0.0012) [2024-06-15 15:37:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45329.2, 300 sec: 46208.4). Total num frames: 680132608. Throughput: 0: 11389.2. Samples: 170091008. Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:37:15,955][1648985] Avg episode reward: [(0, '128.850')] [2024-06-15 15:37:20,538][1652491] Updated weights for policy 0, policy_version 332192 (0.0013) [2024-06-15 15:37:20,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 680361984. Throughput: 0: 11323.7. Samples: 170152960. Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:37:20,956][1648985] Avg episode reward: [(0, '141.120')] [2024-06-15 15:37:22,766][1652491] Updated weights for policy 0, policy_version 332227 (0.0037) [2024-06-15 15:37:23,975][1652491] Updated weights for policy 0, policy_version 332285 (0.0013) [2024-06-15 15:37:25,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 44782.8, 300 sec: 45764.1). Total num frames: 680525824. Throughput: 0: 11343.6. Samples: 170190336. Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:37:25,956][1648985] Avg episode reward: [(0, '132.960')] [2024-06-15 15:37:27,076][1652491] Updated weights for policy 0, policy_version 332347 (0.0013) [2024-06-15 15:37:29,564][1652491] Updated weights for policy 0, policy_version 332406 (0.0181) [2024-06-15 15:37:30,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46421.5, 300 sec: 46319.5). Total num frames: 680820736. Throughput: 0: 11412.0. Samples: 170264576. 
Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:37:30,956][1648985] Avg episode reward: [(0, '142.870')] [2024-06-15 15:37:31,196][1652491] Updated weights for policy 0, policy_version 332450 (0.0013) [2024-06-15 15:37:31,770][1652491] Updated weights for policy 0, policy_version 332480 (0.0012) [2024-06-15 15:37:34,527][1651469] Signal inference workers to stop experience collection... (17400 times) [2024-06-15 15:37:34,561][1652491] InferenceWorker_p0-w0: stopping experience collection (17400 times) [2024-06-15 15:37:34,743][1651469] Signal inference workers to resume experience collection... (17400 times) [2024-06-15 15:37:34,743][1652491] InferenceWorker_p0-w0: resuming experience collection (17400 times) [2024-06-15 15:37:34,916][1652491] Updated weights for policy 0, policy_version 332537 (0.0012) [2024-06-15 15:37:35,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 46429.9, 300 sec: 46319.5). Total num frames: 681050112. Throughput: 0: 11571.2. Samples: 170340352. Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:37:35,956][1648985] Avg episode reward: [(0, '143.800')] [2024-06-15 15:37:38,487][1652491] Updated weights for policy 0, policy_version 332592 (0.0017) [2024-06-15 15:37:39,916][1652491] Updated weights for policy 0, policy_version 332656 (0.0113) [2024-06-15 15:37:40,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 45875.0, 300 sec: 46430.6). Total num frames: 681312256. Throughput: 0: 11548.4. Samples: 170374656. Policy #0 lag: (min: 47.0, avg: 193.7, max: 303.0) [2024-06-15 15:37:40,956][1648985] Avg episode reward: [(0, '146.710')] [2024-06-15 15:37:41,830][1652491] Updated weights for policy 0, policy_version 332704 (0.0012) [2024-06-15 15:37:42,469][1652491] Updated weights for policy 0, policy_version 332735 (0.0014) [2024-06-15 15:37:45,796][1652491] Updated weights for policy 0, policy_version 332784 (0.0013) [2024-06-15 15:37:45,958][1648985] Fps is (10 sec: 49136.2, 60 sec: 47511.2, 300 sec: 46430.1). 
Total num frames: 681541632. Throughput: 0: 11934.4. Samples: 170446336. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:37:45,959][1648985] Avg episode reward: [(0, '161.660')] [2024-06-15 15:37:48,456][1652491] Updated weights for policy 0, policy_version 332817 (0.0011) [2024-06-15 15:37:49,320][1652491] Updated weights for policy 0, policy_version 332855 (0.0014) [2024-06-15 15:37:50,835][1652491] Updated weights for policy 0, policy_version 332912 (0.0038) [2024-06-15 15:37:50,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 681803776. Throughput: 0: 11855.6. Samples: 170515456. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:37:50,956][1648985] Avg episode reward: [(0, '151.900')] [2024-06-15 15:37:53,822][1652491] Updated weights for policy 0, policy_version 332986 (0.0013) [2024-06-15 15:37:55,955][1648985] Fps is (10 sec: 42612.3, 60 sec: 47513.7, 300 sec: 46319.5). Total num frames: 681967616. Throughput: 0: 11844.3. Samples: 170548224. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:37:55,956][1648985] Avg episode reward: [(0, '130.780')] [2024-06-15 15:37:56,772][1652491] Updated weights for policy 0, policy_version 333040 (0.0014) [2024-06-15 15:37:59,358][1652491] Updated weights for policy 0, policy_version 333088 (0.0018) [2024-06-15 15:38:00,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 46421.0, 300 sec: 46208.4). Total num frames: 682229760. Throughput: 0: 11878.3. Samples: 170625536. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:38:00,956][1648985] Avg episode reward: [(0, '134.770')] [2024-06-15 15:38:01,748][1652491] Updated weights for policy 0, policy_version 333153 (0.0013) [2024-06-15 15:38:04,069][1652491] Updated weights for policy 0, policy_version 333216 (0.0013) [2024-06-15 15:38:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 682491904. Throughput: 0: 12071.8. 
Samples: 170696192. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:38:05,956][1648985] Avg episode reward: [(0, '143.570')] [2024-06-15 15:38:07,247][1652491] Updated weights for policy 0, policy_version 333296 (0.0015) [2024-06-15 15:38:10,571][1652491] Updated weights for policy 0, policy_version 333332 (0.0011) [2024-06-15 15:38:10,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 46967.6, 300 sec: 45986.3). Total num frames: 682688512. Throughput: 0: 12083.3. Samples: 170734080. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:38:10,955][1648985] Avg episode reward: [(0, '140.880')] [2024-06-15 15:38:11,537][1652491] Updated weights for policy 0, policy_version 333376 (0.0011) [2024-06-15 15:38:13,751][1652491] Updated weights for policy 0, policy_version 333440 (0.0015) [2024-06-15 15:38:15,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 682983424. Throughput: 0: 11923.9. Samples: 170801152. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:38:15,955][1648985] Avg episode reward: [(0, '134.100')] [2024-06-15 15:38:16,028][1652491] Updated weights for policy 0, policy_version 333497 (0.0015) [2024-06-15 15:38:18,183][1651469] Signal inference workers to stop experience collection... (17450 times) [2024-06-15 15:38:18,234][1652491] InferenceWorker_p0-w0: stopping experience collection (17450 times) [2024-06-15 15:38:18,339][1651469] Signal inference workers to resume experience collection... (17450 times) [2024-06-15 15:38:18,340][1652491] InferenceWorker_p0-w0: resuming experience collection (17450 times) [2024-06-15 15:38:18,434][1652491] Updated weights for policy 0, policy_version 333558 (0.0013) [2024-06-15 15:38:20,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 683147264. Throughput: 0: 11787.4. Samples: 170870784. 
Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:38:20,956][1648985] Avg episode reward: [(0, '136.750')] [2024-06-15 15:38:22,615][1652491] Updated weights for policy 0, policy_version 333616 (0.0095) [2024-06-15 15:38:25,066][1652491] Updated weights for policy 0, policy_version 333670 (0.0114) [2024-06-15 15:38:25,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48059.9, 300 sec: 46319.5). Total num frames: 683409408. Throughput: 0: 11821.6. Samples: 170906624. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:38:25,955][1648985] Avg episode reward: [(0, '146.360')] [2024-06-15 15:38:26,984][1652491] Updated weights for policy 0, policy_version 333712 (0.0011) [2024-06-15 15:38:28,990][1652491] Updated weights for policy 0, policy_version 333763 (0.0015) [2024-06-15 15:38:29,839][1652491] Updated weights for policy 0, policy_version 333808 (0.0013) [2024-06-15 15:38:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 683671552. Throughput: 0: 11685.8. Samples: 170972160. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:38:30,956][1648985] Avg episode reward: [(0, '159.700')] [2024-06-15 15:38:33,237][1652491] Updated weights for policy 0, policy_version 333843 (0.0011) [2024-06-15 15:38:35,896][1652491] Updated weights for policy 0, policy_version 333906 (0.0014) [2024-06-15 15:38:35,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 683835392. Throughput: 0: 11741.9. Samples: 171043840. 
Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:38:35,956][1648985] Avg episode reward: [(0, '168.670')] [2024-06-15 15:38:36,874][1652491] Updated weights for policy 0, policy_version 333947 (0.0011) [2024-06-15 15:38:38,550][1652491] Updated weights for policy 0, policy_version 334000 (0.0016) [2024-06-15 15:38:40,021][1652491] Updated weights for policy 0, policy_version 334041 (0.0014) [2024-06-15 15:38:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 684195840. Throughput: 0: 11821.5. Samples: 171080192. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:38:40,956][1648985] Avg episode reward: [(0, '145.160')] [2024-06-15 15:38:43,529][1652491] Updated weights for policy 0, policy_version 334096 (0.0016) [2024-06-15 15:38:45,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 46423.6, 300 sec: 46208.4). Total num frames: 684326912. Throughput: 0: 11593.9. Samples: 171147264. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:38:45,956][1648985] Avg episode reward: [(0, '137.540')] [2024-06-15 15:38:46,940][1652491] Updated weights for policy 0, policy_version 334160 (0.0013) [2024-06-15 15:38:47,867][1652491] Updated weights for policy 0, policy_version 334204 (0.0012) [2024-06-15 15:38:49,584][1652491] Updated weights for policy 0, policy_version 334244 (0.0011) [2024-06-15 15:38:50,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 684589056. Throughput: 0: 11776.0. Samples: 171226112. 
Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:38:50,956][1648985] Avg episode reward: [(0, '156.780')] [2024-06-15 15:38:51,110][1652491] Updated weights for policy 0, policy_version 334277 (0.0011) [2024-06-15 15:38:52,418][1652491] Updated weights for policy 0, policy_version 334331 (0.0012) [2024-06-15 15:38:55,186][1652491] Updated weights for policy 0, policy_version 334387 (0.0122) [2024-06-15 15:38:55,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 684851200. Throughput: 0: 11662.2. Samples: 171258880. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:38:55,956][1648985] Avg episode reward: [(0, '156.270')] [2024-06-15 15:38:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000334400_684851200.pth... [2024-06-15 15:38:56,017][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000328960_673710080.pth [2024-06-15 15:38:57,968][1652491] Updated weights for policy 0, policy_version 334433 (0.0013) [2024-06-15 15:39:00,626][1652491] Updated weights for policy 0, policy_version 334496 (0.0034) [2024-06-15 15:39:00,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.7, 300 sec: 46430.6). Total num frames: 685047808. Throughput: 0: 11764.6. Samples: 171330560. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 15:39:00,955][1648985] Avg episode reward: [(0, '152.830')] [2024-06-15 15:39:02,701][1652491] Updated weights for policy 0, policy_version 334544 (0.0011) [2024-06-15 15:39:03,740][1652491] Updated weights for policy 0, policy_version 334585 (0.0012) [2024-06-15 15:39:05,694][1651469] Signal inference workers to stop experience collection... (17500 times) [2024-06-15 15:39:05,775][1652491] InferenceWorker_p0-w0: stopping experience collection (17500 times) [2024-06-15 15:39:05,877][1651469] Signal inference workers to resume experience collection... 
(17500 times) [2024-06-15 15:39:05,878][1652491] InferenceWorker_p0-w0: resuming experience collection (17500 times) [2024-06-15 15:39:05,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 685309952. Throughput: 0: 11798.8. Samples: 171401728. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:39:05,955][1648985] Avg episode reward: [(0, '130.700')] [2024-06-15 15:39:05,985][1652491] Updated weights for policy 0, policy_version 334625 (0.0012) [2024-06-15 15:39:08,698][1652491] Updated weights for policy 0, policy_version 334691 (0.0018) [2024-06-15 15:39:10,955][1648985] Fps is (10 sec: 49150.5, 60 sec: 47513.4, 300 sec: 46652.7). Total num frames: 685539328. Throughput: 0: 11707.7. Samples: 171433472. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:39:10,956][1648985] Avg episode reward: [(0, '136.470')] [2024-06-15 15:39:11,003][1652491] Updated weights for policy 0, policy_version 334737 (0.0015) [2024-06-15 15:39:13,912][1652491] Updated weights for policy 0, policy_version 334801 (0.0012) [2024-06-15 15:39:15,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 685768704. Throughput: 0: 11867.0. Samples: 171506176. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:39:15,956][1648985] Avg episode reward: [(0, '131.330')] [2024-06-15 15:39:16,754][1652491] Updated weights for policy 0, policy_version 334864 (0.0038) [2024-06-15 15:39:19,907][1652491] Updated weights for policy 0, policy_version 334945 (0.0014) [2024-06-15 15:39:20,955][1648985] Fps is (10 sec: 49153.3, 60 sec: 48059.8, 300 sec: 46656.4). Total num frames: 686030848. Throughput: 0: 11821.5. Samples: 171575808. 
Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:39:20,956][1648985] Avg episode reward: [(0, '136.940')] [2024-06-15 15:39:22,349][1652491] Updated weights for policy 0, policy_version 334992 (0.0013) [2024-06-15 15:39:23,422][1652491] Updated weights for policy 0, policy_version 335037 (0.0057) [2024-06-15 15:39:25,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 47513.4, 300 sec: 46430.6). Total num frames: 686260224. Throughput: 0: 11810.1. Samples: 171611648. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:39:25,956][1648985] Avg episode reward: [(0, '121.750')] [2024-06-15 15:39:26,005][1652491] Updated weights for policy 0, policy_version 335102 (0.0018) [2024-06-15 15:39:28,736][1652491] Updated weights for policy 0, policy_version 335154 (0.0013) [2024-06-15 15:39:30,399][1652491] Updated weights for policy 0, policy_version 335186 (0.0011) [2024-06-15 15:39:30,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 686489600. Throughput: 0: 11889.8. Samples: 171682304. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:39:30,955][1648985] Avg episode reward: [(0, '129.280')] [2024-06-15 15:39:34,004][1652491] Updated weights for policy 0, policy_version 335236 (0.0013) [2024-06-15 15:39:35,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 47513.7, 300 sec: 46319.5). Total num frames: 686686208. Throughput: 0: 11616.7. Samples: 171748864. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:39:35,955][1648985] Avg episode reward: [(0, '153.140')] [2024-06-15 15:39:36,814][1652491] Updated weights for policy 0, policy_version 335320 (0.0129) [2024-06-15 15:39:39,102][1652491] Updated weights for policy 0, policy_version 335361 (0.0012) [2024-06-15 15:39:40,371][1652491] Updated weights for policy 0, policy_version 335423 (0.0014) [2024-06-15 15:39:40,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 686948352. 
Throughput: 0: 11685.0. Samples: 171784704. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:39:40,956][1648985] Avg episode reward: [(0, '163.790')] [2024-06-15 15:39:42,939][1652491] Updated weights for policy 0, policy_version 335485 (0.0037) [2024-06-15 15:39:45,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.7, 300 sec: 46541.7). Total num frames: 687144960. Throughput: 0: 11696.3. Samples: 171856896. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:39:45,956][1648985] Avg episode reward: [(0, '148.310')] [2024-06-15 15:39:46,420][1652491] Updated weights for policy 0, policy_version 335552 (0.0013) [2024-06-15 15:39:49,090][1652491] Updated weights for policy 0, policy_version 335613 (0.0011) [2024-06-15 15:39:50,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 687407104. Throughput: 0: 11696.3. Samples: 171928064. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:39:50,956][1648985] Avg episode reward: [(0, '144.870')] [2024-06-15 15:39:51,190][1652491] Updated weights for policy 0, policy_version 335678 (0.0011) [2024-06-15 15:39:51,994][1651469] Signal inference workers to stop experience collection... (17550 times) [2024-06-15 15:39:52,038][1652491] InferenceWorker_p0-w0: stopping experience collection (17550 times) [2024-06-15 15:39:52,342][1651469] Signal inference workers to resume experience collection... (17550 times) [2024-06-15 15:39:52,344][1652491] InferenceWorker_p0-w0: resuming experience collection (17550 times) [2024-06-15 15:39:53,489][1652491] Updated weights for policy 0, policy_version 335743 (0.0078) [2024-06-15 15:39:55,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 687603712. Throughput: 0: 11582.6. Samples: 171954688. 
Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:39:55,956][1648985] Avg episode reward: [(0, '143.480')] [2024-06-15 15:39:57,632][1652491] Updated weights for policy 0, policy_version 335797 (0.0120) [2024-06-15 15:40:00,229][1652491] Updated weights for policy 0, policy_version 335840 (0.0011) [2024-06-15 15:40:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.2, 300 sec: 46097.4). Total num frames: 687833088. Throughput: 0: 11741.9. Samples: 172034560. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:40:00,956][1648985] Avg episode reward: [(0, '144.730')] [2024-06-15 15:40:01,809][1652491] Updated weights for policy 0, policy_version 335905 (0.0013) [2024-06-15 15:40:03,789][1652491] Updated weights for policy 0, policy_version 335952 (0.0014) [2024-06-15 15:40:04,817][1652491] Updated weights for policy 0, policy_version 335992 (0.0012) [2024-06-15 15:40:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 688128000. Throughput: 0: 11707.7. Samples: 172102656. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:40:05,956][1648985] Avg episode reward: [(0, '133.010')] [2024-06-15 15:40:08,019][1652491] Updated weights for policy 0, policy_version 336033 (0.0023) [2024-06-15 15:40:10,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.3, 300 sec: 46097.4). Total num frames: 688291840. Throughput: 0: 11741.9. Samples: 172140032. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:40:10,956][1648985] Avg episode reward: [(0, '137.350')] [2024-06-15 15:40:11,128][1652491] Updated weights for policy 0, policy_version 336084 (0.0014) [2024-06-15 15:40:12,506][1652491] Updated weights for policy 0, policy_version 336145 (0.0024) [2024-06-15 15:40:13,488][1652491] Updated weights for policy 0, policy_version 336191 (0.0013) [2024-06-15 15:40:15,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 688619520. 
Throughput: 0: 11650.9. Samples: 172206592. Policy #0 lag: (min: 63.0, avg: 174.3, max: 319.0) [2024-06-15 15:40:15,956][1648985] Avg episode reward: [(0, '122.340')] [2024-06-15 15:40:15,975][1652491] Updated weights for policy 0, policy_version 336248 (0.0019) [2024-06-15 15:40:20,109][1652491] Updated weights for policy 0, policy_version 336290 (0.0013) [2024-06-15 15:40:20,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45875.1, 300 sec: 46541.7). Total num frames: 688783360. Throughput: 0: 11764.6. Samples: 172278272. Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:40:20,956][1648985] Avg episode reward: [(0, '132.950')] [2024-06-15 15:40:21,544][1652491] Updated weights for policy 0, policy_version 336328 (0.0015) [2024-06-15 15:40:23,034][1652491] Updated weights for policy 0, policy_version 336388 (0.0012) [2024-06-15 15:40:25,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.5, 300 sec: 46652.7). Total num frames: 689045504. Throughput: 0: 11673.7. Samples: 172310016. Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:40:25,956][1648985] Avg episode reward: [(0, '141.590')] [2024-06-15 15:40:26,514][1652491] Updated weights for policy 0, policy_version 336450 (0.0029) [2024-06-15 15:40:27,919][1652491] Updated weights for policy 0, policy_version 336509 (0.0024) [2024-06-15 15:40:30,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 45329.1, 300 sec: 46763.8). Total num frames: 689209344. Throughput: 0: 11685.0. Samples: 172382720. 
Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:40:30,956][1648985] Avg episode reward: [(0, '149.290')] [2024-06-15 15:40:31,745][1652491] Updated weights for policy 0, policy_version 336560 (0.0013) [2024-06-15 15:40:32,979][1652491] Updated weights for policy 0, policy_version 336596 (0.0016) [2024-06-15 15:40:35,326][1652491] Updated weights for policy 0, policy_version 336674 (0.0014) [2024-06-15 15:40:35,816][1652491] Updated weights for policy 0, policy_version 336704 (0.0031) [2024-06-15 15:40:35,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 48059.6, 300 sec: 46652.7). Total num frames: 689569792. Throughput: 0: 11650.8. Samples: 172452352. Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:40:35,956][1648985] Avg episode reward: [(0, '146.350')] [2024-06-15 15:40:37,262][1651469] Signal inference workers to stop experience collection... (17600 times) [2024-06-15 15:40:37,338][1652491] InferenceWorker_p0-w0: stopping experience collection (17600 times) [2024-06-15 15:40:37,490][1651469] Signal inference workers to resume experience collection... (17600 times) [2024-06-15 15:40:37,491][1652491] InferenceWorker_p0-w0: resuming experience collection (17600 times) [2024-06-15 15:40:38,539][1652491] Updated weights for policy 0, policy_version 336758 (0.0013) [2024-06-15 15:40:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 689700864. Throughput: 0: 11832.9. Samples: 172487168. Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:40:40,956][1648985] Avg episode reward: [(0, '141.700')] [2024-06-15 15:40:42,883][1652491] Updated weights for policy 0, policy_version 336816 (0.0015) [2024-06-15 15:40:44,139][1652491] Updated weights for policy 0, policy_version 336869 (0.0013) [2024-06-15 15:40:45,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46967.3, 300 sec: 46541.6). Total num frames: 689963008. Throughput: 0: 11559.8. Samples: 172554752. 
Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:40:45,956][1648985] Avg episode reward: [(0, '136.670')] [2024-06-15 15:40:46,219][1652491] Updated weights for policy 0, policy_version 336916 (0.0046) [2024-06-15 15:40:48,701][1652491] Updated weights for policy 0, policy_version 336976 (0.0062) [2024-06-15 15:40:49,864][1652491] Updated weights for policy 0, policy_version 337024 (0.0013) [2024-06-15 15:40:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 690225152. Throughput: 0: 11650.8. Samples: 172626944. Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:40:50,955][1648985] Avg episode reward: [(0, '148.100')] [2024-06-15 15:40:54,213][1652491] Updated weights for policy 0, policy_version 337088 (0.0129) [2024-06-15 15:40:55,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.5, 300 sec: 46874.9). Total num frames: 690454528. Throughput: 0: 11639.4. Samples: 172663808. Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:40:55,956][1648985] Avg episode reward: [(0, '172.030')] [2024-06-15 15:40:56,051][1652491] Updated weights for policy 0, policy_version 337150 (0.0012) [2024-06-15 15:40:56,098][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000337152_690487296.pth... [2024-06-15 15:40:56,169][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000331616_679149568.pth [2024-06-15 15:40:57,698][1652491] Updated weights for policy 0, policy_version 337190 (0.0017) [2024-06-15 15:41:00,475][1652491] Updated weights for policy 0, policy_version 337248 (0.0026) [2024-06-15 15:41:00,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 690716672. Throughput: 0: 11764.6. Samples: 172736000. 
Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:41:00,956][1648985] Avg episode reward: [(0, '171.660')] [2024-06-15 15:41:05,155][1652491] Updated weights for policy 0, policy_version 337312 (0.0013) [2024-06-15 15:41:05,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 690880512. Throughput: 0: 11582.6. Samples: 172799488. Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:41:05,955][1648985] Avg episode reward: [(0, '158.200')] [2024-06-15 15:41:06,608][1652491] Updated weights for policy 0, policy_version 337379 (0.0013) [2024-06-15 15:41:08,540][1652491] Updated weights for policy 0, policy_version 337426 (0.0053) [2024-06-15 15:41:10,947][1652491] Updated weights for policy 0, policy_version 337475 (0.0013) [2024-06-15 15:41:10,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 691142656. Throughput: 0: 11707.7. Samples: 172836864. Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:41:10,956][1648985] Avg episode reward: [(0, '137.830')] [2024-06-15 15:41:15,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 44236.6, 300 sec: 46652.7). Total num frames: 691273728. Throughput: 0: 11719.1. Samples: 172910080. Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:41:15,956][1648985] Avg episode reward: [(0, '159.540')] [2024-06-15 15:41:16,562][1652491] Updated weights for policy 0, policy_version 337568 (0.0014) [2024-06-15 15:41:18,179][1652491] Updated weights for policy 0, policy_version 337648 (0.0066) [2024-06-15 15:41:20,550][1651469] Signal inference workers to stop experience collection... (17650 times) [2024-06-15 15:41:20,621][1652491] InferenceWorker_p0-w0: stopping experience collection (17650 times) [2024-06-15 15:41:20,841][1651469] Signal inference workers to resume experience collection... 
(17650 times) [2024-06-15 15:41:20,852][1652491] InferenceWorker_p0-w0: resuming experience collection (17650 times) [2024-06-15 15:41:20,855][1652491] Updated weights for policy 0, policy_version 337712 (0.0014) [2024-06-15 15:41:20,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 47513.7, 300 sec: 46763.8). Total num frames: 691634176. Throughput: 0: 11468.9. Samples: 172968448. Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:41:20,955][1648985] Avg episode reward: [(0, '157.460')] [2024-06-15 15:41:22,875][1652491] Updated weights for policy 0, policy_version 337749 (0.0013) [2024-06-15 15:41:23,665][1652491] Updated weights for policy 0, policy_version 337792 (0.0020) [2024-06-15 15:41:25,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.1, 300 sec: 46652.8). Total num frames: 691798016. Throughput: 0: 11480.2. Samples: 173003776. Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:41:25,956][1648985] Avg episode reward: [(0, '137.540')] [2024-06-15 15:41:29,608][1652491] Updated weights for policy 0, policy_version 337888 (0.0080) [2024-06-15 15:41:30,956][1648985] Fps is (10 sec: 42592.5, 60 sec: 47512.5, 300 sec: 46765.4). Total num frames: 692060160. Throughput: 0: 11536.8. Samples: 173073920. Policy #0 lag: (min: 0.0, avg: 103.0, max: 256.0) [2024-06-15 15:41:30,958][1648985] Avg episode reward: [(0, '140.380')] [2024-06-15 15:41:32,027][1652491] Updated weights for policy 0, policy_version 337939 (0.0013) [2024-06-15 15:41:34,227][1652491] Updated weights for policy 0, policy_version 337987 (0.0033) [2024-06-15 15:41:35,682][1652491] Updated weights for policy 0, policy_version 338048 (0.0038) [2024-06-15 15:41:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 692322304. Throughput: 0: 11366.4. Samples: 173138432. 
Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:41:35,956][1648985] Avg episode reward: [(0, '157.640')] [2024-06-15 15:41:40,955][1648985] Fps is (10 sec: 42604.1, 60 sec: 46421.4, 300 sec: 46763.9). Total num frames: 692486144. Throughput: 0: 11503.0. Samples: 173181440. Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:41:40,956][1648985] Avg episode reward: [(0, '143.980')] [2024-06-15 15:41:40,976][1652491] Updated weights for policy 0, policy_version 338128 (0.0013) [2024-06-15 15:41:42,833][1652491] Updated weights for policy 0, policy_version 338192 (0.0014) [2024-06-15 15:41:45,435][1652491] Updated weights for policy 0, policy_version 338245 (0.0017) [2024-06-15 15:41:45,962][1648985] Fps is (10 sec: 42570.0, 60 sec: 46416.3, 300 sec: 46762.8). Total num frames: 692748288. Throughput: 0: 11341.9. Samples: 173246464. Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:41:45,962][1648985] Avg episode reward: [(0, '153.830')] [2024-06-15 15:41:46,725][1652491] Updated weights for policy 0, policy_version 338304 (0.0017) [2024-06-15 15:41:50,955][1648985] Fps is (10 sec: 42596.3, 60 sec: 44782.6, 300 sec: 46763.8). Total num frames: 692912128. Throughput: 0: 11593.8. Samples: 173321216. Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:41:50,956][1648985] Avg episode reward: [(0, '133.080')] [2024-06-15 15:41:51,378][1652491] Updated weights for policy 0, policy_version 338355 (0.0011) [2024-06-15 15:41:52,910][1652491] Updated weights for policy 0, policy_version 338428 (0.0127) [2024-06-15 15:41:55,029][1652491] Updated weights for policy 0, policy_version 338480 (0.0014) [2024-06-15 15:41:55,955][1648985] Fps is (10 sec: 49185.3, 60 sec: 46421.5, 300 sec: 46763.8). Total num frames: 693239808. Throughput: 0: 11457.4. Samples: 173352448. 
Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:41:55,955][1648985] Avg episode reward: [(0, '121.010')] [2024-06-15 15:41:58,252][1652491] Updated weights for policy 0, policy_version 338544 (0.0014) [2024-06-15 15:42:00,955][1648985] Fps is (10 sec: 45877.2, 60 sec: 44236.7, 300 sec: 46652.7). Total num frames: 693370880. Throughput: 0: 11389.2. Samples: 173422592. Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:42:00,956][1648985] Avg episode reward: [(0, '116.070')] [2024-06-15 15:42:02,002][1652491] Updated weights for policy 0, policy_version 338596 (0.0131) [2024-06-15 15:42:03,518][1652491] Updated weights for policy 0, policy_version 338659 (0.0013) [2024-06-15 15:42:05,420][1651469] Signal inference workers to stop experience collection... (17700 times) [2024-06-15 15:42:05,476][1652491] Updated weights for policy 0, policy_version 338707 (0.0012) [2024-06-15 15:42:05,506][1652491] InferenceWorker_p0-w0: stopping experience collection (17700 times) [2024-06-15 15:42:05,704][1651469] Signal inference workers to resume experience collection... (17700 times) [2024-06-15 15:42:05,705][1652491] InferenceWorker_p0-w0: resuming experience collection (17700 times) [2024-06-15 15:42:05,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 693698560. Throughput: 0: 11616.7. Samples: 173491200. Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:42:05,956][1648985] Avg episode reward: [(0, '130.040')] [2024-06-15 15:42:06,497][1652491] Updated weights for policy 0, policy_version 338752 (0.0021) [2024-06-15 15:42:09,729][1652491] Updated weights for policy 0, policy_version 338807 (0.0013) [2024-06-15 15:42:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 693895168. Throughput: 0: 11650.8. Samples: 173528064. 
Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:42:10,956][1648985] Avg episode reward: [(0, '137.360')] [2024-06-15 15:42:12,718][1652491] Updated weights for policy 0, policy_version 338870 (0.0014) [2024-06-15 15:42:14,733][1652491] Updated weights for policy 0, policy_version 338912 (0.0016) [2024-06-15 15:42:15,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 48606.1, 300 sec: 46874.9). Total num frames: 694190080. Throughput: 0: 11753.6. Samples: 173602816. Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:42:15,955][1648985] Avg episode reward: [(0, '138.100')] [2024-06-15 15:42:16,036][1652491] Updated weights for policy 0, policy_version 338961 (0.0014) [2024-06-15 15:42:17,062][1652491] Updated weights for policy 0, policy_version 339008 (0.0021) [2024-06-15 15:42:20,308][1652491] Updated weights for policy 0, policy_version 339066 (0.0020) [2024-06-15 15:42:20,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.2, 300 sec: 47097.1). Total num frames: 694419456. Throughput: 0: 11889.8. Samples: 173673472. Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:42:20,956][1648985] Avg episode reward: [(0, '155.600')] [2024-06-15 15:42:23,324][1652491] Updated weights for policy 0, policy_version 339120 (0.0044) [2024-06-15 15:42:25,475][1652491] Updated weights for policy 0, policy_version 339177 (0.0013) [2024-06-15 15:42:25,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 694681600. Throughput: 0: 11912.5. Samples: 173717504. 
Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:42:25,955][1648985] Avg episode reward: [(0, '150.290')] [2024-06-15 15:42:26,284][1652491] Updated weights for policy 0, policy_version 339216 (0.0014) [2024-06-15 15:42:27,448][1652491] Updated weights for policy 0, policy_version 339263 (0.0052) [2024-06-15 15:42:30,468][1652491] Updated weights for policy 0, policy_version 339324 (0.0015) [2024-06-15 15:42:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48060.7, 300 sec: 47097.1). Total num frames: 694943744. Throughput: 0: 11982.6. Samples: 173785600. Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:42:30,956][1648985] Avg episode reward: [(0, '177.040')] [2024-06-15 15:42:34,504][1652491] Updated weights for policy 0, policy_version 339382 (0.0021) [2024-06-15 15:42:35,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46421.4, 300 sec: 46763.9). Total num frames: 695107584. Throughput: 0: 12094.7. Samples: 173865472. Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:42:35,956][1648985] Avg episode reward: [(0, '162.070')] [2024-06-15 15:42:36,404][1652491] Updated weights for policy 0, policy_version 339433 (0.0076) [2024-06-15 15:42:38,298][1652491] Updated weights for policy 0, policy_version 339513 (0.0076) [2024-06-15 15:42:40,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 48605.9, 300 sec: 46986.5). Total num frames: 695402496. Throughput: 0: 12003.5. Samples: 173892608. Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:42:40,956][1648985] Avg episode reward: [(0, '169.520')] [2024-06-15 15:42:41,305][1652491] Updated weights for policy 0, policy_version 339568 (0.0011) [2024-06-15 15:42:45,231][1652491] Updated weights for policy 0, policy_version 339616 (0.0101) [2024-06-15 15:42:45,958][1648985] Fps is (10 sec: 49135.9, 60 sec: 47516.4, 300 sec: 46763.3). Total num frames: 695599104. Throughput: 0: 12150.6. Samples: 173969408. 
Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:42:45,959][1648985] Avg episode reward: [(0, '139.270')] [2024-06-15 15:42:47,521][1652491] Updated weights for policy 0, policy_version 339696 (0.0012) [2024-06-15 15:42:47,931][1651469] Signal inference workers to stop experience collection... (17750 times) [2024-06-15 15:42:47,989][1652491] InferenceWorker_p0-w0: stopping experience collection (17750 times) [2024-06-15 15:42:48,091][1651469] Signal inference workers to resume experience collection... (17750 times) [2024-06-15 15:42:48,102][1652491] InferenceWorker_p0-w0: resuming experience collection (17750 times) [2024-06-15 15:42:48,343][1652491] Updated weights for policy 0, policy_version 339737 (0.0012) [2024-06-15 15:42:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 49152.4, 300 sec: 47097.1). Total num frames: 695861248. Throughput: 0: 12242.5. Samples: 174042112. Policy #0 lag: (min: 47.0, avg: 186.9, max: 303.0) [2024-06-15 15:42:50,955][1648985] Avg episode reward: [(0, '138.070')] [2024-06-15 15:42:51,705][1652491] Updated weights for policy 0, policy_version 339795 (0.0014) [2024-06-15 15:42:55,956][1648985] Fps is (10 sec: 39332.4, 60 sec: 45874.8, 300 sec: 46652.7). Total num frames: 695992320. Throughput: 0: 12117.2. Samples: 174073344. Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:42:55,957][1648985] Avg episode reward: [(0, '169.310')] [2024-06-15 15:42:56,551][1652491] Updated weights for policy 0, policy_version 339872 (0.0013) [2024-06-15 15:42:56,552][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000339872_696057856.pth... 
[2024-06-15 15:42:56,708][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000334400_684851200.pth [2024-06-15 15:42:58,260][1652491] Updated weights for policy 0, policy_version 339952 (0.0049) [2024-06-15 15:42:59,444][1652491] Updated weights for policy 0, policy_version 340001 (0.0013) [2024-06-15 15:43:00,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 50244.4, 300 sec: 47097.1). Total num frames: 696385536. Throughput: 0: 11969.4. Samples: 174141440. Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:43:00,955][1648985] Avg episode reward: [(0, '161.170')] [2024-06-15 15:43:03,156][1652491] Updated weights for policy 0, policy_version 340048 (0.0037) [2024-06-15 15:43:05,955][1648985] Fps is (10 sec: 52431.7, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 696516608. Throughput: 0: 12094.6. Samples: 174217728. Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:43:05,955][1648985] Avg episode reward: [(0, '154.970')] [2024-06-15 15:43:07,538][1652491] Updated weights for policy 0, policy_version 340112 (0.0012) [2024-06-15 15:43:09,136][1652491] Updated weights for policy 0, policy_version 340192 (0.0012) [2024-06-15 15:43:10,736][1652491] Updated weights for policy 0, policy_version 340243 (0.0036) [2024-06-15 15:43:10,978][1648985] Fps is (10 sec: 45768.8, 60 sec: 49133.1, 300 sec: 46982.3). Total num frames: 696844288. Throughput: 0: 11815.4. Samples: 174249472. Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:43:10,979][1648985] Avg episode reward: [(0, '149.190')] [2024-06-15 15:43:14,912][1652491] Updated weights for policy 0, policy_version 340293 (0.0014) [2024-06-15 15:43:15,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 697008128. Throughput: 0: 11810.2. Samples: 174317056. 
Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:43:15,956][1648985] Avg episode reward: [(0, '151.350')] [2024-06-15 15:43:16,113][1652491] Updated weights for policy 0, policy_version 340352 (0.0011) [2024-06-15 15:43:20,287][1652491] Updated weights for policy 0, policy_version 340432 (0.0082) [2024-06-15 15:43:20,955][1648985] Fps is (10 sec: 42697.3, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 697270272. Throughput: 0: 11548.4. Samples: 174385152. Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:43:20,956][1648985] Avg episode reward: [(0, '134.610')] [2024-06-15 15:43:21,485][1652491] Updated weights for policy 0, policy_version 340482 (0.0018) [2024-06-15 15:43:25,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 697434112. Throughput: 0: 11616.7. Samples: 174415360. Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:43:25,956][1648985] Avg episode reward: [(0, '147.910')] [2024-06-15 15:43:26,677][1652491] Updated weights for policy 0, policy_version 340545 (0.0012) [2024-06-15 15:43:27,637][1652491] Updated weights for policy 0, policy_version 340598 (0.0016) [2024-06-15 15:43:30,333][1652491] Updated weights for policy 0, policy_version 340627 (0.0012) [2024-06-15 15:43:30,577][1651469] Signal inference workers to stop experience collection... (17800 times) [2024-06-15 15:43:30,625][1652491] InferenceWorker_p0-w0: stopping experience collection (17800 times) [2024-06-15 15:43:30,745][1651469] Signal inference workers to resume experience collection... (17800 times) [2024-06-15 15:43:30,746][1652491] InferenceWorker_p0-w0: resuming experience collection (17800 times) [2024-06-15 15:43:30,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45329.2, 300 sec: 46874.9). Total num frames: 697663488. Throughput: 0: 11731.4. Samples: 174497280. 
Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:43:30,955][1648985] Avg episode reward: [(0, '154.090')] [2024-06-15 15:43:31,867][1652491] Updated weights for policy 0, policy_version 340709 (0.0015) [2024-06-15 15:43:33,723][1652491] Updated weights for policy 0, policy_version 340787 (0.0013) [2024-06-15 15:43:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 697958400. Throughput: 0: 11616.7. Samples: 174564864. Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:43:35,956][1648985] Avg episode reward: [(0, '158.230')] [2024-06-15 15:43:37,819][1652491] Updated weights for policy 0, policy_version 340817 (0.0012) [2024-06-15 15:43:38,773][1652491] Updated weights for policy 0, policy_version 340860 (0.0025) [2024-06-15 15:43:40,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45329.1, 300 sec: 46763.9). Total num frames: 698122240. Throughput: 0: 11798.9. Samples: 174604288. Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:43:40,956][1648985] Avg episode reward: [(0, '138.840')] [2024-06-15 15:43:42,446][1652491] Updated weights for policy 0, policy_version 340944 (0.0015) [2024-06-15 15:43:44,185][1652491] Updated weights for policy 0, policy_version 341011 (0.0116) [2024-06-15 15:43:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48062.2, 300 sec: 47097.0). Total num frames: 698482688. Throughput: 0: 11559.8. Samples: 174661632. Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:43:45,956][1648985] Avg episode reward: [(0, '134.420')] [2024-06-15 15:43:48,499][1652491] Updated weights for policy 0, policy_version 341059 (0.0026) [2024-06-15 15:43:49,625][1652491] Updated weights for policy 0, policy_version 341115 (0.0047) [2024-06-15 15:43:50,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45875.1, 300 sec: 46652.8). Total num frames: 698613760. Throughput: 0: 11753.2. Samples: 174746624. 
Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:43:50,956][1648985] Avg episode reward: [(0, '142.400')] [2024-06-15 15:43:52,851][1652491] Updated weights for policy 0, policy_version 341172 (0.0020) [2024-06-15 15:43:54,314][1652491] Updated weights for policy 0, policy_version 341238 (0.0012) [2024-06-15 15:43:55,536][1652491] Updated weights for policy 0, policy_version 341301 (0.0086) [2024-06-15 15:43:55,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 50244.7, 300 sec: 47319.2). Total num frames: 699006976. Throughput: 0: 11679.6. Samples: 174774784. Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:43:55,956][1648985] Avg episode reward: [(0, '154.240')] [2024-06-15 15:43:59,997][1652491] Updated weights for policy 0, policy_version 341344 (0.0016) [2024-06-15 15:44:00,697][1652491] Updated weights for policy 0, policy_version 341376 (0.0032) [2024-06-15 15:44:00,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45875.0, 300 sec: 46874.9). Total num frames: 699138048. Throughput: 0: 11992.1. Samples: 174856704. Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:44:00,956][1648985] Avg episode reward: [(0, '146.530')] [2024-06-15 15:44:03,620][1652491] Updated weights for policy 0, policy_version 341445 (0.0083) [2024-06-15 15:44:05,732][1651469] Signal inference workers to stop experience collection... (17850 times) [2024-06-15 15:44:05,862][1652491] InferenceWorker_p0-w0: stopping experience collection (17850 times) [2024-06-15 15:44:05,871][1652491] Updated weights for policy 0, policy_version 341545 (0.0111) [2024-06-15 15:44:05,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 49152.0, 300 sec: 47208.2). Total num frames: 699465728. Throughput: 0: 11889.8. Samples: 174920192. Policy #0 lag: (min: 22.0, avg: 144.9, max: 278.0) [2024-06-15 15:44:05,956][1648985] Avg episode reward: [(0, '151.410')] [2024-06-15 15:44:05,976][1651469] Signal inference workers to resume experience collection... 
(17850 times) [2024-06-15 15:44:05,976][1652491] InferenceWorker_p0-w0: resuming experience collection (17850 times) [2024-06-15 15:44:10,325][1652491] Updated weights for policy 0, policy_version 341584 (0.0019) [2024-06-15 15:44:10,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 45892.9, 300 sec: 46874.9). Total num frames: 699596800. Throughput: 0: 12060.5. Samples: 174958080. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:44:10,956][1648985] Avg episode reward: [(0, '143.640')] [2024-06-15 15:44:11,625][1652491] Updated weights for policy 0, policy_version 341632 (0.0012) [2024-06-15 15:44:13,951][1652491] Updated weights for policy 0, policy_version 341683 (0.0013) [2024-06-15 15:44:15,439][1652491] Updated weights for policy 0, policy_version 341728 (0.0012) [2024-06-15 15:44:15,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 699891712. Throughput: 0: 11980.8. Samples: 175036416. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:44:15,956][1648985] Avg episode reward: [(0, '127.440')] [2024-06-15 15:44:17,646][1652491] Updated weights for policy 0, policy_version 341823 (0.0024) [2024-06-15 15:44:20,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 700055552. Throughput: 0: 11855.7. Samples: 175098368. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:44:20,956][1648985] Avg episode reward: [(0, '128.430')] [2024-06-15 15:44:22,727][1652491] Updated weights for policy 0, policy_version 341878 (0.0012) [2024-06-15 15:44:25,771][1652491] Updated weights for policy 0, policy_version 341936 (0.0012) [2024-06-15 15:44:25,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 47513.8, 300 sec: 46763.8). Total num frames: 700284928. Throughput: 0: 11787.4. Samples: 175134720. 
Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:44:25,956][1648985] Avg episode reward: [(0, '132.570')] [2024-06-15 15:44:26,902][1652491] Updated weights for policy 0, policy_version 341984 (0.0012) [2024-06-15 15:44:28,032][1652491] Updated weights for policy 0, policy_version 342038 (0.0021) [2024-06-15 15:44:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48605.8, 300 sec: 47097.0). Total num frames: 700579840. Throughput: 0: 12037.7. Samples: 175203328. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:44:30,956][1648985] Avg episode reward: [(0, '173.900')] [2024-06-15 15:44:32,595][1652491] Updated weights for policy 0, policy_version 342084 (0.0013) [2024-06-15 15:44:35,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 700710912. Throughput: 0: 11832.9. Samples: 175279104. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:44:35,955][1648985] Avg episode reward: [(0, '173.540')] [2024-06-15 15:44:36,113][1652491] Updated weights for policy 0, policy_version 342148 (0.0017) [2024-06-15 15:44:37,493][1652491] Updated weights for policy 0, policy_version 342208 (0.0011) [2024-06-15 15:44:39,304][1652491] Updated weights for policy 0, policy_version 342288 (0.0014) [2024-06-15 15:44:40,144][1652491] Updated weights for policy 0, policy_version 342336 (0.0013) [2024-06-15 15:44:40,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 49698.1, 300 sec: 47319.2). Total num frames: 701104128. Throughput: 0: 11912.5. Samples: 175310848. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:44:40,956][1648985] Avg episode reward: [(0, '168.150')] [2024-06-15 15:44:45,075][1652491] Updated weights for policy 0, policy_version 342400 (0.0122) [2024-06-15 15:44:45,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 701235200. Throughput: 0: 11514.3. Samples: 175374848. 
Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:44:45,956][1648985] Avg episode reward: [(0, '146.020')] [2024-06-15 15:44:49,245][1652491] Updated weights for policy 0, policy_version 342464 (0.0012) [2024-06-15 15:44:49,740][1651469] Signal inference workers to stop experience collection... (17900 times) [2024-06-15 15:44:49,794][1652491] InferenceWorker_p0-w0: stopping experience collection (17900 times) [2024-06-15 15:44:49,984][1651469] Signal inference workers to resume experience collection... (17900 times) [2024-06-15 15:44:49,985][1652491] InferenceWorker_p0-w0: resuming experience collection (17900 times) [2024-06-15 15:44:50,532][1652491] Updated weights for policy 0, policy_version 342518 (0.0011) [2024-06-15 15:44:50,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 701530112. Throughput: 0: 11628.1. Samples: 175443456. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:44:50,956][1648985] Avg episode reward: [(0, '122.430')] [2024-06-15 15:44:51,837][1652491] Updated weights for policy 0, policy_version 342583 (0.0012) [2024-06-15 15:44:55,256][1652491] Updated weights for policy 0, policy_version 342624 (0.0013) [2024-06-15 15:44:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 701759488. Throughput: 0: 11673.6. Samples: 175483392. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:44:55,956][1648985] Avg episode reward: [(0, '108.170')] [2024-06-15 15:44:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000342656_701759488.pth... [2024-06-15 15:44:56,014][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000337152_690487296.pth [2024-06-15 15:45:00,290][1652491] Updated weights for policy 0, policy_version 342704 (0.0013) [2024-06-15 15:45:00,955][1648985] Fps is (10 sec: 36045.5, 60 sec: 45875.5, 300 sec: 46652.8). Total num frames: 701890560. 
Throughput: 0: 11457.5. Samples: 175552000. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:45:00,955][1648985] Avg episode reward: [(0, '126.220')] [2024-06-15 15:45:01,861][1652491] Updated weights for policy 0, policy_version 342768 (0.0013) [2024-06-15 15:45:03,767][1652491] Updated weights for policy 0, policy_version 342838 (0.0012) [2024-06-15 15:45:05,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 44782.8, 300 sec: 46986.0). Total num frames: 702152704. Throughput: 0: 11548.4. Samples: 175618048. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:45:05,956][1648985] Avg episode reward: [(0, '143.090')] [2024-06-15 15:45:06,616][1652491] Updated weights for policy 0, policy_version 342882 (0.0033) [2024-06-15 15:45:10,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 702283776. Throughput: 0: 11582.6. Samples: 175655936. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:45:10,956][1648985] Avg episode reward: [(0, '166.700')] [2024-06-15 15:45:11,766][1652491] Updated weights for policy 0, policy_version 342946 (0.0013) [2024-06-15 15:45:13,947][1652491] Updated weights for policy 0, policy_version 343040 (0.0119) [2024-06-15 15:45:15,544][1652491] Updated weights for policy 0, policy_version 343100 (0.0011) [2024-06-15 15:45:15,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 702676992. Throughput: 0: 11241.3. Samples: 175709184. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:45:15,956][1648985] Avg episode reward: [(0, '156.400')] [2024-06-15 15:45:18,347][1652491] Updated weights for policy 0, policy_version 343163 (0.0012) [2024-06-15 15:45:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 702808064. Throughput: 0: 11343.6. Samples: 175789568. 
Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:45:20,956][1648985] Avg episode reward: [(0, '149.920')] [2024-06-15 15:45:23,661][1652491] Updated weights for policy 0, policy_version 343232 (0.0011) [2024-06-15 15:45:24,780][1652491] Updated weights for policy 0, policy_version 343289 (0.0015) [2024-06-15 15:45:25,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 703102976. Throughput: 0: 11366.4. Samples: 175822336. Policy #0 lag: (min: 15.0, avg: 111.5, max: 271.0) [2024-06-15 15:45:25,956][1648985] Avg episode reward: [(0, '145.310')] [2024-06-15 15:45:26,811][1652491] Updated weights for policy 0, policy_version 343352 (0.0012) [2024-06-15 15:45:29,032][1652491] Updated weights for policy 0, policy_version 343408 (0.0011) [2024-06-15 15:45:30,955][1648985] Fps is (10 sec: 52427.1, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 703332352. Throughput: 0: 11514.3. Samples: 175892992. Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:45:30,956][1648985] Avg episode reward: [(0, '149.880')] [2024-06-15 15:45:33,380][1651469] Signal inference workers to stop experience collection... (17950 times) [2024-06-15 15:45:33,406][1652491] InferenceWorker_p0-w0: stopping experience collection (17950 times) [2024-06-15 15:45:33,622][1651469] Signal inference workers to resume experience collection... (17950 times) [2024-06-15 15:45:33,622][1652491] InferenceWorker_p0-w0: resuming experience collection (17950 times) [2024-06-15 15:45:34,141][1652491] Updated weights for policy 0, policy_version 343459 (0.0013) [2024-06-15 15:45:35,358][1652491] Updated weights for policy 0, policy_version 343520 (0.0013) [2024-06-15 15:45:35,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 703594496. Throughput: 0: 11582.6. Samples: 175964672. 
Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:45:35,955][1648985] Avg episode reward: [(0, '147.350')] [2024-06-15 15:45:37,363][1652491] Updated weights for policy 0, policy_version 343587 (0.0013) [2024-06-15 15:45:39,911][1652491] Updated weights for policy 0, policy_version 343648 (0.0119) [2024-06-15 15:45:40,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.1, 300 sec: 47097.1). Total num frames: 703856640. Throughput: 0: 11537.0. Samples: 176002560. Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:45:40,956][1648985] Avg episode reward: [(0, '158.590')] [2024-06-15 15:45:45,181][1652491] Updated weights for policy 0, policy_version 343712 (0.0013) [2024-06-15 15:45:45,955][1648985] Fps is (10 sec: 36044.2, 60 sec: 45329.0, 300 sec: 46541.6). Total num frames: 703954944. Throughput: 0: 11616.6. Samples: 176074752. Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:45:45,956][1648985] Avg episode reward: [(0, '143.530')] [2024-06-15 15:45:46,619][1652491] Updated weights for policy 0, policy_version 343765 (0.0013) [2024-06-15 15:45:48,351][1652491] Updated weights for policy 0, policy_version 343825 (0.0013) [2024-06-15 15:45:50,124][1652491] Updated weights for policy 0, policy_version 343873 (0.0013) [2024-06-15 15:45:50,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 46421.4, 300 sec: 46986.0). Total num frames: 704315392. Throughput: 0: 11514.4. Samples: 176136192. Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:45:50,955][1648985] Avg episode reward: [(0, '126.200')] [2024-06-15 15:45:51,434][1652491] Updated weights for policy 0, policy_version 343931 (0.0017) [2024-06-15 15:45:55,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 43690.7, 300 sec: 46319.5). Total num frames: 704380928. Throughput: 0: 11616.7. Samples: 176178688. 
Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:45:55,955][1648985] Avg episode reward: [(0, '133.930')] [2024-06-15 15:45:57,103][1652491] Updated weights for policy 0, policy_version 344000 (0.0100) [2024-06-15 15:45:58,180][1652491] Updated weights for policy 0, policy_version 344058 (0.0015) [2024-06-15 15:46:00,937][1652491] Updated weights for policy 0, policy_version 344128 (0.0011) [2024-06-15 15:46:00,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 48059.4, 300 sec: 47097.0). Total num frames: 704774144. Throughput: 0: 11958.0. Samples: 176247296. Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:46:00,956][1648985] Avg episode reward: [(0, '122.290')] [2024-06-15 15:46:02,538][1652491] Updated weights for policy 0, policy_version 344192 (0.0011) [2024-06-15 15:46:05,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 704905216. Throughput: 0: 11798.8. Samples: 176320512. Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:46:05,956][1648985] Avg episode reward: [(0, '129.630')] [2024-06-15 15:46:07,392][1652491] Updated weights for policy 0, policy_version 344240 (0.0013) [2024-06-15 15:46:08,949][1652491] Updated weights for policy 0, policy_version 344290 (0.0013) [2024-06-15 15:46:10,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 705167360. Throughput: 0: 11741.9. Samples: 176350720. Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:46:10,956][1648985] Avg episode reward: [(0, '159.360')] [2024-06-15 15:46:12,139][1652491] Updated weights for policy 0, policy_version 344377 (0.0015) [2024-06-15 15:46:12,433][1651469] Signal inference workers to stop experience collection... (18000 times) [2024-06-15 15:46:12,486][1652491] InferenceWorker_p0-w0: stopping experience collection (18000 times) [2024-06-15 15:46:12,729][1651469] Signal inference workers to resume experience collection... 
(18000 times) [2024-06-15 15:46:12,729][1652491] InferenceWorker_p0-w0: resuming experience collection (18000 times) [2024-06-15 15:46:13,729][1652491] Updated weights for policy 0, policy_version 344439 (0.0012) [2024-06-15 15:46:15,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 45875.1, 300 sec: 46763.8). Total num frames: 705429504. Throughput: 0: 11605.4. Samples: 176415232. Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:46:15,956][1648985] Avg episode reward: [(0, '152.030')] [2024-06-15 15:46:17,930][1652491] Updated weights for policy 0, policy_version 344468 (0.0013) [2024-06-15 15:46:18,975][1652491] Updated weights for policy 0, policy_version 344512 (0.0022) [2024-06-15 15:46:20,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 705658880. Throughput: 0: 11707.7. Samples: 176491520. Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:46:20,955][1648985] Avg episode reward: [(0, '152.490')] [2024-06-15 15:46:21,017][1652491] Updated weights for policy 0, policy_version 344571 (0.0012) [2024-06-15 15:46:23,466][1652491] Updated weights for policy 0, policy_version 344624 (0.0019) [2024-06-15 15:46:25,655][1652491] Updated weights for policy 0, policy_version 344704 (0.0140) [2024-06-15 15:46:25,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 47513.4, 300 sec: 47097.2). Total num frames: 705953792. Throughput: 0: 11662.2. Samples: 176527360. Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:46:25,966][1648985] Avg episode reward: [(0, '148.440')] [2024-06-15 15:46:29,669][1652491] Updated weights for policy 0, policy_version 344764 (0.0012) [2024-06-15 15:46:30,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 706084864. Throughput: 0: 11491.6. Samples: 176591872. 
Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:46:30,956][1648985] Avg episode reward: [(0, '137.990')] [2024-06-15 15:46:32,146][1652491] Updated weights for policy 0, policy_version 344816 (0.0094) [2024-06-15 15:46:35,250][1652491] Updated weights for policy 0, policy_version 344880 (0.0013) [2024-06-15 15:46:35,955][1648985] Fps is (10 sec: 39323.2, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 706347008. Throughput: 0: 11616.7. Samples: 176658944. Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:46:35,955][1648985] Avg episode reward: [(0, '153.940')] [2024-06-15 15:46:36,743][1652491] Updated weights for policy 0, policy_version 344944 (0.0013) [2024-06-15 15:46:40,065][1652491] Updated weights for policy 0, policy_version 344962 (0.0012) [2024-06-15 15:46:40,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 45329.2, 300 sec: 46876.0). Total num frames: 706576384. Throughput: 0: 11514.3. Samples: 176696832. Policy #0 lag: (min: 49.0, avg: 194.6, max: 321.0) [2024-06-15 15:46:40,955][1648985] Avg episode reward: [(0, '159.520')] [2024-06-15 15:46:42,997][1652491] Updated weights for policy 0, policy_version 345041 (0.0014) [2024-06-15 15:46:43,993][1652491] Updated weights for policy 0, policy_version 345088 (0.0017) [2024-06-15 15:46:45,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 47513.7, 300 sec: 47097.1). Total num frames: 706805760. Throughput: 0: 11559.9. Samples: 176767488. Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:46:45,956][1648985] Avg episode reward: [(0, '158.250')] [2024-06-15 15:46:46,580][1652491] Updated weights for policy 0, policy_version 345143 (0.0015) [2024-06-15 15:46:48,034][1652491] Updated weights for policy 0, policy_version 345203 (0.0012) [2024-06-15 15:46:50,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 44782.8, 300 sec: 46652.7). Total num frames: 707002368. Throughput: 0: 11639.4. Samples: 176844288. 
Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:46:50,956][1648985] Avg episode reward: [(0, '145.760')] [2024-06-15 15:46:51,981][1652491] Updated weights for policy 0, policy_version 345268 (0.0015) [2024-06-15 15:46:54,333][1652491] Updated weights for policy 0, policy_version 345312 (0.0011) [2024-06-15 15:46:55,955][1648985] Fps is (10 sec: 45873.6, 60 sec: 48059.4, 300 sec: 47097.0). Total num frames: 707264512. Throughput: 0: 11730.4. Samples: 176878592. Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:46:55,956][1648985] Avg episode reward: [(0, '151.170')] [2024-06-15 15:46:55,968][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000345344_707264512.pth... [2024-06-15 15:46:56,214][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000339872_696057856.pth [2024-06-15 15:46:56,820][1651469] Signal inference workers to stop experience collection... (18050 times) [2024-06-15 15:46:56,936][1652491] InferenceWorker_p0-w0: stopping experience collection (18050 times) [2024-06-15 15:46:57,013][1651469] Signal inference workers to resume experience collection... (18050 times) [2024-06-15 15:46:57,019][1652491] InferenceWorker_p0-w0: resuming experience collection (18050 times) [2024-06-15 15:46:57,021][1652491] Updated weights for policy 0, policy_version 345392 (0.0017) [2024-06-15 15:46:58,750][1652491] Updated weights for policy 0, policy_version 345456 (0.0013) [2024-06-15 15:47:00,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.4, 300 sec: 46874.9). Total num frames: 707526656. Throughput: 0: 11673.6. Samples: 176940544. 
Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:47:00,955][1648985] Avg episode reward: [(0, '147.250')] [2024-06-15 15:47:03,509][1652491] Updated weights for policy 0, policy_version 345507 (0.0013) [2024-06-15 15:47:05,834][1652491] Updated weights for policy 0, policy_version 345568 (0.0015) [2024-06-15 15:47:05,955][1648985] Fps is (10 sec: 45876.5, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 707723264. Throughput: 0: 11628.1. Samples: 177014784. Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:47:05,956][1648985] Avg episode reward: [(0, '150.380')] [2024-06-15 15:47:07,200][1652491] Updated weights for policy 0, policy_version 345616 (0.0015) [2024-06-15 15:47:08,535][1652491] Updated weights for policy 0, policy_version 345666 (0.0014) [2024-06-15 15:47:09,898][1652491] Updated weights for policy 0, policy_version 345727 (0.0014) [2024-06-15 15:47:10,956][1648985] Fps is (10 sec: 52425.2, 60 sec: 48059.3, 300 sec: 46985.9). Total num frames: 708050944. Throughput: 0: 11502.9. Samples: 177044992. Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:47:10,957][1648985] Avg episode reward: [(0, '161.380')] [2024-06-15 15:47:15,697][1652491] Updated weights for policy 0, policy_version 345781 (0.0017) [2024-06-15 15:47:15,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 708182016. Throughput: 0: 11730.5. Samples: 177119744. Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:47:15,956][1648985] Avg episode reward: [(0, '155.430')] [2024-06-15 15:47:17,322][1652491] Updated weights for policy 0, policy_version 345824 (0.0011) [2024-06-15 15:47:18,773][1652491] Updated weights for policy 0, policy_version 345888 (0.0012) [2024-06-15 15:47:20,460][1652491] Updated weights for policy 0, policy_version 345936 (0.0012) [2024-06-15 15:47:20,955][1648985] Fps is (10 sec: 45878.1, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 708509696. 
Throughput: 0: 11764.6. Samples: 177188352. Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:47:20,956][1648985] Avg episode reward: [(0, '140.710')] [2024-06-15 15:47:25,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 708575232. Throughput: 0: 11673.5. Samples: 177222144. Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:47:25,956][1648985] Avg episode reward: [(0, '147.550')] [2024-06-15 15:47:26,045][1652491] Updated weights for policy 0, policy_version 346000 (0.0013) [2024-06-15 15:47:28,149][1652491] Updated weights for policy 0, policy_version 346064 (0.0013) [2024-06-15 15:47:29,394][1652491] Updated weights for policy 0, policy_version 346108 (0.0012) [2024-06-15 15:47:30,595][1652491] Updated weights for policy 0, policy_version 346149 (0.0011) [2024-06-15 15:47:30,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 708968448. Throughput: 0: 11593.9. Samples: 177289216. Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:47:30,956][1648985] Avg episode reward: [(0, '144.920')] [2024-06-15 15:47:31,641][1652491] Updated weights for policy 0, policy_version 346192 (0.0032) [2024-06-15 15:47:35,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 709099520. Throughput: 0: 11525.7. Samples: 177362944. Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:47:35,955][1648985] Avg episode reward: [(0, '139.410')] [2024-06-15 15:47:37,304][1652491] Updated weights for policy 0, policy_version 346241 (0.0084) [2024-06-15 15:47:38,796][1652491] Updated weights for policy 0, policy_version 346304 (0.0013) [2024-06-15 15:47:40,461][1652491] Updated weights for policy 0, policy_version 346368 (0.0012) [2024-06-15 15:47:40,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 46421.3, 300 sec: 46653.3). Total num frames: 709361664. Throughput: 0: 11446.1. Samples: 177393664. 
Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:47:40,956][1648985] Avg episode reward: [(0, '126.300')] [2024-06-15 15:47:41,596][1651469] Signal inference workers to stop experience collection... (18100 times) [2024-06-15 15:47:41,631][1652491] InferenceWorker_p0-w0: stopping experience collection (18100 times) [2024-06-15 15:47:41,769][1651469] Signal inference workers to resume experience collection... (18100 times) [2024-06-15 15:47:41,770][1652491] InferenceWorker_p0-w0: resuming experience collection (18100 times) [2024-06-15 15:47:42,296][1652491] Updated weights for policy 0, policy_version 346430 (0.0071) [2024-06-15 15:47:44,156][1652491] Updated weights for policy 0, policy_version 346491 (0.0012) [2024-06-15 15:47:45,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 709623808. Throughput: 0: 11616.7. Samples: 177463296. Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:47:45,956][1648985] Avg episode reward: [(0, '107.190')] [2024-06-15 15:47:49,090][1652491] Updated weights for policy 0, policy_version 346551 (0.0025) [2024-06-15 15:47:50,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.6, 300 sec: 46875.0). Total num frames: 709820416. Throughput: 0: 11491.6. Samples: 177531904. Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:47:50,956][1648985] Avg episode reward: [(0, '122.850')] [2024-06-15 15:47:51,443][1652491] Updated weights for policy 0, policy_version 346612 (0.0031) [2024-06-15 15:47:53,076][1652491] Updated weights for policy 0, policy_version 346643 (0.0012) [2024-06-15 15:47:54,187][1652491] Updated weights for policy 0, policy_version 346686 (0.0014) [2024-06-15 15:47:55,829][1652491] Updated weights for policy 0, policy_version 346749 (0.0114) [2024-06-15 15:47:55,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 48060.1, 300 sec: 46652.7). Total num frames: 710148096. Throughput: 0: 11616.9. Samples: 177567744. 
Policy #0 lag: (min: 38.0, avg: 142.1, max: 294.0) [2024-06-15 15:47:55,955][1648985] Avg episode reward: [(0, '135.590')] [2024-06-15 15:48:00,569][1652491] Updated weights for policy 0, policy_version 346810 (0.0014) [2024-06-15 15:48:00,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 710279168. Throughput: 0: 11514.3. Samples: 177637888. Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:48:00,956][1648985] Avg episode reward: [(0, '150.750')] [2024-06-15 15:48:02,917][1652491] Updated weights for policy 0, policy_version 346874 (0.0013) [2024-06-15 15:48:04,872][1652491] Updated weights for policy 0, policy_version 346912 (0.0039) [2024-06-15 15:48:05,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.6, 300 sec: 46434.2). Total num frames: 710541312. Throughput: 0: 11571.2. Samples: 177709056. Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:48:05,956][1648985] Avg episode reward: [(0, '129.040')] [2024-06-15 15:48:07,130][1652491] Updated weights for policy 0, policy_version 346992 (0.0012) [2024-06-15 15:48:10,791][1652491] Updated weights for policy 0, policy_version 347015 (0.0011) [2024-06-15 15:48:10,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 44237.3, 300 sec: 46430.6). Total num frames: 710705152. Throughput: 0: 11582.6. Samples: 177743360. Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:48:10,955][1648985] Avg episode reward: [(0, '126.490')] [2024-06-15 15:48:13,440][1652491] Updated weights for policy 0, policy_version 347088 (0.0015) [2024-06-15 15:48:14,527][1652491] Updated weights for policy 0, policy_version 347136 (0.0013) [2024-06-15 15:48:15,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 711032832. Throughput: 0: 11628.1. Samples: 177812480. 
Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:48:15,956][1648985] Avg episode reward: [(0, '136.580')] [2024-06-15 15:48:16,015][1652491] Updated weights for policy 0, policy_version 347184 (0.0175) [2024-06-15 15:48:17,842][1652491] Updated weights for policy 0, policy_version 347262 (0.0089) [2024-06-15 15:48:20,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 44782.9, 300 sec: 46652.8). Total num frames: 711196672. Throughput: 0: 11593.9. Samples: 177884672. Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:48:20,956][1648985] Avg episode reward: [(0, '143.910')] [2024-06-15 15:48:22,895][1652491] Updated weights for policy 0, policy_version 347314 (0.0016) [2024-06-15 15:48:25,189][1652491] Updated weights for policy 0, policy_version 347376 (0.0014) [2024-06-15 15:48:25,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48059.9, 300 sec: 46763.8). Total num frames: 711458816. Throughput: 0: 11764.6. Samples: 177923072. Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:48:25,956][1648985] Avg episode reward: [(0, '129.890')] [2024-06-15 15:48:26,477][1651469] Signal inference workers to stop experience collection... (18150 times) [2024-06-15 15:48:26,572][1652491] InferenceWorker_p0-w0: stopping experience collection (18150 times) [2024-06-15 15:48:26,726][1651469] Signal inference workers to resume experience collection... (18150 times) [2024-06-15 15:48:26,727][1652491] InferenceWorker_p0-w0: resuming experience collection (18150 times) [2024-06-15 15:48:27,550][1652491] Updated weights for policy 0, policy_version 347440 (0.0032) [2024-06-15 15:48:29,091][1652491] Updated weights for policy 0, policy_version 347504 (0.0013) [2024-06-15 15:48:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 711720960. Throughput: 0: 11537.1. Samples: 177982464. 
Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:48:30,956][1648985] Avg episode reward: [(0, '116.570')] [2024-06-15 15:48:33,461][1652491] Updated weights for policy 0, policy_version 347556 (0.0071) [2024-06-15 15:48:35,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 711852032. Throughput: 0: 11787.4. Samples: 178062336. Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:48:35,956][1648985] Avg episode reward: [(0, '112.430')] [2024-06-15 15:48:36,781][1652491] Updated weights for policy 0, policy_version 347620 (0.0074) [2024-06-15 15:48:39,223][1652491] Updated weights for policy 0, policy_version 347682 (0.0012) [2024-06-15 15:48:40,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 712179712. Throughput: 0: 11616.7. Samples: 178090496. Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:48:40,956][1648985] Avg episode reward: [(0, '111.240')] [2024-06-15 15:48:41,319][1652491] Updated weights for policy 0, policy_version 347760 (0.0012) [2024-06-15 15:48:44,517][1652491] Updated weights for policy 0, policy_version 347795 (0.0014) [2024-06-15 15:48:45,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 712376320. Throughput: 0: 11411.9. Samples: 178151424. Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:48:45,956][1648985] Avg episode reward: [(0, '113.190')] [2024-06-15 15:48:47,671][1652491] Updated weights for policy 0, policy_version 347844 (0.0034) [2024-06-15 15:48:49,021][1652491] Updated weights for policy 0, policy_version 347902 (0.0122) [2024-06-15 15:48:50,955][1648985] Fps is (10 sec: 32767.4, 60 sec: 44782.8, 300 sec: 45764.1). Total num frames: 712507392. Throughput: 0: 11457.4. Samples: 178224640. 
Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:48:50,956][1648985] Avg episode reward: [(0, '118.570')] [2024-06-15 15:48:51,806][1652491] Updated weights for policy 0, policy_version 347952 (0.0022) [2024-06-15 15:48:53,649][1652491] Updated weights for policy 0, policy_version 348032 (0.0020) [2024-06-15 15:48:55,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 46208.5). Total num frames: 712769536. Throughput: 0: 11241.2. Samples: 178249216. Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:48:55,956][1648985] Avg episode reward: [(0, '122.590')] [2024-06-15 15:48:56,315][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000348064_712835072.pth... [2024-06-15 15:48:56,494][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000342656_701759488.pth [2024-06-15 15:49:00,271][1652491] Updated weights for policy 0, policy_version 348113 (0.0014) [2024-06-15 15:49:00,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 44783.0, 300 sec: 45764.1). Total num frames: 712966144. Throughput: 0: 11377.8. Samples: 178324480. Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:49:00,956][1648985] Avg episode reward: [(0, '128.010')] [2024-06-15 15:49:02,027][1652491] Updated weights for policy 0, policy_version 348165 (0.0036) [2024-06-15 15:49:03,776][1652491] Updated weights for policy 0, policy_version 348225 (0.0012) [2024-06-15 15:49:05,159][1652491] Updated weights for policy 0, policy_version 348281 (0.0014) [2024-06-15 15:49:05,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 45874.9, 300 sec: 46430.5). Total num frames: 713293824. Throughput: 0: 11081.9. Samples: 178383360. 
Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:49:05,956][1648985] Avg episode reward: [(0, '119.880')] [2024-06-15 15:49:08,710][1652491] Updated weights for policy 0, policy_version 348347 (0.0013) [2024-06-15 15:49:10,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 713424896. Throughput: 0: 11047.8. Samples: 178420224. Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:49:10,956][1648985] Avg episode reward: [(0, '122.870')] [2024-06-15 15:49:11,686][1651469] Signal inference workers to stop experience collection... (18200 times) [2024-06-15 15:49:11,739][1652491] InferenceWorker_p0-w0: stopping experience collection (18200 times) [2024-06-15 15:49:11,986][1651469] Signal inference workers to resume experience collection... (18200 times) [2024-06-15 15:49:11,987][1652491] InferenceWorker_p0-w0: resuming experience collection (18200 times) [2024-06-15 15:49:12,673][1652491] Updated weights for policy 0, policy_version 348404 (0.0011) [2024-06-15 15:49:14,229][1652491] Updated weights for policy 0, policy_version 348433 (0.0012) [2024-06-15 15:49:15,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 713719808. Throughput: 0: 11207.1. Samples: 178486784. Policy #0 lag: (min: 39.0, avg: 127.1, max: 295.0) [2024-06-15 15:49:15,956][1648985] Avg episode reward: [(0, '125.290')] [2024-06-15 15:49:16,199][1652491] Updated weights for policy 0, policy_version 348512 (0.0011) [2024-06-15 15:49:19,902][1652491] Updated weights for policy 0, policy_version 348592 (0.0013) [2024-06-15 15:49:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 713949184. Throughput: 0: 10922.7. Samples: 178553856. 
Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:49:20,956][1648985] Avg episode reward: [(0, '132.770')] [2024-06-15 15:49:24,727][1652491] Updated weights for policy 0, policy_version 348656 (0.0012) [2024-06-15 15:49:25,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 714113024. Throughput: 0: 11195.7. Samples: 178594304. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:49:25,956][1648985] Avg episode reward: [(0, '130.820')] [2024-06-15 15:49:26,246][1652491] Updated weights for policy 0, policy_version 348708 (0.0012) [2024-06-15 15:49:28,441][1652491] Updated weights for policy 0, policy_version 348798 (0.0012) [2024-06-15 15:49:30,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 714407936. Throughput: 0: 11218.5. Samples: 178656256. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:49:30,955][1648985] Avg episode reward: [(0, '126.900')] [2024-06-15 15:49:35,812][1652491] Updated weights for policy 0, policy_version 348880 (0.0110) [2024-06-15 15:49:35,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 44236.8, 300 sec: 45430.9). Total num frames: 714506240. Throughput: 0: 11184.4. Samples: 178727936. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:49:35,955][1648985] Avg episode reward: [(0, '113.390')] [2024-06-15 15:49:37,142][1652491] Updated weights for policy 0, policy_version 348934 (0.0016) [2024-06-15 15:49:38,530][1652491] Updated weights for policy 0, policy_version 348996 (0.0020) [2024-06-15 15:49:39,928][1652491] Updated weights for policy 0, policy_version 349054 (0.0033) [2024-06-15 15:49:40,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 714866688. Throughput: 0: 11275.4. Samples: 178756608. 
Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:49:40,956][1648985] Avg episode reward: [(0, '125.200')] [2024-06-15 15:49:43,220][1652491] Updated weights for policy 0, policy_version 349119 (0.0013) [2024-06-15 15:49:45,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 43690.5, 300 sec: 45653.0). Total num frames: 714997760. Throughput: 0: 11127.4. Samples: 178825216. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:49:45,956][1648985] Avg episode reward: [(0, '147.810')] [2024-06-15 15:49:48,864][1652491] Updated weights for policy 0, policy_version 349169 (0.0012) [2024-06-15 15:49:50,953][1652491] Updated weights for policy 0, policy_version 349264 (0.0113) [2024-06-15 15:49:50,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 46421.5, 300 sec: 45875.2). Total num frames: 715292672. Throughput: 0: 11195.8. Samples: 178887168. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:49:50,956][1648985] Avg episode reward: [(0, '144.560')] [2024-06-15 15:49:53,230][1652491] Updated weights for policy 0, policy_version 349313 (0.0015) [2024-06-15 15:49:53,697][1651469] Signal inference workers to stop experience collection... (18250 times) [2024-06-15 15:49:53,726][1652491] InferenceWorker_p0-w0: stopping experience collection (18250 times) [2024-06-15 15:49:53,974][1651469] Signal inference workers to resume experience collection... (18250 times) [2024-06-15 15:49:53,975][1652491] InferenceWorker_p0-w0: resuming experience collection (18250 times) [2024-06-15 15:49:55,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 715522048. Throughput: 0: 11241.2. Samples: 178926080. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:49:55,955][1648985] Avg episode reward: [(0, '153.720')] [2024-06-15 15:49:59,387][1652491] Updated weights for policy 0, policy_version 349377 (0.0163) [2024-06-15 15:50:00,955][1648985] Fps is (10 sec: 36043.7, 60 sec: 44782.7, 300 sec: 45764.1). 
Total num frames: 715653120. Throughput: 0: 11355.0. Samples: 178997760. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:50:00,956][1648985] Avg episode reward: [(0, '148.960')] [2024-06-15 15:50:01,371][1652491] Updated weights for policy 0, policy_version 349461 (0.0013) [2024-06-15 15:50:03,283][1652491] Updated weights for policy 0, policy_version 349558 (0.0011) [2024-06-15 15:50:05,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 44237.0, 300 sec: 46319.5). Total num frames: 715948032. Throughput: 0: 11195.7. Samples: 179057664. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:50:05,956][1648985] Avg episode reward: [(0, '158.100')] [2024-06-15 15:50:06,720][1652491] Updated weights for policy 0, policy_version 349620 (0.0016) [2024-06-15 15:50:10,955][1648985] Fps is (10 sec: 39322.5, 60 sec: 43690.6, 300 sec: 45319.8). Total num frames: 716046336. Throughput: 0: 11127.5. Samples: 179095040. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:50:10,956][1648985] Avg episode reward: [(0, '155.410')] [2024-06-15 15:50:11,529][1652491] Updated weights for policy 0, policy_version 349651 (0.0013) [2024-06-15 15:50:13,617][1652491] Updated weights for policy 0, policy_version 349748 (0.0103) [2024-06-15 15:50:14,849][1652491] Updated weights for policy 0, policy_version 349820 (0.0031) [2024-06-15 15:50:15,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45329.2, 300 sec: 46208.4). Total num frames: 716439552. Throughput: 0: 11150.2. Samples: 179158016. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:50:15,955][1648985] Avg episode reward: [(0, '172.870')] [2024-06-15 15:50:18,130][1652491] Updated weights for policy 0, policy_version 349884 (0.0027) [2024-06-15 15:50:20,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 45653.0). Total num frames: 716570624. Throughput: 0: 11207.1. Samples: 179232256. 
Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:50:20,956][1648985] Avg episode reward: [(0, '151.340')] [2024-06-15 15:50:24,801][1652491] Updated weights for policy 0, policy_version 349960 (0.0079) [2024-06-15 15:50:25,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 45329.0, 300 sec: 45764.1). Total num frames: 716832768. Throughput: 0: 11355.0. Samples: 179267584. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:50:25,956][1648985] Avg episode reward: [(0, '122.800')] [2024-06-15 15:50:26,508][1652491] Updated weights for policy 0, policy_version 350038 (0.0034) [2024-06-15 15:50:27,439][1652491] Updated weights for policy 0, policy_version 350080 (0.0012) [2024-06-15 15:50:29,272][1652491] Updated weights for policy 0, policy_version 350128 (0.0013) [2024-06-15 15:50:30,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 44783.0, 300 sec: 45764.1). Total num frames: 717094912. Throughput: 0: 11207.2. Samples: 179329536. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:50:30,956][1648985] Avg episode reward: [(0, '123.920')] [2024-06-15 15:50:34,750][1652491] Updated weights for policy 0, policy_version 350176 (0.0012) [2024-06-15 15:50:35,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45875.0, 300 sec: 45430.9). Total num frames: 717258752. Throughput: 0: 11446.0. Samples: 179402240. Policy #0 lag: (min: 11.0, avg: 129.4, max: 267.0) [2024-06-15 15:50:35,956][1648985] Avg episode reward: [(0, '128.000')] [2024-06-15 15:50:36,409][1651469] Signal inference workers to stop experience collection... (18300 times) [2024-06-15 15:50:36,477][1652491] InferenceWorker_p0-w0: stopping experience collection (18300 times) [2024-06-15 15:50:36,478][1652491] Updated weights for policy 0, policy_version 350244 (0.0141) [2024-06-15 15:50:36,602][1651469] Signal inference workers to resume experience collection... 
(18300 times) [2024-06-15 15:50:36,603][1652491] InferenceWorker_p0-w0: resuming experience collection (18300 times) [2024-06-15 15:50:38,309][1652491] Updated weights for policy 0, policy_version 350332 (0.0129) [2024-06-15 15:50:40,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 46097.4). Total num frames: 717553664. Throughput: 0: 11184.3. Samples: 179429376. Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:50:40,956][1648985] Avg episode reward: [(0, '119.290')] [2024-06-15 15:50:41,019][1652491] Updated weights for policy 0, policy_version 350372 (0.0021) [2024-06-15 15:50:45,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 44236.9, 300 sec: 45208.7). Total num frames: 717651968. Throughput: 0: 11275.4. Samples: 179505152. Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:50:45,956][1648985] Avg episode reward: [(0, '128.850')] [2024-06-15 15:50:46,056][1652491] Updated weights for policy 0, policy_version 350419 (0.0012) [2024-06-15 15:50:47,433][1652491] Updated weights for policy 0, policy_version 350467 (0.0012) [2024-06-15 15:50:48,906][1652491] Updated weights for policy 0, policy_version 350544 (0.0013) [2024-06-15 15:50:49,864][1652491] Updated weights for policy 0, policy_version 350592 (0.0012) [2024-06-15 15:50:50,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 718012416. Throughput: 0: 11389.2. Samples: 179570176. Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:50:50,956][1648985] Avg episode reward: [(0, '129.200')] [2024-06-15 15:50:55,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 43690.5, 300 sec: 45319.8). Total num frames: 718143488. Throughput: 0: 11309.5. Samples: 179603968. Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:50:55,956][1648985] Avg episode reward: [(0, '144.840')] [2024-06-15 15:50:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000350656_718143488.pth... 
[2024-06-15 15:50:56,079][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000345344_707264512.pth [2024-06-15 15:50:56,083][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000350656_718143488.pth [2024-06-15 15:50:57,011][1652491] Updated weights for policy 0, policy_version 350658 (0.0044) [2024-06-15 15:50:58,474][1652491] Updated weights for policy 0, policy_version 350736 (0.0013) [2024-06-15 15:51:00,648][1652491] Updated weights for policy 0, policy_version 350819 (0.0014) [2024-06-15 15:51:00,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 47513.8, 300 sec: 46097.4). Total num frames: 718503936. Throughput: 0: 11457.4. Samples: 179673600. Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:51:00,956][1648985] Avg episode reward: [(0, '138.840')] [2024-06-15 15:51:02,690][1652491] Updated weights for policy 0, policy_version 350866 (0.0013) [2024-06-15 15:51:05,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45329.0, 300 sec: 45764.1). Total num frames: 718667776. Throughput: 0: 11332.2. Samples: 179742208. Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:51:05,956][1648985] Avg episode reward: [(0, '120.570')] [2024-06-15 15:51:09,159][1652491] Updated weights for policy 0, policy_version 350950 (0.0016) [2024-06-15 15:51:10,657][1652491] Updated weights for policy 0, policy_version 351011 (0.0100) [2024-06-15 15:51:10,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 47513.6, 300 sec: 45653.1). Total num frames: 718897152. Throughput: 0: 11525.7. Samples: 179786240. 
Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:51:10,956][1648985] Avg episode reward: [(0, '131.110')] [2024-06-15 15:51:11,873][1652491] Updated weights for policy 0, policy_version 351058 (0.0011) [2024-06-15 15:51:13,689][1652491] Updated weights for policy 0, policy_version 351120 (0.0062) [2024-06-15 15:51:15,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.1, 300 sec: 45875.2). Total num frames: 719192064. Throughput: 0: 11400.5. Samples: 179842560. Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:51:15,956][1648985] Avg episode reward: [(0, '151.010')] [2024-06-15 15:51:19,302][1652491] Updated weights for policy 0, policy_version 351170 (0.0012) [2024-06-15 15:51:19,609][1651469] Signal inference workers to stop experience collection... (18350 times) [2024-06-15 15:51:19,677][1652491] InferenceWorker_p0-w0: stopping experience collection (18350 times) [2024-06-15 15:51:19,791][1651469] Signal inference workers to resume experience collection... (18350 times) [2024-06-15 15:51:19,792][1652491] InferenceWorker_p0-w0: resuming experience collection (18350 times) [2024-06-15 15:51:20,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.3, 300 sec: 45430.9). Total num frames: 719355904. Throughput: 0: 11639.5. Samples: 179926016. Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:51:20,956][1648985] Avg episode reward: [(0, '143.910')] [2024-06-15 15:51:21,297][1652491] Updated weights for policy 0, policy_version 351264 (0.0012) [2024-06-15 15:51:22,973][1652491] Updated weights for policy 0, policy_version 351313 (0.0011) [2024-06-15 15:51:24,641][1652491] Updated weights for policy 0, policy_version 351376 (0.0013) [2024-06-15 15:51:25,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 719716352. Throughput: 0: 11582.6. Samples: 179950592. 
Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:51:25,956][1648985] Avg episode reward: [(0, '136.720')] [2024-06-15 15:51:30,955][1648985] Fps is (10 sec: 36045.3, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 719716352. Throughput: 0: 11468.8. Samples: 180021248. Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:51:30,955][1648985] Avg episode reward: [(0, '128.910')] [2024-06-15 15:51:31,477][1652491] Updated weights for policy 0, policy_version 351443 (0.0013) [2024-06-15 15:51:34,312][1652491] Updated weights for policy 0, policy_version 351568 (0.0086) [2024-06-15 15:51:35,566][1652491] Updated weights for policy 0, policy_version 351613 (0.0013) [2024-06-15 15:51:35,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 47513.8, 300 sec: 45875.2). Total num frames: 720109568. Throughput: 0: 11355.0. Samples: 180081152. Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:51:35,956][1648985] Avg episode reward: [(0, '134.590')] [2024-06-15 15:51:38,231][1652491] Updated weights for policy 0, policy_version 351674 (0.0025) [2024-06-15 15:51:40,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 44782.9, 300 sec: 45542.0). Total num frames: 720240640. Throughput: 0: 11400.6. Samples: 180116992. Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:51:40,956][1648985] Avg episode reward: [(0, '134.110')] [2024-06-15 15:51:43,376][1652491] Updated weights for policy 0, policy_version 351739 (0.0083) [2024-06-15 15:51:44,684][1652491] Updated weights for policy 0, policy_version 351792 (0.0025) [2024-06-15 15:51:45,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48059.7, 300 sec: 45875.2). Total num frames: 720535552. Throughput: 0: 11514.3. Samples: 180191744. 
Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:51:45,957][1648985] Avg episode reward: [(0, '136.490')] [2024-06-15 15:51:46,467][1652491] Updated weights for policy 0, policy_version 351841 (0.0014) [2024-06-15 15:51:47,444][1652491] Updated weights for policy 0, policy_version 351874 (0.0014) [2024-06-15 15:51:48,358][1652491] Updated weights for policy 0, policy_version 351926 (0.0132) [2024-06-15 15:51:50,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.1, 300 sec: 45764.2). Total num frames: 720764928. Throughput: 0: 11605.4. Samples: 180264448. Policy #0 lag: (min: 69.0, avg: 134.7, max: 309.0) [2024-06-15 15:51:50,956][1648985] Avg episode reward: [(0, '121.400')] [2024-06-15 15:51:54,325][1652491] Updated weights for policy 0, policy_version 351984 (0.0012) [2024-06-15 15:51:55,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46421.5, 300 sec: 45430.9). Total num frames: 720928768. Throughput: 0: 11502.9. Samples: 180303872. Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:51:55,956][1648985] Avg episode reward: [(0, '126.420')] [2024-06-15 15:51:56,515][1652491] Updated weights for policy 0, policy_version 352058 (0.0013) [2024-06-15 15:51:57,804][1652491] Updated weights for policy 0, policy_version 352099 (0.0014) [2024-06-15 15:51:59,001][1652491] Updated weights for policy 0, policy_version 352144 (0.0013) [2024-06-15 15:51:59,403][1651469] Signal inference workers to stop experience collection... (18400 times) [2024-06-15 15:51:59,435][1652491] InferenceWorker_p0-w0: stopping experience collection (18400 times) [2024-06-15 15:51:59,620][1651469] Signal inference workers to resume experience collection... (18400 times) [2024-06-15 15:51:59,620][1652491] InferenceWorker_p0-w0: resuming experience collection (18400 times) [2024-06-15 15:52:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 721289216. Throughput: 0: 11514.3. Samples: 180360704. 
Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:52:00,956][1648985] Avg episode reward: [(0, '134.810')] [2024-06-15 15:52:05,643][1652491] Updated weights for policy 0, policy_version 352227 (0.0013) [2024-06-15 15:52:05,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45329.1, 300 sec: 45208.8). Total num frames: 721387520. Throughput: 0: 11332.3. Samples: 180435968. Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:52:05,956][1648985] Avg episode reward: [(0, '123.440')] [2024-06-15 15:52:06,797][1652491] Updated weights for policy 0, policy_version 352257 (0.0010) [2024-06-15 15:52:08,440][1652491] Updated weights for policy 0, policy_version 352321 (0.0091) [2024-06-15 15:52:10,692][1652491] Updated weights for policy 0, policy_version 352405 (0.0014) [2024-06-15 15:52:10,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 45986.3). Total num frames: 721747968. Throughput: 0: 11548.5. Samples: 180470272. Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:52:10,956][1648985] Avg episode reward: [(0, '128.360')] [2024-06-15 15:52:15,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 43690.7, 300 sec: 45097.7). Total num frames: 721813504. Throughput: 0: 11423.3. Samples: 180535296. Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:52:15,956][1648985] Avg episode reward: [(0, '113.280')] [2024-06-15 15:52:17,240][1652491] Updated weights for policy 0, policy_version 352480 (0.0018) [2024-06-15 15:52:18,032][1652491] Updated weights for policy 0, policy_version 352512 (0.0044) [2024-06-15 15:52:19,518][1652491] Updated weights for policy 0, policy_version 352569 (0.0014) [2024-06-15 15:52:20,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46421.4, 300 sec: 45986.3). Total num frames: 722141184. Throughput: 0: 11639.5. Samples: 180604928. 
Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:52:20,957][1648985] Avg episode reward: [(0, '137.680')] [2024-06-15 15:52:21,384][1652491] Updated weights for policy 0, policy_version 352624 (0.0013) [2024-06-15 15:52:22,732][1652491] Updated weights for policy 0, policy_version 352674 (0.0012) [2024-06-15 15:52:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 722337792. Throughput: 0: 11446.0. Samples: 180632064. Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:52:25,956][1648985] Avg episode reward: [(0, '150.000')] [2024-06-15 15:52:29,041][1652491] Updated weights for policy 0, policy_version 352761 (0.0013) [2024-06-15 15:52:30,725][1652491] Updated weights for policy 0, policy_version 352800 (0.0012) [2024-06-15 15:52:30,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46967.5, 300 sec: 45542.0). Total num frames: 722534400. Throughput: 0: 11480.2. Samples: 180708352. Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:52:30,955][1648985] Avg episode reward: [(0, '166.710')] [2024-06-15 15:52:32,709][1652491] Updated weights for policy 0, policy_version 352853 (0.0019) [2024-06-15 15:52:35,182][1652491] Updated weights for policy 0, policy_version 352952 (0.0035) [2024-06-15 15:52:35,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 722862080. Throughput: 0: 11036.4. Samples: 180761088. Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:52:35,956][1648985] Avg episode reward: [(0, '165.540')] [2024-06-15 15:52:40,955][1648985] Fps is (10 sec: 36044.3, 60 sec: 44236.8, 300 sec: 44986.6). Total num frames: 722894848. Throughput: 0: 11104.7. Samples: 180803584. 
Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:52:40,956][1648985] Avg episode reward: [(0, '168.390')] [2024-06-15 15:52:41,753][1652491] Updated weights for policy 0, policy_version 353008 (0.0011) [2024-06-15 15:52:43,364][1652491] Updated weights for policy 0, policy_version 353072 (0.0011) [2024-06-15 15:52:45,679][1651469] Signal inference workers to stop experience collection... (18450 times) [2024-06-15 15:52:45,795][1652491] InferenceWorker_p0-w0: stopping experience collection (18450 times) [2024-06-15 15:52:45,950][1651469] Signal inference workers to resume experience collection... (18450 times) [2024-06-15 15:52:45,951][1652491] InferenceWorker_p0-w0: resuming experience collection (18450 times) [2024-06-15 15:52:45,955][1648985] Fps is (10 sec: 36045.9, 60 sec: 44783.1, 300 sec: 45430.9). Total num frames: 723222528. Throughput: 0: 11195.8. Samples: 180864512. Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:52:45,955][1648985] Avg episode reward: [(0, '150.870')] [2024-06-15 15:52:46,212][1652491] Updated weights for policy 0, policy_version 353140 (0.0154) [2024-06-15 15:52:48,029][1652491] Updated weights for policy 0, policy_version 353212 (0.0012) [2024-06-15 15:52:50,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 43690.7, 300 sec: 44875.5). Total num frames: 723386368. Throughput: 0: 10888.6. Samples: 180925952. Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:52:50,956][1648985] Avg episode reward: [(0, '137.530')] [2024-06-15 15:52:54,236][1652491] Updated weights for policy 0, policy_version 353266 (0.0012) [2024-06-15 15:52:55,946][1652491] Updated weights for policy 0, policy_version 353344 (0.0024) [2024-06-15 15:52:55,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 45329.0, 300 sec: 45319.8). Total num frames: 723648512. Throughput: 0: 10956.8. Samples: 180963328. 
Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:52:55,956][1648985] Avg episode reward: [(0, '135.470')] [2024-06-15 15:52:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000353344_723648512.pth... [2024-06-15 15:52:56,009][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000348064_712835072.pth [2024-06-15 15:52:59,714][1652491] Updated weights for policy 0, policy_version 353440 (0.0015) [2024-06-15 15:53:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 723910656. Throughput: 0: 10706.5. Samples: 181017088. Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:53:00,956][1648985] Avg episode reward: [(0, '139.200')] [2024-06-15 15:53:05,425][1652491] Updated weights for policy 0, policy_version 353489 (0.0017) [2024-06-15 15:53:05,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 43690.7, 300 sec: 45097.6). Total num frames: 724008960. Throughput: 0: 10899.9. Samples: 181095424. Policy #0 lag: (min: 7.0, avg: 75.1, max: 263.0) [2024-06-15 15:53:05,956][1648985] Avg episode reward: [(0, '140.570')] [2024-06-15 15:53:06,623][1652491] Updated weights for policy 0, policy_version 353552 (0.0013) [2024-06-15 15:53:09,332][1652491] Updated weights for policy 0, policy_version 353616 (0.0023) [2024-06-15 15:53:10,483][1652491] Updated weights for policy 0, policy_version 353669 (0.0015) [2024-06-15 15:53:10,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 43144.5, 300 sec: 45097.7). Total num frames: 724336640. Throughput: 0: 11047.8. Samples: 181129216. Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:53:10,956][1648985] Avg episode reward: [(0, '126.230')] [2024-06-15 15:53:15,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 43690.5, 300 sec: 44875.5). Total num frames: 724434944. Throughput: 0: 10911.2. Samples: 181199360. 
Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:53:15,956][1648985] Avg episode reward: [(0, '131.590')] [2024-06-15 15:53:16,779][1652491] Updated weights for policy 0, policy_version 353746 (0.0014) [2024-06-15 15:53:18,943][1652491] Updated weights for policy 0, policy_version 353846 (0.0134) [2024-06-15 15:53:20,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 43144.6, 300 sec: 44986.6). Total num frames: 724729856. Throughput: 0: 11195.8. Samples: 181264896. Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:53:20,955][1648985] Avg episode reward: [(0, '158.130')] [2024-06-15 15:53:21,300][1652491] Updated weights for policy 0, policy_version 353892 (0.0115) [2024-06-15 15:53:22,689][1652491] Updated weights for policy 0, policy_version 353955 (0.0012) [2024-06-15 15:53:25,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 44875.5). Total num frames: 724959232. Throughput: 0: 10911.3. Samples: 181294592. Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:53:25,956][1648985] Avg episode reward: [(0, '148.010')] [2024-06-15 15:53:27,428][1652491] Updated weights for policy 0, policy_version 353995 (0.0012) [2024-06-15 15:53:28,203][1651469] Signal inference workers to stop experience collection... (18500 times) [2024-06-15 15:53:28,240][1652491] InferenceWorker_p0-w0: stopping experience collection (18500 times) [2024-06-15 15:53:28,377][1651469] Signal inference workers to resume experience collection... (18500 times) [2024-06-15 15:53:28,384][1652491] InferenceWorker_p0-w0: resuming experience collection (18500 times) [2024-06-15 15:53:28,973][1652491] Updated weights for policy 0, policy_version 354080 (0.0013) [2024-06-15 15:53:30,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 44782.8, 300 sec: 45319.8). Total num frames: 725221376. Throughput: 0: 11195.7. Samples: 181368320. 
Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:53:30,956][1648985] Avg episode reward: [(0, '155.570')] [2024-06-15 15:53:31,702][1652491] Updated weights for policy 0, policy_version 354128 (0.0015) [2024-06-15 15:53:34,251][1652491] Updated weights for policy 0, policy_version 354213 (0.0013) [2024-06-15 15:53:35,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 43690.8, 300 sec: 45097.7). Total num frames: 725483520. Throughput: 0: 11309.5. Samples: 181434880. Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:53:35,956][1648985] Avg episode reward: [(0, '131.220')] [2024-06-15 15:53:39,201][1652491] Updated weights for policy 0, policy_version 354272 (0.0210) [2024-06-15 15:53:40,904][1652491] Updated weights for policy 0, policy_version 354352 (0.0013) [2024-06-15 15:53:40,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 46967.6, 300 sec: 45208.7). Total num frames: 725712896. Throughput: 0: 11377.8. Samples: 181475328. Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:53:40,955][1648985] Avg episode reward: [(0, '161.700')] [2024-06-15 15:53:43,165][1652491] Updated weights for policy 0, policy_version 354384 (0.0014) [2024-06-15 15:53:45,420][1652491] Updated weights for policy 0, policy_version 354450 (0.0012) [2024-06-15 15:53:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45329.0, 300 sec: 45542.0). Total num frames: 725942272. Throughput: 0: 11616.7. Samples: 181539840. Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:53:45,956][1648985] Avg episode reward: [(0, '153.160')] [2024-06-15 15:53:49,881][1652491] Updated weights for policy 0, policy_version 354512 (0.0015) [2024-06-15 15:53:50,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45875.2, 300 sec: 45319.8). Total num frames: 726138880. Throughput: 0: 11514.3. Samples: 181613568. 
Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:53:50,956][1648985] Avg episode reward: [(0, '148.190')] [2024-06-15 15:53:51,076][1652491] Updated weights for policy 0, policy_version 354566 (0.0013) [2024-06-15 15:53:52,245][1652491] Updated weights for policy 0, policy_version 354624 (0.0015) [2024-06-15 15:53:55,403][1652491] Updated weights for policy 0, policy_version 354679 (0.0108) [2024-06-15 15:53:55,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 45875.2, 300 sec: 45542.0). Total num frames: 726401024. Throughput: 0: 11559.8. Samples: 181649408. Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:53:55,956][1648985] Avg episode reward: [(0, '132.420')] [2024-06-15 15:53:56,931][1652491] Updated weights for policy 0, policy_version 354736 (0.0013) [2024-06-15 15:54:00,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 44986.6). Total num frames: 726564864. Throughput: 0: 11628.1. Samples: 181722624. Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:54:00,956][1648985] Avg episode reward: [(0, '134.720')] [2024-06-15 15:54:01,573][1652491] Updated weights for policy 0, policy_version 354800 (0.0013) [2024-06-15 15:54:03,220][1652491] Updated weights for policy 0, policy_version 354880 (0.0145) [2024-06-15 15:54:05,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46967.5, 300 sec: 45430.9). Total num frames: 726827008. Throughput: 0: 11616.7. Samples: 181787648. Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:54:05,956][1648985] Avg episode reward: [(0, '142.950')] [2024-06-15 15:54:06,623][1652491] Updated weights for policy 0, policy_version 354940 (0.0030) [2024-06-15 15:54:08,641][1652491] Updated weights for policy 0, policy_version 354981 (0.0025) [2024-06-15 15:54:10,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 45329.0, 300 sec: 45208.7). Total num frames: 727056384. Throughput: 0: 11707.8. Samples: 181821440. 
Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:54:10,956][1648985] Avg episode reward: [(0, '153.530')] [2024-06-15 15:54:11,835][1652491] Updated weights for policy 0, policy_version 355010 (0.0012) [2024-06-15 15:54:12,309][1651469] Signal inference workers to stop experience collection... (18550 times) [2024-06-15 15:54:12,359][1652491] InferenceWorker_p0-w0: stopping experience collection (18550 times) [2024-06-15 15:54:12,568][1651469] Signal inference workers to resume experience collection... (18550 times) [2024-06-15 15:54:12,569][1652491] InferenceWorker_p0-w0: resuming experience collection (18550 times) [2024-06-15 15:54:13,483][1652491] Updated weights for policy 0, policy_version 355079 (0.0212) [2024-06-15 15:54:14,484][1652491] Updated weights for policy 0, policy_version 355135 (0.0013) [2024-06-15 15:54:15,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 48059.8, 300 sec: 45319.8). Total num frames: 727318528. Throughput: 0: 11628.1. Samples: 181891584. Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:54:15,956][1648985] Avg episode reward: [(0, '156.020')] [2024-06-15 15:54:17,460][1652491] Updated weights for policy 0, policy_version 355197 (0.0016) [2024-06-15 15:54:20,199][1652491] Updated weights for policy 0, policy_version 355235 (0.0015) [2024-06-15 15:54:20,907][1652491] Updated weights for policy 0, policy_version 355264 (0.0012) [2024-06-15 15:54:20,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47513.6, 300 sec: 45653.1). Total num frames: 727580672. Throughput: 0: 11628.1. Samples: 181958144. Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:54:20,956][1648985] Avg episode reward: [(0, '138.710')] [2024-06-15 15:54:24,813][1652491] Updated weights for policy 0, policy_version 355328 (0.0014) [2024-06-15 15:54:25,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46967.6, 300 sec: 45319.8). Total num frames: 727777280. Throughput: 0: 11559.8. Samples: 181995520. 
Policy #0 lag: (min: 66.0, avg: 176.1, max: 322.0) [2024-06-15 15:54:25,956][1648985] Avg episode reward: [(0, '143.640')] [2024-06-15 15:54:26,442][1652491] Updated weights for policy 0, policy_version 355388 (0.0016) [2024-06-15 15:54:28,516][1652491] Updated weights for policy 0, policy_version 355448 (0.0038) [2024-06-15 15:54:30,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 45653.0). Total num frames: 727973888. Throughput: 0: 11662.2. Samples: 182064640. Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:54:30,955][1648985] Avg episode reward: [(0, '140.970')] [2024-06-15 15:54:31,861][1652491] Updated weights for policy 0, policy_version 355489 (0.0024) [2024-06-15 15:54:35,003][1652491] Updated weights for policy 0, policy_version 355538 (0.0023) [2024-06-15 15:54:35,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45329.0, 300 sec: 45208.7). Total num frames: 728203264. Throughput: 0: 11594.0. Samples: 182135296. Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:54:35,956][1648985] Avg episode reward: [(0, '163.320')] [2024-06-15 15:54:36,217][1652491] Updated weights for policy 0, policy_version 355585 (0.0023) [2024-06-15 15:54:38,644][1652491] Updated weights for policy 0, policy_version 355664 (0.0014) [2024-06-15 15:54:40,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 46421.2, 300 sec: 45764.1). Total num frames: 728498176. Throughput: 0: 11514.3. Samples: 182167552. Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:54:40,956][1648985] Avg episode reward: [(0, '159.110')] [2024-06-15 15:54:42,730][1652491] Updated weights for policy 0, policy_version 355728 (0.0015) [2024-06-15 15:54:45,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44782.9, 300 sec: 45208.7). Total num frames: 728629248. Throughput: 0: 11480.2. Samples: 182239232. 
Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:54:45,955][1648985] Avg episode reward: [(0, '152.710')] [2024-06-15 15:54:46,124][1652491] Updated weights for policy 0, policy_version 355792 (0.0050) [2024-06-15 15:54:48,425][1652491] Updated weights for policy 0, policy_version 355861 (0.0015) [2024-06-15 15:54:50,720][1652491] Updated weights for policy 0, policy_version 355939 (0.0013) [2024-06-15 15:54:50,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 47513.6, 300 sec: 45653.1). Total num frames: 728989696. Throughput: 0: 11423.3. Samples: 182301696. Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:54:50,955][1648985] Avg episode reward: [(0, '123.970')] [2024-06-15 15:54:54,075][1652491] Updated weights for policy 0, policy_version 355990 (0.0016) [2024-06-15 15:54:55,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 45875.0, 300 sec: 45764.1). Total num frames: 729153536. Throughput: 0: 11628.0. Samples: 182344704. Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:54:55,956][1648985] Avg episode reward: [(0, '141.890')] [2024-06-15 15:54:55,968][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000356032_729153536.pth... [2024-06-15 15:54:56,062][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000350656_718143488.pth [2024-06-15 15:54:57,744][1651469] Signal inference workers to stop experience collection... (18600 times) [2024-06-15 15:54:57,775][1652491] InferenceWorker_p0-w0: stopping experience collection (18600 times) [2024-06-15 15:54:58,029][1651469] Signal inference workers to resume experience collection... 
(18600 times) [2024-06-15 15:54:58,030][1652491] InferenceWorker_p0-w0: resuming experience collection (18600 times) [2024-06-15 15:54:58,038][1652491] Updated weights for policy 0, policy_version 356064 (0.0015) [2024-06-15 15:54:59,837][1652491] Updated weights for policy 0, policy_version 356114 (0.0013) [2024-06-15 15:55:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46967.5, 300 sec: 45542.0). Total num frames: 729382912. Throughput: 0: 11525.7. Samples: 182410240. Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:55:00,956][1648985] Avg episode reward: [(0, '146.060')] [2024-06-15 15:55:01,047][1652491] Updated weights for policy 0, policy_version 356159 (0.0021) [2024-06-15 15:55:01,943][1652491] Updated weights for policy 0, policy_version 356210 (0.0012) [2024-06-15 15:55:05,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 729612288. Throughput: 0: 11650.8. Samples: 182482432. Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:55:05,956][1648985] Avg episode reward: [(0, '147.090')] [2024-06-15 15:55:06,053][1652491] Updated weights for policy 0, policy_version 356272 (0.0030) [2024-06-15 15:55:09,966][1652491] Updated weights for policy 0, policy_version 356322 (0.0119) [2024-06-15 15:55:10,955][1648985] Fps is (10 sec: 42596.8, 60 sec: 45875.0, 300 sec: 45319.8). Total num frames: 729808896. Throughput: 0: 11548.4. Samples: 182515200. 
Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:55:10,956][1648985] Avg episode reward: [(0, '157.960')] [2024-06-15 15:55:11,268][1652491] Updated weights for policy 0, policy_version 356368 (0.0013) [2024-06-15 15:55:12,395][1652491] Updated weights for policy 0, policy_version 356410 (0.0023) [2024-06-15 15:55:13,306][1652491] Updated weights for policy 0, policy_version 356438 (0.0011) [2024-06-15 15:55:13,892][1652491] Updated weights for policy 0, policy_version 356479 (0.0014) [2024-06-15 15:55:15,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 730071040. Throughput: 0: 11525.7. Samples: 182583296. Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:55:15,955][1648985] Avg episode reward: [(0, '145.210')] [2024-06-15 15:55:17,849][1652491] Updated weights for policy 0, policy_version 356535 (0.0014) [2024-06-15 15:55:20,540][1652491] Updated weights for policy 0, policy_version 356561 (0.0015) [2024-06-15 15:55:20,955][1648985] Fps is (10 sec: 45877.3, 60 sec: 44783.0, 300 sec: 45542.0). Total num frames: 730267648. Throughput: 0: 11559.8. Samples: 182655488. Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:55:20,955][1648985] Avg episode reward: [(0, '126.990')] [2024-06-15 15:55:21,647][1652491] Updated weights for policy 0, policy_version 356602 (0.0087) [2024-06-15 15:55:23,019][1652491] Updated weights for policy 0, policy_version 356661 (0.0020) [2024-06-15 15:55:23,558][1652491] Updated weights for policy 0, policy_version 356674 (0.0011) [2024-06-15 15:55:24,675][1652491] Updated weights for policy 0, policy_version 356729 (0.0014) [2024-06-15 15:55:25,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 46967.4, 300 sec: 45764.1). Total num frames: 730595328. Throughput: 0: 11559.8. Samples: 182687744. 
Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:55:25,956][1648985] Avg episode reward: [(0, '137.140')] [2024-06-15 15:55:29,130][1652491] Updated weights for policy 0, policy_version 356795 (0.0013) [2024-06-15 15:55:30,955][1648985] Fps is (10 sec: 45873.6, 60 sec: 45875.0, 300 sec: 45653.0). Total num frames: 730726400. Throughput: 0: 11662.2. Samples: 182764032. Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:55:30,956][1648985] Avg episode reward: [(0, '146.440')] [2024-06-15 15:55:32,250][1652491] Updated weights for policy 0, policy_version 356838 (0.0122) [2024-06-15 15:55:34,013][1652491] Updated weights for policy 0, policy_version 356912 (0.0072) [2024-06-15 15:55:35,935][1652491] Updated weights for policy 0, policy_version 356991 (0.0012) [2024-06-15 15:55:35,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48605.9, 300 sec: 45986.3). Total num frames: 731119616. Throughput: 0: 11480.2. Samples: 182818304. Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:55:35,956][1648985] Avg episode reward: [(0, '152.510')] [2024-06-15 15:55:40,267][1651469] Signal inference workers to stop experience collection... (18650 times) [2024-06-15 15:55:40,323][1652491] InferenceWorker_p0-w0: stopping experience collection (18650 times) [2024-06-15 15:55:40,491][1651469] Signal inference workers to resume experience collection... (18650 times) [2024-06-15 15:55:40,492][1652491] InferenceWorker_p0-w0: resuming experience collection (18650 times) [2024-06-15 15:55:40,686][1652491] Updated weights for policy 0, policy_version 357028 (0.0012) [2024-06-15 15:55:40,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 45329.2, 300 sec: 45986.3). Total num frames: 731217920. Throughput: 0: 11503.0. Samples: 182862336. 
Policy #0 lag: (min: 14.0, avg: 138.5, max: 270.0) [2024-06-15 15:55:40,956][1648985] Avg episode reward: [(0, '152.960')] [2024-06-15 15:55:43,987][1652491] Updated weights for policy 0, policy_version 357089 (0.0014) [2024-06-15 15:55:45,442][1652491] Updated weights for policy 0, policy_version 357152 (0.0021) [2024-06-15 15:55:45,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 47513.6, 300 sec: 45653.0). Total num frames: 731480064. Throughput: 0: 11446.0. Samples: 182925312. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:55:45,956][1648985] Avg episode reward: [(0, '162.420')] [2024-06-15 15:55:46,318][1652491] Updated weights for policy 0, policy_version 357186 (0.0012) [2024-06-15 15:55:47,345][1652491] Updated weights for policy 0, policy_version 357242 (0.0010) [2024-06-15 15:55:50,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 44236.7, 300 sec: 45764.1). Total num frames: 731643904. Throughput: 0: 11559.8. Samples: 183002624. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:55:50,956][1648985] Avg episode reward: [(0, '164.620')] [2024-06-15 15:55:54,362][1652491] Updated weights for policy 0, policy_version 357316 (0.0015) [2024-06-15 15:55:55,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45329.3, 300 sec: 45319.8). Total num frames: 731873280. Throughput: 0: 11605.4. Samples: 183037440. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:55:55,955][1648985] Avg episode reward: [(0, '161.760')] [2024-06-15 15:55:56,170][1652491] Updated weights for policy 0, policy_version 357377 (0.0013) [2024-06-15 15:55:57,443][1652491] Updated weights for policy 0, policy_version 357437 (0.0020) [2024-06-15 15:55:58,401][1652491] Updated weights for policy 0, policy_version 357488 (0.0011) [2024-06-15 15:55:58,803][1652491] Updated weights for policy 0, policy_version 357504 (0.0011) [2024-06-15 15:56:00,956][1648985] Fps is (10 sec: 52426.5, 60 sec: 46420.9, 300 sec: 45764.1). Total num frames: 732168192. 
Throughput: 0: 11650.7. Samples: 183107584. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:56:00,957][1648985] Avg episode reward: [(0, '156.150')] [2024-06-15 15:56:03,391][1652491] Updated weights for policy 0, policy_version 357563 (0.0013) [2024-06-15 15:56:05,337][1652491] Updated weights for policy 0, policy_version 357603 (0.0021) [2024-06-15 15:56:05,955][1648985] Fps is (10 sec: 55705.0, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 732430336. Throughput: 0: 11730.4. Samples: 183183360. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:56:05,956][1648985] Avg episode reward: [(0, '147.730')] [2024-06-15 15:56:06,537][1652491] Updated weights for policy 0, policy_version 357664 (0.0014) [2024-06-15 15:56:07,956][1652491] Updated weights for policy 0, policy_version 357702 (0.0014) [2024-06-15 15:56:09,283][1652491] Updated weights for policy 0, policy_version 357760 (0.0013) [2024-06-15 15:56:10,955][1648985] Fps is (10 sec: 52431.4, 60 sec: 48060.0, 300 sec: 45764.1). Total num frames: 732692480. Throughput: 0: 11685.0. Samples: 183213568. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:56:10,956][1648985] Avg episode reward: [(0, '154.140')] [2024-06-15 15:56:14,233][1652491] Updated weights for policy 0, policy_version 357819 (0.0017) [2024-06-15 15:56:15,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 732889088. Throughput: 0: 11741.9. Samples: 183292416. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:56:15,955][1648985] Avg episode reward: [(0, '134.070')] [2024-06-15 15:56:16,233][1652491] Updated weights for policy 0, policy_version 357888 (0.0117) [2024-06-15 15:56:18,736][1652491] Updated weights for policy 0, policy_version 357951 (0.0012) [2024-06-15 15:56:19,939][1651469] Signal inference workers to stop experience collection... 
(18700 times) [2024-06-15 15:56:20,030][1652491] InferenceWorker_p0-w0: stopping experience collection (18700 times) [2024-06-15 15:56:20,167][1651469] Signal inference workers to resume experience collection... (18700 times) [2024-06-15 15:56:20,168][1652491] InferenceWorker_p0-w0: resuming experience collection (18700 times) [2024-06-15 15:56:20,379][1652491] Updated weights for policy 0, policy_version 358013 (0.0129) [2024-06-15 15:56:20,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 49152.0, 300 sec: 45764.1). Total num frames: 733216768. Throughput: 0: 11889.8. Samples: 183353344. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:56:20,955][1648985] Avg episode reward: [(0, '133.320')] [2024-06-15 15:56:24,839][1652491] Updated weights for policy 0, policy_version 358073 (0.0013) [2024-06-15 15:56:25,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 733347840. Throughput: 0: 12026.3. Samples: 183403520. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:56:25,955][1648985] Avg episode reward: [(0, '145.770')] [2024-06-15 15:56:26,883][1652491] Updated weights for policy 0, policy_version 358128 (0.0029) [2024-06-15 15:56:28,800][1652491] Updated weights for policy 0, policy_version 358176 (0.0014) [2024-06-15 15:56:29,768][1652491] Updated weights for policy 0, policy_version 358210 (0.0020) [2024-06-15 15:56:30,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 49698.2, 300 sec: 46097.3). Total num frames: 733708288. Throughput: 0: 12083.2. Samples: 183469056. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:56:30,956][1648985] Avg episode reward: [(0, '148.410')] [2024-06-15 15:56:30,968][1652491] Updated weights for policy 0, policy_version 358272 (0.0012) [2024-06-15 15:56:35,703][1652491] Updated weights for policy 0, policy_version 358325 (0.0014) [2024-06-15 15:56:35,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45875.1, 300 sec: 46208.4). 
Total num frames: 733872128. Throughput: 0: 11980.8. Samples: 183541760. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:56:35,956][1648985] Avg episode reward: [(0, '143.420')] [2024-06-15 15:56:38,362][1652491] Updated weights for policy 0, policy_version 358394 (0.0187) [2024-06-15 15:56:39,906][1652491] Updated weights for policy 0, policy_version 358432 (0.0012) [2024-06-15 15:56:40,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48605.8, 300 sec: 46097.4). Total num frames: 734134272. Throughput: 0: 11969.4. Samples: 183576064. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:56:40,956][1648985] Avg episode reward: [(0, '152.640')] [2024-06-15 15:56:42,060][1652491] Updated weights for policy 0, policy_version 358512 (0.0013) [2024-06-15 15:56:45,855][1652491] Updated weights for policy 0, policy_version 358565 (0.0013) [2024-06-15 15:56:45,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 47513.6, 300 sec: 45986.3). Total num frames: 734330880. Throughput: 0: 12163.0. Samples: 183654912. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:56:45,956][1648985] Avg episode reward: [(0, '151.990')] [2024-06-15 15:56:47,681][1652491] Updated weights for policy 0, policy_version 358608 (0.0016) [2024-06-15 15:56:50,565][1652491] Updated weights for policy 0, policy_version 358672 (0.0016) [2024-06-15 15:56:50,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48605.9, 300 sec: 46208.4). Total num frames: 734560256. Throughput: 0: 12094.6. Samples: 183727616. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:56:50,956][1648985] Avg episode reward: [(0, '153.960')] [2024-06-15 15:56:52,086][1652491] Updated weights for policy 0, policy_version 358737 (0.0014) [2024-06-15 15:56:55,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48605.8, 300 sec: 45764.1). Total num frames: 734789632. Throughput: 0: 12106.0. Samples: 183758336. 
Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:56:55,956][1648985] Avg episode reward: [(0, '140.580')] [2024-06-15 15:56:56,230][1652491] Updated weights for policy 0, policy_version 358800 (0.0098) [2024-06-15 15:56:56,471][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000358816_734855168.pth... [2024-06-15 15:56:56,648][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000353344_723648512.pth [2024-06-15 15:56:57,270][1652491] Updated weights for policy 0, policy_version 358848 (0.0013) [2024-06-15 15:57:00,532][1652491] Updated weights for policy 0, policy_version 358911 (0.0013) [2024-06-15 15:57:00,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48060.2, 300 sec: 46319.5). Total num frames: 735051776. Throughput: 0: 11821.5. Samples: 183824384. Policy #0 lag: (min: 63.0, avg: 160.6, max: 303.0) [2024-06-15 15:57:00,955][1648985] Avg episode reward: [(0, '138.660')] [2024-06-15 15:57:04,149][1652491] Updated weights for policy 0, policy_version 358992 (0.0011) [2024-06-15 15:57:04,685][1651469] Signal inference workers to stop experience collection... (18750 times) [2024-06-15 15:57:04,751][1652491] InferenceWorker_p0-w0: stopping experience collection (18750 times) [2024-06-15 15:57:04,914][1651469] Signal inference workers to resume experience collection... (18750 times) [2024-06-15 15:57:04,915][1652491] InferenceWorker_p0-w0: resuming experience collection (18750 times) [2024-06-15 15:57:05,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 45986.3). Total num frames: 735313920. Throughput: 0: 11912.5. Samples: 183889408. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:57:05,955][1648985] Avg episode reward: [(0, '141.900')] [2024-06-15 15:57:07,877][1652491] Updated weights for policy 0, policy_version 359072 (0.0014) [2024-06-15 15:57:10,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 735444992. 
Throughput: 0: 11491.6. Samples: 183920640. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:57:10,955][1648985] Avg episode reward: [(0, '151.730')] [2024-06-15 15:57:11,531][1652491] Updated weights for policy 0, policy_version 359122 (0.0015) [2024-06-15 15:57:15,158][1652491] Updated weights for policy 0, policy_version 359200 (0.0014) [2024-06-15 15:57:15,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 735674368. Throughput: 0: 11719.1. Samples: 183996416. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:57:15,956][1648985] Avg episode reward: [(0, '144.330')] [2024-06-15 15:57:17,393][1652491] Updated weights for policy 0, policy_version 359287 (0.0115) [2024-06-15 15:57:19,531][1652491] Updated weights for policy 0, policy_version 359331 (0.0018) [2024-06-15 15:57:20,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 735969280. Throughput: 0: 11503.0. Samples: 184059392. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:57:20,956][1648985] Avg episode reward: [(0, '142.630')] [2024-06-15 15:57:23,448][1652491] Updated weights for policy 0, policy_version 359392 (0.0014) [2024-06-15 15:57:25,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45875.1, 300 sec: 45986.3). Total num frames: 736100352. Throughput: 0: 11548.4. Samples: 184095744. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:57:25,956][1648985] Avg episode reward: [(0, '125.690')] [2024-06-15 15:57:26,588][1652491] Updated weights for policy 0, policy_version 359442 (0.0015) [2024-06-15 15:57:28,422][1652491] Updated weights for policy 0, policy_version 359513 (0.0012) [2024-06-15 15:57:30,608][1652491] Updated weights for policy 0, policy_version 359558 (0.0013) [2024-06-15 15:57:30,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 44782.8, 300 sec: 45875.2). Total num frames: 736395264. Throughput: 0: 11241.2. Samples: 184160768. 
Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:57:30,956][1648985] Avg episode reward: [(0, '133.320')] [2024-06-15 15:57:31,577][1652491] Updated weights for policy 0, policy_version 359615 (0.0014) [2024-06-15 15:57:34,317][1652491] Updated weights for policy 0, policy_version 359664 (0.0022) [2024-06-15 15:57:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 736624640. Throughput: 0: 11298.1. Samples: 184236032. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:57:35,956][1648985] Avg episode reward: [(0, '155.160')] [2024-06-15 15:57:38,379][1652491] Updated weights for policy 0, policy_version 359728 (0.0013) [2024-06-15 15:57:40,419][1652491] Updated weights for policy 0, policy_version 359807 (0.0197) [2024-06-15 15:57:40,955][1648985] Fps is (10 sec: 49153.2, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 736886784. Throughput: 0: 11275.4. Samples: 184265728. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:57:40,956][1648985] Avg episode reward: [(0, '158.020')] [2024-06-15 15:57:43,986][1652491] Updated weights for policy 0, policy_version 359865 (0.0168) [2024-06-15 15:57:45,961][1648985] Fps is (10 sec: 45846.4, 60 sec: 45870.4, 300 sec: 46429.6). Total num frames: 737083392. Throughput: 0: 11205.5. Samples: 184328704. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:57:45,962][1648985] Avg episode reward: [(0, '131.290')] [2024-06-15 15:57:46,019][1652491] Updated weights for policy 0, policy_version 359920 (0.0016) [2024-06-15 15:57:50,808][1652491] Updated weights for policy 0, policy_version 359975 (0.0013) [2024-06-15 15:57:50,955][1648985] Fps is (10 sec: 32768.0, 60 sec: 44236.8, 300 sec: 45986.3). Total num frames: 737214464. Throughput: 0: 11320.9. Samples: 184398848. 
Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:57:50,956][1648985] Avg episode reward: [(0, '154.030')] [2024-06-15 15:57:51,634][1651469] Signal inference workers to stop experience collection... (18800 times) [2024-06-15 15:57:51,699][1652491] InferenceWorker_p0-w0: stopping experience collection (18800 times) [2024-06-15 15:57:51,938][1651469] Signal inference workers to resume experience collection... (18800 times) [2024-06-15 15:57:51,949][1652491] InferenceWorker_p0-w0: resuming experience collection (18800 times) [2024-06-15 15:57:52,371][1652491] Updated weights for policy 0, policy_version 360032 (0.0012) [2024-06-15 15:57:55,716][1652491] Updated weights for policy 0, policy_version 360105 (0.0231) [2024-06-15 15:57:55,955][1648985] Fps is (10 sec: 42625.3, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 737509376. Throughput: 0: 11275.4. Samples: 184428032. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:57:55,956][1648985] Avg episode reward: [(0, '166.710')] [2024-06-15 15:57:56,802][1652491] Updated weights for policy 0, policy_version 360130 (0.0012) [2024-06-15 15:57:57,773][1652491] Updated weights for policy 0, policy_version 360188 (0.0012) [2024-06-15 15:58:00,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 43690.6, 300 sec: 46319.5). Total num frames: 737673216. Throughput: 0: 11275.4. Samples: 184503808. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:58:00,956][1648985] Avg episode reward: [(0, '170.540')] [2024-06-15 15:58:02,649][1652491] Updated weights for policy 0, policy_version 360256 (0.0088) [2024-06-15 15:58:04,295][1652491] Updated weights for policy 0, policy_version 360320 (0.0011) [2024-06-15 15:58:05,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 43690.5, 300 sec: 46097.3). Total num frames: 737935360. Throughput: 0: 11275.3. Samples: 184566784. 
Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:58:05,956][1648985] Avg episode reward: [(0, '158.270')] [2024-06-15 15:58:07,392][1652491] Updated weights for policy 0, policy_version 360376 (0.0093) [2024-06-15 15:58:09,138][1652491] Updated weights for policy 0, policy_version 360432 (0.0015) [2024-06-15 15:58:10,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.1, 300 sec: 46652.8). Total num frames: 738197504. Throughput: 0: 11275.4. Samples: 184603136. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:58:10,956][1648985] Avg episode reward: [(0, '160.020')] [2024-06-15 15:58:13,551][1652491] Updated weights for policy 0, policy_version 360482 (0.0011) [2024-06-15 15:58:15,264][1652491] Updated weights for policy 0, policy_version 360547 (0.0012) [2024-06-15 15:58:15,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 46421.0, 300 sec: 46541.6). Total num frames: 738459648. Throughput: 0: 11389.1. Samples: 184673280. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:58:15,957][1648985] Avg episode reward: [(0, '161.150')] [2024-06-15 15:58:17,459][1652491] Updated weights for policy 0, policy_version 360583 (0.0019) [2024-06-15 15:58:18,460][1652491] Updated weights for policy 0, policy_version 360640 (0.0020) [2024-06-15 15:58:20,295][1652491] Updated weights for policy 0, policy_version 360697 (0.0018) [2024-06-15 15:58:20,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 738721792. Throughput: 0: 11275.4. Samples: 184743424. Policy #0 lag: (min: 43.0, avg: 143.0, max: 299.0) [2024-06-15 15:58:20,956][1648985] Avg episode reward: [(0, '157.740')] [2024-06-15 15:58:25,763][1652491] Updated weights for policy 0, policy_version 360769 (0.0013) [2024-06-15 15:58:25,955][1648985] Fps is (10 sec: 39323.5, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 738852864. Throughput: 0: 11503.0. Samples: 184783360. 
Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:58:25,955][1648985] Avg episode reward: [(0, '131.600')] [2024-06-15 15:58:27,019][1652491] Updated weights for policy 0, policy_version 360821 (0.0017) [2024-06-15 15:58:29,391][1652491] Updated weights for policy 0, policy_version 360848 (0.0015) [2024-06-15 15:58:30,424][1652491] Updated weights for policy 0, policy_version 360895 (0.0013) [2024-06-15 15:58:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45875.4, 300 sec: 46319.5). Total num frames: 739147776. Throughput: 0: 11447.6. Samples: 184843776. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:58:30,956][1648985] Avg episode reward: [(0, '153.460')] [2024-06-15 15:58:31,619][1652491] Updated weights for policy 0, policy_version 360950 (0.0012) [2024-06-15 15:58:35,850][1651469] Signal inference workers to stop experience collection... (18850 times) [2024-06-15 15:58:35,935][1652491] InferenceWorker_p0-w0: stopping experience collection (18850 times) [2024-06-15 15:58:35,963][1648985] Fps is (10 sec: 42571.4, 60 sec: 44232.2, 300 sec: 45985.3). Total num frames: 739278848. Throughput: 0: 11569.6. Samples: 184919552. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:58:35,964][1648985] Avg episode reward: [(0, '146.920')] [2024-06-15 15:58:36,192][1651469] Signal inference workers to resume experience collection... (18850 times) [2024-06-15 15:58:36,192][1652491] InferenceWorker_p0-w0: resuming experience collection (18850 times) [2024-06-15 15:58:36,339][1652491] Updated weights for policy 0, policy_version 360993 (0.0021) [2024-06-15 15:58:38,013][1652491] Updated weights for policy 0, policy_version 361056 (0.0123) [2024-06-15 15:58:40,867][1652491] Updated weights for policy 0, policy_version 361104 (0.0019) [2024-06-15 15:58:40,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 46097.3). Total num frames: 739540992. Throughput: 0: 11457.4. Samples: 184943616. 
Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:58:40,956][1648985] Avg episode reward: [(0, '140.120')] [2024-06-15 15:58:42,065][1652491] Updated weights for policy 0, policy_version 361153 (0.0011) [2024-06-15 15:58:45,955][1648985] Fps is (10 sec: 49181.7, 60 sec: 44787.5, 300 sec: 46208.4). Total num frames: 739770368. Throughput: 0: 11389.1. Samples: 185016320. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:58:45,956][1648985] Avg episode reward: [(0, '136.510')] [2024-06-15 15:58:47,074][1652491] Updated weights for policy 0, policy_version 361236 (0.0014) [2024-06-15 15:58:48,829][1652491] Updated weights for policy 0, policy_version 361312 (0.0011) [2024-06-15 15:58:50,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 740032512. Throughput: 0: 11639.5. Samples: 185090560. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:58:50,956][1648985] Avg episode reward: [(0, '130.840')] [2024-06-15 15:58:51,969][1652491] Updated weights for policy 0, policy_version 361360 (0.0011) [2024-06-15 15:58:53,888][1652491] Updated weights for policy 0, policy_version 361440 (0.0109) [2024-06-15 15:58:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46421.2, 300 sec: 46541.6). Total num frames: 740294656. Throughput: 0: 11468.8. Samples: 185119232. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:58:55,956][1648985] Avg episode reward: [(0, '115.850')] [2024-06-15 15:58:55,979][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000361472_740294656.pth... 
[2024-06-15 15:58:56,080][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000356032_729153536.pth [2024-06-15 15:58:58,486][1652491] Updated weights for policy 0, policy_version 361490 (0.0012) [2024-06-15 15:59:00,124][1652491] Updated weights for policy 0, policy_version 361568 (0.0013) [2024-06-15 15:59:00,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 740556800. Throughput: 0: 11525.8. Samples: 185191936. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:59:00,955][1648985] Avg episode reward: [(0, '112.440')] [2024-06-15 15:59:03,009][1652491] Updated weights for policy 0, policy_version 361601 (0.0009) [2024-06-15 15:59:04,216][1652491] Updated weights for policy 0, policy_version 361648 (0.0046) [2024-06-15 15:59:05,919][1652491] Updated weights for policy 0, policy_version 361719 (0.0014) [2024-06-15 15:59:05,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 740786176. Throughput: 0: 11548.5. Samples: 185263104. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:59:05,956][1648985] Avg episode reward: [(0, '123.210')] [2024-06-15 15:59:09,835][1652491] Updated weights for policy 0, policy_version 361776 (0.0012) [2024-06-15 15:59:10,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 740982784. Throughput: 0: 11605.3. Samples: 185305600. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:59:10,956][1648985] Avg episode reward: [(0, '142.880')] [2024-06-15 15:59:11,544][1652491] Updated weights for policy 0, policy_version 361849 (0.0105) [2024-06-15 15:59:15,091][1651469] Signal inference workers to stop experience collection... (18900 times) [2024-06-15 15:59:15,134][1652491] InferenceWorker_p0-w0: stopping experience collection (18900 times) [2024-06-15 15:59:15,380][1651469] Signal inference workers to resume experience collection... 
(18900 times) [2024-06-15 15:59:15,381][1652491] InferenceWorker_p0-w0: resuming experience collection (18900 times) [2024-06-15 15:59:15,384][1652491] Updated weights for policy 0, policy_version 361904 (0.0011) [2024-06-15 15:59:15,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.5, 300 sec: 46208.4). Total num frames: 741212160. Throughput: 0: 11764.6. Samples: 185373184. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:59:15,956][1648985] Avg episode reward: [(0, '146.180')] [2024-06-15 15:59:17,124][1652491] Updated weights for policy 0, policy_version 361969 (0.0012) [2024-06-15 15:59:20,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 44236.7, 300 sec: 46097.3). Total num frames: 741376000. Throughput: 0: 11515.9. Samples: 185437696. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:59:20,956][1648985] Avg episode reward: [(0, '138.720')] [2024-06-15 15:59:21,364][1652491] Updated weights for policy 0, policy_version 362018 (0.0020) [2024-06-15 15:59:23,337][1652491] Updated weights for policy 0, policy_version 362108 (0.0013) [2024-06-15 15:59:25,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 741605376. Throughput: 0: 11559.8. Samples: 185463808. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:59:25,956][1648985] Avg episode reward: [(0, '143.970')] [2024-06-15 15:59:27,564][1652491] Updated weights for policy 0, policy_version 362160 (0.0174) [2024-06-15 15:59:28,988][1652491] Updated weights for policy 0, policy_version 362224 (0.0012) [2024-06-15 15:59:30,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 45329.1, 300 sec: 46319.5). Total num frames: 741867520. Throughput: 0: 11503.0. Samples: 185533952. 
Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:59:30,956][1648985] Avg episode reward: [(0, '146.920')] [2024-06-15 15:59:33,088][1652491] Updated weights for policy 0, policy_version 362288 (0.0103) [2024-06-15 15:59:34,805][1652491] Updated weights for policy 0, policy_version 362355 (0.0199) [2024-06-15 15:59:35,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 47518.4, 300 sec: 46208.4). Total num frames: 742129664. Throughput: 0: 11343.6. Samples: 185601024. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:59:35,956][1648985] Avg episode reward: [(0, '158.010')] [2024-06-15 15:59:38,598][1652491] Updated weights for policy 0, policy_version 362401 (0.0012) [2024-06-15 15:59:40,694][1652491] Updated weights for policy 0, policy_version 362493 (0.0172) [2024-06-15 15:59:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 742391808. Throughput: 0: 11468.8. Samples: 185635328. Policy #0 lag: (min: 15.0, avg: 99.3, max: 271.0) [2024-06-15 15:59:40,956][1648985] Avg episode reward: [(0, '148.620')] [2024-06-15 15:59:45,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 45329.2, 300 sec: 45764.1). Total num frames: 742490112. Throughput: 0: 11377.7. Samples: 185703936. Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 15:59:45,956][1648985] Avg episode reward: [(0, '148.840')] [2024-06-15 15:59:46,234][1652491] Updated weights for policy 0, policy_version 362563 (0.0124) [2024-06-15 15:59:49,986][1652491] Updated weights for policy 0, policy_version 362630 (0.0014) [2024-06-15 15:59:50,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 742752256. Throughput: 0: 11150.2. Samples: 185764864. 
Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 15:59:50,955][1648985] Avg episode reward: [(0, '136.890')] [2024-06-15 15:59:51,366][1652491] Updated weights for policy 0, policy_version 362689 (0.0010) [2024-06-15 15:59:55,958][1648985] Fps is (10 sec: 42588.1, 60 sec: 43689.0, 300 sec: 45874.8). Total num frames: 742916096. Throughput: 0: 10876.6. Samples: 185795072. Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 15:59:55,958][1648985] Avg episode reward: [(0, '147.810')] [2024-06-15 15:59:56,356][1652491] Updated weights for policy 0, policy_version 362768 (0.0014) [2024-06-15 15:59:57,449][1652491] Updated weights for policy 0, policy_version 362816 (0.0011) [2024-06-15 15:59:58,123][1651469] Signal inference workers to stop experience collection... (18950 times) [2024-06-15 15:59:58,172][1652491] InferenceWorker_p0-w0: stopping experience collection (18950 times) [2024-06-15 15:59:58,313][1651469] Signal inference workers to resume experience collection... (18950 times) [2024-06-15 15:59:58,315][1652491] InferenceWorker_p0-w0: resuming experience collection (18950 times) [2024-06-15 15:59:59,155][1652491] Updated weights for policy 0, policy_version 362879 (0.0012) [2024-06-15 16:00:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 43690.6, 300 sec: 45986.3). Total num frames: 743178240. Throughput: 0: 10956.8. Samples: 185866240. Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:00:00,955][1648985] Avg episode reward: [(0, '158.710')] [2024-06-15 16:00:02,162][1652491] Updated weights for policy 0, policy_version 362940 (0.0016) [2024-06-15 16:00:03,988][1652491] Updated weights for policy 0, policy_version 362992 (0.0029) [2024-06-15 16:00:05,955][1648985] Fps is (10 sec: 52441.5, 60 sec: 44236.8, 300 sec: 46208.5). Total num frames: 743440384. Throughput: 0: 11161.6. Samples: 185939968. 
Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:00:05,956][1648985] Avg episode reward: [(0, '160.690')] [2024-06-15 16:00:07,190][1652491] Updated weights for policy 0, policy_version 363043 (0.0047) [2024-06-15 16:00:09,830][1652491] Updated weights for policy 0, policy_version 363120 (0.0017) [2024-06-15 16:00:10,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 743702528. Throughput: 0: 11332.3. Samples: 185973760. Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:00:10,956][1648985] Avg episode reward: [(0, '148.720')] [2024-06-15 16:00:12,703][1652491] Updated weights for policy 0, policy_version 363152 (0.0055) [2024-06-15 16:00:15,225][1652491] Updated weights for policy 0, policy_version 363248 (0.0138) [2024-06-15 16:00:15,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 743964672. Throughput: 0: 11241.2. Samples: 186039808. Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:00:15,956][1648985] Avg episode reward: [(0, '142.570')] [2024-06-15 16:00:18,437][1652491] Updated weights for policy 0, policy_version 363296 (0.0143) [2024-06-15 16:00:20,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 45875.4, 300 sec: 45875.2). Total num frames: 744128512. Throughput: 0: 11298.2. Samples: 186109440. Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:00:20,955][1648985] Avg episode reward: [(0, '141.040')] [2024-06-15 16:00:21,414][1652491] Updated weights for policy 0, policy_version 363376 (0.0013) [2024-06-15 16:00:24,731][1652491] Updated weights for policy 0, policy_version 363450 (0.0013) [2024-06-15 16:00:25,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 744357888. Throughput: 0: 11457.4. Samples: 186150912. 
Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:00:25,956][1648985] Avg episode reward: [(0, '131.680')] [2024-06-15 16:00:27,102][1652491] Updated weights for policy 0, policy_version 363511 (0.0014) [2024-06-15 16:00:30,722][1652491] Updated weights for policy 0, policy_version 363575 (0.0114) [2024-06-15 16:00:30,955][1648985] Fps is (10 sec: 49150.4, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 744620032. Throughput: 0: 11355.0. Samples: 186214912. Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:00:30,956][1648985] Avg episode reward: [(0, '146.360')] [2024-06-15 16:00:32,917][1652491] Updated weights for policy 0, policy_version 363643 (0.0033) [2024-06-15 16:00:35,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44236.9, 300 sec: 45986.3). Total num frames: 744783872. Throughput: 0: 11582.6. Samples: 186286080. Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:00:35,956][1648985] Avg episode reward: [(0, '157.090')] [2024-06-15 16:00:37,315][1652491] Updated weights for policy 0, policy_version 363717 (0.0012) [2024-06-15 16:00:38,421][1652491] Updated weights for policy 0, policy_version 363767 (0.0012) [2024-06-15 16:00:40,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 44236.8, 300 sec: 45986.3). Total num frames: 745046016. Throughput: 0: 11583.2. Samples: 186316288. Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:00:40,956][1648985] Avg episode reward: [(0, '145.830')] [2024-06-15 16:00:41,666][1652491] Updated weights for policy 0, policy_version 363827 (0.0108) [2024-06-15 16:00:44,048][1652491] Updated weights for policy 0, policy_version 363859 (0.0025) [2024-06-15 16:00:44,765][1652491] Updated weights for policy 0, policy_version 363898 (0.0014) [2024-06-15 16:00:45,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 46421.4, 300 sec: 46208.5). Total num frames: 745275392. Throughput: 0: 11548.5. Samples: 186385920. 
Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:00:45,955][1648985] Avg episode reward: [(0, '140.290')] [2024-06-15 16:00:46,391][1651469] Signal inference workers to stop experience collection... (19000 times) [2024-06-15 16:00:46,465][1652491] InferenceWorker_p0-w0: stopping experience collection (19000 times) [2024-06-15 16:00:46,670][1651469] Signal inference workers to resume experience collection... (19000 times) [2024-06-15 16:00:46,672][1652491] InferenceWorker_p0-w0: resuming experience collection (19000 times) [2024-06-15 16:00:47,455][1652491] Updated weights for policy 0, policy_version 363952 (0.0016) [2024-06-15 16:00:49,899][1652491] Updated weights for policy 0, policy_version 364016 (0.0120) [2024-06-15 16:00:50,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 745537536. Throughput: 0: 11468.8. Samples: 186456064. Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:00:50,956][1648985] Avg episode reward: [(0, '135.330')] [2024-06-15 16:00:52,496][1652491] Updated weights for policy 0, policy_version 364066 (0.0013) [2024-06-15 16:00:55,303][1652491] Updated weights for policy 0, policy_version 364128 (0.0012) [2024-06-15 16:00:55,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 47515.5, 300 sec: 46097.4). Total num frames: 745766912. Throughput: 0: 11491.5. Samples: 186490880. Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:00:55,956][1648985] Avg episode reward: [(0, '148.710')] [2024-06-15 16:00:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000364160_745799680.pth... 
[2024-06-15 16:00:56,006][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000358816_734855168.pth [2024-06-15 16:00:58,076][1652491] Updated weights for policy 0, policy_version 364177 (0.0020) [2024-06-15 16:00:58,908][1652491] Updated weights for policy 0, policy_version 364224 (0.0104) [2024-06-15 16:01:00,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 745996288. Throughput: 0: 11628.1. Samples: 186563072. Policy #0 lag: (min: 15.0, avg: 101.6, max: 271.0) [2024-06-15 16:01:00,956][1648985] Avg episode reward: [(0, '156.880')] [2024-06-15 16:01:01,544][1652491] Updated weights for policy 0, policy_version 364281 (0.0018) [2024-06-15 16:01:03,553][1652491] Updated weights for policy 0, policy_version 364308 (0.0020) [2024-06-15 16:01:05,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 746192896. Throughput: 0: 11605.3. Samples: 186631680. Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:01:05,956][1648985] Avg episode reward: [(0, '156.410')] [2024-06-15 16:01:06,254][1652491] Updated weights for policy 0, policy_version 364369 (0.0030) [2024-06-15 16:01:07,150][1652491] Updated weights for policy 0, policy_version 364416 (0.0013) [2024-06-15 16:01:09,850][1652491] Updated weights for policy 0, policy_version 364480 (0.0017) [2024-06-15 16:01:10,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 746455040. Throughput: 0: 11411.9. Samples: 186664448. Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:01:10,956][1648985] Avg episode reward: [(0, '155.030')] [2024-06-15 16:01:15,232][1652491] Updated weights for policy 0, policy_version 364576 (0.0130) [2024-06-15 16:01:15,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 746717184. Throughput: 0: 11514.3. Samples: 186733056. 
Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:01:15,956][1648985] Avg episode reward: [(0, '141.600')] [2024-06-15 16:01:17,093][1652491] Updated weights for policy 0, policy_version 364613 (0.0013) [2024-06-15 16:01:18,137][1652491] Updated weights for policy 0, policy_version 364663 (0.0015) [2024-06-15 16:01:20,929][1652491] Updated weights for policy 0, policy_version 364704 (0.0012) [2024-06-15 16:01:20,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 746913792. Throughput: 0: 11559.8. Samples: 186806272. Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:01:20,955][1648985] Avg episode reward: [(0, '143.670')] [2024-06-15 16:01:21,708][1652491] Updated weights for policy 0, policy_version 364734 (0.0012) [2024-06-15 16:01:24,489][1652491] Updated weights for policy 0, policy_version 364784 (0.0020) [2024-06-15 16:01:25,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46421.2, 300 sec: 45541.9). Total num frames: 747143168. Throughput: 0: 11696.3. Samples: 186842624. Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:01:25,956][1648985] Avg episode reward: [(0, '163.030')] [2024-06-15 16:01:26,153][1652491] Updated weights for policy 0, policy_version 364832 (0.0045) [2024-06-15 16:01:27,504][1652491] Updated weights for policy 0, policy_version 364880 (0.0012) [2024-06-15 16:01:30,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 747372544. Throughput: 0: 11719.1. Samples: 186913280. Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:01:30,956][1648985] Avg episode reward: [(0, '140.030')] [2024-06-15 16:01:31,969][1652491] Updated weights for policy 0, policy_version 364944 (0.0014) [2024-06-15 16:01:32,510][1651469] Signal inference workers to stop experience collection... 
(19050 times) [2024-06-15 16:01:32,560][1652491] InferenceWorker_p0-w0: stopping experience collection (19050 times) [2024-06-15 16:01:32,842][1651469] Signal inference workers to resume experience collection... (19050 times) [2024-06-15 16:01:32,842][1652491] InferenceWorker_p0-w0: resuming experience collection (19050 times) [2024-06-15 16:01:33,207][1652491] Updated weights for policy 0, policy_version 364991 (0.0013) [2024-06-15 16:01:35,093][1652491] Updated weights for policy 0, policy_version 365048 (0.0013) [2024-06-15 16:01:35,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 47513.6, 300 sec: 45764.1). Total num frames: 747634688. Throughput: 0: 11730.5. Samples: 186983936. Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:01:35,956][1648985] Avg episode reward: [(0, '143.300')] [2024-06-15 16:01:37,112][1652491] Updated weights for policy 0, policy_version 365104 (0.0018) [2024-06-15 16:01:39,597][1652491] Updated weights for policy 0, policy_version 365168 (0.0018) [2024-06-15 16:01:40,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 47513.6, 300 sec: 45986.3). Total num frames: 747896832. Throughput: 0: 11730.5. Samples: 187018752. Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:01:40,956][1648985] Avg episode reward: [(0, '137.060')] [2024-06-15 16:01:44,215][1652491] Updated weights for policy 0, policy_version 365236 (0.0015) [2024-06-15 16:01:45,657][1652491] Updated weights for policy 0, policy_version 365282 (0.0023) [2024-06-15 16:01:45,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 45986.3). Total num frames: 748126208. Throughput: 0: 11764.6. Samples: 187092480. 
Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:01:45,956][1648985] Avg episode reward: [(0, '143.650')] [2024-06-15 16:01:47,416][1652491] Updated weights for policy 0, policy_version 365333 (0.0012) [2024-06-15 16:01:49,685][1652491] Updated weights for policy 0, policy_version 365382 (0.0014) [2024-06-15 16:01:50,748][1652491] Updated weights for policy 0, policy_version 365435 (0.0014) [2024-06-15 16:01:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 748421120. Throughput: 0: 11787.4. Samples: 187162112. Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:01:50,956][1648985] Avg episode reward: [(0, '143.430')] [2024-06-15 16:01:54,708][1652491] Updated weights for policy 0, policy_version 365494 (0.0013) [2024-06-15 16:01:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.4, 300 sec: 45764.1). Total num frames: 748552192. Throughput: 0: 11958.1. Samples: 187202560. Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:01:55,956][1648985] Avg episode reward: [(0, '135.600')] [2024-06-15 16:01:57,048][1652491] Updated weights for policy 0, policy_version 365552 (0.0025) [2024-06-15 16:01:58,555][1652491] Updated weights for policy 0, policy_version 365600 (0.0029) [2024-06-15 16:02:00,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 748814336. Throughput: 0: 11912.5. Samples: 187269120. Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:02:00,956][1648985] Avg episode reward: [(0, '134.260')] [2024-06-15 16:02:01,867][1652491] Updated weights for policy 0, policy_version 365680 (0.0030) [2024-06-15 16:02:05,532][1652491] Updated weights for policy 0, policy_version 365729 (0.0012) [2024-06-15 16:02:05,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.7, 300 sec: 46097.4). Total num frames: 749043712. Throughput: 0: 11832.9. Samples: 187338752. 
Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:02:05,955][1648985] Avg episode reward: [(0, '128.880')] [2024-06-15 16:02:08,892][1652491] Updated weights for policy 0, policy_version 365808 (0.0014) [2024-06-15 16:02:10,438][1652491] Updated weights for policy 0, policy_version 365872 (0.0100) [2024-06-15 16:02:10,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 46319.5). Total num frames: 749338624. Throughput: 0: 11696.4. Samples: 187368960. Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:02:10,956][1648985] Avg episode reward: [(0, '129.230')] [2024-06-15 16:02:13,704][1652491] Updated weights for policy 0, policy_version 365936 (0.0077) [2024-06-15 16:02:15,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 749469696. Throughput: 0: 11673.6. Samples: 187438592. Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:02:15,956][1648985] Avg episode reward: [(0, '133.970')] [2024-06-15 16:02:17,097][1652491] Updated weights for policy 0, policy_version 366000 (0.0013) [2024-06-15 16:02:19,921][1652491] Updated weights for policy 0, policy_version 366017 (0.0014) [2024-06-15 16:02:20,280][1651469] Signal inference workers to stop experience collection... (19100 times) [2024-06-15 16:02:20,309][1652491] InferenceWorker_p0-w0: stopping experience collection (19100 times) [2024-06-15 16:02:20,588][1651469] Signal inference workers to resume experience collection... (19100 times) [2024-06-15 16:02:20,589][1652491] InferenceWorker_p0-w0: resuming experience collection (19100 times) [2024-06-15 16:02:20,955][1648985] Fps is (10 sec: 36045.2, 60 sec: 46421.3, 300 sec: 46097.4). Total num frames: 749699072. Throughput: 0: 11639.5. Samples: 187507712. 
Policy #0 lag: (min: 12.0, avg: 128.6, max: 268.0) [2024-06-15 16:02:20,956][1648985] Avg episode reward: [(0, '149.370')] [2024-06-15 16:02:21,326][1652491] Updated weights for policy 0, policy_version 366080 (0.0044) [2024-06-15 16:02:22,612][1652491] Updated weights for policy 0, policy_version 366138 (0.0022) [2024-06-15 16:02:24,455][1652491] Updated weights for policy 0, policy_version 366164 (0.0018) [2024-06-15 16:02:25,300][1652491] Updated weights for policy 0, policy_version 366206 (0.0015) [2024-06-15 16:02:25,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 47513.7, 300 sec: 46097.4). Total num frames: 749993984. Throughput: 0: 11662.2. Samples: 187543552. Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:02:25,956][1648985] Avg episode reward: [(0, '145.420')] [2024-06-15 16:02:27,411][1652491] Updated weights for policy 0, policy_version 366250 (0.0012) [2024-06-15 16:02:30,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 750157824. Throughput: 0: 11764.6. Samples: 187621888. Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:02:30,956][1648985] Avg episode reward: [(0, '149.830')] [2024-06-15 16:02:31,590][1652491] Updated weights for policy 0, policy_version 366328 (0.0014) [2024-06-15 16:02:33,057][1652491] Updated weights for policy 0, policy_version 366384 (0.0041) [2024-06-15 16:02:35,777][1652491] Updated weights for policy 0, policy_version 366434 (0.0013) [2024-06-15 16:02:35,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 750452736. Throughput: 0: 11730.5. Samples: 187689984. 
Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:02:35,955][1648985] Avg episode reward: [(0, '146.110')] [2024-06-15 16:02:36,365][1652491] Updated weights for policy 0, policy_version 366461 (0.0010) [2024-06-15 16:02:38,140][1652491] Updated weights for policy 0, policy_version 366524 (0.0013) [2024-06-15 16:02:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45875.2, 300 sec: 45987.3). Total num frames: 750649344. Throughput: 0: 11616.7. Samples: 187725312. Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:02:40,956][1648985] Avg episode reward: [(0, '149.250')] [2024-06-15 16:02:41,916][1652491] Updated weights for policy 0, policy_version 366563 (0.0014) [2024-06-15 16:02:43,703][1652491] Updated weights for policy 0, policy_version 366624 (0.0014) [2024-06-15 16:02:45,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 750911488. Throughput: 0: 11696.3. Samples: 187795456. Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:02:45,956][1648985] Avg episode reward: [(0, '148.390')] [2024-06-15 16:02:47,390][1652491] Updated weights for policy 0, policy_version 366705 (0.0013) [2024-06-15 16:02:49,138][1652491] Updated weights for policy 0, policy_version 366774 (0.0013) [2024-06-15 16:02:50,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 751173632. Throughput: 0: 11719.1. Samples: 187866112. Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:02:50,956][1648985] Avg episode reward: [(0, '137.000')] [2024-06-15 16:02:53,474][1652491] Updated weights for policy 0, policy_version 366817 (0.0014) [2024-06-15 16:02:55,332][1652491] Updated weights for policy 0, policy_version 366896 (0.0013) [2024-06-15 16:02:55,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.6, 300 sec: 46652.7). Total num frames: 751435776. Throughput: 0: 11923.9. Samples: 187905536. 
Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:02:55,956][1648985] Avg episode reward: [(0, '130.480')] [2024-06-15 16:02:55,964][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000366912_751435776.pth... [2024-06-15 16:02:56,039][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000361472_740294656.pth [2024-06-15 16:02:58,648][1652491] Updated weights for policy 0, policy_version 366961 (0.0013) [2024-06-15 16:02:59,773][1651469] Signal inference workers to stop experience collection... (19150 times) [2024-06-15 16:02:59,808][1652491] InferenceWorker_p0-w0: stopping experience collection (19150 times) [2024-06-15 16:03:00,042][1651469] Signal inference workers to resume experience collection... (19150 times) [2024-06-15 16:03:00,043][1652491] InferenceWorker_p0-w0: resuming experience collection (19150 times) [2024-06-15 16:03:00,200][1652491] Updated weights for policy 0, policy_version 367029 (0.0012) [2024-06-15 16:03:00,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 48059.5, 300 sec: 46652.7). Total num frames: 751697920. Throughput: 0: 11719.0. Samples: 187965952. Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:03:00,956][1648985] Avg episode reward: [(0, '135.950')] [2024-06-15 16:03:04,553][1652491] Updated weights for policy 0, policy_version 367072 (0.0165) [2024-06-15 16:03:05,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 751828992. Throughput: 0: 11958.0. Samples: 188045824. Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:03:05,956][1648985] Avg episode reward: [(0, '132.250')] [2024-06-15 16:03:06,176][1652491] Updated weights for policy 0, policy_version 367120 (0.0012) [2024-06-15 16:03:09,094][1652491] Updated weights for policy 0, policy_version 367184 (0.0015) [2024-06-15 16:03:10,955][1648985] Fps is (10 sec: 39322.5, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 752091136. 
Throughput: 0: 11901.2. Samples: 188079104. Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:03:10,956][1648985] Avg episode reward: [(0, '144.080')] [2024-06-15 16:03:11,972][1652491] Updated weights for policy 0, policy_version 367290 (0.0013) [2024-06-15 16:03:15,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 752222208. Throughput: 0: 11628.1. Samples: 188145152. Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:03:15,956][1648985] Avg episode reward: [(0, '141.260')] [2024-06-15 16:03:16,911][1652491] Updated weights for policy 0, policy_version 367360 (0.0016) [2024-06-15 16:03:18,237][1652491] Updated weights for policy 0, policy_version 367423 (0.0013) [2024-06-15 16:03:20,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 46541.6). Total num frames: 752582656. Throughput: 0: 11639.4. Samples: 188213760. Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:03:20,956][1648985] Avg episode reward: [(0, '164.980')] [2024-06-15 16:03:20,989][1652491] Updated weights for policy 0, policy_version 367477 (0.0014) [2024-06-15 16:03:22,539][1652491] Updated weights for policy 0, policy_version 367520 (0.0010) [2024-06-15 16:03:25,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 752746496. Throughput: 0: 11616.7. Samples: 188248064. 
Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:03:25,956][1648985] Avg episode reward: [(0, '162.870')] [2024-06-15 16:03:26,848][1652491] Updated weights for policy 0, policy_version 367553 (0.0011) [2024-06-15 16:03:27,862][1652491] Updated weights for policy 0, policy_version 367608 (0.0015) [2024-06-15 16:03:29,023][1652491] Updated weights for policy 0, policy_version 367651 (0.0014) [2024-06-15 16:03:30,576][1652491] Updated weights for policy 0, policy_version 367701 (0.0012) [2024-06-15 16:03:30,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48605.9, 300 sec: 46764.8). Total num frames: 753074176. Throughput: 0: 11696.4. Samples: 188321792. Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:03:30,956][1648985] Avg episode reward: [(0, '161.500')] [2024-06-15 16:03:34,032][1652491] Updated weights for policy 0, policy_version 367792 (0.0014) [2024-06-15 16:03:35,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 46967.3, 300 sec: 46541.6). Total num frames: 753270784. Throughput: 0: 11673.6. Samples: 188391424. Policy #0 lag: (min: 49.0, avg: 133.4, max: 257.0) [2024-06-15 16:03:35,956][1648985] Avg episode reward: [(0, '154.020')] [2024-06-15 16:03:38,986][1652491] Updated weights for policy 0, policy_version 367856 (0.0012) [2024-06-15 16:03:40,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 753532928. Throughput: 0: 11764.7. Samples: 188434944. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:03:40,956][1648985] Avg episode reward: [(0, '153.320')] [2024-06-15 16:03:40,959][1652491] Updated weights for policy 0, policy_version 367936 (0.0015) [2024-06-15 16:03:44,836][1652491] Updated weights for policy 0, policy_version 368016 (0.0042) [2024-06-15 16:03:45,308][1651469] Signal inference workers to stop experience collection... 
(19200 times) [2024-06-15 16:03:45,402][1652491] InferenceWorker_p0-w0: stopping experience collection (19200 times) [2024-06-15 16:03:45,584][1651469] Signal inference workers to resume experience collection... (19200 times) [2024-06-15 16:03:45,585][1652491] InferenceWorker_p0-w0: resuming experience collection (19200 times) [2024-06-15 16:03:45,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 753795072. Throughput: 0: 11696.4. Samples: 188492288. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:03:45,956][1648985] Avg episode reward: [(0, '137.950')] [2024-06-15 16:03:50,946][1652491] Updated weights for policy 0, policy_version 368082 (0.0111) [2024-06-15 16:03:50,955][1648985] Fps is (10 sec: 29490.5, 60 sec: 44236.7, 300 sec: 45875.2). Total num frames: 753827840. Throughput: 0: 11548.4. Samples: 188565504. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:03:50,956][1648985] Avg episode reward: [(0, '144.420')] [2024-06-15 16:03:53,240][1652491] Updated weights for policy 0, policy_version 368176 (0.0071) [2024-06-15 16:03:54,860][1652491] Updated weights for policy 0, policy_version 368248 (0.0014) [2024-06-15 16:03:55,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 754188288. Throughput: 0: 11264.0. Samples: 188585984. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:03:55,956][1648985] Avg episode reward: [(0, '144.510')] [2024-06-15 16:03:57,955][1652491] Updated weights for policy 0, policy_version 368314 (0.0024) [2024-06-15 16:04:00,955][1648985] Fps is (10 sec: 49153.2, 60 sec: 43690.9, 300 sec: 45875.2). Total num frames: 754319360. Throughput: 0: 11355.0. Samples: 188656128. 
Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:04:00,956][1648985] Avg episode reward: [(0, '155.230')] [2024-06-15 16:04:03,310][1652491] Updated weights for policy 0, policy_version 368355 (0.0013) [2024-06-15 16:04:05,356][1652491] Updated weights for policy 0, policy_version 368442 (0.0113) [2024-06-15 16:04:05,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.1, 300 sec: 46097.3). Total num frames: 754581504. Throughput: 0: 11218.5. Samples: 188718592. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:04:05,956][1648985] Avg episode reward: [(0, '150.850')] [2024-06-15 16:04:06,663][1652491] Updated weights for policy 0, policy_version 368482 (0.0013) [2024-06-15 16:04:08,683][1652491] Updated weights for policy 0, policy_version 368529 (0.0012) [2024-06-15 16:04:09,500][1652491] Updated weights for policy 0, policy_version 368576 (0.0019) [2024-06-15 16:04:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 754843648. Throughput: 0: 11252.6. Samples: 188754432. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:04:10,956][1648985] Avg episode reward: [(0, '161.520')] [2024-06-15 16:04:15,508][1652491] Updated weights for policy 0, policy_version 368643 (0.0015) [2024-06-15 16:04:15,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 46421.3, 300 sec: 46208.5). Total num frames: 755007488. Throughput: 0: 11252.6. Samples: 188828160. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:04:15,955][1648985] Avg episode reward: [(0, '160.640')] [2024-06-15 16:04:17,585][1652491] Updated weights for policy 0, policy_version 368707 (0.0014) [2024-06-15 16:04:20,018][1652491] Updated weights for policy 0, policy_version 368769 (0.0014) [2024-06-15 16:04:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 755302400. Throughput: 0: 10934.1. Samples: 188883456. 
Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:04:20,956][1648985] Avg episode reward: [(0, '168.380')] [2024-06-15 16:04:25,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 755367936. Throughput: 0: 10786.1. Samples: 188920320. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:04:25,956][1648985] Avg episode reward: [(0, '167.280')] [2024-06-15 16:04:26,729][1652491] Updated weights for policy 0, policy_version 368850 (0.0013) [2024-06-15 16:04:28,443][1652491] Updated weights for policy 0, policy_version 368915 (0.0011) [2024-06-15 16:04:30,922][1651469] Signal inference workers to stop experience collection... (19250 times) [2024-06-15 16:04:30,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 755695616. Throughput: 0: 10934.0. Samples: 188984320. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:04:30,956][1648985] Avg episode reward: [(0, '157.610')] [2024-06-15 16:04:30,965][1652491] InferenceWorker_p0-w0: stopping experience collection (19250 times) [2024-06-15 16:04:30,971][1652491] Updated weights for policy 0, policy_version 368994 (0.0017) [2024-06-15 16:04:31,313][1651469] Signal inference workers to resume experience collection... (19250 times) [2024-06-15 16:04:31,314][1652491] InferenceWorker_p0-w0: resuming experience collection (19250 times) [2024-06-15 16:04:33,092][1652491] Updated weights for policy 0, policy_version 369088 (0.0022) [2024-06-15 16:04:35,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 755892224. Throughput: 0: 10752.0. Samples: 189049344. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:04:35,956][1648985] Avg episode reward: [(0, '152.120')] [2024-06-15 16:04:40,253][1652491] Updated weights for policy 0, policy_version 369156 (0.0012) [2024-06-15 16:04:40,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 42598.2, 300 sec: 46097.3). 
Total num frames: 756088832. Throughput: 0: 11172.9. Samples: 189088768. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:04:40,956][1648985] Avg episode reward: [(0, '147.760')] [2024-06-15 16:04:41,363][1652491] Updated weights for policy 0, policy_version 369208 (0.0015) [2024-06-15 16:04:43,385][1652491] Updated weights for policy 0, policy_version 369280 (0.0102) [2024-06-15 16:04:45,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 43690.7, 300 sec: 46319.5). Total num frames: 756416512. Throughput: 0: 10797.5. Samples: 189142016. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:04:45,956][1648985] Avg episode reward: [(0, '134.420')] [2024-06-15 16:04:50,602][1652491] Updated weights for policy 0, policy_version 369360 (0.0188) [2024-06-15 16:04:50,956][1648985] Fps is (10 sec: 36043.4, 60 sec: 43690.4, 300 sec: 45875.5). Total num frames: 756449280. Throughput: 0: 11138.7. Samples: 189219840. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:04:50,956][1648985] Avg episode reward: [(0, '141.350')] [2024-06-15 16:04:52,207][1652491] Updated weights for policy 0, policy_version 369424 (0.0086) [2024-06-15 16:04:53,039][1652491] Updated weights for policy 0, policy_version 369465 (0.0014) [2024-06-15 16:04:55,180][1652491] Updated weights for policy 0, policy_version 369525 (0.0088) [2024-06-15 16:04:55,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 44236.7, 300 sec: 46319.5). Total num frames: 756842496. Throughput: 0: 11036.4. Samples: 189251072. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:04:55,956][1648985] Avg episode reward: [(0, '159.640')] [2024-06-15 16:04:56,290][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000369584_756908032.pth... 
[2024-06-15 16:04:56,348][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000364160_745799680.pth [2024-06-15 16:04:56,497][1652491] Updated weights for policy 0, policy_version 369595 (0.0101) [2024-06-15 16:05:00,955][1648985] Fps is (10 sec: 49154.9, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 756940800. Throughput: 0: 10922.6. Samples: 189319680. Policy #0 lag: (min: 79.0, avg: 146.6, max: 287.0) [2024-06-15 16:05:00,956][1648985] Avg episode reward: [(0, '160.700')] [2024-06-15 16:05:02,613][1652491] Updated weights for policy 0, policy_version 369651 (0.0089) [2024-06-15 16:05:04,145][1652491] Updated weights for policy 0, policy_version 369716 (0.0015) [2024-06-15 16:05:05,146][1652491] Updated weights for policy 0, policy_version 369747 (0.0011) [2024-06-15 16:05:05,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 45329.2, 300 sec: 46097.4). Total num frames: 757301248. Throughput: 0: 11195.7. Samples: 189387264. Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:05:05,956][1648985] Avg episode reward: [(0, '156.320')] [2024-06-15 16:05:07,558][1652491] Updated weights for policy 0, policy_version 369846 (0.0014) [2024-06-15 16:05:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 757465088. Throughput: 0: 11127.5. Samples: 189421056. Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:05:10,956][1648985] Avg episode reward: [(0, '136.790')] [2024-06-15 16:05:13,576][1652491] Updated weights for policy 0, policy_version 369910 (0.0120) [2024-06-15 16:05:14,943][1651469] Signal inference workers to stop experience collection... (19300 times) [2024-06-15 16:05:14,988][1652491] InferenceWorker_p0-w0: stopping experience collection (19300 times) [2024-06-15 16:05:15,190][1651469] Signal inference workers to resume experience collection... 
(19300 times) [2024-06-15 16:05:15,190][1652491] InferenceWorker_p0-w0: resuming experience collection (19300 times) [2024-06-15 16:05:15,193][1652491] Updated weights for policy 0, policy_version 369952 (0.0022) [2024-06-15 16:05:15,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 757694464. Throughput: 0: 11400.5. Samples: 189497344. Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:05:15,956][1648985] Avg episode reward: [(0, '153.190')] [2024-06-15 16:05:16,436][1652491] Updated weights for policy 0, policy_version 370000 (0.0035) [2024-06-15 16:05:17,631][1652491] Updated weights for policy 0, policy_version 370053 (0.0029) [2024-06-15 16:05:18,663][1652491] Updated weights for policy 0, policy_version 370105 (0.0013) [2024-06-15 16:05:20,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 44782.8, 300 sec: 46208.4). Total num frames: 757989376. Throughput: 0: 11457.5. Samples: 189564928. Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:05:20,956][1648985] Avg episode reward: [(0, '170.820')] [2024-06-15 16:05:24,342][1652491] Updated weights for policy 0, policy_version 370151 (0.0014) [2024-06-15 16:05:25,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 758120448. Throughput: 0: 11446.1. Samples: 189603840. Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:05:25,955][1648985] Avg episode reward: [(0, '151.510')] [2024-06-15 16:05:26,711][1652491] Updated weights for policy 0, policy_version 370193 (0.0015) [2024-06-15 16:05:27,985][1652491] Updated weights for policy 0, policy_version 370242 (0.0106) [2024-06-15 16:05:29,492][1652491] Updated weights for policy 0, policy_version 370320 (0.0090) [2024-06-15 16:05:30,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46967.4, 300 sec: 46541.7). Total num frames: 758513664. Throughput: 0: 11616.7. Samples: 189664768. 
Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:05:30,956][1648985] Avg episode reward: [(0, '130.330')] [2024-06-15 16:05:35,324][1652491] Updated weights for policy 0, policy_version 370386 (0.0013) [2024-06-15 16:05:35,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 45329.4, 300 sec: 45986.3). Total num frames: 758611968. Throughput: 0: 11469.0. Samples: 189735936. Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:05:35,955][1648985] Avg episode reward: [(0, '129.850')] [2024-06-15 16:05:37,847][1652491] Updated weights for policy 0, policy_version 370433 (0.0107) [2024-06-15 16:05:39,296][1652491] Updated weights for policy 0, policy_version 370496 (0.0014) [2024-06-15 16:05:40,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 758906880. Throughput: 0: 11548.5. Samples: 189770752. Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:05:40,956][1648985] Avg episode reward: [(0, '121.200')] [2024-06-15 16:05:41,132][1652491] Updated weights for policy 0, policy_version 370576 (0.0021) [2024-06-15 16:05:45,955][1648985] Fps is (10 sec: 42597.1, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 759037952. Throughput: 0: 11525.7. Samples: 189838336. Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:05:45,957][1648985] Avg episode reward: [(0, '132.630')] [2024-06-15 16:05:47,190][1652491] Updated weights for policy 0, policy_version 370656 (0.0083) [2024-06-15 16:05:47,955][1652491] Updated weights for policy 0, policy_version 370688 (0.0019) [2024-06-15 16:05:50,833][1652491] Updated weights for policy 0, policy_version 370752 (0.0014) [2024-06-15 16:05:50,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 47514.1, 300 sec: 45875.2). Total num frames: 759300096. Throughput: 0: 11525.7. Samples: 189905920. 
Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:05:50,956][1648985] Avg episode reward: [(0, '146.870')] [2024-06-15 16:05:52,891][1652491] Updated weights for policy 0, policy_version 370832 (0.0017) [2024-06-15 16:05:53,346][1651469] Signal inference workers to stop experience collection... (19350 times) [2024-06-15 16:05:53,380][1652491] InferenceWorker_p0-w0: stopping experience collection (19350 times) [2024-06-15 16:05:53,515][1651469] Signal inference workers to resume experience collection... (19350 times) [2024-06-15 16:05:53,524][1652491] InferenceWorker_p0-w0: resuming experience collection (19350 times) [2024-06-15 16:05:55,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45329.3, 300 sec: 45986.3). Total num frames: 759562240. Throughput: 0: 11332.3. Samples: 189931008. Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:05:55,956][1648985] Avg episode reward: [(0, '157.070')] [2024-06-15 16:05:57,775][1652491] Updated weights for policy 0, policy_version 370882 (0.0067) [2024-06-15 16:05:58,710][1652491] Updated weights for policy 0, policy_version 370942 (0.0013) [2024-06-15 16:06:00,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 759693312. Throughput: 0: 11502.9. Samples: 190014976. Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:06:00,956][1648985] Avg episode reward: [(0, '143.600')] [2024-06-15 16:06:02,299][1652491] Updated weights for policy 0, policy_version 371008 (0.0013) [2024-06-15 16:06:04,073][1652491] Updated weights for policy 0, policy_version 371088 (0.0012) [2024-06-15 16:06:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 760086528. Throughput: 0: 11423.3. Samples: 190078976. 
Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:06:05,956][1648985] Avg episode reward: [(0, '141.080')] [2024-06-15 16:06:09,205][1652491] Updated weights for policy 0, policy_version 371168 (0.0120) [2024-06-15 16:06:10,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 760217600. Throughput: 0: 11480.2. Samples: 190120448. Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:06:10,956][1648985] Avg episode reward: [(0, '139.080')] [2024-06-15 16:06:12,542][1652491] Updated weights for policy 0, policy_version 371216 (0.0014) [2024-06-15 16:06:13,684][1652491] Updated weights for policy 0, policy_version 371265 (0.0041) [2024-06-15 16:06:14,932][1652491] Updated weights for policy 0, policy_version 371331 (0.0015) [2024-06-15 16:06:15,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 47513.7, 300 sec: 46208.4). Total num frames: 760545280. Throughput: 0: 11514.3. Samples: 190182912. Policy #0 lag: (min: 4.0, avg: 76.3, max: 260.0) [2024-06-15 16:06:15,955][1648985] Avg episode reward: [(0, '139.810')] [2024-06-15 16:06:16,335][1652491] Updated weights for policy 0, policy_version 371382 (0.0014) [2024-06-15 16:06:19,989][1652491] Updated weights for policy 0, policy_version 371408 (0.0016) [2024-06-15 16:06:20,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 45329.0, 300 sec: 45986.3). Total num frames: 760709120. Throughput: 0: 11684.9. Samples: 190261760. Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:06:20,957][1648985] Avg episode reward: [(0, '129.860')] [2024-06-15 16:06:22,706][1652491] Updated weights for policy 0, policy_version 371457 (0.0013) [2024-06-15 16:06:24,453][1652491] Updated weights for policy 0, policy_version 371536 (0.0013) [2024-06-15 16:06:25,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 48059.8, 300 sec: 46208.5). Total num frames: 761004032. Throughput: 0: 11696.4. Samples: 190297088. 
Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:06:25,955][1648985] Avg episode reward: [(0, '146.900')] [2024-06-15 16:06:26,319][1652491] Updated weights for policy 0, policy_version 371590 (0.0014) [2024-06-15 16:06:30,891][1652491] Updated weights for policy 0, policy_version 371651 (0.0015) [2024-06-15 16:06:30,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 761135104. Throughput: 0: 11730.5. Samples: 190366208. Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:06:30,955][1648985] Avg episode reward: [(0, '147.950')] [2024-06-15 16:06:32,310][1652491] Updated weights for policy 0, policy_version 371712 (0.0013) [2024-06-15 16:06:34,604][1652491] Updated weights for policy 0, policy_version 371773 (0.0015) [2024-06-15 16:06:35,912][1651469] Signal inference workers to stop experience collection... (19400 times) [2024-06-15 16:06:35,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 47513.6, 300 sec: 45986.3). Total num frames: 761462784. Throughput: 0: 11730.5. Samples: 190433792. Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:06:35,955][1648985] Avg episode reward: [(0, '156.000')] [2024-06-15 16:06:35,996][1652491] InferenceWorker_p0-w0: stopping experience collection (19400 times) [2024-06-15 16:06:36,186][1651469] Signal inference workers to resume experience collection... (19400 times) [2024-06-15 16:06:36,187][1652491] InferenceWorker_p0-w0: resuming experience collection (19400 times) [2024-06-15 16:06:36,362][1652491] Updated weights for policy 0, policy_version 371829 (0.0021) [2024-06-15 16:06:38,138][1652491] Updated weights for policy 0, policy_version 371879 (0.0025) [2024-06-15 16:06:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.3, 300 sec: 45875.2). Total num frames: 761659392. Throughput: 0: 11901.1. Samples: 190466560. 
Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:06:40,956][1648985] Avg episode reward: [(0, '147.200')] [2024-06-15 16:06:42,431][1652491] Updated weights for policy 0, policy_version 371921 (0.0013) [2024-06-15 16:06:43,397][1652491] Updated weights for policy 0, policy_version 371963 (0.0013) [2024-06-15 16:06:45,926][1652491] Updated weights for policy 0, policy_version 372016 (0.0021) [2024-06-15 16:06:45,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 47513.7, 300 sec: 45653.0). Total num frames: 761888768. Throughput: 0: 11810.2. Samples: 190546432. Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:06:45,956][1648985] Avg episode reward: [(0, '144.310')] [2024-06-15 16:06:46,877][1652491] Updated weights for policy 0, policy_version 372050 (0.0014) [2024-06-15 16:06:48,686][1652491] Updated weights for policy 0, policy_version 372113 (0.0145) [2024-06-15 16:06:50,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 762183680. Throughput: 0: 11639.4. Samples: 190602752. Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:06:50,956][1648985] Avg episode reward: [(0, '141.570')] [2024-06-15 16:06:53,985][1652491] Updated weights for policy 0, policy_version 372164 (0.0011) [2024-06-15 16:06:55,310][1652491] Updated weights for policy 0, policy_version 372219 (0.0169) [2024-06-15 16:06:55,956][1648985] Fps is (10 sec: 42594.4, 60 sec: 45874.5, 300 sec: 45764.0). Total num frames: 762314752. Throughput: 0: 11639.2. Samples: 190644224. Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:06:55,957][1648985] Avg episode reward: [(0, '155.920')] [2024-06-15 16:06:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000372224_762314752.pth... 
[2024-06-15 16:06:56,003][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000366912_751435776.pth [2024-06-15 16:06:58,120][1652491] Updated weights for policy 0, policy_version 372288 (0.0012) [2024-06-15 16:06:59,215][1652491] Updated weights for policy 0, policy_version 372337 (0.0015) [2024-06-15 16:07:00,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 49152.0, 300 sec: 46097.3). Total num frames: 762642432. Throughput: 0: 11741.8. Samples: 190711296. Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:07:00,956][1648985] Avg episode reward: [(0, '164.610')] [2024-06-15 16:07:00,962][1652491] Updated weights for policy 0, policy_version 372387 (0.0011) [2024-06-15 16:07:05,728][1652491] Updated weights for policy 0, policy_version 372435 (0.0013) [2024-06-15 16:07:05,955][1648985] Fps is (10 sec: 45879.0, 60 sec: 44782.9, 300 sec: 45542.0). Total num frames: 762773504. Throughput: 0: 11616.7. Samples: 190784512. Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:07:05,956][1648985] Avg episode reward: [(0, '150.310')] [2024-06-15 16:07:08,319][1652491] Updated weights for policy 0, policy_version 372497 (0.0015) [2024-06-15 16:07:09,392][1652491] Updated weights for policy 0, policy_version 372560 (0.0011) [2024-06-15 16:07:10,672][1652491] Updated weights for policy 0, policy_version 372612 (0.0019) [2024-06-15 16:07:10,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48605.9, 300 sec: 46319.5). Total num frames: 763133952. Throughput: 0: 11594.0. Samples: 190818816. Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:07:10,955][1648985] Avg episode reward: [(0, '131.970')] [2024-06-15 16:07:11,961][1652491] Updated weights for policy 0, policy_version 372667 (0.0020) [2024-06-15 16:07:15,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 44782.9, 300 sec: 45875.2). Total num frames: 763232256. Throughput: 0: 11741.9. Samples: 190894592. 
Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:07:15,956][1648985] Avg episode reward: [(0, '133.520')] [2024-06-15 16:07:17,362][1652491] Updated weights for policy 0, policy_version 372720 (0.0015) [2024-06-15 16:07:19,953][1652491] Updated weights for policy 0, policy_version 372771 (0.0013) [2024-06-15 16:07:20,518][1651469] Signal inference workers to stop experience collection... (19450 times) [2024-06-15 16:07:20,570][1652491] InferenceWorker_p0-w0: stopping experience collection (19450 times) [2024-06-15 16:07:20,700][1651469] Signal inference workers to resume experience collection... (19450 times) [2024-06-15 16:07:20,718][1652491] InferenceWorker_p0-w0: resuming experience collection (19450 times) [2024-06-15 16:07:20,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 46967.6, 300 sec: 45875.2). Total num frames: 763527168. Throughput: 0: 11707.7. Samples: 190960640. Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:07:20,956][1648985] Avg episode reward: [(0, '135.040')] [2024-06-15 16:07:21,110][1652491] Updated weights for policy 0, policy_version 372833 (0.0068) [2024-06-15 16:07:22,004][1652491] Updated weights for policy 0, policy_version 372866 (0.0014) [2024-06-15 16:07:25,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.1, 300 sec: 46097.4). Total num frames: 763756544. Throughput: 0: 11628.1. Samples: 190989824. Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:07:25,956][1648985] Avg episode reward: [(0, '122.240')] [2024-06-15 16:07:27,613][1652491] Updated weights for policy 0, policy_version 372930 (0.0013) [2024-06-15 16:07:28,980][1652491] Updated weights for policy 0, policy_version 372985 (0.0013) [2024-06-15 16:07:30,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46421.3, 300 sec: 45653.0). Total num frames: 763920384. Throughput: 0: 11446.0. Samples: 191061504. 
Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:07:30,956][1648985] Avg episode reward: [(0, '130.640')] [2024-06-15 16:07:31,399][1652491] Updated weights for policy 0, policy_version 373026 (0.0012) [2024-06-15 16:07:32,735][1652491] Updated weights for policy 0, policy_version 373093 (0.0126) [2024-06-15 16:07:34,435][1652491] Updated weights for policy 0, policy_version 373152 (0.0014) [2024-06-15 16:07:35,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 764280832. Throughput: 0: 11616.8. Samples: 191125504. Policy #0 lag: (min: 7.0, avg: 102.2, max: 263.0) [2024-06-15 16:07:35,955][1648985] Avg episode reward: [(0, '136.870')] [2024-06-15 16:07:39,303][1652491] Updated weights for policy 0, policy_version 373201 (0.0056) [2024-06-15 16:07:40,958][1648985] Fps is (10 sec: 49136.3, 60 sec: 45872.8, 300 sec: 45763.6). Total num frames: 764411904. Throughput: 0: 11559.2. Samples: 191164416. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:07:40,959][1648985] Avg episode reward: [(0, '138.190')] [2024-06-15 16:07:42,277][1652491] Updated weights for policy 0, policy_version 373252 (0.0012) [2024-06-15 16:07:43,721][1652491] Updated weights for policy 0, policy_version 373313 (0.0039) [2024-06-15 16:07:45,281][1652491] Updated weights for policy 0, policy_version 373380 (0.0079) [2024-06-15 16:07:45,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 764706816. Throughput: 0: 11514.3. Samples: 191229440. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:07:45,955][1648985] Avg episode reward: [(0, '133.340')] [2024-06-15 16:07:46,767][1652491] Updated weights for policy 0, policy_version 373436 (0.0013) [2024-06-15 16:07:50,955][1648985] Fps is (10 sec: 39334.1, 60 sec: 43690.8, 300 sec: 45319.8). Total num frames: 764805120. Throughput: 0: 11446.1. Samples: 191299584. 
Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:07:50,956][1648985] Avg episode reward: [(0, '120.720')] [2024-06-15 16:07:51,804][1652491] Updated weights for policy 0, policy_version 373474 (0.0014) [2024-06-15 16:07:53,775][1652491] Updated weights for policy 0, policy_version 373520 (0.0012) [2024-06-15 16:07:55,025][1652491] Updated weights for policy 0, policy_version 373568 (0.0016) [2024-06-15 16:07:55,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46968.1, 300 sec: 45542.0). Total num frames: 765132800. Throughput: 0: 11434.6. Samples: 191333376. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:07:55,956][1648985] Avg episode reward: [(0, '126.410')] [2024-06-15 16:07:56,160][1652491] Updated weights for policy 0, policy_version 373618 (0.0010) [2024-06-15 16:07:56,849][1652491] Updated weights for policy 0, policy_version 373642 (0.0012) [2024-06-15 16:07:57,976][1652491] Updated weights for policy 0, policy_version 373689 (0.0097) [2024-06-15 16:08:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 44782.9, 300 sec: 45764.1). Total num frames: 765329408. Throughput: 0: 11207.1. Samples: 191398912. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:08:00,956][1648985] Avg episode reward: [(0, '128.710')] [2024-06-15 16:08:03,648][1652491] Updated weights for policy 0, policy_version 373752 (0.0018) [2024-06-15 16:08:05,563][1651469] Signal inference workers to stop experience collection... (19500 times) [2024-06-15 16:08:05,642][1652491] InferenceWorker_p0-w0: stopping experience collection (19500 times) [2024-06-15 16:08:05,791][1651469] Signal inference workers to resume experience collection... (19500 times) [2024-06-15 16:08:05,792][1652491] InferenceWorker_p0-w0: resuming experience collection (19500 times) [2024-06-15 16:08:05,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 45329.1, 300 sec: 45430.9). Total num frames: 765493248. Throughput: 0: 11389.2. Samples: 191473152. 
Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:08:05,956][1648985] Avg episode reward: [(0, '128.780')] [2024-06-15 16:08:06,774][1652491] Updated weights for policy 0, policy_version 373815 (0.0013) [2024-06-15 16:08:08,424][1652491] Updated weights for policy 0, policy_version 373885 (0.0012) [2024-06-15 16:08:10,170][1652491] Updated weights for policy 0, policy_version 373951 (0.0012) [2024-06-15 16:08:10,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 765853696. Throughput: 0: 11343.7. Samples: 191500288. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:08:10,956][1648985] Avg episode reward: [(0, '139.260')] [2024-06-15 16:08:15,124][1652491] Updated weights for policy 0, policy_version 374005 (0.0018) [2024-06-15 16:08:15,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 45875.0, 300 sec: 45430.9). Total num frames: 765984768. Throughput: 0: 11218.5. Samples: 191566336. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:08:15,956][1648985] Avg episode reward: [(0, '152.010')] [2024-06-15 16:08:18,548][1652491] Updated weights for policy 0, policy_version 374084 (0.0030) [2024-06-15 16:08:19,905][1652491] Updated weights for policy 0, policy_version 374142 (0.0016) [2024-06-15 16:08:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 766279680. Throughput: 0: 11229.9. Samples: 191630848. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:08:20,956][1648985] Avg episode reward: [(0, '145.020')] [2024-06-15 16:08:21,324][1652491] Updated weights for policy 0, policy_version 374192 (0.0015) [2024-06-15 16:08:25,956][1648985] Fps is (10 sec: 39317.5, 60 sec: 43689.8, 300 sec: 45097.5). Total num frames: 766377984. Throughput: 0: 11173.5. Samples: 191667200. 
Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:08:25,957][1648985] Avg episode reward: [(0, '151.960')] [2024-06-15 16:08:26,457][1652491] Updated weights for policy 0, policy_version 374240 (0.0011) [2024-06-15 16:08:29,181][1652491] Updated weights for policy 0, policy_version 374304 (0.0011) [2024-06-15 16:08:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.4, 300 sec: 45542.0). Total num frames: 766705664. Throughput: 0: 11343.6. Samples: 191739904. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:08:30,956][1648985] Avg episode reward: [(0, '162.380')] [2024-06-15 16:08:31,663][1652491] Updated weights for policy 0, policy_version 374388 (0.0012) [2024-06-15 16:08:33,307][1652491] Updated weights for policy 0, policy_version 374461 (0.0027) [2024-06-15 16:08:35,955][1648985] Fps is (10 sec: 52434.7, 60 sec: 43690.6, 300 sec: 45319.8). Total num frames: 766902272. Throughput: 0: 11013.7. Samples: 191795200. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:08:35,956][1648985] Avg episode reward: [(0, '155.910')] [2024-06-15 16:08:39,167][1652491] Updated weights for policy 0, policy_version 374512 (0.0013) [2024-06-15 16:08:40,489][1652491] Updated weights for policy 0, policy_version 374560 (0.0010) [2024-06-15 16:08:40,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 44785.3, 300 sec: 45097.7). Total num frames: 767098880. Throughput: 0: 11275.4. Samples: 191840768. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:08:40,956][1648985] Avg episode reward: [(0, '135.370')] [2024-06-15 16:08:42,060][1652491] Updated weights for policy 0, policy_version 374611 (0.0013) [2024-06-15 16:08:43,326][1652491] Updated weights for policy 0, policy_version 374659 (0.0014) [2024-06-15 16:08:43,727][1651469] Signal inference workers to stop experience collection... 
(19550 times) [2024-06-15 16:08:43,784][1652491] InferenceWorker_p0-w0: stopping experience collection (19550 times) [2024-06-15 16:08:44,035][1651469] Signal inference workers to resume experience collection... (19550 times) [2024-06-15 16:08:44,036][1652491] InferenceWorker_p0-w0: resuming experience collection (19550 times) [2024-06-15 16:08:44,721][1652491] Updated weights for policy 0, policy_version 374717 (0.0014) [2024-06-15 16:08:45,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45328.9, 300 sec: 46097.4). Total num frames: 767426560. Throughput: 0: 11195.7. Samples: 191902720. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:08:45,957][1648985] Avg episode reward: [(0, '143.020')] [2024-06-15 16:08:50,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45875.2, 300 sec: 45319.8). Total num frames: 767557632. Throughput: 0: 11275.4. Samples: 191980544. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:08:50,955][1648985] Avg episode reward: [(0, '155.080')] [2024-06-15 16:08:51,188][1652491] Updated weights for policy 0, policy_version 374800 (0.0129) [2024-06-15 16:08:52,948][1652491] Updated weights for policy 0, policy_version 374865 (0.0053) [2024-06-15 16:08:53,943][1652491] Updated weights for policy 0, policy_version 374908 (0.0014) [2024-06-15 16:08:55,950][1652491] Updated weights for policy 0, policy_version 374971 (0.0016) [2024-06-15 16:08:55,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 46421.1, 300 sec: 46097.3). Total num frames: 767918080. Throughput: 0: 11286.7. Samples: 192008192. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:08:55,956][1648985] Avg episode reward: [(0, '163.110')] [2024-06-15 16:08:56,032][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000374976_767950848.pth... 
[2024-06-15 16:08:56,119][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000369584_756908032.pth [2024-06-15 16:09:00,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 767950848. Throughput: 0: 11548.5. Samples: 192086016. Policy #0 lag: (min: 1.0, avg: 98.9, max: 257.0) [2024-06-15 16:09:00,956][1648985] Avg episode reward: [(0, '177.980')] [2024-06-15 16:09:02,296][1652491] Updated weights for policy 0, policy_version 375041 (0.0029) [2024-06-15 16:09:03,753][1652491] Updated weights for policy 0, policy_version 375104 (0.0012) [2024-06-15 16:09:05,126][1652491] Updated weights for policy 0, policy_version 375167 (0.0012) [2024-06-15 16:09:05,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 47513.5, 300 sec: 45764.1). Total num frames: 768344064. Throughput: 0: 11559.8. Samples: 192151040. Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:09:05,956][1648985] Avg episode reward: [(0, '179.480')] [2024-06-15 16:09:07,035][1652491] Updated weights for policy 0, policy_version 375229 (0.0019) [2024-06-15 16:09:10,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 45653.0). Total num frames: 768475136. Throughput: 0: 11617.0. Samples: 192189952. Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:09:10,956][1648985] Avg episode reward: [(0, '151.860')] [2024-06-15 16:09:12,843][1652491] Updated weights for policy 0, policy_version 375271 (0.0013) [2024-06-15 16:09:14,353][1652491] Updated weights for policy 0, policy_version 375344 (0.0013) [2024-06-15 16:09:15,708][1652491] Updated weights for policy 0, policy_version 375394 (0.0012) [2024-06-15 16:09:15,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 47513.7, 300 sec: 45875.2). Total num frames: 768835584. Throughput: 0: 11730.5. Samples: 192267776. 
Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:09:15,956][1648985] Avg episode reward: [(0, '157.920')] [2024-06-15 16:09:16,748][1652491] Updated weights for policy 0, policy_version 375442 (0.0128) [2024-06-15 16:09:20,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45329.1, 300 sec: 46208.5). Total num frames: 768999424. Throughput: 0: 12094.6. Samples: 192339456. Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:09:20,955][1648985] Avg episode reward: [(0, '163.800')] [2024-06-15 16:09:23,693][1652491] Updated weights for policy 0, policy_version 375507 (0.0016) [2024-06-15 16:09:25,376][1651469] Signal inference workers to stop experience collection... (19600 times) [2024-06-15 16:09:25,431][1652491] InferenceWorker_p0-w0: stopping experience collection (19600 times) [2024-06-15 16:09:25,566][1651469] Signal inference workers to resume experience collection... (19600 times) [2024-06-15 16:09:25,567][1652491] InferenceWorker_p0-w0: resuming experience collection (19600 times) [2024-06-15 16:09:25,748][1652491] Updated weights for policy 0, policy_version 375586 (0.0014) [2024-06-15 16:09:25,955][1648985] Fps is (10 sec: 39320.2, 60 sec: 47514.3, 300 sec: 45875.2). Total num frames: 769228800. Throughput: 0: 11923.8. Samples: 192377344. Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:09:25,956][1648985] Avg episode reward: [(0, '156.150')] [2024-06-15 16:09:27,163][1652491] Updated weights for policy 0, policy_version 375649 (0.0013) [2024-06-15 16:09:28,689][1652491] Updated weights for policy 0, policy_version 375714 (0.0260) [2024-06-15 16:09:30,955][1648985] Fps is (10 sec: 52426.6, 60 sec: 46967.2, 300 sec: 46208.4). Total num frames: 769523712. Throughput: 0: 11764.6. Samples: 192432128. 
Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:09:30,956][1648985] Avg episode reward: [(0, '142.890')] [2024-06-15 16:09:34,552][1652491] Updated weights for policy 0, policy_version 375747 (0.0011) [2024-06-15 16:09:35,955][1648985] Fps is (10 sec: 39322.7, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 769622016. Throughput: 0: 11753.2. Samples: 192509440. Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:09:35,956][1648985] Avg episode reward: [(0, '142.150')] [2024-06-15 16:09:36,500][1652491] Updated weights for policy 0, policy_version 375824 (0.0014) [2024-06-15 16:09:38,132][1652491] Updated weights for policy 0, policy_version 375891 (0.0014) [2024-06-15 16:09:40,190][1652491] Updated weights for policy 0, policy_version 375952 (0.0013) [2024-06-15 16:09:40,955][1648985] Fps is (10 sec: 49153.6, 60 sec: 48605.9, 300 sec: 46097.4). Total num frames: 770015232. Throughput: 0: 11594.1. Samples: 192529920. Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:09:40,956][1648985] Avg episode reward: [(0, '154.070')] [2024-06-15 16:09:45,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 43690.7, 300 sec: 46097.4). Total num frames: 770048000. Throughput: 0: 11537.0. Samples: 192605184. Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:09:45,956][1648985] Avg episode reward: [(0, '146.350')] [2024-06-15 16:09:46,920][1652491] Updated weights for policy 0, policy_version 376039 (0.0014) [2024-06-15 16:09:48,295][1652491] Updated weights for policy 0, policy_version 376081 (0.0011) [2024-06-15 16:09:49,534][1652491] Updated weights for policy 0, policy_version 376129 (0.0013) [2024-06-15 16:09:50,667][1652491] Updated weights for policy 0, policy_version 376189 (0.0121) [2024-06-15 16:09:50,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48059.7, 300 sec: 46097.4). Total num frames: 770441216. Throughput: 0: 11525.7. Samples: 192669696. 
Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:09:50,956][1648985] Avg episode reward: [(0, '126.900')] [2024-06-15 16:09:52,253][1652491] Updated weights for policy 0, policy_version 376242 (0.0012) [2024-06-15 16:09:55,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 44237.1, 300 sec: 46208.4). Total num frames: 770572288. Throughput: 0: 11502.9. Samples: 192707584. Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:09:55,956][1648985] Avg episode reward: [(0, '129.590')] [2024-06-15 16:09:57,373][1652491] Updated weights for policy 0, policy_version 376289 (0.0013) [2024-06-15 16:09:59,383][1652491] Updated weights for policy 0, policy_version 376337 (0.0054) [2024-06-15 16:10:00,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48605.9, 300 sec: 45986.3). Total num frames: 770867200. Throughput: 0: 11457.4. Samples: 192783360. Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:10:00,956][1648985] Avg episode reward: [(0, '146.000')] [2024-06-15 16:10:01,026][1652491] Updated weights for policy 0, policy_version 376404 (0.0092) [2024-06-15 16:10:02,735][1651469] Signal inference workers to stop experience collection... (19650 times) [2024-06-15 16:10:02,786][1652491] InferenceWorker_p0-w0: stopping experience collection (19650 times) [2024-06-15 16:10:02,819][1652491] Updated weights for policy 0, policy_version 376484 (0.0013) [2024-06-15 16:10:03,036][1651469] Signal inference workers to resume experience collection... (19650 times) [2024-06-15 16:10:03,037][1652491] InferenceWorker_p0-w0: resuming experience collection (19650 times) [2024-06-15 16:10:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.4, 300 sec: 46208.4). Total num frames: 771096576. Throughput: 0: 11320.9. Samples: 192848896. 
Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:10:05,955][1648985] Avg episode reward: [(0, '168.970')] [2024-06-15 16:10:08,463][1652491] Updated weights for policy 0, policy_version 376530 (0.0013) [2024-06-15 16:10:10,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 46421.1, 300 sec: 45986.2). Total num frames: 771260416. Throughput: 0: 11309.5. Samples: 192886272. Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:10:10,956][1648985] Avg episode reward: [(0, '157.860')] [2024-06-15 16:10:11,124][1652491] Updated weights for policy 0, policy_version 376608 (0.0015) [2024-06-15 16:10:12,506][1652491] Updated weights for policy 0, policy_version 376658 (0.0019) [2024-06-15 16:10:13,430][1652491] Updated weights for policy 0, policy_version 376704 (0.0013) [2024-06-15 16:10:14,789][1652491] Updated weights for policy 0, policy_version 376763 (0.0029) [2024-06-15 16:10:15,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46421.4, 300 sec: 46208.5). Total num frames: 771620864. Throughput: 0: 11434.8. Samples: 192946688. Policy #0 lag: (min: 12.0, avg: 71.0, max: 268.0) [2024-06-15 16:10:15,955][1648985] Avg episode reward: [(0, '131.890')] [2024-06-15 16:10:20,060][1652491] Updated weights for policy 0, policy_version 376822 (0.0011) [2024-06-15 16:10:20,955][1648985] Fps is (10 sec: 49153.7, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 771751936. Throughput: 0: 11446.1. Samples: 193024512. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:10:20,955][1648985] Avg episode reward: [(0, '124.820')] [2024-06-15 16:10:22,788][1652491] Updated weights for policy 0, policy_version 376880 (0.0015) [2024-06-15 16:10:24,277][1652491] Updated weights for policy 0, policy_version 376928 (0.0015) [2024-06-15 16:10:25,317][1652491] Updated weights for policy 0, policy_version 376976 (0.0013) [2024-06-15 16:10:25,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 47513.8, 300 sec: 45986.3). Total num frames: 772079616. 
Throughput: 0: 11650.8. Samples: 193054208. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:10:25,956][1648985] Avg episode reward: [(0, '128.470')] [2024-06-15 16:10:30,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 44237.1, 300 sec: 45986.3). Total num frames: 772177920. Throughput: 0: 11685.0. Samples: 193131008. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:10:30,956][1648985] Avg episode reward: [(0, '140.240')] [2024-06-15 16:10:31,011][1652491] Updated weights for policy 0, policy_version 377045 (0.0165) [2024-06-15 16:10:33,097][1652491] Updated weights for policy 0, policy_version 377091 (0.0011) [2024-06-15 16:10:34,936][1652491] Updated weights for policy 0, policy_version 377168 (0.0012) [2024-06-15 16:10:35,974][1648985] Fps is (10 sec: 45788.3, 60 sec: 48590.5, 300 sec: 46205.5). Total num frames: 772538368. Throughput: 0: 11566.3. Samples: 193190400. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:10:35,975][1648985] Avg episode reward: [(0, '142.930')] [2024-06-15 16:10:36,628][1652491] Updated weights for policy 0, policy_version 377248 (0.0014) [2024-06-15 16:10:40,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 772669440. Throughput: 0: 11457.4. Samples: 193223168. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:10:40,956][1648985] Avg episode reward: [(0, '147.660')] [2024-06-15 16:10:43,303][1652491] Updated weights for policy 0, policy_version 377314 (0.0012) [2024-06-15 16:10:44,999][1652491] Updated weights for policy 0, policy_version 377360 (0.0013) [2024-06-15 16:10:45,955][1648985] Fps is (10 sec: 36112.7, 60 sec: 47513.6, 300 sec: 46097.3). Total num frames: 772898816. Throughput: 0: 11480.1. Samples: 193299968. 
Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:10:45,956][1648985] Avg episode reward: [(0, '151.440')] [2024-06-15 16:10:46,259][1652491] Updated weights for policy 0, policy_version 377408 (0.0074) [2024-06-15 16:10:47,225][1652491] Updated weights for policy 0, policy_version 377462 (0.0013) [2024-06-15 16:10:47,924][1651469] Signal inference workers to stop experience collection... (19700 times) [2024-06-15 16:10:47,946][1652491] InferenceWorker_p0-w0: stopping experience collection (19700 times) [2024-06-15 16:10:48,106][1651469] Signal inference workers to resume experience collection... (19700 times) [2024-06-15 16:10:48,107][1652491] InferenceWorker_p0-w0: resuming experience collection (19700 times) [2024-06-15 16:10:48,483][1652491] Updated weights for policy 0, policy_version 377520 (0.0013) [2024-06-15 16:10:50,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 773193728. Throughput: 0: 11514.3. Samples: 193367040. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:10:50,956][1648985] Avg episode reward: [(0, '156.420')] [2024-06-15 16:10:54,106][1652491] Updated weights for policy 0, policy_version 377584 (0.0014) [2024-06-15 16:10:55,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 773324800. Throughput: 0: 11525.7. Samples: 193404928. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:10:55,956][1648985] Avg episode reward: [(0, '138.550')] [2024-06-15 16:10:56,286][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000377632_773390336.pth... 
[2024-06-15 16:10:56,286][1652491] Updated weights for policy 0, policy_version 377632 (0.0013) [2024-06-15 16:10:56,419][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000372224_762314752.pth [2024-06-15 16:10:56,423][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000377632_773390336.pth [2024-06-15 16:10:58,375][1652491] Updated weights for policy 0, policy_version 377703 (0.0013) [2024-06-15 16:10:59,427][1652491] Updated weights for policy 0, policy_version 377745 (0.0013) [2024-06-15 16:11:00,290][1652491] Updated weights for policy 0, policy_version 377790 (0.0011) [2024-06-15 16:11:00,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 47513.7, 300 sec: 46208.4). Total num frames: 773718016. Throughput: 0: 11571.2. Samples: 193467392. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:11:00,955][1648985] Avg episode reward: [(0, '130.760')] [2024-06-15 16:11:05,480][1652491] Updated weights for policy 0, policy_version 377849 (0.0016) [2024-06-15 16:11:05,955][1648985] Fps is (10 sec: 52430.6, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 773849088. Throughput: 0: 11559.8. Samples: 193544704. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:11:05,956][1648985] Avg episode reward: [(0, '135.420')] [2024-06-15 16:11:07,426][1652491] Updated weights for policy 0, policy_version 377889 (0.0014) [2024-06-15 16:11:09,243][1652491] Updated weights for policy 0, policy_version 377955 (0.0013) [2024-06-15 16:11:10,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48060.0, 300 sec: 46097.3). Total num frames: 774144000. Throughput: 0: 11616.7. Samples: 193576960. 
Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:11:10,955][1648985] Avg episode reward: [(0, '145.180')] [2024-06-15 16:11:11,788][1652491] Updated weights for policy 0, policy_version 378041 (0.0017) [2024-06-15 16:11:15,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 44782.9, 300 sec: 46097.4). Total num frames: 774307840. Throughput: 0: 11559.8. Samples: 193651200. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:11:15,956][1648985] Avg episode reward: [(0, '160.590')] [2024-06-15 16:11:16,499][1652491] Updated weights for policy 0, policy_version 378107 (0.0014) [2024-06-15 16:11:18,697][1652491] Updated weights for policy 0, policy_version 378175 (0.0015) [2024-06-15 16:11:20,693][1652491] Updated weights for policy 0, policy_version 378237 (0.0015) [2024-06-15 16:11:20,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 774635520. Throughput: 0: 11735.4. Samples: 193718272. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:11:20,956][1648985] Avg episode reward: [(0, '168.740')] [2024-06-15 16:11:22,194][1652491] Updated weights for policy 0, policy_version 378272 (0.0014) [2024-06-15 16:11:25,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 774766592. Throughput: 0: 11867.0. Samples: 193757184. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:11:25,955][1648985] Avg episode reward: [(0, '179.900')] [2024-06-15 16:11:27,194][1652491] Updated weights for policy 0, policy_version 378336 (0.0012) [2024-06-15 16:11:28,561][1652491] Updated weights for policy 0, policy_version 378384 (0.0011) [2024-06-15 16:11:30,568][1652491] Updated weights for policy 0, policy_version 378438 (0.0013) [2024-06-15 16:11:30,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 48605.9, 300 sec: 46208.4). Total num frames: 775094272. Throughput: 0: 11753.3. Samples: 193828864. 
Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:11:30,955][1648985] Avg episode reward: [(0, '186.770')] [2024-06-15 16:11:32,348][1652491] Updated weights for policy 0, policy_version 378512 (0.0013) [2024-06-15 16:11:33,260][1652491] Updated weights for policy 0, policy_version 378560 (0.0011) [2024-06-15 16:11:35,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45889.7, 300 sec: 46208.4). Total num frames: 775290880. Throughput: 0: 12128.7. Samples: 193912832. Policy #0 lag: (min: 43.0, avg: 127.3, max: 299.0) [2024-06-15 16:11:35,956][1648985] Avg episode reward: [(0, '170.650')] [2024-06-15 16:11:36,114][1651469] Signal inference workers to stop experience collection... (19750 times) [2024-06-15 16:11:36,161][1652491] InferenceWorker_p0-w0: stopping experience collection (19750 times) [2024-06-15 16:11:36,435][1651469] Signal inference workers to resume experience collection... (19750 times) [2024-06-15 16:11:36,438][1652491] InferenceWorker_p0-w0: resuming experience collection (19750 times) [2024-06-15 16:11:38,446][1652491] Updated weights for policy 0, policy_version 378626 (0.0013) [2024-06-15 16:11:39,559][1652491] Updated weights for policy 0, policy_version 378678 (0.0013) [2024-06-15 16:11:40,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48059.8, 300 sec: 46319.5). Total num frames: 775553024. Throughput: 0: 12049.1. Samples: 193947136. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:11:40,956][1648985] Avg episode reward: [(0, '165.740')] [2024-06-15 16:11:42,273][1652491] Updated weights for policy 0, policy_version 378747 (0.0024) [2024-06-15 16:11:43,915][1652491] Updated weights for policy 0, policy_version 378802 (0.0014) [2024-06-15 16:11:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48606.0, 300 sec: 46208.5). Total num frames: 775815168. Throughput: 0: 12174.2. Samples: 194015232. 
Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:11:45,956][1648985] Avg episode reward: [(0, '163.770')] [2024-06-15 16:11:48,063][1652491] Updated weights for policy 0, policy_version 378848 (0.0016) [2024-06-15 16:11:49,768][1652491] Updated weights for policy 0, policy_version 378900 (0.0014) [2024-06-15 16:11:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 46652.9). Total num frames: 776077312. Throughput: 0: 11992.2. Samples: 194084352. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:11:50,956][1648985] Avg episode reward: [(0, '177.860')] [2024-06-15 16:11:52,567][1652491] Updated weights for policy 0, policy_version 378960 (0.0012) [2024-06-15 16:11:54,592][1652491] Updated weights for policy 0, policy_version 379027 (0.0012) [2024-06-15 16:11:55,382][1652491] Updated weights for policy 0, policy_version 379069 (0.0012) [2024-06-15 16:11:55,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 50244.5, 300 sec: 46430.6). Total num frames: 776339456. Throughput: 0: 12014.9. Samples: 194117632. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:11:55,956][1648985] Avg episode reward: [(0, '176.100')] [2024-06-15 16:12:00,279][1652491] Updated weights for policy 0, policy_version 379134 (0.0011) [2024-06-15 16:12:00,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 776503296. Throughput: 0: 11969.4. Samples: 194189824. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:12:00,956][1648985] Avg episode reward: [(0, '164.010')] [2024-06-15 16:12:03,938][1652491] Updated weights for policy 0, policy_version 379216 (0.0015) [2024-06-15 16:12:05,048][1652491] Updated weights for policy 0, policy_version 379264 (0.0018) [2024-06-15 16:12:05,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 48059.8, 300 sec: 46097.4). Total num frames: 776732672. Throughput: 0: 12015.0. Samples: 194258944. 
Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:12:05,955][1648985] Avg episode reward: [(0, '161.300')] [2024-06-15 16:12:07,216][1652491] Updated weights for policy 0, policy_version 379325 (0.0013) [2024-06-15 16:12:10,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 776896512. Throughput: 0: 11832.9. Samples: 194289664. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:12:10,956][1648985] Avg episode reward: [(0, '131.030')] [2024-06-15 16:12:11,520][1652491] Updated weights for policy 0, policy_version 379378 (0.0012) [2024-06-15 16:12:12,863][1652491] Updated weights for policy 0, policy_version 379424 (0.0012) [2024-06-15 16:12:15,955][1648985] Fps is (10 sec: 42597.0, 60 sec: 47513.4, 300 sec: 46208.4). Total num frames: 777158656. Throughput: 0: 11753.2. Samples: 194357760. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:12:15,957][1648985] Avg episode reward: [(0, '181.260')] [2024-06-15 16:12:16,543][1652491] Updated weights for policy 0, policy_version 379504 (0.0013) [2024-06-15 16:12:18,392][1652491] Updated weights for policy 0, policy_version 379538 (0.0016) [2024-06-15 16:12:20,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 777388032. Throughput: 0: 11411.9. Samples: 194426368. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:12:20,956][1648985] Avg episode reward: [(0, '187.180')] [2024-06-15 16:12:22,250][1651469] Signal inference workers to stop experience collection... (19800 times) [2024-06-15 16:12:22,308][1652491] Updated weights for policy 0, policy_version 379603 (0.0026) [2024-06-15 16:12:22,341][1652491] InferenceWorker_p0-w0: stopping experience collection (19800 times) [2024-06-15 16:12:22,605][1651469] Signal inference workers to resume experience collection... 
(19800 times) [2024-06-15 16:12:22,606][1652491] InferenceWorker_p0-w0: resuming experience collection (19800 times) [2024-06-15 16:12:23,488][1652491] Updated weights for policy 0, policy_version 379648 (0.0013) [2024-06-15 16:12:25,071][1652491] Updated weights for policy 0, policy_version 379702 (0.0012) [2024-06-15 16:12:25,955][1648985] Fps is (10 sec: 49153.2, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 777650176. Throughput: 0: 11343.6. Samples: 194457600. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:12:25,956][1648985] Avg episode reward: [(0, '160.930')] [2024-06-15 16:12:27,944][1652491] Updated weights for policy 0, policy_version 379760 (0.0013) [2024-06-15 16:12:30,786][1652491] Updated weights for policy 0, policy_version 379824 (0.0013) [2024-06-15 16:12:30,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 46421.1, 300 sec: 46097.3). Total num frames: 777879552. Throughput: 0: 11446.0. Samples: 194530304. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:12:30,956][1648985] Avg episode reward: [(0, '153.230')] [2024-06-15 16:12:33,724][1652491] Updated weights for policy 0, policy_version 379872 (0.0042) [2024-06-15 16:12:35,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.4, 300 sec: 46320.0). Total num frames: 778076160. Throughput: 0: 11366.4. Samples: 194595840. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:12:35,956][1648985] Avg episode reward: [(0, '157.910')] [2024-06-15 16:12:36,690][1652491] Updated weights for policy 0, policy_version 379952 (0.0012) [2024-06-15 16:12:39,516][1652491] Updated weights for policy 0, policy_version 380007 (0.0033) [2024-06-15 16:12:40,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 45875.2, 300 sec: 46097.3). Total num frames: 778305536. Throughput: 0: 11423.3. Samples: 194631680. 
Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:12:40,956][1648985] Avg episode reward: [(0, '147.440')] [2024-06-15 16:12:42,651][1652491] Updated weights for policy 0, policy_version 380080 (0.0021) [2024-06-15 16:12:45,534][1652491] Updated weights for policy 0, policy_version 380144 (0.0068) [2024-06-15 16:12:45,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 778567680. Throughput: 0: 11298.1. Samples: 194698240. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:12:45,956][1648985] Avg episode reward: [(0, '146.350')] [2024-06-15 16:12:47,474][1652491] Updated weights for policy 0, policy_version 380179 (0.0012) [2024-06-15 16:12:50,323][1652491] Updated weights for policy 0, policy_version 380230 (0.0027) [2024-06-15 16:12:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 778764288. Throughput: 0: 11264.0. Samples: 194765824. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:12:50,956][1648985] Avg episode reward: [(0, '155.470')] [2024-06-15 16:12:51,398][1652491] Updated weights for policy 0, policy_version 380284 (0.0016) [2024-06-15 16:12:53,710][1652491] Updated weights for policy 0, policy_version 380345 (0.0013) [2024-06-15 16:12:55,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 778993664. Throughput: 0: 11343.7. Samples: 194800128. Policy #0 lag: (min: 15.0, avg: 119.8, max: 271.0) [2024-06-15 16:12:55,955][1648985] Avg episode reward: [(0, '163.190')] [2024-06-15 16:12:56,385][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000380384_779026432.pth... 
[2024-06-15 16:12:56,554][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000374976_767950848.pth [2024-06-15 16:12:56,851][1652491] Updated weights for policy 0, policy_version 380400 (0.0142) [2024-06-15 16:12:58,993][1652491] Updated weights for policy 0, policy_version 380464 (0.0090) [2024-06-15 16:13:00,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45329.0, 300 sec: 46541.6). Total num frames: 779223040. Throughput: 0: 11389.2. Samples: 194870272. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:13:00,956][1648985] Avg episode reward: [(0, '166.410')] [2024-06-15 16:13:02,654][1652491] Updated weights for policy 0, policy_version 380514 (0.0027) [2024-06-15 16:13:04,443][1652491] Updated weights for policy 0, policy_version 380562 (0.0012) [2024-06-15 16:13:05,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 779485184. Throughput: 0: 11423.3. Samples: 194940416. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:13:05,956][1648985] Avg episode reward: [(0, '142.440')] [2024-06-15 16:13:08,652][1652491] Updated weights for policy 0, policy_version 380656 (0.0011) [2024-06-15 16:13:09,596][1651469] Signal inference workers to stop experience collection... (19850 times) [2024-06-15 16:13:09,684][1652491] InferenceWorker_p0-w0: stopping experience collection (19850 times) [2024-06-15 16:13:09,686][1652491] Updated weights for policy 0, policy_version 380678 (0.0011) [2024-06-15 16:13:09,818][1651469] Signal inference workers to resume experience collection... (19850 times) [2024-06-15 16:13:09,819][1652491] InferenceWorker_p0-w0: resuming experience collection (19850 times) [2024-06-15 16:13:10,956][1648985] Fps is (10 sec: 49150.8, 60 sec: 46967.3, 300 sec: 46541.6). Total num frames: 779714560. Throughput: 0: 11377.7. Samples: 194969600. 
Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:13:10,956][1648985] Avg episode reward: [(0, '132.860')] [2024-06-15 16:13:10,999][1652491] Updated weights for policy 0, policy_version 380735 (0.0138) [2024-06-15 16:13:15,058][1652491] Updated weights for policy 0, policy_version 380789 (0.0017) [2024-06-15 16:13:15,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45329.3, 300 sec: 46097.4). Total num frames: 779878400. Throughput: 0: 11355.1. Samples: 195041280. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:13:15,956][1648985] Avg episode reward: [(0, '148.040')] [2024-06-15 16:13:16,351][1652491] Updated weights for policy 0, policy_version 380832 (0.0015) [2024-06-15 16:13:18,765][1652491] Updated weights for policy 0, policy_version 380880 (0.0012) [2024-06-15 16:13:19,698][1652491] Updated weights for policy 0, policy_version 380928 (0.0013) [2024-06-15 16:13:20,955][1648985] Fps is (10 sec: 49153.8, 60 sec: 46967.5, 300 sec: 46875.1). Total num frames: 780206080. Throughput: 0: 11457.4. Samples: 195111424. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:13:20,956][1648985] Avg episode reward: [(0, '150.250')] [2024-06-15 16:13:21,492][1652491] Updated weights for policy 0, policy_version 380990 (0.0013) [2024-06-15 16:13:25,955][1648985] Fps is (10 sec: 49150.6, 60 sec: 45328.9, 300 sec: 46319.5). Total num frames: 780369920. Throughput: 0: 11502.9. Samples: 195149312. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:13:25,956][1648985] Avg episode reward: [(0, '161.290')] [2024-06-15 16:13:26,138][1652491] Updated weights for policy 0, policy_version 381054 (0.0015) [2024-06-15 16:13:28,114][1652491] Updated weights for policy 0, policy_version 381112 (0.0014) [2024-06-15 16:13:30,315][1652491] Updated weights for policy 0, policy_version 381168 (0.0014) [2024-06-15 16:13:30,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.5, 300 sec: 46652.7). Total num frames: 780664832. 
Throughput: 0: 11628.1. Samples: 195221504. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:13:30,956][1648985] Avg episode reward: [(0, '140.260')] [2024-06-15 16:13:32,242][1652491] Updated weights for policy 0, policy_version 381232 (0.0012) [2024-06-15 16:13:35,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 45329.0, 300 sec: 46430.6). Total num frames: 780795904. Throughput: 0: 11707.7. Samples: 195292672. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:13:35,956][1648985] Avg episode reward: [(0, '160.100')] [2024-06-15 16:13:37,537][1652491] Updated weights for policy 0, policy_version 381304 (0.0013) [2024-06-15 16:13:38,748][1652491] Updated weights for policy 0, policy_version 381331 (0.0012) [2024-06-15 16:13:40,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 781058048. Throughput: 0: 11730.4. Samples: 195328000. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:13:40,956][1648985] Avg episode reward: [(0, '165.040')] [2024-06-15 16:13:41,266][1652491] Updated weights for policy 0, policy_version 381392 (0.0012) [2024-06-15 16:13:43,645][1652491] Updated weights for policy 0, policy_version 381476 (0.0016) [2024-06-15 16:13:45,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 781320192. Throughput: 0: 11400.6. Samples: 195383296. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:13:45,956][1648985] Avg episode reward: [(0, '160.980')] [2024-06-15 16:13:48,933][1652491] Updated weights for policy 0, policy_version 381536 (0.0013) [2024-06-15 16:13:50,390][1652491] Updated weights for policy 0, policy_version 381569 (0.0011) [2024-06-15 16:13:50,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 45329.1, 300 sec: 45986.3). Total num frames: 781484032. Throughput: 0: 11628.1. Samples: 195463680. 
Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:13:50,956][1648985] Avg episode reward: [(0, '157.270')] [2024-06-15 16:13:52,374][1652491] Updated weights for policy 0, policy_version 381635 (0.0013) [2024-06-15 16:13:53,988][1651469] Signal inference workers to stop experience collection... (19900 times) [2024-06-15 16:13:54,034][1652491] InferenceWorker_p0-w0: stopping experience collection (19900 times) [2024-06-15 16:13:54,338][1651469] Signal inference workers to resume experience collection... (19900 times) [2024-06-15 16:13:54,350][1652491] InferenceWorker_p0-w0: resuming experience collection (19900 times) [2024-06-15 16:13:54,352][1652491] Updated weights for policy 0, policy_version 381712 (0.0015) [2024-06-15 16:13:55,496][1652491] Updated weights for policy 0, policy_version 381760 (0.0012) [2024-06-15 16:13:55,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 781844480. Throughput: 0: 11525.8. Samples: 195488256. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:13:55,956][1648985] Avg episode reward: [(0, '150.710')] [2024-06-15 16:14:00,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 781877248. Throughput: 0: 11605.3. Samples: 195563520. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:14:00,956][1648985] Avg episode reward: [(0, '159.460')] [2024-06-15 16:14:02,839][1652491] Updated weights for policy 0, policy_version 381825 (0.0011) [2024-06-15 16:14:04,453][1652491] Updated weights for policy 0, policy_version 381890 (0.0223) [2024-06-15 16:14:05,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 782204928. Throughput: 0: 11264.0. Samples: 195618304. 
Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:14:05,955][1648985] Avg episode reward: [(0, '163.710')] [2024-06-15 16:14:06,243][1652491] Updated weights for policy 0, policy_version 381955 (0.0022) [2024-06-15 16:14:10,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 44237.0, 300 sec: 45875.2). Total num frames: 782368768. Throughput: 0: 11116.1. Samples: 195649536. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:14:10,956][1648985] Avg episode reward: [(0, '156.510')] [2024-06-15 16:14:13,532][1652491] Updated weights for policy 0, policy_version 382032 (0.0100) [2024-06-15 16:14:14,791][1652491] Updated weights for policy 0, policy_version 382080 (0.0011) [2024-06-15 16:14:15,955][1648985] Fps is (10 sec: 32767.4, 60 sec: 44236.7, 300 sec: 45875.2). Total num frames: 782532608. Throughput: 0: 11081.9. Samples: 195720192. Policy #0 lag: (min: 15.0, avg: 133.0, max: 271.0) [2024-06-15 16:14:15,956][1648985] Avg episode reward: [(0, '145.440')] [2024-06-15 16:14:16,787][1652491] Updated weights for policy 0, policy_version 382130 (0.0023) [2024-06-15 16:14:18,875][1652491] Updated weights for policy 0, policy_version 382224 (0.0013) [2024-06-15 16:14:20,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 44782.9, 300 sec: 46319.6). Total num frames: 782893056. Throughput: 0: 10843.0. Samples: 195780608. Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:14:20,956][1648985] Avg episode reward: [(0, '138.410')] [2024-06-15 16:14:24,988][1652491] Updated weights for policy 0, policy_version 382273 (0.0013) [2024-06-15 16:14:25,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 43144.7, 300 sec: 45542.0). Total num frames: 782958592. Throughput: 0: 10911.3. Samples: 195819008. 
Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:14:25,956][1648985] Avg episode reward: [(0, '143.140')] [2024-06-15 16:14:26,357][1652491] Updated weights for policy 0, policy_version 382333 (0.0012) [2024-06-15 16:14:28,598][1652491] Updated weights for policy 0, policy_version 382386 (0.0013) [2024-06-15 16:14:30,538][1652491] Updated weights for policy 0, policy_version 382470 (0.0012) [2024-06-15 16:14:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44236.9, 300 sec: 46430.6). Total num frames: 783319040. Throughput: 0: 11070.6. Samples: 195881472. Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:14:30,956][1648985] Avg episode reward: [(0, '125.200')] [2024-06-15 16:14:35,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 45430.9). Total num frames: 783417344. Throughput: 0: 10945.4. Samples: 195956224. Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:14:35,956][1648985] Avg episode reward: [(0, '150.420')] [2024-06-15 16:14:36,816][1652491] Updated weights for policy 0, policy_version 382544 (0.0013) [2024-06-15 16:14:37,916][1652491] Updated weights for policy 0, policy_version 382590 (0.0012) [2024-06-15 16:14:39,424][1651469] Signal inference workers to stop experience collection... (19950 times) [2024-06-15 16:14:39,547][1652491] InferenceWorker_p0-w0: stopping experience collection (19950 times) [2024-06-15 16:14:39,681][1651469] Signal inference workers to resume experience collection... (19950 times) [2024-06-15 16:14:39,682][1652491] InferenceWorker_p0-w0: resuming experience collection (19950 times) [2024-06-15 16:14:40,088][1652491] Updated weights for policy 0, policy_version 382656 (0.0012) [2024-06-15 16:14:40,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 44783.1, 300 sec: 46430.6). Total num frames: 783745024. Throughput: 0: 11229.9. Samples: 195993600. 
Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:14:40,955][1648985] Avg episode reward: [(0, '162.730')] [2024-06-15 16:14:41,977][1652491] Updated weights for policy 0, policy_version 382724 (0.0011) [2024-06-15 16:14:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 783941632. Throughput: 0: 10888.6. Samples: 196053504. Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:14:45,956][1648985] Avg episode reward: [(0, '160.520')] [2024-06-15 16:14:48,437][1652491] Updated weights for policy 0, policy_version 382800 (0.0149) [2024-06-15 16:14:49,727][1652491] Updated weights for policy 0, policy_version 382848 (0.0013) [2024-06-15 16:14:50,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 43690.6, 300 sec: 45875.2). Total num frames: 784105472. Throughput: 0: 11298.1. Samples: 196126720. Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:14:50,956][1648985] Avg episode reward: [(0, '143.520')] [2024-06-15 16:14:52,003][1652491] Updated weights for policy 0, policy_version 382912 (0.0012) [2024-06-15 16:14:54,625][1652491] Updated weights for policy 0, policy_version 383008 (0.0115) [2024-06-15 16:14:55,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 46097.4). Total num frames: 784465920. Throughput: 0: 11013.7. Samples: 196145152. Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:14:55,956][1648985] Avg episode reward: [(0, '141.120')] [2024-06-15 16:14:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000383040_784465920.pth... [2024-06-15 16:14:56,012][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000377632_773390336.pth [2024-06-15 16:15:00,955][1648985] Fps is (10 sec: 36044.3, 60 sec: 43144.5, 300 sec: 45319.8). Total num frames: 784465920. Throughput: 0: 10990.9. Samples: 196214784. 
Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:15:00,956][1648985] Avg episode reward: [(0, '136.300')] [2024-06-15 16:15:01,762][1652491] Updated weights for policy 0, policy_version 383073 (0.0074) [2024-06-15 16:15:02,411][1652491] Updated weights for policy 0, policy_version 383099 (0.0012) [2024-06-15 16:15:04,150][1652491] Updated weights for policy 0, policy_version 383168 (0.0012) [2024-06-15 16:15:05,395][1652491] Updated weights for policy 0, policy_version 383217 (0.0014) [2024-06-15 16:15:05,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 44782.9, 300 sec: 46208.5). Total num frames: 784891904. Throughput: 0: 11082.0. Samples: 196279296. Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:15:05,955][1648985] Avg episode reward: [(0, '132.260')] [2024-06-15 16:15:06,685][1652491] Updated weights for policy 0, policy_version 383296 (0.0010) [2024-06-15 16:15:10,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 45319.8). Total num frames: 784990208. Throughput: 0: 11013.6. Samples: 196314624. Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:15:10,956][1648985] Avg episode reward: [(0, '119.790')] [2024-06-15 16:15:14,135][1652491] Updated weights for policy 0, policy_version 383363 (0.0013) [2024-06-15 16:15:15,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 45329.2, 300 sec: 45764.1). Total num frames: 785252352. Throughput: 0: 11241.3. Samples: 196387328. Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:15:15,956][1648985] Avg episode reward: [(0, '124.100')] [2024-06-15 16:15:16,101][1652491] Updated weights for policy 0, policy_version 383440 (0.0014) [2024-06-15 16:15:17,320][1651469] Signal inference workers to stop experience collection... (20000 times) [2024-06-15 16:15:17,365][1652491] InferenceWorker_p0-w0: stopping experience collection (20000 times) [2024-06-15 16:15:17,629][1651469] Signal inference workers to resume experience collection... 
(20000 times) [2024-06-15 16:15:17,630][1652491] InferenceWorker_p0-w0: resuming experience collection (20000 times) [2024-06-15 16:15:18,143][1652491] Updated weights for policy 0, policy_version 383526 (0.0013) [2024-06-15 16:15:20,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 43690.6, 300 sec: 45542.0). Total num frames: 785514496. Throughput: 0: 10945.4. Samples: 196448768. Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:15:20,956][1648985] Avg episode reward: [(0, '132.720')] [2024-06-15 16:15:24,379][1652491] Updated weights for policy 0, policy_version 383584 (0.0013) [2024-06-15 16:15:25,955][1648985] Fps is (10 sec: 39320.4, 60 sec: 44782.8, 300 sec: 45653.0). Total num frames: 785645568. Throughput: 0: 11070.5. Samples: 196491776. Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:15:25,956][1648985] Avg episode reward: [(0, '143.680')] [2024-06-15 16:15:26,435][1652491] Updated weights for policy 0, policy_version 383634 (0.0016) [2024-06-15 16:15:28,184][1652491] Updated weights for policy 0, policy_version 383712 (0.0016) [2024-06-15 16:15:29,658][1652491] Updated weights for policy 0, policy_version 383776 (0.0013) [2024-06-15 16:15:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45329.0, 300 sec: 45767.1). Total num frames: 786038784. Throughput: 0: 11047.8. Samples: 196550656. Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:15:30,956][1648985] Avg episode reward: [(0, '162.010')] [2024-06-15 16:15:35,876][1652491] Updated weights for policy 0, policy_version 383843 (0.0013) [2024-06-15 16:15:35,955][1648985] Fps is (10 sec: 45876.8, 60 sec: 44783.0, 300 sec: 45542.0). Total num frames: 786104320. Throughput: 0: 11013.7. Samples: 196622336. 
Policy #0 lag: (min: 79.0, avg: 168.1, max: 335.0) [2024-06-15 16:15:35,955][1648985] Avg episode reward: [(0, '162.960')] [2024-06-15 16:15:38,367][1652491] Updated weights for policy 0, policy_version 383893 (0.0013) [2024-06-15 16:15:40,250][1652491] Updated weights for policy 0, policy_version 383968 (0.0011) [2024-06-15 16:15:40,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44782.8, 300 sec: 45875.2). Total num frames: 786432000. Throughput: 0: 11389.1. Samples: 196657664. Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:15:40,956][1648985] Avg episode reward: [(0, '139.870')] [2024-06-15 16:15:41,328][1652491] Updated weights for policy 0, policy_version 384017 (0.0012) [2024-06-15 16:15:42,321][1652491] Updated weights for policy 0, policy_version 384064 (0.0012) [2024-06-15 16:15:45,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 786563072. Throughput: 0: 11275.4. Samples: 196722176. Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:15:45,956][1648985] Avg episode reward: [(0, '125.270')] [2024-06-15 16:15:47,697][1652491] Updated weights for policy 0, policy_version 384124 (0.0015) [2024-06-15 16:15:50,882][1652491] Updated weights for policy 0, policy_version 384192 (0.0011) [2024-06-15 16:15:50,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 45329.2, 300 sec: 45764.2). Total num frames: 786825216. Throughput: 0: 11275.4. Samples: 196786688. Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:15:50,955][1648985] Avg episode reward: [(0, '127.230')] [2024-06-15 16:15:52,633][1652491] Updated weights for policy 0, policy_version 384272 (0.0154) [2024-06-15 16:15:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 787087360. Throughput: 0: 11093.4. Samples: 196813824. 
Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:15:55,955][1648985] Avg episode reward: [(0, '141.210')] [2024-06-15 16:15:58,017][1652491] Updated weights for policy 0, policy_version 384322 (0.0013) [2024-06-15 16:15:59,243][1652491] Updated weights for policy 0, policy_version 384381 (0.0060) [2024-06-15 16:16:00,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 45875.2, 300 sec: 45319.8). Total num frames: 787218432. Throughput: 0: 11229.8. Samples: 196892672. Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:16:00,956][1648985] Avg episode reward: [(0, '163.400')] [2024-06-15 16:16:01,975][1651469] Signal inference workers to stop experience collection... (20050 times) [2024-06-15 16:16:02,036][1652491] InferenceWorker_p0-w0: stopping experience collection (20050 times) [2024-06-15 16:16:02,208][1651469] Signal inference workers to resume experience collection... (20050 times) [2024-06-15 16:16:02,209][1652491] InferenceWorker_p0-w0: resuming experience collection (20050 times) [2024-06-15 16:16:02,680][1652491] Updated weights for policy 0, policy_version 384449 (0.0012) [2024-06-15 16:16:04,511][1652491] Updated weights for policy 0, policy_version 384530 (0.0013) [2024-06-15 16:16:05,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45329.0, 300 sec: 45653.0). Total num frames: 787611648. Throughput: 0: 11184.4. Samples: 196952064. Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:16:05,956][1648985] Avg episode reward: [(0, '150.900')] [2024-06-15 16:16:09,573][1652491] Updated weights for policy 0, policy_version 384586 (0.0030) [2024-06-15 16:16:10,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 45875.4, 300 sec: 45542.0). Total num frames: 787742720. Throughput: 0: 11264.1. Samples: 196998656. 
Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:16:10,956][1648985] Avg episode reward: [(0, '138.950')] [2024-06-15 16:16:11,723][1652491] Updated weights for policy 0, policy_version 384642 (0.0017) [2024-06-15 16:16:13,599][1652491] Updated weights for policy 0, policy_version 384724 (0.0022) [2024-06-15 16:16:15,392][1652491] Updated weights for policy 0, policy_version 384785 (0.0016) [2024-06-15 16:16:15,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 46967.2, 300 sec: 45541.9). Total num frames: 788070400. Throughput: 0: 11423.2. Samples: 197064704. Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:16:15,956][1648985] Avg episode reward: [(0, '146.170')] [2024-06-15 16:16:20,840][1652491] Updated weights for policy 0, policy_version 384851 (0.0015) [2024-06-15 16:16:20,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44236.9, 300 sec: 45430.9). Total num frames: 788168704. Throughput: 0: 11593.9. Samples: 197144064. Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:16:20,955][1648985] Avg episode reward: [(0, '145.680')] [2024-06-15 16:16:22,875][1652491] Updated weights for policy 0, policy_version 384913 (0.0020) [2024-06-15 16:16:24,369][1652491] Updated weights for policy 0, policy_version 384992 (0.0014) [2024-06-15 16:16:25,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 48059.9, 300 sec: 45542.0). Total num frames: 788529152. Throughput: 0: 11480.2. Samples: 197174272. Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:16:25,956][1648985] Avg episode reward: [(0, '141.320')] [2024-06-15 16:16:26,818][1652491] Updated weights for policy 0, policy_version 385040 (0.0015) [2024-06-15 16:16:27,825][1652491] Updated weights for policy 0, policy_version 385082 (0.0014) [2024-06-15 16:16:30,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 788660224. Throughput: 0: 11753.2. Samples: 197251072. 
Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:16:30,956][1648985] Avg episode reward: [(0, '143.260')] [2024-06-15 16:16:32,631][1652491] Updated weights for policy 0, policy_version 385136 (0.0015) [2024-06-15 16:16:33,705][1652491] Updated weights for policy 0, policy_version 385169 (0.0012) [2024-06-15 16:16:35,504][1652491] Updated weights for policy 0, policy_version 385248 (0.0012) [2024-06-15 16:16:35,974][1648985] Fps is (10 sec: 49057.9, 60 sec: 48590.2, 300 sec: 45650.1). Total num frames: 789020672. Throughput: 0: 11668.6. Samples: 197312000. Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:16:35,975][1648985] Avg episode reward: [(0, '152.460')] [2024-06-15 16:16:37,896][1652491] Updated weights for policy 0, policy_version 385312 (0.0015) [2024-06-15 16:16:40,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.3, 300 sec: 45319.8). Total num frames: 789184512. Throughput: 0: 11798.8. Samples: 197344768. Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:16:40,955][1648985] Avg episode reward: [(0, '147.460')] [2024-06-15 16:16:43,918][1651469] Signal inference workers to stop experience collection... (20100 times) [2024-06-15 16:16:43,973][1652491] InferenceWorker_p0-w0: stopping experience collection (20100 times) [2024-06-15 16:16:43,975][1652491] Updated weights for policy 0, policy_version 385379 (0.0013) [2024-06-15 16:16:44,119][1651469] Signal inference workers to resume experience collection... (20100 times) [2024-06-15 16:16:44,121][1652491] InferenceWorker_p0-w0: resuming experience collection (20100 times) [2024-06-15 16:16:45,958][1648985] Fps is (10 sec: 39385.1, 60 sec: 47511.2, 300 sec: 45208.3). Total num frames: 789413888. Throughput: 0: 11763.9. Samples: 197422080. 
Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:16:45,959][1648985] Avg episode reward: [(0, '147.280')] [2024-06-15 16:16:46,028][1652491] Updated weights for policy 0, policy_version 385470 (0.0094) [2024-06-15 16:16:49,304][1652491] Updated weights for policy 0, policy_version 385552 (0.0013) [2024-06-15 16:16:50,425][1652491] Updated weights for policy 0, policy_version 385596 (0.0012) [2024-06-15 16:16:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 45319.8). Total num frames: 789708800. Throughput: 0: 11741.9. Samples: 197480448. Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:16:50,955][1648985] Avg episode reward: [(0, '133.020')] [2024-06-15 16:16:55,955][1648985] Fps is (10 sec: 36055.2, 60 sec: 44782.8, 300 sec: 44986.5). Total num frames: 789774336. Throughput: 0: 11696.3. Samples: 197524992. Policy #0 lag: (min: 15.0, avg: 102.2, max: 271.0) [2024-06-15 16:16:55,956][1648985] Avg episode reward: [(0, '123.810')] [2024-06-15 16:16:56,295][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000385664_789839872.pth... [2024-06-15 16:16:56,498][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000380384_779026432.pth [2024-06-15 16:16:56,766][1652491] Updated weights for policy 0, policy_version 385680 (0.0130) [2024-06-15 16:16:58,391][1652491] Updated weights for policy 0, policy_version 385745 (0.0107) [2024-06-15 16:17:00,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 48059.9, 300 sec: 45319.8). Total num frames: 790102016. Throughput: 0: 11582.7. Samples: 197585920. 
Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:17:00,956][1648985] Avg episode reward: [(0, '147.290')] [2024-06-15 16:17:01,050][1652491] Updated weights for policy 0, policy_version 385794 (0.0015) [2024-06-15 16:17:02,471][1652491] Updated weights for policy 0, policy_version 385856 (0.0012) [2024-06-15 16:17:05,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 43690.7, 300 sec: 45208.7). Total num frames: 790233088. Throughput: 0: 11593.9. Samples: 197665792. Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:17:05,956][1648985] Avg episode reward: [(0, '154.240')] [2024-06-15 16:17:07,390][1652491] Updated weights for policy 0, policy_version 385920 (0.0012) [2024-06-15 16:17:08,702][1652491] Updated weights for policy 0, policy_version 385984 (0.0014) [2024-06-15 16:17:10,289][1652491] Updated weights for policy 0, policy_version 386044 (0.0013) [2024-06-15 16:17:10,956][1648985] Fps is (10 sec: 52425.9, 60 sec: 48059.3, 300 sec: 45653.0). Total num frames: 790626304. Throughput: 0: 11571.1. Samples: 197694976. Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:17:10,957][1648985] Avg episode reward: [(0, '139.470')] [2024-06-15 16:17:12,897][1652491] Updated weights for policy 0, policy_version 386103 (0.0012) [2024-06-15 16:17:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44783.1, 300 sec: 45319.8). Total num frames: 790757376. Throughput: 0: 11411.9. Samples: 197764608. Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:17:15,956][1648985] Avg episode reward: [(0, '156.200')] [2024-06-15 16:17:17,909][1652491] Updated weights for policy 0, policy_version 386144 (0.0014) [2024-06-15 16:17:19,487][1652491] Updated weights for policy 0, policy_version 386210 (0.0014) [2024-06-15 16:17:20,974][1648985] Fps is (10 sec: 45790.2, 60 sec: 48590.3, 300 sec: 45539.0). Total num frames: 791085056. Throughput: 0: 11571.2. Samples: 197832704. 
Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:17:20,975][1648985] Avg episode reward: [(0, '137.050')] [2024-06-15 16:17:21,088][1652491] Updated weights for policy 0, policy_version 386275 (0.0097) [2024-06-15 16:17:23,398][1652491] Updated weights for policy 0, policy_version 386325 (0.0012) [2024-06-15 16:17:24,184][1652491] Updated weights for policy 0, policy_version 386364 (0.0079) [2024-06-15 16:17:25,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 45430.9). Total num frames: 791281664. Throughput: 0: 11616.7. Samples: 197867520. Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:17:25,956][1648985] Avg episode reward: [(0, '148.660')] [2024-06-15 16:17:28,516][1651469] Signal inference workers to stop experience collection... (20150 times) [2024-06-15 16:17:28,549][1652491] InferenceWorker_p0-w0: stopping experience collection (20150 times) [2024-06-15 16:17:28,905][1651469] Signal inference workers to resume experience collection... (20150 times) [2024-06-15 16:17:28,906][1652491] InferenceWorker_p0-w0: resuming experience collection (20150 times) [2024-06-15 16:17:29,598][1652491] Updated weights for policy 0, policy_version 386416 (0.0014) [2024-06-15 16:17:30,869][1652491] Updated weights for policy 0, policy_version 386480 (0.0011) [2024-06-15 16:17:30,955][1648985] Fps is (10 sec: 42678.9, 60 sec: 47513.5, 300 sec: 45541.9). Total num frames: 791511040. Throughput: 0: 11640.2. Samples: 197945856. Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:17:30,956][1648985] Avg episode reward: [(0, '165.050')] [2024-06-15 16:17:32,730][1652491] Updated weights for policy 0, policy_version 386557 (0.0013) [2024-06-15 16:17:34,779][1652491] Updated weights for policy 0, policy_version 386609 (0.0014) [2024-06-15 16:17:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46436.2, 300 sec: 45764.1). Total num frames: 791805952. Throughput: 0: 11753.2. Samples: 198009344. 
Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:17:35,956][1648985] Avg episode reward: [(0, '163.680')] [2024-06-15 16:17:40,189][1652491] Updated weights for policy 0, policy_version 386658 (0.0013) [2024-06-15 16:17:40,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 45875.1, 300 sec: 45319.8). Total num frames: 791937024. Throughput: 0: 11764.7. Samples: 198054400. Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:17:40,956][1648985] Avg episode reward: [(0, '136.050')] [2024-06-15 16:17:41,203][1652491] Updated weights for policy 0, policy_version 386704 (0.0013) [2024-06-15 16:17:42,341][1652491] Updated weights for policy 0, policy_version 386750 (0.0013) [2024-06-15 16:17:43,659][1652491] Updated weights for policy 0, policy_version 386800 (0.0014) [2024-06-15 16:17:45,811][1652491] Updated weights for policy 0, policy_version 386872 (0.0158) [2024-06-15 16:17:45,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48608.2, 300 sec: 45986.3). Total num frames: 792330240. Throughput: 0: 11719.1. Samples: 198113280. Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:17:45,956][1648985] Avg episode reward: [(0, '150.400')] [2024-06-15 16:17:50,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 44236.7, 300 sec: 45319.8). Total num frames: 792363008. Throughput: 0: 11741.9. Samples: 198194176. Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:17:50,956][1648985] Avg episode reward: [(0, '133.390')] [2024-06-15 16:17:51,174][1652491] Updated weights for policy 0, policy_version 386914 (0.0013) [2024-06-15 16:17:52,411][1652491] Updated weights for policy 0, policy_version 386962 (0.0011) [2024-06-15 16:17:55,438][1652491] Updated weights for policy 0, policy_version 387056 (0.0125) [2024-06-15 16:17:55,955][1648985] Fps is (10 sec: 39322.5, 60 sec: 49152.2, 300 sec: 45764.2). Total num frames: 792723456. Throughput: 0: 11742.0. Samples: 198223360. 
Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:17:55,955][1648985] Avg episode reward: [(0, '142.550')] [2024-06-15 16:17:57,161][1652491] Updated weights for policy 0, policy_version 387120 (0.0012) [2024-06-15 16:18:00,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 45875.2, 300 sec: 45319.8). Total num frames: 792854528. Throughput: 0: 11730.5. Samples: 198292480. Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:18:00,956][1648985] Avg episode reward: [(0, '153.310')] [2024-06-15 16:18:02,618][1652491] Updated weights for policy 0, policy_version 387168 (0.0010) [2024-06-15 16:18:04,470][1652491] Updated weights for policy 0, policy_version 387248 (0.0014) [2024-06-15 16:18:05,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 48059.7, 300 sec: 45430.9). Total num frames: 793116672. Throughput: 0: 11712.7. Samples: 198359552. Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:18:05,956][1648985] Avg episode reward: [(0, '137.320')] [2024-06-15 16:18:07,360][1651469] Signal inference workers to stop experience collection... (20200 times) [2024-06-15 16:18:07,403][1652491] InferenceWorker_p0-w0: stopping experience collection (20200 times) [2024-06-15 16:18:07,607][1651469] Signal inference workers to resume experience collection... (20200 times) [2024-06-15 16:18:07,608][1652491] InferenceWorker_p0-w0: resuming experience collection (20200 times) [2024-06-15 16:18:07,610][1652491] Updated weights for policy 0, policy_version 387312 (0.0014) [2024-06-15 16:18:09,590][1652491] Updated weights for policy 0, policy_version 387382 (0.0023) [2024-06-15 16:18:10,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.6, 300 sec: 45764.1). Total num frames: 793378816. Throughput: 0: 11525.7. Samples: 198386176. 
Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:18:10,956][1648985] Avg episode reward: [(0, '134.180')] [2024-06-15 16:18:14,048][1652491] Updated weights for policy 0, policy_version 387424 (0.0118) [2024-06-15 16:18:15,568][1652491] Updated weights for policy 0, policy_version 387488 (0.0014) [2024-06-15 16:18:15,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 47513.5, 300 sec: 45430.9). Total num frames: 793608192. Throughput: 0: 11503.0. Samples: 198463488. Policy #0 lag: (min: 111.0, avg: 168.8, max: 367.0) [2024-06-15 16:18:15,956][1648985] Avg episode reward: [(0, '131.950')] [2024-06-15 16:18:17,909][1652491] Updated weights for policy 0, policy_version 387536 (0.0014) [2024-06-15 16:18:19,835][1652491] Updated weights for policy 0, policy_version 387602 (0.0013) [2024-06-15 16:18:20,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 46982.3, 300 sec: 45875.2). Total num frames: 793903104. Throughput: 0: 11309.5. Samples: 198518272. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:18:20,956][1648985] Avg episode reward: [(0, '133.540')] [2024-06-15 16:18:24,617][1652491] Updated weights for policy 0, policy_version 387680 (0.0014) [2024-06-15 16:18:25,239][1652491] Updated weights for policy 0, policy_version 387712 (0.0013) [2024-06-15 16:18:25,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46967.4, 300 sec: 45542.0). Total num frames: 794099712. Throughput: 0: 11571.2. Samples: 198575104. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:18:25,956][1648985] Avg episode reward: [(0, '145.900')] [2024-06-15 16:18:28,780][1652491] Updated weights for policy 0, policy_version 387794 (0.0024) [2024-06-15 16:18:30,683][1652491] Updated weights for policy 0, policy_version 387860 (0.0016) [2024-06-15 16:18:30,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 47513.8, 300 sec: 45986.3). Total num frames: 794361856. Throughput: 0: 11628.1. Samples: 198636544. 
Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:18:30,956][1648985] Avg episode reward: [(0, '162.190')] [2024-06-15 16:18:34,788][1652491] Updated weights for policy 0, policy_version 387905 (0.0012) [2024-06-15 16:18:35,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45329.0, 300 sec: 45653.1). Total num frames: 794525696. Throughput: 0: 11582.6. Samples: 198715392. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:18:35,956][1648985] Avg episode reward: [(0, '165.990')] [2024-06-15 16:18:35,972][1652491] Updated weights for policy 0, policy_version 387965 (0.0014) [2024-06-15 16:18:38,090][1652491] Updated weights for policy 0, policy_version 388031 (0.0014) [2024-06-15 16:18:40,125][1652491] Updated weights for policy 0, policy_version 388085 (0.0043) [2024-06-15 16:18:40,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 48059.8, 300 sec: 45764.1). Total num frames: 794820608. Throughput: 0: 11696.4. Samples: 198749696. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:18:40,955][1648985] Avg episode reward: [(0, '147.350')] [2024-06-15 16:18:42,423][1652491] Updated weights for policy 0, policy_version 388129 (0.0046) [2024-06-15 16:18:45,855][1652491] Updated weights for policy 0, policy_version 388177 (0.0012) [2024-06-15 16:18:45,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 44236.9, 300 sec: 45764.1). Total num frames: 794984448. Throughput: 0: 11753.2. Samples: 198821376. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:18:45,956][1648985] Avg episode reward: [(0, '122.910')] [2024-06-15 16:18:48,380][1652491] Updated weights for policy 0, policy_version 388240 (0.0014) [2024-06-15 16:18:49,671][1652491] Updated weights for policy 0, policy_version 388286 (0.0013) [2024-06-15 16:18:50,243][1651469] Signal inference workers to stop experience collection... 
(20250 times) [2024-06-15 16:18:50,301][1652491] InferenceWorker_p0-w0: stopping experience collection (20250 times) [2024-06-15 16:18:50,530][1651469] Signal inference workers to resume experience collection... (20250 times) [2024-06-15 16:18:50,531][1652491] InferenceWorker_p0-w0: resuming experience collection (20250 times) [2024-06-15 16:18:50,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48605.9, 300 sec: 45542.0). Total num frames: 795279360. Throughput: 0: 11707.8. Samples: 198886400. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:18:50,956][1648985] Avg episode reward: [(0, '120.810')] [2024-06-15 16:18:51,423][1652491] Updated weights for policy 0, policy_version 388351 (0.0014) [2024-06-15 16:18:53,950][1652491] Updated weights for policy 0, policy_version 388412 (0.0157) [2024-06-15 16:18:55,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 795475968. Throughput: 0: 11810.1. Samples: 198917632. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:18:55,955][1648985] Avg episode reward: [(0, '127.900')] [2024-06-15 16:18:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000388416_795475968.pth... [2024-06-15 16:18:56,061][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000383040_784465920.pth [2024-06-15 16:18:58,162][1652491] Updated weights for policy 0, policy_version 388474 (0.0020) [2024-06-15 16:19:00,777][1652491] Updated weights for policy 0, policy_version 388529 (0.0014) [2024-06-15 16:19:00,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 47513.4, 300 sec: 45764.1). Total num frames: 795705344. Throughput: 0: 11798.7. Samples: 198994432. 
Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:19:00,956][1648985] Avg episode reward: [(0, '133.710')] [2024-06-15 16:19:03,006][1652491] Updated weights for policy 0, policy_version 388577 (0.0015) [2024-06-15 16:19:04,747][1652491] Updated weights for policy 0, policy_version 388624 (0.0021) [2024-06-15 16:19:05,961][1648985] Fps is (10 sec: 52399.5, 60 sec: 48055.3, 300 sec: 46207.6). Total num frames: 796000256. Throughput: 0: 11888.3. Samples: 199053312. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:19:05,963][1648985] Avg episode reward: [(0, '145.460')] [2024-06-15 16:19:08,590][1652491] Updated weights for policy 0, policy_version 388688 (0.0028) [2024-06-15 16:19:10,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 796131328. Throughput: 0: 11480.2. Samples: 199091712. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:19:10,956][1648985] Avg episode reward: [(0, '141.970')] [2024-06-15 16:19:12,036][1652491] Updated weights for policy 0, policy_version 388752 (0.0013) [2024-06-15 16:19:13,487][1652491] Updated weights for policy 0, policy_version 388804 (0.0016) [2024-06-15 16:19:14,695][1652491] Updated weights for policy 0, policy_version 388858 (0.0014) [2024-06-15 16:19:15,955][1648985] Fps is (10 sec: 39344.0, 60 sec: 46421.5, 300 sec: 45764.1). Total num frames: 796393472. Throughput: 0: 11616.7. Samples: 199159296. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:19:15,955][1648985] Avg episode reward: [(0, '145.380')] [2024-06-15 16:19:16,835][1652491] Updated weights for policy 0, policy_version 388900 (0.0012) [2024-06-15 16:19:19,843][1652491] Updated weights for policy 0, policy_version 388947 (0.0012) [2024-06-15 16:19:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 796655616. Throughput: 0: 11434.7. Samples: 199229952. 
Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:19:20,956][1648985] Avg episode reward: [(0, '147.970')] [2024-06-15 16:19:22,697][1652491] Updated weights for policy 0, policy_version 388993 (0.0031) [2024-06-15 16:19:25,079][1652491] Updated weights for policy 0, policy_version 389057 (0.0012) [2024-06-15 16:19:25,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 45875.1, 300 sec: 45875.2). Total num frames: 796852224. Throughput: 0: 11468.8. Samples: 199265792. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:19:25,956][1648985] Avg episode reward: [(0, '159.020')] [2024-06-15 16:19:26,426][1652491] Updated weights for policy 0, policy_version 389115 (0.0012) [2024-06-15 16:19:28,790][1652491] Updated weights for policy 0, policy_version 389179 (0.0013) [2024-06-15 16:19:30,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 797114368. Throughput: 0: 11411.9. Samples: 199334912. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:19:30,956][1648985] Avg episode reward: [(0, '163.220')] [2024-06-15 16:19:31,479][1652491] Updated weights for policy 0, policy_version 389248 (0.0246) [2024-06-15 16:19:35,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 45875.3, 300 sec: 45875.2). Total num frames: 797278208. Throughput: 0: 11571.2. Samples: 199407104. Policy #0 lag: (min: 7.0, avg: 119.9, max: 263.0) [2024-06-15 16:19:35,956][1648985] Avg episode reward: [(0, '157.060')] [2024-06-15 16:19:36,242][1652491] Updated weights for policy 0, policy_version 389317 (0.0014) [2024-06-15 16:19:36,558][1651469] Signal inference workers to stop experience collection... (20300 times) [2024-06-15 16:19:36,640][1652491] InferenceWorker_p0-w0: stopping experience collection (20300 times) [2024-06-15 16:19:36,766][1651469] Signal inference workers to resume experience collection... 
(20300 times) [2024-06-15 16:19:36,768][1652491] InferenceWorker_p0-w0: resuming experience collection (20300 times) [2024-06-15 16:19:37,400][1652491] Updated weights for policy 0, policy_version 389371 (0.0014) [2024-06-15 16:19:40,112][1652491] Updated weights for policy 0, policy_version 389429 (0.0012) [2024-06-15 16:19:40,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 797573120. Throughput: 0: 11616.7. Samples: 199440384. Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:19:40,955][1648985] Avg episode reward: [(0, '159.370')] [2024-06-15 16:19:41,626][1652491] Updated weights for policy 0, policy_version 389473 (0.0013) [2024-06-15 16:19:45,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 797736960. Throughput: 0: 11616.8. Samples: 199517184. Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:19:45,956][1648985] Avg episode reward: [(0, '153.110')] [2024-06-15 16:19:46,049][1652491] Updated weights for policy 0, policy_version 389524 (0.0012) [2024-06-15 16:19:47,524][1652491] Updated weights for policy 0, policy_version 389584 (0.0012) [2024-06-15 16:19:48,698][1652491] Updated weights for policy 0, policy_version 389632 (0.0012) [2024-06-15 16:19:50,757][1652491] Updated weights for policy 0, policy_version 389694 (0.0018) [2024-06-15 16:19:50,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 798097408. Throughput: 0: 11788.8. Samples: 199583744. Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:19:50,956][1648985] Avg episode reward: [(0, '138.550')] [2024-06-15 16:19:55,961][1648985] Fps is (10 sec: 49125.0, 60 sec: 45871.0, 300 sec: 46651.9). Total num frames: 798228480. Throughput: 0: 11683.5. Samples: 199617536. 
Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:19:55,963][1648985] Avg episode reward: [(0, '119.790')] [2024-06-15 16:19:57,864][1652491] Updated weights for policy 0, policy_version 389808 (0.0014) [2024-06-15 16:19:59,152][1652491] Updated weights for policy 0, policy_version 389856 (0.0012) [2024-06-15 16:20:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46421.5, 300 sec: 46097.3). Total num frames: 798490624. Throughput: 0: 11707.7. Samples: 199686144. Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:20:00,956][1648985] Avg episode reward: [(0, '136.500')] [2024-06-15 16:20:01,377][1652491] Updated weights for policy 0, policy_version 389890 (0.0016) [2024-06-15 16:20:02,630][1652491] Updated weights for policy 0, policy_version 389944 (0.0013) [2024-06-15 16:20:03,893][1652491] Updated weights for policy 0, policy_version 389988 (0.0012) [2024-06-15 16:20:05,955][1648985] Fps is (10 sec: 52457.7, 60 sec: 45879.5, 300 sec: 46652.8). Total num frames: 798752768. Throughput: 0: 11832.9. Samples: 199762432. Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:20:05,956][1648985] Avg episode reward: [(0, '145.170')] [2024-06-15 16:20:08,458][1652491] Updated weights for policy 0, policy_version 390020 (0.0015) [2024-06-15 16:20:10,097][1652491] Updated weights for policy 0, policy_version 390083 (0.0013) [2024-06-15 16:20:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 798949376. Throughput: 0: 11878.4. Samples: 199800320. 
Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:20:10,956][1648985] Avg episode reward: [(0, '147.520')] [2024-06-15 16:20:11,314][1652491] Updated weights for policy 0, policy_version 390141 (0.0012) [2024-06-15 16:20:13,318][1652491] Updated weights for policy 0, policy_version 390194 (0.0013) [2024-06-15 16:20:15,049][1652491] Updated weights for policy 0, policy_version 390256 (0.0017) [2024-06-15 16:20:15,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48059.5, 300 sec: 46652.7). Total num frames: 799277056. Throughput: 0: 11810.1. Samples: 199866368. Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:20:15,956][1648985] Avg episode reward: [(0, '145.160')] [2024-06-15 16:20:20,047][1652491] Updated weights for policy 0, policy_version 390320 (0.0012) [2024-06-15 16:20:20,514][1651469] Signal inference workers to stop experience collection... (20350 times) [2024-06-15 16:20:20,603][1652491] InferenceWorker_p0-w0: stopping experience collection (20350 times) [2024-06-15 16:20:20,747][1651469] Signal inference workers to resume experience collection... (20350 times) [2024-06-15 16:20:20,759][1652491] InferenceWorker_p0-w0: resuming experience collection (20350 times) [2024-06-15 16:20:20,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 46421.4, 300 sec: 46763.9). Total num frames: 799440896. Throughput: 0: 11696.4. Samples: 199933440. Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:20:20,955][1648985] Avg episode reward: [(0, '147.670')] [2024-06-15 16:20:21,630][1652491] Updated weights for policy 0, policy_version 390394 (0.0015) [2024-06-15 16:20:24,585][1652491] Updated weights for policy 0, policy_version 390455 (0.0014) [2024-06-15 16:20:25,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 47513.7, 300 sec: 46319.5). Total num frames: 799703040. Throughput: 0: 11901.2. Samples: 199975936. 
Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:20:25,955][1648985] Avg episode reward: [(0, '140.120')] [2024-06-15 16:20:26,378][1652491] Updated weights for policy 0, policy_version 390500 (0.0012) [2024-06-15 16:20:30,738][1652491] Updated weights for policy 0, policy_version 390576 (0.0118) [2024-06-15 16:20:30,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 799899648. Throughput: 0: 11764.6. Samples: 200046592. Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:20:30,956][1648985] Avg episode reward: [(0, '135.430')] [2024-06-15 16:20:32,350][1652491] Updated weights for policy 0, policy_version 390625 (0.0014) [2024-06-15 16:20:35,027][1652491] Updated weights for policy 0, policy_version 390675 (0.0012) [2024-06-15 16:20:35,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 800161792. Throughput: 0: 11707.7. Samples: 200110592. Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:20:35,956][1648985] Avg episode reward: [(0, '133.720')] [2024-06-15 16:20:37,212][1652491] Updated weights for policy 0, policy_version 390722 (0.0018) [2024-06-15 16:20:40,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 800325632. Throughput: 0: 11720.5. Samples: 200144896. Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:20:40,956][1648985] Avg episode reward: [(0, '155.770')] [2024-06-15 16:20:41,774][1652491] Updated weights for policy 0, policy_version 390807 (0.0104) [2024-06-15 16:20:43,003][1652491] Updated weights for policy 0, policy_version 390854 (0.0012) [2024-06-15 16:20:44,238][1652491] Updated weights for policy 0, policy_version 390910 (0.0023) [2024-06-15 16:20:45,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 800620544. Throughput: 0: 11764.6. Samples: 200215552. 
Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:20:45,956][1648985] Avg episode reward: [(0, '149.280')] [2024-06-15 16:20:46,695][1652491] Updated weights for policy 0, policy_version 390961 (0.0011) [2024-06-15 16:20:47,096][1652491] Updated weights for policy 0, policy_version 390976 (0.0050) [2024-06-15 16:20:49,798][1652491] Updated weights for policy 0, policy_version 391031 (0.0014) [2024-06-15 16:20:50,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 800849920. Throughput: 0: 11764.6. Samples: 200291840. Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:20:50,956][1648985] Avg episode reward: [(0, '126.000')] [2024-06-15 16:20:52,996][1652491] Updated weights for policy 0, policy_version 391088 (0.0097) [2024-06-15 16:20:54,395][1652491] Updated weights for policy 0, policy_version 391152 (0.0014) [2024-06-15 16:20:55,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 48064.1, 300 sec: 47097.1). Total num frames: 801112064. Throughput: 0: 11707.7. Samples: 200327168. Policy #0 lag: (min: 93.0, avg: 183.6, max: 319.0) [2024-06-15 16:20:55,956][1648985] Avg episode reward: [(0, '124.450')] [2024-06-15 16:20:55,988][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000391168_801112064.pth... [2024-06-15 16:20:56,190][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000385664_789839872.pth [2024-06-15 16:20:57,121][1652491] Updated weights for policy 0, policy_version 391216 (0.0024) [2024-06-15 16:20:59,989][1652491] Updated weights for policy 0, policy_version 391288 (0.0019) [2024-06-15 16:21:00,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.6, 300 sec: 46652.7). Total num frames: 801374208. Throughput: 0: 11923.9. Samples: 200402944. 
Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:21:00,956][1648985] Avg episode reward: [(0, '128.490')] [2024-06-15 16:21:03,212][1652491] Updated weights for policy 0, policy_version 391318 (0.0012) [2024-06-15 16:21:04,442][1651469] Signal inference workers to stop experience collection... (20400 times) [2024-06-15 16:21:04,537][1652491] InferenceWorker_p0-w0: stopping experience collection (20400 times) [2024-06-15 16:21:04,648][1651469] Signal inference workers to resume experience collection... (20400 times) [2024-06-15 16:21:04,648][1652491] InferenceWorker_p0-w0: resuming experience collection (20400 times) [2024-06-15 16:21:04,872][1652491] Updated weights for policy 0, policy_version 391382 (0.0013) [2024-06-15 16:21:05,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 801636352. Throughput: 0: 11969.4. Samples: 200472064. Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:21:05,955][1648985] Avg episode reward: [(0, '120.400')] [2024-06-15 16:21:07,108][1652491] Updated weights for policy 0, policy_version 391440 (0.0050) [2024-06-15 16:21:09,949][1652491] Updated weights for policy 0, policy_version 391490 (0.0114) [2024-06-15 16:21:10,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 801832960. Throughput: 0: 11787.3. Samples: 200506368. Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:21:10,956][1648985] Avg episode reward: [(0, '142.800')] [2024-06-15 16:21:11,381][1652491] Updated weights for policy 0, policy_version 391548 (0.0013) [2024-06-15 16:21:15,955][1648985] Fps is (10 sec: 32767.6, 60 sec: 44783.0, 300 sec: 46763.8). Total num frames: 801964032. Throughput: 0: 11935.3. Samples: 200583680. 
Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:21:15,956][1648985] Avg episode reward: [(0, '147.750')] [2024-06-15 16:21:15,967][1652491] Updated weights for policy 0, policy_version 391600 (0.0011) [2024-06-15 16:21:17,662][1652491] Updated weights for policy 0, policy_version 391680 (0.0029) [2024-06-15 16:21:20,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 802291712. Throughput: 0: 11753.3. Samples: 200639488. Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:21:20,955][1648985] Avg episode reward: [(0, '155.860')] [2024-06-15 16:21:22,233][1652491] Updated weights for policy 0, policy_version 391764 (0.0079) [2024-06-15 16:21:23,256][1652491] Updated weights for policy 0, policy_version 391808 (0.0024) [2024-06-15 16:21:25,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45329.1, 300 sec: 46652.8). Total num frames: 802422784. Throughput: 0: 11719.1. Samples: 200672256. Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:21:25,956][1648985] Avg episode reward: [(0, '147.370')] [2024-06-15 16:21:27,856][1652491] Updated weights for policy 0, policy_version 391872 (0.0013) [2024-06-15 16:21:29,151][1652491] Updated weights for policy 0, policy_version 391929 (0.0014) [2024-06-15 16:21:30,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 47513.8, 300 sec: 46544.7). Total num frames: 802750464. Throughput: 0: 11753.2. Samples: 200744448. Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:21:30,956][1648985] Avg episode reward: [(0, '130.560')] [2024-06-15 16:21:30,967][1652491] Updated weights for policy 0, policy_version 391970 (0.0020) [2024-06-15 16:21:34,011][1652491] Updated weights for policy 0, policy_version 392016 (0.0014) [2024-06-15 16:21:35,089][1652491] Updated weights for policy 0, policy_version 392064 (0.0014) [2024-06-15 16:21:35,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 46421.3, 300 sec: 46652.7). Total num frames: 802947072. 
Throughput: 0: 11582.6. Samples: 200813056. Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:21:35,956][1648985] Avg episode reward: [(0, '124.400')] [2024-06-15 16:21:39,281][1652491] Updated weights for policy 0, policy_version 392144 (0.0124) [2024-06-15 16:21:40,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.8, 300 sec: 46764.3). Total num frames: 803209216. Throughput: 0: 11548.5. Samples: 200846848. Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:21:40,955][1648985] Avg episode reward: [(0, '140.440')] [2024-06-15 16:21:42,397][1652491] Updated weights for policy 0, policy_version 392208 (0.0015) [2024-06-15 16:21:45,014][1652491] Updated weights for policy 0, policy_version 392257 (0.0014) [2024-06-15 16:21:45,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 803438592. Throughput: 0: 11389.2. Samples: 200915456. Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:21:45,955][1648985] Avg episode reward: [(0, '154.110')] [2024-06-15 16:21:46,154][1652491] Updated weights for policy 0, policy_version 392317 (0.0012) [2024-06-15 16:21:49,885][1651469] Signal inference workers to stop experience collection... (20450 times) [2024-06-15 16:21:49,940][1652491] InferenceWorker_p0-w0: stopping experience collection (20450 times) [2024-06-15 16:21:50,062][1651469] Signal inference workers to resume experience collection... (20450 times) [2024-06-15 16:21:50,062][1652491] InferenceWorker_p0-w0: resuming experience collection (20450 times) [2024-06-15 16:21:50,533][1652491] Updated weights for policy 0, policy_version 392387 (0.0018) [2024-06-15 16:21:50,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 803635200. Throughput: 0: 11411.9. Samples: 200985600. 
Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:21:50,956][1648985] Avg episode reward: [(0, '166.070')] [2024-06-15 16:21:51,660][1652491] Updated weights for policy 0, policy_version 392439 (0.0013) [2024-06-15 16:21:54,796][1652491] Updated weights for policy 0, policy_version 392505 (0.0013) [2024-06-15 16:21:55,955][1648985] Fps is (10 sec: 42597.1, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 803864576. Throughput: 0: 11480.2. Samples: 201022976. Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:21:55,956][1648985] Avg episode reward: [(0, '148.060')] [2024-06-15 16:21:57,147][1652491] Updated weights for policy 0, policy_version 392576 (0.0112) [2024-06-15 16:22:00,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 44783.1, 300 sec: 46874.9). Total num frames: 804061184. Throughput: 0: 11355.1. Samples: 201094656. Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:22:00,955][1648985] Avg episode reward: [(0, '158.190')] [2024-06-15 16:22:01,537][1652491] Updated weights for policy 0, policy_version 392642 (0.0015) [2024-06-15 16:22:02,851][1652491] Updated weights for policy 0, policy_version 392703 (0.0092) [2024-06-15 16:22:05,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 44782.9, 300 sec: 46430.7). Total num frames: 804323328. Throughput: 0: 11514.3. Samples: 201157632. Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:22:05,956][1648985] Avg episode reward: [(0, '163.000')] [2024-06-15 16:22:06,353][1652491] Updated weights for policy 0, policy_version 392762 (0.0150) [2024-06-15 16:22:09,167][1652491] Updated weights for policy 0, policy_version 392822 (0.0017) [2024-06-15 16:22:10,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 44782.9, 300 sec: 46652.7). Total num frames: 804519936. Throughput: 0: 11593.9. Samples: 201193984. 
Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:22:10,956][1648985] Avg episode reward: [(0, '152.010')] [2024-06-15 16:22:11,831][1652491] Updated weights for policy 0, policy_version 392864 (0.0024) [2024-06-15 16:22:14,182][1652491] Updated weights for policy 0, policy_version 392949 (0.0015) [2024-06-15 16:22:15,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 46967.3, 300 sec: 46433.6). Total num frames: 804782080. Throughput: 0: 11366.3. Samples: 201255936. Policy #0 lag: (min: 63.0, avg: 186.8, max: 319.0) [2024-06-15 16:22:15,956][1648985] Avg episode reward: [(0, '153.710')] [2024-06-15 16:22:17,552][1652491] Updated weights for policy 0, policy_version 392992 (0.0017) [2024-06-15 16:22:19,700][1652491] Updated weights for policy 0, policy_version 393045 (0.0015) [2024-06-15 16:22:20,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 805044224. Throughput: 0: 11446.0. Samples: 201328128. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:22:20,956][1648985] Avg episode reward: [(0, '135.990')] [2024-06-15 16:22:22,597][1652491] Updated weights for policy 0, policy_version 393092 (0.0017) [2024-06-15 16:22:23,692][1652491] Updated weights for policy 0, policy_version 393145 (0.0013) [2024-06-15 16:22:25,417][1652491] Updated weights for policy 0, policy_version 393206 (0.0098) [2024-06-15 16:22:25,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 48059.8, 300 sec: 46763.9). Total num frames: 805306368. Throughput: 0: 11514.3. Samples: 201364992. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:22:25,955][1648985] Avg episode reward: [(0, '142.690')] [2024-06-15 16:22:28,261][1652491] Updated weights for policy 0, policy_version 393248 (0.0013) [2024-06-15 16:22:30,667][1652491] Updated weights for policy 0, policy_version 393297 (0.0020) [2024-06-15 16:22:30,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 805502976. 
Throughput: 0: 11673.5. Samples: 201440768. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:22:30,956][1648985] Avg episode reward: [(0, '143.970')] [2024-06-15 16:22:31,612][1652491] Updated weights for policy 0, policy_version 393343 (0.0019) [2024-06-15 16:22:33,871][1651469] Signal inference workers to stop experience collection... (20500 times) [2024-06-15 16:22:33,940][1652491] InferenceWorker_p0-w0: stopping experience collection (20500 times) [2024-06-15 16:22:34,045][1651469] Signal inference workers to resume experience collection... (20500 times) [2024-06-15 16:22:34,046][1652491] InferenceWorker_p0-w0: resuming experience collection (20500 times) [2024-06-15 16:22:34,445][1652491] Updated weights for policy 0, policy_version 393397 (0.0016) [2024-06-15 16:22:35,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 805765120. Throughput: 0: 11594.0. Samples: 201507328. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:22:35,956][1648985] Avg episode reward: [(0, '138.010')] [2024-06-15 16:22:35,964][1652491] Updated weights for policy 0, policy_version 393442 (0.0013) [2024-06-15 16:22:36,641][1652491] Updated weights for policy 0, policy_version 393472 (0.0011) [2024-06-15 16:22:39,580][1652491] Updated weights for policy 0, policy_version 393530 (0.0012) [2024-06-15 16:22:40,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 805961728. Throughput: 0: 11605.4. Samples: 201545216. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:22:40,956][1648985] Avg episode reward: [(0, '141.160')] [2024-06-15 16:22:42,163][1652491] Updated weights for policy 0, policy_version 393573 (0.0037) [2024-06-15 16:22:44,584][1652491] Updated weights for policy 0, policy_version 393602 (0.0094) [2024-06-15 16:22:45,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 806191104. Throughput: 0: 11650.8. 
Samples: 201618944. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:22:45,955][1648985] Avg episode reward: [(0, '148.250')] [2024-06-15 16:22:46,211][1652491] Updated weights for policy 0, policy_version 393665 (0.0014) [2024-06-15 16:22:47,394][1652491] Updated weights for policy 0, policy_version 393726 (0.0108) [2024-06-15 16:22:50,817][1652491] Updated weights for policy 0, policy_version 393788 (0.0013) [2024-06-15 16:22:50,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 806486016. Throughput: 0: 11741.9. Samples: 201686016. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:22:50,956][1648985] Avg episode reward: [(0, '152.620')] [2024-06-15 16:22:53,807][1652491] Updated weights for policy 0, policy_version 393844 (0.0013) [2024-06-15 16:22:55,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 806617088. Throughput: 0: 11753.3. Samples: 201722880. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:22:55,956][1648985] Avg episode reward: [(0, '145.610')] [2024-06-15 16:22:56,111][1652491] Updated weights for policy 0, policy_version 393873 (0.0011) [2024-06-15 16:22:56,377][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000393888_806682624.pth... [2024-06-15 16:22:56,548][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000388416_795475968.pth [2024-06-15 16:22:58,028][1652491] Updated weights for policy 0, policy_version 393952 (0.0013) [2024-06-15 16:23:00,586][1652491] Updated weights for policy 0, policy_version 393986 (0.0046) [2024-06-15 16:23:00,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 47513.5, 300 sec: 46763.8). Total num frames: 806912000. Throughput: 0: 11889.8. Samples: 201790976. 
Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:23:00,956][1648985] Avg episode reward: [(0, '138.710')] [2024-06-15 16:23:04,044][1652491] Updated weights for policy 0, policy_version 394051 (0.0013) [2024-06-15 16:23:05,272][1652491] Updated weights for policy 0, policy_version 394112 (0.0023) [2024-06-15 16:23:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 807141376. Throughput: 0: 11810.1. Samples: 201859584. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:23:05,956][1648985] Avg episode reward: [(0, '141.190')] [2024-06-15 16:23:08,287][1652491] Updated weights for policy 0, policy_version 394171 (0.0133) [2024-06-15 16:23:10,364][1652491] Updated weights for policy 0, policy_version 394224 (0.0013) [2024-06-15 16:23:10,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 807403520. Throughput: 0: 11696.3. Samples: 201891328. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:23:10,956][1648985] Avg episode reward: [(0, '131.960')] [2024-06-15 16:23:12,898][1652491] Updated weights for policy 0, policy_version 394258 (0.0018) [2024-06-15 16:23:15,615][1652491] Updated weights for policy 0, policy_version 394336 (0.0012) [2024-06-15 16:23:15,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46967.7, 300 sec: 46430.6). Total num frames: 807600128. Throughput: 0: 11639.5. Samples: 201964544. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:23:15,955][1648985] Avg episode reward: [(0, '137.440')] [2024-06-15 16:23:18,552][1652491] Updated weights for policy 0, policy_version 394384 (0.0013) [2024-06-15 16:23:18,695][1651469] Signal inference workers to stop experience collection... (20550 times) [2024-06-15 16:23:18,774][1652491] InferenceWorker_p0-w0: stopping experience collection (20550 times) [2024-06-15 16:23:19,034][1651469] Signal inference workers to resume experience collection... 
(20550 times) [2024-06-15 16:23:19,035][1652491] InferenceWorker_p0-w0: resuming experience collection (20550 times) [2024-06-15 16:23:19,773][1652491] Updated weights for policy 0, policy_version 394431 (0.0013) [2024-06-15 16:23:20,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 807796736. Throughput: 0: 11685.0. Samples: 202033152. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:23:20,955][1648985] Avg episode reward: [(0, '138.570')] [2024-06-15 16:23:22,122][1652491] Updated weights for policy 0, policy_version 394488 (0.0011) [2024-06-15 16:23:24,472][1652491] Updated weights for policy 0, policy_version 394528 (0.0123) [2024-06-15 16:23:25,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 808058880. Throughput: 0: 11662.2. Samples: 202070016. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:23:25,956][1648985] Avg episode reward: [(0, '137.960')] [2024-06-15 16:23:26,306][1652491] Updated weights for policy 0, policy_version 394576 (0.0116) [2024-06-15 16:23:27,457][1652491] Updated weights for policy 0, policy_version 394624 (0.0011) [2024-06-15 16:23:30,801][1652491] Updated weights for policy 0, policy_version 394675 (0.0139) [2024-06-15 16:23:30,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 46421.5, 300 sec: 46652.8). Total num frames: 808288256. Throughput: 0: 11525.7. Samples: 202137600. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:23:30,955][1648985] Avg episode reward: [(0, '130.290')] [2024-06-15 16:23:32,729][1652491] Updated weights for policy 0, policy_version 394720 (0.0011) [2024-06-15 16:23:33,343][1652491] Updated weights for policy 0, policy_version 394748 (0.0014) [2024-06-15 16:23:35,908][1652491] Updated weights for policy 0, policy_version 394811 (0.0017) [2024-06-15 16:23:35,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 808550400. 
Throughput: 0: 11582.6. Samples: 202207232. Policy #0 lag: (min: 7.0, avg: 119.2, max: 263.0) [2024-06-15 16:23:35,956][1648985] Avg episode reward: [(0, '153.580')] [2024-06-15 16:23:38,012][1652491] Updated weights for policy 0, policy_version 394864 (0.0109) [2024-06-15 16:23:40,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 45875.1, 300 sec: 46541.7). Total num frames: 808714240. Throughput: 0: 11525.7. Samples: 202241536. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:23:40,956][1648985] Avg episode reward: [(0, '162.950')] [2024-06-15 16:23:42,599][1652491] Updated weights for policy 0, policy_version 394935 (0.0015) [2024-06-15 16:23:44,446][1652491] Updated weights for policy 0, policy_version 394992 (0.0015) [2024-06-15 16:23:45,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.2, 300 sec: 46430.6). Total num frames: 808976384. Throughput: 0: 11525.7. Samples: 202309632. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:23:45,956][1648985] Avg episode reward: [(0, '153.290')] [2024-06-15 16:23:46,691][1652491] Updated weights for policy 0, policy_version 395040 (0.0013) [2024-06-15 16:23:48,701][1652491] Updated weights for policy 0, policy_version 395090 (0.0013) [2024-06-15 16:23:50,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 809238528. Throughput: 0: 11559.8. Samples: 202379776. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:23:50,956][1648985] Avg episode reward: [(0, '156.990')] [2024-06-15 16:23:52,110][1652491] Updated weights for policy 0, policy_version 395139 (0.0015) [2024-06-15 16:23:53,642][1652491] Updated weights for policy 0, policy_version 395197 (0.0012) [2024-06-15 16:23:55,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.4, 300 sec: 46430.6). Total num frames: 809402368. Throughput: 0: 11571.2. Samples: 202412032. 
Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:23:55,956][1648985] Avg episode reward: [(0, '156.880')] [2024-06-15 16:23:56,789][1652491] Updated weights for policy 0, policy_version 395254 (0.0099) [2024-06-15 16:23:58,307][1652491] Updated weights for policy 0, policy_version 395293 (0.0012) [2024-06-15 16:24:00,166][1652491] Updated weights for policy 0, policy_version 395349 (0.0104) [2024-06-15 16:24:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 47513.6, 300 sec: 46653.6). Total num frames: 809762816. Throughput: 0: 11525.7. Samples: 202483200. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:24:00,956][1648985] Avg episode reward: [(0, '157.760')] [2024-06-15 16:24:03,210][1652491] Updated weights for policy 0, policy_version 395393 (0.0013) [2024-06-15 16:24:04,442][1652491] Updated weights for policy 0, policy_version 395446 (0.0087) [2024-06-15 16:24:05,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 809893888. Throughput: 0: 11685.0. Samples: 202558976. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:24:05,956][1648985] Avg episode reward: [(0, '157.400')] [2024-06-15 16:24:06,435][1651469] Signal inference workers to stop experience collection... (20600 times) [2024-06-15 16:24:06,484][1652491] InferenceWorker_p0-w0: stopping experience collection (20600 times) [2024-06-15 16:24:06,682][1651469] Signal inference workers to resume experience collection... (20600 times) [2024-06-15 16:24:06,683][1652491] InferenceWorker_p0-w0: resuming experience collection (20600 times) [2024-06-15 16:24:07,207][1652491] Updated weights for policy 0, policy_version 395488 (0.0134) [2024-06-15 16:24:09,575][1652491] Updated weights for policy 0, policy_version 395552 (0.0013) [2024-06-15 16:24:10,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 810156032. Throughput: 0: 11537.0. Samples: 202589184. 
Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:24:10,956][1648985] Avg episode reward: [(0, '149.030')] [2024-06-15 16:24:11,397][1652491] Updated weights for policy 0, policy_version 395600 (0.0012) [2024-06-15 16:24:12,376][1652491] Updated weights for policy 0, policy_version 395643 (0.0031) [2024-06-15 16:24:15,654][1652491] Updated weights for policy 0, policy_version 395706 (0.0025) [2024-06-15 16:24:15,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 810418176. Throughput: 0: 11571.2. Samples: 202658304. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:24:15,955][1648985] Avg episode reward: [(0, '147.570')] [2024-06-15 16:24:18,599][1652491] Updated weights for policy 0, policy_version 395744 (0.0017) [2024-06-15 16:24:20,763][1652491] Updated weights for policy 0, policy_version 395795 (0.0013) [2024-06-15 16:24:20,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 810582016. Throughput: 0: 11650.9. Samples: 202731520. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:24:20,956][1648985] Avg episode reward: [(0, '153.840')] [2024-06-15 16:24:23,028][1652491] Updated weights for policy 0, policy_version 395856 (0.0094) [2024-06-15 16:24:25,168][1652491] Updated weights for policy 0, policy_version 395906 (0.0015) [2024-06-15 16:24:25,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 810876928. Throughput: 0: 11559.8. Samples: 202761728. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:24:25,956][1648985] Avg episode reward: [(0, '155.740')] [2024-06-15 16:24:26,315][1652491] Updated weights for policy 0, policy_version 395967 (0.0014) [2024-06-15 16:24:29,760][1652491] Updated weights for policy 0, policy_version 396021 (0.0013) [2024-06-15 16:24:30,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 811073536. 
Throughput: 0: 11776.0. Samples: 202839552. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:24:30,956][1648985] Avg episode reward: [(0, '139.050')] [2024-06-15 16:24:31,099][1652491] Updated weights for policy 0, policy_version 396050 (0.0013) [2024-06-15 16:24:33,335][1652491] Updated weights for policy 0, policy_version 396097 (0.0013) [2024-06-15 16:24:34,219][1652491] Updated weights for policy 0, policy_version 396160 (0.0012) [2024-06-15 16:24:35,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 811401216. Throughput: 0: 11867.0. Samples: 202913792. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:24:35,956][1648985] Avg episode reward: [(0, '150.890')] [2024-06-15 16:24:36,521][1652491] Updated weights for policy 0, policy_version 396224 (0.0017) [2024-06-15 16:24:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 47513.7, 300 sec: 46874.9). Total num frames: 811565056. Throughput: 0: 12049.1. Samples: 202954240. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:24:40,956][1648985] Avg episode reward: [(0, '158.900')] [2024-06-15 16:24:41,437][1652491] Updated weights for policy 0, policy_version 396289 (0.0013) [2024-06-15 16:24:44,177][1652491] Updated weights for policy 0, policy_version 396368 (0.0043) [2024-06-15 16:24:45,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 811859968. Throughput: 0: 11992.2. Samples: 203022848. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:24:45,956][1648985] Avg episode reward: [(0, '173.380')] [2024-06-15 16:24:45,988][1652491] Updated weights for policy 0, policy_version 396420 (0.0013) [2024-06-15 16:24:47,303][1652491] Updated weights for policy 0, policy_version 396478 (0.0058) [2024-06-15 16:24:50,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45875.3, 300 sec: 46653.6). Total num frames: 811991040. Throughput: 0: 11969.5. Samples: 203097600. 
Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:24:50,955][1648985] Avg episode reward: [(0, '155.730')] [2024-06-15 16:24:51,757][1651469] Signal inference workers to stop experience collection... (20650 times) [2024-06-15 16:24:51,792][1652491] InferenceWorker_p0-w0: stopping experience collection (20650 times) [2024-06-15 16:24:52,029][1651469] Signal inference workers to resume experience collection... (20650 times) [2024-06-15 16:24:52,030][1652491] InferenceWorker_p0-w0: resuming experience collection (20650 times) [2024-06-15 16:24:52,151][1652491] Updated weights for policy 0, policy_version 396535 (0.0013) [2024-06-15 16:24:53,310][1652491] Updated weights for policy 0, policy_version 396564 (0.0012) [2024-06-15 16:24:54,681][1652491] Updated weights for policy 0, policy_version 396612 (0.0013) [2024-06-15 16:24:55,695][1652491] Updated weights for policy 0, policy_version 396664 (0.0012) [2024-06-15 16:24:55,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 49698.1, 300 sec: 47097.1). Total num frames: 812384256. Throughput: 0: 12003.6. Samples: 203129344. Policy #0 lag: (min: 6.0, avg: 136.5, max: 262.0) [2024-06-15 16:24:55,955][1648985] Avg episode reward: [(0, '148.300')] [2024-06-15 16:24:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000396672_812384256.pth... [2024-06-15 16:24:56,044][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000391168_801112064.pth [2024-06-15 16:24:57,840][1652491] Updated weights for policy 0, policy_version 396720 (0.0013) [2024-06-15 16:25:00,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 812515328. Throughput: 0: 12151.4. Samples: 203205120. 
Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:25:00,956][1648985] Avg episode reward: [(0, '135.130')] [2024-06-15 16:25:02,949][1652491] Updated weights for policy 0, policy_version 396784 (0.0012) [2024-06-15 16:25:04,515][1652491] Updated weights for policy 0, policy_version 396819 (0.0013) [2024-06-15 16:25:05,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48605.9, 300 sec: 46986.0). Total num frames: 812810240. Throughput: 0: 11946.6. Samples: 203269120. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:25:05,956][1648985] Avg episode reward: [(0, '137.880')] [2024-06-15 16:25:06,026][1652491] Updated weights for policy 0, policy_version 396886 (0.0010) [2024-06-15 16:25:06,829][1652491] Updated weights for policy 0, policy_version 396928 (0.0014) [2024-06-15 16:25:09,298][1652491] Updated weights for policy 0, policy_version 396988 (0.0014) [2024-06-15 16:25:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 813039616. Throughput: 0: 12071.8. Samples: 203304960. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:25:10,956][1648985] Avg episode reward: [(0, '133.640')] [2024-06-15 16:25:13,466][1652491] Updated weights for policy 0, policy_version 397030 (0.0014) [2024-06-15 16:25:15,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46967.4, 300 sec: 46763.8). Total num frames: 813236224. Throughput: 0: 12037.7. Samples: 203381248. 
Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:25:15,956][1648985] Avg episode reward: [(0, '140.090')] [2024-06-15 16:25:16,139][1652491] Updated weights for policy 0, policy_version 397093 (0.0013) [2024-06-15 16:25:17,496][1652491] Updated weights for policy 0, policy_version 397155 (0.0012) [2024-06-15 16:25:19,059][1652491] Updated weights for policy 0, policy_version 397185 (0.0017) [2024-06-15 16:25:20,487][1652491] Updated weights for policy 0, policy_version 397246 (0.0100) [2024-06-15 16:25:20,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 49698.1, 300 sec: 46986.0). Total num frames: 813563904. Throughput: 0: 11855.6. Samples: 203447296. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:25:20,956][1648985] Avg episode reward: [(0, '133.630')] [2024-06-15 16:25:24,896][1652491] Updated weights for policy 0, policy_version 397306 (0.0013) [2024-06-15 16:25:25,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46967.6, 300 sec: 46763.9). Total num frames: 813694976. Throughput: 0: 11901.2. Samples: 203489792. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:25:25,955][1648985] Avg episode reward: [(0, '140.250')] [2024-06-15 16:25:27,249][1652491] Updated weights for policy 0, policy_version 397346 (0.0012) [2024-06-15 16:25:29,069][1652491] Updated weights for policy 0, policy_version 397433 (0.0013) [2024-06-15 16:25:30,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 49151.9, 300 sec: 46986.0). Total num frames: 814022656. Throughput: 0: 11832.9. Samples: 203555328. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:25:30,956][1648985] Avg episode reward: [(0, '121.000')] [2024-06-15 16:25:31,323][1652491] Updated weights for policy 0, policy_version 397491 (0.0123) [2024-06-15 16:25:34,382][1652491] Updated weights for policy 0, policy_version 397520 (0.0013) [2024-06-15 16:25:34,487][1651469] Signal inference workers to stop experience collection... 
(20700 times) [2024-06-15 16:25:34,546][1652491] InferenceWorker_p0-w0: stopping experience collection (20700 times) [2024-06-15 16:25:34,724][1651469] Signal inference workers to resume experience collection... (20700 times) [2024-06-15 16:25:34,725][1652491] InferenceWorker_p0-w0: resuming experience collection (20700 times) [2024-06-15 16:25:35,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 814219264. Throughput: 0: 11855.6. Samples: 203631104. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:25:35,956][1648985] Avg episode reward: [(0, '119.640')] [2024-06-15 16:25:37,743][1652491] Updated weights for policy 0, policy_version 397586 (0.0012) [2024-06-15 16:25:38,896][1652491] Updated weights for policy 0, policy_version 397637 (0.0012) [2024-06-15 16:25:40,137][1652491] Updated weights for policy 0, policy_version 397694 (0.0012) [2024-06-15 16:25:40,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 48605.7, 300 sec: 46985.9). Total num frames: 814481408. Throughput: 0: 11912.5. Samples: 203665408. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:25:40,956][1648985] Avg episode reward: [(0, '121.150')] [2024-06-15 16:25:41,881][1652491] Updated weights for policy 0, policy_version 397748 (0.0020) [2024-06-15 16:25:45,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 814645248. Throughput: 0: 11958.1. Samples: 203743232. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:25:45,956][1648985] Avg episode reward: [(0, '135.060')] [2024-06-15 16:25:46,517][1652491] Updated weights for policy 0, policy_version 397793 (0.0012) [2024-06-15 16:25:47,137][1652491] Updated weights for policy 0, policy_version 397824 (0.0012) [2024-06-15 16:25:49,945][1652491] Updated weights for policy 0, policy_version 397890 (0.0014) [2024-06-15 16:25:50,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 49151.9, 300 sec: 46874.9). 
Total num frames: 814940160. Throughput: 0: 11946.7. Samples: 203806720. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:25:50,956][1648985] Avg episode reward: [(0, '140.750')] [2024-06-15 16:25:52,101][1652491] Updated weights for policy 0, policy_version 397955 (0.0020) [2024-06-15 16:25:53,125][1652491] Updated weights for policy 0, policy_version 398001 (0.0023) [2024-06-15 16:25:55,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 815136768. Throughput: 0: 11924.0. Samples: 203841536. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:25:55,956][1648985] Avg episode reward: [(0, '141.730')] [2024-06-15 16:25:57,366][1652491] Updated weights for policy 0, policy_version 398050 (0.0013) [2024-06-15 16:25:57,807][1652491] Updated weights for policy 0, policy_version 398076 (0.0021) [2024-06-15 16:25:59,899][1652491] Updated weights for policy 0, policy_version 398112 (0.0026) [2024-06-15 16:26:00,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48606.0, 300 sec: 46763.8). Total num frames: 815431680. Throughput: 0: 12049.1. Samples: 203923456. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:26:00,956][1648985] Avg episode reward: [(0, '156.590')] [2024-06-15 16:26:01,057][1652491] Updated weights for policy 0, policy_version 398163 (0.0015) [2024-06-15 16:26:02,031][1652491] Updated weights for policy 0, policy_version 398207 (0.0027) [2024-06-15 16:26:03,673][1652491] Updated weights for policy 0, policy_version 398270 (0.0011) [2024-06-15 16:26:05,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 47513.5, 300 sec: 46874.9). Total num frames: 815661056. Throughput: 0: 12083.2. Samples: 203991040. 
Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:26:05,956][1648985] Avg episode reward: [(0, '147.120')] [2024-06-15 16:26:08,323][1652491] Updated weights for policy 0, policy_version 398306 (0.0042) [2024-06-15 16:26:10,360][1652491] Updated weights for policy 0, policy_version 398352 (0.0013) [2024-06-15 16:26:10,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.6, 300 sec: 47097.1). Total num frames: 815857664. Throughput: 0: 12037.7. Samples: 204031488. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:26:10,956][1648985] Avg episode reward: [(0, '157.180')] [2024-06-15 16:26:11,786][1652491] Updated weights for policy 0, policy_version 398403 (0.0012) [2024-06-15 16:26:12,560][1652491] Updated weights for policy 0, policy_version 398448 (0.0036) [2024-06-15 16:26:13,137][1651469] Signal inference workers to stop experience collection... (20750 times) [2024-06-15 16:26:13,185][1652491] InferenceWorker_p0-w0: stopping experience collection (20750 times) [2024-06-15 16:26:13,401][1651469] Signal inference workers to resume experience collection... (20750 times) [2024-06-15 16:26:13,401][1652491] InferenceWorker_p0-w0: resuming experience collection (20750 times) [2024-06-15 16:26:14,137][1652491] Updated weights for policy 0, policy_version 398503 (0.0013) [2024-06-15 16:26:15,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 49152.1, 300 sec: 47097.1). Total num frames: 816185344. Throughput: 0: 11992.2. Samples: 204094976. Policy #0 lag: (min: 14.0, avg: 152.4, max: 270.0) [2024-06-15 16:26:15,956][1648985] Avg episode reward: [(0, '171.810')] [2024-06-15 16:26:18,571][1652491] Updated weights for policy 0, policy_version 398532 (0.0014) [2024-06-15 16:26:19,800][1652491] Updated weights for policy 0, policy_version 398588 (0.0012) [2024-06-15 16:26:20,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 45875.0, 300 sec: 47097.0). Total num frames: 816316416. Throughput: 0: 12094.5. Samples: 204175360. 
Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:26:20,956][1648985] Avg episode reward: [(0, '178.580')] [2024-06-15 16:26:23,127][1652491] Updated weights for policy 0, policy_version 398672 (0.0013) [2024-06-15 16:26:24,367][1652491] Updated weights for policy 0, policy_version 398723 (0.0013) [2024-06-15 16:26:25,811][1652491] Updated weights for policy 0, policy_version 398784 (0.0012) [2024-06-15 16:26:25,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 50244.1, 300 sec: 47319.2). Total num frames: 816709632. Throughput: 0: 11878.4. Samples: 204199936. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:26:25,956][1648985] Avg episode reward: [(0, '166.960')] [2024-06-15 16:26:30,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 45329.1, 300 sec: 46763.8). Total num frames: 816742400. Throughput: 0: 11810.1. Samples: 204274688. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:26:30,956][1648985] Avg episode reward: [(0, '161.520')] [2024-06-15 16:26:31,958][1652491] Updated weights for policy 0, policy_version 398848 (0.0011) [2024-06-15 16:26:35,625][1652491] Updated weights for policy 0, policy_version 398960 (0.0173) [2024-06-15 16:26:35,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 817070080. Throughput: 0: 11673.6. Samples: 204332032. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:26:35,956][1648985] Avg episode reward: [(0, '154.850')] [2024-06-15 16:26:37,509][1652491] Updated weights for policy 0, policy_version 399033 (0.0012) [2024-06-15 16:26:40,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 817233920. Throughput: 0: 11548.4. Samples: 204361216. 
Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:26:40,956][1648985] Avg episode reward: [(0, '145.360')] [2024-06-15 16:26:44,104][1652491] Updated weights for policy 0, policy_version 399091 (0.0012) [2024-06-15 16:26:45,769][1652491] Updated weights for policy 0, policy_version 399139 (0.0013) [2024-06-15 16:26:45,955][1648985] Fps is (10 sec: 36045.5, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 817430528. Throughput: 0: 11434.7. Samples: 204438016. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:26:45,955][1648985] Avg episode reward: [(0, '120.240')] [2024-06-15 16:26:47,690][1652491] Updated weights for policy 0, policy_version 399205 (0.0014) [2024-06-15 16:26:48,760][1652491] Updated weights for policy 0, policy_version 399252 (0.0042) [2024-06-15 16:26:50,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 817758208. Throughput: 0: 11298.2. Samples: 204499456. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:26:50,955][1648985] Avg episode reward: [(0, '107.040')] [2024-06-15 16:26:54,938][1652491] Updated weights for policy 0, policy_version 399312 (0.0051) [2024-06-15 16:26:55,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45329.0, 300 sec: 46763.8). Total num frames: 817856512. Throughput: 0: 11309.5. Samples: 204540416. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:26:55,956][1648985] Avg episode reward: [(0, '107.210')] [2024-06-15 16:26:55,993][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000399360_817889280.pth... [2024-06-15 16:26:56,059][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000393888_806682624.pth [2024-06-15 16:26:56,440][1652491] Updated weights for policy 0, policy_version 399362 (0.0014) [2024-06-15 16:26:57,775][1651469] Signal inference workers to stop experience collection... 
(20800 times) [2024-06-15 16:26:57,810][1652491] InferenceWorker_p0-w0: stopping experience collection (20800 times) [2024-06-15 16:26:57,835][1652491] Updated weights for policy 0, policy_version 399420 (0.0012) [2024-06-15 16:26:57,910][1651469] Signal inference workers to resume experience collection... (20800 times) [2024-06-15 16:26:57,910][1652491] InferenceWorker_p0-w0: resuming experience collection (20800 times) [2024-06-15 16:26:58,990][1652491] Updated weights for policy 0, policy_version 399472 (0.0011) [2024-06-15 16:27:00,441][1652491] Updated weights for policy 0, policy_version 399540 (0.0015) [2024-06-15 16:27:00,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 47513.6, 300 sec: 47319.2). Total num frames: 818282496. Throughput: 0: 11343.6. Samples: 204605440. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:27:00,956][1648985] Avg episode reward: [(0, '127.870')] [2024-06-15 16:27:05,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 46652.7). Total num frames: 818282496. Throughput: 0: 11423.3. Samples: 204689408. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:27:05,956][1648985] Avg episode reward: [(0, '142.400')] [2024-06-15 16:27:06,501][1652491] Updated weights for policy 0, policy_version 399584 (0.0037) [2024-06-15 16:27:07,590][1652491] Updated weights for policy 0, policy_version 399618 (0.0059) [2024-06-15 16:27:09,595][1652491] Updated weights for policy 0, policy_version 399697 (0.0012) [2024-06-15 16:27:10,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 818675712. Throughput: 0: 11457.4. Samples: 204715520. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:27:10,956][1648985] Avg episode reward: [(0, '125.870')] [2024-06-15 16:27:11,212][1652491] Updated weights for policy 0, policy_version 399765 (0.0150) [2024-06-15 16:27:15,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 46652.7). 
Total num frames: 818806784. Throughput: 0: 11320.9. Samples: 204784128. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:27:15,956][1648985] Avg episode reward: [(0, '129.170')] [2024-06-15 16:27:17,765][1652491] Updated weights for policy 0, policy_version 399812 (0.0013) [2024-06-15 16:27:19,142][1652491] Updated weights for policy 0, policy_version 399866 (0.0118) [2024-06-15 16:27:20,955][1648985] Fps is (10 sec: 36045.2, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 819036160. Throughput: 0: 11514.3. Samples: 204850176. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:27:20,956][1648985] Avg episode reward: [(0, '137.090')] [2024-06-15 16:27:21,166][1652491] Updated weights for policy 0, policy_version 399936 (0.0014) [2024-06-15 16:27:23,160][1652491] Updated weights for policy 0, policy_version 400020 (0.0066) [2024-06-15 16:27:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 46874.9). Total num frames: 819331072. Throughput: 0: 11400.5. Samples: 204874240. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:27:25,956][1648985] Avg episode reward: [(0, '137.560')] [2024-06-15 16:27:29,533][1652491] Updated weights for policy 0, policy_version 400081 (0.0016) [2024-06-15 16:27:30,801][1652491] Updated weights for policy 0, policy_version 400129 (0.0012) [2024-06-15 16:27:30,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 819462144. Throughput: 0: 11537.1. Samples: 204957184. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:27:30,956][1648985] Avg episode reward: [(0, '141.540')] [2024-06-15 16:27:32,684][1652491] Updated weights for policy 0, policy_version 400214 (0.0094) [2024-06-15 16:27:34,168][1651469] Signal inference workers to stop experience collection... 
(20850 times) [2024-06-15 16:27:34,224][1652491] InferenceWorker_p0-w0: stopping experience collection (20850 times) [2024-06-15 16:27:34,482][1651469] Signal inference workers to resume experience collection... (20850 times) [2024-06-15 16:27:34,494][1652491] InferenceWorker_p0-w0: resuming experience collection (20850 times) [2024-06-15 16:27:35,139][1652491] Updated weights for policy 0, policy_version 400308 (0.0014) [2024-06-15 16:27:35,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 819855360. Throughput: 0: 11286.7. Samples: 205007360. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:27:35,956][1648985] Avg episode reward: [(0, '145.030')] [2024-06-15 16:27:40,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 43690.8, 300 sec: 46319.5). Total num frames: 819855360. Throughput: 0: 11320.9. Samples: 205049856. Policy #0 lag: (min: 15.0, avg: 105.2, max: 271.0) [2024-06-15 16:27:40,956][1648985] Avg episode reward: [(0, '149.500')] [2024-06-15 16:27:41,669][1652491] Updated weights for policy 0, policy_version 400336 (0.0016) [2024-06-15 16:27:43,238][1652491] Updated weights for policy 0, policy_version 400400 (0.0012) [2024-06-15 16:27:45,208][1652491] Updated weights for policy 0, policy_version 400483 (0.0126) [2024-06-15 16:27:45,958][1648985] Fps is (10 sec: 39309.0, 60 sec: 46964.9, 300 sec: 46652.2). Total num frames: 820248576. Throughput: 0: 11388.3. Samples: 205117952. Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:27:45,959][1648985] Avg episode reward: [(0, '155.470')] [2024-06-15 16:27:46,864][1652491] Updated weights for policy 0, policy_version 400560 (0.0097) [2024-06-15 16:27:50,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 46652.7). Total num frames: 820379648. Throughput: 0: 11082.0. Samples: 205188096. 
Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:27:50,956][1648985] Avg episode reward: [(0, '165.560')] [2024-06-15 16:27:53,897][1652491] Updated weights for policy 0, policy_version 400621 (0.0012) [2024-06-15 16:27:55,955][1648985] Fps is (10 sec: 36056.3, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 820609024. Throughput: 0: 11218.5. Samples: 205220352. Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:27:55,956][1648985] Avg episode reward: [(0, '175.610')] [2024-06-15 16:27:55,977][1652491] Updated weights for policy 0, policy_version 400704 (0.0011) [2024-06-15 16:27:57,638][1652491] Updated weights for policy 0, policy_version 400774 (0.0020) [2024-06-15 16:27:58,941][1652491] Updated weights for policy 0, policy_version 400829 (0.0011) [2024-06-15 16:28:00,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 46652.7). Total num frames: 820903936. Throughput: 0: 10945.4. Samples: 205276672. Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:28:00,956][1648985] Avg episode reward: [(0, '168.280')] [2024-06-15 16:28:05,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 44783.0, 300 sec: 45986.3). Total num frames: 820969472. Throughput: 0: 11195.7. Samples: 205353984. Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:28:05,955][1648985] Avg episode reward: [(0, '156.870')] [2024-06-15 16:28:06,625][1652491] Updated weights for policy 0, policy_version 400896 (0.0013) [2024-06-15 16:28:07,994][1652491] Updated weights for policy 0, policy_version 400948 (0.0011) [2024-06-15 16:28:09,653][1652491] Updated weights for policy 0, policy_version 401026 (0.0113) [2024-06-15 16:28:10,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 45329.2, 300 sec: 46763.8). Total num frames: 821395456. Throughput: 0: 11252.6. Samples: 205380608. 
Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:28:10,956][1648985] Avg episode reward: [(0, '162.920')] [2024-06-15 16:28:11,069][1652491] Updated weights for policy 0, policy_version 401081 (0.0012) [2024-06-15 16:28:15,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 821428224. Throughput: 0: 10979.5. Samples: 205451264. Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:28:15,956][1648985] Avg episode reward: [(0, '156.550')] [2024-06-15 16:28:17,105][1652491] Updated weights for policy 0, policy_version 401105 (0.0012) [2024-06-15 16:28:17,953][1651469] Signal inference workers to stop experience collection... (20900 times) [2024-06-15 16:28:18,115][1652491] InferenceWorker_p0-w0: stopping experience collection (20900 times) [2024-06-15 16:28:18,262][1651469] Signal inference workers to resume experience collection... (20900 times) [2024-06-15 16:28:18,263][1652491] InferenceWorker_p0-w0: resuming experience collection (20900 times) [2024-06-15 16:28:18,935][1652491] Updated weights for policy 0, policy_version 401170 (0.0012) [2024-06-15 16:28:20,560][1652491] Updated weights for policy 0, policy_version 401252 (0.0126) [2024-06-15 16:28:20,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 45875.3, 300 sec: 46541.7). Total num frames: 821788672. Throughput: 0: 11195.8. Samples: 205511168. Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:28:20,955][1648985] Avg episode reward: [(0, '159.740')] [2024-06-15 16:28:22,267][1652491] Updated weights for policy 0, policy_version 401328 (0.0013) [2024-06-15 16:28:25,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 46319.5). Total num frames: 821952512. Throughput: 0: 11025.0. Samples: 205545984. 
Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:28:25,956][1648985] Avg episode reward: [(0, '164.010')] [2024-06-15 16:28:27,945][1652491] Updated weights for policy 0, policy_version 401345 (0.0011) [2024-06-15 16:28:29,276][1652491] Updated weights for policy 0, policy_version 401407 (0.0013) [2024-06-15 16:28:30,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 822214656. Throughput: 0: 11242.0. Samples: 205623808. Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:28:30,956][1648985] Avg episode reward: [(0, '163.880')] [2024-06-15 16:28:31,005][1652491] Updated weights for policy 0, policy_version 401488 (0.0013) [2024-06-15 16:28:31,939][1652491] Updated weights for policy 0, policy_version 401532 (0.0015) [2024-06-15 16:28:33,348][1652491] Updated weights for policy 0, policy_version 401596 (0.0029) [2024-06-15 16:28:35,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 43690.7, 300 sec: 46652.8). Total num frames: 822476800. Throughput: 0: 11241.3. Samples: 205693952. Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:28:35,955][1648985] Avg episode reward: [(0, '150.340')] [2024-06-15 16:28:39,684][1652491] Updated weights for policy 0, policy_version 401648 (0.0013) [2024-06-15 16:28:40,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 822607872. Throughput: 0: 11400.5. Samples: 205733376. Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:28:40,955][1648985] Avg episode reward: [(0, '151.440')] [2024-06-15 16:28:41,842][1652491] Updated weights for policy 0, policy_version 401712 (0.0011) [2024-06-15 16:28:43,273][1652491] Updated weights for policy 0, policy_version 401780 (0.0087) [2024-06-15 16:28:44,767][1652491] Updated weights for policy 0, policy_version 401848 (0.0012) [2024-06-15 16:28:45,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45877.6, 300 sec: 46652.7). Total num frames: 823001088. 
Throughput: 0: 11400.5. Samples: 205789696. Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:28:45,956][1648985] Avg episode reward: [(0, '137.760')] [2024-06-15 16:28:50,868][1652491] Updated weights for policy 0, policy_version 401908 (0.0013) [2024-06-15 16:28:50,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45329.0, 300 sec: 46430.6). Total num frames: 823099392. Throughput: 0: 11377.8. Samples: 205865984. Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:28:50,956][1648985] Avg episode reward: [(0, '140.270')] [2024-06-15 16:28:52,313][1652491] Updated weights for policy 0, policy_version 401952 (0.0013) [2024-06-15 16:28:53,875][1652491] Updated weights for policy 0, policy_version 402016 (0.0024) [2024-06-15 16:28:54,000][1651469] Signal inference workers to stop experience collection... (20950 times) [2024-06-15 16:28:54,077][1652491] InferenceWorker_p0-w0: stopping experience collection (20950 times) [2024-06-15 16:28:54,312][1651469] Signal inference workers to resume experience collection... (20950 times) [2024-06-15 16:28:54,313][1652491] InferenceWorker_p0-w0: resuming experience collection (20950 times) [2024-06-15 16:28:55,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46967.5, 300 sec: 46319.5). Total num frames: 823427072. Throughput: 0: 11457.4. Samples: 205896192. Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:28:55,956][1648985] Avg episode reward: [(0, '132.980')] [2024-06-15 16:28:56,241][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000402080_823459840.pth... [2024-06-15 16:28:56,381][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000396672_812384256.pth [2024-06-15 16:28:56,658][1652491] Updated weights for policy 0, policy_version 402096 (0.0013) [2024-06-15 16:29:00,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 823525376. Throughput: 0: 11605.3. Samples: 205973504. 
Policy #0 lag: (min: 15.0, avg: 56.3, max: 239.0) [2024-06-15 16:29:00,956][1648985] Avg episode reward: [(0, '108.740')] [2024-06-15 16:29:01,608][1652491] Updated weights for policy 0, policy_version 402146 (0.0013) [2024-06-15 16:29:02,870][1652491] Updated weights for policy 0, policy_version 402192 (0.0015) [2024-06-15 16:29:04,698][1652491] Updated weights for policy 0, policy_version 402263 (0.0017) [2024-06-15 16:29:05,318][1652491] Updated weights for policy 0, policy_version 402304 (0.0016) [2024-06-15 16:29:05,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 49152.0, 300 sec: 46652.8). Total num frames: 823918592. Throughput: 0: 11673.6. Samples: 206036480. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:29:05,956][1648985] Avg episode reward: [(0, '128.450')] [2024-06-15 16:29:07,670][1652491] Updated weights for policy 0, policy_version 402360 (0.0012) [2024-06-15 16:29:10,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 824049664. Throughput: 0: 11787.4. Samples: 206076416. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:29:10,956][1648985] Avg episode reward: [(0, '158.700')] [2024-06-15 16:29:12,074][1652491] Updated weights for policy 0, policy_version 402387 (0.0013) [2024-06-15 16:29:13,832][1652491] Updated weights for policy 0, policy_version 402464 (0.0012) [2024-06-15 16:29:14,633][1652491] Updated weights for policy 0, policy_version 402495 (0.0015) [2024-06-15 16:29:15,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 49152.0, 300 sec: 46763.8). Total num frames: 824377344. Throughput: 0: 11741.9. Samples: 206152192. 
Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:29:15,956][1648985] Avg episode reward: [(0, '158.670')] [2024-06-15 16:29:16,312][1652491] Updated weights for policy 0, policy_version 402557 (0.0093) [2024-06-15 16:29:17,772][1652491] Updated weights for policy 0, policy_version 402597 (0.0027) [2024-06-15 16:29:20,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 46421.2, 300 sec: 46430.6). Total num frames: 824573952. Throughput: 0: 11912.5. Samples: 206230016. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:29:20,956][1648985] Avg episode reward: [(0, '169.850')] [2024-06-15 16:29:22,501][1652491] Updated weights for policy 0, policy_version 402645 (0.0012) [2024-06-15 16:29:23,189][1652491] Updated weights for policy 0, policy_version 402685 (0.0014) [2024-06-15 16:29:24,442][1652491] Updated weights for policy 0, policy_version 402736 (0.0012) [2024-06-15 16:29:25,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 48605.9, 300 sec: 46763.8). Total num frames: 824868864. Throughput: 0: 11901.1. Samples: 206268928. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:29:25,956][1648985] Avg episode reward: [(0, '149.650')] [2024-06-15 16:29:26,452][1652491] Updated weights for policy 0, policy_version 402784 (0.0057) [2024-06-15 16:29:27,656][1652491] Updated weights for policy 0, policy_version 402833 (0.0013) [2024-06-15 16:29:30,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 825098240. Throughput: 0: 12106.0. Samples: 206334464. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:29:30,956][1648985] Avg episode reward: [(0, '148.050')] [2024-06-15 16:29:33,670][1652491] Updated weights for policy 0, policy_version 402881 (0.0013) [2024-06-15 16:29:34,767][1652491] Updated weights for policy 0, policy_version 402939 (0.0063) [2024-06-15 16:29:35,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46967.4, 300 sec: 46541.7). Total num frames: 825294848. 
Throughput: 0: 12026.3. Samples: 206407168. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:29:35,956][1648985] Avg episode reward: [(0, '155.150')] [2024-06-15 16:29:36,277][1652491] Updated weights for policy 0, policy_version 402992 (0.0013) [2024-06-15 16:29:36,888][1651469] Signal inference workers to stop experience collection... (21000 times) [2024-06-15 16:29:36,916][1652491] InferenceWorker_p0-w0: stopping experience collection (21000 times) [2024-06-15 16:29:37,107][1651469] Signal inference workers to resume experience collection... (21000 times) [2024-06-15 16:29:37,107][1652491] InferenceWorker_p0-w0: resuming experience collection (21000 times) [2024-06-15 16:29:37,290][1652491] Updated weights for policy 0, policy_version 403028 (0.0013) [2024-06-15 16:29:39,124][1652491] Updated weights for policy 0, policy_version 403104 (0.0012) [2024-06-15 16:29:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 50244.2, 300 sec: 46652.7). Total num frames: 825622528. Throughput: 0: 12003.5. Samples: 206436352. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:29:40,956][1648985] Avg episode reward: [(0, '156.200')] [2024-06-15 16:29:45,779][1652491] Updated weights for policy 0, policy_version 403168 (0.0045) [2024-06-15 16:29:45,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 825688064. Throughput: 0: 12015.0. Samples: 206514176. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:29:45,955][1648985] Avg episode reward: [(0, '172.060')] [2024-06-15 16:29:47,643][1652491] Updated weights for policy 0, policy_version 403232 (0.0013) [2024-06-15 16:29:49,358][1652491] Updated weights for policy 0, policy_version 403297 (0.0013) [2024-06-15 16:29:50,681][1652491] Updated weights for policy 0, policy_version 403360 (0.0143) [2024-06-15 16:29:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 49698.1, 300 sec: 46430.6). Total num frames: 826081280. Throughput: 0: 11855.6. 
Samples: 206569984. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:29:50,956][1648985] Avg episode reward: [(0, '169.500')] [2024-06-15 16:29:55,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 45328.9, 300 sec: 46208.4). Total num frames: 826146816. Throughput: 0: 11844.2. Samples: 206609408. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:29:55,956][1648985] Avg episode reward: [(0, '152.360')] [2024-06-15 16:29:57,029][1652491] Updated weights for policy 0, policy_version 403408 (0.0012) [2024-06-15 16:29:59,311][1652491] Updated weights for policy 0, policy_version 403475 (0.0012) [2024-06-15 16:30:00,671][1652491] Updated weights for policy 0, policy_version 403536 (0.0012) [2024-06-15 16:30:00,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 48606.0, 300 sec: 46208.4). Total num frames: 826441728. Throughput: 0: 11616.7. Samples: 206674944. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:30:00,956][1648985] Avg episode reward: [(0, '124.360')] [2024-06-15 16:30:02,554][1652491] Updated weights for policy 0, policy_version 403616 (0.0107) [2024-06-15 16:30:05,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 826671104. Throughput: 0: 11264.0. Samples: 206736896. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:30:05,956][1648985] Avg episode reward: [(0, '119.090')] [2024-06-15 16:30:09,441][1652491] Updated weights for policy 0, policy_version 403680 (0.0039) [2024-06-15 16:30:10,281][1652491] Updated weights for policy 0, policy_version 403712 (0.0016) [2024-06-15 16:30:10,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 45875.1, 300 sec: 45986.3). Total num frames: 826802176. Throughput: 0: 11320.9. Samples: 206778368. 
Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:30:10,956][1648985] Avg episode reward: [(0, '130.100')] [2024-06-15 16:30:12,021][1652491] Updated weights for policy 0, policy_version 403771 (0.0014) [2024-06-15 16:30:13,681][1652491] Updated weights for policy 0, policy_version 403827 (0.0013) [2024-06-15 16:30:15,451][1652491] Updated weights for policy 0, policy_version 403897 (0.0012) [2024-06-15 16:30:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 827195392. Throughput: 0: 11047.8. Samples: 206831616. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:30:15,956][1648985] Avg episode reward: [(0, '144.780')] [2024-06-15 16:30:20,697][1651469] Signal inference workers to stop experience collection... (21050 times) [2024-06-15 16:30:20,738][1652491] InferenceWorker_p0-w0: stopping experience collection (21050 times) [2024-06-15 16:30:20,947][1651469] Signal inference workers to resume experience collection... (21050 times) [2024-06-15 16:30:20,948][1652491] InferenceWorker_p0-w0: resuming experience collection (21050 times) [2024-06-15 16:30:20,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 45986.3). Total num frames: 827260928. Throughput: 0: 11195.7. Samples: 206910976. Policy #0 lag: (min: 48.0, avg: 120.1, max: 304.0) [2024-06-15 16:30:20,956][1648985] Avg episode reward: [(0, '134.680')] [2024-06-15 16:30:21,436][1652491] Updated weights for policy 0, policy_version 403953 (0.0036) [2024-06-15 16:30:22,838][1652491] Updated weights for policy 0, policy_version 404016 (0.0012) [2024-06-15 16:30:25,506][1652491] Updated weights for policy 0, policy_version 404092 (0.0081) [2024-06-15 16:30:25,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45329.2, 300 sec: 45986.3). Total num frames: 827588608. Throughput: 0: 11207.1. Samples: 206940672. 
Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:30:25,956][1648985] Avg episode reward: [(0, '148.000')] [2024-06-15 16:30:27,080][1652491] Updated weights for policy 0, policy_version 404144 (0.0027) [2024-06-15 16:30:30,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 827719680. Throughput: 0: 10899.9. Samples: 207004672. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:30:30,956][1648985] Avg episode reward: [(0, '138.750')] [2024-06-15 16:30:32,644][1652491] Updated weights for policy 0, policy_version 404193 (0.0032) [2024-06-15 16:30:34,352][1652491] Updated weights for policy 0, policy_version 404259 (0.0013) [2024-06-15 16:30:35,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 44782.9, 300 sec: 45764.1). Total num frames: 827981824. Throughput: 0: 11252.6. Samples: 207076352. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:30:35,956][1648985] Avg episode reward: [(0, '158.330')] [2024-06-15 16:30:36,133][1652491] Updated weights for policy 0, policy_version 404305 (0.0031) [2024-06-15 16:30:37,230][1652491] Updated weights for policy 0, policy_version 404352 (0.0013) [2024-06-15 16:30:38,951][1652491] Updated weights for policy 0, policy_version 404406 (0.0013) [2024-06-15 16:30:40,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 43690.8, 300 sec: 46097.4). Total num frames: 828243968. Throughput: 0: 10968.2. Samples: 207102976. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:30:40,955][1648985] Avg episode reward: [(0, '138.890')] [2024-06-15 16:30:44,057][1652491] Updated weights for policy 0, policy_version 404448 (0.0136) [2024-06-15 16:30:45,790][1652491] Updated weights for policy 0, policy_version 404528 (0.0131) [2024-06-15 16:30:45,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 828473344. Throughput: 0: 11298.1. Samples: 207183360. 
Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:30:45,956][1648985] Avg episode reward: [(0, '143.140')] [2024-06-15 16:30:48,684][1652491] Updated weights for policy 0, policy_version 404592 (0.0012) [2024-06-15 16:30:50,230][1652491] Updated weights for policy 0, policy_version 404656 (0.0012) [2024-06-15 16:30:50,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 828768256. Throughput: 0: 11275.4. Samples: 207244288. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:30:50,956][1648985] Avg episode reward: [(0, '131.730')] [2024-06-15 16:30:55,568][1652491] Updated weights for policy 0, policy_version 404706 (0.0012) [2024-06-15 16:30:55,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 45329.0, 300 sec: 45541.9). Total num frames: 828866560. Throughput: 0: 11309.5. Samples: 207287296. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:30:55,956][1648985] Avg episode reward: [(0, '136.720')] [2024-06-15 16:30:56,517][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000404752_828932096.pth... [2024-06-15 16:30:56,638][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000399360_817889280.pth [2024-06-15 16:30:56,644][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000404752_828932096.pth [2024-06-15 16:30:57,386][1652491] Updated weights for policy 0, policy_version 404784 (0.0012) [2024-06-15 16:30:59,512][1652491] Updated weights for policy 0, policy_version 404816 (0.0011) [2024-06-15 16:31:00,713][1651469] Signal inference workers to stop experience collection... (21100 times) [2024-06-15 16:31:00,771][1652491] InferenceWorker_p0-w0: stopping experience collection (21100 times) [2024-06-15 16:31:00,912][1651469] Signal inference workers to resume experience collection... 
(21100 times) [2024-06-15 16:31:00,913][1652491] InferenceWorker_p0-w0: resuming experience collection (21100 times) [2024-06-15 16:31:00,915][1652491] Updated weights for policy 0, policy_version 404880 (0.0013) [2024-06-15 16:31:00,956][1648985] Fps is (10 sec: 42593.1, 60 sec: 45874.2, 300 sec: 45875.0). Total num frames: 829194240. Throughput: 0: 11650.5. Samples: 207355904. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:31:00,957][1648985] Avg episode reward: [(0, '130.680')] [2024-06-15 16:31:02,021][1652491] Updated weights for policy 0, policy_version 404928 (0.0014) [2024-06-15 16:31:05,955][1648985] Fps is (10 sec: 45876.7, 60 sec: 44236.8, 300 sec: 45653.1). Total num frames: 829325312. Throughput: 0: 11548.5. Samples: 207430656. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:31:05,955][1648985] Avg episode reward: [(0, '125.990')] [2024-06-15 16:31:07,106][1652491] Updated weights for policy 0, policy_version 405008 (0.0091) [2024-06-15 16:31:10,396][1652491] Updated weights for policy 0, policy_version 405059 (0.0013) [2024-06-15 16:31:10,955][1648985] Fps is (10 sec: 39326.7, 60 sec: 46421.4, 300 sec: 45430.9). Total num frames: 829587456. Throughput: 0: 11605.3. Samples: 207462912. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:31:10,956][1648985] Avg episode reward: [(0, '131.220')] [2024-06-15 16:31:11,728][1652491] Updated weights for policy 0, policy_version 405120 (0.0016) [2024-06-15 16:31:13,055][1652491] Updated weights for policy 0, policy_version 405176 (0.0014) [2024-06-15 16:31:15,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 829849600. Throughput: 0: 11855.7. Samples: 207538176. 
Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:31:15,956][1648985] Avg episode reward: [(0, '129.570')] [2024-06-15 16:31:16,497][1652491] Updated weights for policy 0, policy_version 405220 (0.0014) [2024-06-15 16:31:18,209][1652491] Updated weights for policy 0, policy_version 405298 (0.0115) [2024-06-15 16:31:20,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 46967.5, 300 sec: 45319.8). Total num frames: 830078976. Throughput: 0: 11946.7. Samples: 207613952. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:31:20,955][1648985] Avg episode reward: [(0, '158.080')] [2024-06-15 16:31:22,167][1652491] Updated weights for policy 0, policy_version 405346 (0.0013) [2024-06-15 16:31:23,705][1652491] Updated weights for policy 0, policy_version 405427 (0.0015) [2024-06-15 16:31:25,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 45875.2, 300 sec: 46097.3). Total num frames: 830341120. Throughput: 0: 11958.0. Samples: 207641088. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:31:25,956][1648985] Avg episode reward: [(0, '166.390')] [2024-06-15 16:31:27,293][1652491] Updated weights for policy 0, policy_version 405476 (0.0012) [2024-06-15 16:31:27,796][1652491] Updated weights for policy 0, policy_version 405504 (0.0013) [2024-06-15 16:31:29,279][1652491] Updated weights for policy 0, policy_version 405568 (0.0013) [2024-06-15 16:31:30,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.9, 300 sec: 45875.2). Total num frames: 830603264. Throughput: 0: 11821.5. Samples: 207715328. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:31:30,956][1648985] Avg episode reward: [(0, '147.110')] [2024-06-15 16:31:33,656][1652491] Updated weights for policy 0, policy_version 405633 (0.0014) [2024-06-15 16:31:34,917][1652491] Updated weights for policy 0, policy_version 405688 (0.0014) [2024-06-15 16:31:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 830865408. 
Throughput: 0: 12049.1. Samples: 207786496. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:31:35,956][1648985] Avg episode reward: [(0, '145.400')] [2024-06-15 16:31:38,690][1652491] Updated weights for policy 0, policy_version 405728 (0.0057) [2024-06-15 16:31:40,520][1651469] Signal inference workers to stop experience collection... (21150 times) [2024-06-15 16:31:40,564][1652491] InferenceWorker_p0-w0: stopping experience collection (21150 times) [2024-06-15 16:31:40,565][1652491] Updated weights for policy 0, policy_version 405794 (0.0014) [2024-06-15 16:31:40,797][1651469] Signal inference workers to resume experience collection... (21150 times) [2024-06-15 16:31:40,798][1652491] InferenceWorker_p0-w0: resuming experience collection (21150 times) [2024-06-15 16:31:40,958][1648985] Fps is (10 sec: 49135.4, 60 sec: 47510.9, 300 sec: 46319.0). Total num frames: 831094784. Throughput: 0: 11889.0. Samples: 207822336. Policy #0 lag: (min: 63.0, avg: 134.8, max: 287.0) [2024-06-15 16:31:40,959][1648985] Avg episode reward: [(0, '141.900')] [2024-06-15 16:31:44,497][1652491] Updated weights for policy 0, policy_version 405845 (0.0029) [2024-06-15 16:31:45,736][1652491] Updated weights for policy 0, policy_version 405908 (0.0013) [2024-06-15 16:31:45,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 47513.6, 300 sec: 45986.3). Total num frames: 831324160. Throughput: 0: 11924.3. Samples: 207892480. Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:31:45,956][1648985] Avg episode reward: [(0, '143.600')] [2024-06-15 16:31:48,394][1652491] Updated weights for policy 0, policy_version 405954 (0.0014) [2024-06-15 16:31:49,721][1652491] Updated weights for policy 0, policy_version 406016 (0.0020) [2024-06-15 16:31:50,955][1648985] Fps is (10 sec: 52446.2, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 831619072. Throughput: 0: 11821.5. Samples: 207962624. 
Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:31:50,956][1648985] Avg episode reward: [(0, '146.200')] [2024-06-15 16:31:54,895][1652491] Updated weights for policy 0, policy_version 406081 (0.0013) [2024-06-15 16:31:55,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 48059.8, 300 sec: 45653.0). Total num frames: 831750144. Throughput: 0: 12049.0. Samples: 208005120. Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:31:55,956][1648985] Avg episode reward: [(0, '143.520')] [2024-06-15 16:31:56,441][1652491] Updated weights for policy 0, policy_version 406147 (0.0018) [2024-06-15 16:31:59,108][1652491] Updated weights for policy 0, policy_version 406211 (0.0110) [2024-06-15 16:32:00,396][1652491] Updated weights for policy 0, policy_version 406272 (0.0013) [2024-06-15 16:32:00,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48060.8, 300 sec: 46763.8). Total num frames: 832077824. Throughput: 0: 11935.3. Samples: 208075264. Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:32:00,956][1648985] Avg episode reward: [(0, '144.400')] [2024-06-15 16:32:01,736][1652491] Updated weights for policy 0, policy_version 406333 (0.0012) [2024-06-15 16:32:05,956][1648985] Fps is (10 sec: 45873.1, 60 sec: 48059.2, 300 sec: 45875.1). Total num frames: 832208896. Throughput: 0: 11969.2. Samples: 208152576. Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:32:05,957][1648985] Avg episode reward: [(0, '122.380')] [2024-06-15 16:32:06,619][1652491] Updated weights for policy 0, policy_version 406396 (0.0020) [2024-06-15 16:32:09,508][1652491] Updated weights for policy 0, policy_version 406470 (0.0015) [2024-06-15 16:32:10,701][1652491] Updated weights for policy 0, policy_version 406528 (0.0014) [2024-06-15 16:32:10,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 49698.1, 300 sec: 46652.7). Total num frames: 832569344. Throughput: 0: 12071.8. Samples: 208184320. 
Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:32:10,956][1648985] Avg episode reward: [(0, '130.260')] [2024-06-15 16:32:12,077][1652491] Updated weights for policy 0, policy_version 406587 (0.0154) [2024-06-15 16:32:15,955][1648985] Fps is (10 sec: 49155.1, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 832700416. Throughput: 0: 12014.9. Samples: 208256000. Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:32:15,956][1648985] Avg episode reward: [(0, '148.540')] [2024-06-15 16:32:17,104][1652491] Updated weights for policy 0, policy_version 406648 (0.0014) [2024-06-15 16:32:19,551][1652491] Updated weights for policy 0, policy_version 406706 (0.0012) [2024-06-15 16:32:20,858][1651469] Signal inference workers to stop experience collection... (21200 times) [2024-06-15 16:32:20,882][1652491] Updated weights for policy 0, policy_version 406754 (0.0012) [2024-06-15 16:32:20,897][1652491] InferenceWorker_p0-w0: stopping experience collection (21200 times) [2024-06-15 16:32:20,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 49151.8, 300 sec: 46430.6). Total num frames: 833028096. Throughput: 0: 11980.8. Samples: 208325632. Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:32:20,956][1648985] Avg episode reward: [(0, '150.800')] [2024-06-15 16:32:21,027][1651469] Signal inference workers to resume experience collection... (21200 times) [2024-06-15 16:32:21,028][1652491] InferenceWorker_p0-w0: resuming experience collection (21200 times) [2024-06-15 16:32:22,104][1652491] Updated weights for policy 0, policy_version 406800 (0.0013) [2024-06-15 16:32:23,215][1652491] Updated weights for policy 0, policy_version 406845 (0.0012) [2024-06-15 16:32:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 833224704. Throughput: 0: 12027.2. Samples: 208363520. 
Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:32:25,955][1648985] Avg episode reward: [(0, '130.230')] [2024-06-15 16:32:28,375][1652491] Updated weights for policy 0, policy_version 406896 (0.0014) [2024-06-15 16:32:29,879][1652491] Updated weights for policy 0, policy_version 406928 (0.0011) [2024-06-15 16:32:30,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 833486848. Throughput: 0: 12106.0. Samples: 208437248. Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:32:30,955][1648985] Avg episode reward: [(0, '125.230')] [2024-06-15 16:32:31,402][1652491] Updated weights for policy 0, policy_version 406981 (0.0015) [2024-06-15 16:32:32,349][1652491] Updated weights for policy 0, policy_version 407037 (0.0032) [2024-06-15 16:32:33,702][1652491] Updated weights for policy 0, policy_version 407103 (0.0015) [2024-06-15 16:32:35,974][1648985] Fps is (10 sec: 52327.8, 60 sec: 48044.3, 300 sec: 47094.0). Total num frames: 833748992. Throughput: 0: 12066.7. Samples: 208505856. Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:32:35,975][1648985] Avg episode reward: [(0, '143.790')] [2024-06-15 16:32:39,414][1652491] Updated weights for policy 0, policy_version 407162 (0.0012) [2024-06-15 16:32:40,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46970.1, 300 sec: 46320.0). Total num frames: 833912832. Throughput: 0: 12026.4. Samples: 208546304. 
Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:32:40,955][1648985] Avg episode reward: [(0, '142.590')] [2024-06-15 16:32:41,494][1652491] Updated weights for policy 0, policy_version 407220 (0.0013) [2024-06-15 16:32:42,733][1652491] Updated weights for policy 0, policy_version 407251 (0.0014) [2024-06-15 16:32:43,756][1652491] Updated weights for policy 0, policy_version 407296 (0.0011) [2024-06-15 16:32:44,946][1652491] Updated weights for policy 0, policy_version 407355 (0.0032) [2024-06-15 16:32:45,955][1648985] Fps is (10 sec: 52530.4, 60 sec: 49152.0, 300 sec: 47097.1). Total num frames: 834273280. Throughput: 0: 11787.4. Samples: 208605696. Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:32:45,955][1648985] Avg episode reward: [(0, '145.890')] [2024-06-15 16:32:50,664][1652491] Updated weights for policy 0, policy_version 407412 (0.0012) [2024-06-15 16:32:50,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 834404352. Throughput: 0: 11799.0. Samples: 208683520. Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:32:50,955][1648985] Avg episode reward: [(0, '138.820')] [2024-06-15 16:32:52,796][1652491] Updated weights for policy 0, policy_version 407457 (0.0012) [2024-06-15 16:32:55,889][1652491] Updated weights for policy 0, policy_version 407554 (0.0015) [2024-06-15 16:32:55,966][1648985] Fps is (10 sec: 39277.5, 60 sec: 48597.0, 300 sec: 46651.0). Total num frames: 834666496. Throughput: 0: 11784.5. Samples: 208714752. Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:32:55,967][1648985] Avg episode reward: [(0, '152.050')] [2024-06-15 16:32:56,459][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000407584_834732032.pth... 
[2024-06-15 16:32:56,615][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000402080_823459840.pth [2024-06-15 16:33:00,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 45328.9, 300 sec: 46874.9). Total num frames: 834797568. Throughput: 0: 11753.2. Samples: 208784896. Policy #0 lag: (min: 2.0, avg: 100.4, max: 258.0) [2024-06-15 16:33:00,956][1648985] Avg episode reward: [(0, '156.190')] [2024-06-15 16:33:01,225][1652491] Updated weights for policy 0, policy_version 407620 (0.0093) [2024-06-15 16:33:02,554][1652491] Updated weights for policy 0, policy_version 407679 (0.0016) [2024-06-15 16:33:04,716][1652491] Updated weights for policy 0, policy_version 407736 (0.0033) [2024-06-15 16:33:04,964][1651469] Signal inference workers to stop experience collection... (21250 times) [2024-06-15 16:33:05,023][1652491] InferenceWorker_p0-w0: stopping experience collection (21250 times) [2024-06-15 16:33:05,210][1651469] Signal inference workers to resume experience collection... (21250 times) [2024-06-15 16:33:05,211][1652491] InferenceWorker_p0-w0: resuming experience collection (21250 times) [2024-06-15 16:33:05,713][1652491] Updated weights for policy 0, policy_version 407778 (0.0045) [2024-06-15 16:33:05,956][1648985] Fps is (10 sec: 49203.2, 60 sec: 49151.9, 300 sec: 46652.6). Total num frames: 835158016. Throughput: 0: 11707.6. Samples: 208852480. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:33:05,957][1648985] Avg episode reward: [(0, '147.510')] [2024-06-15 16:33:07,235][1652491] Updated weights for policy 0, policy_version 407834 (0.0114) [2024-06-15 16:33:10,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 835321856. Throughput: 0: 11650.9. Samples: 208887808. 
Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:33:10,955][1648985] Avg episode reward: [(0, '153.660')] [2024-06-15 16:33:11,965][1652491] Updated weights for policy 0, policy_version 407888 (0.0015) [2024-06-15 16:33:14,599][1652491] Updated weights for policy 0, policy_version 407954 (0.0015) [2024-06-15 16:33:15,831][1652491] Updated weights for policy 0, policy_version 408001 (0.0013) [2024-06-15 16:33:15,955][1648985] Fps is (10 sec: 42601.6, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 835584000. Throughput: 0: 11650.8. Samples: 208961536. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:33:15,956][1648985] Avg episode reward: [(0, '151.490')] [2024-06-15 16:33:18,360][1652491] Updated weights for policy 0, policy_version 408065 (0.0044) [2024-06-15 16:33:19,392][1652491] Updated weights for policy 0, policy_version 408126 (0.0021) [2024-06-15 16:33:20,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46967.6, 300 sec: 47097.1). Total num frames: 835846144. Throughput: 0: 11667.2. Samples: 209030656. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:33:20,956][1648985] Avg episode reward: [(0, '140.370')] [2024-06-15 16:33:24,421][1652491] Updated weights for policy 0, policy_version 408190 (0.0134) [2024-06-15 16:33:25,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 836042752. Throughput: 0: 11673.6. Samples: 209071616. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:33:25,956][1648985] Avg episode reward: [(0, '139.420')] [2024-06-15 16:33:26,390][1652491] Updated weights for policy 0, policy_version 408252 (0.0019) [2024-06-15 16:33:27,927][1652491] Updated weights for policy 0, policy_version 408304 (0.0022) [2024-06-15 16:33:29,809][1652491] Updated weights for policy 0, policy_version 408336 (0.0012) [2024-06-15 16:33:30,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 836370432. 
Throughput: 0: 11821.5. Samples: 209137664. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:33:30,956][1648985] Avg episode reward: [(0, '145.640')] [2024-06-15 16:33:34,483][1652491] Updated weights for policy 0, policy_version 408389 (0.0013) [2024-06-15 16:33:35,839][1652491] Updated weights for policy 0, policy_version 408449 (0.0012) [2024-06-15 16:33:35,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45890.0, 300 sec: 47097.1). Total num frames: 836501504. Throughput: 0: 11776.0. Samples: 209213440. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:33:35,956][1648985] Avg episode reward: [(0, '144.970')] [2024-06-15 16:33:37,274][1652491] Updated weights for policy 0, policy_version 408512 (0.0014) [2024-06-15 16:33:38,687][1652491] Updated weights for policy 0, policy_version 408574 (0.0013) [2024-06-15 16:33:40,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48059.6, 300 sec: 46763.8). Total num frames: 836796416. Throughput: 0: 11778.9. Samples: 209244672. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:33:40,956][1648985] Avg episode reward: [(0, '140.100')] [2024-06-15 16:33:41,540][1652491] Updated weights for policy 0, policy_version 408629 (0.0013) [2024-06-15 16:33:45,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 46874.9). Total num frames: 836927488. Throughput: 0: 11992.2. Samples: 209324544. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:33:45,956][1648985] Avg episode reward: [(0, '148.220')] [2024-06-15 16:33:46,225][1652491] Updated weights for policy 0, policy_version 408688 (0.0118) [2024-06-15 16:33:47,433][1652491] Updated weights for policy 0, policy_version 408736 (0.0013) [2024-06-15 16:33:47,556][1651469] Signal inference workers to stop experience collection... 
(21300 times) [2024-06-15 16:33:47,608][1652491] InferenceWorker_p0-w0: stopping experience collection (21300 times) [2024-06-15 16:33:47,850][1651469] Signal inference workers to resume experience collection... (21300 times) [2024-06-15 16:33:47,853][1652491] InferenceWorker_p0-w0: resuming experience collection (21300 times) [2024-06-15 16:33:49,562][1652491] Updated weights for policy 0, policy_version 408816 (0.0012) [2024-06-15 16:33:50,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 837287936. Throughput: 0: 11833.1. Samples: 209384960. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:33:50,955][1648985] Avg episode reward: [(0, '140.240')] [2024-06-15 16:33:52,719][1652491] Updated weights for policy 0, policy_version 408889 (0.0013) [2024-06-15 16:33:55,955][1648985] Fps is (10 sec: 49150.6, 60 sec: 45883.6, 300 sec: 47097.0). Total num frames: 837419008. Throughput: 0: 11901.1. Samples: 209423360. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:33:55,956][1648985] Avg episode reward: [(0, '127.760')] [2024-06-15 16:33:57,237][1652491] Updated weights for policy 0, policy_version 408949 (0.0118) [2024-06-15 16:33:58,743][1652491] Updated weights for policy 0, policy_version 408979 (0.0014) [2024-06-15 16:34:00,227][1652491] Updated weights for policy 0, policy_version 409046 (0.0013) [2024-06-15 16:34:00,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 49698.2, 300 sec: 46986.0). Total num frames: 837779456. Throughput: 0: 11821.5. Samples: 209493504. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:34:00,956][1648985] Avg episode reward: [(0, '122.790')] [2024-06-15 16:34:02,609][1652491] Updated weights for policy 0, policy_version 409104 (0.0016) [2024-06-15 16:34:03,721][1652491] Updated weights for policy 0, policy_version 409151 (0.0011) [2024-06-15 16:34:05,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 46421.9, 300 sec: 47097.1). 
Total num frames: 837943296. Throughput: 0: 12083.2. Samples: 209574400. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:34:05,956][1648985] Avg episode reward: [(0, '133.520')] [2024-06-15 16:34:09,057][1652491] Updated weights for policy 0, policy_version 409217 (0.0014) [2024-06-15 16:34:10,768][1652491] Updated weights for policy 0, policy_version 409296 (0.0123) [2024-06-15 16:34:10,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48605.8, 300 sec: 46986.0). Total num frames: 838238208. Throughput: 0: 11969.4. Samples: 209610240. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:34:10,956][1648985] Avg episode reward: [(0, '142.510')] [2024-06-15 16:34:11,696][1652491] Updated weights for policy 0, policy_version 409334 (0.0026) [2024-06-15 16:34:13,857][1652491] Updated weights for policy 0, policy_version 409377 (0.0013) [2024-06-15 16:34:15,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 838467584. Throughput: 0: 11867.0. Samples: 209671680. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:34:15,956][1648985] Avg episode reward: [(0, '138.050')] [2024-06-15 16:34:18,935][1652491] Updated weights for policy 0, policy_version 409431 (0.0015) [2024-06-15 16:34:20,428][1652491] Updated weights for policy 0, policy_version 409488 (0.0029) [2024-06-15 16:34:20,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46967.5, 300 sec: 46763.9). Total num frames: 838664192. Throughput: 0: 11992.2. Samples: 209753088. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:34:20,956][1648985] Avg episode reward: [(0, '142.420')] [2024-06-15 16:34:21,851][1652491] Updated weights for policy 0, policy_version 409552 (0.0017) [2024-06-15 16:34:24,009][1652491] Updated weights for policy 0, policy_version 409618 (0.0013) [2024-06-15 16:34:25,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 49152.0, 300 sec: 47097.1). Total num frames: 838991872. Throughput: 0: 12003.6. 
Samples: 209784832. Policy #0 lag: (min: 31.0, avg: 150.9, max: 301.0) [2024-06-15 16:34:25,956][1648985] Avg episode reward: [(0, '135.360')] [2024-06-15 16:34:29,157][1652491] Updated weights for policy 0, policy_version 409681 (0.0014) [2024-06-15 16:34:29,641][1651469] Signal inference workers to stop experience collection... (21350 times) [2024-06-15 16:34:29,697][1652491] InferenceWorker_p0-w0: stopping experience collection (21350 times) [2024-06-15 16:34:29,895][1651469] Signal inference workers to resume experience collection... (21350 times) [2024-06-15 16:34:29,896][1652491] InferenceWorker_p0-w0: resuming experience collection (21350 times) [2024-06-15 16:34:30,145][1652491] Updated weights for policy 0, policy_version 409724 (0.0013) [2024-06-15 16:34:30,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 839122944. Throughput: 0: 12049.1. Samples: 209866752. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:34:30,956][1648985] Avg episode reward: [(0, '118.970')] [2024-06-15 16:34:32,005][1652491] Updated weights for policy 0, policy_version 409778 (0.0013) [2024-06-15 16:34:33,433][1652491] Updated weights for policy 0, policy_version 409853 (0.0017) [2024-06-15 16:34:35,276][1652491] Updated weights for policy 0, policy_version 409913 (0.0014) [2024-06-15 16:34:35,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 839516160. Throughput: 0: 12185.6. Samples: 209933312. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:34:35,956][1648985] Avg episode reward: [(0, '135.610')] [2024-06-15 16:34:40,638][1652491] Updated weights for policy 0, policy_version 409968 (0.0018) [2024-06-15 16:34:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.7, 300 sec: 47319.2). Total num frames: 839647232. Throughput: 0: 12356.3. Samples: 209979392. 
Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:34:40,956][1648985] Avg episode reward: [(0, '136.800')] [2024-06-15 16:34:42,776][1652491] Updated weights for policy 0, policy_version 410036 (0.0013) [2024-06-15 16:34:44,440][1652491] Updated weights for policy 0, policy_version 410108 (0.0098) [2024-06-15 16:34:45,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 50790.3, 300 sec: 47097.1). Total num frames: 839974912. Throughput: 0: 12071.8. Samples: 210036736. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:34:45,956][1648985] Avg episode reward: [(0, '139.400')] [2024-06-15 16:34:46,209][1652491] Updated weights for policy 0, policy_version 410168 (0.0013) [2024-06-15 16:34:50,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 840040448. Throughput: 0: 12140.1. Samples: 210120704. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:34:50,956][1648985] Avg episode reward: [(0, '138.270')] [2024-06-15 16:34:51,697][1652491] Updated weights for policy 0, policy_version 410210 (0.0042) [2024-06-15 16:34:52,914][1652491] Updated weights for policy 0, policy_version 410257 (0.0011) [2024-06-15 16:34:55,172][1652491] Updated weights for policy 0, policy_version 410352 (0.0018) [2024-06-15 16:34:55,955][1648985] Fps is (10 sec: 45873.9, 60 sec: 50244.2, 300 sec: 47430.2). Total num frames: 840433664. Throughput: 0: 11992.1. Samples: 210149888. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:34:55,956][1648985] Avg episode reward: [(0, '140.990')] [2024-06-15 16:34:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000410368_840433664.pth... 
[2024-06-15 16:34:56,175][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000404752_828932096.pth [2024-06-15 16:34:57,173][1652491] Updated weights for policy 0, policy_version 410416 (0.0015) [2024-06-15 16:35:00,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 46421.3, 300 sec: 47097.0). Total num frames: 840564736. Throughput: 0: 12060.4. Samples: 210214400. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:35:00,956][1648985] Avg episode reward: [(0, '142.450')] [2024-06-15 16:35:03,935][1652491] Updated weights for policy 0, policy_version 410496 (0.0014) [2024-06-15 16:35:05,888][1652491] Updated weights for policy 0, policy_version 410550 (0.0012) [2024-06-15 16:35:05,955][1648985] Fps is (10 sec: 36045.8, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 840794112. Throughput: 0: 11719.1. Samples: 210280448. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:35:05,956][1648985] Avg episode reward: [(0, '142.110')] [2024-06-15 16:35:07,227][1652491] Updated weights for policy 0, policy_version 410608 (0.0023) [2024-06-15 16:35:08,215][1651469] Signal inference workers to stop experience collection... (21400 times) [2024-06-15 16:35:08,256][1652491] Updated weights for policy 0, policy_version 410643 (0.0012) [2024-06-15 16:35:08,271][1652491] InferenceWorker_p0-w0: stopping experience collection (21400 times) [2024-06-15 16:35:08,417][1651469] Signal inference workers to resume experience collection... (21400 times) [2024-06-15 16:35:08,417][1652491] InferenceWorker_p0-w0: resuming experience collection (21400 times) [2024-06-15 16:35:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 47097.0). Total num frames: 841089024. Throughput: 0: 11673.6. Samples: 210310144. 
Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:35:10,956][1648985] Avg episode reward: [(0, '153.940')] [2024-06-15 16:35:14,314][1652491] Updated weights for policy 0, policy_version 410689 (0.0012) [2024-06-15 16:35:15,830][1652491] Updated weights for policy 0, policy_version 410752 (0.0014) [2024-06-15 16:35:15,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 841220096. Throughput: 0: 11650.8. Samples: 210391040. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:35:15,956][1648985] Avg episode reward: [(0, '135.770')] [2024-06-15 16:35:17,647][1652491] Updated weights for policy 0, policy_version 410820 (0.0013) [2024-06-15 16:35:18,898][1652491] Updated weights for policy 0, policy_version 410872 (0.0025) [2024-06-15 16:35:20,319][1652491] Updated weights for policy 0, policy_version 410928 (0.0011) [2024-06-15 16:35:20,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 49151.8, 300 sec: 47541.3). Total num frames: 841613312. Throughput: 0: 11457.4. Samples: 210448896. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:35:20,956][1648985] Avg episode reward: [(0, '131.300')] [2024-06-15 16:35:25,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 841613312. Throughput: 0: 11343.6. Samples: 210489856. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:35:25,956][1648985] Avg episode reward: [(0, '125.160')] [2024-06-15 16:35:27,032][1652491] Updated weights for policy 0, policy_version 410976 (0.0014) [2024-06-15 16:35:28,247][1652491] Updated weights for policy 0, policy_version 411024 (0.0013) [2024-06-15 16:35:30,120][1652491] Updated weights for policy 0, policy_version 411091 (0.0013) [2024-06-15 16:35:30,955][1648985] Fps is (10 sec: 36045.6, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 841973760. Throughput: 0: 11423.3. Samples: 210550784. 
Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:35:30,956][1648985] Avg episode reward: [(0, '136.760')] [2024-06-15 16:35:31,534][1652491] Updated weights for policy 0, policy_version 411152 (0.0013) [2024-06-15 16:35:35,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 47097.0). Total num frames: 842137600. Throughput: 0: 11047.8. Samples: 210617856. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:35:35,956][1648985] Avg episode reward: [(0, '133.490')] [2024-06-15 16:35:38,692][1652491] Updated weights for policy 0, policy_version 411216 (0.0014) [2024-06-15 16:35:40,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 44782.8, 300 sec: 46986.0). Total num frames: 842334208. Throughput: 0: 11218.5. Samples: 210654720. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:35:40,956][1648985] Avg episode reward: [(0, '147.710')] [2024-06-15 16:35:41,346][1652491] Updated weights for policy 0, policy_version 411318 (0.0179) [2024-06-15 16:35:42,656][1652491] Updated weights for policy 0, policy_version 411360 (0.0029) [2024-06-15 16:35:44,275][1652491] Updated weights for policy 0, policy_version 411424 (0.0022) [2024-06-15 16:35:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 44782.9, 300 sec: 47097.1). Total num frames: 842661888. Throughput: 0: 10877.2. Samples: 210703872. Policy #0 lag: (min: 4.0, avg: 75.0, max: 260.0) [2024-06-15 16:35:45,956][1648985] Avg episode reward: [(0, '138.900')] [2024-06-15 16:35:50,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 44236.7, 300 sec: 46874.9). Total num frames: 842694656. Throughput: 0: 11173.0. Samples: 210783232. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:35:50,956][1648985] Avg episode reward: [(0, '132.650')] [2024-06-15 16:35:51,273][1652491] Updated weights for policy 0, policy_version 411488 (0.0026) [2024-06-15 16:35:52,506][1651469] Signal inference workers to stop experience collection... 
(21450 times) [2024-06-15 16:35:52,576][1652491] InferenceWorker_p0-w0: stopping experience collection (21450 times) [2024-06-15 16:35:52,788][1651469] Signal inference workers to resume experience collection... (21450 times) [2024-06-15 16:35:52,789][1652491] InferenceWorker_p0-w0: resuming experience collection (21450 times) [2024-06-15 16:35:52,974][1652491] Updated weights for policy 0, policy_version 411555 (0.0012) [2024-06-15 16:35:54,563][1652491] Updated weights for policy 0, policy_version 411606 (0.0012) [2024-06-15 16:35:55,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 44237.0, 300 sec: 47097.3). Total num frames: 843087872. Throughput: 0: 11036.5. Samples: 210806784. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:35:55,956][1648985] Avg episode reward: [(0, '139.920')] [2024-06-15 16:35:56,125][1652491] Updated weights for policy 0, policy_version 411666 (0.0010) [2024-06-15 16:35:57,179][1652491] Updated weights for policy 0, policy_version 411712 (0.0011) [2024-06-15 16:36:00,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 43690.8, 300 sec: 46986.0). Total num frames: 843186176. Throughput: 0: 10808.9. Samples: 210877440. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:36:00,955][1648985] Avg episode reward: [(0, '171.460')] [2024-06-15 16:36:03,569][1652491] Updated weights for policy 0, policy_version 411763 (0.0013) [2024-06-15 16:36:04,842][1652491] Updated weights for policy 0, policy_version 411831 (0.0014) [2024-06-15 16:36:05,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 44782.9, 300 sec: 47097.1). Total num frames: 843481088. Throughput: 0: 11070.6. Samples: 210947072. 
Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:36:05,956][1648985] Avg episode reward: [(0, '179.890')] [2024-06-15 16:36:06,372][1652491] Updated weights for policy 0, policy_version 411872 (0.0024) [2024-06-15 16:36:08,428][1652491] Updated weights for policy 0, policy_version 411952 (0.0011) [2024-06-15 16:36:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43690.8, 300 sec: 46986.0). Total num frames: 843710464. Throughput: 0: 10695.1. Samples: 210971136. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:36:10,955][1648985] Avg episode reward: [(0, '182.610')] [2024-06-15 16:36:13,819][1652491] Updated weights for policy 0, policy_version 411989 (0.0125) [2024-06-15 16:36:15,363][1652491] Updated weights for policy 0, policy_version 412035 (0.0014) [2024-06-15 16:36:15,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 44236.9, 300 sec: 46763.8). Total num frames: 843874304. Throughput: 0: 11047.8. Samples: 211047936. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:36:15,956][1648985] Avg episode reward: [(0, '158.450')] [2024-06-15 16:36:17,712][1652491] Updated weights for policy 0, policy_version 412097 (0.0012) [2024-06-15 16:36:19,238][1652491] Updated weights for policy 0, policy_version 412160 (0.0140) [2024-06-15 16:36:20,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 43144.7, 300 sec: 46986.0). Total num frames: 844201984. Throughput: 0: 10740.6. Samples: 211101184. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:36:20,955][1648985] Avg episode reward: [(0, '147.980')] [2024-06-15 16:36:21,010][1652491] Updated weights for policy 0, policy_version 412219 (0.0141) [2024-06-15 16:36:25,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 844267520. Throughput: 0: 10865.8. Samples: 211143680. 
Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:36:25,956][1648985] Avg episode reward: [(0, '134.500')] [2024-06-15 16:36:27,817][1652491] Updated weights for policy 0, policy_version 412304 (0.0015) [2024-06-15 16:36:29,692][1652491] Updated weights for policy 0, policy_version 412358 (0.0014) [2024-06-15 16:36:30,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 46541.7). Total num frames: 844595200. Throughput: 0: 11207.1. Samples: 211208192. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:36:30,956][1648985] Avg episode reward: [(0, '123.490')] [2024-06-15 16:36:31,035][1652491] Updated weights for policy 0, policy_version 412416 (0.0016) [2024-06-15 16:36:32,532][1652491] Updated weights for policy 0, policy_version 412472 (0.0014) [2024-06-15 16:36:35,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 46320.0). Total num frames: 844759040. Throughput: 0: 11138.9. Samples: 211284480. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:36:35,955][1648985] Avg episode reward: [(0, '139.780')] [2024-06-15 16:36:36,919][1651469] Signal inference workers to stop experience collection... (21500 times) [2024-06-15 16:36:37,115][1652491] InferenceWorker_p0-w0: stopping experience collection (21500 times) [2024-06-15 16:36:37,130][1651469] Signal inference workers to resume experience collection... (21500 times) [2024-06-15 16:36:37,142][1652491] InferenceWorker_p0-w0: resuming experience collection (21500 times) [2024-06-15 16:36:37,737][1652491] Updated weights for policy 0, policy_version 412544 (0.0112) [2024-06-15 16:36:40,004][1652491] Updated weights for policy 0, policy_version 412607 (0.0017) [2024-06-15 16:36:40,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 845021184. Throughput: 0: 11332.3. Samples: 211316736. 
Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:36:40,956][1648985] Avg episode reward: [(0, '139.820')] [2024-06-15 16:36:42,755][1652491] Updated weights for policy 0, policy_version 412688 (0.0126) [2024-06-15 16:36:43,960][1652491] Updated weights for policy 0, policy_version 412733 (0.0037) [2024-06-15 16:36:45,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 46319.5). Total num frames: 845283328. Throughput: 0: 11104.7. Samples: 211377152. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:36:45,956][1648985] Avg episode reward: [(0, '123.960')] [2024-06-15 16:36:49,208][1652491] Updated weights for policy 0, policy_version 412791 (0.0012) [2024-06-15 16:36:50,549][1652491] Updated weights for policy 0, policy_version 412834 (0.0040) [2024-06-15 16:36:50,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 845512704. Throughput: 0: 11275.4. Samples: 211454464. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:36:50,955][1648985] Avg episode reward: [(0, '125.160')] [2024-06-15 16:36:52,926][1652491] Updated weights for policy 0, policy_version 412896 (0.0129) [2024-06-15 16:36:55,008][1652491] Updated weights for policy 0, policy_version 412976 (0.0013) [2024-06-15 16:36:55,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45329.0, 300 sec: 46541.6). Total num frames: 845807616. Throughput: 0: 11400.5. Samples: 211484160. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:36:55,956][1648985] Avg episode reward: [(0, '137.640')] [2024-06-15 16:36:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000412992_845807616.pth... 
[2024-06-15 16:36:56,052][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000407584_834732032.pth [2024-06-15 16:36:59,792][1652491] Updated weights for policy 0, policy_version 413024 (0.0034) [2024-06-15 16:37:00,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.1, 300 sec: 46541.8). Total num frames: 845938688. Throughput: 0: 11423.3. Samples: 211561984. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:37:00,956][1648985] Avg episode reward: [(0, '135.510')] [2024-06-15 16:37:01,533][1652491] Updated weights for policy 0, policy_version 413073 (0.0014) [2024-06-15 16:37:04,354][1652491] Updated weights for policy 0, policy_version 413153 (0.0014) [2024-06-15 16:37:05,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 846233600. Throughput: 0: 11571.2. Samples: 211621888. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:37:05,956][1648985] Avg episode reward: [(0, '143.080')] [2024-06-15 16:37:06,133][1652491] Updated weights for policy 0, policy_version 413217 (0.0016) [2024-06-15 16:37:10,100][1652491] Updated weights for policy 0, policy_version 413251 (0.0065) [2024-06-15 16:37:10,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 44782.7, 300 sec: 46430.6). Total num frames: 846397440. Throughput: 0: 11548.4. Samples: 211663360. Policy #0 lag: (min: 15.0, avg: 73.6, max: 271.0) [2024-06-15 16:37:10,956][1648985] Avg episode reward: [(0, '156.160')] [2024-06-15 16:37:12,340][1652491] Updated weights for policy 0, policy_version 413315 (0.0161) [2024-06-15 16:37:15,033][1652491] Updated weights for policy 0, policy_version 413392 (0.0014) [2024-06-15 16:37:15,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.4, 300 sec: 46319.5). Total num frames: 846692352. Throughput: 0: 11719.1. Samples: 211735552. 
Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:37:15,956][1648985] Avg episode reward: [(0, '152.170')] [2024-06-15 16:37:16,518][1652491] Updated weights for policy 0, policy_version 413456 (0.0136) [2024-06-15 16:37:17,098][1651469] Signal inference workers to stop experience collection... (21550 times) [2024-06-15 16:37:17,142][1652491] InferenceWorker_p0-w0: stopping experience collection (21550 times) [2024-06-15 16:37:17,384][1651469] Signal inference workers to resume experience collection... (21550 times) [2024-06-15 16:37:17,384][1652491] InferenceWorker_p0-w0: resuming experience collection (21550 times) [2024-06-15 16:37:20,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 846856192. Throughput: 0: 11628.1. Samples: 211807744. Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:37:20,956][1648985] Avg episode reward: [(0, '154.760')] [2024-06-15 16:37:21,477][1652491] Updated weights for policy 0, policy_version 413520 (0.0066) [2024-06-15 16:37:23,494][1652491] Updated weights for policy 0, policy_version 413571 (0.0013) [2024-06-15 16:37:24,911][1652491] Updated weights for policy 0, policy_version 413632 (0.0021) [2024-06-15 16:37:25,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 46319.5). Total num frames: 847151104. Throughput: 0: 11662.2. Samples: 211841536. Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:37:25,956][1648985] Avg episode reward: [(0, '142.070')] [2024-06-15 16:37:26,967][1652491] Updated weights for policy 0, policy_version 413696 (0.0081) [2024-06-15 16:37:30,955][1648985] Fps is (10 sec: 52427.2, 60 sec: 46421.1, 300 sec: 46211.4). Total num frames: 847380480. Throughput: 0: 11684.9. Samples: 211902976. 
Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:37:30,956][1648985] Avg episode reward: [(0, '143.880')] [2024-06-15 16:37:32,706][1652491] Updated weights for policy 0, policy_version 413780 (0.0014) [2024-06-15 16:37:35,653][1652491] Updated weights for policy 0, policy_version 413841 (0.0013) [2024-06-15 16:37:35,958][1648985] Fps is (10 sec: 42584.9, 60 sec: 46964.9, 300 sec: 46319.0). Total num frames: 847577088. Throughput: 0: 11718.3. Samples: 211981824. Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:37:35,959][1648985] Avg episode reward: [(0, '129.270')] [2024-06-15 16:37:36,953][1652491] Updated weights for policy 0, policy_version 413891 (0.0011) [2024-06-15 16:37:38,140][1652491] Updated weights for policy 0, policy_version 413946 (0.0015) [2024-06-15 16:37:40,253][1652491] Updated weights for policy 0, policy_version 414013 (0.0099) [2024-06-15 16:37:40,955][1648985] Fps is (10 sec: 52430.8, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 847904768. Throughput: 0: 11719.2. Samples: 212011520. Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:37:40,956][1648985] Avg episode reward: [(0, '129.260')] [2024-06-15 16:37:45,128][1652491] Updated weights for policy 0, policy_version 414079 (0.0012) [2024-06-15 16:37:45,955][1648985] Fps is (10 sec: 45889.9, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 848035840. Throughput: 0: 11434.7. Samples: 212076544. Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:37:45,956][1648985] Avg episode reward: [(0, '126.940')] [2024-06-15 16:37:47,512][1652491] Updated weights for policy 0, policy_version 414128 (0.0012) [2024-06-15 16:37:49,474][1652491] Updated weights for policy 0, policy_version 414192 (0.0013) [2024-06-15 16:37:50,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46421.3, 300 sec: 46210.2). Total num frames: 848297984. Throughput: 0: 11594.0. Samples: 212143616. 
Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:37:50,955][1648985] Avg episode reward: [(0, '125.000')] [2024-06-15 16:37:52,086][1652491] Updated weights for policy 0, policy_version 414256 (0.0016) [2024-06-15 16:37:55,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 43690.8, 300 sec: 46208.5). Total num frames: 848429056. Throughput: 0: 11412.0. Samples: 212176896. Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:37:55,956][1648985] Avg episode reward: [(0, '134.960')] [2024-06-15 16:37:56,615][1652491] Updated weights for policy 0, policy_version 414307 (0.0011) [2024-06-15 16:37:59,279][1652491] Updated weights for policy 0, policy_version 414354 (0.0016) [2024-06-15 16:38:00,775][1652491] Updated weights for policy 0, policy_version 414403 (0.0011) [2024-06-15 16:38:00,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.3, 300 sec: 45875.3). Total num frames: 848691200. Throughput: 0: 11343.6. Samples: 212246016. Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:38:00,956][1648985] Avg episode reward: [(0, '154.930')] [2024-06-15 16:38:01,915][1652491] Updated weights for policy 0, policy_version 414456 (0.0015) [2024-06-15 16:38:03,660][1651469] Signal inference workers to stop experience collection... (21600 times) [2024-06-15 16:38:03,684][1652491] InferenceWorker_p0-w0: stopping experience collection (21600 times) [2024-06-15 16:38:03,892][1651469] Signal inference workers to resume experience collection... (21600 times) [2024-06-15 16:38:03,893][1652491] InferenceWorker_p0-w0: resuming experience collection (21600 times) [2024-06-15 16:38:04,028][1652491] Updated weights for policy 0, policy_version 414518 (0.0014) [2024-06-15 16:38:05,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 848953344. Throughput: 0: 11218.5. Samples: 212312576. 
Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:38:05,956][1648985] Avg episode reward: [(0, '144.920')] [2024-06-15 16:38:08,767][1652491] Updated weights for policy 0, policy_version 414588 (0.0012) [2024-06-15 16:38:10,762][1652491] Updated weights for policy 0, policy_version 414624 (0.0013) [2024-06-15 16:38:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.3, 300 sec: 45986.3). Total num frames: 849149952. Throughput: 0: 11184.4. Samples: 212344832. Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:38:10,956][1648985] Avg episode reward: [(0, '132.570')] [2024-06-15 16:38:13,035][1652491] Updated weights for policy 0, policy_version 414714 (0.0015) [2024-06-15 16:38:15,427][1652491] Updated weights for policy 0, policy_version 414753 (0.0016) [2024-06-15 16:38:15,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 849477632. Throughput: 0: 11366.5. Samples: 212414464. Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:38:15,956][1648985] Avg episode reward: [(0, '131.060')] [2024-06-15 16:38:19,540][1652491] Updated weights for policy 0, policy_version 414816 (0.0013) [2024-06-15 16:38:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.3, 300 sec: 45986.3). Total num frames: 849608704. Throughput: 0: 11287.6. Samples: 212489728. Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:38:20,956][1648985] Avg episode reward: [(0, '135.450')] [2024-06-15 16:38:22,479][1652491] Updated weights for policy 0, policy_version 414880 (0.0011) [2024-06-15 16:38:24,063][1652491] Updated weights for policy 0, policy_version 414945 (0.0012) [2024-06-15 16:38:25,705][1652491] Updated weights for policy 0, policy_version 414977 (0.0038) [2024-06-15 16:38:25,955][1648985] Fps is (10 sec: 39320.4, 60 sec: 45328.8, 300 sec: 45764.1). Total num frames: 849870848. Throughput: 0: 11263.9. Samples: 212518400. 
Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:38:25,956][1648985] Avg episode reward: [(0, '121.040')] [2024-06-15 16:38:26,844][1652491] Updated weights for policy 0, policy_version 415025 (0.0017) [2024-06-15 16:38:30,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 44783.2, 300 sec: 45986.3). Total num frames: 850067456. Throughput: 0: 11434.7. Samples: 212591104. Policy #0 lag: (min: 15.0, avg: 123.3, max: 271.0) [2024-06-15 16:38:30,956][1648985] Avg episode reward: [(0, '118.780')] [2024-06-15 16:38:31,338][1652491] Updated weights for policy 0, policy_version 415088 (0.0014) [2024-06-15 16:38:33,154][1652491] Updated weights for policy 0, policy_version 415120 (0.0011) [2024-06-15 16:38:34,951][1652491] Updated weights for policy 0, policy_version 415186 (0.0012) [2024-06-15 16:38:35,957][1648985] Fps is (10 sec: 52418.0, 60 sec: 46968.1, 300 sec: 46097.0). Total num frames: 850395136. Throughput: 0: 11456.8. Samples: 212659200. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:38:35,958][1648985] Avg episode reward: [(0, '119.140')] [2024-06-15 16:38:36,943][1652491] Updated weights for policy 0, policy_version 415235 (0.0014) [2024-06-15 16:38:38,047][1652491] Updated weights for policy 0, policy_version 415285 (0.0013) [2024-06-15 16:38:40,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 850558976. Throughput: 0: 11616.7. Samples: 212699648. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:38:40,955][1648985] Avg episode reward: [(0, '154.960')] [2024-06-15 16:38:41,914][1652491] Updated weights for policy 0, policy_version 415357 (0.0162) [2024-06-15 16:38:45,485][1652491] Updated weights for policy 0, policy_version 415424 (0.0012) [2024-06-15 16:38:45,955][1648985] Fps is (10 sec: 42609.0, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 850821120. Throughput: 0: 11639.5. Samples: 212769792. 
Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:38:45,955][1648985] Avg episode reward: [(0, '161.950')] [2024-06-15 16:38:46,967][1652491] Updated weights for policy 0, policy_version 415483 (0.0024) [2024-06-15 16:38:47,851][1651469] Signal inference workers to stop experience collection... (21650 times) [2024-06-15 16:38:47,880][1652491] InferenceWorker_p0-w0: stopping experience collection (21650 times) [2024-06-15 16:38:48,140][1651469] Signal inference workers to resume experience collection... (21650 times) [2024-06-15 16:38:48,141][1652491] InferenceWorker_p0-w0: resuming experience collection (21650 times) [2024-06-15 16:38:49,166][1652491] Updated weights for policy 0, policy_version 415550 (0.0076) [2024-06-15 16:38:50,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 851050496. Throughput: 0: 11571.2. Samples: 212833280. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:38:50,956][1648985] Avg episode reward: [(0, '152.980')] [2024-06-15 16:38:53,352][1652491] Updated weights for policy 0, policy_version 415600 (0.0118) [2024-06-15 16:38:55,955][1648985] Fps is (10 sec: 39320.1, 60 sec: 46421.1, 300 sec: 45541.9). Total num frames: 851214336. Throughput: 0: 11673.5. Samples: 212870144. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:38:55,956][1648985] Avg episode reward: [(0, '150.380')] [2024-06-15 16:38:56,312][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000415648_851247104.pth... [2024-06-15 16:38:56,313][1652491] Updated weights for policy 0, policy_version 415648 (0.0016) [2024-06-15 16:38:56,511][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000410368_840433664.pth [2024-06-15 16:38:58,375][1652491] Updated weights for policy 0, policy_version 415742 (0.0014) [2024-06-15 16:39:00,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 47513.5, 300 sec: 46097.3). Total num frames: 851542016. 
Throughput: 0: 11582.6. Samples: 212935680. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:39:00,956][1648985] Avg episode reward: [(0, '147.250')] [2024-06-15 16:39:00,989][1652491] Updated weights for policy 0, policy_version 415802 (0.0013) [2024-06-15 16:39:04,830][1652491] Updated weights for policy 0, policy_version 415862 (0.0015) [2024-06-15 16:39:05,955][1648985] Fps is (10 sec: 49153.3, 60 sec: 45875.2, 300 sec: 45653.0). Total num frames: 851705856. Throughput: 0: 11605.3. Samples: 213011968. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:39:05,956][1648985] Avg episode reward: [(0, '166.740')] [2024-06-15 16:39:07,905][1652491] Updated weights for policy 0, policy_version 415904 (0.0013) [2024-06-15 16:39:09,323][1652491] Updated weights for policy 0, policy_version 415963 (0.0012) [2024-06-15 16:39:10,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46967.4, 300 sec: 45764.1). Total num frames: 851968000. Throughput: 0: 11696.4. Samples: 213044736. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:39:10,956][1648985] Avg episode reward: [(0, '170.930')] [2024-06-15 16:39:11,184][1652491] Updated weights for policy 0, policy_version 416017 (0.0012) [2024-06-15 16:39:11,941][1652491] Updated weights for policy 0, policy_version 416055 (0.0012) [2024-06-15 16:39:14,503][1652491] Updated weights for policy 0, policy_version 416086 (0.0013) [2024-06-15 16:39:15,549][1652491] Updated weights for policy 0, policy_version 416126 (0.0013) [2024-06-15 16:39:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 852230144. Throughput: 0: 11696.3. Samples: 213117440. 
Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:39:15,955][1648985] Avg episode reward: [(0, '170.200')] [2024-06-15 16:39:19,069][1652491] Updated weights for policy 0, policy_version 416177 (0.0013) [2024-06-15 16:39:20,464][1652491] Updated weights for policy 0, policy_version 416256 (0.0121) [2024-06-15 16:39:20,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 48059.7, 300 sec: 45764.1). Total num frames: 852492288. Throughput: 0: 11765.3. Samples: 213188608. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:39:20,955][1648985] Avg episode reward: [(0, '158.150')] [2024-06-15 16:39:23,366][1652491] Updated weights for policy 0, policy_version 416320 (0.0021) [2024-06-15 16:39:25,834][1652491] Updated weights for policy 0, policy_version 416381 (0.0011) [2024-06-15 16:39:25,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48060.0, 300 sec: 46208.4). Total num frames: 852754432. Throughput: 0: 11719.1. Samples: 213227008. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:39:25,955][1648985] Avg episode reward: [(0, '144.190')] [2024-06-15 16:39:29,811][1652491] Updated weights for policy 0, policy_version 416435 (0.0119) [2024-06-15 16:39:30,487][1651469] Signal inference workers to stop experience collection... (21700 times) [2024-06-15 16:39:30,560][1652491] InferenceWorker_p0-w0: stopping experience collection (21700 times) [2024-06-15 16:39:30,782][1651469] Signal inference workers to resume experience collection... (21700 times) [2024-06-15 16:39:30,786][1652491] InferenceWorker_p0-w0: resuming experience collection (21700 times) [2024-06-15 16:39:30,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 48059.8, 300 sec: 45542.0). Total num frames: 852951040. Throughput: 0: 11730.5. Samples: 213297664. 
Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:39:30,956][1648985] Avg episode reward: [(0, '139.410')] [2024-06-15 16:39:31,335][1652491] Updated weights for policy 0, policy_version 416512 (0.0011) [2024-06-15 16:39:34,663][1652491] Updated weights for policy 0, policy_version 416566 (0.0012) [2024-06-15 16:39:35,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45877.0, 300 sec: 45764.1). Total num frames: 853147648. Throughput: 0: 11867.1. Samples: 213367296. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:39:35,956][1648985] Avg episode reward: [(0, '150.460')] [2024-06-15 16:39:36,434][1652491] Updated weights for policy 0, policy_version 416597 (0.0011) [2024-06-15 16:39:40,880][1652491] Updated weights for policy 0, policy_version 416659 (0.0014) [2024-06-15 16:39:40,955][1648985] Fps is (10 sec: 36044.2, 60 sec: 45875.1, 300 sec: 45208.7). Total num frames: 853311488. Throughput: 0: 11821.6. Samples: 213402112. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:39:40,956][1648985] Avg episode reward: [(0, '157.930')] [2024-06-15 16:39:42,950][1652491] Updated weights for policy 0, policy_version 416737 (0.0014) [2024-06-15 16:39:45,405][1652491] Updated weights for policy 0, policy_version 416787 (0.0015) [2024-06-15 16:39:45,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 853606400. Throughput: 0: 11832.9. Samples: 213468160. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:39:45,955][1648985] Avg episode reward: [(0, '161.740')] [2024-06-15 16:39:46,314][1652491] Updated weights for policy 0, policy_version 416829 (0.0010) [2024-06-15 16:39:48,080][1652491] Updated weights for policy 0, policy_version 416880 (0.0011) [2024-06-15 16:39:50,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 45875.3, 300 sec: 45319.9). Total num frames: 853803008. Throughput: 0: 11741.9. Samples: 213540352. 
Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:39:50,956][1648985] Avg episode reward: [(0, '150.650')] [2024-06-15 16:39:51,713][1652491] Updated weights for policy 0, policy_version 416912 (0.0012) [2024-06-15 16:39:53,206][1652491] Updated weights for policy 0, policy_version 416976 (0.0028) [2024-06-15 16:39:55,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 47513.7, 300 sec: 45764.1). Total num frames: 854065152. Throughput: 0: 11685.0. Samples: 213570560. Policy #0 lag: (min: 39.0, avg: 130.4, max: 295.0) [2024-06-15 16:39:55,956][1648985] Avg episode reward: [(0, '139.150')] [2024-06-15 16:39:57,136][1652491] Updated weights for policy 0, policy_version 417078 (0.0114) [2024-06-15 16:39:58,700][1652491] Updated weights for policy 0, policy_version 417121 (0.0012) [2024-06-15 16:40:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 854327296. Throughput: 0: 11730.5. Samples: 213645312. Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:40:00,956][1648985] Avg episode reward: [(0, '141.330')] [2024-06-15 16:40:03,350][1652491] Updated weights for policy 0, policy_version 417187 (0.0012) [2024-06-15 16:40:04,835][1652491] Updated weights for policy 0, policy_version 417249 (0.0012) [2024-06-15 16:40:05,956][1648985] Fps is (10 sec: 52427.1, 60 sec: 48059.4, 300 sec: 45764.1). Total num frames: 854589440. Throughput: 0: 11639.3. Samples: 213712384. Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:40:05,957][1648985] Avg episode reward: [(0, '136.730')] [2024-06-15 16:40:07,447][1652491] Updated weights for policy 0, policy_version 417297 (0.0012) [2024-06-15 16:40:09,998][1652491] Updated weights for policy 0, policy_version 417360 (0.0016) [2024-06-15 16:40:10,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 47513.7, 300 sec: 46097.3). Total num frames: 854818816. Throughput: 0: 11605.3. Samples: 213749248. 
Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:40:10,956][1648985] Avg episode reward: [(0, '145.330')] [2024-06-15 16:40:11,165][1652491] Updated weights for policy 0, policy_version 417408 (0.0013) [2024-06-15 16:40:14,946][1651469] Signal inference workers to stop experience collection... (21750 times) [2024-06-15 16:40:15,036][1652491] InferenceWorker_p0-w0: stopping experience collection (21750 times) [2024-06-15 16:40:15,275][1651469] Signal inference workers to resume experience collection... (21750 times) [2024-06-15 16:40:15,276][1652491] InferenceWorker_p0-w0: resuming experience collection (21750 times) [2024-06-15 16:40:15,474][1652491] Updated weights for policy 0, policy_version 417474 (0.0013) [2024-06-15 16:40:15,955][1648985] Fps is (10 sec: 42600.1, 60 sec: 46421.3, 300 sec: 45430.9). Total num frames: 855015424. Throughput: 0: 11605.3. Samples: 213819904. Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:40:15,956][1648985] Avg episode reward: [(0, '146.660')] [2024-06-15 16:40:16,835][1652491] Updated weights for policy 0, policy_version 417529 (0.0013) [2024-06-15 16:40:20,287][1652491] Updated weights for policy 0, policy_version 417593 (0.0013) [2024-06-15 16:40:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 855244800. Throughput: 0: 11514.3. Samples: 213885440. Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:40:20,956][1648985] Avg episode reward: [(0, '155.850')] [2024-06-15 16:40:22,332][1652491] Updated weights for policy 0, policy_version 417648 (0.0108) [2024-06-15 16:40:25,958][1648985] Fps is (10 sec: 36033.6, 60 sec: 43688.4, 300 sec: 45430.4). Total num frames: 855375872. Throughput: 0: 11468.0. Samples: 213918208. 
Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:40:25,959][1648985] Avg episode reward: [(0, '157.870')] [2024-06-15 16:40:26,424][1652491] Updated weights for policy 0, policy_version 417697 (0.0013) [2024-06-15 16:40:27,975][1652491] Updated weights for policy 0, policy_version 417760 (0.0014) [2024-06-15 16:40:30,517][1652491] Updated weights for policy 0, policy_version 417809 (0.0013) [2024-06-15 16:40:30,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.1, 300 sec: 45986.3). Total num frames: 855703552. Throughput: 0: 11571.2. Samples: 213988864. Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:40:30,956][1648985] Avg episode reward: [(0, '135.380')] [2024-06-15 16:40:31,262][1652491] Updated weights for policy 0, policy_version 417849 (0.0014) [2024-06-15 16:40:33,378][1652491] Updated weights for policy 0, policy_version 417904 (0.0013) [2024-06-15 16:40:35,955][1648985] Fps is (10 sec: 52445.5, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 855900160. Throughput: 0: 11582.6. Samples: 214061568. Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:40:35,956][1648985] Avg episode reward: [(0, '137.550')] [2024-06-15 16:40:37,385][1652491] Updated weights for policy 0, policy_version 417956 (0.0012) [2024-06-15 16:40:38,994][1652491] Updated weights for policy 0, policy_version 418032 (0.0013) [2024-06-15 16:40:40,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 47513.7, 300 sec: 45764.1). Total num frames: 856162304. Throughput: 0: 11639.5. Samples: 214094336. 
Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:40:40,955][1648985] Avg episode reward: [(0, '157.170')] [2024-06-15 16:40:41,260][1652491] Updated weights for policy 0, policy_version 418065 (0.0015) [2024-06-15 16:40:42,310][1652491] Updated weights for policy 0, policy_version 418112 (0.0012) [2024-06-15 16:40:44,611][1652491] Updated weights for policy 0, policy_version 418169 (0.0014) [2024-06-15 16:40:45,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.4, 300 sec: 46541.7). Total num frames: 856424448. Throughput: 0: 11593.9. Samples: 214167040. Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:40:45,956][1648985] Avg episode reward: [(0, '148.200')] [2024-06-15 16:40:49,062][1652491] Updated weights for policy 0, policy_version 418242 (0.0014) [2024-06-15 16:40:50,346][1652491] Updated weights for policy 0, policy_version 418300 (0.0030) [2024-06-15 16:40:50,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 48059.6, 300 sec: 46097.3). Total num frames: 856686592. Throughput: 0: 11673.7. Samples: 214237696. Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:40:50,956][1648985] Avg episode reward: [(0, '144.100')] [2024-06-15 16:40:53,558][1652491] Updated weights for policy 0, policy_version 418367 (0.0102) [2024-06-15 16:40:55,955][1648985] Fps is (10 sec: 49150.5, 60 sec: 47513.4, 300 sec: 46541.6). Total num frames: 856915968. Throughput: 0: 11571.1. Samples: 214269952. Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:40:55,956][1648985] Avg episode reward: [(0, '140.100')] [2024-06-15 16:40:56,130][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000418432_856948736.pth... 
[2024-06-15 16:40:56,133][1652491] Updated weights for policy 0, policy_version 418430 (0.0135) [2024-06-15 16:40:56,216][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000412992_845807616.pth [2024-06-15 16:40:59,574][1651469] Signal inference workers to stop experience collection... (21800 times) [2024-06-15 16:40:59,603][1652491] InferenceWorker_p0-w0: stopping experience collection (21800 times) [2024-06-15 16:40:59,877][1651469] Signal inference workers to resume experience collection... (21800 times) [2024-06-15 16:40:59,878][1652491] InferenceWorker_p0-w0: resuming experience collection (21800 times) [2024-06-15 16:41:00,547][1652491] Updated weights for policy 0, policy_version 418482 (0.0014) [2024-06-15 16:41:00,955][1648985] Fps is (10 sec: 39322.5, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 857079808. Throughput: 0: 11673.6. Samples: 214345216. Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:41:00,955][1648985] Avg episode reward: [(0, '128.160')] [2024-06-15 16:41:04,062][1652491] Updated weights for policy 0, policy_version 418576 (0.0132) [2024-06-15 16:41:05,955][1648985] Fps is (10 sec: 42599.7, 60 sec: 45875.5, 300 sec: 46208.4). Total num frames: 857341952. Throughput: 0: 11468.8. Samples: 214401536. Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:41:05,956][1648985] Avg episode reward: [(0, '121.490')] [2024-06-15 16:41:06,823][1652491] Updated weights for policy 0, policy_version 418640 (0.0013) [2024-06-15 16:41:07,747][1652491] Updated weights for policy 0, policy_version 418686 (0.0012) [2024-06-15 16:41:10,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44236.8, 300 sec: 46097.3). Total num frames: 857473024. Throughput: 0: 11640.3. Samples: 214441984. 
Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:41:10,956][1648985] Avg episode reward: [(0, '109.860')] [2024-06-15 16:41:12,345][1652491] Updated weights for policy 0, policy_version 418742 (0.0014) [2024-06-15 16:41:14,205][1652491] Updated weights for policy 0, policy_version 418815 (0.0014) [2024-06-15 16:41:15,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 45875.0, 300 sec: 45986.2). Total num frames: 857767936. Throughput: 0: 11548.4. Samples: 214508544. Policy #0 lag: (min: 15.0, avg: 128.2, max: 271.0) [2024-06-15 16:41:15,956][1648985] Avg episode reward: [(0, '129.080')] [2024-06-15 16:41:16,747][1652491] Updated weights for policy 0, policy_version 418872 (0.0013) [2024-06-15 16:41:17,780][1652491] Updated weights for policy 0, policy_version 418912 (0.0031) [2024-06-15 16:41:20,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 857997312. Throughput: 0: 11639.4. Samples: 214585344. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:41:20,956][1648985] Avg episode reward: [(0, '145.870')] [2024-06-15 16:41:22,974][1652491] Updated weights for policy 0, policy_version 418978 (0.0013) [2024-06-15 16:41:24,423][1652491] Updated weights for policy 0, policy_version 419042 (0.0015) [2024-06-15 16:41:25,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 48062.2, 300 sec: 46319.5). Total num frames: 858259456. Throughput: 0: 11650.8. Samples: 214618624. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:41:25,956][1648985] Avg episode reward: [(0, '130.380')] [2024-06-15 16:41:26,253][1652491] Updated weights for policy 0, policy_version 419080 (0.0012) [2024-06-15 16:41:27,801][1652491] Updated weights for policy 0, policy_version 419136 (0.0030) [2024-06-15 16:41:29,281][1652491] Updated weights for policy 0, policy_version 419200 (0.0078) [2024-06-15 16:41:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 858521600. 
Throughput: 0: 11434.7. Samples: 214681600. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:41:30,956][1648985] Avg episode reward: [(0, '126.300')] [2024-06-15 16:41:35,108][1652491] Updated weights for policy 0, policy_version 419266 (0.0013) [2024-06-15 16:41:35,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 858718208. Throughput: 0: 11559.9. Samples: 214757888. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:41:35,955][1648985] Avg episode reward: [(0, '127.290')] [2024-06-15 16:41:37,710][1652491] Updated weights for policy 0, policy_version 419333 (0.0012) [2024-06-15 16:41:39,446][1652491] Updated weights for policy 0, policy_version 419393 (0.0030) [2024-06-15 16:41:40,298][1651469] Signal inference workers to stop experience collection... (21850 times) [2024-06-15 16:41:40,336][1652491] InferenceWorker_p0-w0: stopping experience collection (21850 times) [2024-06-15 16:41:40,574][1651469] Signal inference workers to resume experience collection... (21850 times) [2024-06-15 16:41:40,575][1652491] InferenceWorker_p0-w0: resuming experience collection (21850 times) [2024-06-15 16:41:40,871][1652491] Updated weights for policy 0, policy_version 419456 (0.0012) [2024-06-15 16:41:40,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 859045888. Throughput: 0: 11594.0. Samples: 214791680. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:41:40,956][1648985] Avg episode reward: [(0, '134.380')] [2024-06-15 16:41:45,056][1652491] Updated weights for policy 0, policy_version 419513 (0.0014) [2024-06-15 16:41:45,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.3, 300 sec: 46319.5). Total num frames: 859176960. Throughput: 0: 11537.1. Samples: 214864384. 
Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:41:45,955][1648985] Avg episode reward: [(0, '160.070')] [2024-06-15 16:41:47,689][1652491] Updated weights for policy 0, policy_version 419568 (0.0013) [2024-06-15 16:41:49,610][1652491] Updated weights for policy 0, policy_version 419617 (0.0112) [2024-06-15 16:41:50,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 859471872. Throughput: 0: 11753.2. Samples: 214930432. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:41:50,956][1648985] Avg episode reward: [(0, '150.340')] [2024-06-15 16:41:51,129][1652491] Updated weights for policy 0, policy_version 419680 (0.0024) [2024-06-15 16:41:55,743][1652491] Updated weights for policy 0, policy_version 419744 (0.0012) [2024-06-15 16:41:55,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 45329.3, 300 sec: 46430.6). Total num frames: 859635712. Throughput: 0: 11707.7. Samples: 214968832. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:41:55,956][1648985] Avg episode reward: [(0, '143.100')] [2024-06-15 16:41:58,324][1652491] Updated weights for policy 0, policy_version 419792 (0.0015) [2024-06-15 16:41:59,215][1652491] Updated weights for policy 0, policy_version 419834 (0.0021) [2024-06-15 16:42:00,934][1652491] Updated weights for policy 0, policy_version 419892 (0.0012) [2024-06-15 16:42:00,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 47513.6, 300 sec: 46430.6). Total num frames: 859930624. Throughput: 0: 11855.7. Samples: 215042048. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:42:00,956][1648985] Avg episode reward: [(0, '131.800')] [2024-06-15 16:42:02,361][1652491] Updated weights for policy 0, policy_version 419961 (0.0020) [2024-06-15 16:42:05,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 860094464. Throughput: 0: 11798.8. Samples: 215116288. 
Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:42:05,955][1648985] Avg episode reward: [(0, '149.970')] [2024-06-15 16:42:06,939][1652491] Updated weights for policy 0, policy_version 420025 (0.0028) [2024-06-15 16:42:10,380][1652491] Updated weights for policy 0, policy_version 420080 (0.0013) [2024-06-15 16:42:10,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48059.7, 300 sec: 46319.5). Total num frames: 860356608. Throughput: 0: 11901.2. Samples: 215154176. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:42:10,956][1648985] Avg episode reward: [(0, '163.310')] [2024-06-15 16:42:12,421][1652491] Updated weights for policy 0, policy_version 420145 (0.0013) [2024-06-15 16:42:14,044][1652491] Updated weights for policy 0, policy_version 420221 (0.0013) [2024-06-15 16:42:15,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 47513.7, 300 sec: 46652.7). Total num frames: 860618752. Throughput: 0: 11753.2. Samples: 215210496. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:42:15,956][1648985] Avg episode reward: [(0, '149.950')] [2024-06-15 16:42:18,502][1652491] Updated weights for policy 0, policy_version 420272 (0.0012) [2024-06-15 16:42:20,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 860749824. Throughput: 0: 11878.4. Samples: 215292416. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:42:20,956][1648985] Avg episode reward: [(0, '126.900')] [2024-06-15 16:42:22,438][1652491] Updated weights for policy 0, policy_version 420352 (0.0018) [2024-06-15 16:42:23,766][1651469] Signal inference workers to stop experience collection... (21900 times) [2024-06-15 16:42:23,875][1652491] InferenceWorker_p0-w0: stopping experience collection (21900 times) [2024-06-15 16:42:23,951][1651469] Signal inference workers to resume experience collection... 
(21900 times) [2024-06-15 16:42:23,951][1652491] InferenceWorker_p0-w0: resuming experience collection (21900 times) [2024-06-15 16:42:24,217][1652491] Updated weights for policy 0, policy_version 420422 (0.0019) [2024-06-15 16:42:25,402][1652491] Updated weights for policy 0, policy_version 420476 (0.0012) [2024-06-15 16:42:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.6, 300 sec: 46652.8). Total num frames: 861143040. Throughput: 0: 11662.2. Samples: 215316480. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:42:25,956][1648985] Avg episode reward: [(0, '136.120')] [2024-06-15 16:42:29,946][1652491] Updated weights for policy 0, policy_version 420544 (0.0014) [2024-06-15 16:42:30,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.3, 300 sec: 46431.1). Total num frames: 861274112. Throughput: 0: 11491.6. Samples: 215381504. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:42:30,955][1648985] Avg episode reward: [(0, '138.720')] [2024-06-15 16:42:33,878][1652491] Updated weights for policy 0, policy_version 420594 (0.0014) [2024-06-15 16:42:35,182][1652491] Updated weights for policy 0, policy_version 420656 (0.0021) [2024-06-15 16:42:35,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 861569024. Throughput: 0: 11639.5. Samples: 215454208. Policy #0 lag: (min: 31.0, avg: 172.5, max: 287.0) [2024-06-15 16:42:35,956][1648985] Avg episode reward: [(0, '155.890')] [2024-06-15 16:42:36,872][1652491] Updated weights for policy 0, policy_version 420728 (0.0011) [2024-06-15 16:42:40,876][1652491] Updated weights for policy 0, policy_version 420789 (0.0117) [2024-06-15 16:42:40,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 45329.0, 300 sec: 46541.6). Total num frames: 861765632. Throughput: 0: 11639.5. Samples: 215492608. 
Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:42:40,956][1648985] Avg episode reward: [(0, '166.280')] [2024-06-15 16:42:43,933][1652491] Updated weights for policy 0, policy_version 420816 (0.0012) [2024-06-15 16:42:45,280][1652491] Updated weights for policy 0, policy_version 420866 (0.0012) [2024-06-15 16:42:45,959][1648985] Fps is (10 sec: 42583.7, 60 sec: 46964.7, 300 sec: 46430.0). Total num frames: 861995008. Throughput: 0: 11558.9. Samples: 215562240. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:42:45,959][1648985] Avg episode reward: [(0, '175.360')] [2024-06-15 16:42:46,904][1652491] Updated weights for policy 0, policy_version 420944 (0.0012) [2024-06-15 16:42:47,938][1652491] Updated weights for policy 0, policy_version 420992 (0.0014) [2024-06-15 16:42:50,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 45875.3, 300 sec: 46763.8). Total num frames: 862224384. Throughput: 0: 11582.6. Samples: 215637504. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:42:50,956][1648985] Avg episode reward: [(0, '161.190')] [2024-06-15 16:42:51,757][1652491] Updated weights for policy 0, policy_version 421056 (0.0275) [2024-06-15 16:42:55,955][1648985] Fps is (10 sec: 39333.8, 60 sec: 45875.0, 300 sec: 46430.5). Total num frames: 862388224. Throughput: 0: 11537.0. Samples: 215673344. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:42:55,956][1648985] Avg episode reward: [(0, '161.080')] [2024-06-15 16:42:56,255][1652491] Updated weights for policy 0, policy_version 421105 (0.0013) [2024-06-15 16:42:56,467][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000421120_862453760.pth... 
[2024-06-15 16:42:56,626][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000415648_851247104.pth [2024-06-15 16:42:58,561][1652491] Updated weights for policy 0, policy_version 421205 (0.0011) [2024-06-15 16:43:00,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 46421.2, 300 sec: 46652.7). Total num frames: 862715904. Throughput: 0: 11662.2. Samples: 215735296. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:43:00,957][1648985] Avg episode reward: [(0, '152.350')] [2024-06-15 16:43:01,834][1652491] Updated weights for policy 0, policy_version 421250 (0.0014) [2024-06-15 16:43:05,955][1648985] Fps is (10 sec: 45876.6, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 862846976. Throughput: 0: 11696.3. Samples: 215818752. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:43:05,956][1648985] Avg episode reward: [(0, '146.650')] [2024-06-15 16:43:06,171][1652491] Updated weights for policy 0, policy_version 421315 (0.0012) [2024-06-15 16:43:06,559][1651469] Signal inference workers to stop experience collection... (21950 times) [2024-06-15 16:43:06,638][1652491] InferenceWorker_p0-w0: stopping experience collection (21950 times) [2024-06-15 16:43:06,866][1651469] Signal inference workers to resume experience collection... (21950 times) [2024-06-15 16:43:06,867][1652491] InferenceWorker_p0-w0: resuming experience collection (21950 times) [2024-06-15 16:43:07,943][1652491] Updated weights for policy 0, policy_version 421381 (0.0098) [2024-06-15 16:43:09,328][1652491] Updated weights for policy 0, policy_version 421442 (0.0013) [2024-06-15 16:43:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 863240192. Throughput: 0: 11696.4. Samples: 215842816. 
Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:43:10,956][1648985] Avg episode reward: [(0, '154.600')] [2024-06-15 16:43:13,814][1652491] Updated weights for policy 0, policy_version 421536 (0.0015) [2024-06-15 16:43:15,958][1648985] Fps is (10 sec: 52411.7, 60 sec: 45872.8, 300 sec: 46652.2). Total num frames: 863371264. Throughput: 0: 11741.0. Samples: 215909888. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:43:15,959][1648985] Avg episode reward: [(0, '151.110')] [2024-06-15 16:43:18,352][1652491] Updated weights for policy 0, policy_version 421593 (0.0123) [2024-06-15 16:43:20,005][1652491] Updated weights for policy 0, policy_version 421664 (0.0012) [2024-06-15 16:43:20,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 863633408. Throughput: 0: 11628.1. Samples: 215977472. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:43:20,955][1648985] Avg episode reward: [(0, '147.600')] [2024-06-15 16:43:21,733][1652491] Updated weights for policy 0, policy_version 421750 (0.0015) [2024-06-15 16:43:25,185][1652491] Updated weights for policy 0, policy_version 421793 (0.0012) [2024-06-15 16:43:25,955][1648985] Fps is (10 sec: 52446.0, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 863895552. Throughput: 0: 11628.1. Samples: 216015872. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:43:25,956][1648985] Avg episode reward: [(0, '141.440')] [2024-06-15 16:43:29,001][1652491] Updated weights for policy 0, policy_version 421825 (0.0024) [2024-06-15 16:43:30,584][1652491] Updated weights for policy 0, policy_version 421904 (0.0125) [2024-06-15 16:43:30,955][1648985] Fps is (10 sec: 42597.1, 60 sec: 46421.1, 300 sec: 46319.8). Total num frames: 864059392. Throughput: 0: 11765.5. Samples: 216091648. 
Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:43:30,956][1648985] Avg episode reward: [(0, '149.330')] [2024-06-15 16:43:32,229][1652491] Updated weights for policy 0, policy_version 421972 (0.0013) [2024-06-15 16:43:33,127][1652491] Updated weights for policy 0, policy_version 422016 (0.0013) [2024-06-15 16:43:35,956][1648985] Fps is (10 sec: 39320.7, 60 sec: 45328.9, 300 sec: 46541.6). Total num frames: 864288768. Throughput: 0: 11559.7. Samples: 216157696. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:43:35,957][1648985] Avg episode reward: [(0, '148.290')] [2024-06-15 16:43:37,504][1652491] Updated weights for policy 0, policy_version 422075 (0.0014) [2024-06-15 16:43:40,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 864452608. Throughput: 0: 11480.2. Samples: 216189952. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:43:40,956][1648985] Avg episode reward: [(0, '153.730')] [2024-06-15 16:43:41,507][1652491] Updated weights for policy 0, policy_version 422128 (0.0020) [2024-06-15 16:43:43,183][1652491] Updated weights for policy 0, policy_version 422197 (0.0015) [2024-06-15 16:43:43,826][1651469] Signal inference workers to stop experience collection... (22000 times) [2024-06-15 16:43:43,855][1652491] InferenceWorker_p0-w0: stopping experience collection (22000 times) [2024-06-15 16:43:44,114][1651469] Signal inference workers to resume experience collection... (22000 times) [2024-06-15 16:43:44,142][1652491] InferenceWorker_p0-w0: resuming experience collection (22000 times) [2024-06-15 16:43:44,460][1652491] Updated weights for policy 0, policy_version 422256 (0.0014) [2024-06-15 16:43:45,956][1648985] Fps is (10 sec: 52423.5, 60 sec: 46969.2, 300 sec: 46652.6). Total num frames: 864813056. Throughput: 0: 11593.7. Samples: 216257024. 
Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:43:45,957][1648985] Avg episode reward: [(0, '152.750')] [2024-06-15 16:43:48,124][1652491] Updated weights for policy 0, policy_version 422305 (0.0019) [2024-06-15 16:43:50,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 45329.0, 300 sec: 46541.7). Total num frames: 864944128. Throughput: 0: 11503.0. Samples: 216336384. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:43:50,956][1648985] Avg episode reward: [(0, '147.600')] [2024-06-15 16:43:51,812][1652491] Updated weights for policy 0, policy_version 422340 (0.0022) [2024-06-15 16:43:53,146][1652491] Updated weights for policy 0, policy_version 422400 (0.0141) [2024-06-15 16:43:55,914][1652491] Updated weights for policy 0, policy_version 422512 (0.0136) [2024-06-15 16:43:55,955][1648985] Fps is (10 sec: 49158.2, 60 sec: 48606.1, 300 sec: 46652.8). Total num frames: 865304576. Throughput: 0: 11673.6. Samples: 216368128. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:43:55,956][1648985] Avg episode reward: [(0, '160.170')] [2024-06-15 16:43:59,854][1652491] Updated weights for policy 0, policy_version 422576 (0.0027) [2024-06-15 16:44:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 865468416. Throughput: 0: 11515.2. Samples: 216428032. Policy #0 lag: (min: 15.0, avg: 128.1, max: 271.0) [2024-06-15 16:44:00,956][1648985] Avg episode reward: [(0, '166.980')] [2024-06-15 16:44:04,175][1652491] Updated weights for policy 0, policy_version 422629 (0.0015) [2024-06-15 16:44:05,418][1652491] Updated weights for policy 0, policy_version 422688 (0.0069) [2024-06-15 16:44:05,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 865697792. Throughput: 0: 11548.4. Samples: 216497152. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:44:05,956][1648985] Avg episode reward: [(0, '164.840')] [2024-06-15 16:44:07,401][1652491] Updated weights for policy 0, policy_version 422772 (0.0020) [2024-06-15 16:44:10,345][1652491] Updated weights for policy 0, policy_version 422789 (0.0036) [2024-06-15 16:44:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 865927168. Throughput: 0: 11446.0. Samples: 216530944. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:44:10,956][1648985] Avg episode reward: [(0, '159.760')] [2024-06-15 16:44:11,382][1652491] Updated weights for policy 0, policy_version 422848 (0.0013) [2024-06-15 16:44:15,962][1648985] Fps is (10 sec: 36036.5, 60 sec: 44783.7, 300 sec: 45985.9). Total num frames: 866058240. Throughput: 0: 11559.3. Samples: 216611840. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:44:15,963][1648985] Avg episode reward: [(0, '155.850')] [2024-06-15 16:44:16,880][1652491] Updated weights for policy 0, policy_version 422931 (0.0214) [2024-06-15 16:44:19,315][1652491] Updated weights for policy 0, policy_version 423031 (0.0018) [2024-06-15 16:44:20,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 866385920. Throughput: 0: 11343.7. Samples: 216668160. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:44:20,956][1648985] Avg episode reward: [(0, '155.260')] [2024-06-15 16:44:22,658][1652491] Updated weights for policy 0, policy_version 423079 (0.0081) [2024-06-15 16:44:25,955][1648985] Fps is (10 sec: 45885.4, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 866516992. Throughput: 0: 11434.7. Samples: 216704512. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:44:25,956][1648985] Avg episode reward: [(0, '146.420')] [2024-06-15 16:44:27,257][1652491] Updated weights for policy 0, policy_version 423136 (0.0157) [2024-06-15 16:44:27,718][1651469] Signal inference workers to stop experience collection... (22050 times) [2024-06-15 16:44:27,765][1652491] InferenceWorker_p0-w0: stopping experience collection (22050 times) [2024-06-15 16:44:27,883][1651469] Signal inference workers to resume experience collection... (22050 times) [2024-06-15 16:44:27,884][1652491] InferenceWorker_p0-w0: resuming experience collection (22050 times) [2024-06-15 16:44:28,784][1652491] Updated weights for policy 0, policy_version 423203 (0.0012) [2024-06-15 16:44:30,266][1652491] Updated weights for policy 0, policy_version 423264 (0.0013) [2024-06-15 16:44:30,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 46967.6, 300 sec: 46541.7). Total num frames: 866877440. Throughput: 0: 11435.0. Samples: 216771584. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:44:30,956][1648985] Avg episode reward: [(0, '142.660')] [2024-06-15 16:44:31,116][1652491] Updated weights for policy 0, policy_version 423296 (0.0011) [2024-06-15 16:44:34,428][1652491] Updated weights for policy 0, policy_version 423354 (0.0013) [2024-06-15 16:44:35,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45875.5, 300 sec: 46541.7). Total num frames: 867041280. Throughput: 0: 11309.5. Samples: 216845312. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:44:35,956][1648985] Avg episode reward: [(0, '142.410')] [2024-06-15 16:44:39,596][1652491] Updated weights for policy 0, policy_version 423428 (0.0013) [2024-06-15 16:44:40,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.5, 300 sec: 46319.5). Total num frames: 867270656. Throughput: 0: 11355.0. Samples: 216879104. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:44:40,956][1648985] Avg episode reward: [(0, '155.550')] [2024-06-15 16:44:41,396][1652491] Updated weights for policy 0, policy_version 423504 (0.0013) [2024-06-15 16:44:42,572][1652491] Updated weights for policy 0, policy_version 423551 (0.0022) [2024-06-15 16:44:45,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 45330.0, 300 sec: 46541.7). Total num frames: 867532800. Throughput: 0: 11468.8. Samples: 216944128. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:44:45,956][1648985] Avg episode reward: [(0, '167.280')] [2024-06-15 16:44:45,978][1652491] Updated weights for policy 0, policy_version 423605 (0.0015) [2024-06-15 16:44:50,663][1652491] Updated weights for policy 0, policy_version 423668 (0.0014) [2024-06-15 16:44:50,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 867696640. Throughput: 0: 11514.3. Samples: 217015296. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:44:50,956][1648985] Avg episode reward: [(0, '149.080')] [2024-06-15 16:44:52,928][1652491] Updated weights for policy 0, policy_version 423760 (0.0014) [2024-06-15 16:44:54,017][1652491] Updated weights for policy 0, policy_version 423803 (0.0011) [2024-06-15 16:44:55,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 44236.7, 300 sec: 46208.4). Total num frames: 867958784. Throughput: 0: 11195.7. Samples: 217034752. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:44:55,956][1648985] Avg episode reward: [(0, '138.440')] [2024-06-15 16:44:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000423808_867958784.pth... 
[2024-06-15 16:44:56,010][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000418432_856948736.pth [2024-06-15 16:44:56,968][1652491] Updated weights for policy 0, policy_version 423842 (0.0012) [2024-06-15 16:45:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 45764.2). Total num frames: 868089856. Throughput: 0: 11332.8. Samples: 217121792. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:45:00,956][1648985] Avg episode reward: [(0, '154.890')] [2024-06-15 16:45:01,799][1652491] Updated weights for policy 0, policy_version 423904 (0.0014) [2024-06-15 16:45:03,977][1652491] Updated weights for policy 0, policy_version 424000 (0.0138) [2024-06-15 16:45:05,523][1652491] Updated weights for policy 0, policy_version 424059 (0.0025) [2024-06-15 16:45:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46421.2, 300 sec: 46319.5). Total num frames: 868483072. Throughput: 0: 11252.6. Samples: 217174528. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:45:05,956][1648985] Avg episode reward: [(0, '159.670')] [2024-06-15 16:45:07,716][1651469] Signal inference workers to stop experience collection... (22100 times) [2024-06-15 16:45:07,786][1652491] InferenceWorker_p0-w0: stopping experience collection (22100 times) [2024-06-15 16:45:07,960][1651469] Signal inference workers to resume experience collection... (22100 times) [2024-06-15 16:45:07,961][1652491] InferenceWorker_p0-w0: resuming experience collection (22100 times) [2024-06-15 16:45:08,581][1652491] Updated weights for policy 0, policy_version 424106 (0.0116) [2024-06-15 16:45:10,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 44783.0, 300 sec: 46097.4). Total num frames: 868614144. Throughput: 0: 11320.9. Samples: 217213952. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:45:10,955][1648985] Avg episode reward: [(0, '161.960')] [2024-06-15 16:45:12,897][1652491] Updated weights for policy 0, policy_version 424148 (0.0012) [2024-06-15 16:45:14,795][1652491] Updated weights for policy 0, policy_version 424228 (0.0015) [2024-06-15 16:45:15,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 47515.4, 300 sec: 46319.5). Total num frames: 868909056. Throughput: 0: 11480.2. Samples: 217288192. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:45:15,956][1648985] Avg episode reward: [(0, '149.580')] [2024-06-15 16:45:16,505][1652491] Updated weights for policy 0, policy_version 424289 (0.0014) [2024-06-15 16:45:19,938][1652491] Updated weights for policy 0, policy_version 424378 (0.0017) [2024-06-15 16:45:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 46653.3). Total num frames: 869138432. Throughput: 0: 11411.9. Samples: 217358848. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 16:45:20,955][1648985] Avg episode reward: [(0, '149.420')] [2024-06-15 16:45:24,198][1652491] Updated weights for policy 0, policy_version 424435 (0.0011) [2024-06-15 16:45:25,553][1652491] Updated weights for policy 0, policy_version 424498 (0.0013) [2024-06-15 16:45:25,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 869400576. Throughput: 0: 11639.5. Samples: 217402880. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:45:25,956][1648985] Avg episode reward: [(0, '144.160')] [2024-06-15 16:45:27,097][1652491] Updated weights for policy 0, policy_version 424574 (0.0034) [2024-06-15 16:45:30,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 869597184. Throughput: 0: 11593.9. Samples: 217465856. 
Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:45:30,956][1648985] Avg episode reward: [(0, '148.200')] [2024-06-15 16:45:31,314][1652491] Updated weights for policy 0, policy_version 424631 (0.0028) [2024-06-15 16:45:35,348][1652491] Updated weights for policy 0, policy_version 424693 (0.0091) [2024-06-15 16:45:35,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 869826560. Throughput: 0: 11639.5. Samples: 217539072. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:45:35,955][1648985] Avg episode reward: [(0, '139.890')] [2024-06-15 16:45:36,744][1652491] Updated weights for policy 0, policy_version 424752 (0.0014) [2024-06-15 16:45:38,283][1652491] Updated weights for policy 0, policy_version 424821 (0.0012) [2024-06-15 16:45:40,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 870055936. Throughput: 0: 11798.8. Samples: 217565696. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:45:40,956][1648985] Avg episode reward: [(0, '135.650')] [2024-06-15 16:45:41,647][1652491] Updated weights for policy 0, policy_version 424850 (0.0013) [2024-06-15 16:45:45,428][1652491] Updated weights for policy 0, policy_version 424897 (0.0026) [2024-06-15 16:45:45,955][1648985] Fps is (10 sec: 39320.4, 60 sec: 44782.8, 300 sec: 45875.2). Total num frames: 870219776. Throughput: 0: 11616.7. Samples: 217644544. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:45:45,956][1648985] Avg episode reward: [(0, '155.520')] [2024-06-15 16:45:46,575][1652491] Updated weights for policy 0, policy_version 424950 (0.0012) [2024-06-15 16:45:47,130][1651469] Signal inference workers to stop experience collection... (22150 times) [2024-06-15 16:45:47,198][1652491] InferenceWorker_p0-w0: stopping experience collection (22150 times) [2024-06-15 16:45:47,337][1651469] Signal inference workers to resume experience collection... 
(22150 times) [2024-06-15 16:45:47,338][1652491] InferenceWorker_p0-w0: resuming experience collection (22150 times) [2024-06-15 16:45:47,901][1652491] Updated weights for policy 0, policy_version 425012 (0.0013) [2024-06-15 16:45:49,053][1652491] Updated weights for policy 0, policy_version 425059 (0.0014) [2024-06-15 16:45:50,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 46319.6). Total num frames: 870580224. Throughput: 0: 11946.7. Samples: 217712128. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:45:50,956][1648985] Avg episode reward: [(0, '160.490')] [2024-06-15 16:45:53,053][1652491] Updated weights for policy 0, policy_version 425109 (0.0027) [2024-06-15 16:45:53,969][1652491] Updated weights for policy 0, policy_version 425151 (0.0015) [2024-06-15 16:45:55,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 870711296. Throughput: 0: 11832.9. Samples: 217746432. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:45:55,956][1648985] Avg episode reward: [(0, '153.510')] [2024-06-15 16:45:57,473][1652491] Updated weights for policy 0, policy_version 425208 (0.0014) [2024-06-15 16:45:59,274][1652491] Updated weights for policy 0, policy_version 425280 (0.0013) [2024-06-15 16:46:00,450][1652491] Updated weights for policy 0, policy_version 425340 (0.0012) [2024-06-15 16:46:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 50244.2, 300 sec: 46652.7). Total num frames: 871104512. Throughput: 0: 11685.0. Samples: 217814016. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:46:00,956][1648985] Avg episode reward: [(0, '140.500')] [2024-06-15 16:46:05,109][1652491] Updated weights for policy 0, policy_version 425402 (0.0011) [2024-06-15 16:46:05,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 871235584. Throughput: 0: 11730.4. Samples: 217886720. 
Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:46:05,956][1648985] Avg episode reward: [(0, '152.800')] [2024-06-15 16:46:08,307][1652491] Updated weights for policy 0, policy_version 425445 (0.0137) [2024-06-15 16:46:09,957][1652491] Updated weights for policy 0, policy_version 425520 (0.0014) [2024-06-15 16:46:10,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 49152.0, 300 sec: 46763.9). Total num frames: 871563264. Throughput: 0: 11628.1. Samples: 217926144. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:46:10,956][1648985] Avg episode reward: [(0, '160.610')] [2024-06-15 16:46:11,218][1652491] Updated weights for policy 0, policy_version 425584 (0.0013) [2024-06-15 16:46:14,277][1652491] Updated weights for policy 0, policy_version 425633 (0.0019) [2024-06-15 16:46:14,924][1652491] Updated weights for policy 0, policy_version 425664 (0.0011) [2024-06-15 16:46:15,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 871759872. Throughput: 0: 11958.1. Samples: 218003968. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:46:15,956][1648985] Avg episode reward: [(0, '161.440')] [2024-06-15 16:46:19,589][1652491] Updated weights for policy 0, policy_version 425729 (0.0106) [2024-06-15 16:46:20,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.6, 300 sec: 46652.7). Total num frames: 872022016. Throughput: 0: 11867.0. Samples: 218073088. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:46:20,956][1648985] Avg episode reward: [(0, '173.430')] [2024-06-15 16:46:21,315][1652491] Updated weights for policy 0, policy_version 425808 (0.0013) [2024-06-15 16:46:25,221][1652491] Updated weights for policy 0, policy_version 425888 (0.0014) [2024-06-15 16:46:25,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48059.6, 300 sec: 46652.7). Total num frames: 872284160. Throughput: 0: 12060.4. Samples: 218108416. 
Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:46:25,956][1648985] Avg episode reward: [(0, '172.050')] [2024-06-15 16:46:29,700][1651469] Signal inference workers to stop experience collection... (22200 times) [2024-06-15 16:46:29,816][1652491] InferenceWorker_p0-w0: stopping experience collection (22200 times) [2024-06-15 16:46:30,043][1651469] Signal inference workers to resume experience collection... (22200 times) [2024-06-15 16:46:30,045][1652491] InferenceWorker_p0-w0: resuming experience collection (22200 times) [2024-06-15 16:46:30,531][1652491] Updated weights for policy 0, policy_version 425952 (0.0106) [2024-06-15 16:46:30,955][1648985] Fps is (10 sec: 32768.0, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 872349696. Throughput: 0: 12015.0. Samples: 218185216. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:46:30,956][1648985] Avg episode reward: [(0, '175.720')] [2024-06-15 16:46:32,267][1652491] Updated weights for policy 0, policy_version 426017 (0.0013) [2024-06-15 16:46:33,341][1652491] Updated weights for policy 0, policy_version 426080 (0.0023) [2024-06-15 16:46:33,918][1652491] Updated weights for policy 0, policy_version 426112 (0.0012) [2024-06-15 16:46:35,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 48605.8, 300 sec: 46430.6). Total num frames: 872742912. Throughput: 0: 11923.9. Samples: 218248704. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:46:35,956][1648985] Avg episode reward: [(0, '151.370')] [2024-06-15 16:46:36,681][1652491] Updated weights for policy 0, policy_version 426175 (0.0013) [2024-06-15 16:46:40,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 872808448. Throughput: 0: 12003.5. Samples: 218286592. 
Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:46:40,956][1648985] Avg episode reward: [(0, '165.230')] [2024-06-15 16:46:42,428][1652491] Updated weights for policy 0, policy_version 426240 (0.0014) [2024-06-15 16:46:44,217][1652491] Updated weights for policy 0, policy_version 426308 (0.0101) [2024-06-15 16:46:45,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 49698.1, 300 sec: 46541.6). Total num frames: 873201664. Throughput: 0: 11764.6. Samples: 218343424. Policy #0 lag: (min: 88.0, avg: 176.4, max: 344.0) [2024-06-15 16:46:45,956][1648985] Avg episode reward: [(0, '147.430')] [2024-06-15 16:46:47,449][1652491] Updated weights for policy 0, policy_version 426385 (0.0013) [2024-06-15 16:46:48,407][1652491] Updated weights for policy 0, policy_version 426428 (0.0012) [2024-06-15 16:46:50,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 873332736. Throughput: 0: 11844.3. Samples: 218419712. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:46:50,956][1648985] Avg episode reward: [(0, '163.150')] [2024-06-15 16:46:53,525][1652491] Updated weights for policy 0, policy_version 426480 (0.0091) [2024-06-15 16:46:55,282][1652491] Updated weights for policy 0, policy_version 426560 (0.0012) [2024-06-15 16:46:55,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48605.8, 300 sec: 46430.6). Total num frames: 873627648. Throughput: 0: 11719.1. Samples: 218453504. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:46:55,956][1648985] Avg episode reward: [(0, '170.390')] [2024-06-15 16:46:56,430][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000426608_873693184.pth... 
[2024-06-15 16:46:56,472][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000421120_862453760.pth [2024-06-15 16:46:56,601][1652491] Updated weights for policy 0, policy_version 426616 (0.0014) [2024-06-15 16:46:58,568][1652491] Updated weights for policy 0, policy_version 426672 (0.0013) [2024-06-15 16:47:00,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 873857024. Throughput: 0: 11662.2. Samples: 218528768. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:47:00,956][1648985] Avg episode reward: [(0, '170.040')] [2024-06-15 16:47:03,915][1652491] Updated weights for policy 0, policy_version 426720 (0.0011) [2024-06-15 16:47:05,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 874086400. Throughput: 0: 11650.9. Samples: 218597376. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:47:05,956][1648985] Avg episode reward: [(0, '156.600')] [2024-06-15 16:47:06,264][1651469] Signal inference workers to stop experience collection... (22250 times) [2024-06-15 16:47:06,299][1652491] Updated weights for policy 0, policy_version 426817 (0.0117) [2024-06-15 16:47:06,331][1652491] InferenceWorker_p0-w0: stopping experience collection (22250 times) [2024-06-15 16:47:06,571][1651469] Signal inference workers to resume experience collection... (22250 times) [2024-06-15 16:47:06,581][1652491] InferenceWorker_p0-w0: resuming experience collection (22250 times) [2024-06-15 16:47:07,565][1652491] Updated weights for policy 0, policy_version 426875 (0.0015) [2024-06-15 16:47:09,843][1652491] Updated weights for policy 0, policy_version 426935 (0.0028) [2024-06-15 16:47:10,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 46967.3, 300 sec: 46652.7). Total num frames: 874381312. Throughput: 0: 11662.2. Samples: 218633216. 
Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:47:10,956][1648985] Avg episode reward: [(0, '145.020')] [2024-06-15 16:47:14,780][1652491] Updated weights for policy 0, policy_version 426979 (0.0014) [2024-06-15 16:47:15,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 874545152. Throughput: 0: 11798.7. Samples: 218716160. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:47:15,956][1648985] Avg episode reward: [(0, '164.560')] [2024-06-15 16:47:16,423][1652491] Updated weights for policy 0, policy_version 427056 (0.0013) [2024-06-15 16:47:17,844][1652491] Updated weights for policy 0, policy_version 427120 (0.0014) [2024-06-15 16:47:19,994][1652491] Updated weights for policy 0, policy_version 427168 (0.0012) [2024-06-15 16:47:20,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 874905600. Throughput: 0: 11844.3. Samples: 218781696. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:47:20,956][1648985] Avg episode reward: [(0, '154.950')] [2024-06-15 16:47:24,673][1652491] Updated weights for policy 0, policy_version 427232 (0.0018) [2024-06-15 16:47:25,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 45875.4, 300 sec: 46652.7). Total num frames: 875036672. Throughput: 0: 12117.4. Samples: 218831872. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:47:25,955][1648985] Avg episode reward: [(0, '126.760')] [2024-06-15 16:47:26,274][1652491] Updated weights for policy 0, policy_version 427288 (0.0195) [2024-06-15 16:47:27,035][1652491] Updated weights for policy 0, policy_version 427328 (0.0014) [2024-06-15 16:47:28,494][1652491] Updated weights for policy 0, policy_version 427376 (0.0015) [2024-06-15 16:47:30,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 50244.3, 300 sec: 46763.8). Total num frames: 875364352. Throughput: 0: 12231.2. Samples: 218893824. 
Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:47:30,955][1648985] Avg episode reward: [(0, '147.340')] [2024-06-15 16:47:31,333][1652491] Updated weights for policy 0, policy_version 427446 (0.0013) [2024-06-15 16:47:35,812][1652491] Updated weights for policy 0, policy_version 427491 (0.0011) [2024-06-15 16:47:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 875495424. Throughput: 0: 12162.9. Samples: 218967040. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:47:35,956][1648985] Avg episode reward: [(0, '148.890')] [2024-06-15 16:47:37,488][1652491] Updated weights for policy 0, policy_version 427552 (0.0014) [2024-06-15 16:47:39,895][1652491] Updated weights for policy 0, policy_version 427600 (0.0016) [2024-06-15 16:47:40,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 50244.5, 300 sec: 46875.5). Total num frames: 875823104. Throughput: 0: 12026.4. Samples: 218994688. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:47:40,956][1648985] Avg episode reward: [(0, '152.010')] [2024-06-15 16:47:41,922][1652491] Updated weights for policy 0, policy_version 427649 (0.0013) [2024-06-15 16:47:43,189][1652491] Updated weights for policy 0, policy_version 427703 (0.0012) [2024-06-15 16:47:45,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.4, 300 sec: 46541.7). Total num frames: 875954176. Throughput: 0: 11923.9. Samples: 219065344. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:47:45,955][1648985] Avg episode reward: [(0, '138.070')] [2024-06-15 16:47:47,595][1652491] Updated weights for policy 0, policy_version 427760 (0.0036) [2024-06-15 16:47:49,288][1651469] Signal inference workers to stop experience collection... 
(22300 times) [2024-06-15 16:47:49,315][1652491] InferenceWorker_p0-w0: stopping experience collection (22300 times) [2024-06-15 16:47:49,349][1652491] Updated weights for policy 0, policy_version 427809 (0.0016) [2024-06-15 16:47:49,535][1651469] Signal inference workers to resume experience collection... (22300 times) [2024-06-15 16:47:49,541][1652491] InferenceWorker_p0-w0: resuming experience collection (22300 times) [2024-06-15 16:47:50,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48606.0, 300 sec: 46986.0). Total num frames: 876249088. Throughput: 0: 11901.1. Samples: 219132928. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:47:50,956][1648985] Avg episode reward: [(0, '149.440')] [2024-06-15 16:47:51,299][1652491] Updated weights for policy 0, policy_version 427875 (0.0018) [2024-06-15 16:47:54,574][1652491] Updated weights for policy 0, policy_version 427952 (0.0026) [2024-06-15 16:47:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.8, 300 sec: 46652.8). Total num frames: 876478464. Throughput: 0: 11867.1. Samples: 219167232. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:47:55,955][1648985] Avg episode reward: [(0, '167.710')] [2024-06-15 16:47:59,214][1652491] Updated weights for policy 0, policy_version 428023 (0.0016) [2024-06-15 16:48:00,302][1652491] Updated weights for policy 0, policy_version 428053 (0.0034) [2024-06-15 16:48:00,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 876707840. Throughput: 0: 11616.7. Samples: 219238912. 
Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:48:00,956][1648985] Avg episode reward: [(0, '168.970')] [2024-06-15 16:48:01,200][1652491] Updated weights for policy 0, policy_version 428094 (0.0010) [2024-06-15 16:48:02,942][1652491] Updated weights for policy 0, policy_version 428160 (0.0014) [2024-06-15 16:48:05,653][1652491] Updated weights for policy 0, policy_version 428222 (0.0012) [2024-06-15 16:48:05,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48605.9, 300 sec: 46652.8). Total num frames: 877002752. Throughput: 0: 11582.6. Samples: 219302912. Policy #0 lag: (min: 30.0, avg: 166.4, max: 286.0) [2024-06-15 16:48:05,956][1648985] Avg episode reward: [(0, '149.850')] [2024-06-15 16:48:10,214][1652491] Updated weights for policy 0, policy_version 428281 (0.0013) [2024-06-15 16:48:10,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45875.3, 300 sec: 46653.3). Total num frames: 877133824. Throughput: 0: 11502.9. Samples: 219349504. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:48:10,956][1648985] Avg episode reward: [(0, '156.550')] [2024-06-15 16:48:11,917][1652491] Updated weights for policy 0, policy_version 428340 (0.0013) [2024-06-15 16:48:13,848][1652491] Updated weights for policy 0, policy_version 428400 (0.0122) [2024-06-15 16:48:15,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 877428736. Throughput: 0: 11582.6. Samples: 219415040. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:48:15,956][1648985] Avg episode reward: [(0, '157.520')] [2024-06-15 16:48:16,297][1652491] Updated weights for policy 0, policy_version 428448 (0.0013) [2024-06-15 16:48:17,116][1652491] Updated weights for policy 0, policy_version 428476 (0.0012) [2024-06-15 16:48:20,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 877592576. Throughput: 0: 11525.7. Samples: 219485696. 
Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:48:20,956][1648985] Avg episode reward: [(0, '162.060')] [2024-06-15 16:48:22,257][1652491] Updated weights for policy 0, policy_version 428548 (0.0015) [2024-06-15 16:48:23,500][1652491] Updated weights for policy 0, policy_version 428604 (0.0013) [2024-06-15 16:48:25,876][1652491] Updated weights for policy 0, policy_version 428660 (0.0015) [2024-06-15 16:48:25,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 877887488. Throughput: 0: 11662.2. Samples: 219519488. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:48:25,956][1648985] Avg episode reward: [(0, '154.740')] [2024-06-15 16:48:27,410][1652491] Updated weights for policy 0, policy_version 428696 (0.0121) [2024-06-15 16:48:28,369][1652491] Updated weights for policy 0, policy_version 428734 (0.0016) [2024-06-15 16:48:30,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 44782.9, 300 sec: 46652.8). Total num frames: 878051328. Throughput: 0: 11650.8. Samples: 219589632. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:48:30,956][1648985] Avg episode reward: [(0, '148.660')] [2024-06-15 16:48:32,109][1652491] Updated weights for policy 0, policy_version 428792 (0.0014) [2024-06-15 16:48:34,095][1652491] Updated weights for policy 0, policy_version 428848 (0.0080) [2024-06-15 16:48:35,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.5, 300 sec: 46986.0). Total num frames: 878313472. Throughput: 0: 11798.8. Samples: 219663872. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:48:35,956][1648985] Avg episode reward: [(0, '145.070')] [2024-06-15 16:48:36,482][1651469] Signal inference workers to stop experience collection... (22350 times) [2024-06-15 16:48:36,524][1652491] InferenceWorker_p0-w0: stopping experience collection (22350 times) [2024-06-15 16:48:36,826][1651469] Signal inference workers to resume experience collection... 
(22350 times) [2024-06-15 16:48:36,854][1652491] InferenceWorker_p0-w0: resuming experience collection (22350 times) [2024-06-15 16:48:37,415][1652491] Updated weights for policy 0, policy_version 428918 (0.0108) [2024-06-15 16:48:38,954][1652491] Updated weights for policy 0, policy_version 428976 (0.0013) [2024-06-15 16:48:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.1, 300 sec: 46652.9). Total num frames: 878575616. Throughput: 0: 11628.1. Samples: 219690496. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:48:40,956][1648985] Avg episode reward: [(0, '143.970')] [2024-06-15 16:48:42,878][1652491] Updated weights for policy 0, policy_version 429010 (0.0013) [2024-06-15 16:48:44,382][1652491] Updated weights for policy 0, policy_version 429061 (0.0020) [2024-06-15 16:48:45,123][1652491] Updated weights for policy 0, policy_version 429105 (0.0014) [2024-06-15 16:48:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 878837760. Throughput: 0: 11741.9. Samples: 219767296. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:48:45,956][1648985] Avg episode reward: [(0, '150.900')] [2024-06-15 16:48:48,119][1652491] Updated weights for policy 0, policy_version 429156 (0.0012) [2024-06-15 16:48:50,077][1652491] Updated weights for policy 0, policy_version 429242 (0.0012) [2024-06-15 16:48:50,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 47513.4, 300 sec: 46763.8). Total num frames: 879099904. Throughput: 0: 11787.3. Samples: 219833344. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:48:50,956][1648985] Avg episode reward: [(0, '141.280')] [2024-06-15 16:48:54,430][1652491] Updated weights for policy 0, policy_version 429296 (0.0014) [2024-06-15 16:48:55,719][1652491] Updated weights for policy 0, policy_version 429335 (0.0020) [2024-06-15 16:48:55,955][1648985] Fps is (10 sec: 45873.4, 60 sec: 46967.1, 300 sec: 46874.8). Total num frames: 879296512. 
Throughput: 0: 11639.4. Samples: 219873280. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:48:55,956][1648985] Avg episode reward: [(0, '132.760')] [2024-06-15 16:48:56,228][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000429360_879329280.pth... [2024-06-15 16:48:56,309][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000423808_867958784.pth [2024-06-15 16:48:58,559][1652491] Updated weights for policy 0, policy_version 429379 (0.0014) [2024-06-15 16:49:00,565][1652491] Updated weights for policy 0, policy_version 429458 (0.0013) [2024-06-15 16:49:00,955][1648985] Fps is (10 sec: 45876.5, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 879558656. Throughput: 0: 11741.9. Samples: 219943424. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:49:00,955][1648985] Avg episode reward: [(0, '125.290')] [2024-06-15 16:49:04,943][1652491] Updated weights for policy 0, policy_version 429508 (0.0015) [2024-06-15 16:49:05,955][1648985] Fps is (10 sec: 42599.8, 60 sec: 45329.0, 300 sec: 46763.8). Total num frames: 879722496. Throughput: 0: 11662.2. Samples: 220010496. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:49:05,956][1648985] Avg episode reward: [(0, '131.570')] [2024-06-15 16:49:06,828][1652491] Updated weights for policy 0, policy_version 429571 (0.0067) [2024-06-15 16:49:08,251][1652491] Updated weights for policy 0, policy_version 429632 (0.0014) [2024-06-15 16:49:10,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 46421.4, 300 sec: 46986.3). Total num frames: 879919104. Throughput: 0: 11616.7. Samples: 220042240. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:49:10,956][1648985] Avg episode reward: [(0, '172.840')] [2024-06-15 16:49:12,053][1652491] Updated weights for policy 0, policy_version 429699 (0.0014) [2024-06-15 16:49:15,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45329.0, 300 sec: 46652.7). 
Total num frames: 880148480. Throughput: 0: 11411.9. Samples: 220103168. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:49:15,956][1648985] Avg episode reward: [(0, '164.820')] [2024-06-15 16:49:17,312][1652491] Updated weights for policy 0, policy_version 429781 (0.0015) [2024-06-15 16:49:19,099][1652491] Updated weights for policy 0, policy_version 429830 (0.0012) [2024-06-15 16:49:20,396][1652491] Updated weights for policy 0, policy_version 429888 (0.0012) [2024-06-15 16:49:20,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 880410624. Throughput: 0: 11343.7. Samples: 220174336. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:49:20,955][1648985] Avg episode reward: [(0, '164.270')] [2024-06-15 16:49:22,321][1651469] Signal inference workers to stop experience collection... (22400 times) [2024-06-15 16:49:22,356][1652491] InferenceWorker_p0-w0: stopping experience collection (22400 times) [2024-06-15 16:49:22,577][1651469] Signal inference workers to resume experience collection... (22400 times) [2024-06-15 16:49:22,578][1652491] InferenceWorker_p0-w0: resuming experience collection (22400 times) [2024-06-15 16:49:24,289][1652491] Updated weights for policy 0, policy_version 429970 (0.0013) [2024-06-15 16:49:25,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 880672768. Throughput: 0: 11446.1. Samples: 220205568. Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:49:25,955][1648985] Avg episode reward: [(0, '150.350')] [2024-06-15 16:49:28,095][1652491] Updated weights for policy 0, policy_version 430017 (0.0013) [2024-06-15 16:49:30,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 880803840. Throughput: 0: 11264.0. Samples: 220274176. 
Policy #0 lag: (min: 47.0, avg: 124.5, max: 303.0) [2024-06-15 16:49:30,956][1648985] Avg episode reward: [(0, '173.540')] [2024-06-15 16:49:31,608][1652491] Updated weights for policy 0, policy_version 430112 (0.0013) [2024-06-15 16:49:35,070][1652491] Updated weights for policy 0, policy_version 430176 (0.0013) [2024-06-15 16:49:35,975][1648985] Fps is (10 sec: 39244.6, 60 sec: 45860.2, 300 sec: 46760.7). Total num frames: 881065984. Throughput: 0: 11236.4. Samples: 220339200. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:49:35,975][1648985] Avg episode reward: [(0, '172.990')] [2024-06-15 16:49:36,722][1652491] Updated weights for policy 0, policy_version 430240 (0.0013) [2024-06-15 16:49:40,550][1652491] Updated weights for policy 0, policy_version 430304 (0.0016) [2024-06-15 16:49:40,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45329.1, 300 sec: 46652.8). Total num frames: 881295360. Throughput: 0: 11104.8. Samples: 220372992. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:49:40,956][1648985] Avg episode reward: [(0, '177.010')] [2024-06-15 16:49:42,962][1652491] Updated weights for policy 0, policy_version 430368 (0.0017) [2024-06-15 16:49:45,955][1648985] Fps is (10 sec: 39398.3, 60 sec: 43690.6, 300 sec: 46652.7). Total num frames: 881459200. Throughput: 0: 11059.2. Samples: 220441088. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:49:45,956][1648985] Avg episode reward: [(0, '172.690')] [2024-06-15 16:49:46,121][1652491] Updated weights for policy 0, policy_version 430416 (0.0012) [2024-06-15 16:49:47,872][1652491] Updated weights for policy 0, policy_version 430465 (0.0012) [2024-06-15 16:49:50,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 43690.9, 300 sec: 46652.8). Total num frames: 881721344. Throughput: 0: 11127.5. Samples: 220511232. 
Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:49:50,955][1648985] Avg episode reward: [(0, '157.460')] [2024-06-15 16:49:51,465][1652491] Updated weights for policy 0, policy_version 430531 (0.0014) [2024-06-15 16:49:53,390][1652491] Updated weights for policy 0, policy_version 430593 (0.0099) [2024-06-15 16:49:54,726][1652491] Updated weights for policy 0, policy_version 430656 (0.0024) [2024-06-15 16:49:55,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 44783.2, 300 sec: 47097.1). Total num frames: 881983488. Throughput: 0: 11036.4. Samples: 220538880. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:49:55,956][1648985] Avg episode reward: [(0, '138.700')] [2024-06-15 16:49:58,266][1652491] Updated weights for policy 0, policy_version 430711 (0.0023) [2024-06-15 16:50:00,474][1652491] Updated weights for policy 0, policy_version 430768 (0.0023) [2024-06-15 16:50:00,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 44782.8, 300 sec: 46652.8). Total num frames: 882245632. Throughput: 0: 11377.8. Samples: 220615168. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:50:00,956][1648985] Avg episode reward: [(0, '133.690')] [2024-06-15 16:50:03,469][1652491] Updated weights for policy 0, policy_version 430816 (0.0014) [2024-06-15 16:50:04,512][1652491] Updated weights for policy 0, policy_version 430853 (0.0013) [2024-06-15 16:50:05,185][1651469] Signal inference workers to stop experience collection... (22450 times) [2024-06-15 16:50:05,244][1652491] InferenceWorker_p0-w0: stopping experience collection (22450 times) [2024-06-15 16:50:05,458][1651469] Signal inference workers to resume experience collection... (22450 times) [2024-06-15 16:50:05,459][1652491] InferenceWorker_p0-w0: resuming experience collection (22450 times) [2024-06-15 16:50:05,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46421.3, 300 sec: 47097.0). Total num frames: 882507776. Throughput: 0: 11218.4. Samples: 220679168. 
Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:50:05,956][1648985] Avg episode reward: [(0, '139.550')] [2024-06-15 16:50:08,319][1652491] Updated weights for policy 0, policy_version 430913 (0.0013) [2024-06-15 16:50:09,424][1652491] Updated weights for policy 0, policy_version 430976 (0.0014) [2024-06-15 16:50:10,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45329.0, 300 sec: 46541.6). Total num frames: 882638848. Throughput: 0: 11502.9. Samples: 220723200. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:50:10,956][1648985] Avg episode reward: [(0, '144.750')] [2024-06-15 16:50:13,791][1652491] Updated weights for policy 0, policy_version 431041 (0.0085) [2024-06-15 16:50:15,057][1652491] Updated weights for policy 0, policy_version 431098 (0.0025) [2024-06-15 16:50:15,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 882900992. Throughput: 0: 11446.1. Samples: 220789248. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:50:15,956][1648985] Avg episode reward: [(0, '137.840')] [2024-06-15 16:50:17,068][1652491] Updated weights for policy 0, policy_version 431167 (0.0017) [2024-06-15 16:50:20,369][1652491] Updated weights for policy 0, policy_version 431226 (0.0121) [2024-06-15 16:50:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 883163136. Throughput: 0: 11553.5. Samples: 220858880. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:50:20,956][1648985] Avg episode reward: [(0, '145.840')] [2024-06-15 16:50:23,639][1652491] Updated weights for policy 0, policy_version 431284 (0.0013) [2024-06-15 16:50:25,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 46430.6). Total num frames: 883294208. Throughput: 0: 11559.8. Samples: 220893184. 
Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:50:25,956][1648985] Avg episode reward: [(0, '132.870')] [2024-06-15 16:50:26,200][1652491] Updated weights for policy 0, policy_version 431312 (0.0012) [2024-06-15 16:50:27,716][1652491] Updated weights for policy 0, policy_version 431364 (0.0013) [2024-06-15 16:50:29,063][1652491] Updated weights for policy 0, policy_version 431423 (0.0016) [2024-06-15 16:50:30,955][1648985] Fps is (10 sec: 45873.7, 60 sec: 46967.2, 300 sec: 46763.7). Total num frames: 883621888. Throughput: 0: 11491.5. Samples: 220958208. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:50:30,956][1648985] Avg episode reward: [(0, '146.360')] [2024-06-15 16:50:31,628][1652491] Updated weights for policy 0, policy_version 431482 (0.0014) [2024-06-15 16:50:35,480][1652491] Updated weights for policy 0, policy_version 431539 (0.0098) [2024-06-15 16:50:35,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45890.1, 300 sec: 46652.7). Total num frames: 883818496. Throughput: 0: 11468.8. Samples: 221027328. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:50:35,956][1648985] Avg episode reward: [(0, '141.990')] [2024-06-15 16:50:38,473][1652491] Updated weights for policy 0, policy_version 431588 (0.0014) [2024-06-15 16:50:40,285][1652491] Updated weights for policy 0, policy_version 431664 (0.0014) [2024-06-15 16:50:40,955][1648985] Fps is (10 sec: 45876.6, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 884080640. Throughput: 0: 11685.0. Samples: 221064704. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:50:40,956][1648985] Avg episode reward: [(0, '140.680')] [2024-06-15 16:50:42,522][1652491] Updated weights for policy 0, policy_version 431712 (0.0033) [2024-06-15 16:50:45,752][1652491] Updated weights for policy 0, policy_version 431760 (0.0013) [2024-06-15 16:50:45,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 884244480. 
Throughput: 0: 11571.2. Samples: 221135872. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:50:45,956][1648985] Avg episode reward: [(0, '138.100')] [2024-06-15 16:50:49,038][1652491] Updated weights for policy 0, policy_version 431840 (0.0014) [2024-06-15 16:50:50,715][1652491] Updated weights for policy 0, policy_version 431888 (0.0012) [2024-06-15 16:50:50,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 884506624. Throughput: 0: 11639.5. Samples: 221202944. Policy #0 lag: (min: 53.0, avg: 164.2, max: 309.0) [2024-06-15 16:50:50,956][1648985] Avg episode reward: [(0, '156.460')] [2024-06-15 16:50:51,274][1651469] Signal inference workers to stop experience collection... (22500 times) [2024-06-15 16:50:51,314][1652491] InferenceWorker_p0-w0: stopping experience collection (22500 times) [2024-06-15 16:50:51,611][1651469] Signal inference workers to resume experience collection... (22500 times) [2024-06-15 16:50:51,612][1652491] InferenceWorker_p0-w0: resuming experience collection (22500 times) [2024-06-15 16:50:52,004][1652491] Updated weights for policy 0, policy_version 431936 (0.0014) [2024-06-15 16:50:54,663][1652491] Updated weights for policy 0, policy_version 431991 (0.0016) [2024-06-15 16:50:55,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 884736000. Throughput: 0: 11366.4. Samples: 221234688. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:50:55,956][1648985] Avg episode reward: [(0, '179.790')] [2024-06-15 16:50:55,974][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000432000_884736000.pth... 
[2024-06-15 16:50:56,024][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000426608_873693184.pth [2024-06-15 16:50:56,058][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000432000_884736000.pth [2024-06-15 16:50:57,734][1652491] Updated weights for policy 0, policy_version 432048 (0.0030) [2024-06-15 16:51:00,267][1652491] Updated weights for policy 0, policy_version 432096 (0.0011) [2024-06-15 16:51:00,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 884965376. Throughput: 0: 11571.2. Samples: 221309952. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:51:00,955][1648985] Avg episode reward: [(0, '180.520')] [2024-06-15 16:51:01,856][1652491] Updated weights for policy 0, policy_version 432147 (0.0014) [2024-06-15 16:51:05,867][1652491] Updated weights for policy 0, policy_version 432225 (0.0014) [2024-06-15 16:51:05,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 885194752. Throughput: 0: 11400.6. Samples: 221371904. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:51:05,955][1648985] Avg episode reward: [(0, '158.710')] [2024-06-15 16:51:08,667][1652491] Updated weights for policy 0, policy_version 432272 (0.0012) [2024-06-15 16:51:09,874][1652491] Updated weights for policy 0, policy_version 432317 (0.0023) [2024-06-15 16:51:10,955][1648985] Fps is (10 sec: 42597.1, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 885391360. Throughput: 0: 11537.0. Samples: 221412352. 
Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:51:10,956][1648985] Avg episode reward: [(0, '159.800')] [2024-06-15 16:51:12,381][1652491] Updated weights for policy 0, policy_version 432370 (0.0013) [2024-06-15 16:51:13,819][1652491] Updated weights for policy 0, policy_version 432419 (0.0013) [2024-06-15 16:51:15,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 885653504. Throughput: 0: 11525.8. Samples: 221476864. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:51:15,956][1648985] Avg episode reward: [(0, '152.650')] [2024-06-15 16:51:16,497][1652491] Updated weights for policy 0, policy_version 432464 (0.0014) [2024-06-15 16:51:17,384][1652491] Updated weights for policy 0, policy_version 432507 (0.0014) [2024-06-15 16:51:20,956][1648985] Fps is (10 sec: 49150.1, 60 sec: 45328.7, 300 sec: 46097.3). Total num frames: 885882880. Throughput: 0: 11593.8. Samples: 221549056. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:51:20,956][1648985] Avg episode reward: [(0, '163.410')] [2024-06-15 16:51:21,016][1652491] Updated weights for policy 0, policy_version 432565 (0.0062) [2024-06-15 16:51:22,340][1652491] Updated weights for policy 0, policy_version 432592 (0.0011) [2024-06-15 16:51:23,239][1652491] Updated weights for policy 0, policy_version 432634 (0.0018) [2024-06-15 16:51:24,916][1652491] Updated weights for policy 0, policy_version 432699 (0.0122) [2024-06-15 16:51:25,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 886177792. Throughput: 0: 11548.5. Samples: 221584384. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:51:25,956][1648985] Avg episode reward: [(0, '156.930')] [2024-06-15 16:51:28,629][1652491] Updated weights for policy 0, policy_version 432757 (0.0018) [2024-06-15 16:51:30,955][1648985] Fps is (10 sec: 42600.8, 60 sec: 44783.2, 300 sec: 45986.3). Total num frames: 886308864. 
Throughput: 0: 11525.7. Samples: 221654528. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:51:30,956][1648985] Avg episode reward: [(0, '137.270')] [2024-06-15 16:51:31,717][1652491] Updated weights for policy 0, policy_version 432802 (0.0013) [2024-06-15 16:51:33,485][1652491] Updated weights for policy 0, policy_version 432850 (0.0012) [2024-06-15 16:51:34,799][1652491] Updated weights for policy 0, policy_version 432897 (0.0011) [2024-06-15 16:51:35,507][1651469] Signal inference workers to stop experience collection... (22550 times) [2024-06-15 16:51:35,545][1652491] InferenceWorker_p0-w0: stopping experience collection (22550 times) [2024-06-15 16:51:35,789][1651469] Signal inference workers to resume experience collection... (22550 times) [2024-06-15 16:51:35,798][1652491] InferenceWorker_p0-w0: resuming experience collection (22550 times) [2024-06-15 16:51:35,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 47513.7, 300 sec: 46986.0). Total num frames: 886669312. Throughput: 0: 11571.2. Samples: 221723648. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:51:35,955][1648985] Avg episode reward: [(0, '146.270')] [2024-06-15 16:51:36,140][1652491] Updated weights for policy 0, policy_version 432960 (0.0013) [2024-06-15 16:51:39,401][1652491] Updated weights for policy 0, policy_version 433015 (0.0013) [2024-06-15 16:51:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 886833152. Throughput: 0: 11844.3. Samples: 221767680. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:51:40,956][1648985] Avg episode reward: [(0, '153.920')] [2024-06-15 16:51:42,912][1652491] Updated weights for policy 0, policy_version 433082 (0.0012) [2024-06-15 16:51:45,235][1652491] Updated weights for policy 0, policy_version 433136 (0.0020) [2024-06-15 16:51:45,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 887095296. Throughput: 0: 11639.4. 
Samples: 221833728. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:51:45,956][1648985] Avg episode reward: [(0, '168.150')] [2024-06-15 16:51:46,567][1652491] Updated weights for policy 0, policy_version 433185 (0.0123) [2024-06-15 16:51:49,687][1652491] Updated weights for policy 0, policy_version 433219 (0.0014) [2024-06-15 16:51:50,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 887357440. Throughput: 0: 11855.6. Samples: 221905408. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:51:50,956][1648985] Avg episode reward: [(0, '145.990')] [2024-06-15 16:51:53,768][1652491] Updated weights for policy 0, policy_version 433312 (0.0017) [2024-06-15 16:51:55,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 887521280. Throughput: 0: 11730.5. Samples: 221940224. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:51:55,956][1648985] Avg episode reward: [(0, '150.060')] [2024-06-15 16:51:56,003][1652491] Updated weights for policy 0, policy_version 433376 (0.0012) [2024-06-15 16:51:57,858][1652491] Updated weights for policy 0, policy_version 433426 (0.0011) [2024-06-15 16:51:58,665][1652491] Updated weights for policy 0, policy_version 433472 (0.0012) [2024-06-15 16:52:00,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 46421.2, 300 sec: 46319.5). Total num frames: 887750656. Throughput: 0: 11901.1. Samples: 222012416. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:52:00,956][1648985] Avg episode reward: [(0, '160.650')] [2024-06-15 16:52:02,204][1652491] Updated weights for policy 0, policy_version 433533 (0.0012) [2024-06-15 16:52:05,241][1652491] Updated weights for policy 0, policy_version 433599 (0.0014) [2024-06-15 16:52:05,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 46967.4, 300 sec: 46208.5). Total num frames: 888012800. Throughput: 0: 11821.6. Samples: 222081024. 
Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:52:05,956][1648985] Avg episode reward: [(0, '152.330')] [2024-06-15 16:52:08,677][1652491] Updated weights for policy 0, policy_version 433666 (0.0012) [2024-06-15 16:52:10,027][1652491] Updated weights for policy 0, policy_version 433728 (0.0011) [2024-06-15 16:52:10,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.9, 300 sec: 46541.7). Total num frames: 888274944. Throughput: 0: 11753.3. Samples: 222113280. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:52:10,956][1648985] Avg episode reward: [(0, '147.810')] [2024-06-15 16:52:15,861][1652491] Updated weights for policy 0, policy_version 433797 (0.0044) [2024-06-15 16:52:15,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 888406016. Throughput: 0: 11798.8. Samples: 222185472. Policy #0 lag: (min: 60.0, avg: 191.2, max: 316.0) [2024-06-15 16:52:15,955][1648985] Avg episode reward: [(0, '141.270')] [2024-06-15 16:52:17,395][1652491] Updated weights for policy 0, policy_version 433849 (0.0186) [2024-06-15 16:52:19,042][1652491] Updated weights for policy 0, policy_version 433915 (0.0017) [2024-06-15 16:52:20,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 47514.0, 300 sec: 46430.6). Total num frames: 888733696. Throughput: 0: 11707.7. Samples: 222250496. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:52:20,957][1648985] Avg episode reward: [(0, '149.360')] [2024-06-15 16:52:21,065][1651469] Signal inference workers to stop experience collection... (22600 times) [2024-06-15 16:52:21,110][1652491] InferenceWorker_p0-w0: stopping experience collection (22600 times) [2024-06-15 16:52:21,324][1651469] Signal inference workers to resume experience collection... 
(22600 times) [2024-06-15 16:52:21,325][1652491] InferenceWorker_p0-w0: resuming experience collection (22600 times) [2024-06-15 16:52:21,328][1652491] Updated weights for policy 0, policy_version 433968 (0.0012) [2024-06-15 16:52:23,757][1652491] Updated weights for policy 0, policy_version 433988 (0.0012) [2024-06-15 16:52:24,793][1652491] Updated weights for policy 0, policy_version 434047 (0.0013) [2024-06-15 16:52:25,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 888930304. Throughput: 0: 11673.6. Samples: 222292992. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:52:25,956][1648985] Avg episode reward: [(0, '139.670')] [2024-06-15 16:52:27,559][1652491] Updated weights for policy 0, policy_version 434111 (0.0012) [2024-06-15 16:52:29,386][1652491] Updated weights for policy 0, policy_version 434173 (0.0146) [2024-06-15 16:52:30,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 48059.8, 300 sec: 46430.6). Total num frames: 889192448. Throughput: 0: 11628.1. Samples: 222356992. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:52:30,955][1648985] Avg episode reward: [(0, '139.850')] [2024-06-15 16:52:32,266][1652491] Updated weights for policy 0, policy_version 434230 (0.0013) [2024-06-15 16:52:35,547][1652491] Updated weights for policy 0, policy_version 434272 (0.0034) [2024-06-15 16:52:35,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 889421824. Throughput: 0: 11787.4. Samples: 222435840. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:52:35,956][1648985] Avg episode reward: [(0, '147.520')] [2024-06-15 16:52:38,226][1652491] Updated weights for policy 0, policy_version 434340 (0.0023) [2024-06-15 16:52:39,968][1652491] Updated weights for policy 0, policy_version 434400 (0.0013) [2024-06-15 16:52:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 889716736. 
Throughput: 0: 11685.0. Samples: 222466048. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:52:40,956][1648985] Avg episode reward: [(0, '160.860')] [2024-06-15 16:52:43,162][1652491] Updated weights for policy 0, policy_version 434452 (0.0013) [2024-06-15 16:52:45,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.3, 300 sec: 46097.4). Total num frames: 889847808. Throughput: 0: 11594.0. Samples: 222534144. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:52:45,956][1648985] Avg episode reward: [(0, '157.340')] [2024-06-15 16:52:46,183][1652491] Updated weights for policy 0, policy_version 434497 (0.0018) [2024-06-15 16:52:47,238][1652491] Updated weights for policy 0, policy_version 434553 (0.0110) [2024-06-15 16:52:49,124][1652491] Updated weights for policy 0, policy_version 434584 (0.0012) [2024-06-15 16:52:50,237][1652491] Updated weights for policy 0, policy_version 434626 (0.0020) [2024-06-15 16:52:50,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 890142720. Throughput: 0: 11741.9. Samples: 222609408. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:52:50,956][1648985] Avg episode reward: [(0, '133.210')] [2024-06-15 16:52:51,730][1652491] Updated weights for policy 0, policy_version 434684 (0.0036) [2024-06-15 16:52:54,647][1652491] Updated weights for policy 0, policy_version 434736 (0.0016) [2024-06-15 16:52:55,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 47513.5, 300 sec: 46319.5). Total num frames: 890372096. Throughput: 0: 11878.4. Samples: 222647808. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:52:55,956][1648985] Avg episode reward: [(0, '134.840')] [2024-06-15 16:52:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000434752_890372096.pth... 
[2024-06-15 16:52:56,026][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000429360_879329280.pth [2024-06-15 16:52:56,863][1652491] Updated weights for policy 0, policy_version 434779 (0.0116) [2024-06-15 16:52:57,576][1652491] Updated weights for policy 0, policy_version 434816 (0.0170) [2024-06-15 16:53:00,956][1648985] Fps is (10 sec: 45873.9, 60 sec: 47513.4, 300 sec: 46097.3). Total num frames: 890601472. Throughput: 0: 11958.0. Samples: 222723584. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:53:00,957][1648985] Avg episode reward: [(0, '143.810')] [2024-06-15 16:53:00,981][1652491] Updated weights for policy 0, policy_version 434880 (0.0015) [2024-06-15 16:53:05,160][1652491] Updated weights for policy 0, policy_version 434960 (0.0016) [2024-06-15 16:53:05,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 46967.6, 300 sec: 46430.6). Total num frames: 890830848. Throughput: 0: 11901.2. Samples: 222786048. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:53:05,955][1648985] Avg episode reward: [(0, '151.420')] [2024-06-15 16:53:07,713][1651469] Signal inference workers to stop experience collection... (22650 times) [2024-06-15 16:53:07,742][1652491] Updated weights for policy 0, policy_version 435009 (0.0013) [2024-06-15 16:53:07,800][1652491] InferenceWorker_p0-w0: stopping experience collection (22650 times) [2024-06-15 16:53:08,018][1651469] Signal inference workers to resume experience collection... (22650 times) [2024-06-15 16:53:08,019][1652491] InferenceWorker_p0-w0: resuming experience collection (22650 times) [2024-06-15 16:53:10,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 891027456. Throughput: 0: 11707.7. Samples: 222819840. 
Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:53:10,955][1648985] Avg episode reward: [(0, '136.330')] [2024-06-15 16:53:12,364][1652491] Updated weights for policy 0, policy_version 435120 (0.0014) [2024-06-15 16:53:13,796][1652491] Updated weights for policy 0, policy_version 435171 (0.0012) [2024-06-15 16:53:15,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 891289600. Throughput: 0: 11639.5. Samples: 222880768. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:53:15,956][1648985] Avg episode reward: [(0, '135.720')] [2024-06-15 16:53:17,707][1652491] Updated weights for policy 0, policy_version 435232 (0.0011) [2024-06-15 16:53:19,457][1652491] Updated weights for policy 0, policy_version 435265 (0.0012) [2024-06-15 16:53:20,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.5, 300 sec: 46319.5). Total num frames: 891551744. Throughput: 0: 11537.0. Samples: 222955008. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:53:20,956][1648985] Avg episode reward: [(0, '134.640')] [2024-06-15 16:53:23,621][1652491] Updated weights for policy 0, policy_version 435360 (0.0140) [2024-06-15 16:53:25,616][1652491] Updated weights for policy 0, policy_version 435425 (0.0013) [2024-06-15 16:53:25,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 891781120. Throughput: 0: 11605.4. Samples: 222988288. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:53:25,955][1648985] Avg episode reward: [(0, '140.140')] [2024-06-15 16:53:30,138][1652491] Updated weights for policy 0, policy_version 435488 (0.0013) [2024-06-15 16:53:30,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 891944960. Throughput: 0: 11571.1. Samples: 223054848. 
Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:53:30,956][1648985] Avg episode reward: [(0, '129.090')] [2024-06-15 16:53:32,051][1652491] Updated weights for policy 0, policy_version 435541 (0.0013) [2024-06-15 16:53:32,788][1652491] Updated weights for policy 0, policy_version 435580 (0.0013) [2024-06-15 16:53:35,705][1652491] Updated weights for policy 0, policy_version 435633 (0.0014) [2024-06-15 16:53:35,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 892174336. Throughput: 0: 11343.6. Samples: 223119872. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:53:35,956][1648985] Avg episode reward: [(0, '166.430')] [2024-06-15 16:53:37,303][1652491] Updated weights for policy 0, policy_version 435704 (0.0013) [2024-06-15 16:53:40,955][1648985] Fps is (10 sec: 39323.0, 60 sec: 43690.8, 300 sec: 45764.1). Total num frames: 892338176. Throughput: 0: 11184.4. Samples: 223151104. Policy #0 lag: (min: 53.0, avg: 144.1, max: 309.0) [2024-06-15 16:53:40,955][1648985] Avg episode reward: [(0, '158.470')] [2024-06-15 16:53:42,027][1652491] Updated weights for policy 0, policy_version 435760 (0.0014) [2024-06-15 16:53:43,756][1652491] Updated weights for policy 0, policy_version 435809 (0.0085) [2024-06-15 16:53:45,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 892600320. Throughput: 0: 11059.2. Samples: 223221248. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:53:45,956][1648985] Avg episode reward: [(0, '147.860')] [2024-06-15 16:53:46,434][1652491] Updated weights for policy 0, policy_version 435864 (0.0018) [2024-06-15 16:53:48,239][1652491] Updated weights for policy 0, policy_version 435938 (0.0011) [2024-06-15 16:53:50,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45329.0, 300 sec: 45986.3). Total num frames: 892862464. Throughput: 0: 11229.9. Samples: 223291392. 
Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:53:50,956][1648985] Avg episode reward: [(0, '149.030')] [2024-06-15 16:53:53,133][1652491] Updated weights for policy 0, policy_version 435984 (0.0034) [2024-06-15 16:53:53,240][1651469] Signal inference workers to stop experience collection... (22700 times) [2024-06-15 16:53:53,297][1652491] InferenceWorker_p0-w0: stopping experience collection (22700 times) [2024-06-15 16:53:53,475][1651469] Signal inference workers to resume experience collection... (22700 times) [2024-06-15 16:53:53,476][1652491] InferenceWorker_p0-w0: resuming experience collection (22700 times) [2024-06-15 16:53:54,622][1652491] Updated weights for policy 0, policy_version 436048 (0.0014) [2024-06-15 16:53:55,727][1652491] Updated weights for policy 0, policy_version 436093 (0.0016) [2024-06-15 16:53:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 893124608. Throughput: 0: 11298.1. Samples: 223328256. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:53:55,956][1648985] Avg episode reward: [(0, '170.600')] [2024-06-15 16:53:58,217][1652491] Updated weights for policy 0, policy_version 436148 (0.0136) [2024-06-15 16:53:59,805][1652491] Updated weights for policy 0, policy_version 436219 (0.0110) [2024-06-15 16:54:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46421.5, 300 sec: 46319.5). Total num frames: 893386752. Throughput: 0: 11377.8. Samples: 223392768. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:54:00,956][1648985] Avg episode reward: [(0, '147.900')] [2024-06-15 16:54:04,968][1652491] Updated weights for policy 0, policy_version 436278 (0.0013) [2024-06-15 16:54:05,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 893550592. Throughput: 0: 11446.1. Samples: 223470080. 
Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:54:05,956][1648985] Avg episode reward: [(0, '134.830')] [2024-06-15 16:54:06,564][1652491] Updated weights for policy 0, policy_version 436336 (0.0013) [2024-06-15 16:54:09,192][1652491] Updated weights for policy 0, policy_version 436400 (0.0035) [2024-06-15 16:54:10,836][1652491] Updated weights for policy 0, policy_version 436464 (0.0013) [2024-06-15 16:54:10,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 47513.5, 300 sec: 46541.7). Total num frames: 893878272. Throughput: 0: 11423.3. Samples: 223502336. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:54:10,956][1648985] Avg episode reward: [(0, '131.970')] [2024-06-15 16:54:15,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 893943808. Throughput: 0: 11503.0. Samples: 223572480. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:54:15,956][1648985] Avg episode reward: [(0, '169.240')] [2024-06-15 16:54:16,572][1652491] Updated weights for policy 0, policy_version 436528 (0.0012) [2024-06-15 16:54:17,782][1652491] Updated weights for policy 0, policy_version 436579 (0.0011) [2024-06-15 16:54:19,974][1652491] Updated weights for policy 0, policy_version 436624 (0.0012) [2024-06-15 16:54:20,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 894271488. Throughput: 0: 11537.1. Samples: 223639040. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:54:20,956][1648985] Avg episode reward: [(0, '175.700')] [2024-06-15 16:54:21,467][1652491] Updated weights for policy 0, policy_version 436688 (0.0014) [2024-06-15 16:54:25,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 44236.6, 300 sec: 46208.4). Total num frames: 894435328. Throughput: 0: 11673.5. Samples: 223676416. 
Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:54:25,956][1648985] Avg episode reward: [(0, '150.390')] [2024-06-15 16:54:27,711][1652491] Updated weights for policy 0, policy_version 436784 (0.0015) [2024-06-15 16:54:29,678][1652491] Updated weights for policy 0, policy_version 436848 (0.0029) [2024-06-15 16:54:30,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.4, 300 sec: 46211.5). Total num frames: 894697472. Throughput: 0: 11605.3. Samples: 223743488. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:54:30,956][1648985] Avg episode reward: [(0, '134.110')] [2024-06-15 16:54:32,206][1652491] Updated weights for policy 0, policy_version 436916 (0.0015) [2024-06-15 16:54:32,920][1651469] Signal inference workers to stop experience collection... (22750 times) [2024-06-15 16:54:32,966][1652491] InferenceWorker_p0-w0: stopping experience collection (22750 times) [2024-06-15 16:54:33,218][1651469] Signal inference workers to resume experience collection... (22750 times) [2024-06-15 16:54:33,219][1652491] InferenceWorker_p0-w0: resuming experience collection (22750 times) [2024-06-15 16:54:33,467][1652491] Updated weights for policy 0, policy_version 436976 (0.0013) [2024-06-15 16:54:35,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 894959616. Throughput: 0: 11719.1. Samples: 223818752. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:54:35,956][1648985] Avg episode reward: [(0, '123.700')] [2024-06-15 16:54:37,752][1652491] Updated weights for policy 0, policy_version 437013 (0.0018) [2024-06-15 16:54:39,678][1652491] Updated weights for policy 0, policy_version 437074 (0.0014) [2024-06-15 16:54:40,962][1648985] Fps is (10 sec: 52391.7, 60 sec: 48053.9, 300 sec: 46651.6). Total num frames: 895221760. Throughput: 0: 11740.0. Samples: 223856640. 
Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:54:40,963][1648985] Avg episode reward: [(0, '131.320')] [2024-06-15 16:54:42,219][1652491] Updated weights for policy 0, policy_version 437136 (0.0013) [2024-06-15 16:54:44,104][1652491] Updated weights for policy 0, policy_version 437216 (0.0012) [2024-06-15 16:54:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 895483904. Throughput: 0: 11639.5. Samples: 223916544. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:54:45,955][1648985] Avg episode reward: [(0, '126.500')] [2024-06-15 16:54:49,140][1652491] Updated weights for policy 0, policy_version 437296 (0.0015) [2024-06-15 16:54:50,955][1648985] Fps is (10 sec: 39349.7, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 895614976. Throughput: 0: 11730.5. Samples: 223997952. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:54:50,955][1648985] Avg episode reward: [(0, '117.430')] [2024-06-15 16:54:52,271][1652491] Updated weights for policy 0, policy_version 437371 (0.0042) [2024-06-15 16:54:53,954][1652491] Updated weights for policy 0, policy_version 437411 (0.0012) [2024-06-15 16:54:55,848][1652491] Updated weights for policy 0, policy_version 437502 (0.0031) [2024-06-15 16:54:55,975][1648985] Fps is (10 sec: 52326.2, 60 sec: 48044.1, 300 sec: 46649.7). Total num frames: 896008192. Throughput: 0: 11679.9. Samples: 224028160. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:54:55,975][1648985] Avg episode reward: [(0, '122.390')] [2024-06-15 16:54:55,982][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000437504_896008192.pth... 
[2024-06-15 16:54:56,081][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000432000_884736000.pth [2024-06-15 16:55:00,559][1652491] Updated weights for policy 0, policy_version 437564 (0.0125) [2024-06-15 16:55:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 896139264. Throughput: 0: 11787.4. Samples: 224102912. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:55:00,956][1648985] Avg episode reward: [(0, '150.990')] [2024-06-15 16:55:03,104][1652491] Updated weights for policy 0, policy_version 437621 (0.0012) [2024-06-15 16:55:05,545][1652491] Updated weights for policy 0, policy_version 437688 (0.0014) [2024-06-15 16:55:05,955][1648985] Fps is (10 sec: 39398.8, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 896401408. Throughput: 0: 11673.6. Samples: 224164352. Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:55:05,956][1648985] Avg episode reward: [(0, '142.130')] [2024-06-15 16:55:07,241][1652491] Updated weights for policy 0, policy_version 437758 (0.0017) [2024-06-15 16:55:10,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44236.9, 300 sec: 46208.4). Total num frames: 896532480. Throughput: 0: 11594.0. Samples: 224198144. Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:55:10,956][1648985] Avg episode reward: [(0, '145.170')] [2024-06-15 16:55:12,635][1652491] Updated weights for policy 0, policy_version 437817 (0.0013) [2024-06-15 16:55:14,109][1652491] Updated weights for policy 0, policy_version 437872 (0.0013) [2024-06-15 16:55:15,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48059.7, 300 sec: 46319.5). Total num frames: 896827392. Throughput: 0: 11719.1. Samples: 224270848. 
Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:55:15,956][1648985] Avg episode reward: [(0, '138.520')] [2024-06-15 16:55:16,280][1652491] Updated weights for policy 0, policy_version 437920 (0.0036) [2024-06-15 16:55:17,229][1651469] Signal inference workers to stop experience collection... (22800 times) [2024-06-15 16:55:17,273][1652491] InferenceWorker_p0-w0: stopping experience collection (22800 times) [2024-06-15 16:55:17,537][1651469] Signal inference workers to resume experience collection... (22800 times) [2024-06-15 16:55:17,538][1652491] InferenceWorker_p0-w0: resuming experience collection (22800 times) [2024-06-15 16:55:17,685][1652491] Updated weights for policy 0, policy_version 437970 (0.0012) [2024-06-15 16:55:20,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46421.3, 300 sec: 46652.7). Total num frames: 897056768. Throughput: 0: 11650.8. Samples: 224343040. Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:55:20,956][1648985] Avg episode reward: [(0, '152.940')] [2024-06-15 16:55:23,451][1652491] Updated weights for policy 0, policy_version 438050 (0.0012) [2024-06-15 16:55:24,993][1652491] Updated weights for policy 0, policy_version 438097 (0.0012) [2024-06-15 16:55:25,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.9, 300 sec: 46430.6). Total num frames: 897318912. Throughput: 0: 11584.4. Samples: 224377856. Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:55:25,956][1648985] Avg episode reward: [(0, '143.540')] [2024-06-15 16:55:26,364][1652491] Updated weights for policy 0, policy_version 438145 (0.0014) [2024-06-15 16:55:27,994][1652491] Updated weights for policy 0, policy_version 438211 (0.0016) [2024-06-15 16:55:30,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 48059.5, 300 sec: 46652.7). Total num frames: 897581056. Throughput: 0: 11639.4. Samples: 224440320. 
Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:55:30,956][1648985] Avg episode reward: [(0, '144.810')] [2024-06-15 16:55:34,000][1652491] Updated weights for policy 0, policy_version 438289 (0.0014) [2024-06-15 16:55:35,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 897712128. Throughput: 0: 11685.0. Samples: 224523776. Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:55:35,956][1648985] Avg episode reward: [(0, '146.980')] [2024-06-15 16:55:36,236][1652491] Updated weights for policy 0, policy_version 438352 (0.0043) [2024-06-15 16:55:37,824][1652491] Updated weights for policy 0, policy_version 438418 (0.0014) [2024-06-15 16:55:38,944][1652491] Updated weights for policy 0, policy_version 438464 (0.0012) [2024-06-15 16:55:40,840][1652491] Updated weights for policy 0, policy_version 438520 (0.0013) [2024-06-15 16:55:40,955][1648985] Fps is (10 sec: 49153.5, 60 sec: 47519.2, 300 sec: 46874.9). Total num frames: 898072576. Throughput: 0: 11576.2. Samples: 224548864. Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:55:40,956][1648985] Avg episode reward: [(0, '154.480')] [2024-06-15 16:55:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 898170880. Throughput: 0: 11685.0. Samples: 224628736. Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:55:45,956][1648985] Avg episode reward: [(0, '166.650')] [2024-06-15 16:55:46,123][1652491] Updated weights for policy 0, policy_version 438584 (0.0013) [2024-06-15 16:55:48,535][1652491] Updated weights for policy 0, policy_version 438629 (0.0012) [2024-06-15 16:55:49,802][1652491] Updated weights for policy 0, policy_version 438688 (0.0018) [2024-06-15 16:55:50,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 898498560. Throughput: 0: 11730.5. Samples: 224692224. 
Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:55:50,956][1648985] Avg episode reward: [(0, '147.920')] [2024-06-15 16:55:51,018][1652491] Updated weights for policy 0, policy_version 438725 (0.0010) [2024-06-15 16:55:52,041][1652491] Updated weights for policy 0, policy_version 438780 (0.0013) [2024-06-15 16:55:55,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 44251.2, 300 sec: 46430.6). Total num frames: 898662400. Throughput: 0: 11821.5. Samples: 224730112. Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:55:55,956][1648985] Avg episode reward: [(0, '159.500')] [2024-06-15 16:55:56,788][1652491] Updated weights for policy 0, policy_version 438848 (0.0144) [2024-06-15 16:55:59,896][1652491] Updated weights for policy 0, policy_version 438912 (0.0124) [2024-06-15 16:56:00,006][1651469] Signal inference workers to stop experience collection... (22850 times) [2024-06-15 16:56:00,060][1652491] InferenceWorker_p0-w0: stopping experience collection (22850 times) [2024-06-15 16:56:00,286][1651469] Signal inference workers to resume experience collection... (22850 times) [2024-06-15 16:56:00,287][1652491] InferenceWorker_p0-w0: resuming experience collection (22850 times) [2024-06-15 16:56:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 898957312. Throughput: 0: 11810.1. Samples: 224802304. Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:56:00,956][1648985] Avg episode reward: [(0, '175.600')] [2024-06-15 16:56:01,198][1652491] Updated weights for policy 0, policy_version 438975 (0.0012) [2024-06-15 16:56:03,241][1652491] Updated weights for policy 0, policy_version 439034 (0.0012) [2024-06-15 16:56:05,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 899153920. Throughput: 0: 11912.5. Samples: 224879104. 
Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:56:05,956][1648985] Avg episode reward: [(0, '171.400')] [2024-06-15 16:56:07,692][1652491] Updated weights for policy 0, policy_version 439099 (0.0130) [2024-06-15 16:56:10,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 899416064. Throughput: 0: 11867.0. Samples: 224911872. Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:56:10,956][1648985] Avg episode reward: [(0, '163.920')] [2024-06-15 16:56:11,219][1652491] Updated weights for policy 0, policy_version 439200 (0.0013) [2024-06-15 16:56:14,671][1652491] Updated weights for policy 0, policy_version 439289 (0.0013) [2024-06-15 16:56:15,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 46763.9). Total num frames: 899678208. Throughput: 0: 11855.7. Samples: 224973824. Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:56:15,956][1648985] Avg episode reward: [(0, '174.570')] [2024-06-15 16:56:18,524][1652491] Updated weights for policy 0, policy_version 439348 (0.0016) [2024-06-15 16:56:20,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 899809280. Throughput: 0: 11685.0. Samples: 225049600. Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:56:20,956][1648985] Avg episode reward: [(0, '194.600')] [2024-06-15 16:56:22,495][1652491] Updated weights for policy 0, policy_version 439409 (0.0012) [2024-06-15 16:56:23,622][1652491] Updated weights for policy 0, policy_version 439472 (0.0013) [2024-06-15 16:56:25,552][1652491] Updated weights for policy 0, policy_version 439522 (0.0056) [2024-06-15 16:56:25,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 900169728. Throughput: 0: 11798.8. Samples: 225079808. 
Policy #0 lag: (min: 0.0, avg: 111.2, max: 256.0) [2024-06-15 16:56:25,956][1648985] Avg episode reward: [(0, '163.160')] [2024-06-15 16:56:28,439][1652491] Updated weights for policy 0, policy_version 439568 (0.0016) [2024-06-15 16:56:29,796][1652491] Updated weights for policy 0, policy_version 439616 (0.0014) [2024-06-15 16:56:30,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.4, 300 sec: 46319.5). Total num frames: 900333568. Throughput: 0: 11616.7. Samples: 225151488. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:56:30,956][1648985] Avg episode reward: [(0, '137.360')] [2024-06-15 16:56:33,805][1652491] Updated weights for policy 0, policy_version 439689 (0.0013) [2024-06-15 16:56:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 900595712. Throughput: 0: 11787.4. Samples: 225222656. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:56:35,956][1648985] Avg episode reward: [(0, '130.590')] [2024-06-15 16:56:36,492][1652491] Updated weights for policy 0, policy_version 439780 (0.0014) [2024-06-15 16:56:40,395][1652491] Updated weights for policy 0, policy_version 439856 (0.0014) [2024-06-15 16:56:40,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 46421.2, 300 sec: 46652.7). Total num frames: 900857856. Throughput: 0: 11776.0. Samples: 225260032. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:56:40,956][1648985] Avg episode reward: [(0, '143.010')] [2024-06-15 16:56:44,013][1651469] Signal inference workers to stop experience collection... (22900 times) [2024-06-15 16:56:44,044][1652491] InferenceWorker_p0-w0: stopping experience collection (22900 times) [2024-06-15 16:56:44,054][1652491] Updated weights for policy 0, policy_version 439907 (0.0012) [2024-06-15 16:56:44,188][1651469] Signal inference workers to resume experience collection... 
(22900 times) [2024-06-15 16:56:44,198][1652491] InferenceWorker_p0-w0: resuming experience collection (22900 times) [2024-06-15 16:56:45,524][1652491] Updated weights for policy 0, policy_version 439984 (0.0131) [2024-06-15 16:56:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 49152.0, 300 sec: 46652.7). Total num frames: 901120000. Throughput: 0: 11776.0. Samples: 225332224. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:56:45,956][1648985] Avg episode reward: [(0, '139.380')] [2024-06-15 16:56:46,978][1652491] Updated weights for policy 0, policy_version 440021 (0.0015) [2024-06-15 16:56:47,871][1652491] Updated weights for policy 0, policy_version 440063 (0.0018) [2024-06-15 16:56:50,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 901316608. Throughput: 0: 11719.1. Samples: 225406464. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:56:50,955][1648985] Avg episode reward: [(0, '130.950')] [2024-06-15 16:56:51,140][1652491] Updated weights for policy 0, policy_version 440112 (0.0165) [2024-06-15 16:56:54,582][1652491] Updated weights for policy 0, policy_version 440148 (0.0018) [2024-06-15 16:56:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 901545984. Throughput: 0: 11901.1. Samples: 225447424. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:56:55,956][1648985] Avg episode reward: [(0, '153.270')] [2024-06-15 16:56:56,404][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000440240_901611520.pth... 
[2024-06-15 16:56:56,458][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000434752_890372096.pth [2024-06-15 16:56:56,625][1652491] Updated weights for policy 0, policy_version 440248 (0.0014) [2024-06-15 16:56:58,542][1652491] Updated weights for policy 0, policy_version 440315 (0.0012) [2024-06-15 16:57:00,967][1648985] Fps is (10 sec: 45820.7, 60 sec: 46958.2, 300 sec: 46650.9). Total num frames: 901775360. Throughput: 0: 11954.9. Samples: 225511936. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:57:00,968][1648985] Avg episode reward: [(0, '135.890')] [2024-06-15 16:57:02,542][1652491] Updated weights for policy 0, policy_version 440377 (0.0012) [2024-06-15 16:57:05,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 901971968. Throughput: 0: 11889.8. Samples: 225584640. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:57:05,955][1648985] Avg episode reward: [(0, '134.550')] [2024-06-15 16:57:06,689][1652491] Updated weights for policy 0, policy_version 440448 (0.0095) [2024-06-15 16:57:09,247][1652491] Updated weights for policy 0, policy_version 440530 (0.0013) [2024-06-15 16:57:10,955][1648985] Fps is (10 sec: 52490.7, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 902299648. Throughput: 0: 11855.6. Samples: 225613312. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:57:10,956][1648985] Avg episode reward: [(0, '137.190')] [2024-06-15 16:57:13,976][1652491] Updated weights for policy 0, policy_version 440608 (0.0013) [2024-06-15 16:57:15,958][1648985] Fps is (10 sec: 45859.9, 60 sec: 45872.7, 300 sec: 46430.1). Total num frames: 902430720. Throughput: 0: 11741.0. Samples: 225679872. 
Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:57:15,959][1648985] Avg episode reward: [(0, '164.300')] [2024-06-15 16:57:16,761][1652491] Updated weights for policy 0, policy_version 440658 (0.0014) [2024-06-15 16:57:18,339][1652491] Updated weights for policy 0, policy_version 440720 (0.0024) [2024-06-15 16:57:20,882][1652491] Updated weights for policy 0, policy_version 440801 (0.0022) [2024-06-15 16:57:20,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 49152.1, 300 sec: 46874.9). Total num frames: 902758400. Throughput: 0: 11639.5. Samples: 225746432. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:57:20,956][1648985] Avg episode reward: [(0, '169.820')] [2024-06-15 16:57:25,893][1652491] Updated weights for policy 0, policy_version 440864 (0.0122) [2024-06-15 16:57:25,955][1648985] Fps is (10 sec: 45890.0, 60 sec: 45329.0, 300 sec: 46430.6). Total num frames: 902889472. Throughput: 0: 11639.5. Samples: 225783808. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:57:25,956][1648985] Avg episode reward: [(0, '159.590')] [2024-06-15 16:57:28,365][1651469] Signal inference workers to stop experience collection... (22950 times) [2024-06-15 16:57:28,419][1652491] InferenceWorker_p0-w0: stopping experience collection (22950 times) [2024-06-15 16:57:28,572][1651469] Signal inference workers to resume experience collection... (22950 times) [2024-06-15 16:57:28,581][1652491] InferenceWorker_p0-w0: resuming experience collection (22950 times) [2024-06-15 16:57:28,583][1652491] Updated weights for policy 0, policy_version 440912 (0.0012) [2024-06-15 16:57:30,829][1652491] Updated weights for policy 0, policy_version 440994 (0.0012) [2024-06-15 16:57:30,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 903151616. Throughput: 0: 11525.7. Samples: 225850880. 
Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:57:30,956][1648985] Avg episode reward: [(0, '149.050')] [2024-06-15 16:57:32,086][1652491] Updated weights for policy 0, policy_version 441040 (0.0013) [2024-06-15 16:57:32,912][1652491] Updated weights for policy 0, policy_version 441078 (0.0011) [2024-06-15 16:57:35,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 903348224. Throughput: 0: 11491.5. Samples: 225923584. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:57:35,956][1648985] Avg episode reward: [(0, '166.180')] [2024-06-15 16:57:37,322][1652491] Updated weights for policy 0, policy_version 441136 (0.0013) [2024-06-15 16:57:40,666][1652491] Updated weights for policy 0, policy_version 441184 (0.0013) [2024-06-15 16:57:40,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 903544832. Throughput: 0: 11389.2. Samples: 225959936. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:57:40,956][1648985] Avg episode reward: [(0, '174.110')] [2024-06-15 16:57:42,293][1652491] Updated weights for policy 0, policy_version 441251 (0.0014) [2024-06-15 16:57:43,965][1652491] Updated weights for policy 0, policy_version 441314 (0.0084) [2024-06-15 16:57:45,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 903872512. Throughput: 0: 11278.3. Samples: 226019328. Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:57:45,956][1648985] Avg episode reward: [(0, '156.070')] [2024-06-15 16:57:47,976][1652491] Updated weights for policy 0, policy_version 441376 (0.0014) [2024-06-15 16:57:50,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 44782.9, 300 sec: 46208.5). Total num frames: 904003584. Throughput: 0: 11480.2. Samples: 226101248. 
Policy #0 lag: (min: 15.0, avg: 130.8, max: 271.0) [2024-06-15 16:57:50,955][1648985] Avg episode reward: [(0, '126.950')] [2024-06-15 16:57:51,219][1652491] Updated weights for policy 0, policy_version 441416 (0.0028) [2024-06-15 16:57:52,762][1652491] Updated weights for policy 0, policy_version 441472 (0.0013) [2024-06-15 16:57:54,778][1652491] Updated weights for policy 0, policy_version 441537 (0.0014) [2024-06-15 16:57:55,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 46967.3, 300 sec: 46652.7). Total num frames: 904364032. Throughput: 0: 11468.7. Samples: 226129408. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:57:55,956][1648985] Avg episode reward: [(0, '138.450')] [2024-06-15 16:57:56,041][1652491] Updated weights for policy 0, policy_version 441594 (0.0012) [2024-06-15 16:58:00,379][1652491] Updated weights for policy 0, policy_version 441650 (0.0013) [2024-06-15 16:58:00,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45884.2, 300 sec: 46430.6). Total num frames: 904527872. Throughput: 0: 11492.4. Samples: 226196992. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:58:00,956][1648985] Avg episode reward: [(0, '155.460')] [2024-06-15 16:58:03,548][1652491] Updated weights for policy 0, policy_version 441698 (0.0014) [2024-06-15 16:58:05,723][1652491] Updated weights for policy 0, policy_version 441776 (0.0015) [2024-06-15 16:58:05,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 904757248. Throughput: 0: 11446.0. Samples: 226261504. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:58:05,956][1648985] Avg episode reward: [(0, '148.040')] [2024-06-15 16:58:06,722][1651469] Signal inference workers to stop experience collection... (23000 times) [2024-06-15 16:58:06,759][1652491] InferenceWorker_p0-w0: stopping experience collection (23000 times) [2024-06-15 16:58:06,997][1651469] Signal inference workers to resume experience collection... 
(23000 times) [2024-06-15 16:58:06,997][1652491] InferenceWorker_p0-w0: resuming experience collection (23000 times) [2024-06-15 16:58:07,186][1652491] Updated weights for policy 0, policy_version 441827 (0.0011) [2024-06-15 16:58:07,784][1652491] Updated weights for policy 0, policy_version 441856 (0.0013) [2024-06-15 16:58:10,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 904921088. Throughput: 0: 11343.6. Samples: 226294272. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:58:10,956][1648985] Avg episode reward: [(0, '145.840')] [2024-06-15 16:58:12,325][1652491] Updated weights for policy 0, policy_version 441920 (0.0095) [2024-06-15 16:58:15,427][1652491] Updated weights for policy 0, policy_version 441974 (0.0099) [2024-06-15 16:58:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45877.7, 300 sec: 46208.4). Total num frames: 905183232. Throughput: 0: 11559.8. Samples: 226371072. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:58:15,956][1648985] Avg episode reward: [(0, '154.540')] [2024-06-15 16:58:17,340][1652491] Updated weights for policy 0, policy_version 442046 (0.0013) [2024-06-15 16:58:19,095][1652491] Updated weights for policy 0, policy_version 442109 (0.0013) [2024-06-15 16:58:20,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 905445376. Throughput: 0: 11161.6. Samples: 226425856. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:58:20,956][1648985] Avg episode reward: [(0, '181.740')] [2024-06-15 16:58:23,951][1652491] Updated weights for policy 0, policy_version 442172 (0.0127) [2024-06-15 16:58:25,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 44783.0, 300 sec: 46208.5). Total num frames: 905576448. Throughput: 0: 11241.3. Samples: 226465792. 
Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:58:25,956][1648985] Avg episode reward: [(0, '181.850')] [2024-06-15 16:58:27,481][1652491] Updated weights for policy 0, policy_version 442224 (0.0012) [2024-06-15 16:58:29,515][1652491] Updated weights for policy 0, policy_version 442291 (0.0016) [2024-06-15 16:58:30,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 45875.1, 300 sec: 46541.6). Total num frames: 905904128. Throughput: 0: 11275.4. Samples: 226526720. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:58:30,956][1648985] Avg episode reward: [(0, '185.850')] [2024-06-15 16:58:31,216][1652491] Updated weights for policy 0, policy_version 442360 (0.0012) [2024-06-15 16:58:35,915][1652491] Updated weights for policy 0, policy_version 442423 (0.0016) [2024-06-15 16:58:35,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45329.2, 300 sec: 46541.6). Total num frames: 906067968. Throughput: 0: 11116.1. Samples: 226601472. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:58:35,956][1648985] Avg episode reward: [(0, '165.630')] [2024-06-15 16:58:38,201][1652491] Updated weights for policy 0, policy_version 442464 (0.0013) [2024-06-15 16:58:39,591][1652491] Updated weights for policy 0, policy_version 442514 (0.0027) [2024-06-15 16:58:40,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 906362880. Throughput: 0: 11264.1. Samples: 226636288. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:58:40,955][1648985] Avg episode reward: [(0, '163.640')] [2024-06-15 16:58:41,287][1652491] Updated weights for policy 0, policy_version 442577 (0.0013) [2024-06-15 16:58:42,131][1652491] Updated weights for policy 0, policy_version 442623 (0.0014) [2024-06-15 16:58:45,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 906493952. Throughput: 0: 11343.7. Samples: 226707456. 
Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:58:45,956][1648985] Avg episode reward: [(0, '160.230')] [2024-06-15 16:58:48,909][1652491] Updated weights for policy 0, policy_version 442689 (0.0026) [2024-06-15 16:58:50,492][1651469] Signal inference workers to stop experience collection... (23050 times) [2024-06-15 16:58:50,506][1651469] Signal inference workers to resume experience collection... (23050 times) [2024-06-15 16:58:50,510][1652491] Updated weights for policy 0, policy_version 442752 (0.0011) [2024-06-15 16:58:50,532][1652491] InferenceWorker_p0-w0: stopping experience collection (23050 times) [2024-06-15 16:58:50,561][1652491] InferenceWorker_p0-w0: resuming experience collection (23050 times) [2024-06-15 16:58:50,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 906756096. Throughput: 0: 11343.6. Samples: 226771968. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:58:50,956][1648985] Avg episode reward: [(0, '163.740')] [2024-06-15 16:58:52,091][1652491] Updated weights for policy 0, policy_version 442806 (0.0012) [2024-06-15 16:58:53,645][1652491] Updated weights for policy 0, policy_version 442872 (0.0013) [2024-06-15 16:58:55,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 44237.0, 300 sec: 46208.4). Total num frames: 907018240. Throughput: 0: 11161.6. Samples: 226796544. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:58:55,956][1648985] Avg episode reward: [(0, '163.000')] [2024-06-15 16:58:55,966][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000442880_907018240.pth... 
[2024-06-15 16:58:56,014][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000437504_896008192.pth [2024-06-15 16:58:58,919][1652491] Updated weights for policy 0, policy_version 442915 (0.0011) [2024-06-15 16:59:00,621][1652491] Updated weights for policy 0, policy_version 442946 (0.0011) [2024-06-15 16:59:00,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 907182080. Throughput: 0: 11252.6. Samples: 226877440. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:59:00,956][1648985] Avg episode reward: [(0, '167.380')] [2024-06-15 16:59:02,812][1652491] Updated weights for policy 0, policy_version 443025 (0.0084) [2024-06-15 16:59:05,316][1652491] Updated weights for policy 0, policy_version 443120 (0.0131) [2024-06-15 16:59:05,956][1648985] Fps is (10 sec: 52428.5, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 907542528. Throughput: 0: 11127.4. Samples: 226926592. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:59:05,957][1648985] Avg episode reward: [(0, '169.270')] [2024-06-15 16:59:10,337][1652491] Updated weights for policy 0, policy_version 443168 (0.0012) [2024-06-15 16:59:10,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45329.2, 300 sec: 46430.6). Total num frames: 907640832. Throughput: 0: 11229.9. Samples: 226971136. Policy #0 lag: (min: 17.0, avg: 88.2, max: 273.0) [2024-06-15 16:59:10,956][1648985] Avg episode reward: [(0, '174.300')] [2024-06-15 16:59:12,614][1652491] Updated weights for policy 0, policy_version 443204 (0.0058) [2024-06-15 16:59:15,184][1652491] Updated weights for policy 0, policy_version 443296 (0.0013) [2024-06-15 16:59:15,955][1648985] Fps is (10 sec: 36045.2, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 907902976. Throughput: 0: 11355.1. Samples: 227037696. 
Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 16:59:15,956][1648985] Avg episode reward: [(0, '161.330')] [2024-06-15 16:59:16,980][1652491] Updated weights for policy 0, policy_version 443360 (0.0107) [2024-06-15 16:59:20,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 908066816. Throughput: 0: 11320.9. Samples: 227110912. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 16:59:20,955][1648985] Avg episode reward: [(0, '152.760')] [2024-06-15 16:59:21,795][1652491] Updated weights for policy 0, policy_version 443416 (0.0014) [2024-06-15 16:59:22,552][1652491] Updated weights for policy 0, policy_version 443456 (0.0013) [2024-06-15 16:59:24,845][1652491] Updated weights for policy 0, policy_version 443514 (0.0016) [2024-06-15 16:59:25,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 908361728. Throughput: 0: 11377.7. Samples: 227148288. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 16:59:25,956][1648985] Avg episode reward: [(0, '164.210')] [2024-06-15 16:59:26,884][1652491] Updated weights for policy 0, policy_version 443584 (0.0011) [2024-06-15 16:59:28,493][1652491] Updated weights for policy 0, policy_version 443644 (0.0014) [2024-06-15 16:59:30,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 908591104. Throughput: 0: 11116.1. Samples: 227207680. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 16:59:30,956][1648985] Avg episode reward: [(0, '165.230')] [2024-06-15 16:59:33,272][1651469] Signal inference workers to stop experience collection... (23100 times) [2024-06-15 16:59:33,325][1652491] InferenceWorker_p0-w0: stopping experience collection (23100 times) [2024-06-15 16:59:33,479][1651469] Signal inference workers to resume experience collection... 
(23100 times) [2024-06-15 16:59:33,480][1652491] InferenceWorker_p0-w0: resuming experience collection (23100 times) [2024-06-15 16:59:33,904][1652491] Updated weights for policy 0, policy_version 443701 (0.0021) [2024-06-15 16:59:35,655][1652491] Updated weights for policy 0, policy_version 443744 (0.0029) [2024-06-15 16:59:35,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 45875.2, 300 sec: 46098.5). Total num frames: 908820480. Throughput: 0: 11446.1. Samples: 227287040. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 16:59:35,956][1648985] Avg episode reward: [(0, '161.620')] [2024-06-15 16:59:36,929][1652491] Updated weights for policy 0, policy_version 443795 (0.0012) [2024-06-15 16:59:37,949][1652491] Updated weights for policy 0, policy_version 443840 (0.0011) [2024-06-15 16:59:39,556][1652491] Updated weights for policy 0, policy_version 443900 (0.0012) [2024-06-15 16:59:40,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 909115392. Throughput: 0: 11457.4. Samples: 227312128. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 16:59:40,956][1648985] Avg episode reward: [(0, '156.760')] [2024-06-15 16:59:45,780][1652491] Updated weights for policy 0, policy_version 443961 (0.0014) [2024-06-15 16:59:45,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 909246464. Throughput: 0: 11377.7. Samples: 227389440. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 16:59:45,956][1648985] Avg episode reward: [(0, '153.810')] [2024-06-15 16:59:47,706][1652491] Updated weights for policy 0, policy_version 444024 (0.0013) [2024-06-15 16:59:49,992][1652491] Updated weights for policy 0, policy_version 444081 (0.0024) [2024-06-15 16:59:50,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 46421.4, 300 sec: 45878.2). Total num frames: 909541376. Throughput: 0: 11502.9. Samples: 227444224. 
Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 16:59:50,956][1648985] Avg episode reward: [(0, '138.520')] [2024-06-15 16:59:55,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 43690.5, 300 sec: 45764.1). Total num frames: 909639680. Throughput: 0: 11309.5. Samples: 227480064. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 16:59:55,956][1648985] Avg episode reward: [(0, '139.760')] [2024-06-15 16:59:56,723][1652491] Updated weights for policy 0, policy_version 444163 (0.0016) [2024-06-15 16:59:57,975][1652491] Updated weights for policy 0, policy_version 444214 (0.0012) [2024-06-15 16:59:59,448][1652491] Updated weights for policy 0, policy_version 444258 (0.0022) [2024-06-15 17:00:00,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46421.1, 300 sec: 45986.2). Total num frames: 909967360. Throughput: 0: 11525.6. Samples: 227556352. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 17:00:00,956][1648985] Avg episode reward: [(0, '125.280')] [2024-06-15 17:00:01,060][1652491] Updated weights for policy 0, policy_version 444321 (0.0021) [2024-06-15 17:00:02,876][1652491] Updated weights for policy 0, policy_version 444400 (0.0014) [2024-06-15 17:00:05,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 910163968. Throughput: 0: 11218.5. Samples: 227615744. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 17:00:05,956][1648985] Avg episode reward: [(0, '133.610')] [2024-06-15 17:00:09,313][1652491] Updated weights for policy 0, policy_version 444464 (0.0012) [2024-06-15 17:00:10,762][1652491] Updated weights for policy 0, policy_version 444502 (0.0010) [2024-06-15 17:00:10,955][1648985] Fps is (10 sec: 39322.9, 60 sec: 45329.1, 300 sec: 45875.2). Total num frames: 910360576. Throughput: 0: 11264.0. Samples: 227655168. 
Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 17:00:10,956][1648985] Avg episode reward: [(0, '166.160')] [2024-06-15 17:00:12,153][1652491] Updated weights for policy 0, policy_version 444560 (0.0011) [2024-06-15 17:00:12,931][1651469] Signal inference workers to stop experience collection... (23150 times) [2024-06-15 17:00:12,975][1652491] InferenceWorker_p0-w0: stopping experience collection (23150 times) [2024-06-15 17:00:13,202][1651469] Signal inference workers to resume experience collection... (23150 times) [2024-06-15 17:00:13,203][1652491] InferenceWorker_p0-w0: resuming experience collection (23150 times) [2024-06-15 17:00:13,787][1652491] Updated weights for policy 0, policy_version 444628 (0.0012) [2024-06-15 17:00:14,618][1652491] Updated weights for policy 0, policy_version 444672 (0.0035) [2024-06-15 17:00:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 910688256. Throughput: 0: 11366.4. Samples: 227719168. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 17:00:15,956][1648985] Avg episode reward: [(0, '176.280')] [2024-06-15 17:00:20,602][1652491] Updated weights for policy 0, policy_version 444724 (0.0013) [2024-06-15 17:00:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 910819328. Throughput: 0: 11366.4. Samples: 227798528. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 17:00:20,956][1648985] Avg episode reward: [(0, '165.170')] [2024-06-15 17:00:21,581][1652491] Updated weights for policy 0, policy_version 444754 (0.0012) [2024-06-15 17:00:22,734][1652491] Updated weights for policy 0, policy_version 444800 (0.0014) [2024-06-15 17:00:24,949][1652491] Updated weights for policy 0, policy_version 444896 (0.0016) [2024-06-15 17:00:25,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 47513.7, 300 sec: 46208.5). Total num frames: 911212544. Throughput: 0: 11446.0. Samples: 227827200. 
Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 17:00:25,956][1648985] Avg episode reward: [(0, '154.370')] [2024-06-15 17:00:30,955][1648985] Fps is (10 sec: 39320.4, 60 sec: 43690.5, 300 sec: 45764.1). Total num frames: 911212544. Throughput: 0: 11298.1. Samples: 227897856. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 17:00:30,956][1648985] Avg episode reward: [(0, '142.010')] [2024-06-15 17:00:31,075][1652491] Updated weights for policy 0, policy_version 444944 (0.0015) [2024-06-15 17:00:32,283][1652491] Updated weights for policy 0, policy_version 444989 (0.0014) [2024-06-15 17:00:33,915][1652491] Updated weights for policy 0, policy_version 445049 (0.0014) [2024-06-15 17:00:35,253][1652491] Updated weights for policy 0, policy_version 445104 (0.0014) [2024-06-15 17:00:35,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 911605760. Throughput: 0: 11582.6. Samples: 227965440. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 17:00:35,955][1648985] Avg episode reward: [(0, '152.800')] [2024-06-15 17:00:36,519][1652491] Updated weights for policy 0, policy_version 445157 (0.0012) [2024-06-15 17:00:40,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 43690.6, 300 sec: 45986.3). Total num frames: 911736832. Throughput: 0: 11662.3. Samples: 228004864. Policy #0 lag: (min: 79.0, avg: 159.0, max: 303.0) [2024-06-15 17:00:40,956][1648985] Avg episode reward: [(0, '167.420')] [2024-06-15 17:00:41,713][1652491] Updated weights for policy 0, policy_version 445187 (0.0016) [2024-06-15 17:00:43,038][1652491] Updated weights for policy 0, policy_version 445244 (0.0011) [2024-06-15 17:00:45,029][1652491] Updated weights for policy 0, policy_version 445296 (0.0012) [2024-06-15 17:00:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 912031744. Throughput: 0: 11582.6. Samples: 228077568. 
Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:00:45,956][1648985] Avg episode reward: [(0, '171.850')] [2024-06-15 17:00:46,705][1652491] Updated weights for policy 0, policy_version 445364 (0.0025) [2024-06-15 17:00:48,086][1652491] Updated weights for policy 0, policy_version 445437 (0.0015) [2024-06-15 17:00:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45329.2, 300 sec: 46097.4). Total num frames: 912261120. Throughput: 0: 11776.0. Samples: 228145664. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:00:50,956][1648985] Avg episode reward: [(0, '179.110')] [2024-06-15 17:00:54,068][1652491] Updated weights for policy 0, policy_version 445488 (0.0011) [2024-06-15 17:00:54,714][1652491] Updated weights for policy 0, policy_version 445504 (0.0027) [2024-06-15 17:00:55,543][1651469] Signal inference workers to stop experience collection... (23200 times) [2024-06-15 17:00:55,653][1652491] InferenceWorker_p0-w0: stopping experience collection (23200 times) [2024-06-15 17:00:55,707][1651469] Signal inference workers to resume experience collection... (23200 times) [2024-06-15 17:00:55,708][1652491] InferenceWorker_p0-w0: resuming experience collection (23200 times) [2024-06-15 17:00:55,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.6, 300 sec: 45764.1). Total num frames: 912457728. Throughput: 0: 11696.3. Samples: 228181504. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:00:55,956][1648985] Avg episode reward: [(0, '160.330')] [2024-06-15 17:00:56,570][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000445568_912523264.pth... 
[2024-06-15 17:00:56,762][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000440240_901611520.pth [2024-06-15 17:00:57,390][1652491] Updated weights for policy 0, policy_version 445600 (0.0014) [2024-06-15 17:00:59,389][1652491] Updated weights for policy 0, policy_version 445688 (0.0159) [2024-06-15 17:01:00,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 912785408. Throughput: 0: 11468.8. Samples: 228235264. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:01:00,956][1648985] Avg episode reward: [(0, '150.820')] [2024-06-15 17:01:05,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 44782.9, 300 sec: 45542.0). Total num frames: 912850944. Throughput: 0: 11468.8. Samples: 228314624. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:01:05,956][1648985] Avg episode reward: [(0, '161.720')] [2024-06-15 17:01:06,158][1652491] Updated weights for policy 0, policy_version 445744 (0.0012) [2024-06-15 17:01:07,031][1652491] Updated weights for policy 0, policy_version 445776 (0.0034) [2024-06-15 17:01:08,427][1652491] Updated weights for policy 0, policy_version 445828 (0.0070) [2024-06-15 17:01:10,421][1652491] Updated weights for policy 0, policy_version 445905 (0.0014) [2024-06-15 17:01:10,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.6, 300 sec: 45986.3). Total num frames: 913244160. Throughput: 0: 11559.8. Samples: 228347392. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:01:10,956][1648985] Avg episode reward: [(0, '161.400')] [2024-06-15 17:01:11,158][1652491] Updated weights for policy 0, policy_version 445941 (0.0015) [2024-06-15 17:01:15,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 913309696. Throughput: 0: 11719.2. Samples: 228425216. 
Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:01:15,956][1648985] Avg episode reward: [(0, '150.720')] [2024-06-15 17:01:17,612][1652491] Updated weights for policy 0, policy_version 446016 (0.0013) [2024-06-15 17:01:19,287][1652491] Updated weights for policy 0, policy_version 446082 (0.0067) [2024-06-15 17:01:20,687][1652491] Updated weights for policy 0, policy_version 446147 (0.0011) [2024-06-15 17:01:20,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 48605.8, 300 sec: 45986.3). Total num frames: 913735680. Throughput: 0: 11616.7. Samples: 228488192. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:01:20,956][1648985] Avg episode reward: [(0, '142.980')] [2024-06-15 17:01:21,932][1652491] Updated weights for policy 0, policy_version 446204 (0.0013) [2024-06-15 17:01:25,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 43690.5, 300 sec: 45764.1). Total num frames: 913833984. Throughput: 0: 11582.5. Samples: 228526080. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:01:25,956][1648985] Avg episode reward: [(0, '127.940')] [2024-06-15 17:01:28,567][1652491] Updated weights for policy 0, policy_version 446256 (0.0218) [2024-06-15 17:01:29,587][1652491] Updated weights for policy 0, policy_version 446304 (0.0025) [2024-06-15 17:01:30,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 48606.1, 300 sec: 45875.2). Total num frames: 914128896. Throughput: 0: 11548.5. Samples: 228597248. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:01:30,956][1648985] Avg episode reward: [(0, '130.320')] [2024-06-15 17:01:31,417][1652491] Updated weights for policy 0, policy_version 446370 (0.0032) [2024-06-15 17:01:31,748][1651469] Signal inference workers to stop experience collection... (23250 times) [2024-06-15 17:01:31,818][1652491] InferenceWorker_p0-w0: stopping experience collection (23250 times) [2024-06-15 17:01:31,992][1651469] Signal inference workers to resume experience collection... 
(23250 times) [2024-06-15 17:01:31,994][1652491] InferenceWorker_p0-w0: resuming experience collection (23250 times) [2024-06-15 17:01:32,563][1652491] Updated weights for policy 0, policy_version 446421 (0.0011) [2024-06-15 17:01:33,493][1652491] Updated weights for policy 0, policy_version 446462 (0.0011) [2024-06-15 17:01:35,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 914358272. Throughput: 0: 11639.5. Samples: 228669440. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:01:35,956][1648985] Avg episode reward: [(0, '141.330')] [2024-06-15 17:01:39,370][1652491] Updated weights for policy 0, policy_version 446512 (0.0024) [2024-06-15 17:01:40,957][1648985] Fps is (10 sec: 42590.8, 60 sec: 46966.1, 300 sec: 45541.7). Total num frames: 914554880. Throughput: 0: 11786.9. Samples: 228711936. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:01:40,958][1648985] Avg episode reward: [(0, '154.900')] [2024-06-15 17:01:41,018][1652491] Updated weights for policy 0, policy_version 446576 (0.0012) [2024-06-15 17:01:42,061][1652491] Updated weights for policy 0, policy_version 446622 (0.0013) [2024-06-15 17:01:43,400][1652491] Updated weights for policy 0, policy_version 446675 (0.0014) [2024-06-15 17:01:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.7, 300 sec: 45986.3). Total num frames: 914882560. Throughput: 0: 11787.4. Samples: 228765696. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:01:45,955][1648985] Avg episode reward: [(0, '173.570')] [2024-06-15 17:01:50,066][1652491] Updated weights for policy 0, policy_version 446736 (0.0246) [2024-06-15 17:01:50,955][1648985] Fps is (10 sec: 42606.5, 60 sec: 45329.1, 300 sec: 45542.0). Total num frames: 914980864. Throughput: 0: 11855.7. Samples: 228848128. 
Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:01:50,955][1648985] Avg episode reward: [(0, '158.760')] [2024-06-15 17:01:51,425][1652491] Updated weights for policy 0, policy_version 446785 (0.0011) [2024-06-15 17:01:53,075][1652491] Updated weights for policy 0, policy_version 446864 (0.0086) [2024-06-15 17:01:54,080][1652491] Updated weights for policy 0, policy_version 446912 (0.0013) [2024-06-15 17:01:55,808][1652491] Updated weights for policy 0, policy_version 446964 (0.0013) [2024-06-15 17:01:55,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48605.9, 300 sec: 46099.2). Total num frames: 915374080. Throughput: 0: 11673.6. Samples: 228872704. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:01:55,956][1648985] Avg episode reward: [(0, '138.260')] [2024-06-15 17:02:00,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 43690.8, 300 sec: 45542.0). Total num frames: 915406848. Throughput: 0: 11810.1. Samples: 228956672. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:02:00,956][1648985] Avg episode reward: [(0, '152.040')] [2024-06-15 17:02:01,618][1652491] Updated weights for policy 0, policy_version 447024 (0.0021) [2024-06-15 17:02:03,646][1652491] Updated weights for policy 0, policy_version 447104 (0.0014) [2024-06-15 17:02:05,134][1652491] Updated weights for policy 0, policy_version 447168 (0.0016) [2024-06-15 17:02:05,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 49152.0, 300 sec: 45764.1). Total num frames: 915800064. Throughput: 0: 11764.6. Samples: 229017600. Policy #0 lag: (min: 13.0, avg: 93.1, max: 269.0) [2024-06-15 17:02:05,956][1648985] Avg episode reward: [(0, '160.230')] [2024-06-15 17:02:07,525][1652491] Updated weights for policy 0, policy_version 447232 (0.0020) [2024-06-15 17:02:10,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 44783.0, 300 sec: 45764.6). Total num frames: 915931136. Throughput: 0: 11719.2. Samples: 229053440. 
Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:02:10,956][1648985] Avg episode reward: [(0, '146.980')] [2024-06-15 17:02:12,803][1651469] Signal inference workers to stop experience collection... (23300 times) [2024-06-15 17:02:12,857][1652491] InferenceWorker_p0-w0: stopping experience collection (23300 times) [2024-06-15 17:02:13,005][1651469] Signal inference workers to resume experience collection... (23300 times) [2024-06-15 17:02:13,006][1652491] InferenceWorker_p0-w0: resuming experience collection (23300 times) [2024-06-15 17:02:14,452][1652491] Updated weights for policy 0, policy_version 447344 (0.0014) [2024-06-15 17:02:15,842][1652491] Updated weights for policy 0, policy_version 447397 (0.0015) [2024-06-15 17:02:15,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 49152.1, 300 sec: 45764.1). Total num frames: 916258816. Throughput: 0: 11719.1. Samples: 229124608. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:02:15,956][1648985] Avg episode reward: [(0, '147.680')] [2024-06-15 17:02:17,775][1652491] Updated weights for policy 0, policy_version 447430 (0.0013) [2024-06-15 17:02:18,940][1652491] Updated weights for policy 0, policy_version 447488 (0.0014) [2024-06-15 17:02:20,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45329.0, 300 sec: 45986.3). Total num frames: 916455424. Throughput: 0: 11741.9. Samples: 229197824. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:02:20,956][1648985] Avg episode reward: [(0, '140.510')] [2024-06-15 17:02:25,246][1652491] Updated weights for policy 0, policy_version 447587 (0.0015) [2024-06-15 17:02:25,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 48059.8, 300 sec: 45986.3). Total num frames: 916717568. Throughput: 0: 11639.9. Samples: 229235712. 
Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:02:25,956][1648985] Avg episode reward: [(0, '150.960')] [2024-06-15 17:02:26,729][1652491] Updated weights for policy 0, policy_version 447648 (0.0013) [2024-06-15 17:02:30,650][1652491] Updated weights for policy 0, policy_version 447715 (0.0017) [2024-06-15 17:02:30,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46967.4, 300 sec: 46097.4). Total num frames: 916946944. Throughput: 0: 11753.2. Samples: 229294592. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:02:30,956][1648985] Avg episode reward: [(0, '167.320')] [2024-06-15 17:02:34,966][1652491] Updated weights for policy 0, policy_version 447764 (0.0015) [2024-06-15 17:02:35,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 917110784. Throughput: 0: 11514.3. Samples: 229366272. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:02:35,955][1648985] Avg episode reward: [(0, '152.350')] [2024-06-15 17:02:36,375][1652491] Updated weights for policy 0, policy_version 447824 (0.0085) [2024-06-15 17:02:38,237][1652491] Updated weights for policy 0, policy_version 447891 (0.0022) [2024-06-15 17:02:39,256][1652491] Updated weights for policy 0, policy_version 447934 (0.0011) [2024-06-15 17:02:40,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46968.8, 300 sec: 45764.1). Total num frames: 917372928. Throughput: 0: 11502.9. Samples: 229390336. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:02:40,956][1648985] Avg episode reward: [(0, '125.720')] [2024-06-15 17:02:42,236][1652491] Updated weights for policy 0, policy_version 447996 (0.0012) [2024-06-15 17:02:45,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 917504000. Throughput: 0: 11320.9. Samples: 229466112. 
Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:02:45,955][1648985] Avg episode reward: [(0, '147.980')] [2024-06-15 17:02:46,865][1652491] Updated weights for policy 0, policy_version 448048 (0.0013) [2024-06-15 17:02:48,383][1652491] Updated weights for policy 0, policy_version 448112 (0.0013) [2024-06-15 17:02:49,552][1652491] Updated weights for policy 0, policy_version 448144 (0.0011) [2024-06-15 17:02:50,822][1652491] Updated weights for policy 0, policy_version 448186 (0.0176) [2024-06-15 17:02:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48605.8, 300 sec: 45875.3). Total num frames: 917897216. Throughput: 0: 11355.0. Samples: 229528576. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:02:50,955][1648985] Avg episode reward: [(0, '157.930')] [2024-06-15 17:02:52,782][1651469] Signal inference workers to stop experience collection... (23350 times) [2024-06-15 17:02:52,905][1652491] InferenceWorker_p0-w0: stopping experience collection (23350 times) [2024-06-15 17:02:53,146][1651469] Signal inference workers to resume experience collection... (23350 times) [2024-06-15 17:02:53,147][1652491] InferenceWorker_p0-w0: resuming experience collection (23350 times) [2024-06-15 17:02:53,320][1652491] Updated weights for policy 0, policy_version 448226 (0.0046) [2024-06-15 17:02:55,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 44236.7, 300 sec: 45764.1). Total num frames: 918028288. Throughput: 0: 11446.0. Samples: 229568512. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:02:55,956][1648985] Avg episode reward: [(0, '153.890')] [2024-06-15 17:02:55,965][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000448256_918028288.pth... 
[2024-06-15 17:02:56,066][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000442880_907018240.pth [2024-06-15 17:02:57,260][1652491] Updated weights for policy 0, policy_version 448273 (0.0014) [2024-06-15 17:02:58,520][1652491] Updated weights for policy 0, policy_version 448336 (0.0087) [2024-06-15 17:02:59,744][1652491] Updated weights for policy 0, policy_version 448384 (0.0017) [2024-06-15 17:03:00,970][1648985] Fps is (10 sec: 39261.9, 60 sec: 48047.6, 300 sec: 45872.9). Total num frames: 918290432. Throughput: 0: 11385.3. Samples: 229637120. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:03:00,971][1648985] Avg episode reward: [(0, '140.960')] [2024-06-15 17:03:02,707][1652491] Updated weights for policy 0, policy_version 448448 (0.0034) [2024-06-15 17:03:05,202][1652491] Updated weights for policy 0, policy_version 448508 (0.0014) [2024-06-15 17:03:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 918552576. Throughput: 0: 11218.5. Samples: 229702656. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:03:05,956][1648985] Avg episode reward: [(0, '139.390')] [2024-06-15 17:03:10,294][1652491] Updated weights for policy 0, policy_version 448581 (0.0015) [2024-06-15 17:03:10,955][1648985] Fps is (10 sec: 45944.5, 60 sec: 46967.4, 300 sec: 45986.3). Total num frames: 918749184. Throughput: 0: 11332.3. Samples: 229745664. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:03:10,956][1648985] Avg episode reward: [(0, '131.900')] [2024-06-15 17:03:13,296][1652491] Updated weights for policy 0, policy_version 448656 (0.0012) [2024-06-15 17:03:14,494][1652491] Updated weights for policy 0, policy_version 448704 (0.0017) [2024-06-15 17:03:15,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45329.0, 300 sec: 45875.2). Total num frames: 918978560. Throughput: 0: 11309.5. Samples: 229803520. 
Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:03:15,956][1648985] Avg episode reward: [(0, '138.450')] [2024-06-15 17:03:16,825][1652491] Updated weights for policy 0, policy_version 448759 (0.0012) [2024-06-15 17:03:20,952][1652491] Updated weights for policy 0, policy_version 448816 (0.0013) [2024-06-15 17:03:20,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 919175168. Throughput: 0: 11446.0. Samples: 229881344. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:03:20,956][1648985] Avg episode reward: [(0, '141.520')] [2024-06-15 17:03:22,858][1652491] Updated weights for policy 0, policy_version 448890 (0.0013) [2024-06-15 17:03:25,611][1652491] Updated weights for policy 0, policy_version 448945 (0.0097) [2024-06-15 17:03:25,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45875.3, 300 sec: 45986.3). Total num frames: 919470080. Throughput: 0: 11548.4. Samples: 229910016. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:03:25,956][1648985] Avg episode reward: [(0, '152.900')] [2024-06-15 17:03:27,917][1652491] Updated weights for policy 0, policy_version 448992 (0.0021) [2024-06-15 17:03:30,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 44236.7, 300 sec: 45875.2). Total num frames: 919601152. Throughput: 0: 11434.6. Samples: 229980672. Policy #0 lag: (min: 0.0, avg: 159.7, max: 256.0) [2024-06-15 17:03:30,956][1648985] Avg episode reward: [(0, '150.790')] [2024-06-15 17:03:31,875][1652491] Updated weights for policy 0, policy_version 449056 (0.0014) [2024-06-15 17:03:33,163][1652491] Updated weights for policy 0, policy_version 449109 (0.0016) [2024-06-15 17:03:35,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 919896064. Throughput: 0: 11514.3. Samples: 230046720. 
Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:03:35,956][1648985] Avg episode reward: [(0, '156.180')] [2024-06-15 17:03:35,959][1652491] Updated weights for policy 0, policy_version 449174 (0.0023) [2024-06-15 17:03:36,272][1651469] Signal inference workers to stop experience collection... (23400 times) [2024-06-15 17:03:36,325][1652491] InferenceWorker_p0-w0: stopping experience collection (23400 times) [2024-06-15 17:03:36,473][1651469] Signal inference workers to resume experience collection... (23400 times) [2024-06-15 17:03:36,474][1652491] InferenceWorker_p0-w0: resuming experience collection (23400 times) [2024-06-15 17:03:36,774][1652491] Updated weights for policy 0, policy_version 449212 (0.0036) [2024-06-15 17:03:40,163][1652491] Updated weights for policy 0, policy_version 449277 (0.0013) [2024-06-15 17:03:40,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 920125440. Throughput: 0: 11594.0. Samples: 230090240. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:03:40,956][1648985] Avg episode reward: [(0, '150.730')] [2024-06-15 17:03:42,605][1652491] Updated weights for policy 0, policy_version 449328 (0.0014) [2024-06-15 17:03:43,616][1652491] Updated weights for policy 0, policy_version 449376 (0.0015) [2024-06-15 17:03:45,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 920387584. Throughput: 0: 11711.6. Samples: 230163968. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:03:45,956][1648985] Avg episode reward: [(0, '156.040')] [2024-06-15 17:03:46,217][1652491] Updated weights for policy 0, policy_version 449426 (0.0014) [2024-06-15 17:03:47,136][1652491] Updated weights for policy 0, policy_version 449471 (0.0014) [2024-06-15 17:03:50,978][1648985] Fps is (10 sec: 49038.6, 60 sec: 45311.5, 300 sec: 46093.7). Total num frames: 920616960. Throughput: 0: 11724.5. Samples: 230230528. 
Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:03:50,979][1648985] Avg episode reward: [(0, '157.770')] [2024-06-15 17:03:51,027][1652491] Updated weights for policy 0, policy_version 449533 (0.0013) [2024-06-15 17:03:53,514][1652491] Updated weights for policy 0, policy_version 449600 (0.0015) [2024-06-15 17:03:54,664][1652491] Updated weights for policy 0, policy_version 449636 (0.0113) [2024-06-15 17:03:55,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 920911872. Throughput: 0: 11628.1. Samples: 230268928. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:03:55,956][1648985] Avg episode reward: [(0, '165.340')] [2024-06-15 17:03:57,416][1652491] Updated weights for policy 0, policy_version 449696 (0.0038) [2024-06-15 17:03:58,148][1652491] Updated weights for policy 0, policy_version 449727 (0.0012) [2024-06-15 17:04:00,955][1648985] Fps is (10 sec: 42696.9, 60 sec: 45886.7, 300 sec: 45764.1). Total num frames: 921042944. Throughput: 0: 12014.9. Samples: 230344192. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:04:00,956][1648985] Avg episode reward: [(0, '153.320')] [2024-06-15 17:04:01,790][1652491] Updated weights for policy 0, policy_version 449782 (0.0030) [2024-06-15 17:04:04,397][1652491] Updated weights for policy 0, policy_version 449848 (0.0014) [2024-06-15 17:04:05,952][1652491] Updated weights for policy 0, policy_version 449917 (0.0012) [2024-06-15 17:04:05,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 921403392. Throughput: 0: 11696.4. Samples: 230407680. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:04:05,955][1648985] Avg episode reward: [(0, '147.360')] [2024-06-15 17:04:09,017][1652491] Updated weights for policy 0, policy_version 449968 (0.0067) [2024-06-15 17:04:10,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46967.5, 300 sec: 46319.5). Total num frames: 921567232. 
Throughput: 0: 11980.8. Samples: 230449152. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:04:10,956][1648985] Avg episode reward: [(0, '143.230')] [2024-06-15 17:04:14,580][1652491] Updated weights for policy 0, policy_version 450064 (0.0013) [2024-06-15 17:04:15,797][1652491] Updated weights for policy 0, policy_version 450128 (0.0013) [2024-06-15 17:04:15,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 48059.6, 300 sec: 46763.8). Total num frames: 921862144. Throughput: 0: 11958.1. Samples: 230518784. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:04:15,956][1648985] Avg episode reward: [(0, '141.740')] [2024-06-15 17:04:16,912][1652491] Updated weights for policy 0, policy_version 450176 (0.0038) [2024-06-15 17:04:20,103][1651469] Signal inference workers to stop experience collection... (23450 times) [2024-06-15 17:04:20,150][1652491] InferenceWorker_p0-w0: stopping experience collection (23450 times) [2024-06-15 17:04:20,315][1651469] Signal inference workers to resume experience collection... (23450 times) [2024-06-15 17:04:20,353][1652491] InferenceWorker_p0-w0: resuming experience collection (23450 times) [2024-06-15 17:04:20,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48605.9, 300 sec: 46541.7). Total num frames: 922091520. Throughput: 0: 12174.2. Samples: 230594560. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:04:20,956][1648985] Avg episode reward: [(0, '142.040')] [2024-06-15 17:04:22,983][1652491] Updated weights for policy 0, policy_version 450246 (0.0014) [2024-06-15 17:04:24,111][1652491] Updated weights for policy 0, policy_version 450303 (0.0088) [2024-06-15 17:04:25,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 922353664. Throughput: 0: 12083.2. Samples: 230633984. 
Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:04:25,956][1648985] Avg episode reward: [(0, '129.240')] [2024-06-15 17:04:26,511][1652491] Updated weights for policy 0, policy_version 450400 (0.0013) [2024-06-15 17:04:30,088][1652491] Updated weights for policy 0, policy_version 450434 (0.0013) [2024-06-15 17:04:30,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 49152.1, 300 sec: 46541.7). Total num frames: 922550272. Throughput: 0: 12140.1. Samples: 230710272. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:04:30,956][1648985] Avg episode reward: [(0, '108.090')] [2024-06-15 17:04:31,285][1652491] Updated weights for policy 0, policy_version 450490 (0.0103) [2024-06-15 17:04:34,295][1652491] Updated weights for policy 0, policy_version 450536 (0.0013) [2024-06-15 17:04:35,799][1652491] Updated weights for policy 0, policy_version 450608 (0.0013) [2024-06-15 17:04:35,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 49151.8, 300 sec: 46541.6). Total num frames: 922845184. Throughput: 0: 12203.2. Samples: 230779392. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:04:35,956][1648985] Avg episode reward: [(0, '118.770')] [2024-06-15 17:04:37,606][1652491] Updated weights for policy 0, policy_version 450682 (0.0013) [2024-06-15 17:04:40,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 923009024. Throughput: 0: 12140.1. Samples: 230815232. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:04:40,956][1648985] Avg episode reward: [(0, '138.170')] [2024-06-15 17:04:41,801][1652491] Updated weights for policy 0, policy_version 450723 (0.0012) [2024-06-15 17:04:44,737][1652491] Updated weights for policy 0, policy_version 450772 (0.0015) [2024-06-15 17:04:45,781][1652491] Updated weights for policy 0, policy_version 450822 (0.0015) [2024-06-15 17:04:45,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 923271168. 
Throughput: 0: 12208.3. Samples: 230893568. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:04:45,956][1648985] Avg episode reward: [(0, '158.240')] [2024-06-15 17:04:47,623][1652491] Updated weights for policy 0, policy_version 450897 (0.0013) [2024-06-15 17:04:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48624.6, 300 sec: 47097.1). Total num frames: 923533312. Throughput: 0: 12276.6. Samples: 230960128. Policy #0 lag: (min: 8.0, avg: 94.7, max: 264.0) [2024-06-15 17:04:50,956][1648985] Avg episode reward: [(0, '166.550')] [2024-06-15 17:04:52,523][1652491] Updated weights for policy 0, policy_version 450960 (0.0014) [2024-06-15 17:04:53,437][1652491] Updated weights for policy 0, policy_version 450999 (0.0011) [2024-06-15 17:04:55,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.3, 300 sec: 46652.8). Total num frames: 923729920. Throughput: 0: 12265.2. Samples: 231001088. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:04:55,956][1648985] Avg episode reward: [(0, '154.650')] [2024-06-15 17:04:56,148][1652491] Updated weights for policy 0, policy_version 451058 (0.0013) [2024-06-15 17:04:56,304][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000451072_923795456.pth... [2024-06-15 17:04:56,477][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000445568_912523264.pth [2024-06-15 17:04:57,549][1652491] Updated weights for policy 0, policy_version 451122 (0.0012) [2024-06-15 17:04:57,889][1651469] Signal inference workers to stop experience collection... (23500 times) [2024-06-15 17:04:57,955][1652491] InferenceWorker_p0-w0: stopping experience collection (23500 times) [2024-06-15 17:04:58,128][1651469] Signal inference workers to resume experience collection... 
(23500 times) [2024-06-15 17:04:58,129][1652491] InferenceWorker_p0-w0: resuming experience collection (23500 times) [2024-06-15 17:04:59,083][1652491] Updated weights for policy 0, policy_version 451195 (0.0037) [2024-06-15 17:05:00,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 924057600. Throughput: 0: 12208.4. Samples: 231068160. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:05:00,956][1648985] Avg episode reward: [(0, '157.570')] [2024-06-15 17:05:04,583][1652491] Updated weights for policy 0, policy_version 451263 (0.0013) [2024-06-15 17:05:05,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 924188672. Throughput: 0: 12333.5. Samples: 231149568. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:05:05,956][1648985] Avg episode reward: [(0, '164.860')] [2024-06-15 17:05:06,892][1652491] Updated weights for policy 0, policy_version 451314 (0.0012) [2024-06-15 17:05:08,276][1652491] Updated weights for policy 0, policy_version 451389 (0.0013) [2024-06-15 17:05:09,511][1652491] Updated weights for policy 0, policy_version 451428 (0.0020) [2024-06-15 17:05:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 50244.2, 300 sec: 47097.1). Total num frames: 924581888. Throughput: 0: 12106.0. Samples: 231178752. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:05:10,956][1648985] Avg episode reward: [(0, '170.820')] [2024-06-15 17:05:15,044][1652491] Updated weights for policy 0, policy_version 451504 (0.0013) [2024-06-15 17:05:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.7, 300 sec: 47097.1). Total num frames: 924712960. Throughput: 0: 12094.6. Samples: 231254528. 
Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:05:15,956][1648985] Avg episode reward: [(0, '163.790')] [2024-06-15 17:05:16,687][1652491] Updated weights for policy 0, policy_version 451536 (0.0012) [2024-06-15 17:05:18,172][1652491] Updated weights for policy 0, policy_version 451600 (0.0012) [2024-06-15 17:05:20,030][1652491] Updated weights for policy 0, policy_version 451680 (0.0103) [2024-06-15 17:05:20,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 50244.2, 300 sec: 47097.0). Total num frames: 925106176. Throughput: 0: 12014.9. Samples: 231320064. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:05:20,956][1648985] Avg episode reward: [(0, '158.850')] [2024-06-15 17:05:25,895][1652491] Updated weights for policy 0, policy_version 451728 (0.0015) [2024-06-15 17:05:25,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46421.2, 300 sec: 47208.2). Total num frames: 925138944. Throughput: 0: 12140.1. Samples: 231361536. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:05:25,956][1648985] Avg episode reward: [(0, '135.610')] [2024-06-15 17:05:27,055][1652491] Updated weights for policy 0, policy_version 451776 (0.0012) [2024-06-15 17:05:28,524][1652491] Updated weights for policy 0, policy_version 451838 (0.0012) [2024-06-15 17:05:30,795][1652491] Updated weights for policy 0, policy_version 451904 (0.0013) [2024-06-15 17:05:30,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 49152.1, 300 sec: 47097.1). Total num frames: 925499392. Throughput: 0: 11912.6. Samples: 231429632. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:05:30,955][1648985] Avg episode reward: [(0, '138.750')] [2024-06-15 17:05:35,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46421.3, 300 sec: 47097.0). Total num frames: 925630464. Throughput: 0: 11912.5. Samples: 231496192. 
Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:05:35,956][1648985] Avg episode reward: [(0, '131.910')] [2024-06-15 17:05:38,000][1652491] Updated weights for policy 0, policy_version 451972 (0.0014) [2024-06-15 17:05:39,588][1652491] Updated weights for policy 0, policy_version 452039 (0.0013) [2024-06-15 17:05:40,946][1652491] Updated weights for policy 0, policy_version 452096 (0.0014) [2024-06-15 17:05:40,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 925892608. Throughput: 0: 11776.1. Samples: 231531008. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:05:40,955][1648985] Avg episode reward: [(0, '130.250')] [2024-06-15 17:05:41,625][1651469] Signal inference workers to stop experience collection... (23550 times) [2024-06-15 17:05:41,672][1652491] InferenceWorker_p0-w0: stopping experience collection (23550 times) [2024-06-15 17:05:41,869][1651469] Signal inference workers to resume experience collection... (23550 times) [2024-06-15 17:05:41,878][1652491] InferenceWorker_p0-w0: resuming experience collection (23550 times) [2024-06-15 17:05:43,020][1652491] Updated weights for policy 0, policy_version 452145 (0.0011) [2024-06-15 17:05:44,835][1652491] Updated weights for policy 0, policy_version 452224 (0.0012) [2024-06-15 17:05:45,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 48059.9, 300 sec: 47097.1). Total num frames: 926154752. Throughput: 0: 11423.3. Samples: 231582208. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:05:45,956][1648985] Avg episode reward: [(0, '143.030')] [2024-06-15 17:05:50,955][1648985] Fps is (10 sec: 29491.1, 60 sec: 44236.8, 300 sec: 46541.7). Total num frames: 926187520. Throughput: 0: 11355.0. Samples: 231660544. 
Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:05:50,956][1648985] Avg episode reward: [(0, '150.050')] [2024-06-15 17:05:51,963][1652491] Updated weights for policy 0, policy_version 452289 (0.0014) [2024-06-15 17:05:53,157][1652491] Updated weights for policy 0, policy_version 452342 (0.0120) [2024-06-15 17:05:54,104][1652491] Updated weights for policy 0, policy_version 452370 (0.0012) [2024-06-15 17:05:55,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.9, 300 sec: 46874.9). Total num frames: 926613504. Throughput: 0: 11309.5. Samples: 231687680. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:05:55,956][1648985] Avg episode reward: [(0, '162.360')] [2024-06-15 17:05:56,025][1652491] Updated weights for policy 0, policy_version 452450 (0.0012) [2024-06-15 17:06:00,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 43690.5, 300 sec: 46874.9). Total num frames: 926679040. Throughput: 0: 11047.8. Samples: 231751680. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:06:00,956][1648985] Avg episode reward: [(0, '160.760')] [2024-06-15 17:06:02,602][1652491] Updated weights for policy 0, policy_version 452497 (0.0017) [2024-06-15 17:06:03,792][1652491] Updated weights for policy 0, policy_version 452545 (0.0014) [2024-06-15 17:06:05,205][1652491] Updated weights for policy 0, policy_version 452603 (0.0116) [2024-06-15 17:06:05,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 926973952. Throughput: 0: 11116.1. Samples: 231820288. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:06:05,955][1648985] Avg episode reward: [(0, '165.610')] [2024-06-15 17:06:06,448][1652491] Updated weights for policy 0, policy_version 452641 (0.0011) [2024-06-15 17:06:08,347][1652491] Updated weights for policy 0, policy_version 452720 (0.0017) [2024-06-15 17:06:10,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 927203328. 
Throughput: 0: 10740.7. Samples: 231844864. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:06:10,956][1648985] Avg episode reward: [(0, '157.970')] [2024-06-15 17:06:15,466][1652491] Updated weights for policy 0, policy_version 452791 (0.0015) [2024-06-15 17:06:15,955][1648985] Fps is (10 sec: 36044.0, 60 sec: 43690.5, 300 sec: 46097.3). Total num frames: 927334400. Throughput: 0: 10911.2. Samples: 231920640. Policy #0 lag: (min: 15.0, avg: 92.8, max: 249.0) [2024-06-15 17:06:15,956][1648985] Avg episode reward: [(0, '157.450')] [2024-06-15 17:06:16,862][1652491] Updated weights for policy 0, policy_version 452833 (0.0012) [2024-06-15 17:06:18,835][1652491] Updated weights for policy 0, policy_version 452912 (0.0012) [2024-06-15 17:06:19,799][1651469] Signal inference workers to stop experience collection... (23600 times) [2024-06-15 17:06:19,832][1652491] InferenceWorker_p0-w0: stopping experience collection (23600 times) [2024-06-15 17:06:19,998][1651469] Signal inference workers to resume experience collection... (23600 times) [2024-06-15 17:06:19,999][1652491] InferenceWorker_p0-w0: resuming experience collection (23600 times) [2024-06-15 17:06:20,301][1652491] Updated weights for policy 0, policy_version 452976 (0.0011) [2024-06-15 17:06:20,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 927727616. Throughput: 0: 10706.5. Samples: 231977984. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:06:20,956][1648985] Avg episode reward: [(0, '155.190')] [2024-06-15 17:06:25,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 43144.6, 300 sec: 46097.3). Total num frames: 927727616. Throughput: 0: 10786.1. Samples: 232016384. 
Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:06:25,956][1648985] Avg episode reward: [(0, '163.390')] [2024-06-15 17:06:26,397][1652491] Updated weights for policy 0, policy_version 453024 (0.0013) [2024-06-15 17:06:27,886][1652491] Updated weights for policy 0, policy_version 453072 (0.0014) [2024-06-15 17:06:29,648][1652491] Updated weights for policy 0, policy_version 453139 (0.0014) [2024-06-15 17:06:30,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 46652.8). Total num frames: 928120832. Throughput: 0: 11116.1. Samples: 232082432. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:06:30,956][1648985] Avg episode reward: [(0, '157.860')] [2024-06-15 17:06:31,313][1652491] Updated weights for policy 0, policy_version 453216 (0.0011) [2024-06-15 17:06:35,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 43690.8, 300 sec: 46430.9). Total num frames: 928251904. Throughput: 0: 10945.4. Samples: 232153088. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:06:35,956][1648985] Avg episode reward: [(0, '135.960')] [2024-06-15 17:06:38,004][1652491] Updated weights for policy 0, policy_version 453282 (0.0015) [2024-06-15 17:06:40,343][1652491] Updated weights for policy 0, policy_version 453333 (0.0013) [2024-06-15 17:06:40,955][1648985] Fps is (10 sec: 32767.8, 60 sec: 42598.4, 300 sec: 45986.3). Total num frames: 928448512. Throughput: 0: 11104.7. Samples: 232187392. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:06:40,956][1648985] Avg episode reward: [(0, '139.000')] [2024-06-15 17:06:41,884][1652491] Updated weights for policy 0, policy_version 453393 (0.0113) [2024-06-15 17:06:43,050][1652491] Updated weights for policy 0, policy_version 453447 (0.0016) [2024-06-15 17:06:44,059][1652491] Updated weights for policy 0, policy_version 453504 (0.0012) [2024-06-15 17:06:45,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 43690.6, 300 sec: 46763.8). Total num frames: 928776192. 
Throughput: 0: 11195.7. Samples: 232255488. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:06:45,956][1648985] Avg episode reward: [(0, '164.770')] [2024-06-15 17:06:49,071][1652491] Updated weights for policy 0, policy_version 453562 (0.0014) [2024-06-15 17:06:50,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45329.0, 300 sec: 45875.2). Total num frames: 928907264. Throughput: 0: 11400.5. Samples: 232333312. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:06:50,956][1648985] Avg episode reward: [(0, '169.780')] [2024-06-15 17:06:52,012][1652491] Updated weights for policy 0, policy_version 453616 (0.0012) [2024-06-15 17:06:53,993][1652491] Updated weights for policy 0, policy_version 453696 (0.0011) [2024-06-15 17:06:55,365][1652491] Updated weights for policy 0, policy_version 453760 (0.0012) [2024-06-15 17:06:55,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44782.8, 300 sec: 47097.0). Total num frames: 929300480. Throughput: 0: 11366.4. Samples: 232356352. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:06:55,956][1648985] Avg episode reward: [(0, '167.340')] [2024-06-15 17:06:56,003][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000453760_929300480.pth... [2024-06-15 17:06:56,053][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000448256_918028288.pth [2024-06-15 17:06:59,996][1652491] Updated weights for policy 0, policy_version 453820 (0.0014) [2024-06-15 17:07:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 929431552. Throughput: 0: 11298.2. Samples: 232429056. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:07:00,956][1648985] Avg episode reward: [(0, '177.540')] [2024-06-15 17:07:03,287][1651469] Signal inference workers to stop experience collection... 
(23650 times) [2024-06-15 17:07:03,331][1652491] InferenceWorker_p0-w0: stopping experience collection (23650 times) [2024-06-15 17:07:03,344][1652491] Updated weights for policy 0, policy_version 453876 (0.0013) [2024-06-15 17:07:03,509][1651469] Signal inference workers to resume experience collection... (23650 times) [2024-06-15 17:07:03,517][1652491] InferenceWorker_p0-w0: resuming experience collection (23650 times) [2024-06-15 17:07:04,915][1652491] Updated weights for policy 0, policy_version 453955 (0.0083) [2024-06-15 17:07:05,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 929792000. Throughput: 0: 11548.5. Samples: 232497664. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:07:05,955][1648985] Avg episode reward: [(0, '167.510')] [2024-06-15 17:07:05,998][1652491] Updated weights for policy 0, policy_version 454011 (0.0019) [2024-06-15 17:07:10,344][1652491] Updated weights for policy 0, policy_version 454070 (0.0014) [2024-06-15 17:07:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 929955840. Throughput: 0: 11753.3. Samples: 232545280. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:07:10,956][1648985] Avg episode reward: [(0, '127.220')] [2024-06-15 17:07:13,323][1652491] Updated weights for policy 0, policy_version 454100 (0.0013) [2024-06-15 17:07:15,431][1652491] Updated weights for policy 0, policy_version 454196 (0.0015) [2024-06-15 17:07:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48059.9, 300 sec: 46652.8). Total num frames: 930217984. Throughput: 0: 11741.9. Samples: 232610816. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:07:15,955][1648985] Avg episode reward: [(0, '137.510')] [2024-06-15 17:07:16,852][1652491] Updated weights for policy 0, policy_version 454268 (0.0015) [2024-06-15 17:07:20,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 44236.9, 300 sec: 46319.5). 
Total num frames: 930381824. Throughput: 0: 11753.3. Samples: 232681984. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:07:20,955][1648985] Avg episode reward: [(0, '147.720')] [2024-06-15 17:07:21,921][1652491] Updated weights for policy 0, policy_version 454336 (0.0013) [2024-06-15 17:07:25,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 48059.8, 300 sec: 46319.5). Total num frames: 930611200. Throughput: 0: 11776.0. Samples: 232717312. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:07:25,955][1648985] Avg episode reward: [(0, '158.120')] [2024-06-15 17:07:26,249][1652491] Updated weights for policy 0, policy_version 454402 (0.0014) [2024-06-15 17:07:27,583][1652491] Updated weights for policy 0, policy_version 454464 (0.0012) [2024-06-15 17:07:29,078][1652491] Updated weights for policy 0, policy_version 454523 (0.0013) [2024-06-15 17:07:30,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 930873344. Throughput: 0: 11662.3. Samples: 232780288. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:07:30,956][1648985] Avg episode reward: [(0, '142.770')] [2024-06-15 17:07:33,122][1652491] Updated weights for policy 0, policy_version 454580 (0.0121) [2024-06-15 17:07:35,633][1652491] Updated weights for policy 0, policy_version 454604 (0.0011) [2024-06-15 17:07:35,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 931037184. Throughput: 0: 11628.1. Samples: 232856576. 
Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:07:35,956][1648985] Avg episode reward: [(0, '158.700')] [2024-06-15 17:07:36,339][1652491] Updated weights for policy 0, policy_version 454648 (0.0013) [2024-06-15 17:07:37,949][1652491] Updated weights for policy 0, policy_version 454704 (0.0024) [2024-06-15 17:07:39,201][1652491] Updated weights for policy 0, policy_version 454758 (0.0018) [2024-06-15 17:07:40,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 49152.1, 300 sec: 47097.1). Total num frames: 931397632. Throughput: 0: 11855.7. Samples: 232889856. Policy #0 lag: (min: 79.0, avg: 198.6, max: 323.0) [2024-06-15 17:07:40,955][1648985] Avg episode reward: [(0, '149.040')] [2024-06-15 17:07:43,072][1652491] Updated weights for policy 0, policy_version 454816 (0.0017) [2024-06-15 17:07:43,253][1651469] Signal inference workers to stop experience collection... (23700 times) [2024-06-15 17:07:43,296][1652491] InferenceWorker_p0-w0: stopping experience collection (23700 times) [2024-06-15 17:07:43,534][1651469] Signal inference workers to resume experience collection... (23700 times) [2024-06-15 17:07:43,535][1652491] InferenceWorker_p0-w0: resuming experience collection (23700 times) [2024-06-15 17:07:43,977][1652491] Updated weights for policy 0, policy_version 454848 (0.0011) [2024-06-15 17:07:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 931561472. Throughput: 0: 12003.5. Samples: 232969216. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:07:45,956][1648985] Avg episode reward: [(0, '134.620')] [2024-06-15 17:07:46,685][1652491] Updated weights for policy 0, policy_version 454911 (0.0012) [2024-06-15 17:07:48,735][1652491] Updated weights for policy 0, policy_version 454976 (0.0015) [2024-06-15 17:07:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 931921920. Throughput: 0: 11855.6. Samples: 233031168. 
Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:07:50,956][1648985] Avg episode reward: [(0, '135.090')] [2024-06-15 17:07:54,020][1652491] Updated weights for policy 0, policy_version 455042 (0.0014) [2024-06-15 17:07:55,259][1652491] Updated weights for policy 0, policy_version 455100 (0.0097) [2024-06-15 17:07:55,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 45875.1, 300 sec: 46655.1). Total num frames: 932052992. Throughput: 0: 11673.6. Samples: 233070592. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:07:55,956][1648985] Avg episode reward: [(0, '146.110')] [2024-06-15 17:07:57,543][1652491] Updated weights for policy 0, policy_version 455139 (0.0027) [2024-06-15 17:07:59,816][1652491] Updated weights for policy 0, policy_version 455223 (0.0015) [2024-06-15 17:08:00,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 48606.0, 300 sec: 46763.8). Total num frames: 932347904. Throughput: 0: 11741.9. Samples: 233139200. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:08:00,955][1648985] Avg episode reward: [(0, '162.990')] [2024-06-15 17:08:01,323][1652491] Updated weights for policy 0, policy_version 455268 (0.0013) [2024-06-15 17:08:04,950][1652491] Updated weights for policy 0, policy_version 455316 (0.0014) [2024-06-15 17:08:05,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 932577280. Throughput: 0: 11798.7. Samples: 233212928. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:08:05,956][1648985] Avg episode reward: [(0, '166.790')] [2024-06-15 17:08:07,622][1652491] Updated weights for policy 0, policy_version 455376 (0.0012) [2024-06-15 17:08:08,964][1652491] Updated weights for policy 0, policy_version 455422 (0.0010) [2024-06-15 17:08:10,955][1648985] Fps is (10 sec: 45873.6, 60 sec: 47513.5, 300 sec: 46874.9). Total num frames: 932806656. Throughput: 0: 11832.8. Samples: 233249792. 
Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:08:10,956][1648985] Avg episode reward: [(0, '139.450')] [2024-06-15 17:08:10,981][1652491] Updated weights for policy 0, policy_version 455481 (0.0011) [2024-06-15 17:08:12,304][1652491] Updated weights for policy 0, policy_version 455546 (0.0020) [2024-06-15 17:08:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 933003264. Throughput: 0: 12060.4. Samples: 233323008. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:08:15,956][1648985] Avg episode reward: [(0, '132.940')] [2024-06-15 17:08:16,835][1652491] Updated weights for policy 0, policy_version 455612 (0.0014) [2024-06-15 17:08:19,854][1652491] Updated weights for policy 0, policy_version 455664 (0.0014) [2024-06-15 17:08:20,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 933232640. Throughput: 0: 11901.2. Samples: 233392128. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:08:20,956][1648985] Avg episode reward: [(0, '158.050')] [2024-06-15 17:08:21,422][1652491] Updated weights for policy 0, policy_version 455696 (0.0012) [2024-06-15 17:08:22,922][1652491] Updated weights for policy 0, policy_version 455760 (0.0156) [2024-06-15 17:08:25,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 48059.6, 300 sec: 47097.1). Total num frames: 933494784. Throughput: 0: 11889.7. Samples: 233424896. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:08:25,956][1648985] Avg episode reward: [(0, '136.850')] [2024-06-15 17:08:27,438][1652491] Updated weights for policy 0, policy_version 455827 (0.0012) [2024-06-15 17:08:29,886][1651469] Signal inference workers to stop experience collection... 
(23750 times) [2024-06-15 17:08:29,927][1652491] InferenceWorker_p0-w0: stopping experience collection (23750 times) [2024-06-15 17:08:29,938][1652491] Updated weights for policy 0, policy_version 455874 (0.0013) [2024-06-15 17:08:30,167][1651469] Signal inference workers to resume experience collection... (23750 times) [2024-06-15 17:08:30,168][1652491] InferenceWorker_p0-w0: resuming experience collection (23750 times) [2024-06-15 17:08:30,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 46967.3, 300 sec: 46763.8). Total num frames: 933691392. Throughput: 0: 11753.2. Samples: 233498112. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:08:30,956][1648985] Avg episode reward: [(0, '145.480')] [2024-06-15 17:08:31,238][1652491] Updated weights for policy 0, policy_version 455934 (0.0012) [2024-06-15 17:08:33,708][1652491] Updated weights for policy 0, policy_version 455987 (0.0033) [2024-06-15 17:08:35,106][1652491] Updated weights for policy 0, policy_version 456059 (0.0012) [2024-06-15 17:08:35,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 49698.2, 300 sec: 47097.1). Total num frames: 934019072. Throughput: 0: 11912.5. Samples: 233567232. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:08:35,955][1648985] Avg episode reward: [(0, '113.830')] [2024-06-15 17:08:38,867][1652491] Updated weights for policy 0, policy_version 456120 (0.0096) [2024-06-15 17:08:40,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 934182912. Throughput: 0: 11810.2. Samples: 233602048. 
Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:08:40,956][1648985] Avg episode reward: [(0, '130.530')] [2024-06-15 17:08:41,296][1652491] Updated weights for policy 0, policy_version 456161 (0.0013) [2024-06-15 17:08:44,343][1652491] Updated weights for policy 0, policy_version 456211 (0.0012) [2024-06-15 17:08:45,348][1652491] Updated weights for policy 0, policy_version 456257 (0.0024) [2024-06-15 17:08:45,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 48059.7, 300 sec: 46878.6). Total num frames: 934445056. Throughput: 0: 12003.5. Samples: 233679360. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:08:45,956][1648985] Avg episode reward: [(0, '128.010')] [2024-06-15 17:08:46,585][1652491] Updated weights for policy 0, policy_version 456310 (0.0086) [2024-06-15 17:08:48,259][1652491] Updated weights for policy 0, policy_version 456338 (0.0016) [2024-06-15 17:08:48,946][1652491] Updated weights for policy 0, policy_version 456375 (0.0034) [2024-06-15 17:08:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 934707200. Throughput: 0: 12106.0. Samples: 233757696. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:08:50,956][1648985] Avg episode reward: [(0, '130.710')] [2024-06-15 17:08:51,429][1652491] Updated weights for policy 0, policy_version 456418 (0.0015) [2024-06-15 17:08:54,083][1652491] Updated weights for policy 0, policy_version 456454 (0.0019) [2024-06-15 17:08:55,289][1652491] Updated weights for policy 0, policy_version 456511 (0.0013) [2024-06-15 17:08:55,955][1648985] Fps is (10 sec: 55706.2, 60 sec: 49152.2, 300 sec: 47319.2). Total num frames: 935002112. Throughput: 0: 12128.8. Samples: 233795584. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:08:55,956][1648985] Avg episode reward: [(0, '133.120')] [2024-06-15 17:08:56,217][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000456560_935034880.pth... 
[2024-06-15 17:08:56,219][1652491] Updated weights for policy 0, policy_version 456560 (0.0013) [2024-06-15 17:08:56,287][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000451072_923795456.pth [2024-06-15 17:08:58,970][1652491] Updated weights for policy 0, policy_version 456624 (0.0028) [2024-06-15 17:09:00,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 47513.3, 300 sec: 46763.8). Total num frames: 935198720. Throughput: 0: 12128.6. Samples: 233868800. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:09:00,956][1648985] Avg episode reward: [(0, '130.320')] [2024-06-15 17:09:01,904][1652491] Updated weights for policy 0, policy_version 456675 (0.0107) [2024-06-15 17:09:04,481][1652491] Updated weights for policy 0, policy_version 456728 (0.0012) [2024-06-15 17:09:05,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 935460864. Throughput: 0: 12253.9. Samples: 233943552. Policy #0 lag: (min: 47.0, avg: 158.9, max: 303.0) [2024-06-15 17:09:05,955][1648985] Avg episode reward: [(0, '157.980')] [2024-06-15 17:09:06,824][1652491] Updated weights for policy 0, policy_version 456816 (0.0013) [2024-06-15 17:09:09,398][1651469] Signal inference workers to stop experience collection... (23800 times) [2024-06-15 17:09:09,425][1652491] InferenceWorker_p0-w0: stopping experience collection (23800 times) [2024-06-15 17:09:09,446][1652491] Updated weights for policy 0, policy_version 456866 (0.0012) [2024-06-15 17:09:09,691][1651469] Signal inference workers to resume experience collection... (23800 times) [2024-06-15 17:09:09,692][1652491] InferenceWorker_p0-w0: resuming experience collection (23800 times) [2024-06-15 17:09:10,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 48606.0, 300 sec: 46986.0). Total num frames: 935723008. Throughput: 0: 12299.4. Samples: 233978368. 
Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:09:10,956][1648985] Avg episode reward: [(0, '140.440')] [2024-06-15 17:09:13,433][1652491] Updated weights for policy 0, policy_version 456955 (0.0013) [2024-06-15 17:09:15,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 935854080. Throughput: 0: 12151.5. Samples: 234044928. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:09:15,956][1648985] Avg episode reward: [(0, '130.540')] [2024-06-15 17:09:17,180][1652491] Updated weights for policy 0, policy_version 457021 (0.0014) [2024-06-15 17:09:18,819][1652491] Updated weights for policy 0, policy_version 457072 (0.0012) [2024-06-15 17:09:20,772][1652491] Updated weights for policy 0, policy_version 457120 (0.0042) [2024-06-15 17:09:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 49152.0, 300 sec: 46874.9). Total num frames: 936181760. Throughput: 0: 12071.8. Samples: 234110464. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:09:20,955][1648985] Avg episode reward: [(0, '134.410')] [2024-06-15 17:09:24,328][1652491] Updated weights for policy 0, policy_version 457184 (0.0013) [2024-06-15 17:09:25,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 936378368. Throughput: 0: 12185.6. Samples: 234150400. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:09:25,956][1648985] Avg episode reward: [(0, '130.490')] [2024-06-15 17:09:27,980][1652491] Updated weights for policy 0, policy_version 457232 (0.0098) [2024-06-15 17:09:28,742][1652491] Updated weights for policy 0, policy_version 457280 (0.0109) [2024-06-15 17:09:30,955][1648985] Fps is (10 sec: 45873.5, 60 sec: 49151.9, 300 sec: 46763.8). Total num frames: 936640512. Throughput: 0: 11969.4. Samples: 234217984. 
Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:09:30,956][1648985] Avg episode reward: [(0, '122.320')] [2024-06-15 17:09:31,618][1652491] Updated weights for policy 0, policy_version 457360 (0.0092) [2024-06-15 17:09:34,907][1652491] Updated weights for policy 0, policy_version 457409 (0.0014) [2024-06-15 17:09:35,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 936837120. Throughput: 0: 11730.5. Samples: 234285568. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:09:35,955][1648985] Avg episode reward: [(0, '117.040')] [2024-06-15 17:09:36,376][1652491] Updated weights for policy 0, policy_version 457469 (0.0011) [2024-06-15 17:09:40,273][1652491] Updated weights for policy 0, policy_version 457534 (0.0015) [2024-06-15 17:09:40,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 937033728. Throughput: 0: 11776.0. Samples: 234325504. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:09:40,956][1648985] Avg episode reward: [(0, '140.890')] [2024-06-15 17:09:41,948][1652491] Updated weights for policy 0, policy_version 457590 (0.0014) [2024-06-15 17:09:43,409][1652491] Updated weights for policy 0, policy_version 457617 (0.0012) [2024-06-15 17:09:44,424][1652491] Updated weights for policy 0, policy_version 457657 (0.0011) [2024-06-15 17:09:45,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 47513.7, 300 sec: 46652.8). Total num frames: 937295872. Throughput: 0: 11514.4. Samples: 234386944. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:09:45,956][1648985] Avg episode reward: [(0, '165.010')] [2024-06-15 17:09:46,443][1652491] Updated weights for policy 0, policy_version 457696 (0.0094) [2024-06-15 17:09:50,627][1652491] Updated weights for policy 0, policy_version 457760 (0.0013) [2024-06-15 17:09:50,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 46967.5, 300 sec: 46763.9). Total num frames: 937525248. 
Throughput: 0: 11639.5. Samples: 234467328. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:09:50,955][1648985] Avg episode reward: [(0, '159.170')] [2024-06-15 17:09:51,405][1652491] Updated weights for policy 0, policy_version 457795 (0.0021) [2024-06-15 17:09:52,582][1652491] Updated weights for policy 0, policy_version 457856 (0.0014) [2024-06-15 17:09:54,304][1652491] Updated weights for policy 0, policy_version 457913 (0.0013) [2024-06-15 17:09:55,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 937820160. Throughput: 0: 11559.8. Samples: 234498560. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:09:55,956][1648985] Avg episode reward: [(0, '140.140')] [2024-06-15 17:09:56,313][1651469] Signal inference workers to stop experience collection... (23850 times) [2024-06-15 17:09:56,349][1652491] InferenceWorker_p0-w0: stopping experience collection (23850 times) [2024-06-15 17:09:56,545][1651469] Signal inference workers to resume experience collection... (23850 times) [2024-06-15 17:09:56,545][1652491] InferenceWorker_p0-w0: resuming experience collection (23850 times) [2024-06-15 17:09:57,049][1652491] Updated weights for policy 0, policy_version 457953 (0.0012) [2024-06-15 17:10:00,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.4, 300 sec: 46652.7). Total num frames: 937951232. Throughput: 0: 11923.9. Samples: 234581504. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:10:00,956][1648985] Avg episode reward: [(0, '122.880')] [2024-06-15 17:10:01,248][1652491] Updated weights for policy 0, policy_version 458005 (0.0021) [2024-06-15 17:10:02,569][1652491] Updated weights for policy 0, policy_version 458068 (0.0011) [2024-06-15 17:10:04,232][1652491] Updated weights for policy 0, policy_version 458144 (0.0013) [2024-06-15 17:10:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 938344448. Throughput: 0: 11935.3. 
Samples: 234647552. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:10:05,956][1648985] Avg episode reward: [(0, '133.560')] [2024-06-15 17:10:07,389][1652491] Updated weights for policy 0, policy_version 458195 (0.0021) [2024-06-15 17:10:08,262][1652491] Updated weights for policy 0, policy_version 458238 (0.0014) [2024-06-15 17:10:10,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 938475520. Throughput: 0: 11889.8. Samples: 234685440. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:10:10,956][1648985] Avg episode reward: [(0, '138.860')] [2024-06-15 17:10:12,647][1652491] Updated weights for policy 0, policy_version 458289 (0.0013) [2024-06-15 17:10:13,557][1652491] Updated weights for policy 0, policy_version 458337 (0.0013) [2024-06-15 17:10:14,736][1652491] Updated weights for policy 0, policy_version 458400 (0.0013) [2024-06-15 17:10:15,982][1648985] Fps is (10 sec: 52287.3, 60 sec: 50221.6, 300 sec: 46648.5). Total num frames: 938868736. Throughput: 0: 12087.4. Samples: 234762240. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:10:15,983][1648985] Avg episode reward: [(0, '134.940')] [2024-06-15 17:10:17,591][1652491] Updated weights for policy 0, policy_version 458452 (0.0016) [2024-06-15 17:10:18,634][1652491] Updated weights for policy 0, policy_version 458493 (0.0012) [2024-06-15 17:10:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 938999808. Throughput: 0: 12265.2. Samples: 234837504. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:10:20,956][1648985] Avg episode reward: [(0, '137.690')] [2024-06-15 17:10:24,304][1652491] Updated weights for policy 0, policy_version 458563 (0.0011) [2024-06-15 17:10:25,955][1648985] Fps is (10 sec: 42714.5, 60 sec: 48606.1, 300 sec: 46763.8). Total num frames: 939294720. Throughput: 0: 12197.0. Samples: 234874368. 
Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:10:25,955][1648985] Avg episode reward: [(0, '138.420')] [2024-06-15 17:10:26,120][1652491] Updated weights for policy 0, policy_version 458656 (0.0013) [2024-06-15 17:10:28,648][1652491] Updated weights for policy 0, policy_version 458720 (0.0012) [2024-06-15 17:10:29,488][1652491] Updated weights for policy 0, policy_version 458752 (0.0012) [2024-06-15 17:10:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48060.0, 300 sec: 47097.1). Total num frames: 939524096. Throughput: 0: 12231.1. Samples: 234937344. Policy #0 lag: (min: 13.0, avg: 133.5, max: 269.0) [2024-06-15 17:10:30,956][1648985] Avg episode reward: [(0, '128.540')] [2024-06-15 17:10:35,955][1648985] Fps is (10 sec: 36043.4, 60 sec: 46967.1, 300 sec: 46652.7). Total num frames: 939655168. Throughput: 0: 12105.9. Samples: 235012096. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:10:35,956][1648985] Avg episode reward: [(0, '143.060')] [2024-06-15 17:10:36,124][1651469] Signal inference workers to stop experience collection... (23900 times) [2024-06-15 17:10:36,190][1652491] InferenceWorker_p0-w0: stopping experience collection (23900 times) [2024-06-15 17:10:36,384][1651469] Signal inference workers to resume experience collection... (23900 times) [2024-06-15 17:10:36,385][1652491] InferenceWorker_p0-w0: resuming experience collection (23900 times) [2024-06-15 17:10:36,387][1652491] Updated weights for policy 0, policy_version 458848 (0.0017) [2024-06-15 17:10:38,534][1652491] Updated weights for policy 0, policy_version 458928 (0.0012) [2024-06-15 17:10:39,811][1652491] Updated weights for policy 0, policy_version 458950 (0.0012) [2024-06-15 17:10:40,903][1652491] Updated weights for policy 0, policy_version 459008 (0.0012) [2024-06-15 17:10:40,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 50244.3, 300 sec: 47097.0). Total num frames: 940048384. Throughput: 0: 11867.0. Samples: 235032576. 
Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:10:40,956][1648985] Avg episode reward: [(0, '151.980')] [2024-06-15 17:10:45,955][1648985] Fps is (10 sec: 39323.0, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 940048384. Throughput: 0: 11810.1. Samples: 235112960. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:10:45,956][1648985] Avg episode reward: [(0, '160.790')] [2024-06-15 17:10:48,056][1652491] Updated weights for policy 0, policy_version 459075 (0.0014) [2024-06-15 17:10:49,892][1652491] Updated weights for policy 0, policy_version 459152 (0.0014) [2024-06-15 17:10:50,955][1648985] Fps is (10 sec: 36045.3, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 940408832. Throughput: 0: 11594.0. Samples: 235169280. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:10:50,956][1648985] Avg episode reward: [(0, '138.110')] [2024-06-15 17:10:51,000][1652491] Updated weights for policy 0, policy_version 459197 (0.0018) [2024-06-15 17:10:52,633][1652491] Updated weights for policy 0, policy_version 459258 (0.0013) [2024-06-15 17:10:55,966][1648985] Fps is (10 sec: 52381.4, 60 sec: 45868.3, 300 sec: 47095.6). Total num frames: 940572672. Throughput: 0: 11591.6. Samples: 235207168. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:10:55,968][1648985] Avg episode reward: [(0, '137.240')] [2024-06-15 17:10:55,994][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000459264_940572672.pth... [2024-06-15 17:10:56,077][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000453760_929300480.pth [2024-06-15 17:10:56,081][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000459264_940572672.pth [2024-06-15 17:10:59,112][1652491] Updated weights for policy 0, policy_version 459328 (0.0079) [2024-06-15 17:11:00,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48059.8, 300 sec: 46986.0). 
Total num frames: 940834816. Throughput: 0: 11350.5. Samples: 235272704. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:11:00,956][1648985] Avg episode reward: [(0, '141.220')] [2024-06-15 17:11:01,802][1652491] Updated weights for policy 0, policy_version 459425 (0.0110) [2024-06-15 17:11:04,058][1652491] Updated weights for policy 0, policy_version 459488 (0.0012) [2024-06-15 17:11:05,955][1648985] Fps is (10 sec: 52475.9, 60 sec: 45875.2, 300 sec: 47097.0). Total num frames: 941096960. Throughput: 0: 11036.4. Samples: 235334144. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:11:05,956][1648985] Avg episode reward: [(0, '167.400')] [2024-06-15 17:11:10,106][1652491] Updated weights for policy 0, policy_version 459537 (0.0022) [2024-06-15 17:11:10,955][1648985] Fps is (10 sec: 36044.1, 60 sec: 45329.0, 300 sec: 46986.0). Total num frames: 941195264. Throughput: 0: 11172.9. Samples: 235377152. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:11:10,956][1648985] Avg episode reward: [(0, '154.360')] [2024-06-15 17:11:11,635][1652491] Updated weights for policy 0, policy_version 459601 (0.0013) [2024-06-15 17:11:12,862][1652491] Updated weights for policy 0, policy_version 459664 (0.0023) [2024-06-15 17:11:14,632][1651469] Signal inference workers to stop experience collection... (23950 times) [2024-06-15 17:11:14,681][1652491] InferenceWorker_p0-w0: stopping experience collection (23950 times) [2024-06-15 17:11:14,683][1652491] Updated weights for policy 0, policy_version 459730 (0.0038) [2024-06-15 17:11:14,939][1651469] Signal inference workers to resume experience collection... (23950 times) [2024-06-15 17:11:14,940][1652491] InferenceWorker_p0-w0: resuming experience collection (23950 times) [2024-06-15 17:11:15,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45895.9, 300 sec: 47097.1). Total num frames: 941621248. Throughput: 0: 11195.7. Samples: 235441152. 
Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:11:15,956][1648985] Avg episode reward: [(0, '130.800')] [2024-06-15 17:11:20,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 43690.8, 300 sec: 47097.1). Total num frames: 941621248. Throughput: 0: 11275.5. Samples: 235519488. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:11:20,955][1648985] Avg episode reward: [(0, '122.340')] [2024-06-15 17:11:21,489][1652491] Updated weights for policy 0, policy_version 459793 (0.0039) [2024-06-15 17:11:22,886][1652491] Updated weights for policy 0, policy_version 459857 (0.0014) [2024-06-15 17:11:24,281][1652491] Updated weights for policy 0, policy_version 459920 (0.0011) [2024-06-15 17:11:25,925][1652491] Updated weights for policy 0, policy_version 459990 (0.0056) [2024-06-15 17:11:25,964][1648985] Fps is (10 sec: 42566.8, 60 sec: 45869.5, 300 sec: 47206.9). Total num frames: 942047232. Throughput: 0: 11535.2. Samples: 235551744. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:11:25,965][1648985] Avg episode reward: [(0, '120.540')] [2024-06-15 17:11:30,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 942145536. Throughput: 0: 11423.3. Samples: 235627008. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:11:30,955][1648985] Avg episode reward: [(0, '144.180')] [2024-06-15 17:11:32,581][1652491] Updated weights for policy 0, policy_version 460064 (0.0013) [2024-06-15 17:11:34,468][1652491] Updated weights for policy 0, policy_version 460150 (0.0013) [2024-06-15 17:11:35,925][1652491] Updated weights for policy 0, policy_version 460211 (0.0012) [2024-06-15 17:11:35,955][1648985] Fps is (10 sec: 45909.7, 60 sec: 47513.9, 300 sec: 47652.5). Total num frames: 942505984. Throughput: 0: 11616.7. Samples: 235692032. 
Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:11:35,955][1648985] Avg episode reward: [(0, '161.190')] [2024-06-15 17:11:37,278][1652491] Updated weights for policy 0, policy_version 460272 (0.0012) [2024-06-15 17:11:40,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 43690.6, 300 sec: 47097.1). Total num frames: 942669824. Throughput: 0: 11539.3. Samples: 235726336. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:11:40,956][1648985] Avg episode reward: [(0, '173.500')] [2024-06-15 17:11:43,448][1652491] Updated weights for policy 0, policy_version 460306 (0.0015) [2024-06-15 17:11:44,660][1652491] Updated weights for policy 0, policy_version 460368 (0.0014) [2024-06-15 17:11:45,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 48606.0, 300 sec: 47652.5). Total num frames: 942964736. Throughput: 0: 11901.2. Samples: 235808256. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:11:45,955][1648985] Avg episode reward: [(0, '150.000')] [2024-06-15 17:11:46,080][1652491] Updated weights for policy 0, policy_version 460433 (0.0015) [2024-06-15 17:11:47,518][1652491] Updated weights for policy 0, policy_version 460499 (0.0012) [2024-06-15 17:11:50,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 943194112. Throughput: 0: 12003.6. Samples: 235874304. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:11:50,956][1648985] Avg episode reward: [(0, '120.780')] [2024-06-15 17:11:54,090][1652491] Updated weights for policy 0, policy_version 460576 (0.0078) [2024-06-15 17:11:54,432][1651469] Signal inference workers to stop experience collection... (24000 times) [2024-06-15 17:11:54,565][1652491] InferenceWorker_p0-w0: stopping experience collection (24000 times) [2024-06-15 17:11:54,672][1651469] Signal inference workers to resume experience collection... 
(24000 times) [2024-06-15 17:11:54,673][1652491] InferenceWorker_p0-w0: resuming experience collection (24000 times) [2024-06-15 17:11:55,591][1652491] Updated weights for policy 0, policy_version 460642 (0.0013) [2024-06-15 17:11:55,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 47520.8, 300 sec: 47430.3). Total num frames: 943423488. Throughput: 0: 11980.8. Samples: 235916288. Policy #0 lag: (min: 15.0, avg: 77.1, max: 271.0) [2024-06-15 17:11:55,955][1648985] Avg episode reward: [(0, '132.280')] [2024-06-15 17:11:56,696][1652491] Updated weights for policy 0, policy_version 460704 (0.0013) [2024-06-15 17:11:57,289][1652491] Updated weights for policy 0, policy_version 460736 (0.0026) [2024-06-15 17:12:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 943718400. Throughput: 0: 12060.4. Samples: 235983872. Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:12:00,956][1648985] Avg episode reward: [(0, '127.970')] [2024-06-15 17:12:04,795][1652491] Updated weights for policy 0, policy_version 460822 (0.0015) [2024-06-15 17:12:05,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 943849472. Throughput: 0: 12003.5. Samples: 236059648. Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:12:05,956][1648985] Avg episode reward: [(0, '118.050')] [2024-06-15 17:12:06,834][1652491] Updated weights for policy 0, policy_version 460912 (0.0014) [2024-06-15 17:12:08,169][1652491] Updated weights for policy 0, policy_version 460963 (0.0047) [2024-06-15 17:12:10,182][1652491] Updated weights for policy 0, policy_version 461049 (0.0011) [2024-06-15 17:12:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 50790.5, 300 sec: 47541.4). Total num frames: 944242688. Throughput: 0: 11880.4. Samples: 236086272. 
Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:12:10,956][1648985] Avg episode reward: [(0, '123.900')] [2024-06-15 17:12:15,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 47097.1). Total num frames: 944275456. Throughput: 0: 12003.5. Samples: 236167168. Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:12:15,956][1648985] Avg episode reward: [(0, '127.230')] [2024-06-15 17:12:16,882][1652491] Updated weights for policy 0, policy_version 461124 (0.0014) [2024-06-15 17:12:19,130][1652491] Updated weights for policy 0, policy_version 461221 (0.0120) [2024-06-15 17:12:20,938][1652491] Updated weights for policy 0, policy_version 461280 (0.0102) [2024-06-15 17:12:20,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 51336.4, 300 sec: 47763.5). Total num frames: 944701440. Throughput: 0: 11810.1. Samples: 236223488. Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:12:20,956][1648985] Avg episode reward: [(0, '131.860')] [2024-06-15 17:12:25,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45334.6, 300 sec: 47097.0). Total num frames: 944766976. Throughput: 0: 12049.1. Samples: 236268544. Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:12:25,956][1648985] Avg episode reward: [(0, '144.820')] [2024-06-15 17:12:26,903][1652491] Updated weights for policy 0, policy_version 461328 (0.0012) [2024-06-15 17:12:27,926][1652491] Updated weights for policy 0, policy_version 461376 (0.0012) [2024-06-15 17:12:29,131][1652491] Updated weights for policy 0, policy_version 461426 (0.0013) [2024-06-15 17:12:30,747][1652491] Updated weights for policy 0, policy_version 461500 (0.0010) [2024-06-15 17:12:30,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 50244.2, 300 sec: 47874.6). Total num frames: 945160192. Throughput: 0: 11764.6. Samples: 236337664. 
Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:12:30,956][1648985] Avg episode reward: [(0, '135.050')] [2024-06-15 17:12:30,999][1651469] Signal inference workers to stop experience collection... (24050 times) [2024-06-15 17:12:31,037][1652491] InferenceWorker_p0-w0: stopping experience collection (24050 times) [2024-06-15 17:12:31,315][1651469] Signal inference workers to resume experience collection... (24050 times) [2024-06-15 17:12:31,316][1652491] InferenceWorker_p0-w0: resuming experience collection (24050 times) [2024-06-15 17:12:32,448][1652491] Updated weights for policy 0, policy_version 461566 (0.0011) [2024-06-15 17:12:35,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 46421.2, 300 sec: 47097.0). Total num frames: 945291264. Throughput: 0: 11832.8. Samples: 236406784. Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:12:35,956][1648985] Avg episode reward: [(0, '142.020')] [2024-06-15 17:12:39,596][1652491] Updated weights for policy 0, policy_version 461632 (0.0014) [2024-06-15 17:12:40,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 47513.7, 300 sec: 47319.2). Total num frames: 945520640. Throughput: 0: 11810.1. Samples: 236447744. Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:12:40,956][1648985] Avg episode reward: [(0, '127.530')] [2024-06-15 17:12:41,738][1652491] Updated weights for policy 0, policy_version 461713 (0.0112) [2024-06-15 17:12:43,224][1652491] Updated weights for policy 0, policy_version 461776 (0.0013) [2024-06-15 17:12:45,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 47513.4, 300 sec: 47097.0). Total num frames: 945815552. Throughput: 0: 11502.9. Samples: 236501504. 
Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:12:45,956][1648985] Avg episode reward: [(0, '130.010')] [2024-06-15 17:12:49,503][1652491] Updated weights for policy 0, policy_version 461826 (0.0019) [2024-06-15 17:12:50,771][1652491] Updated weights for policy 0, policy_version 461888 (0.0013) [2024-06-15 17:12:50,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 45875.0, 300 sec: 47097.1). Total num frames: 945946624. Throughput: 0: 11628.0. Samples: 236582912. Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:12:50,956][1648985] Avg episode reward: [(0, '148.980')] [2024-06-15 17:12:51,700][1652491] Updated weights for policy 0, policy_version 461937 (0.0013) [2024-06-15 17:12:52,666][1652491] Updated weights for policy 0, policy_version 461976 (0.0012) [2024-06-15 17:12:54,711][1652491] Updated weights for policy 0, policy_version 462049 (0.0011) [2024-06-15 17:12:55,956][1648985] Fps is (10 sec: 52425.2, 60 sec: 48605.3, 300 sec: 47430.2). Total num frames: 946339840. Throughput: 0: 11696.2. Samples: 236612608. Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:12:55,957][1648985] Avg episode reward: [(0, '151.710')] [2024-06-15 17:12:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000462080_946339840.pth... [2024-06-15 17:12:56,034][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000456560_935034880.pth [2024-06-15 17:13:00,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 43690.7, 300 sec: 46652.8). Total num frames: 946339840. Throughput: 0: 11559.8. Samples: 236687360. 
Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:13:00,955][1648985] Avg episode reward: [(0, '153.930')] [2024-06-15 17:13:01,086][1652491] Updated weights for policy 0, policy_version 462097 (0.0013) [2024-06-15 17:13:02,521][1652491] Updated weights for policy 0, policy_version 462162 (0.0014) [2024-06-15 17:13:03,780][1652491] Updated weights for policy 0, policy_version 462212 (0.0099) [2024-06-15 17:13:05,758][1652491] Updated weights for policy 0, policy_version 462288 (0.0103) [2024-06-15 17:13:05,955][1648985] Fps is (10 sec: 42601.1, 60 sec: 48605.8, 300 sec: 47319.2). Total num frames: 946765824. Throughput: 0: 11559.8. Samples: 236743680. Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:13:05,956][1648985] Avg episode reward: [(0, '151.290')] [2024-06-15 17:13:06,780][1652491] Updated weights for policy 0, policy_version 462336 (0.0049) [2024-06-15 17:13:10,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 43690.5, 300 sec: 46985.9). Total num frames: 946864128. Throughput: 0: 11309.5. Samples: 236777472. Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:13:10,956][1648985] Avg episode reward: [(0, '140.140')] [2024-06-15 17:13:12,954][1651469] Signal inference workers to stop experience collection... (24100 times) [2024-06-15 17:13:12,993][1652491] InferenceWorker_p0-w0: stopping experience collection (24100 times) [2024-06-15 17:13:13,204][1651469] Signal inference workers to resume experience collection... (24100 times) [2024-06-15 17:13:13,205][1652491] InferenceWorker_p0-w0: resuming experience collection (24100 times) [2024-06-15 17:13:13,506][1652491] Updated weights for policy 0, policy_version 462399 (0.0013) [2024-06-15 17:13:14,896][1652491] Updated weights for policy 0, policy_version 462448 (0.0012) [2024-06-15 17:13:15,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 947159040. Throughput: 0: 11457.4. Samples: 236853248. 
Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:13:15,956][1648985] Avg episode reward: [(0, '158.360')] [2024-06-15 17:13:17,573][1652491] Updated weights for policy 0, policy_version 462550 (0.0015) [2024-06-15 17:13:20,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 44783.0, 300 sec: 47097.1). Total num frames: 947388416. Throughput: 0: 11332.3. Samples: 236916736. Policy #0 lag: (min: 47.0, avg: 200.8, max: 319.0) [2024-06-15 17:13:20,956][1648985] Avg episode reward: [(0, '160.600')] [2024-06-15 17:13:23,846][1652491] Updated weights for policy 0, policy_version 462593 (0.0011) [2024-06-15 17:13:25,678][1652491] Updated weights for policy 0, policy_version 462672 (0.0010) [2024-06-15 17:13:25,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46421.4, 300 sec: 46986.0). Total num frames: 947552256. Throughput: 0: 11389.2. Samples: 236960256. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:13:25,955][1648985] Avg episode reward: [(0, '165.530')] [2024-06-15 17:13:27,517][1652491] Updated weights for policy 0, policy_version 462721 (0.0084) [2024-06-15 17:13:28,827][1652491] Updated weights for policy 0, policy_version 462784 (0.0013) [2024-06-15 17:13:29,984][1652491] Updated weights for policy 0, policy_version 462832 (0.0015) [2024-06-15 17:13:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 47097.0). Total num frames: 947912704. Throughput: 0: 11525.7. Samples: 237020160. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:13:30,956][1648985] Avg episode reward: [(0, '148.950')] [2024-06-15 17:13:35,952][1652491] Updated weights for policy 0, policy_version 462880 (0.0013) [2024-06-15 17:13:35,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 44783.2, 300 sec: 46763.9). Total num frames: 947978240. Throughput: 0: 11503.0. Samples: 237100544. 
Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:13:35,955][1648985] Avg episode reward: [(0, '135.310')] [2024-06-15 17:13:37,516][1652491] Updated weights for policy 0, policy_version 462945 (0.0011) [2024-06-15 17:13:40,013][1652491] Updated weights for policy 0, policy_version 463040 (0.0112) [2024-06-15 17:13:40,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 948305920. Throughput: 0: 11332.4. Samples: 237122560. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:13:40,956][1648985] Avg episode reward: [(0, '128.350')] [2024-06-15 17:13:42,078][1652491] Updated weights for policy 0, policy_version 463103 (0.0012) [2024-06-15 17:13:45,956][1648985] Fps is (10 sec: 45871.5, 60 sec: 43690.2, 300 sec: 46541.6). Total num frames: 948436992. Throughput: 0: 11161.4. Samples: 237189632. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:13:45,956][1648985] Avg episode reward: [(0, '132.080')] [2024-06-15 17:13:48,232][1652491] Updated weights for policy 0, policy_version 463156 (0.0045) [2024-06-15 17:13:50,480][1652491] Updated weights for policy 0, policy_version 463235 (0.0101) [2024-06-15 17:13:50,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.5, 300 sec: 46541.7). Total num frames: 948731904. Throughput: 0: 11355.0. Samples: 237254656. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:13:50,955][1648985] Avg episode reward: [(0, '133.900')] [2024-06-15 17:13:51,831][1652491] Updated weights for policy 0, policy_version 463291 (0.0034) [2024-06-15 17:13:52,350][1651469] Signal inference workers to stop experience collection... (24150 times) [2024-06-15 17:13:52,407][1652491] InferenceWorker_p0-w0: stopping experience collection (24150 times) [2024-06-15 17:13:52,569][1651469] Signal inference workers to resume experience collection... 
(24150 times) [2024-06-15 17:13:52,570][1652491] InferenceWorker_p0-w0: resuming experience collection (24150 times) [2024-06-15 17:13:53,152][1652491] Updated weights for policy 0, policy_version 463328 (0.0012) [2024-06-15 17:13:55,955][1648985] Fps is (10 sec: 52432.2, 60 sec: 43691.2, 300 sec: 46652.8). Total num frames: 948961280. Throughput: 0: 11355.1. Samples: 237288448. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:13:55,956][1648985] Avg episode reward: [(0, '153.190')] [2024-06-15 17:13:59,253][1652491] Updated weights for policy 0, policy_version 463408 (0.0026) [2024-06-15 17:14:00,890][1652491] Updated weights for policy 0, policy_version 463472 (0.0013) [2024-06-15 17:14:00,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 949190656. Throughput: 0: 11537.1. Samples: 237372416. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:14:00,955][1648985] Avg episode reward: [(0, '165.220')] [2024-06-15 17:14:02,922][1652491] Updated weights for policy 0, policy_version 463546 (0.0017) [2024-06-15 17:14:04,607][1652491] Updated weights for policy 0, policy_version 463587 (0.0013) [2024-06-15 17:14:05,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 45329.0, 300 sec: 46652.7). Total num frames: 949485568. Throughput: 0: 11298.1. Samples: 237425152. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:14:05,956][1648985] Avg episode reward: [(0, '170.440')] [2024-06-15 17:14:10,624][1652491] Updated weights for policy 0, policy_version 463648 (0.0013) [2024-06-15 17:14:10,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 44783.1, 300 sec: 46430.6). Total num frames: 949551104. Throughput: 0: 11298.1. Samples: 237468672. 
Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:14:10,955][1648985] Avg episode reward: [(0, '155.440')] [2024-06-15 17:14:11,993][1652491] Updated weights for policy 0, policy_version 463696 (0.0013) [2024-06-15 17:14:14,507][1652491] Updated weights for policy 0, policy_version 463792 (0.0084) [2024-06-15 17:14:15,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 949911552. Throughput: 0: 11229.9. Samples: 237525504. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:14:15,956][1648985] Avg episode reward: [(0, '148.300')] [2024-06-15 17:14:16,527][1652491] Updated weights for policy 0, policy_version 463863 (0.0027) [2024-06-15 17:14:20,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 950009856. Throughput: 0: 11070.6. Samples: 237598720. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:14:20,956][1648985] Avg episode reward: [(0, '141.030')] [2024-06-15 17:14:22,671][1652491] Updated weights for policy 0, policy_version 463904 (0.0013) [2024-06-15 17:14:23,442][1652491] Updated weights for policy 0, policy_version 463936 (0.0013) [2024-06-15 17:14:25,052][1652491] Updated weights for policy 0, policy_version 463986 (0.0012) [2024-06-15 17:14:25,955][1648985] Fps is (10 sec: 39320.4, 60 sec: 45874.9, 300 sec: 46319.5). Total num frames: 950304768. Throughput: 0: 11354.9. Samples: 237633536. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:14:25,956][1648985] Avg episode reward: [(0, '149.080')] [2024-06-15 17:14:26,632][1652491] Updated weights for policy 0, policy_version 464056 (0.0014) [2024-06-15 17:14:28,160][1652491] Updated weights for policy 0, policy_version 464096 (0.0020) [2024-06-15 17:14:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 46430.6). Total num frames: 950534144. Throughput: 0: 11252.8. Samples: 237696000. 
Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:14:30,955][1648985] Avg episode reward: [(0, '156.440')] [2024-06-15 17:14:33,475][1652491] Updated weights for policy 0, policy_version 464134 (0.0012) [2024-06-15 17:14:35,273][1652491] Updated weights for policy 0, policy_version 464208 (0.0011) [2024-06-15 17:14:35,735][1651469] Signal inference workers to stop experience collection... (24200 times) [2024-06-15 17:14:35,773][1652491] InferenceWorker_p0-w0: stopping experience collection (24200 times) [2024-06-15 17:14:35,955][1648985] Fps is (10 sec: 42600.1, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 950730752. Throughput: 0: 11343.6. Samples: 237765120. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:14:35,956][1648985] Avg episode reward: [(0, '159.140')] [2024-06-15 17:14:36,011][1651469] Signal inference workers to resume experience collection... (24200 times) [2024-06-15 17:14:36,013][1652491] InferenceWorker_p0-w0: resuming experience collection (24200 times) [2024-06-15 17:14:37,013][1652491] Updated weights for policy 0, policy_version 464273 (0.0013) [2024-06-15 17:14:39,580][1652491] Updated weights for policy 0, policy_version 464336 (0.0025) [2024-06-15 17:14:40,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 951058432. Throughput: 0: 11355.0. Samples: 237799424. Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:14:40,956][1648985] Avg episode reward: [(0, '146.660')] [2024-06-15 17:14:44,538][1652491] Updated weights for policy 0, policy_version 464386 (0.0013) [2024-06-15 17:14:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.7, 300 sec: 46319.5). Total num frames: 951189504. Throughput: 0: 11195.7. Samples: 237876224. 
Policy #0 lag: (min: 14.0, avg: 73.8, max: 270.0) [2024-06-15 17:14:45,956][1648985] Avg episode reward: [(0, '161.150')] [2024-06-15 17:14:46,049][1652491] Updated weights for policy 0, policy_version 464450 (0.0013) [2024-06-15 17:14:47,566][1652491] Updated weights for policy 0, policy_version 464512 (0.0012) [2024-06-15 17:14:49,093][1652491] Updated weights for policy 0, policy_version 464566 (0.0012) [2024-06-15 17:14:50,974][1648985] Fps is (10 sec: 39246.5, 60 sec: 45314.5, 300 sec: 46205.4). Total num frames: 951451648. Throughput: 0: 11418.4. Samples: 237939200. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:14:50,975][1648985] Avg episode reward: [(0, '147.090')] [2024-06-15 17:14:52,055][1652491] Updated weights for policy 0, policy_version 464640 (0.0016) [2024-06-15 17:14:55,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 951582720. Throughput: 0: 11218.4. Samples: 237973504. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:14:55,956][1648985] Avg episode reward: [(0, '145.930')] [2024-06-15 17:14:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000464640_951582720.pth... [2024-06-15 17:14:56,218][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000459264_940572672.pth [2024-06-15 17:14:57,486][1652491] Updated weights for policy 0, policy_version 464695 (0.0012) [2024-06-15 17:14:58,521][1652491] Updated weights for policy 0, policy_version 464722 (0.0011) [2024-06-15 17:15:00,916][1652491] Updated weights for policy 0, policy_version 464816 (0.0012) [2024-06-15 17:15:00,955][1648985] Fps is (10 sec: 49246.9, 60 sec: 45875.1, 300 sec: 46097.4). Total num frames: 951943168. Throughput: 0: 11377.8. Samples: 238037504. 
Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:15:00,956][1648985] Avg episode reward: [(0, '135.450')] [2024-06-15 17:15:03,141][1652491] Updated weights for policy 0, policy_version 464852 (0.0013) [2024-06-15 17:15:05,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 952107008. Throughput: 0: 11298.1. Samples: 238107136. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:15:05,956][1648985] Avg episode reward: [(0, '158.340')] [2024-06-15 17:15:08,643][1652491] Updated weights for policy 0, policy_version 464915 (0.0044) [2024-06-15 17:15:09,839][1652491] Updated weights for policy 0, policy_version 464964 (0.0046) [2024-06-15 17:15:10,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46421.4, 300 sec: 45657.2). Total num frames: 952336384. Throughput: 0: 11355.1. Samples: 238144512. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:15:10,955][1648985] Avg episode reward: [(0, '156.280')] [2024-06-15 17:15:11,690][1652491] Updated weights for policy 0, policy_version 465040 (0.0014) [2024-06-15 17:15:12,863][1652491] Updated weights for policy 0, policy_version 465087 (0.0014) [2024-06-15 17:15:14,951][1652491] Updated weights for policy 0, policy_version 465142 (0.0015) [2024-06-15 17:15:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 952631296. Throughput: 0: 11423.3. Samples: 238210048. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:15:15,956][1648985] Avg episode reward: [(0, '134.590')] [2024-06-15 17:15:19,200][1651469] Signal inference workers to stop experience collection... (24250 times) [2024-06-15 17:15:19,253][1652491] InferenceWorker_p0-w0: stopping experience collection (24250 times) [2024-06-15 17:15:19,446][1651469] Signal inference workers to resume experience collection... 
(24250 times) [2024-06-15 17:15:19,447][1652491] InferenceWorker_p0-w0: resuming experience collection (24250 times) [2024-06-15 17:15:19,619][1652491] Updated weights for policy 0, policy_version 465192 (0.0014) [2024-06-15 17:15:20,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 45875.1, 300 sec: 45653.0). Total num frames: 952762368. Throughput: 0: 11582.5. Samples: 238286336. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:15:20,956][1648985] Avg episode reward: [(0, '135.650')] [2024-06-15 17:15:21,361][1652491] Updated weights for policy 0, policy_version 465248 (0.0014) [2024-06-15 17:15:22,812][1652491] Updated weights for policy 0, policy_version 465296 (0.0099) [2024-06-15 17:15:25,451][1652491] Updated weights for policy 0, policy_version 465363 (0.0125) [2024-06-15 17:15:25,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46421.6, 300 sec: 45986.3). Total num frames: 953090048. Throughput: 0: 11343.7. Samples: 238309888. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:15:25,956][1648985] Avg episode reward: [(0, '134.820')] [2024-06-15 17:15:30,727][1652491] Updated weights for policy 0, policy_version 465411 (0.0014) [2024-06-15 17:15:30,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 43690.6, 300 sec: 45764.2). Total num frames: 953155584. Throughput: 0: 11389.1. Samples: 238388736. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:15:30,956][1648985] Avg episode reward: [(0, '137.360')] [2024-06-15 17:15:32,300][1652491] Updated weights for policy 0, policy_version 465474 (0.0013) [2024-06-15 17:15:33,364][1652491] Updated weights for policy 0, policy_version 465524 (0.0012) [2024-06-15 17:15:35,027][1652491] Updated weights for policy 0, policy_version 465588 (0.0014) [2024-06-15 17:15:35,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46967.4, 300 sec: 45764.1). Total num frames: 953548800. Throughput: 0: 11405.4. Samples: 238452224. 
Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:15:35,956][1648985] Avg episode reward: [(0, '128.020')] [2024-06-15 17:15:36,813][1652491] Updated weights for policy 0, policy_version 465618 (0.0011) [2024-06-15 17:15:37,648][1652491] Updated weights for policy 0, policy_version 465664 (0.0014) [2024-06-15 17:15:40,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 953679872. Throughput: 0: 11412.0. Samples: 238487040. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:15:40,956][1648985] Avg episode reward: [(0, '126.350')] [2024-06-15 17:15:43,274][1652491] Updated weights for policy 0, policy_version 465716 (0.0102) [2024-06-15 17:15:44,551][1652491] Updated weights for policy 0, policy_version 465776 (0.0013) [2024-06-15 17:15:45,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.4, 300 sec: 46097.3). Total num frames: 954007552. Throughput: 0: 11514.3. Samples: 238555648. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:15:45,955][1648985] Avg episode reward: [(0, '124.430')] [2024-06-15 17:15:46,339][1652491] Updated weights for policy 0, policy_version 465852 (0.0011) [2024-06-15 17:15:48,805][1652491] Updated weights for policy 0, policy_version 465912 (0.0013) [2024-06-15 17:15:50,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45889.9, 300 sec: 46209.8). Total num frames: 954204160. Throughput: 0: 11514.3. Samples: 238625280. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:15:50,956][1648985] Avg episode reward: [(0, '142.230')] [2024-06-15 17:15:55,488][1652491] Updated weights for policy 0, policy_version 466000 (0.0013) [2024-06-15 17:15:55,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46967.6, 300 sec: 45986.3). Total num frames: 954400768. Throughput: 0: 11605.3. Samples: 238666752. 
Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:15:55,956][1648985] Avg episode reward: [(0, '155.690')] [2024-06-15 17:15:57,105][1652491] Updated weights for policy 0, policy_version 466065 (0.0033) [2024-06-15 17:15:57,453][1651469] Signal inference workers to stop experience collection... (24300 times) [2024-06-15 17:15:57,515][1652491] InferenceWorker_p0-w0: stopping experience collection (24300 times) [2024-06-15 17:15:57,775][1651469] Signal inference workers to resume experience collection... (24300 times) [2024-06-15 17:15:57,776][1652491] InferenceWorker_p0-w0: resuming experience collection (24300 times) [2024-06-15 17:15:58,054][1652491] Updated weights for policy 0, policy_version 466108 (0.0012) [2024-06-15 17:16:00,286][1652491] Updated weights for policy 0, policy_version 466160 (0.0012) [2024-06-15 17:16:00,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 954728448. Throughput: 0: 11548.4. Samples: 238729728. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:16:00,956][1648985] Avg episode reward: [(0, '140.340')] [2024-06-15 17:16:05,098][1652491] Updated weights for policy 0, policy_version 466192 (0.0044) [2024-06-15 17:16:05,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 954826752. Throughput: 0: 11582.6. Samples: 238807552. Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:16:05,956][1648985] Avg episode reward: [(0, '152.020')] [2024-06-15 17:16:06,734][1652491] Updated weights for policy 0, policy_version 466258 (0.0013) [2024-06-15 17:16:08,691][1652491] Updated weights for policy 0, policy_version 466338 (0.0052) [2024-06-15 17:16:10,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.2, 300 sec: 45764.1). Total num frames: 955121664. Throughput: 0: 11662.2. Samples: 238834688. 
Policy #0 lag: (min: 95.0, avg: 201.6, max: 349.0) [2024-06-15 17:16:10,956][1648985] Avg episode reward: [(0, '148.470')] [2024-06-15 17:16:11,525][1652491] Updated weights for policy 0, policy_version 466403 (0.0013) [2024-06-15 17:16:15,559][1652491] Updated weights for policy 0, policy_version 466448 (0.0020) [2024-06-15 17:16:15,957][1648985] Fps is (10 sec: 45865.1, 60 sec: 44235.1, 300 sec: 46319.1). Total num frames: 955285504. Throughput: 0: 11718.5. Samples: 238916096. Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:16:15,958][1648985] Avg episode reward: [(0, '158.960')] [2024-06-15 17:16:17,221][1652491] Updated weights for policy 0, policy_version 466513 (0.0014) [2024-06-15 17:16:19,255][1652491] Updated weights for policy 0, policy_version 466597 (0.0012) [2024-06-15 17:16:20,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 46098.5). Total num frames: 955645952. Throughput: 0: 11537.1. Samples: 238971392. Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:16:20,956][1648985] Avg episode reward: [(0, '147.380')] [2024-06-15 17:16:23,074][1652491] Updated weights for policy 0, policy_version 466672 (0.0014) [2024-06-15 17:16:25,955][1648985] Fps is (10 sec: 49163.1, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 955777024. Throughput: 0: 11685.0. Samples: 239012864. Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:16:25,956][1648985] Avg episode reward: [(0, '151.660')] [2024-06-15 17:16:27,108][1652491] Updated weights for policy 0, policy_version 466711 (0.0014) [2024-06-15 17:16:28,628][1652491] Updated weights for policy 0, policy_version 466771 (0.0015) [2024-06-15 17:16:30,392][1652491] Updated weights for policy 0, policy_version 466836 (0.0036) [2024-06-15 17:16:30,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 49152.0, 300 sec: 46097.3). Total num frames: 956104704. Throughput: 0: 11628.1. Samples: 239078912. 
Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:16:30,956][1648985] Avg episode reward: [(0, '137.960')] [2024-06-15 17:16:31,326][1652491] Updated weights for policy 0, policy_version 466880 (0.0012) [2024-06-15 17:16:34,217][1652491] Updated weights for policy 0, policy_version 466944 (0.0013) [2024-06-15 17:16:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 956301312. Throughput: 0: 11764.6. Samples: 239154688. Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:16:35,956][1648985] Avg episode reward: [(0, '153.900')] [2024-06-15 17:16:39,329][1652491] Updated weights for policy 0, policy_version 467008 (0.0081) [2024-06-15 17:16:39,442][1651469] Signal inference workers to stop experience collection... (24350 times) [2024-06-15 17:16:39,506][1652491] InferenceWorker_p0-w0: stopping experience collection (24350 times) [2024-06-15 17:16:39,626][1651469] Signal inference workers to resume experience collection... (24350 times) [2024-06-15 17:16:39,633][1652491] InferenceWorker_p0-w0: resuming experience collection (24350 times) [2024-06-15 17:16:40,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 47513.6, 300 sec: 45986.2). Total num frames: 956530688. Throughput: 0: 11707.7. Samples: 239193600. Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:16:40,956][1648985] Avg episode reward: [(0, '155.040')] [2024-06-15 17:16:41,109][1652491] Updated weights for policy 0, policy_version 467072 (0.0126) [2024-06-15 17:16:42,477][1652491] Updated weights for policy 0, policy_version 467129 (0.0027) [2024-06-15 17:16:45,036][1652491] Updated weights for policy 0, policy_version 467184 (0.0254) [2024-06-15 17:16:45,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 46967.3, 300 sec: 46208.4). Total num frames: 956825600. Throughput: 0: 11787.3. Samples: 239260160. 
Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:16:45,956][1648985] Avg episode reward: [(0, '167.130')] [2024-06-15 17:16:49,886][1652491] Updated weights for policy 0, policy_version 467249 (0.0015) [2024-06-15 17:16:50,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 46967.6, 300 sec: 46097.4). Total num frames: 957022208. Throughput: 0: 11559.9. Samples: 239327744. Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:16:50,955][1648985] Avg episode reward: [(0, '161.300')] [2024-06-15 17:16:51,031][1652491] Updated weights for policy 0, policy_version 467300 (0.0011) [2024-06-15 17:16:52,525][1652491] Updated weights for policy 0, policy_version 467352 (0.0015) [2024-06-15 17:16:55,885][1652491] Updated weights for policy 0, policy_version 467424 (0.0032) [2024-06-15 17:16:55,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 48059.7, 300 sec: 45986.3). Total num frames: 957284352. Throughput: 0: 11776.0. Samples: 239364608. Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:16:55,956][1648985] Avg episode reward: [(0, '136.290')] [2024-06-15 17:16:56,219][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000467440_957317120.pth... [2024-06-15 17:16:56,270][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000462080_946339840.pth [2024-06-15 17:17:00,608][1652491] Updated weights for policy 0, policy_version 467493 (0.0013) [2024-06-15 17:17:00,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 957448192. Throughput: 0: 11674.2. Samples: 239441408. 
Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:17:00,956][1648985] Avg episode reward: [(0, '148.610')] [2024-06-15 17:17:02,114][1652491] Updated weights for policy 0, policy_version 467568 (0.0012) [2024-06-15 17:17:03,846][1652491] Updated weights for policy 0, policy_version 467602 (0.0011) [2024-06-15 17:17:04,892][1652491] Updated weights for policy 0, policy_version 467648 (0.0012) [2024-06-15 17:17:05,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 48605.9, 300 sec: 45764.1). Total num frames: 957743104. Throughput: 0: 11832.9. Samples: 239503872. Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:17:05,956][1648985] Avg episode reward: [(0, '147.570')] [2024-06-15 17:17:07,668][1652491] Updated weights for policy 0, policy_version 467703 (0.0011) [2024-06-15 17:17:10,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 46097.3). Total num frames: 957874176. Throughput: 0: 11776.0. Samples: 239542784. Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:17:10,956][1648985] Avg episode reward: [(0, '149.800')] [2024-06-15 17:17:12,361][1652491] Updated weights for policy 0, policy_version 467769 (0.0142) [2024-06-15 17:17:13,498][1652491] Updated weights for policy 0, policy_version 467809 (0.0013) [2024-06-15 17:17:15,461][1652491] Updated weights for policy 0, policy_version 467895 (0.0145) [2024-06-15 17:17:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 49700.0, 300 sec: 45986.3). Total num frames: 958267392. Throughput: 0: 11719.1. Samples: 239606272. Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:17:15,956][1648985] Avg episode reward: [(0, '160.980')] [2024-06-15 17:17:18,728][1652491] Updated weights for policy 0, policy_version 467952 (0.0011) [2024-06-15 17:17:20,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 958398464. Throughput: 0: 11764.6. Samples: 239684096. 
Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:17:20,956][1648985] Avg episode reward: [(0, '167.620')] [2024-06-15 17:17:22,530][1651469] Signal inference workers to stop experience collection... (24400 times) [2024-06-15 17:17:22,564][1652491] InferenceWorker_p0-w0: stopping experience collection (24400 times) [2024-06-15 17:17:22,736][1651469] Signal inference workers to resume experience collection... (24400 times) [2024-06-15 17:17:22,738][1652491] InferenceWorker_p0-w0: resuming experience collection (24400 times) [2024-06-15 17:17:23,365][1652491] Updated weights for policy 0, policy_version 468020 (0.0012) [2024-06-15 17:17:24,543][1652491] Updated weights for policy 0, policy_version 468064 (0.0013) [2024-06-15 17:17:25,822][1652491] Updated weights for policy 0, policy_version 468100 (0.0019) [2024-06-15 17:17:25,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 48059.8, 300 sec: 45764.1). Total num frames: 958660608. Throughput: 0: 11639.5. Samples: 239717376. Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:17:25,956][1648985] Avg episode reward: [(0, '171.100')] [2024-06-15 17:17:27,149][1652491] Updated weights for policy 0, policy_version 468160 (0.0014) [2024-06-15 17:17:30,579][1652491] Updated weights for policy 0, policy_version 468224 (0.0013) [2024-06-15 17:17:30,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46967.5, 300 sec: 46208.5). Total num frames: 958922752. Throughput: 0: 11741.9. Samples: 239788544. Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:17:30,956][1648985] Avg episode reward: [(0, '177.110')] [2024-06-15 17:17:34,458][1652491] Updated weights for policy 0, policy_version 468288 (0.0014) [2024-06-15 17:17:35,914][1652491] Updated weights for policy 0, policy_version 468344 (0.0015) [2024-06-15 17:17:35,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 959152128. Throughput: 0: 11696.3. Samples: 239854080. 
Policy #0 lag: (min: 14.0, avg: 91.4, max: 270.0) [2024-06-15 17:17:35,956][1648985] Avg episode reward: [(0, '163.300')] [2024-06-15 17:17:37,723][1652491] Updated weights for policy 0, policy_version 468384 (0.0031) [2024-06-15 17:17:40,218][1652491] Updated weights for policy 0, policy_version 468418 (0.0018) [2024-06-15 17:17:40,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 47513.5, 300 sec: 45986.3). Total num frames: 959381504. Throughput: 0: 11810.1. Samples: 239896064. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:17:40,956][1648985] Avg episode reward: [(0, '145.790')] [2024-06-15 17:17:44,787][1652491] Updated weights for policy 0, policy_version 468521 (0.0245) [2024-06-15 17:17:45,707][1652491] Updated weights for policy 0, policy_version 468560 (0.0014) [2024-06-15 17:17:45,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 46421.7, 300 sec: 46319.6). Total num frames: 959610880. Throughput: 0: 11639.5. Samples: 239965184. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:17:45,955][1648985] Avg episode reward: [(0, '141.570')] [2024-06-15 17:17:48,286][1652491] Updated weights for policy 0, policy_version 468624 (0.0014) [2024-06-15 17:17:50,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 46967.4, 300 sec: 45764.2). Total num frames: 959840256. Throughput: 0: 11798.8. Samples: 240034816. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:17:50,956][1648985] Avg episode reward: [(0, '142.360')] [2024-06-15 17:17:51,859][1652491] Updated weights for policy 0, policy_version 468675 (0.0013) [2024-06-15 17:17:52,804][1652491] Updated weights for policy 0, policy_version 468729 (0.0012) [2024-06-15 17:17:55,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45329.1, 300 sec: 46319.5). Total num frames: 960004096. Throughput: 0: 11787.4. Samples: 240073216. 
Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:17:55,956][1648985] Avg episode reward: [(0, '149.300')] [2024-06-15 17:17:56,757][1652491] Updated weights for policy 0, policy_version 468800 (0.0014) [2024-06-15 17:17:57,988][1652491] Updated weights for policy 0, policy_version 468851 (0.0013) [2024-06-15 17:17:59,152][1652491] Updated weights for policy 0, policy_version 468881 (0.0011) [2024-06-15 17:18:00,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48605.8, 300 sec: 46097.3). Total num frames: 960364544. Throughput: 0: 11844.3. Samples: 240139264. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:18:00,956][1648985] Avg episode reward: [(0, '139.500')] [2024-06-15 17:18:03,029][1652491] Updated weights for policy 0, policy_version 468944 (0.0012) [2024-06-15 17:18:05,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 960495616. Throughput: 0: 11923.9. Samples: 240220672. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:18:05,956][1648985] Avg episode reward: [(0, '156.320')] [2024-06-15 17:18:06,351][1651469] Signal inference workers to stop experience collection... (24450 times) [2024-06-15 17:18:06,433][1652491] InferenceWorker_p0-w0: stopping experience collection (24450 times) [2024-06-15 17:18:06,436][1652491] Updated weights for policy 0, policy_version 468996 (0.0013) [2024-06-15 17:18:06,649][1651469] Signal inference workers to resume experience collection... (24450 times) [2024-06-15 17:18:06,650][1652491] InferenceWorker_p0-w0: resuming experience collection (24450 times) [2024-06-15 17:18:07,909][1652491] Updated weights for policy 0, policy_version 469057 (0.0012) [2024-06-15 17:18:08,885][1652491] Updated weights for policy 0, policy_version 469115 (0.0013) [2024-06-15 17:18:09,828][1652491] Updated weights for policy 0, policy_version 469152 (0.0116) [2024-06-15 17:18:10,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 50244.3, 300 sec: 46541.7). 
Total num frames: 960888832. Throughput: 0: 11855.7. Samples: 240250880. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:18:10,955][1648985] Avg episode reward: [(0, '157.680')] [2024-06-15 17:18:13,939][1652491] Updated weights for policy 0, policy_version 469200 (0.0013) [2024-06-15 17:18:14,757][1652491] Updated weights for policy 0, policy_version 469243 (0.0017) [2024-06-15 17:18:15,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 961019904. Throughput: 0: 12128.6. Samples: 240334336. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:18:15,956][1648985] Avg episode reward: [(0, '158.420')] [2024-06-15 17:18:17,809][1652491] Updated weights for policy 0, policy_version 469298 (0.0015) [2024-06-15 17:18:20,477][1652491] Updated weights for policy 0, policy_version 469379 (0.0214) [2024-06-15 17:18:20,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 48605.9, 300 sec: 46652.7). Total num frames: 961314816. Throughput: 0: 12094.6. Samples: 240398336. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:18:20,956][1648985] Avg episode reward: [(0, '144.850')] [2024-06-15 17:18:24,742][1652491] Updated weights for policy 0, policy_version 469447 (0.0093) [2024-06-15 17:18:25,752][1652491] Updated weights for policy 0, policy_version 469502 (0.0013) [2024-06-15 17:18:25,955][1648985] Fps is (10 sec: 52430.9, 60 sec: 48059.8, 300 sec: 46208.5). Total num frames: 961544192. Throughput: 0: 12106.0. Samples: 240440832. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:18:25,955][1648985] Avg episode reward: [(0, '135.920')] [2024-06-15 17:18:29,333][1652491] Updated weights for policy 0, policy_version 469584 (0.0032) [2024-06-15 17:18:30,362][1652491] Updated weights for policy 0, policy_version 469627 (0.0013) [2024-06-15 17:18:30,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 961806336. Throughput: 0: 12026.3. 
Samples: 240506368. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:18:30,956][1648985] Avg episode reward: [(0, '159.300')] [2024-06-15 17:18:32,457][1652491] Updated weights for policy 0, policy_version 469694 (0.0014) [2024-06-15 17:18:35,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 961937408. Throughput: 0: 12140.1. Samples: 240581120. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:18:35,956][1648985] Avg episode reward: [(0, '162.380')] [2024-06-15 17:18:36,990][1652491] Updated weights for policy 0, policy_version 469754 (0.0013) [2024-06-15 17:18:39,285][1652491] Updated weights for policy 0, policy_version 469808 (0.0017) [2024-06-15 17:18:40,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.9, 300 sec: 46875.0). Total num frames: 962265088. Throughput: 0: 12060.5. Samples: 240615936. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:18:40,955][1648985] Avg episode reward: [(0, '162.520')] [2024-06-15 17:18:41,240][1652491] Updated weights for policy 0, policy_version 469884 (0.0024) [2024-06-15 17:18:44,278][1652491] Updated weights for policy 0, policy_version 469942 (0.0012) [2024-06-15 17:18:45,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 47513.5, 300 sec: 46541.7). Total num frames: 962461696. Throughput: 0: 11901.2. Samples: 240674816. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:18:45,956][1648985] Avg episode reward: [(0, '148.810')] [2024-06-15 17:18:47,575][1651469] Signal inference workers to stop experience collection... (24500 times) [2024-06-15 17:18:47,634][1652491] InferenceWorker_p0-w0: stopping experience collection (24500 times) [2024-06-15 17:18:47,823][1651469] Signal inference workers to resume experience collection... 
(24500 times) [2024-06-15 17:18:47,824][1652491] InferenceWorker_p0-w0: resuming experience collection (24500 times) [2024-06-15 17:18:47,997][1652491] Updated weights for policy 0, policy_version 469985 (0.0024) [2024-06-15 17:18:50,559][1652491] Updated weights for policy 0, policy_version 470017 (0.0012) [2024-06-15 17:18:50,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 962625536. Throughput: 0: 11810.1. Samples: 240752128. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:18:50,956][1648985] Avg episode reward: [(0, '131.300')] [2024-06-15 17:18:52,265][1652491] Updated weights for policy 0, policy_version 470082 (0.0012) [2024-06-15 17:18:53,454][1652491] Updated weights for policy 0, policy_version 470140 (0.0012) [2024-06-15 17:18:55,479][1652491] Updated weights for policy 0, policy_version 470200 (0.0014) [2024-06-15 17:18:55,956][1648985] Fps is (10 sec: 52425.7, 60 sec: 49697.7, 300 sec: 46763.7). Total num frames: 962985984. Throughput: 0: 11787.2. Samples: 240781312. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:18:55,956][1648985] Avg episode reward: [(0, '141.750')] [2024-06-15 17:18:55,985][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000470208_962985984.pth... [2024-06-15 17:18:56,052][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000464640_951582720.pth [2024-06-15 17:18:58,610][1652491] Updated weights for policy 0, policy_version 470224 (0.0027) [2024-06-15 17:18:59,385][1652491] Updated weights for policy 0, policy_version 470267 (0.0013) [2024-06-15 17:19:00,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 963117056. Throughput: 0: 11628.1. Samples: 240857600. 
Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:19:00,956][1648985] Avg episode reward: [(0, '149.030')] [2024-06-15 17:19:03,313][1652491] Updated weights for policy 0, policy_version 470339 (0.0041) [2024-06-15 17:19:04,985][1652491] Updated weights for policy 0, policy_version 470400 (0.0011) [2024-06-15 17:19:05,955][1648985] Fps is (10 sec: 39323.6, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 963379200. Throughput: 0: 11537.1. Samples: 240917504. Policy #0 lag: (min: 8.0, avg: 123.0, max: 264.0) [2024-06-15 17:19:05,956][1648985] Avg episode reward: [(0, '170.870')] [2024-06-15 17:19:07,271][1652491] Updated weights for policy 0, policy_version 470464 (0.0015) [2024-06-15 17:19:10,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 963575808. Throughput: 0: 11343.6. Samples: 240951296. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:19:10,956][1648985] Avg episode reward: [(0, '176.870')] [2024-06-15 17:19:11,342][1652491] Updated weights for policy 0, policy_version 470528 (0.0012) [2024-06-15 17:19:14,894][1652491] Updated weights for policy 0, policy_version 470592 (0.0014) [2024-06-15 17:19:15,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 46967.5, 300 sec: 46874.8). Total num frames: 963837952. Throughput: 0: 11457.4. Samples: 241021952. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:19:15,956][1648985] Avg episode reward: [(0, '159.010')] [2024-06-15 17:19:16,317][1652491] Updated weights for policy 0, policy_version 470646 (0.0013) [2024-06-15 17:19:17,563][1652491] Updated weights for policy 0, policy_version 470675 (0.0014) [2024-06-15 17:19:18,600][1652491] Updated weights for policy 0, policy_version 470720 (0.0012) [2024-06-15 17:19:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 964034560. Throughput: 0: 11457.4. Samples: 241096704. 
Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:19:20,956][1648985] Avg episode reward: [(0, '143.540')] [2024-06-15 17:19:22,830][1652491] Updated weights for policy 0, policy_version 470780 (0.0015) [2024-06-15 17:19:25,804][1652491] Updated weights for policy 0, policy_version 470832 (0.0012) [2024-06-15 17:19:25,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 45328.9, 300 sec: 46541.6). Total num frames: 964263936. Throughput: 0: 11446.0. Samples: 241131008. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:19:25,956][1648985] Avg episode reward: [(0, '139.100')] [2024-06-15 17:19:27,592][1652491] Updated weights for policy 0, policy_version 470908 (0.0012) [2024-06-15 17:19:28,865][1651469] Signal inference workers to stop experience collection... (24550 times) [2024-06-15 17:19:28,919][1652491] InferenceWorker_p0-w0: stopping experience collection (24550 times) [2024-06-15 17:19:29,107][1651469] Signal inference workers to resume experience collection... (24550 times) [2024-06-15 17:19:29,109][1652491] InferenceWorker_p0-w0: resuming experience collection (24550 times) [2024-06-15 17:19:29,111][1652491] Updated weights for policy 0, policy_version 470960 (0.0014) [2024-06-15 17:19:30,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.1, 300 sec: 46874.9). Total num frames: 964558848. Throughput: 0: 11639.4. Samples: 241198592. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:19:30,956][1648985] Avg episode reward: [(0, '140.270')] [2024-06-15 17:19:33,114][1652491] Updated weights for policy 0, policy_version 471011 (0.0011) [2024-06-15 17:19:35,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 964722688. Throughput: 0: 11628.1. Samples: 241275392. 
Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:19:35,956][1648985] Avg episode reward: [(0, '170.200')] [2024-06-15 17:19:36,258][1652491] Updated weights for policy 0, policy_version 471072 (0.0025) [2024-06-15 17:19:37,954][1652491] Updated weights for policy 0, policy_version 471136 (0.0013) [2024-06-15 17:19:39,980][1652491] Updated weights for policy 0, policy_version 471184 (0.0011) [2024-06-15 17:19:40,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 965050368. Throughput: 0: 11503.1. Samples: 241298944. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:19:40,956][1648985] Avg episode reward: [(0, '166.340')] [2024-06-15 17:19:41,079][1652491] Updated weights for policy 0, policy_version 471231 (0.0014) [2024-06-15 17:19:44,661][1652491] Updated weights for policy 0, policy_version 471291 (0.0015) [2024-06-15 17:19:45,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 45875.0, 300 sec: 46655.8). Total num frames: 965214208. Throughput: 0: 11548.4. Samples: 241377280. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:19:45,956][1648985] Avg episode reward: [(0, '147.560')] [2024-06-15 17:19:47,616][1652491] Updated weights for policy 0, policy_version 471344 (0.0032) [2024-06-15 17:19:48,345][1652491] Updated weights for policy 0, policy_version 471361 (0.0010) [2024-06-15 17:19:49,874][1652491] Updated weights for policy 0, policy_version 471420 (0.0013) [2024-06-15 17:19:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 47208.2). Total num frames: 965509120. Throughput: 0: 11696.4. Samples: 241443840. 
Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:19:50,956][1648985] Avg episode reward: [(0, '141.310')] [2024-06-15 17:19:51,333][1652491] Updated weights for policy 0, policy_version 471461 (0.0012) [2024-06-15 17:19:54,695][1652491] Updated weights for policy 0, policy_version 471491 (0.0010) [2024-06-15 17:19:55,969][1648985] Fps is (10 sec: 52355.3, 60 sec: 45864.7, 300 sec: 46761.6). Total num frames: 965738496. Throughput: 0: 11931.5. Samples: 241488384. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:19:55,970][1648985] Avg episode reward: [(0, '148.380')] [2024-06-15 17:19:57,077][1652491] Updated weights for policy 0, policy_version 471553 (0.0016) [2024-06-15 17:19:58,551][1652491] Updated weights for policy 0, policy_version 471615 (0.0171) [2024-06-15 17:19:59,765][1652491] Updated weights for policy 0, policy_version 471672 (0.0013) [2024-06-15 17:20:00,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 966000640. Throughput: 0: 11776.1. Samples: 241551872. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:20:00,956][1648985] Avg episode reward: [(0, '135.430')] [2024-06-15 17:20:02,283][1652491] Updated weights for policy 0, policy_version 471699 (0.0012) [2024-06-15 17:20:03,162][1652491] Updated weights for policy 0, policy_version 471743 (0.0022) [2024-06-15 17:20:05,955][1648985] Fps is (10 sec: 42659.1, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 966164480. Throughput: 0: 11969.4. Samples: 241635328. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:20:05,956][1648985] Avg episode reward: [(0, '120.350')] [2024-06-15 17:20:06,734][1652491] Updated weights for policy 0, policy_version 471798 (0.0014) [2024-06-15 17:20:08,592][1652491] Updated weights for policy 0, policy_version 471863 (0.0019) [2024-06-15 17:20:10,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 966459392. 
Throughput: 0: 11832.9. Samples: 241663488. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:20:10,956][1648985] Avg episode reward: [(0, '137.170')] [2024-06-15 17:20:11,058][1652491] Updated weights for policy 0, policy_version 471920 (0.0123) [2024-06-15 17:20:13,311][1651469] Signal inference workers to stop experience collection... (24600 times) [2024-06-15 17:20:13,341][1652491] InferenceWorker_p0-w0: stopping experience collection (24600 times) [2024-06-15 17:20:13,605][1651469] Signal inference workers to resume experience collection... (24600 times) [2024-06-15 17:20:13,606][1652491] InferenceWorker_p0-w0: resuming experience collection (24600 times) [2024-06-15 17:20:13,610][1652491] Updated weights for policy 0, policy_version 471984 (0.0015) [2024-06-15 17:20:15,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46967.7, 300 sec: 47097.1). Total num frames: 966656000. Throughput: 0: 11980.8. Samples: 241737728. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:20:15,956][1648985] Avg episode reward: [(0, '153.370')] [2024-06-15 17:20:17,039][1652491] Updated weights for policy 0, policy_version 472032 (0.0011) [2024-06-15 17:20:19,029][1652491] Updated weights for policy 0, policy_version 472097 (0.0013) [2024-06-15 17:20:20,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 966918144. Throughput: 0: 11798.8. Samples: 241806336. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:20:20,956][1648985] Avg episode reward: [(0, '184.740')] [2024-06-15 17:20:21,327][1652491] Updated weights for policy 0, policy_version 472144 (0.0013) [2024-06-15 17:20:24,577][1652491] Updated weights for policy 0, policy_version 472227 (0.0107) [2024-06-15 17:20:25,290][1652491] Updated weights for policy 0, policy_version 472256 (0.0013) [2024-06-15 17:20:25,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48606.0, 300 sec: 47541.4). Total num frames: 967180288. Throughput: 0: 12026.3. 
Samples: 241840128. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:20:25,955][1648985] Avg episode reward: [(0, '177.610')] [2024-06-15 17:20:29,089][1652491] Updated weights for policy 0, policy_version 472320 (0.0014) [2024-06-15 17:20:30,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.7, 300 sec: 46986.0). Total num frames: 967409664. Throughput: 0: 12003.6. Samples: 241917440. Policy #0 lag: (min: 5.0, avg: 152.1, max: 261.0) [2024-06-15 17:20:30,955][1648985] Avg episode reward: [(0, '160.080')] [2024-06-15 17:20:31,172][1652491] Updated weights for policy 0, policy_version 472382 (0.0082) [2024-06-15 17:20:34,519][1652491] Updated weights for policy 0, policy_version 472464 (0.0013) [2024-06-15 17:20:35,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 49698.0, 300 sec: 47541.3). Total num frames: 967704576. Throughput: 0: 11992.1. Samples: 241983488. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:20:35,956][1648985] Avg episode reward: [(0, '141.490')] [2024-06-15 17:20:38,808][1652491] Updated weights for policy 0, policy_version 472513 (0.0013) [2024-06-15 17:20:40,405][1652491] Updated weights for policy 0, policy_version 472592 (0.0014) [2024-06-15 17:20:40,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 967901184. Throughput: 0: 12143.9. Samples: 242034688. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:20:40,956][1648985] Avg episode reward: [(0, '126.400')] [2024-06-15 17:20:43,130][1652491] Updated weights for policy 0, policy_version 472672 (0.0022) [2024-06-15 17:20:45,746][1652491] Updated weights for policy 0, policy_version 472707 (0.0027) [2024-06-15 17:20:45,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 48606.1, 300 sec: 47208.2). Total num frames: 968130560. Throughput: 0: 12026.3. Samples: 242093056. 
Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:20:45,955][1648985] Avg episode reward: [(0, '119.430')] [2024-06-15 17:20:46,840][1652491] Updated weights for policy 0, policy_version 472766 (0.0017) [2024-06-15 17:20:50,585][1652491] Updated weights for policy 0, policy_version 472822 (0.0020) [2024-06-15 17:20:50,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 47513.5, 300 sec: 47319.2). Total num frames: 968359936. Throughput: 0: 11889.7. Samples: 242170368. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:20:50,956][1648985] Avg episode reward: [(0, '126.310')] [2024-06-15 17:20:52,204][1652491] Updated weights for policy 0, policy_version 472889 (0.0015) [2024-06-15 17:20:54,908][1652491] Updated weights for policy 0, policy_version 472945 (0.0013) [2024-06-15 17:20:55,955][1648985] Fps is (10 sec: 49150.2, 60 sec: 48070.9, 300 sec: 47097.0). Total num frames: 968622080. Throughput: 0: 12026.2. Samples: 242204672. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:20:55,956][1648985] Avg episode reward: [(0, '154.240')] [2024-06-15 17:20:55,967][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000472960_968622080.pth... [2024-06-15 17:20:56,051][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000467440_957317120.pth [2024-06-15 17:20:57,470][1652491] Updated weights for policy 0, policy_version 472979 (0.0013) [2024-06-15 17:20:57,857][1651469] Signal inference workers to stop experience collection... (24650 times) [2024-06-15 17:20:57,899][1652491] InferenceWorker_p0-w0: stopping experience collection (24650 times) [2024-06-15 17:20:58,188][1651469] Signal inference workers to resume experience collection... 
(24650 times) [2024-06-15 17:20:58,188][1652491] InferenceWorker_p0-w0: resuming experience collection (24650 times) [2024-06-15 17:21:00,508][1652491] Updated weights for policy 0, policy_version 473027 (0.0014) [2024-06-15 17:21:00,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46421.2, 300 sec: 47319.2). Total num frames: 968785920. Throughput: 0: 11912.5. Samples: 242273792. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:21:00,956][1648985] Avg episode reward: [(0, '148.330')] [2024-06-15 17:21:01,732][1652491] Updated weights for policy 0, policy_version 473088 (0.0014) [2024-06-15 17:21:03,697][1652491] Updated weights for policy 0, policy_version 473145 (0.0015) [2024-06-15 17:21:05,887][1652491] Updated weights for policy 0, policy_version 473186 (0.0017) [2024-06-15 17:21:05,955][1648985] Fps is (10 sec: 45877.2, 60 sec: 48606.0, 300 sec: 47319.2). Total num frames: 969080832. Throughput: 0: 11753.3. Samples: 242335232. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:21:05,956][1648985] Avg episode reward: [(0, '147.820')] [2024-06-15 17:21:09,636][1652491] Updated weights for policy 0, policy_version 473248 (0.0013) [2024-06-15 17:21:10,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46967.3, 300 sec: 47430.6). Total num frames: 969277440. Throughput: 0: 11901.1. Samples: 242375680. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:21:10,956][1648985] Avg episode reward: [(0, '133.160')] [2024-06-15 17:21:12,434][1652491] Updated weights for policy 0, policy_version 473300 (0.0011) [2024-06-15 17:21:13,502][1652491] Updated weights for policy 0, policy_version 473343 (0.0013) [2024-06-15 17:21:15,588][1652491] Updated weights for policy 0, policy_version 473399 (0.0014) [2024-06-15 17:21:15,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 969539584. Throughput: 0: 11685.0. Samples: 242443264. 
Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:21:15,956][1648985] Avg episode reward: [(0, '119.840')] [2024-06-15 17:21:16,601][1652491] Updated weights for policy 0, policy_version 473426 (0.0014) [2024-06-15 17:21:20,507][1652491] Updated weights for policy 0, policy_version 473475 (0.0012) [2024-06-15 17:21:20,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 46421.4, 300 sec: 47208.1). Total num frames: 969703424. Throughput: 0: 11798.8. Samples: 242514432. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:21:20,955][1648985] Avg episode reward: [(0, '135.130')] [2024-06-15 17:21:21,729][1652491] Updated weights for policy 0, policy_version 473536 (0.0014) [2024-06-15 17:21:23,519][1652491] Updated weights for policy 0, policy_version 473584 (0.0018) [2024-06-15 17:21:25,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 47513.5, 300 sec: 47208.1). Total num frames: 970031104. Throughput: 0: 11400.5. Samples: 242547712. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:21:25,956][1648985] Avg episode reward: [(0, '159.720')] [2024-06-15 17:21:25,990][1652491] Updated weights for policy 0, policy_version 473655 (0.0016) [2024-06-15 17:21:28,463][1652491] Updated weights for policy 0, policy_version 473714 (0.0013) [2024-06-15 17:21:30,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 970194944. Throughput: 0: 11787.4. Samples: 242623488. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:21:30,956][1648985] Avg episode reward: [(0, '181.270')] [2024-06-15 17:21:32,197][1652491] Updated weights for policy 0, policy_version 473783 (0.0023) [2024-06-15 17:21:33,466][1652491] Updated weights for policy 0, policy_version 473808 (0.0011) [2024-06-15 17:21:35,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.3, 300 sec: 47208.1). Total num frames: 970457088. Throughput: 0: 11650.9. Samples: 242694656. 
Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:21:35,956][1648985] Avg episode reward: [(0, '163.860')] [2024-06-15 17:21:36,292][1652491] Updated weights for policy 0, policy_version 473872 (0.0144) [2024-06-15 17:21:39,392][1652491] Updated weights for policy 0, policy_version 473938 (0.0014) [2024-06-15 17:21:40,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46967.6, 300 sec: 47097.1). Total num frames: 970719232. Throughput: 0: 11651.0. Samples: 242728960. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:21:40,956][1648985] Avg episode reward: [(0, '161.600')] [2024-06-15 17:21:42,197][1652491] Updated weights for policy 0, policy_version 473988 (0.0011) [2024-06-15 17:21:43,545][1652491] Updated weights for policy 0, policy_version 474047 (0.0013) [2024-06-15 17:21:44,907][1651469] Signal inference workers to stop experience collection... (24700 times) [2024-06-15 17:21:44,957][1652491] InferenceWorker_p0-w0: stopping experience collection (24700 times) [2024-06-15 17:21:45,160][1651469] Signal inference workers to resume experience collection... (24700 times) [2024-06-15 17:21:45,162][1652491] InferenceWorker_p0-w0: resuming experience collection (24700 times) [2024-06-15 17:21:45,803][1652491] Updated weights for policy 0, policy_version 474107 (0.0016) [2024-06-15 17:21:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 47319.2). Total num frames: 970981376. Throughput: 0: 11616.8. Samples: 242796544. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:21:45,956][1648985] Avg episode reward: [(0, '143.790')] [2024-06-15 17:21:48,701][1652491] Updated weights for policy 0, policy_version 474160 (0.0041) [2024-06-15 17:21:50,747][1652491] Updated weights for policy 0, policy_version 474226 (0.0013) [2024-06-15 17:21:50,955][1648985] Fps is (10 sec: 49150.1, 60 sec: 47513.5, 300 sec: 47208.1). Total num frames: 971210752. Throughput: 0: 11764.5. Samples: 242864640. 
Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:21:50,956][1648985] Avg episode reward: [(0, '167.210')] [2024-06-15 17:21:54,504][1652491] Updated weights for policy 0, policy_version 474272 (0.0015) [2024-06-15 17:21:55,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.4, 300 sec: 47208.1). Total num frames: 971374592. Throughput: 0: 11935.3. Samples: 242912768. Policy #0 lag: (min: 57.0, avg: 170.4, max: 313.0) [2024-06-15 17:21:55,956][1648985] Avg episode reward: [(0, '164.660')] [2024-06-15 17:21:56,616][1652491] Updated weights for policy 0, policy_version 474336 (0.0014) [2024-06-15 17:21:57,198][1652491] Updated weights for policy 0, policy_version 474365 (0.0038) [2024-06-15 17:21:58,809][1652491] Updated weights for policy 0, policy_version 474424 (0.0013) [2024-06-15 17:22:00,955][1648985] Fps is (10 sec: 49153.6, 60 sec: 48606.1, 300 sec: 47319.2). Total num frames: 971702272. Throughput: 0: 11889.8. Samples: 242978304. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:22:00,955][1648985] Avg episode reward: [(0, '149.770')] [2024-06-15 17:22:01,023][1652491] Updated weights for policy 0, policy_version 474480 (0.0013) [2024-06-15 17:22:05,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.1, 300 sec: 47319.2). Total num frames: 971833344. Throughput: 0: 11992.2. Samples: 243054080. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:22:05,956][1648985] Avg episode reward: [(0, '166.010')] [2024-06-15 17:22:06,201][1652491] Updated weights for policy 0, policy_version 474551 (0.0013) [2024-06-15 17:22:07,165][1652491] Updated weights for policy 0, policy_version 474592 (0.0013) [2024-06-15 17:22:08,615][1652491] Updated weights for policy 0, policy_version 474625 (0.0014) [2024-06-15 17:22:10,711][1652491] Updated weights for policy 0, policy_version 474690 (0.0014) [2024-06-15 17:22:10,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 972193792. 
Throughput: 0: 12026.3. Samples: 243088896. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:22:10,956][1648985] Avg episode reward: [(0, '152.230')] [2024-06-15 17:22:12,184][1652491] Updated weights for policy 0, policy_version 474752 (0.0022) [2024-06-15 17:22:15,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 972292096. Throughput: 0: 11935.3. Samples: 243160576. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:22:15,955][1648985] Avg episode reward: [(0, '133.030')] [2024-06-15 17:22:17,229][1652491] Updated weights for policy 0, policy_version 474803 (0.0012) [2024-06-15 17:22:18,300][1652491] Updated weights for policy 0, policy_version 474851 (0.0013) [2024-06-15 17:22:20,460][1652491] Updated weights for policy 0, policy_version 474897 (0.0012) [2024-06-15 17:22:20,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48605.8, 300 sec: 47319.2). Total num frames: 972619776. Throughput: 0: 11946.7. Samples: 243232256. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:22:20,956][1648985] Avg episode reward: [(0, '141.030')] [2024-06-15 17:22:22,012][1652491] Updated weights for policy 0, policy_version 474963 (0.0013) [2024-06-15 17:22:25,955][1648985] Fps is (10 sec: 52426.9, 60 sec: 46421.2, 300 sec: 47097.0). Total num frames: 972816384. Throughput: 0: 11923.8. Samples: 243265536. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:22:25,956][1648985] Avg episode reward: [(0, '171.320')] [2024-06-15 17:22:26,989][1652491] Updated weights for policy 0, policy_version 475010 (0.0012) [2024-06-15 17:22:28,131][1651469] Signal inference workers to stop experience collection... (24750 times) [2024-06-15 17:22:28,190][1652491] InferenceWorker_p0-w0: stopping experience collection (24750 times) [2024-06-15 17:22:28,385][1651469] Signal inference workers to resume experience collection... 
(24750 times) [2024-06-15 17:22:28,387][1652491] InferenceWorker_p0-w0: resuming experience collection (24750 times) [2024-06-15 17:22:28,390][1652491] Updated weights for policy 0, policy_version 475072 (0.0011) [2024-06-15 17:22:29,724][1652491] Updated weights for policy 0, policy_version 475133 (0.0203) [2024-06-15 17:22:30,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 973078528. Throughput: 0: 12003.6. Samples: 243336704. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:22:30,956][1648985] Avg episode reward: [(0, '165.570')] [2024-06-15 17:22:32,162][1652491] Updated weights for policy 0, policy_version 475184 (0.0012) [2024-06-15 17:22:33,821][1652491] Updated weights for policy 0, policy_version 475258 (0.0014) [2024-06-15 17:22:35,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 973340672. Throughput: 0: 12071.9. Samples: 243407872. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:22:35,956][1648985] Avg episode reward: [(0, '168.860')] [2024-06-15 17:22:38,718][1652491] Updated weights for policy 0, policy_version 475318 (0.0013) [2024-06-15 17:22:39,968][1652491] Updated weights for policy 0, policy_version 475385 (0.0012) [2024-06-15 17:22:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.6, 300 sec: 47430.3). Total num frames: 973602816. Throughput: 0: 11992.2. Samples: 243452416. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:22:40,956][1648985] Avg episode reward: [(0, '131.600')] [2024-06-15 17:22:43,076][1652491] Updated weights for policy 0, policy_version 475440 (0.0011) [2024-06-15 17:22:44,625][1652491] Updated weights for policy 0, policy_version 475504 (0.0024) [2024-06-15 17:22:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 973864960. Throughput: 0: 12026.3. Samples: 243519488. 
Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:22:45,956][1648985] Avg episode reward: [(0, '143.330')] [2024-06-15 17:22:49,313][1652491] Updated weights for policy 0, policy_version 475570 (0.0013) [2024-06-15 17:22:50,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 48059.9, 300 sec: 47763.5). Total num frames: 974094336. Throughput: 0: 11867.0. Samples: 243588096. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:22:50,956][1648985] Avg episode reward: [(0, '147.300')] [2024-06-15 17:22:50,979][1652491] Updated weights for policy 0, policy_version 475648 (0.0070) [2024-06-15 17:22:55,548][1652491] Updated weights for policy 0, policy_version 475713 (0.0013) [2024-06-15 17:22:55,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 48605.8, 300 sec: 47208.1). Total num frames: 974290944. Throughput: 0: 12037.7. Samples: 243630592. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:22:55,956][1648985] Avg episode reward: [(0, '143.870')] [2024-06-15 17:22:56,427][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000475760_974356480.pth... [2024-06-15 17:22:56,471][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000470208_962985984.pth [2024-06-15 17:22:56,693][1652491] Updated weights for policy 0, policy_version 475773 (0.0014) [2024-06-15 17:23:00,783][1652491] Updated weights for policy 0, policy_version 475862 (0.0013) [2024-06-15 17:23:00,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 974553088. Throughput: 0: 11969.4. Samples: 243699200. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:23:00,956][1648985] Avg episode reward: [(0, '154.700')] [2024-06-15 17:23:01,805][1652491] Updated weights for policy 0, policy_version 475904 (0.0011) [2024-06-15 17:23:05,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 47513.6, 300 sec: 46763.8). Total num frames: 974684160. Throughput: 0: 11969.4. 
Samples: 243770880. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:23:05,956][1648985] Avg episode reward: [(0, '151.990')] [2024-06-15 17:23:06,925][1651469] Signal inference workers to stop experience collection... (24800 times) [2024-06-15 17:23:06,985][1652491] InferenceWorker_p0-w0: stopping experience collection (24800 times) [2024-06-15 17:23:07,174][1651469] Signal inference workers to resume experience collection... (24800 times) [2024-06-15 17:23:07,175][1652491] InferenceWorker_p0-w0: resuming experience collection (24800 times) [2024-06-15 17:23:07,350][1652491] Updated weights for policy 0, policy_version 475986 (0.0014) [2024-06-15 17:23:08,236][1652491] Updated weights for policy 0, policy_version 476029 (0.0012) [2024-06-15 17:23:10,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 45329.1, 300 sec: 47097.1). Total num frames: 974913536. Throughput: 0: 11844.3. Samples: 243798528. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:23:10,956][1648985] Avg episode reward: [(0, '150.720')] [2024-06-15 17:23:12,343][1652491] Updated weights for policy 0, policy_version 476098 (0.0014) [2024-06-15 17:23:13,824][1652491] Updated weights for policy 0, policy_version 476160 (0.0092) [2024-06-15 17:23:15,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48059.6, 300 sec: 46986.0). Total num frames: 975175680. Throughput: 0: 11776.0. Samples: 243866624. Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:23:15,956][1648985] Avg episode reward: [(0, '149.140')] [2024-06-15 17:23:17,926][1652491] Updated weights for policy 0, policy_version 476218 (0.0014) [2024-06-15 17:23:19,554][1652491] Updated weights for policy 0, policy_version 476280 (0.0014) [2024-06-15 17:23:20,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.5, 300 sec: 47097.0). Total num frames: 975437824. Throughput: 0: 11685.0. Samples: 243933696. 
Policy #0 lag: (min: 31.0, avg: 152.8, max: 287.0) [2024-06-15 17:23:20,956][1648985] Avg episode reward: [(0, '149.920')] [2024-06-15 17:23:23,113][1652491] Updated weights for policy 0, policy_version 476321 (0.0013) [2024-06-15 17:23:24,947][1652491] Updated weights for policy 0, policy_version 476387 (0.0165) [2024-06-15 17:23:25,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48060.0, 300 sec: 47097.1). Total num frames: 975699968. Throughput: 0: 11446.1. Samples: 243967488. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:23:25,955][1648985] Avg episode reward: [(0, '157.770')] [2024-06-15 17:23:28,270][1652491] Updated weights for policy 0, policy_version 476423 (0.0012) [2024-06-15 17:23:30,352][1652491] Updated weights for policy 0, policy_version 476496 (0.0015) [2024-06-15 17:23:30,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 975896576. Throughput: 0: 11446.0. Samples: 244034560. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:23:30,956][1648985] Avg episode reward: [(0, '168.160')] [2024-06-15 17:23:34,134][1652491] Updated weights for policy 0, policy_version 476576 (0.0047) [2024-06-15 17:23:35,793][1652491] Updated weights for policy 0, policy_version 476640 (0.0013) [2024-06-15 17:23:35,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.4, 300 sec: 47097.0). Total num frames: 976158720. Throughput: 0: 11411.9. Samples: 244101632. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:23:35,956][1648985] Avg episode reward: [(0, '158.510')] [2024-06-15 17:23:36,467][1652491] Updated weights for policy 0, policy_version 476672 (0.0011) [2024-06-15 17:23:39,959][1652491] Updated weights for policy 0, policy_version 476729 (0.0184) [2024-06-15 17:23:40,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 45875.2, 300 sec: 47097.0). Total num frames: 976355328. Throughput: 0: 11537.1. Samples: 244149760. 
Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:23:40,956][1648985] Avg episode reward: [(0, '140.510')] [2024-06-15 17:23:42,425][1652491] Updated weights for policy 0, policy_version 476790 (0.0013) [2024-06-15 17:23:45,091][1652491] Updated weights for policy 0, policy_version 476848 (0.0012) [2024-06-15 17:23:45,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45875.1, 300 sec: 47430.3). Total num frames: 976617472. Throughput: 0: 11502.9. Samples: 244216832. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:23:45,956][1648985] Avg episode reward: [(0, '136.790')] [2024-06-15 17:23:46,318][1652491] Updated weights for policy 0, policy_version 476884 (0.0013) [2024-06-15 17:23:49,196][1652491] Updated weights for policy 0, policy_version 476930 (0.0013) [2024-06-15 17:23:49,656][1651469] Signal inference workers to stop experience collection... (24850 times) [2024-06-15 17:23:49,684][1652491] InferenceWorker_p0-w0: stopping experience collection (24850 times) [2024-06-15 17:23:49,899][1651469] Signal inference workers to resume experience collection... (24850 times) [2024-06-15 17:23:49,900][1652491] InferenceWorker_p0-w0: resuming experience collection (24850 times) [2024-06-15 17:23:50,513][1652491] Updated weights for policy 0, policy_version 476992 (0.0014) [2024-06-15 17:23:50,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 976879616. Throughput: 0: 11525.7. Samples: 244289536. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:23:50,956][1648985] Avg episode reward: [(0, '130.310')] [2024-06-15 17:23:53,650][1652491] Updated weights for policy 0, policy_version 477045 (0.0014) [2024-06-15 17:23:55,715][1652491] Updated weights for policy 0, policy_version 477075 (0.0011) [2024-06-15 17:23:55,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 46421.5, 300 sec: 47319.2). Total num frames: 977076224. Throughput: 0: 11662.2. Samples: 244323328. 
Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:23:55,956][1648985] Avg episode reward: [(0, '129.570')] [2024-06-15 17:23:57,468][1652491] Updated weights for policy 0, policy_version 477122 (0.0013) [2024-06-15 17:24:00,149][1652491] Updated weights for policy 0, policy_version 477186 (0.0013) [2024-06-15 17:24:00,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46421.3, 300 sec: 47319.2). Total num frames: 977338368. Throughput: 0: 11753.3. Samples: 244395520. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:24:00,956][1648985] Avg episode reward: [(0, '132.920')] [2024-06-15 17:24:01,365][1652491] Updated weights for policy 0, policy_version 477238 (0.0013) [2024-06-15 17:24:03,344][1652491] Updated weights for policy 0, policy_version 477264 (0.0012) [2024-06-15 17:24:04,305][1652491] Updated weights for policy 0, policy_version 477310 (0.0026) [2024-06-15 17:24:05,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 977567744. Throughput: 0: 12003.6. Samples: 244473856. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:24:05,956][1648985] Avg episode reward: [(0, '141.370')] [2024-06-15 17:24:06,402][1652491] Updated weights for policy 0, policy_version 477360 (0.0015) [2024-06-15 17:24:08,522][1652491] Updated weights for policy 0, policy_version 477408 (0.0015) [2024-06-15 17:24:10,783][1652491] Updated weights for policy 0, policy_version 477458 (0.0022) [2024-06-15 17:24:10,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48605.9, 300 sec: 47430.3). Total num frames: 977829888. Throughput: 0: 12083.2. Samples: 244511232. 
Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:24:10,956][1648985] Avg episode reward: [(0, '130.260')] [2024-06-15 17:24:11,774][1652491] Updated weights for policy 0, policy_version 477504 (0.0011) [2024-06-15 17:24:14,785][1652491] Updated weights for policy 0, policy_version 477567 (0.0155) [2024-06-15 17:24:15,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 978059264. Throughput: 0: 12106.0. Samples: 244579328. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:24:15,956][1648985] Avg episode reward: [(0, '122.210')] [2024-06-15 17:24:17,580][1652491] Updated weights for policy 0, policy_version 477630 (0.0064) [2024-06-15 17:24:20,514][1652491] Updated weights for policy 0, policy_version 477687 (0.0015) [2024-06-15 17:24:20,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48059.8, 300 sec: 47652.5). Total num frames: 978321408. Throughput: 0: 12094.6. Samples: 244645888. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:24:20,956][1648985] Avg episode reward: [(0, '124.540')] [2024-06-15 17:24:22,637][1652491] Updated weights for policy 0, policy_version 477728 (0.0014) [2024-06-15 17:24:25,237][1652491] Updated weights for policy 0, policy_version 477766 (0.0020) [2024-06-15 17:24:25,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.5, 300 sec: 47319.3). Total num frames: 978518016. Throughput: 0: 11844.3. Samples: 244682752. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:24:25,955][1648985] Avg episode reward: [(0, '160.870')] [2024-06-15 17:24:26,269][1652491] Updated weights for policy 0, policy_version 477812 (0.0014) [2024-06-15 17:24:28,023][1652491] Updated weights for policy 0, policy_version 477840 (0.0012) [2024-06-15 17:24:30,060][1652491] Updated weights for policy 0, policy_version 477890 (0.0018) [2024-06-15 17:24:30,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 48059.8, 300 sec: 47652.4). Total num frames: 978780160. 
Throughput: 0: 11992.2. Samples: 244756480. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:24:30,956][1648985] Avg episode reward: [(0, '171.140')] [2024-06-15 17:24:31,382][1652491] Updated weights for policy 0, policy_version 477943 (0.0013) [2024-06-15 17:24:33,787][1652491] Updated weights for policy 0, policy_version 477984 (0.0014) [2024-06-15 17:24:35,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.6, 300 sec: 47208.1). Total num frames: 978976768. Throughput: 0: 11969.4. Samples: 244828160. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:24:35,956][1648985] Avg episode reward: [(0, '165.880')] [2024-06-15 17:24:36,514][1651469] Signal inference workers to stop experience collection... (24900 times) [2024-06-15 17:24:36,538][1652491] InferenceWorker_p0-w0: stopping experience collection (24900 times) [2024-06-15 17:24:36,738][1651469] Signal inference workers to resume experience collection... (24900 times) [2024-06-15 17:24:36,739][1652491] InferenceWorker_p0-w0: resuming experience collection (24900 times) [2024-06-15 17:24:36,982][1652491] Updated weights for policy 0, policy_version 478036 (0.0095) [2024-06-15 17:24:38,075][1652491] Updated weights for policy 0, policy_version 478078 (0.0011) [2024-06-15 17:24:40,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 979238912. Throughput: 0: 11992.1. Samples: 244862976. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:24:40,957][1648985] Avg episode reward: [(0, '157.410')] [2024-06-15 17:24:41,526][1652491] Updated weights for policy 0, policy_version 478148 (0.0134) [2024-06-15 17:24:42,592][1652491] Updated weights for policy 0, policy_version 478208 (0.0014) [2024-06-15 17:24:45,202][1652491] Updated weights for policy 0, policy_version 478260 (0.0014) [2024-06-15 17:24:45,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.9, 300 sec: 47430.3). Total num frames: 979501056. Throughput: 0: 11878.4. 
Samples: 244930048. Policy #0 lag: (min: 15.0, avg: 110.0, max: 271.0) [2024-06-15 17:24:45,955][1648985] Avg episode reward: [(0, '134.460')] [2024-06-15 17:24:48,568][1652491] Updated weights for policy 0, policy_version 478304 (0.0011) [2024-06-15 17:24:50,772][1652491] Updated weights for policy 0, policy_version 478384 (0.0024) [2024-06-15 17:24:50,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 47513.7, 300 sec: 47432.6). Total num frames: 979730432. Throughput: 0: 11696.4. Samples: 245000192. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:24:50,955][1648985] Avg episode reward: [(0, '134.500')] [2024-06-15 17:24:51,119][1652491] Updated weights for policy 0, policy_version 478400 (0.0019) [2024-06-15 17:24:53,807][1652491] Updated weights for policy 0, policy_version 478459 (0.0014) [2024-06-15 17:24:55,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 48059.6, 300 sec: 47319.2). Total num frames: 979959808. Throughput: 0: 11673.6. Samples: 245036544. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:24:55,956][1648985] Avg episode reward: [(0, '147.800')] [2024-06-15 17:24:56,262][1652491] Updated weights for policy 0, policy_version 478519 (0.0030) [2024-06-15 17:24:56,380][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000478528_980025344.pth... [2024-06-15 17:24:56,457][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000472960_968622080.pth [2024-06-15 17:25:00,369][1652491] Updated weights for policy 0, policy_version 478588 (0.0020) [2024-06-15 17:25:00,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46967.4, 300 sec: 47430.3). Total num frames: 980156416. Throughput: 0: 11741.9. Samples: 245107712. 
Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:25:00,956][1648985] Avg episode reward: [(0, '142.080')] [2024-06-15 17:25:02,222][1652491] Updated weights for policy 0, policy_version 478629 (0.0014) [2024-06-15 17:25:04,222][1652491] Updated weights for policy 0, policy_version 478688 (0.0016) [2024-06-15 17:25:05,802][1652491] Updated weights for policy 0, policy_version 478739 (0.0013) [2024-06-15 17:25:05,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 48059.9, 300 sec: 47430.3). Total num frames: 980451328. Throughput: 0: 11764.6. Samples: 245175296. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:25:05,955][1648985] Avg episode reward: [(0, '143.250')] [2024-06-15 17:25:06,902][1652491] Updated weights for policy 0, policy_version 478784 (0.0014) [2024-06-15 17:25:10,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45329.0, 300 sec: 47097.1). Total num frames: 980549632. Throughput: 0: 11832.9. Samples: 245215232. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:25:10,956][1648985] Avg episode reward: [(0, '148.070')] [2024-06-15 17:25:11,546][1652491] Updated weights for policy 0, policy_version 478832 (0.0047) [2024-06-15 17:25:12,796][1652491] Updated weights for policy 0, policy_version 478896 (0.0013) [2024-06-15 17:25:14,952][1652491] Updated weights for policy 0, policy_version 478960 (0.0013) [2024-06-15 17:25:15,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48605.9, 300 sec: 47652.5). Total num frames: 980975616. Throughput: 0: 11935.3. Samples: 245293568. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:25:15,955][1648985] Avg episode reward: [(0, '151.250')] [2024-06-15 17:25:16,583][1652491] Updated weights for policy 0, policy_version 479024 (0.0013) [2024-06-15 17:25:20,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 981073920. Throughput: 0: 12094.6. Samples: 245372416. 
Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:25:20,955][1648985] Avg episode reward: [(0, '142.480')] [2024-06-15 17:25:21,169][1651469] Signal inference workers to stop experience collection... (24950 times) [2024-06-15 17:25:21,227][1652491] InferenceWorker_p0-w0: stopping experience collection (24950 times) [2024-06-15 17:25:21,371][1651469] Signal inference workers to resume experience collection... (24950 times) [2024-06-15 17:25:21,372][1652491] InferenceWorker_p0-w0: resuming experience collection (24950 times) [2024-06-15 17:25:21,374][1652491] Updated weights for policy 0, policy_version 479072 (0.0013) [2024-06-15 17:25:23,109][1652491] Updated weights for policy 0, policy_version 479136 (0.0026) [2024-06-15 17:25:25,493][1652491] Updated weights for policy 0, policy_version 479216 (0.0015) [2024-06-15 17:25:25,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 49152.0, 300 sec: 47652.5). Total num frames: 981467136. Throughput: 0: 12094.6. Samples: 245407232. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:25:25,955][1648985] Avg episode reward: [(0, '149.220')] [2024-06-15 17:25:26,580][1652491] Updated weights for policy 0, policy_version 479250 (0.0015) [2024-06-15 17:25:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 981598208. Throughput: 0: 12140.1. Samples: 245476352. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:25:30,956][1648985] Avg episode reward: [(0, '157.860')] [2024-06-15 17:25:31,472][1652491] Updated weights for policy 0, policy_version 479297 (0.0015) [2024-06-15 17:25:32,593][1652491] Updated weights for policy 0, policy_version 479345 (0.0013) [2024-06-15 17:25:34,303][1652491] Updated weights for policy 0, policy_version 479393 (0.0033) [2024-06-15 17:25:35,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 981860352. Throughput: 0: 12299.4. Samples: 245553664. 
Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:25:35,955][1648985] Avg episode reward: [(0, '149.920')] [2024-06-15 17:25:36,310][1652491] Updated weights for policy 0, policy_version 479445 (0.0012) [2024-06-15 17:25:37,254][1652491] Updated weights for policy 0, policy_version 479492 (0.0013) [2024-06-15 17:25:38,413][1652491] Updated weights for policy 0, policy_version 479548 (0.0016) [2024-06-15 17:25:40,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.9, 300 sec: 47430.3). Total num frames: 982122496. Throughput: 0: 12197.0. Samples: 245585408. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:25:40,955][1648985] Avg episode reward: [(0, '121.150')] [2024-06-15 17:25:43,184][1652491] Updated weights for policy 0, policy_version 479600 (0.0020) [2024-06-15 17:25:44,274][1652491] Updated weights for policy 0, policy_version 479632 (0.0013) [2024-06-15 17:25:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 982384640. Throughput: 0: 12219.8. Samples: 245657600. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:25:45,956][1648985] Avg episode reward: [(0, '129.470')] [2024-06-15 17:25:48,203][1652491] Updated weights for policy 0, policy_version 479715 (0.0014) [2024-06-15 17:25:49,682][1652491] Updated weights for policy 0, policy_version 479777 (0.0014) [2024-06-15 17:25:50,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 982646784. Throughput: 0: 12151.5. Samples: 245722112. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:25:50,955][1648985] Avg episode reward: [(0, '133.870')] [2024-06-15 17:25:54,531][1652491] Updated weights for policy 0, policy_version 479824 (0.0014) [2024-06-15 17:25:55,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.6, 300 sec: 47430.4). Total num frames: 982777856. Throughput: 0: 12151.5. Samples: 245762048. 
Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:25:55,955][1648985] Avg episode reward: [(0, '128.820')] [2024-06-15 17:25:56,119][1652491] Updated weights for policy 0, policy_version 479889 (0.0011) [2024-06-15 17:25:56,869][1652491] Updated weights for policy 0, policy_version 479936 (0.0013) [2024-06-15 17:26:00,100][1652491] Updated weights for policy 0, policy_version 480000 (0.0013) [2024-06-15 17:26:00,823][1651469] Signal inference workers to stop experience collection... (25000 times) [2024-06-15 17:26:00,883][1652491] InferenceWorker_p0-w0: stopping experience collection (25000 times) [2024-06-15 17:26:00,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 49151.9, 300 sec: 47541.3). Total num frames: 983105536. Throughput: 0: 11821.5. Samples: 245825536. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:26:00,956][1648985] Avg episode reward: [(0, '140.860')] [2024-06-15 17:26:01,055][1651469] Signal inference workers to resume experience collection... (25000 times) [2024-06-15 17:26:01,055][1652491] InferenceWorker_p0-w0: resuming experience collection (25000 times) [2024-06-15 17:26:01,229][1652491] Updated weights for policy 0, policy_version 480058 (0.0013) [2024-06-15 17:26:05,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.2, 300 sec: 47208.2). Total num frames: 983203840. Throughput: 0: 11741.9. Samples: 245900800. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:26:05,955][1648985] Avg episode reward: [(0, '145.110')] [2024-06-15 17:26:06,553][1652491] Updated weights for policy 0, policy_version 480112 (0.0044) [2024-06-15 17:26:07,910][1652491] Updated weights for policy 0, policy_version 480161 (0.0012) [2024-06-15 17:26:09,629][1652491] Updated weights for policy 0, policy_version 480197 (0.0013) [2024-06-15 17:26:10,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 50244.4, 300 sec: 47541.4). Total num frames: 983564288. Throughput: 0: 11707.7. Samples: 245934080. 
Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:26:10,955][1648985] Avg episode reward: [(0, '147.440')] [2024-06-15 17:26:10,999][1652491] Updated weights for policy 0, policy_version 480257 (0.0016) [2024-06-15 17:26:12,307][1652491] Updated weights for policy 0, policy_version 480316 (0.0148) [2024-06-15 17:26:15,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45329.1, 300 sec: 47430.3). Total num frames: 983695360. Throughput: 0: 11741.9. Samples: 246004736. Policy #0 lag: (min: 15.0, avg: 112.8, max: 271.0) [2024-06-15 17:26:15,955][1648985] Avg episode reward: [(0, '138.350')] [2024-06-15 17:26:17,639][1652491] Updated weights for policy 0, policy_version 480354 (0.0029) [2024-06-15 17:26:19,878][1652491] Updated weights for policy 0, policy_version 480448 (0.0134) [2024-06-15 17:26:20,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 48059.7, 300 sec: 47208.2). Total num frames: 983957504. Throughput: 0: 11377.8. Samples: 246065664. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:26:20,956][1648985] Avg episode reward: [(0, '138.990')] [2024-06-15 17:26:22,481][1652491] Updated weights for policy 0, policy_version 480504 (0.0012) [2024-06-15 17:26:24,373][1652491] Updated weights for policy 0, policy_version 480574 (0.0014) [2024-06-15 17:26:25,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 984219648. Throughput: 0: 11286.7. Samples: 246093312. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:26:25,955][1648985] Avg episode reward: [(0, '147.300')] [2024-06-15 17:26:30,581][1652491] Updated weights for policy 0, policy_version 480627 (0.0014) [2024-06-15 17:26:30,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 984350720. Throughput: 0: 11366.4. Samples: 246169088. 
Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:26:30,955][1648985] Avg episode reward: [(0, '140.920')] [2024-06-15 17:26:32,115][1652491] Updated weights for policy 0, policy_version 480692 (0.0014) [2024-06-15 17:26:33,886][1652491] Updated weights for policy 0, policy_version 480759 (0.0013) [2024-06-15 17:26:35,710][1652491] Updated weights for policy 0, policy_version 480804 (0.0013) [2024-06-15 17:26:35,957][1648985] Fps is (10 sec: 45863.7, 60 sec: 46965.5, 300 sec: 47318.8). Total num frames: 984678400. Throughput: 0: 11297.5. Samples: 246230528. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:26:35,958][1648985] Avg episode reward: [(0, '135.650')] [2024-06-15 17:26:40,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 43690.6, 300 sec: 46652.8). Total num frames: 984743936. Throughput: 0: 11252.6. Samples: 246268416. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:26:40,956][1648985] Avg episode reward: [(0, '137.610')] [2024-06-15 17:26:41,341][1652491] Updated weights for policy 0, policy_version 480848 (0.0015) [2024-06-15 17:26:43,376][1652491] Updated weights for policy 0, policy_version 480930 (0.0021) [2024-06-15 17:26:44,620][1652491] Updated weights for policy 0, policy_version 480962 (0.0011) [2024-06-15 17:26:45,372][1651469] Signal inference workers to stop experience collection... (25050 times) [2024-06-15 17:26:45,421][1652491] InferenceWorker_p0-w0: stopping experience collection (25050 times) [2024-06-15 17:26:45,768][1651469] Signal inference workers to resume experience collection... (25050 times) [2024-06-15 17:26:45,782][1652491] InferenceWorker_p0-w0: resuming experience collection (25050 times) [2024-06-15 17:26:45,955][1648985] Fps is (10 sec: 42609.2, 60 sec: 45329.0, 300 sec: 47097.1). Total num frames: 985104384. Throughput: 0: 11320.9. Samples: 246334976. 
Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:26:45,955][1648985] Avg episode reward: [(0, '133.540')] [2024-06-15 17:26:46,269][1652491] Updated weights for policy 0, policy_version 481024 (0.0041) [2024-06-15 17:26:48,153][1652491] Updated weights for policy 0, policy_version 481085 (0.0014) [2024-06-15 17:26:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 47097.1). Total num frames: 985268224. Throughput: 0: 11184.3. Samples: 246404096. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:26:50,955][1648985] Avg episode reward: [(0, '129.310')] [2024-06-15 17:26:53,939][1652491] Updated weights for policy 0, policy_version 481144 (0.0148) [2024-06-15 17:26:55,955][1648985] Fps is (10 sec: 42597.2, 60 sec: 45874.9, 300 sec: 46874.9). Total num frames: 985530368. Throughput: 0: 11161.5. Samples: 246436352. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:26:55,956][1648985] Avg episode reward: [(0, '138.320')] [2024-06-15 17:26:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000481216_985530368.pth... [2024-06-15 17:26:56,042][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000475760_974356480.pth [2024-06-15 17:26:56,502][1652491] Updated weights for policy 0, policy_version 481218 (0.0014) [2024-06-15 17:26:57,799][1652491] Updated weights for policy 0, policy_version 481267 (0.0021) [2024-06-15 17:26:59,064][1652491] Updated weights for policy 0, policy_version 481300 (0.0013) [2024-06-15 17:26:59,948][1652491] Updated weights for policy 0, policy_version 481344 (0.0013) [2024-06-15 17:27:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44783.0, 300 sec: 47319.2). Total num frames: 985792512. Throughput: 0: 10899.9. Samples: 246495232. 
Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:27:00,956][1648985] Avg episode reward: [(0, '138.900')] [2024-06-15 17:27:04,675][1652491] Updated weights for policy 0, policy_version 481394 (0.0016) [2024-06-15 17:27:05,962][1648985] Fps is (10 sec: 42569.1, 60 sec: 45869.7, 300 sec: 46651.6). Total num frames: 985956352. Throughput: 0: 11307.7. Samples: 246574592. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:27:05,963][1648985] Avg episode reward: [(0, '152.840')] [2024-06-15 17:27:06,141][1652491] Updated weights for policy 0, policy_version 481426 (0.0013) [2024-06-15 17:27:07,922][1652491] Updated weights for policy 0, policy_version 481473 (0.0014) [2024-06-15 17:27:09,354][1652491] Updated weights for policy 0, policy_version 481536 (0.0014) [2024-06-15 17:27:10,838][1652491] Updated weights for policy 0, policy_version 481592 (0.0053) [2024-06-15 17:27:10,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.1, 300 sec: 47541.4). Total num frames: 986316800. Throughput: 0: 11377.8. Samples: 246605312. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:27:10,956][1648985] Avg episode reward: [(0, '162.070')] [2024-06-15 17:27:15,717][1652491] Updated weights for policy 0, policy_version 481664 (0.0014) [2024-06-15 17:27:15,955][1648985] Fps is (10 sec: 49187.3, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 986447872. Throughput: 0: 11400.5. Samples: 246682112. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:27:15,955][1648985] Avg episode reward: [(0, '146.490')] [2024-06-15 17:27:18,448][1652491] Updated weights for policy 0, policy_version 481725 (0.0013) [2024-06-15 17:27:20,028][1652491] Updated weights for policy 0, policy_version 481780 (0.0012) [2024-06-15 17:27:20,931][1652491] Updated weights for policy 0, policy_version 481810 (0.0012) [2024-06-15 17:27:20,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.3, 300 sec: 47208.2). Total num frames: 986742784. 
Throughput: 0: 11492.2. Samples: 246747648. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:27:20,955][1648985] Avg episode reward: [(0, '138.640')] [2024-06-15 17:27:25,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 46652.8). Total num frames: 986841088. Throughput: 0: 11434.7. Samples: 246782976. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:27:25,955][1648985] Avg episode reward: [(0, '150.590')] [2024-06-15 17:27:26,748][1652491] Updated weights for policy 0, policy_version 481888 (0.0014) [2024-06-15 17:27:29,165][1652491] Updated weights for policy 0, policy_version 481957 (0.0014) [2024-06-15 17:27:29,837][1651469] Signal inference workers to stop experience collection... (25100 times) [2024-06-15 17:27:29,866][1652491] InferenceWorker_p0-w0: stopping experience collection (25100 times) [2024-06-15 17:27:30,069][1651469] Signal inference workers to resume experience collection... (25100 times) [2024-06-15 17:27:30,070][1652491] InferenceWorker_p0-w0: resuming experience collection (25100 times) [2024-06-15 17:27:30,759][1652491] Updated weights for policy 0, policy_version 482019 (0.0015) [2024-06-15 17:27:30,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46967.3, 300 sec: 46874.9). Total num frames: 987168768. Throughput: 0: 11502.9. Samples: 246852608. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:27:30,956][1648985] Avg episode reward: [(0, '168.570')] [2024-06-15 17:27:32,452][1652491] Updated weights for policy 0, policy_version 482067 (0.0014) [2024-06-15 17:27:35,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 44784.7, 300 sec: 46652.7). Total num frames: 987365376. Throughput: 0: 11525.7. Samples: 246922752. 
Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:27:35,956][1648985] Avg episode reward: [(0, '159.530')] [2024-06-15 17:27:37,824][1652491] Updated weights for policy 0, policy_version 482118 (0.0013) [2024-06-15 17:27:38,859][1652491] Updated weights for policy 0, policy_version 482170 (0.0013) [2024-06-15 17:27:40,966][1648985] Fps is (10 sec: 42550.6, 60 sec: 47504.6, 300 sec: 46539.9). Total num frames: 987594752. Throughput: 0: 11613.8. Samples: 246959104. Policy #0 lag: (min: 2.0, avg: 84.3, max: 258.0) [2024-06-15 17:27:40,967][1648985] Avg episode reward: [(0, '157.490')] [2024-06-15 17:27:41,029][1652491] Updated weights for policy 0, policy_version 482225 (0.0012) [2024-06-15 17:27:42,285][1652491] Updated weights for policy 0, policy_version 482274 (0.0011) [2024-06-15 17:27:43,992][1652491] Updated weights for policy 0, policy_version 482359 (0.0013) [2024-06-15 17:27:45,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 46421.3, 300 sec: 46763.9). Total num frames: 987889664. Throughput: 0: 11719.1. Samples: 247022592. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:27:45,956][1648985] Avg episode reward: [(0, '140.670')] [2024-06-15 17:27:49,611][1652491] Updated weights for policy 0, policy_version 482407 (0.0015) [2024-06-15 17:27:50,955][1648985] Fps is (10 sec: 42646.0, 60 sec: 45875.1, 300 sec: 46541.7). Total num frames: 988020736. Throughput: 0: 11777.8. Samples: 247104512. 
Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:27:50,956][1648985] Avg episode reward: [(0, '157.540')] [2024-06-15 17:27:51,406][1652491] Updated weights for policy 0, policy_version 482464 (0.0023) [2024-06-15 17:27:52,683][1652491] Updated weights for policy 0, policy_version 482515 (0.0012) [2024-06-15 17:27:53,414][1652491] Updated weights for policy 0, policy_version 482560 (0.0048) [2024-06-15 17:27:54,769][1652491] Updated weights for policy 0, policy_version 482615 (0.0013) [2024-06-15 17:27:55,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.9, 300 sec: 46986.0). Total num frames: 988413952. Throughput: 0: 11798.8. Samples: 247136256. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:27:55,955][1648985] Avg episode reward: [(0, '157.300')] [2024-06-15 17:28:00,009][1652491] Updated weights for policy 0, policy_version 482656 (0.0013) [2024-06-15 17:28:00,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 988545024. Throughput: 0: 11832.9. Samples: 247214592. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:28:00,956][1648985] Avg episode reward: [(0, '131.370')] [2024-06-15 17:28:01,703][1652491] Updated weights for policy 0, policy_version 482695 (0.0013) [2024-06-15 17:28:03,292][1652491] Updated weights for policy 0, policy_version 482755 (0.0012) [2024-06-15 17:28:04,464][1652491] Updated weights for policy 0, policy_version 482814 (0.0012) [2024-06-15 17:28:05,694][1652491] Updated weights for policy 0, policy_version 482851 (0.0011) [2024-06-15 17:28:05,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 49157.9, 300 sec: 47430.3). Total num frames: 988905472. Throughput: 0: 11810.1. Samples: 247279104. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:28:05,956][1648985] Avg episode reward: [(0, '140.250')] [2024-06-15 17:28:10,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 46652.8). Total num frames: 988938240. 
Throughput: 0: 11867.0. Samples: 247316992. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:28:10,956][1648985] Avg episode reward: [(0, '144.980')] [2024-06-15 17:28:12,094][1652491] Updated weights for policy 0, policy_version 482928 (0.0014) [2024-06-15 17:28:13,457][1651469] Signal inference workers to stop experience collection... (25150 times) [2024-06-15 17:28:13,530][1652491] InferenceWorker_p0-w0: stopping experience collection (25150 times) [2024-06-15 17:28:13,723][1651469] Signal inference workers to resume experience collection... (25150 times) [2024-06-15 17:28:13,724][1652491] InferenceWorker_p0-w0: resuming experience collection (25150 times) [2024-06-15 17:28:13,726][1652491] Updated weights for policy 0, policy_version 482976 (0.0017) [2024-06-15 17:28:15,113][1652491] Updated weights for policy 0, policy_version 483028 (0.0013) [2024-06-15 17:28:15,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 989298688. Throughput: 0: 11741.9. Samples: 247380992. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:28:15,955][1648985] Avg episode reward: [(0, '143.520')] [2024-06-15 17:28:16,687][1652491] Updated weights for policy 0, policy_version 483088 (0.0012) [2024-06-15 17:28:17,642][1652491] Updated weights for policy 0, policy_version 483136 (0.0012) [2024-06-15 17:28:20,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45329.0, 300 sec: 46652.7). Total num frames: 989462528. Throughput: 0: 11764.6. Samples: 247452160. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:28:20,956][1648985] Avg episode reward: [(0, '151.320')] [2024-06-15 17:28:24,054][1652491] Updated weights for policy 0, policy_version 483199 (0.0014) [2024-06-15 17:28:25,801][1652491] Updated weights for policy 0, policy_version 483250 (0.0011) [2024-06-15 17:28:25,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 47513.6, 300 sec: 46763.9). Total num frames: 989691904. Throughput: 0: 11676.6. 
Samples: 247484416. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:28:25,955][1648985] Avg episode reward: [(0, '136.880')] [2024-06-15 17:28:26,890][1652491] Updated weights for policy 0, policy_version 483300 (0.0019) [2024-06-15 17:28:28,458][1652491] Updated weights for policy 0, policy_version 483360 (0.0014) [2024-06-15 17:28:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.6, 300 sec: 46874.9). Total num frames: 989986816. Throughput: 0: 11719.1. Samples: 247549952. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:28:30,956][1648985] Avg episode reward: [(0, '132.620')] [2024-06-15 17:28:34,725][1652491] Updated weights for policy 0, policy_version 483424 (0.0013) [2024-06-15 17:28:35,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 990117888. Throughput: 0: 11639.5. Samples: 247628288. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:28:35,956][1648985] Avg episode reward: [(0, '132.900')] [2024-06-15 17:28:35,969][1652491] Updated weights for policy 0, policy_version 483461 (0.0015) [2024-06-15 17:28:37,713][1652491] Updated weights for policy 0, policy_version 483536 (0.0012) [2024-06-15 17:28:38,898][1652491] Updated weights for policy 0, policy_version 483580 (0.0012) [2024-06-15 17:28:40,492][1652491] Updated weights for policy 0, policy_version 483632 (0.0013) [2024-06-15 17:28:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48615.0, 300 sec: 47097.1). Total num frames: 990511104. Throughput: 0: 11525.7. Samples: 247654912. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:28:40,956][1648985] Avg episode reward: [(0, '139.740')] [2024-06-15 17:28:45,957][1648985] Fps is (10 sec: 42589.2, 60 sec: 44235.2, 300 sec: 46319.2). Total num frames: 990543872. Throughput: 0: 11445.5. Samples: 247729664. 
Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:28:45,958][1648985] Avg episode reward: [(0, '150.290')] [2024-06-15 17:28:46,116][1652491] Updated weights for policy 0, policy_version 483680 (0.0013) [2024-06-15 17:28:47,361][1652491] Updated weights for policy 0, policy_version 483717 (0.0042) [2024-06-15 17:28:48,528][1652491] Updated weights for policy 0, policy_version 483763 (0.0013) [2024-06-15 17:28:50,086][1652491] Updated weights for policy 0, policy_version 483835 (0.0014) [2024-06-15 17:28:50,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 990904320. Throughput: 0: 11468.8. Samples: 247795200. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:28:50,956][1648985] Avg episode reward: [(0, '161.720')] [2024-06-15 17:28:52,040][1652491] Updated weights for policy 0, policy_version 483898 (0.0113) [2024-06-15 17:28:55,955][1648985] Fps is (10 sec: 49162.6, 60 sec: 43690.6, 300 sec: 46430.6). Total num frames: 991035392. Throughput: 0: 11377.8. Samples: 247828992. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:28:55,955][1648985] Avg episode reward: [(0, '138.450')] [2024-06-15 17:28:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000483904_991035392.pth... [2024-06-15 17:28:56,074][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000478528_980025344.pth [2024-06-15 17:28:57,273][1651469] Signal inference workers to stop experience collection... (25200 times) [2024-06-15 17:28:57,352][1652491] InferenceWorker_p0-w0: stopping experience collection (25200 times) [2024-06-15 17:28:57,597][1651469] Signal inference workers to resume experience collection... 
(25200 times) [2024-06-15 17:28:57,598][1652491] InferenceWorker_p0-w0: resuming experience collection (25200 times) [2024-06-15 17:28:58,214][1652491] Updated weights for policy 0, policy_version 483964 (0.0095) [2024-06-15 17:28:59,929][1652491] Updated weights for policy 0, policy_version 484016 (0.0012) [2024-06-15 17:29:00,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 46421.4, 300 sec: 46652.8). Total num frames: 991330304. Throughput: 0: 11491.6. Samples: 247898112. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:29:00,955][1648985] Avg episode reward: [(0, '153.070')] [2024-06-15 17:29:01,134][1652491] Updated weights for policy 0, policy_version 484066 (0.0013) [2024-06-15 17:29:02,877][1652491] Updated weights for policy 0, policy_version 484130 (0.0015) [2024-06-15 17:29:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 44236.8, 300 sec: 46541.7). Total num frames: 991559680. Throughput: 0: 11457.4. Samples: 247967744. Policy #0 lag: (min: 127.0, avg: 201.9, max: 323.0) [2024-06-15 17:29:05,956][1648985] Avg episode reward: [(0, '140.800')] [2024-06-15 17:29:07,900][1652491] Updated weights for policy 0, policy_version 484176 (0.0012) [2024-06-15 17:29:09,076][1652491] Updated weights for policy 0, policy_version 484223 (0.0013) [2024-06-15 17:29:10,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 991723520. Throughput: 0: 11457.4. Samples: 248000000. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:29:10,956][1648985] Avg episode reward: [(0, '140.010')] [2024-06-15 17:29:11,998][1652491] Updated weights for policy 0, policy_version 484288 (0.0012) [2024-06-15 17:29:13,338][1652491] Updated weights for policy 0, policy_version 484352 (0.0011) [2024-06-15 17:29:14,742][1652491] Updated weights for policy 0, policy_version 484411 (0.0012) [2024-06-15 17:29:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46421.3, 300 sec: 46652.7). Total num frames: 992083968. 
Throughput: 0: 11457.4. Samples: 248065536. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:29:15,956][1648985] Avg episode reward: [(0, '122.010')] [2024-06-15 17:29:19,856][1652491] Updated weights for policy 0, policy_version 484464 (0.0014) [2024-06-15 17:29:20,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 992215040. Throughput: 0: 11468.8. Samples: 248144384. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:29:20,955][1648985] Avg episode reward: [(0, '120.130')] [2024-06-15 17:29:22,627][1652491] Updated weights for policy 0, policy_version 484512 (0.0014) [2024-06-15 17:29:24,388][1652491] Updated weights for policy 0, policy_version 484576 (0.0013) [2024-06-15 17:29:25,551][1652491] Updated weights for policy 0, policy_version 484628 (0.0012) [2024-06-15 17:29:25,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 47513.4, 300 sec: 46652.7). Total num frames: 992542720. Throughput: 0: 11571.2. Samples: 248175616. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:29:25,956][1648985] Avg episode reward: [(0, '126.080')] [2024-06-15 17:29:26,580][1652491] Updated weights for policy 0, policy_version 484670 (0.0011) [2024-06-15 17:29:30,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 992641024. Throughput: 0: 11435.2. Samples: 248244224. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:29:30,956][1648985] Avg episode reward: [(0, '136.410')] [2024-06-15 17:29:33,543][1652491] Updated weights for policy 0, policy_version 484737 (0.0015) [2024-06-15 17:29:34,762][1652491] Updated weights for policy 0, policy_version 484786 (0.0014) [2024-06-15 17:29:35,957][1648985] Fps is (10 sec: 39317.9, 60 sec: 46966.6, 300 sec: 46430.4). Total num frames: 992935936. Throughput: 0: 11411.7. Samples: 248308736. 
Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:29:35,959][1648985] Avg episode reward: [(0, '153.300')] [2024-06-15 17:29:36,110][1652491] Updated weights for policy 0, policy_version 484853 (0.0013) [2024-06-15 17:29:36,330][1651469] Signal inference workers to stop experience collection... (25250 times) [2024-06-15 17:29:36,342][1651469] Signal inference workers to resume experience collection... (25250 times) [2024-06-15 17:29:36,356][1652491] InferenceWorker_p0-w0: stopping experience collection (25250 times) [2024-06-15 17:29:36,388][1652491] InferenceWorker_p0-w0: resuming experience collection (25250 times) [2024-06-15 17:29:36,921][1652491] Updated weights for policy 0, policy_version 484884 (0.0024) [2024-06-15 17:29:40,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 993132544. Throughput: 0: 11468.8. Samples: 248345088. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:29:40,956][1648985] Avg episode reward: [(0, '142.170')] [2024-06-15 17:29:41,122][1652491] Updated weights for policy 0, policy_version 484929 (0.0012) [2024-06-15 17:29:42,110][1652491] Updated weights for policy 0, policy_version 484990 (0.0014) [2024-06-15 17:29:45,955][1648985] Fps is (10 sec: 42603.6, 60 sec: 46969.2, 300 sec: 46208.4). Total num frames: 993361920. Throughput: 0: 11696.4. Samples: 248424448. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:29:45,956][1648985] Avg episode reward: [(0, '154.910')] [2024-06-15 17:29:46,435][1652491] Updated weights for policy 0, policy_version 485072 (0.0073) [2024-06-15 17:29:48,143][1652491] Updated weights for policy 0, policy_version 485122 (0.0030) [2024-06-15 17:29:49,473][1652491] Updated weights for policy 0, policy_version 485184 (0.0014) [2024-06-15 17:29:50,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 993656832. Throughput: 0: 11480.2. Samples: 248484352. 
Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:29:50,956][1648985] Avg episode reward: [(0, '155.890')] [2024-06-15 17:29:55,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 993787904. Throughput: 0: 11548.5. Samples: 248519680. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:29:55,956][1648985] Avg episode reward: [(0, '155.650')] [2024-06-15 17:29:55,987][1652491] Updated weights for policy 0, policy_version 485250 (0.0026) [2024-06-15 17:29:57,840][1652491] Updated weights for policy 0, policy_version 485328 (0.0013) [2024-06-15 17:30:00,587][1652491] Updated weights for policy 0, policy_version 485415 (0.0014) [2024-06-15 17:30:00,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 994148352. Throughput: 0: 11605.3. Samples: 248587776. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:30:00,956][1648985] Avg episode reward: [(0, '148.950')] [2024-06-15 17:30:04,392][1652491] Updated weights for policy 0, policy_version 485472 (0.0100) [2024-06-15 17:30:05,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 994312192. Throughput: 0: 11332.2. Samples: 248654336. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:30:05,956][1648985] Avg episode reward: [(0, '137.070')] [2024-06-15 17:30:07,647][1652491] Updated weights for policy 0, policy_version 485524 (0.0014) [2024-06-15 17:30:09,223][1652491] Updated weights for policy 0, policy_version 485588 (0.0015) [2024-06-15 17:30:10,012][1652491] Updated weights for policy 0, policy_version 485632 (0.0012) [2024-06-15 17:30:10,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 47513.6, 300 sec: 46097.3). Total num frames: 994574336. Throughput: 0: 11468.8. Samples: 248691712. 
Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:30:10,956][1648985] Avg episode reward: [(0, '136.870')] [2024-06-15 17:30:12,058][1652491] Updated weights for policy 0, policy_version 485694 (0.0026) [2024-06-15 17:30:15,749][1652491] Updated weights for policy 0, policy_version 485728 (0.0037) [2024-06-15 17:30:15,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 994770944. Throughput: 0: 11537.1. Samples: 248763392. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:30:15,955][1648985] Avg episode reward: [(0, '126.600')] [2024-06-15 17:30:19,102][1652491] Updated weights for policy 0, policy_version 485776 (0.0013) [2024-06-15 17:30:20,194][1651469] Signal inference workers to stop experience collection... (25300 times) [2024-06-15 17:30:20,244][1652491] InferenceWorker_p0-w0: stopping experience collection (25300 times) [2024-06-15 17:30:20,449][1651469] Signal inference workers to resume experience collection... (25300 times) [2024-06-15 17:30:20,450][1652491] InferenceWorker_p0-w0: resuming experience collection (25300 times) [2024-06-15 17:30:20,922][1652491] Updated weights for policy 0, policy_version 485856 (0.0011) [2024-06-15 17:30:20,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 46967.5, 300 sec: 45986.3). Total num frames: 995033088. Throughput: 0: 11560.1. Samples: 248828928. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:30:20,955][1648985] Avg episode reward: [(0, '126.490')] [2024-06-15 17:30:22,980][1652491] Updated weights for policy 0, policy_version 485909 (0.0011) [2024-06-15 17:30:25,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 44783.1, 300 sec: 46208.4). Total num frames: 995229696. Throughput: 0: 11446.0. Samples: 248860160. 
Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:30:25,956][1648985] Avg episode reward: [(0, '119.980')] [2024-06-15 17:30:26,543][1652491] Updated weights for policy 0, policy_version 485956 (0.0013) [2024-06-15 17:30:27,919][1652491] Updated weights for policy 0, policy_version 486015 (0.0011) [2024-06-15 17:30:30,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 46421.4, 300 sec: 45986.3). Total num frames: 995426304. Throughput: 0: 11400.5. Samples: 248937472. Policy #0 lag: (min: 47.0, avg: 119.4, max: 303.0) [2024-06-15 17:30:30,956][1648985] Avg episode reward: [(0, '137.350')] [2024-06-15 17:30:31,744][1652491] Updated weights for policy 0, policy_version 486080 (0.0098) [2024-06-15 17:30:34,316][1652491] Updated weights for policy 0, policy_version 486162 (0.0015) [2024-06-15 17:30:35,998][1648985] Fps is (10 sec: 52203.6, 60 sec: 46934.6, 300 sec: 46201.7). Total num frames: 995753984. Throughput: 0: 11435.1. Samples: 248999424. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:30:35,999][1648985] Avg episode reward: [(0, '134.330')] [2024-06-15 17:30:38,855][1652491] Updated weights for policy 0, policy_version 486249 (0.0013) [2024-06-15 17:30:40,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 995885056. Throughput: 0: 11468.8. Samples: 249035776. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:30:40,955][1648985] Avg episode reward: [(0, '155.180')] [2024-06-15 17:30:42,067][1652491] Updated weights for policy 0, policy_version 486288 (0.0013) [2024-06-15 17:30:43,958][1652491] Updated weights for policy 0, policy_version 486368 (0.0012) [2024-06-15 17:30:45,495][1652491] Updated weights for policy 0, policy_version 486401 (0.0039) [2024-06-15 17:30:45,955][1648985] Fps is (10 sec: 42783.5, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 996179968. Throughput: 0: 11537.1. Samples: 249106944. 
Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:30:45,955][1648985] Avg episode reward: [(0, '143.590')] [2024-06-15 17:30:46,672][1652491] Updated weights for policy 0, policy_version 486459 (0.0020) [2024-06-15 17:30:49,785][1652491] Updated weights for policy 0, policy_version 486524 (0.0013) [2024-06-15 17:30:50,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 996409344. Throughput: 0: 11673.6. Samples: 249179648. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:30:50,956][1648985] Avg episode reward: [(0, '150.940')] [2024-06-15 17:30:53,853][1652491] Updated weights for policy 0, policy_version 486594 (0.0093) [2024-06-15 17:30:55,955][1648985] Fps is (10 sec: 49150.5, 60 sec: 48059.5, 300 sec: 45986.3). Total num frames: 996671488. Throughput: 0: 11593.9. Samples: 249213440. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:30:55,956][1648985] Avg episode reward: [(0, '146.220')] [2024-06-15 17:30:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000486656_996671488.pth... [2024-06-15 17:30:56,013][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000481216_985530368.pth [2024-06-15 17:30:56,018][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000486656_996671488.pth [2024-06-15 17:30:56,872][1652491] Updated weights for policy 0, policy_version 486688 (0.0013) [2024-06-15 17:31:00,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 996835328. Throughput: 0: 11480.2. Samples: 249280000. 
Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:31:00,955][1648985] Avg episode reward: [(0, '123.850')] [2024-06-15 17:31:00,961][1652491] Updated weights for policy 0, policy_version 486752 (0.0013) [2024-06-15 17:31:05,412][1652491] Updated weights for policy 0, policy_version 486802 (0.0014) [2024-06-15 17:31:05,789][1651469] Signal inference workers to stop experience collection... (25350 times) [2024-06-15 17:31:05,842][1652491] InferenceWorker_p0-w0: stopping experience collection (25350 times) [2024-06-15 17:31:05,955][1648985] Fps is (10 sec: 32768.9, 60 sec: 44783.0, 300 sec: 45542.0). Total num frames: 996999168. Throughput: 0: 11537.0. Samples: 249348096. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:31:05,955][1648985] Avg episode reward: [(0, '135.560')] [2024-06-15 17:31:06,025][1651469] Signal inference workers to resume experience collection... (25350 times) [2024-06-15 17:31:06,026][1652491] InferenceWorker_p0-w0: resuming experience collection (25350 times) [2024-06-15 17:31:06,949][1652491] Updated weights for policy 0, policy_version 486883 (0.0017) [2024-06-15 17:31:08,296][1652491] Updated weights for policy 0, policy_version 486932 (0.0014) [2024-06-15 17:31:10,961][1648985] Fps is (10 sec: 49121.3, 60 sec: 45870.5, 300 sec: 46207.5). Total num frames: 997326848. Throughput: 0: 11501.4. Samples: 249377792. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:31:10,962][1648985] Avg episode reward: [(0, '153.540')] [2024-06-15 17:31:12,656][1652491] Updated weights for policy 0, policy_version 487011 (0.0185) [2024-06-15 17:31:15,955][1648985] Fps is (10 sec: 45873.3, 60 sec: 44782.6, 300 sec: 45764.1). Total num frames: 997457920. Throughput: 0: 11389.1. Samples: 249449984. 
Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:31:15,956][1648985] Avg episode reward: [(0, '157.870')] [2024-06-15 17:31:16,472][1652491] Updated weights for policy 0, policy_version 487041 (0.0013) [2024-06-15 17:31:17,811][1652491] Updated weights for policy 0, policy_version 487104 (0.0012) [2024-06-15 17:31:19,384][1652491] Updated weights for policy 0, policy_version 487184 (0.0013) [2024-06-15 17:31:20,524][1652491] Updated weights for policy 0, policy_version 487232 (0.0013) [2024-06-15 17:31:20,955][1648985] Fps is (10 sec: 52461.3, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 997851136. Throughput: 0: 11491.2. Samples: 249516032. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:31:20,956][1648985] Avg episode reward: [(0, '159.860')] [2024-06-15 17:31:25,955][1648985] Fps is (10 sec: 52430.8, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 997982208. Throughput: 0: 11559.8. Samples: 249555968. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:31:25,956][1648985] Avg episode reward: [(0, '150.770')] [2024-06-15 17:31:27,741][1652491] Updated weights for policy 0, policy_version 487298 (0.0088) [2024-06-15 17:31:29,562][1652491] Updated weights for policy 0, policy_version 487376 (0.0014) [2024-06-15 17:31:30,624][1652491] Updated weights for policy 0, policy_version 487427 (0.0013) [2024-06-15 17:31:30,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 47513.5, 300 sec: 46097.7). Total num frames: 998277120. Throughput: 0: 11468.8. Samples: 249623040. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:31:30,956][1648985] Avg episode reward: [(0, '157.510')] [2024-06-15 17:31:31,555][1652491] Updated weights for policy 0, policy_version 487483 (0.0018) [2024-06-15 17:31:35,189][1652491] Updated weights for policy 0, policy_version 487542 (0.0014) [2024-06-15 17:31:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45908.2, 300 sec: 46652.7). Total num frames: 998506496. 
Throughput: 0: 11446.0. Samples: 249694720. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:31:35,956][1648985] Avg episode reward: [(0, '139.420')] [2024-06-15 17:31:40,327][1652491] Updated weights for policy 0, policy_version 487603 (0.0014) [2024-06-15 17:31:40,970][1648985] Fps is (10 sec: 39262.2, 60 sec: 46409.5, 300 sec: 45983.9). Total num frames: 998670336. Throughput: 0: 11578.7. Samples: 249734656. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:31:40,971][1648985] Avg episode reward: [(0, '142.810')] [2024-06-15 17:31:41,563][1652491] Updated weights for policy 0, policy_version 487664 (0.0011) [2024-06-15 17:31:43,100][1652491] Updated weights for policy 0, policy_version 487740 (0.0020) [2024-06-15 17:31:45,672][1651469] Signal inference workers to stop experience collection... (25400 times) [2024-06-15 17:31:45,704][1652491] InferenceWorker_p0-w0: stopping experience collection (25400 times) [2024-06-15 17:31:45,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 998932480. Throughput: 0: 11480.2. Samples: 249796608. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:31:45,955][1648985] Avg episode reward: [(0, '161.120')] [2024-06-15 17:31:46,024][1651469] Signal inference workers to resume experience collection... (25400 times) [2024-06-15 17:31:46,025][1652491] InferenceWorker_p0-w0: resuming experience collection (25400 times) [2024-06-15 17:31:46,561][1652491] Updated weights for policy 0, policy_version 487794 (0.0021) [2024-06-15 17:31:50,271][1652491] Updated weights for policy 0, policy_version 487811 (0.0011) [2024-06-15 17:31:50,963][1648985] Fps is (10 sec: 42629.4, 60 sec: 44777.0, 300 sec: 45985.1). Total num frames: 999096320. Throughput: 0: 11557.8. Samples: 249868288. 
Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:31:50,964][1648985] Avg episode reward: [(0, '154.010')] [2024-06-15 17:31:51,456][1652491] Updated weights for policy 0, policy_version 487866 (0.0013) [2024-06-15 17:31:52,647][1652491] Updated weights for policy 0, policy_version 487920 (0.0013) [2024-06-15 17:31:54,092][1652491] Updated weights for policy 0, policy_version 487984 (0.0015) [2024-06-15 17:31:55,974][1648985] Fps is (10 sec: 49057.5, 60 sec: 45860.7, 300 sec: 46205.4). Total num frames: 999424000. Throughput: 0: 11556.5. Samples: 249897984. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:31:55,975][1648985] Avg episode reward: [(0, '153.010')] [2024-06-15 17:31:56,748][1652491] Updated weights for policy 0, policy_version 488016 (0.0014) [2024-06-15 17:31:57,859][1652491] Updated weights for policy 0, policy_version 488061 (0.0019) [2024-06-15 17:32:00,958][1648985] Fps is (10 sec: 45897.2, 60 sec: 45326.6, 300 sec: 46098.0). Total num frames: 999555072. Throughput: 0: 11877.7. Samples: 249984512. Policy #0 lag: (min: 33.0, avg: 174.5, max: 289.0) [2024-06-15 17:32:00,959][1648985] Avg episode reward: [(0, '137.870')] [2024-06-15 17:32:02,093][1652491] Updated weights for policy 0, policy_version 488131 (0.0013) [2024-06-15 17:32:03,828][1652491] Updated weights for policy 0, policy_version 488193 (0.0016) [2024-06-15 17:32:04,641][1652491] Updated weights for policy 0, policy_version 488240 (0.0013) [2024-06-15 17:32:05,955][1648985] Fps is (10 sec: 52530.0, 60 sec: 49152.0, 300 sec: 46208.4). Total num frames: 999948288. Throughput: 0: 11832.9. Samples: 250048512. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:32:05,956][1648985] Avg episode reward: [(0, '166.570')] [2024-06-15 17:32:08,637][1652491] Updated weights for policy 0, policy_version 488304 (0.0012) [2024-06-15 17:32:10,955][1648985] Fps is (10 sec: 52445.9, 60 sec: 45880.0, 300 sec: 46208.4). Total num frames: 1000079360. 
Throughput: 0: 11707.7. Samples: 250082816. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:32:10,955][1648985] Avg episode reward: [(0, '179.070')] [2024-06-15 17:32:11,563][1652491] Updated weights for policy 0, policy_version 488336 (0.0013) [2024-06-15 17:32:13,194][1652491] Updated weights for policy 0, policy_version 488400 (0.0070) [2024-06-15 17:32:14,756][1652491] Updated weights for policy 0, policy_version 488464 (0.0014) [2024-06-15 17:32:15,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 50244.5, 300 sec: 46541.6). Total num frames: 1000472576. Throughput: 0: 11832.9. Samples: 250155520. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:32:15,956][1648985] Avg episode reward: [(0, '176.260')] [2024-06-15 17:32:20,366][1652491] Updated weights for policy 0, policy_version 488560 (0.0015) [2024-06-15 17:32:20,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1000603648. Throughput: 0: 11810.1. Samples: 250226176. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:32:20,956][1648985] Avg episode reward: [(0, '139.840')] [2024-06-15 17:32:22,682][1652491] Updated weights for policy 0, policy_version 488592 (0.0014) [2024-06-15 17:32:25,040][1652491] Updated weights for policy 0, policy_version 488688 (0.0100) [2024-06-15 17:32:25,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48605.8, 300 sec: 46541.7). Total num frames: 1000898560. Throughput: 0: 11734.4. Samples: 250262528. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:32:25,956][1648985] Avg episode reward: [(0, '131.780')] [2024-06-15 17:32:26,029][1651469] Signal inference workers to stop experience collection... (25450 times) [2024-06-15 17:32:26,071][1652491] InferenceWorker_p0-w0: stopping experience collection (25450 times) [2024-06-15 17:32:26,219][1651469] Signal inference workers to resume experience collection... 
(25450 times) [2024-06-15 17:32:26,219][1652491] InferenceWorker_p0-w0: resuming experience collection (25450 times) [2024-06-15 17:32:26,321][1652491] Updated weights for policy 0, policy_version 488737 (0.0012) [2024-06-15 17:32:30,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 45329.2, 300 sec: 46208.5). Total num frames: 1000996864. Throughput: 0: 11912.5. Samples: 250332672. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:32:30,955][1648985] Avg episode reward: [(0, '133.530')] [2024-06-15 17:32:31,690][1652491] Updated weights for policy 0, policy_version 488800 (0.0081) [2024-06-15 17:32:34,163][1652491] Updated weights for policy 0, policy_version 488850 (0.0014) [2024-06-15 17:32:35,838][1652491] Updated weights for policy 0, policy_version 488928 (0.0147) [2024-06-15 17:32:35,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46967.5, 300 sec: 46543.5). Total num frames: 1001324544. Throughput: 0: 11857.8. Samples: 250401792. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:32:35,955][1648985] Avg episode reward: [(0, '149.280')] [2024-06-15 17:32:37,075][1652491] Updated weights for policy 0, policy_version 488980 (0.0020) [2024-06-15 17:32:40,956][1648985] Fps is (10 sec: 52424.1, 60 sec: 47525.0, 300 sec: 46208.3). Total num frames: 1001521152. Throughput: 0: 11883.2. Samples: 250432512. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:32:40,956][1648985] Avg episode reward: [(0, '128.640')] [2024-06-15 17:32:42,904][1652491] Updated weights for policy 0, policy_version 489026 (0.0013) [2024-06-15 17:32:44,235][1652491] Updated weights for policy 0, policy_version 489081 (0.0013) [2024-06-15 17:32:45,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 46421.4, 300 sec: 46430.6). Total num frames: 1001717760. Throughput: 0: 11583.4. Samples: 250505728. 
Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:32:45,955][1648985] Avg episode reward: [(0, '134.360')] [2024-06-15 17:32:46,184][1652491] Updated weights for policy 0, policy_version 489144 (0.0013) [2024-06-15 17:32:47,588][1652491] Updated weights for policy 0, policy_version 489200 (0.0013) [2024-06-15 17:32:49,302][1652491] Updated weights for policy 0, policy_version 489271 (0.0012) [2024-06-15 17:32:50,955][1648985] Fps is (10 sec: 52433.6, 60 sec: 49158.6, 300 sec: 46208.4). Total num frames: 1002045440. Throughput: 0: 11616.7. Samples: 250571264. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:32:50,956][1648985] Avg episode reward: [(0, '141.080')] [2024-06-15 17:32:55,233][1652491] Updated weights for policy 0, policy_version 489335 (0.0015) [2024-06-15 17:32:55,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 45889.9, 300 sec: 46208.4). Total num frames: 1002176512. Throughput: 0: 11730.5. Samples: 250610688. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:32:55,955][1648985] Avg episode reward: [(0, '125.670')] [2024-06-15 17:32:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000489344_1002176512.pth... [2024-06-15 17:32:56,011][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000483904_991035392.pth [2024-06-15 17:32:57,922][1652491] Updated weights for policy 0, policy_version 489398 (0.0042) [2024-06-15 17:32:59,050][1652491] Updated weights for policy 0, policy_version 489440 (0.0037) [2024-06-15 17:33:00,476][1652491] Updated weights for policy 0, policy_version 489491 (0.0030) [2024-06-15 17:33:00,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 49154.7, 300 sec: 46097.4). Total num frames: 1002504192. Throughput: 0: 11411.9. Samples: 250669056. 
Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:33:00,955][1648985] Avg episode reward: [(0, '143.320')] [2024-06-15 17:33:01,332][1652491] Updated weights for policy 0, policy_version 489536 (0.0012) [2024-06-15 17:33:05,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1002569728. Throughput: 0: 11525.7. Samples: 250744832. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:33:05,955][1648985] Avg episode reward: [(0, '160.410')] [2024-06-15 17:33:09,641][1652491] Updated weights for policy 0, policy_version 489632 (0.0015) [2024-06-15 17:33:10,955][1648985] Fps is (10 sec: 32766.7, 60 sec: 45874.9, 300 sec: 45875.1). Total num frames: 1002831872. Throughput: 0: 11434.6. Samples: 250777088. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:33:10,956][1648985] Avg episode reward: [(0, '152.240')] [2024-06-15 17:33:11,106][1651469] Signal inference workers to stop experience collection... (25500 times) [2024-06-15 17:33:11,156][1652491] InferenceWorker_p0-w0: stopping experience collection (25500 times) [2024-06-15 17:33:11,316][1651469] Signal inference workers to resume experience collection... (25500 times) [2024-06-15 17:33:11,316][1652491] InferenceWorker_p0-w0: resuming experience collection (25500 times) [2024-06-15 17:33:11,458][1652491] Updated weights for policy 0, policy_version 489698 (0.0012) [2024-06-15 17:33:12,674][1652491] Updated weights for policy 0, policy_version 489760 (0.0026) [2024-06-15 17:33:15,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1003094016. Throughput: 0: 11275.3. Samples: 250840064. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:33:15,956][1648985] Avg episode reward: [(0, '145.180')] [2024-06-15 17:33:18,003][1652491] Updated weights for policy 0, policy_version 489824 (0.0012) [2024-06-15 17:33:20,958][1648985] Fps is (10 sec: 42585.8, 60 sec: 44234.5, 300 sec: 45985.7). 
Total num frames: 1003257856. Throughput: 0: 11285.9. Samples: 250909696. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:33:20,959][1648985] Avg episode reward: [(0, '138.940')] [2024-06-15 17:33:21,517][1652491] Updated weights for policy 0, policy_version 489904 (0.0012) [2024-06-15 17:33:22,993][1652491] Updated weights for policy 0, policy_version 489956 (0.0016) [2024-06-15 17:33:24,443][1652491] Updated weights for policy 0, policy_version 490017 (0.0013) [2024-06-15 17:33:25,017][1652491] Updated weights for policy 0, policy_version 490045 (0.0011) [2024-06-15 17:33:25,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45328.9, 300 sec: 46208.4). Total num frames: 1003618304. Throughput: 0: 11252.8. Samples: 250938880. Policy #0 lag: (min: 15.0, avg: 89.8, max: 271.0) [2024-06-15 17:33:25,956][1648985] Avg episode reward: [(0, '158.280')] [2024-06-15 17:33:29,693][1652491] Updated weights for policy 0, policy_version 490103 (0.0170) [2024-06-15 17:33:30,955][1648985] Fps is (10 sec: 49168.7, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 1003749376. Throughput: 0: 11264.0. Samples: 251012608. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:33:30,956][1648985] Avg episode reward: [(0, '152.590')] [2024-06-15 17:33:32,785][1652491] Updated weights for policy 0, policy_version 490160 (0.0013) [2024-06-15 17:33:34,120][1652491] Updated weights for policy 0, policy_version 490210 (0.0016) [2024-06-15 17:33:35,454][1652491] Updated weights for policy 0, policy_version 490272 (0.0096) [2024-06-15 17:33:35,955][1648985] Fps is (10 sec: 49153.8, 60 sec: 46421.4, 300 sec: 46097.4). Total num frames: 1004109824. Throughput: 0: 11241.2. Samples: 251077120. 
Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:33:35,955][1648985] Avg episode reward: [(0, '157.050')] [2024-06-15 17:33:39,537][1652491] Updated weights for policy 0, policy_version 490306 (0.0012) [2024-06-15 17:33:40,635][1652491] Updated weights for policy 0, policy_version 490363 (0.0135) [2024-06-15 17:33:40,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45875.9, 300 sec: 46542.0). Total num frames: 1004273664. Throughput: 0: 11332.3. Samples: 251120640. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:33:40,956][1648985] Avg episode reward: [(0, '142.020')] [2024-06-15 17:33:44,963][1652491] Updated weights for policy 0, policy_version 490432 (0.0014) [2024-06-15 17:33:45,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 1004470272. Throughput: 0: 11593.9. Samples: 251190784. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:33:45,956][1648985] Avg episode reward: [(0, '134.470')] [2024-06-15 17:33:46,629][1652491] Updated weights for policy 0, policy_version 490497 (0.0025) [2024-06-15 17:33:47,942][1652491] Updated weights for policy 0, policy_version 490555 (0.0012) [2024-06-15 17:33:50,959][1648985] Fps is (10 sec: 39306.2, 60 sec: 43687.8, 300 sec: 46207.8). Total num frames: 1004666880. Throughput: 0: 11456.4. Samples: 251260416. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:33:50,960][1648985] Avg episode reward: [(0, '141.470')] [2024-06-15 17:33:52,214][1652491] Updated weights for policy 0, policy_version 490608 (0.0011) [2024-06-15 17:33:54,170][1651469] Signal inference workers to stop experience collection... (25550 times) [2024-06-15 17:33:54,233][1652491] InferenceWorker_p0-w0: stopping experience collection (25550 times) [2024-06-15 17:33:54,448][1651469] Signal inference workers to resume experience collection... 
(25550 times) [2024-06-15 17:33:54,449][1652491] InferenceWorker_p0-w0: resuming experience collection (25550 times) [2024-06-15 17:33:55,496][1652491] Updated weights for policy 0, policy_version 490676 (0.0015) [2024-06-15 17:33:55,969][1648985] Fps is (10 sec: 45812.3, 60 sec: 45864.7, 300 sec: 46095.2). Total num frames: 1004929024. Throughput: 0: 11567.8. Samples: 251297792. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:33:55,969][1648985] Avg episode reward: [(0, '153.490')] [2024-06-15 17:33:57,219][1652491] Updated weights for policy 0, policy_version 490739 (0.0013) [2024-06-15 17:33:58,806][1652491] Updated weights for policy 0, policy_version 490810 (0.0085) [2024-06-15 17:34:00,955][1648985] Fps is (10 sec: 52449.4, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 1005191168. Throughput: 0: 11480.2. Samples: 251356672. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:34:00,956][1648985] Avg episode reward: [(0, '161.810')] [2024-06-15 17:34:03,719][1652491] Updated weights for policy 0, policy_version 490864 (0.0015) [2024-06-15 17:34:05,955][1648985] Fps is (10 sec: 39375.7, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 1005322240. Throughput: 0: 11720.0. Samples: 251437056. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:34:05,955][1648985] Avg episode reward: [(0, '151.550')] [2024-06-15 17:34:06,956][1652491] Updated weights for policy 0, policy_version 490916 (0.0013) [2024-06-15 17:34:09,069][1652491] Updated weights for policy 0, policy_version 490994 (0.0012) [2024-06-15 17:34:10,826][1652491] Updated weights for policy 0, policy_version 491064 (0.0013) [2024-06-15 17:34:10,970][1648985] Fps is (10 sec: 52349.1, 60 sec: 48047.8, 300 sec: 46206.1). Total num frames: 1005715456. Throughput: 0: 11499.1. Samples: 251456512. 
Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:34:10,971][1648985] Avg episode reward: [(0, '153.460')] [2024-06-15 17:34:15,551][1652491] Updated weights for policy 0, policy_version 491134 (0.0018) [2024-06-15 17:34:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.4, 300 sec: 46208.4). Total num frames: 1005846528. Throughput: 0: 11468.8. Samples: 251528704. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:34:15,955][1648985] Avg episode reward: [(0, '156.930')] [2024-06-15 17:34:19,244][1652491] Updated weights for policy 0, policy_version 491187 (0.0011) [2024-06-15 17:34:20,955][1648985] Fps is (10 sec: 36099.9, 60 sec: 46970.1, 300 sec: 45875.2). Total num frames: 1006075904. Throughput: 0: 11389.2. Samples: 251589632. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:34:20,955][1648985] Avg episode reward: [(0, '156.250')] [2024-06-15 17:34:21,255][1652491] Updated weights for policy 0, policy_version 491265 (0.0100) [2024-06-15 17:34:22,706][1652491] Updated weights for policy 0, policy_version 491324 (0.0013) [2024-06-15 17:34:25,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 44236.9, 300 sec: 46208.4). Total num frames: 1006272512. Throughput: 0: 11138.8. Samples: 251621888. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:34:25,956][1648985] Avg episode reward: [(0, '158.610')] [2024-06-15 17:34:26,691][1652491] Updated weights for policy 0, policy_version 491379 (0.0104) [2024-06-15 17:34:30,251][1652491] Updated weights for policy 0, policy_version 491446 (0.0014) [2024-06-15 17:34:30,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.3, 300 sec: 46097.5). Total num frames: 1006534656. Throughput: 0: 11332.3. Samples: 251700736. 
Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:34:30,955][1648985] Avg episode reward: [(0, '154.170')] [2024-06-15 17:34:31,361][1652491] Updated weights for policy 0, policy_version 491491 (0.0043) [2024-06-15 17:34:32,677][1651469] Signal inference workers to stop experience collection... (25600 times) [2024-06-15 17:34:32,729][1652491] InferenceWorker_p0-w0: stopping experience collection (25600 times) [2024-06-15 17:34:32,941][1651469] Signal inference workers to resume experience collection... (25600 times) [2024-06-15 17:34:32,942][1652491] InferenceWorker_p0-w0: resuming experience collection (25600 times) [2024-06-15 17:34:32,944][1652491] Updated weights for policy 0, policy_version 491552 (0.0082) [2024-06-15 17:34:33,550][1652491] Updated weights for policy 0, policy_version 491583 (0.0014) [2024-06-15 17:34:35,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 44236.6, 300 sec: 46208.4). Total num frames: 1006764032. Throughput: 0: 11378.7. Samples: 251772416. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:34:35,956][1648985] Avg episode reward: [(0, '160.100')] [2024-06-15 17:34:37,403][1652491] Updated weights for policy 0, policy_version 491637 (0.0013) [2024-06-15 17:34:40,728][1652491] Updated weights for policy 0, policy_version 491696 (0.0016) [2024-06-15 17:34:40,963][1648985] Fps is (10 sec: 45839.1, 60 sec: 45323.2, 300 sec: 46207.2). Total num frames: 1006993408. Throughput: 0: 11595.5. Samples: 251819520. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:34:40,964][1648985] Avg episode reward: [(0, '144.050')] [2024-06-15 17:34:42,017][1652491] Updated weights for policy 0, policy_version 491750 (0.0014) [2024-06-15 17:34:43,253][1652491] Updated weights for policy 0, policy_version 491808 (0.0014) [2024-06-15 17:34:45,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 1007288320. Throughput: 0: 11730.5. Samples: 251884544. 
Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:34:45,955][1648985] Avg episode reward: [(0, '130.230')] [2024-06-15 17:34:46,549][1652491] Updated weights for policy 0, policy_version 491841 (0.0014) [2024-06-15 17:34:47,649][1652491] Updated weights for policy 0, policy_version 491896 (0.0012) [2024-06-15 17:34:50,955][1648985] Fps is (10 sec: 45911.0, 60 sec: 46424.3, 300 sec: 46319.5). Total num frames: 1007452160. Throughput: 0: 11730.5. Samples: 251964928. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:34:50,956][1648985] Avg episode reward: [(0, '135.870')] [2024-06-15 17:34:51,631][1652491] Updated weights for policy 0, policy_version 491952 (0.0013) [2024-06-15 17:34:53,079][1652491] Updated weights for policy 0, policy_version 492016 (0.0011) [2024-06-15 17:34:54,706][1652491] Updated weights for policy 0, policy_version 492088 (0.0012) [2024-06-15 17:34:55,961][1648985] Fps is (10 sec: 52396.6, 60 sec: 48065.8, 300 sec: 46318.6). Total num frames: 1007812608. Throughput: 0: 11846.7. Samples: 251989504. Policy #0 lag: (min: 15.0, avg: 105.4, max: 271.0) [2024-06-15 17:34:55,962][1648985] Avg episode reward: [(0, '137.050')] [2024-06-15 17:34:55,970][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000492096_1007812608.pth... [2024-06-15 17:34:56,048][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000486656_996671488.pth [2024-06-15 17:34:58,099][1652491] Updated weights for policy 0, policy_version 492130 (0.0014) [2024-06-15 17:35:00,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 1007943680. Throughput: 0: 12026.3. Samples: 252069888. 
Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:35:00,955][1648985] Avg episode reward: [(0, '150.430')] [2024-06-15 17:35:02,251][1652491] Updated weights for policy 0, policy_version 492193 (0.0014) [2024-06-15 17:35:04,019][1652491] Updated weights for policy 0, policy_version 492258 (0.0013) [2024-06-15 17:35:05,636][1652491] Updated weights for policy 0, policy_version 492320 (0.0014) [2024-06-15 17:35:05,955][1648985] Fps is (10 sec: 45902.8, 60 sec: 49151.9, 300 sec: 46430.6). Total num frames: 1008271360. Throughput: 0: 12003.5. Samples: 252129792. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:35:05,956][1648985] Avg episode reward: [(0, '148.270')] [2024-06-15 17:35:08,363][1652491] Updated weights for policy 0, policy_version 492370 (0.0024) [2024-06-15 17:35:09,095][1652491] Updated weights for policy 0, policy_version 492416 (0.0101) [2024-06-15 17:35:10,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 45886.7, 300 sec: 46430.6). Total num frames: 1008467968. Throughput: 0: 12037.7. Samples: 252163584. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:35:10,956][1648985] Avg episode reward: [(0, '150.330')] [2024-06-15 17:35:15,443][1651469] Signal inference workers to stop experience collection... (25650 times) [2024-06-15 17:35:15,511][1652491] InferenceWorker_p0-w0: stopping experience collection (25650 times) [2024-06-15 17:35:15,528][1652491] Updated weights for policy 0, policy_version 492503 (0.0195) [2024-06-15 17:35:15,625][1651469] Signal inference workers to resume experience collection... (25650 times) [2024-06-15 17:35:15,626][1652491] InferenceWorker_p0-w0: resuming experience collection (25650 times) [2024-06-15 17:35:15,961][1648985] Fps is (10 sec: 39298.4, 60 sec: 46962.8, 300 sec: 46207.5). Total num frames: 1008664576. Throughput: 0: 12058.8. Samples: 252243456. 
Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:35:15,962][1648985] Avg episode reward: [(0, '164.800')] [2024-06-15 17:35:16,880][1652491] Updated weights for policy 0, policy_version 492560 (0.0013) [2024-06-15 17:35:19,547][1652491] Updated weights for policy 0, policy_version 492611 (0.0023) [2024-06-15 17:35:20,433][1652491] Updated weights for policy 0, policy_version 492661 (0.0011) [2024-06-15 17:35:20,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 48605.8, 300 sec: 46652.8). Total num frames: 1008992256. Throughput: 0: 11855.7. Samples: 252305920. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:35:20,955][1648985] Avg episode reward: [(0, '166.270')] [2024-06-15 17:35:25,902][1652491] Updated weights for policy 0, policy_version 492721 (0.0012) [2024-06-15 17:35:25,958][1648985] Fps is (10 sec: 42610.2, 60 sec: 46965.1, 300 sec: 46319.0). Total num frames: 1009090560. Throughput: 0: 11856.9. Samples: 252353024. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:35:25,959][1648985] Avg episode reward: [(0, '159.400')] [2024-06-15 17:35:26,929][1652491] Updated weights for policy 0, policy_version 492770 (0.0012) [2024-06-15 17:35:28,829][1652491] Updated weights for policy 0, policy_version 492855 (0.0014) [2024-06-15 17:35:30,955][1648985] Fps is (10 sec: 39320.2, 60 sec: 47513.3, 300 sec: 46215.2). Total num frames: 1009385472. Throughput: 0: 11662.1. Samples: 252409344. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:35:30,956][1648985] Avg episode reward: [(0, '144.390')] [2024-06-15 17:35:31,738][1652491] Updated weights for policy 0, policy_version 492912 (0.0012) [2024-06-15 17:35:35,958][1648985] Fps is (10 sec: 42598.6, 60 sec: 45873.0, 300 sec: 46207.9). Total num frames: 1009516544. Throughput: 0: 11684.2. Samples: 252490752. 
Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:35:35,959][1648985] Avg episode reward: [(0, '153.920')] [2024-06-15 17:35:37,027][1652491] Updated weights for policy 0, policy_version 492961 (0.0012) [2024-06-15 17:35:38,883][1652491] Updated weights for policy 0, policy_version 493046 (0.0194) [2024-06-15 17:35:40,864][1652491] Updated weights for policy 0, policy_version 493115 (0.0014) [2024-06-15 17:35:40,955][1648985] Fps is (10 sec: 52430.5, 60 sec: 48612.2, 300 sec: 46541.7). Total num frames: 1009909760. Throughput: 0: 11663.8. Samples: 252514304. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:35:40,955][1648985] Avg episode reward: [(0, '149.080')] [2024-06-15 17:35:43,831][1652491] Updated weights for policy 0, policy_version 493182 (0.0012) [2024-06-15 17:35:45,955][1648985] Fps is (10 sec: 52445.6, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 1010040832. Throughput: 0: 11264.0. Samples: 252576768. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:35:45,955][1648985] Avg episode reward: [(0, '157.380')] [2024-06-15 17:35:49,221][1652491] Updated weights for policy 0, policy_version 493235 (0.0137) [2024-06-15 17:35:50,251][1652491] Updated weights for policy 0, policy_version 493283 (0.0015) [2024-06-15 17:35:50,968][1648985] Fps is (10 sec: 39270.2, 60 sec: 47503.3, 300 sec: 46206.4). Total num frames: 1010302976. Throughput: 0: 11602.0. Samples: 252652032. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:35:50,969][1648985] Avg episode reward: [(0, '141.840')] [2024-06-15 17:35:51,713][1652491] Updated weights for policy 0, policy_version 493360 (0.0015) [2024-06-15 17:35:54,184][1652491] Updated weights for policy 0, policy_version 493397 (0.0012) [2024-06-15 17:35:54,469][1651469] Signal inference workers to stop experience collection... 
(25700 times) [2024-06-15 17:35:54,502][1652491] InferenceWorker_p0-w0: stopping experience collection (25700 times) [2024-06-15 17:35:54,709][1651469] Signal inference workers to resume experience collection... (25700 times) [2024-06-15 17:35:54,710][1652491] InferenceWorker_p0-w0: resuming experience collection (25700 times) [2024-06-15 17:35:55,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45879.9, 300 sec: 46541.7). Total num frames: 1010565120. Throughput: 0: 11707.8. Samples: 252690432. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:35:55,955][1648985] Avg episode reward: [(0, '145.460')] [2024-06-15 17:35:59,119][1652491] Updated weights for policy 0, policy_version 493458 (0.0013) [2024-06-15 17:36:00,957][1652491] Updated weights for policy 0, policy_version 493536 (0.0013) [2024-06-15 17:36:00,959][1648985] Fps is (10 sec: 45916.1, 60 sec: 46964.1, 300 sec: 46652.1). Total num frames: 1010761728. Throughput: 0: 11548.9. Samples: 252763136. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:36:00,960][1648985] Avg episode reward: [(0, '141.800')] [2024-06-15 17:36:02,212][1652491] Updated weights for policy 0, policy_version 493588 (0.0037) [2024-06-15 17:36:05,306][1652491] Updated weights for policy 0, policy_version 493652 (0.0025) [2024-06-15 17:36:05,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.3, 300 sec: 46431.6). Total num frames: 1011023872. Throughput: 0: 11559.8. Samples: 252826112. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:36:05,956][1648985] Avg episode reward: [(0, '156.770')] [2024-06-15 17:36:10,049][1652491] Updated weights for policy 0, policy_version 493712 (0.0011) [2024-06-15 17:36:10,955][1648985] Fps is (10 sec: 42616.8, 60 sec: 45329.3, 300 sec: 46541.8). Total num frames: 1011187712. Throughput: 0: 11412.8. Samples: 252866560. 
Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:36:10,955][1648985] Avg episode reward: [(0, '163.260')] [2024-06-15 17:36:12,168][1652491] Updated weights for policy 0, policy_version 493792 (0.0012) [2024-06-15 17:36:13,860][1652491] Updated weights for policy 0, policy_version 493872 (0.0014) [2024-06-15 17:36:15,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46972.1, 300 sec: 46208.4). Total num frames: 1011482624. Throughput: 0: 11457.5. Samples: 252924928. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:36:15,956][1648985] Avg episode reward: [(0, '151.410')] [2024-06-15 17:36:17,377][1652491] Updated weights for policy 0, policy_version 493921 (0.0012) [2024-06-15 17:36:20,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1011613696. Throughput: 0: 11310.3. Samples: 252999680. Policy #0 lag: (min: 10.0, avg: 129.8, max: 266.0) [2024-06-15 17:36:20,955][1648985] Avg episode reward: [(0, '138.640')] [2024-06-15 17:36:22,011][1652491] Updated weights for policy 0, policy_version 493959 (0.0012) [2024-06-15 17:36:23,555][1652491] Updated weights for policy 0, policy_version 494020 (0.0014) [2024-06-15 17:36:25,424][1652491] Updated weights for policy 0, policy_version 494120 (0.0098) [2024-06-15 17:36:25,970][1648985] Fps is (10 sec: 52348.7, 60 sec: 48596.1, 300 sec: 46539.3). Total num frames: 1012006912. Throughput: 0: 11476.3. Samples: 253030912. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:36:25,971][1648985] Avg episode reward: [(0, '139.050')] [2024-06-15 17:36:29,273][1652491] Updated weights for policy 0, policy_version 494177 (0.0034) [2024-06-15 17:36:30,982][1648985] Fps is (10 sec: 52285.6, 60 sec: 45854.5, 300 sec: 46204.2). Total num frames: 1012137984. Throughput: 0: 11473.2. Samples: 253093376. 
Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:36:30,983][1648985] Avg episode reward: [(0, '146.850')] [2024-06-15 17:36:33,672][1652491] Updated weights for policy 0, policy_version 494210 (0.0012) [2024-06-15 17:36:35,643][1652491] Updated weights for policy 0, policy_version 494293 (0.0077) [2024-06-15 17:36:35,955][1648985] Fps is (10 sec: 32818.3, 60 sec: 46969.9, 300 sec: 46321.9). Total num frames: 1012334592. Throughput: 0: 11369.7. Samples: 253163520. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:36:35,956][1648985] Avg episode reward: [(0, '178.330')] [2024-06-15 17:36:36,639][1651469] Signal inference workers to stop experience collection... (25750 times) [2024-06-15 17:36:36,679][1652491] InferenceWorker_p0-w0: stopping experience collection (25750 times) [2024-06-15 17:36:36,681][1652491] Updated weights for policy 0, policy_version 494341 (0.0023) [2024-06-15 17:36:36,808][1651469] Signal inference workers to resume experience collection... (25750 times) [2024-06-15 17:36:36,809][1652491] InferenceWorker_p0-w0: resuming experience collection (25750 times) [2024-06-15 17:36:39,667][1652491] Updated weights for policy 0, policy_version 494403 (0.0023) [2024-06-15 17:36:40,955][1648985] Fps is (10 sec: 49286.3, 60 sec: 45329.0, 300 sec: 46430.6). Total num frames: 1012629504. Throughput: 0: 11264.0. Samples: 253197312. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:36:40,956][1648985] Avg episode reward: [(0, '180.860')] [2024-06-15 17:36:40,975][1652491] Updated weights for policy 0, policy_version 494457 (0.0049) [2024-06-15 17:36:45,653][1652491] Updated weights for policy 0, policy_version 494520 (0.0015) [2024-06-15 17:36:45,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45875.1, 300 sec: 46431.8). Total num frames: 1012793344. Throughput: 0: 11458.5. Samples: 253278720. 
Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:36:45,956][1648985] Avg episode reward: [(0, '165.280')] [2024-06-15 17:36:47,114][1652491] Updated weights for policy 0, policy_version 494578 (0.0012) [2024-06-15 17:36:50,556][1652491] Updated weights for policy 0, policy_version 494657 (0.0013) [2024-06-15 17:36:50,973][1648985] Fps is (10 sec: 45792.5, 60 sec: 46417.4, 300 sec: 46319.7). Total num frames: 1013088256. Throughput: 0: 11486.9. Samples: 253343232. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:36:50,974][1648985] Avg episode reward: [(0, '146.250')] [2024-06-15 17:36:51,940][1652491] Updated weights for policy 0, policy_version 494720 (0.0040) [2024-06-15 17:36:55,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 43690.5, 300 sec: 46208.9). Total num frames: 1013186560. Throughput: 0: 11389.1. Samples: 253379072. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:36:55,956][1648985] Avg episode reward: [(0, '141.980')] [2024-06-15 17:36:56,501][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000494752_1013252096.pth... [2024-06-15 17:36:56,632][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000489344_1002176512.pth [2024-06-15 17:36:58,029][1652491] Updated weights for policy 0, policy_version 494816 (0.0013) [2024-06-15 17:36:59,225][1652491] Updated weights for policy 0, policy_version 494880 (0.0091) [2024-06-15 17:37:00,955][1648985] Fps is (10 sec: 49241.4, 60 sec: 46970.8, 300 sec: 46208.4). Total num frames: 1013579776. Throughput: 0: 11525.7. Samples: 253443584. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:37:00,955][1648985] Avg episode reward: [(0, '148.730')] [2024-06-15 17:37:03,204][1652491] Updated weights for policy 0, policy_version 494950 (0.0101) [2024-06-15 17:37:05,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 1013710848. Throughput: 0: 11582.6. 
Samples: 253520896. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:37:05,955][1648985] Avg episode reward: [(0, '151.110')] [2024-06-15 17:37:07,440][1652491] Updated weights for policy 0, policy_version 494980 (0.0015) [2024-06-15 17:37:09,056][1652491] Updated weights for policy 0, policy_version 495072 (0.0013) [2024-06-15 17:37:10,966][1648985] Fps is (10 sec: 45823.3, 60 sec: 47504.5, 300 sec: 45984.5). Total num frames: 1014038528. Throughput: 0: 11663.3. Samples: 253555712. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:37:10,967][1648985] Avg episode reward: [(0, '147.980')] [2024-06-15 17:37:11,014][1652491] Updated weights for policy 0, policy_version 495152 (0.0090) [2024-06-15 17:37:13,727][1652491] Updated weights for policy 0, policy_version 495200 (0.0012) [2024-06-15 17:37:14,520][1652491] Updated weights for policy 0, policy_version 495230 (0.0047) [2024-06-15 17:37:15,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 1014235136. Throughput: 0: 11726.2. Samples: 253620736. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:37:15,956][1648985] Avg episode reward: [(0, '140.750')] [2024-06-15 17:37:19,306][1651469] Signal inference workers to stop experience collection... (25800 times) [2024-06-15 17:37:19,382][1652491] InferenceWorker_p0-w0: stopping experience collection (25800 times) [2024-06-15 17:37:19,524][1651469] Signal inference workers to resume experience collection... (25800 times) [2024-06-15 17:37:19,525][1652491] InferenceWorker_p0-w0: resuming experience collection (25800 times) [2024-06-15 17:37:19,988][1652491] Updated weights for policy 0, policy_version 495300 (0.0015) [2024-06-15 17:37:20,958][1648985] Fps is (10 sec: 42633.2, 60 sec: 47511.1, 300 sec: 45985.8). Total num frames: 1014464512. Throughput: 0: 11775.2. Samples: 253693440. 
Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:37:20,958][1648985] Avg episode reward: [(0, '143.600')] [2024-06-15 17:37:21,345][1652491] Updated weights for policy 0, policy_version 495376 (0.0104) [2024-06-15 17:37:22,318][1652491] Updated weights for policy 0, policy_version 495424 (0.0012) [2024-06-15 17:37:24,995][1652491] Updated weights for policy 0, policy_version 495482 (0.0101) [2024-06-15 17:37:25,988][1648985] Fps is (10 sec: 52255.1, 60 sec: 45861.4, 300 sec: 46647.5). Total num frames: 1014759424. Throughput: 0: 11881.0. Samples: 253732352. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:37:25,989][1648985] Avg episode reward: [(0, '156.550')] [2024-06-15 17:37:30,858][1652491] Updated weights for policy 0, policy_version 495536 (0.0036) [2024-06-15 17:37:30,955][1648985] Fps is (10 sec: 39334.0, 60 sec: 45349.7, 300 sec: 45875.2). Total num frames: 1014857728. Throughput: 0: 11696.4. Samples: 253805056. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:37:30,956][1648985] Avg episode reward: [(0, '158.960')] [2024-06-15 17:37:31,780][1652491] Updated weights for policy 0, policy_version 495584 (0.0012) [2024-06-15 17:37:32,982][1652491] Updated weights for policy 0, policy_version 495648 (0.0022) [2024-06-15 17:37:34,139][1652491] Updated weights for policy 0, policy_version 495681 (0.0014) [2024-06-15 17:37:35,798][1652491] Updated weights for policy 0, policy_version 495744 (0.0011) [2024-06-15 17:37:35,955][1648985] Fps is (10 sec: 52604.5, 60 sec: 49152.0, 300 sec: 46652.9). Total num frames: 1015283712. Throughput: 0: 11723.8. Samples: 253870592. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:37:35,955][1648985] Avg episode reward: [(0, '165.070')] [2024-06-15 17:37:40,960][1648985] Fps is (10 sec: 45852.7, 60 sec: 44779.3, 300 sec: 46096.6). Total num frames: 1015316480. Throughput: 0: 11911.3. Samples: 253915136. 
Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:37:40,960][1648985] Avg episode reward: [(0, '139.550')] [2024-06-15 17:37:41,897][1652491] Updated weights for policy 0, policy_version 495812 (0.0027) [2024-06-15 17:37:43,649][1652491] Updated weights for policy 0, policy_version 495892 (0.0015) [2024-06-15 17:37:45,867][1652491] Updated weights for policy 0, policy_version 495952 (0.0012) [2024-06-15 17:37:45,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48605.9, 300 sec: 46319.5). Total num frames: 1015709696. Throughput: 0: 11889.8. Samples: 253978624. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:37:45,956][1648985] Avg episode reward: [(0, '143.100')] [2024-06-15 17:37:47,052][1652491] Updated weights for policy 0, policy_version 495991 (0.0016) [2024-06-15 17:37:50,955][1648985] Fps is (10 sec: 49175.4, 60 sec: 45342.7, 300 sec: 46208.4). Total num frames: 1015808000. Throughput: 0: 11992.1. Samples: 254060544. Policy #0 lag: (min: 111.0, avg: 161.5, max: 335.0) [2024-06-15 17:37:50,956][1648985] Avg episode reward: [(0, '148.130')] [2024-06-15 17:37:52,925][1652491] Updated weights for policy 0, policy_version 496068 (0.0087) [2024-06-15 17:37:54,740][1652491] Updated weights for policy 0, policy_version 496160 (0.0014) [2024-06-15 17:37:54,870][1651469] Signal inference workers to stop experience collection... (25850 times) [2024-06-15 17:37:54,924][1652491] InferenceWorker_p0-w0: stopping experience collection (25850 times) [2024-06-15 17:37:55,049][1651469] Signal inference workers to resume experience collection... (25850 times) [2024-06-15 17:37:55,050][1652491] InferenceWorker_p0-w0: resuming experience collection (25850 times) [2024-06-15 17:37:55,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 50244.3, 300 sec: 46430.6). Total num frames: 1016201216. Throughput: 0: 11813.1. Samples: 254087168. 
Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:37:55,956][1648985] Avg episode reward: [(0, '155.900')] [2024-06-15 17:37:57,421][1652491] Updated weights for policy 0, policy_version 496224 (0.0025) [2024-06-15 17:38:00,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1016332288. Throughput: 0: 11992.2. Samples: 254160384. Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:38:00,956][1648985] Avg episode reward: [(0, '168.520')] [2024-06-15 17:38:02,832][1652491] Updated weights for policy 0, policy_version 496272 (0.0013) [2024-06-15 17:38:04,937][1652491] Updated weights for policy 0, policy_version 496352 (0.0133) [2024-06-15 17:38:05,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 1016594432. Throughput: 0: 11663.0. Samples: 254218240. Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:38:05,956][1648985] Avg episode reward: [(0, '165.930')] [2024-06-15 17:38:06,643][1652491] Updated weights for policy 0, policy_version 496433 (0.0126) [2024-06-15 17:38:09,046][1652491] Updated weights for policy 0, policy_version 496480 (0.0031) [2024-06-15 17:38:09,801][1652491] Updated weights for policy 0, policy_version 496510 (0.0014) [2024-06-15 17:38:10,968][1648985] Fps is (10 sec: 52360.9, 60 sec: 46966.1, 300 sec: 46650.7). Total num frames: 1016856576. Throughput: 0: 11587.8. Samples: 254253568. Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:38:10,969][1648985] Avg episode reward: [(0, '161.600')] [2024-06-15 17:38:15,650][1652491] Updated weights for policy 0, policy_version 496581 (0.0012) [2024-06-15 17:38:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.5, 300 sec: 46653.3). Total num frames: 1017020416. Throughput: 0: 11741.9. Samples: 254333440. 
Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:38:15,955][1648985] Avg episode reward: [(0, '150.360')] [2024-06-15 17:38:17,704][1652491] Updated weights for policy 0, policy_version 496672 (0.0035) [2024-06-15 17:38:19,913][1652491] Updated weights for policy 0, policy_version 496724 (0.0014) [2024-06-15 17:38:20,871][1652491] Updated weights for policy 0, policy_version 496766 (0.0011) [2024-06-15 17:38:20,958][1648985] Fps is (10 sec: 52480.4, 60 sec: 48605.8, 300 sec: 46652.3). Total num frames: 1017380864. Throughput: 0: 11547.6. Samples: 254390272. Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:38:20,959][1648985] Avg episode reward: [(0, '150.090')] [2024-06-15 17:38:25,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 44807.7, 300 sec: 46430.5). Total num frames: 1017446400. Throughput: 0: 11652.1. Samples: 254439424. Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:38:25,956][1648985] Avg episode reward: [(0, '143.630')] [2024-06-15 17:38:26,263][1652491] Updated weights for policy 0, policy_version 496828 (0.0016) [2024-06-15 17:38:27,793][1652491] Updated weights for policy 0, policy_version 496887 (0.0197) [2024-06-15 17:38:29,198][1652491] Updated weights for policy 0, policy_version 496944 (0.0032) [2024-06-15 17:38:30,955][1648985] Fps is (10 sec: 39334.3, 60 sec: 48605.9, 300 sec: 46319.5). Total num frames: 1017774080. Throughput: 0: 11525.7. Samples: 254497280. Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:38:30,955][1648985] Avg episode reward: [(0, '136.830')] [2024-06-15 17:38:32,289][1652491] Updated weights for policy 0, policy_version 497016 (0.0020) [2024-06-15 17:38:35,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1017905152. Throughput: 0: 11491.6. Samples: 254577664. 
Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:38:35,955][1648985] Avg episode reward: [(0, '140.230')] [2024-06-15 17:38:36,970][1652491] Updated weights for policy 0, policy_version 497056 (0.0012) [2024-06-15 17:38:38,741][1651469] Signal inference workers to stop experience collection... (25900 times) [2024-06-15 17:38:38,919][1652491] Updated weights for policy 0, policy_version 497129 (0.0045) [2024-06-15 17:38:38,957][1652491] InferenceWorker_p0-w0: stopping experience collection (25900 times) [2024-06-15 17:38:38,993][1651469] Signal inference workers to resume experience collection... (25900 times) [2024-06-15 17:38:38,994][1652491] InferenceWorker_p0-w0: resuming experience collection (25900 times) [2024-06-15 17:38:39,941][1652491] Updated weights for policy 0, policy_version 497168 (0.0027) [2024-06-15 17:38:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 49702.2, 300 sec: 46874.9). Total num frames: 1018298368. Throughput: 0: 11571.2. Samples: 254607872. Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:38:40,956][1648985] Avg episode reward: [(0, '145.900')] [2024-06-15 17:38:42,632][1652491] Updated weights for policy 0, policy_version 497232 (0.0087) [2024-06-15 17:38:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45329.1, 300 sec: 46653.4). Total num frames: 1018429440. Throughput: 0: 11457.4. Samples: 254675968. Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:38:45,955][1648985] Avg episode reward: [(0, '151.970')] [2024-06-15 17:38:47,258][1652491] Updated weights for policy 0, policy_version 497282 (0.0019) [2024-06-15 17:38:48,400][1652491] Updated weights for policy 0, policy_version 497339 (0.0013) [2024-06-15 17:38:49,627][1652491] Updated weights for policy 0, policy_version 497379 (0.0014) [2024-06-15 17:38:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 49152.1, 300 sec: 46877.1). Total num frames: 1018757120. Throughput: 0: 11810.1. Samples: 254749696. 
Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:38:50,956][1648985] Avg episode reward: [(0, '155.580')] [2024-06-15 17:38:51,149][1652491] Updated weights for policy 0, policy_version 497456 (0.0121) [2024-06-15 17:38:53,795][1652491] Updated weights for policy 0, policy_version 497532 (0.0012) [2024-06-15 17:38:55,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1018953728. Throughput: 0: 11847.6. Samples: 254786560. Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:38:55,956][1648985] Avg episode reward: [(0, '151.630')] [2024-06-15 17:38:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000497536_1018953728.pth... [2024-06-15 17:38:56,068][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000492096_1007812608.pth [2024-06-15 17:38:58,900][1652491] Updated weights for policy 0, policy_version 497586 (0.0012) [2024-06-15 17:39:00,223][1652491] Updated weights for policy 0, policy_version 497648 (0.0014) [2024-06-15 17:39:00,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1019215872. Throughput: 0: 11855.6. Samples: 254866944. Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:39:00,956][1648985] Avg episode reward: [(0, '155.800')] [2024-06-15 17:39:02,413][1652491] Updated weights for policy 0, policy_version 497698 (0.0042) [2024-06-15 17:39:04,081][1652491] Updated weights for policy 0, policy_version 497760 (0.0016) [2024-06-15 17:39:04,730][1652491] Updated weights for policy 0, policy_version 497791 (0.0014) [2024-06-15 17:39:05,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 48059.8, 300 sec: 46655.2). Total num frames: 1019478016. Throughput: 0: 12038.6. Samples: 254931968. 
Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:39:05,955][1648985] Avg episode reward: [(0, '177.940')] [2024-06-15 17:39:09,730][1652491] Updated weights for policy 0, policy_version 497847 (0.0155) [2024-06-15 17:39:10,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46431.4, 300 sec: 46763.8). Total num frames: 1019641856. Throughput: 0: 11867.1. Samples: 254973440. Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:39:10,955][1648985] Avg episode reward: [(0, '187.150')] [2024-06-15 17:39:11,748][1652491] Updated weights for policy 0, policy_version 497913 (0.0021) [2024-06-15 17:39:14,078][1652491] Updated weights for policy 0, policy_version 497984 (0.0013) [2024-06-15 17:39:15,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 49698.0, 300 sec: 47208.1). Total num frames: 1020002304. Throughput: 0: 11889.7. Samples: 255032320. Policy #0 lag: (min: 1.0, avg: 64.5, max: 257.0) [2024-06-15 17:39:15,956][1648985] Avg episode reward: [(0, '187.110')] [2024-06-15 17:39:20,887][1652491] Updated weights for policy 0, policy_version 498064 (0.0046) [2024-06-15 17:39:20,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 44239.2, 300 sec: 46652.8). Total num frames: 1020035072. Throughput: 0: 11878.4. Samples: 255112192. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:39:20,955][1648985] Avg episode reward: [(0, '173.700')] [2024-06-15 17:39:22,179][1652491] Updated weights for policy 0, policy_version 498111 (0.0015) [2024-06-15 17:39:22,413][1651469] Signal inference workers to stop experience collection... (25950 times) [2024-06-15 17:39:22,472][1652491] InferenceWorker_p0-w0: stopping experience collection (25950 times) [2024-06-15 17:39:22,796][1651469] Signal inference workers to resume experience collection... 
(25950 times) [2024-06-15 17:39:22,797][1652491] InferenceWorker_p0-w0: resuming experience collection (25950 times) [2024-06-15 17:39:23,726][1652491] Updated weights for policy 0, policy_version 498169 (0.0013) [2024-06-15 17:39:25,471][1652491] Updated weights for policy 0, policy_version 498224 (0.0013) [2024-06-15 17:39:25,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 49152.2, 300 sec: 46986.0). Total num frames: 1020395520. Throughput: 0: 11832.9. Samples: 255140352. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:39:25,955][1648985] Avg episode reward: [(0, '160.860')] [2024-06-15 17:39:27,118][1652491] Updated weights for policy 0, policy_version 498291 (0.0015) [2024-06-15 17:39:30,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1020526592. Throughput: 0: 11764.6. Samples: 255205376. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:39:30,956][1648985] Avg episode reward: [(0, '145.960')] [2024-06-15 17:39:32,684][1652491] Updated weights for policy 0, policy_version 498339 (0.0013) [2024-06-15 17:39:33,306][1652491] Updated weights for policy 0, policy_version 498368 (0.0012) [2024-06-15 17:39:35,253][1652491] Updated weights for policy 0, policy_version 498427 (0.0097) [2024-06-15 17:39:35,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 48059.7, 300 sec: 46765.1). Total num frames: 1020788736. Throughput: 0: 11810.1. Samples: 255281152. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:39:35,956][1648985] Avg episode reward: [(0, '141.160')] [2024-06-15 17:39:36,900][1652491] Updated weights for policy 0, policy_version 498480 (0.0036) [2024-06-15 17:39:38,311][1652491] Updated weights for policy 0, policy_version 498534 (0.0015) [2024-06-15 17:39:40,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1021050880. Throughput: 0: 11548.5. Samples: 255306240. 
Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:39:40,956][1648985] Avg episode reward: [(0, '157.730')] [2024-06-15 17:39:43,829][1652491] Updated weights for policy 0, policy_version 498594 (0.0016) [2024-06-15 17:39:45,191][1652491] Updated weights for policy 0, policy_version 498625 (0.0012) [2024-06-15 17:39:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 1021247488. Throughput: 0: 11502.9. Samples: 255384576. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:39:45,955][1648985] Avg episode reward: [(0, '178.200')] [2024-06-15 17:39:46,523][1652491] Updated weights for policy 0, policy_version 498683 (0.0012) [2024-06-15 17:39:47,893][1652491] Updated weights for policy 0, policy_version 498736 (0.0011) [2024-06-15 17:39:49,047][1652491] Updated weights for policy 0, policy_version 498758 (0.0016) [2024-06-15 17:39:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.5, 300 sec: 46653.7). Total num frames: 1021575168. Throughput: 0: 11502.9. Samples: 255449600. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:39:50,955][1648985] Avg episode reward: [(0, '172.560')] [2024-06-15 17:39:53,985][1652491] Updated weights for policy 0, policy_version 498818 (0.0014) [2024-06-15 17:39:55,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1021706240. Throughput: 0: 11537.0. Samples: 255492608. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:39:55,956][1648985] Avg episode reward: [(0, '152.290')] [2024-06-15 17:39:56,448][1652491] Updated weights for policy 0, policy_version 498883 (0.0013) [2024-06-15 17:39:57,855][1652491] Updated weights for policy 0, policy_version 498944 (0.0143) [2024-06-15 17:39:59,954][1652491] Updated weights for policy 0, policy_version 499003 (0.0032) [2024-06-15 17:40:00,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 1022001152. 
Throughput: 0: 11548.5. Samples: 255552000. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:40:00,956][1648985] Avg episode reward: [(0, '152.410')] [2024-06-15 17:40:01,931][1652491] Updated weights for policy 0, policy_version 499067 (0.0014) [2024-06-15 17:40:05,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 1022099456. Throughput: 0: 11468.8. Samples: 255628288. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:40:05,955][1648985] Avg episode reward: [(0, '165.290')] [2024-06-15 17:40:07,248][1652491] Updated weights for policy 0, policy_version 499136 (0.0097) [2024-06-15 17:40:07,878][1651469] Signal inference workers to stop experience collection... (26000 times) [2024-06-15 17:40:07,950][1652491] InferenceWorker_p0-w0: stopping experience collection (26000 times) [2024-06-15 17:40:08,111][1651469] Signal inference workers to resume experience collection... (26000 times) [2024-06-15 17:40:08,112][1652491] InferenceWorker_p0-w0: resuming experience collection (26000 times) [2024-06-15 17:40:09,047][1652491] Updated weights for policy 0, policy_version 499188 (0.0013) [2024-06-15 17:40:10,862][1652491] Updated weights for policy 0, policy_version 499232 (0.0013) [2024-06-15 17:40:10,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.3, 300 sec: 46653.7). Total num frames: 1022427136. Throughput: 0: 11434.6. Samples: 255654912. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:40:10,956][1648985] Avg episode reward: [(0, '161.460')] [2024-06-15 17:40:13,002][1652491] Updated weights for policy 0, policy_version 499296 (0.0031) [2024-06-15 17:40:13,594][1652491] Updated weights for policy 0, policy_version 499327 (0.0013) [2024-06-15 17:40:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 1022623744. Throughput: 0: 11491.6. Samples: 255722496. 
Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:40:15,955][1648985] Avg episode reward: [(0, '160.380')] [2024-06-15 17:40:18,244][1652491] Updated weights for policy 0, policy_version 499363 (0.0013) [2024-06-15 17:40:19,623][1652491] Updated weights for policy 0, policy_version 499412 (0.0014) [2024-06-15 17:40:20,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 47513.6, 300 sec: 46764.3). Total num frames: 1022885888. Throughput: 0: 11411.9. Samples: 255794688. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:40:20,956][1648985] Avg episode reward: [(0, '150.420')] [2024-06-15 17:40:21,187][1652491] Updated weights for policy 0, policy_version 499460 (0.0054) [2024-06-15 17:40:22,432][1652491] Updated weights for policy 0, policy_version 499520 (0.0014) [2024-06-15 17:40:24,665][1652491] Updated weights for policy 0, policy_version 499568 (0.0013) [2024-06-15 17:40:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1023148032. Throughput: 0: 11593.9. Samples: 255827968. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:40:25,956][1648985] Avg episode reward: [(0, '154.300')] [2024-06-15 17:40:29,016][1652491] Updated weights for policy 0, policy_version 499621 (0.0012) [2024-06-15 17:40:30,636][1652491] Updated weights for policy 0, policy_version 499669 (0.0024) [2024-06-15 17:40:30,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.5, 300 sec: 46875.4). Total num frames: 1023344640. Throughput: 0: 11559.8. Samples: 255904768. 
Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:40:30,956][1648985] Avg episode reward: [(0, '142.900')] [2024-06-15 17:40:31,591][1652491] Updated weights for policy 0, policy_version 499712 (0.0012) [2024-06-15 17:40:32,997][1652491] Updated weights for policy 0, policy_version 499770 (0.0029) [2024-06-15 17:40:35,213][1652491] Updated weights for policy 0, policy_version 499835 (0.0014) [2024-06-15 17:40:35,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1023672320. Throughput: 0: 11650.8. Samples: 255973888. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:40:35,955][1648985] Avg episode reward: [(0, '147.150')] [2024-06-15 17:40:39,867][1652491] Updated weights for policy 0, policy_version 499898 (0.0016) [2024-06-15 17:40:40,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1023803392. Throughput: 0: 11696.4. Samples: 256018944. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:40:40,955][1648985] Avg episode reward: [(0, '170.990')] [2024-06-15 17:40:41,966][1652491] Updated weights for policy 0, policy_version 499952 (0.0012) [2024-06-15 17:40:42,870][1652491] Updated weights for policy 0, policy_version 499970 (0.0013) [2024-06-15 17:40:44,194][1652491] Updated weights for policy 0, policy_version 500030 (0.0012) [2024-06-15 17:40:45,435][1652491] Updated weights for policy 0, policy_version 500089 (0.0014) [2024-06-15 17:40:45,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 49151.8, 300 sec: 47099.1). Total num frames: 1024196608. Throughput: 0: 11912.5. Samples: 256088064. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 17:40:45,956][1648985] Avg episode reward: [(0, '163.070')] [2024-06-15 17:40:50,208][1652491] Updated weights for policy 0, policy_version 500128 (0.0012) [2024-06-15 17:40:50,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1024327680. 
Throughput: 0: 11719.1. Samples: 256155648. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:40:50,956][1648985] Avg episode reward: [(0, '143.080')] [2024-06-15 17:40:52,139][1651469] Signal inference workers to stop experience collection... (26050 times) [2024-06-15 17:40:52,167][1652491] Updated weights for policy 0, policy_version 500179 (0.0012) [2024-06-15 17:40:52,189][1652491] InferenceWorker_p0-w0: stopping experience collection (26050 times) [2024-06-15 17:40:52,327][1651469] Signal inference workers to resume experience collection... (26050 times) [2024-06-15 17:40:52,329][1652491] InferenceWorker_p0-w0: resuming experience collection (26050 times) [2024-06-15 17:40:54,517][1652491] Updated weights for policy 0, policy_version 500225 (0.0125) [2024-06-15 17:40:55,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 48059.7, 300 sec: 46875.5). Total num frames: 1024589824. Throughput: 0: 12003.5. Samples: 256195072. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:40:55,956][1648985] Avg episode reward: [(0, '147.310')] [2024-06-15 17:40:55,959][1652491] Updated weights for policy 0, policy_version 500288 (0.0013) [2024-06-15 17:40:56,355][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000500304_1024622592.pth... [2024-06-15 17:40:56,536][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000494752_1013252096.pth [2024-06-15 17:41:00,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 1024720896. Throughput: 0: 11912.5. Samples: 256258560. 
Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:41:00,956][1648985] Avg episode reward: [(0, '154.970')] [2024-06-15 17:41:01,338][1652491] Updated weights for policy 0, policy_version 500355 (0.0013) [2024-06-15 17:41:02,561][1652491] Updated weights for policy 0, policy_version 500411 (0.0012) [2024-06-15 17:41:04,625][1652491] Updated weights for policy 0, policy_version 500470 (0.0015) [2024-06-15 17:41:05,955][1648985] Fps is (10 sec: 39323.0, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 1024983040. Throughput: 0: 12026.3. Samples: 256335872. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:41:05,955][1648985] Avg episode reward: [(0, '168.240')] [2024-06-15 17:41:06,462][1652491] Updated weights for policy 0, policy_version 500512 (0.0011) [2024-06-15 17:41:07,612][1652491] Updated weights for policy 0, policy_version 500561 (0.0016) [2024-06-15 17:41:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 1025245184. Throughput: 0: 11889.8. Samples: 256363008. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:41:10,956][1648985] Avg episode reward: [(0, '166.080')] [2024-06-15 17:41:12,013][1652491] Updated weights for policy 0, policy_version 500624 (0.0012) [2024-06-15 17:41:12,872][1652491] Updated weights for policy 0, policy_version 500665 (0.0014) [2024-06-15 17:41:15,287][1652491] Updated weights for policy 0, policy_version 500735 (0.0045) [2024-06-15 17:41:15,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 1025507328. Throughput: 0: 11832.9. Samples: 256437248. 
Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:41:15,956][1648985] Avg episode reward: [(0, '145.000')] [2024-06-15 17:41:17,836][1652491] Updated weights for policy 0, policy_version 500775 (0.0012) [2024-06-15 17:41:19,017][1652491] Updated weights for policy 0, policy_version 500816 (0.0013) [2024-06-15 17:41:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 46655.2). Total num frames: 1025769472. Throughput: 0: 11980.8. Samples: 256513024. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:41:20,955][1648985] Avg episode reward: [(0, '137.710')] [2024-06-15 17:41:22,786][1652491] Updated weights for policy 0, policy_version 500880 (0.0022) [2024-06-15 17:41:23,578][1652491] Updated weights for policy 0, policy_version 500927 (0.0014) [2024-06-15 17:41:25,225][1652491] Updated weights for policy 0, policy_version 500989 (0.0014) [2024-06-15 17:41:25,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 47101.4). Total num frames: 1026031616. Throughput: 0: 11867.0. Samples: 256552960. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:41:25,955][1648985] Avg episode reward: [(0, '141.630')] [2024-06-15 17:41:28,736][1652491] Updated weights for policy 0, policy_version 501057 (0.0060) [2024-06-15 17:41:29,745][1652491] Updated weights for policy 0, policy_version 501117 (0.0023) [2024-06-15 17:41:30,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 49151.9, 300 sec: 47319.2). Total num frames: 1026293760. Throughput: 0: 11889.8. Samples: 256623104. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:41:30,956][1648985] Avg episode reward: [(0, '151.870')] [2024-06-15 17:41:34,168][1651469] Signal inference workers to stop experience collection... (26100 times) [2024-06-15 17:41:34,238][1652491] InferenceWorker_p0-w0: stopping experience collection (26100 times) [2024-06-15 17:41:34,456][1651469] Signal inference workers to resume experience collection... 
(26100 times) [2024-06-15 17:41:34,457][1652491] InferenceWorker_p0-w0: resuming experience collection (26100 times) [2024-06-15 17:41:34,459][1652491] Updated weights for policy 0, policy_version 501184 (0.0016) [2024-06-15 17:41:35,884][1652491] Updated weights for policy 0, policy_version 501240 (0.0015) [2024-06-15 17:41:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 1026523136. Throughput: 0: 11958.1. Samples: 256693760. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:41:35,956][1648985] Avg episode reward: [(0, '151.070')] [2024-06-15 17:41:39,373][1652491] Updated weights for policy 0, policy_version 501282 (0.0045) [2024-06-15 17:41:40,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 49152.0, 300 sec: 47319.2). Total num frames: 1026752512. Throughput: 0: 12015.0. Samples: 256735744. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:41:40,956][1648985] Avg episode reward: [(0, '156.630')] [2024-06-15 17:41:41,208][1652491] Updated weights for policy 0, policy_version 501360 (0.0098) [2024-06-15 17:41:45,808][1652491] Updated weights for policy 0, policy_version 501417 (0.0012) [2024-06-15 17:41:45,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45329.2, 300 sec: 46877.8). Total num frames: 1026916352. Throughput: 0: 12197.0. Samples: 256807424. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:41:45,955][1648985] Avg episode reward: [(0, '161.560')] [2024-06-15 17:41:47,240][1652491] Updated weights for policy 0, policy_version 501473 (0.0012) [2024-06-15 17:41:48,991][1652491] Updated weights for policy 0, policy_version 501505 (0.0012) [2024-06-15 17:41:50,252][1652491] Updated weights for policy 0, policy_version 501563 (0.0023) [2024-06-15 17:41:50,960][1648985] Fps is (10 sec: 45852.8, 60 sec: 48055.9, 300 sec: 47540.6). Total num frames: 1027211264. Throughput: 0: 12002.2. Samples: 256876032. 
Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:41:50,961][1648985] Avg episode reward: [(0, '172.250')] [2024-06-15 17:41:51,874][1652491] Updated weights for policy 0, policy_version 501603 (0.0018) [2024-06-15 17:41:55,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.5, 300 sec: 46763.8). Total num frames: 1027375104. Throughput: 0: 12231.1. Samples: 256913408. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:41:55,956][1648985] Avg episode reward: [(0, '186.360')] [2024-06-15 17:41:56,322][1652491] Updated weights for policy 0, policy_version 501670 (0.0014) [2024-06-15 17:41:57,570][1652491] Updated weights for policy 0, policy_version 501730 (0.0012) [2024-06-15 17:41:59,639][1652491] Updated weights for policy 0, policy_version 501764 (0.0012) [2024-06-15 17:42:00,955][1648985] Fps is (10 sec: 49176.3, 60 sec: 49698.3, 300 sec: 47430.3). Total num frames: 1027702784. Throughput: 0: 12322.2. Samples: 256991744. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:42:00,955][1648985] Avg episode reward: [(0, '157.500')] [2024-06-15 17:42:01,124][1652491] Updated weights for policy 0, policy_version 501823 (0.0015) [2024-06-15 17:42:03,103][1652491] Updated weights for policy 0, policy_version 501877 (0.0015) [2024-06-15 17:42:05,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 46876.7). Total num frames: 1027866624. Throughput: 0: 12174.2. Samples: 257060864. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:42:05,955][1648985] Avg episode reward: [(0, '125.990')] [2024-06-15 17:42:06,835][1652491] Updated weights for policy 0, policy_version 501907 (0.0011) [2024-06-15 17:42:08,998][1652491] Updated weights for policy 0, policy_version 502016 (0.0035) [2024-06-15 17:42:10,956][1648985] Fps is (10 sec: 42593.5, 60 sec: 48058.9, 300 sec: 47096.9). Total num frames: 1028128768. Throughput: 0: 11946.4. Samples: 257090560. 
Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:42:10,957][1648985] Avg episode reward: [(0, '135.500')] [2024-06-15 17:42:12,248][1652491] Updated weights for policy 0, policy_version 502080 (0.0022) [2024-06-15 17:42:14,650][1652491] Updated weights for policy 0, policy_version 502138 (0.0015) [2024-06-15 17:42:15,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.8, 300 sec: 47208.7). Total num frames: 1028390912. Throughput: 0: 11889.8. Samples: 257158144. Policy #0 lag: (min: 15.0, avg: 106.8, max: 271.0) [2024-06-15 17:42:15,978][1648985] Avg episode reward: [(0, '148.430')] [2024-06-15 17:42:17,912][1651469] Signal inference workers to stop experience collection... (26150 times) [2024-06-15 17:42:18,002][1652491] InferenceWorker_p0-w0: stopping experience collection (26150 times) [2024-06-15 17:42:18,176][1651469] Signal inference workers to resume experience collection... (26150 times) [2024-06-15 17:42:18,177][1652491] InferenceWorker_p0-w0: resuming experience collection (26150 times) [2024-06-15 17:42:19,380][1652491] Updated weights for policy 0, policy_version 502208 (0.0149) [2024-06-15 17:42:20,474][1652491] Updated weights for policy 0, policy_version 502267 (0.0016) [2024-06-15 17:42:20,955][1648985] Fps is (10 sec: 52434.4, 60 sec: 48059.7, 300 sec: 47102.4). Total num frames: 1028653056. Throughput: 0: 11855.6. Samples: 257227264. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:42:20,955][1648985] Avg episode reward: [(0, '151.590')] [2024-06-15 17:42:23,797][1652491] Updated weights for policy 0, policy_version 502326 (0.0014) [2024-06-15 17:42:25,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.4, 300 sec: 47430.3). Total num frames: 1028849664. Throughput: 0: 11639.4. Samples: 257259520. 
Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:42:25,956][1648985] Avg episode reward: [(0, '142.860')] [2024-06-15 17:42:26,231][1652491] Updated weights for policy 0, policy_version 502391 (0.0103) [2024-06-15 17:42:29,771][1652491] Updated weights for policy 0, policy_version 502420 (0.0014) [2024-06-15 17:42:30,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1029046272. Throughput: 0: 11741.8. Samples: 257335808. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:42:30,956][1648985] Avg episode reward: [(0, '147.910')] [2024-06-15 17:42:31,337][1652491] Updated weights for policy 0, policy_version 502485 (0.0013) [2024-06-15 17:42:33,861][1652491] Updated weights for policy 0, policy_version 502530 (0.0012) [2024-06-15 17:42:35,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46421.3, 300 sec: 47431.1). Total num frames: 1029308416. Throughput: 0: 11606.6. Samples: 257398272. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:42:35,956][1648985] Avg episode reward: [(0, '151.360')] [2024-06-15 17:42:36,153][1652491] Updated weights for policy 0, policy_version 502596 (0.0013) [2024-06-15 17:42:37,730][1652491] Updated weights for policy 0, policy_version 502656 (0.0014) [2024-06-15 17:42:40,955][1648985] Fps is (10 sec: 39323.0, 60 sec: 44782.9, 300 sec: 46541.7). Total num frames: 1029439488. Throughput: 0: 11514.3. Samples: 257431552. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:42:40,955][1648985] Avg episode reward: [(0, '148.310')] [2024-06-15 17:42:42,393][1652491] Updated weights for policy 0, policy_version 502707 (0.0013) [2024-06-15 17:42:43,810][1652491] Updated weights for policy 0, policy_version 502777 (0.0090) [2024-06-15 17:42:45,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46967.3, 300 sec: 47208.1). Total num frames: 1029734400. Throughput: 0: 11389.1. Samples: 257504256. 
Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:42:45,956][1648985] Avg episode reward: [(0, '148.930')] [2024-06-15 17:42:46,844][1652491] Updated weights for policy 0, policy_version 502842 (0.0014) [2024-06-15 17:42:48,604][1652491] Updated weights for policy 0, policy_version 502903 (0.0013) [2024-06-15 17:42:50,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45878.9, 300 sec: 46652.8). Total num frames: 1029963776. Throughput: 0: 11320.9. Samples: 257570304. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:42:50,956][1648985] Avg episode reward: [(0, '144.640')] [2024-06-15 17:42:53,417][1652491] Updated weights for policy 0, policy_version 502932 (0.0027) [2024-06-15 17:42:55,218][1652491] Updated weights for policy 0, policy_version 503011 (0.0022) [2024-06-15 17:42:55,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 47513.5, 300 sec: 47097.0). Total num frames: 1030225920. Throughput: 0: 11525.9. Samples: 257609216. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:42:55,956][1648985] Avg episode reward: [(0, '167.900')] [2024-06-15 17:42:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000503040_1030225920.pth... [2024-06-15 17:42:56,015][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000497536_1018953728.pth [2024-06-15 17:42:58,175][1652491] Updated weights for policy 0, policy_version 503075 (0.0099) [2024-06-15 17:42:58,554][1651469] Signal inference workers to stop experience collection... (26200 times) [2024-06-15 17:42:58,611][1652491] InferenceWorker_p0-w0: stopping experience collection (26200 times) [2024-06-15 17:42:58,877][1651469] Signal inference workers to resume experience collection... 
(26200 times) [2024-06-15 17:42:58,878][1652491] InferenceWorker_p0-w0: resuming experience collection (26200 times) [2024-06-15 17:43:00,399][1652491] Updated weights for policy 0, policy_version 503165 (0.0014) [2024-06-15 17:43:00,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46421.2, 300 sec: 47097.0). Total num frames: 1030488064. Throughput: 0: 11298.1. Samples: 257666560. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:43:00,956][1648985] Avg episode reward: [(0, '167.640')] [2024-06-15 17:43:05,092][1652491] Updated weights for policy 0, policy_version 503217 (0.0012) [2024-06-15 17:43:05,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 46421.4, 300 sec: 46765.9). Total num frames: 1030651904. Throughput: 0: 11491.6. Samples: 257744384. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:43:05,955][1648985] Avg episode reward: [(0, '162.840')] [2024-06-15 17:43:06,575][1652491] Updated weights for policy 0, policy_version 503292 (0.0016) [2024-06-15 17:43:09,629][1652491] Updated weights for policy 0, policy_version 503334 (0.0029) [2024-06-15 17:43:10,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 46422.2, 300 sec: 47097.1). Total num frames: 1030914048. Throughput: 0: 11628.1. Samples: 257782784. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:43:10,955][1648985] Avg episode reward: [(0, '165.330')] [2024-06-15 17:43:11,195][1652491] Updated weights for policy 0, policy_version 503392 (0.0027) [2024-06-15 17:43:15,624][1652491] Updated weights for policy 0, policy_version 503440 (0.0013) [2024-06-15 17:43:15,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 44236.7, 300 sec: 46320.0). Total num frames: 1031045120. Throughput: 0: 11480.2. Samples: 257852416. 
Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:43:15,956][1648985] Avg episode reward: [(0, '169.990')] [2024-06-15 17:43:17,385][1652491] Updated weights for policy 0, policy_version 503522 (0.0102) [2024-06-15 17:43:19,876][1652491] Updated weights for policy 0, policy_version 503555 (0.0017) [2024-06-15 17:43:20,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45329.1, 300 sec: 47208.2). Total num frames: 1031372800. Throughput: 0: 11571.2. Samples: 257918976. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:43:20,956][1648985] Avg episode reward: [(0, '166.970')] [2024-06-15 17:43:22,028][1652491] Updated weights for policy 0, policy_version 503648 (0.0013) [2024-06-15 17:43:25,955][1648985] Fps is (10 sec: 49150.6, 60 sec: 44782.7, 300 sec: 46652.7). Total num frames: 1031536640. Throughput: 0: 11514.2. Samples: 257949696. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:43:25,956][1648985] Avg episode reward: [(0, '155.040')] [2024-06-15 17:43:26,862][1652491] Updated weights for policy 0, policy_version 503681 (0.0013) [2024-06-15 17:43:28,301][1652491] Updated weights for policy 0, policy_version 503760 (0.0130) [2024-06-15 17:43:29,127][1652491] Updated weights for policy 0, policy_version 503806 (0.0012) [2024-06-15 17:43:30,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.5, 300 sec: 47097.1). Total num frames: 1031798784. Throughput: 0: 11548.5. Samples: 258023936. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:43:30,955][1648985] Avg episode reward: [(0, '149.140')] [2024-06-15 17:43:32,931][1652491] Updated weights for policy 0, policy_version 503888 (0.0013) [2024-06-15 17:43:35,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1032060928. Throughput: 0: 11582.5. Samples: 258091520. 
Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:43:35,956][1648985] Avg episode reward: [(0, '149.820')] [2024-06-15 17:43:37,847][1652491] Updated weights for policy 0, policy_version 503939 (0.0014) [2024-06-15 17:43:39,055][1651469] Signal inference workers to stop experience collection... (26250 times) [2024-06-15 17:43:39,089][1652491] InferenceWorker_p0-w0: stopping experience collection (26250 times) [2024-06-15 17:43:39,301][1651469] Signal inference workers to resume experience collection... (26250 times) [2024-06-15 17:43:39,302][1652491] InferenceWorker_p0-w0: resuming experience collection (26250 times) [2024-06-15 17:43:39,406][1652491] Updated weights for policy 0, policy_version 504016 (0.0135) [2024-06-15 17:43:40,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1032323072. Throughput: 0: 11525.7. Samples: 258127872. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:43:40,956][1648985] Avg episode reward: [(0, '173.380')] [2024-06-15 17:43:42,933][1652491] Updated weights for policy 0, policy_version 504083 (0.0015) [2024-06-15 17:43:44,302][1652491] Updated weights for policy 0, policy_version 504145 (0.0011) [2024-06-15 17:43:45,241][1652491] Updated weights for policy 0, policy_version 504187 (0.0038) [2024-06-15 17:43:45,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 47513.7, 300 sec: 46874.9). Total num frames: 1032585216. Throughput: 0: 11776.0. Samples: 258196480. Policy #0 lag: (min: 15.0, avg: 103.2, max: 271.0) [2024-06-15 17:43:45,955][1648985] Avg episode reward: [(0, '162.740')] [2024-06-15 17:43:49,580][1652491] Updated weights for policy 0, policy_version 504240 (0.0013) [2024-06-15 17:43:50,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 46967.3, 300 sec: 46874.9). Total num frames: 1032781824. Throughput: 0: 11662.1. Samples: 258269184. 
Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:43:50,956][1648985] Avg episode reward: [(0, '161.330')] [2024-06-15 17:43:50,981][1652491] Updated weights for policy 0, policy_version 504304 (0.0091) [2024-06-15 17:43:53,234][1652491] Updated weights for policy 0, policy_version 504340 (0.0060) [2024-06-15 17:43:54,116][1652491] Updated weights for policy 0, policy_version 504384 (0.0022) [2024-06-15 17:43:55,620][1652491] Updated weights for policy 0, policy_version 504440 (0.0017) [2024-06-15 17:43:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.9, 300 sec: 47097.1). Total num frames: 1033109504. Throughput: 0: 11673.6. Samples: 258308096. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:43:55,956][1648985] Avg episode reward: [(0, '149.440')] [2024-06-15 17:44:00,040][1652491] Updated weights for policy 0, policy_version 504496 (0.0033) [2024-06-15 17:44:00,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 1033273344. Throughput: 0: 11889.8. Samples: 258387456. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:44:00,956][1648985] Avg episode reward: [(0, '155.120')] [2024-06-15 17:44:01,685][1652491] Updated weights for policy 0, policy_version 504566 (0.0012) [2024-06-15 17:44:04,295][1652491] Updated weights for policy 0, policy_version 504610 (0.0019) [2024-06-15 17:44:05,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1033535488. Throughput: 0: 11878.4. Samples: 258453504. 
Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:44:05,955][1648985] Avg episode reward: [(0, '161.150')] [2024-06-15 17:44:05,997][1652491] Updated weights for policy 0, policy_version 504660 (0.0013) [2024-06-15 17:44:09,544][1652491] Updated weights for policy 0, policy_version 504707 (0.0012) [2024-06-15 17:44:10,548][1652491] Updated weights for policy 0, policy_version 504766 (0.0013) [2024-06-15 17:44:10,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 1033764864. Throughput: 0: 12162.9. Samples: 258497024. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:44:10,956][1648985] Avg episode reward: [(0, '150.350')] [2024-06-15 17:44:12,492][1652491] Updated weights for policy 0, policy_version 504816 (0.0081) [2024-06-15 17:44:14,177][1652491] Updated weights for policy 0, policy_version 504864 (0.0013) [2024-06-15 17:44:15,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 49698.2, 300 sec: 47430.3). Total num frames: 1034027008. Throughput: 0: 12003.6. Samples: 258564096. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:44:15,956][1648985] Avg episode reward: [(0, '155.370')] [2024-06-15 17:44:16,289][1652491] Updated weights for policy 0, policy_version 504912 (0.0014) [2024-06-15 17:44:20,186][1652491] Updated weights for policy 0, policy_version 504976 (0.0020) [2024-06-15 17:44:20,595][1651469] Signal inference workers to stop experience collection... (26300 times) [2024-06-15 17:44:20,654][1652491] InferenceWorker_p0-w0: stopping experience collection (26300 times) [2024-06-15 17:44:20,818][1651469] Signal inference workers to resume experience collection... (26300 times) [2024-06-15 17:44:20,818][1652491] InferenceWorker_p0-w0: resuming experience collection (26300 times) [2024-06-15 17:44:20,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 1034256384. Throughput: 0: 12276.7. Samples: 258643968. 
Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:44:20,956][1648985] Avg episode reward: [(0, '143.680')] [2024-06-15 17:44:22,659][1652491] Updated weights for policy 0, policy_version 505043 (0.0015) [2024-06-15 17:44:23,397][1652491] Updated weights for policy 0, policy_version 505088 (0.0013) [2024-06-15 17:44:25,219][1652491] Updated weights for policy 0, policy_version 505143 (0.0015) [2024-06-15 17:44:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 50244.6, 300 sec: 47541.4). Total num frames: 1034551296. Throughput: 0: 12231.1. Samples: 258678272. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:44:25,955][1648985] Avg episode reward: [(0, '122.230')] [2024-06-15 17:44:27,893][1652491] Updated weights for policy 0, policy_version 505205 (0.0013) [2024-06-15 17:44:30,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1034682368. Throughput: 0: 12401.7. Samples: 258754560. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:44:30,956][1648985] Avg episode reward: [(0, '124.830')] [2024-06-15 17:44:31,945][1652491] Updated weights for policy 0, policy_version 505264 (0.0015) [2024-06-15 17:44:33,803][1652491] Updated weights for policy 0, policy_version 505328 (0.0013) [2024-06-15 17:44:35,431][1652491] Updated weights for policy 0, policy_version 505361 (0.0015) [2024-06-15 17:44:35,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 49152.2, 300 sec: 47319.2). Total num frames: 1035010048. Throughput: 0: 12322.2. Samples: 258823680. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:44:35,955][1648985] Avg episode reward: [(0, '144.270')] [2024-06-15 17:44:37,518][1652491] Updated weights for policy 0, policy_version 505424 (0.0012) [2024-06-15 17:44:38,738][1652491] Updated weights for policy 0, policy_version 505472 (0.0014) [2024-06-15 17:44:40,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1035206656. 
Throughput: 0: 12094.6. Samples: 258852352. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:44:40,956][1648985] Avg episode reward: [(0, '154.270')] [2024-06-15 17:44:43,008][1652491] Updated weights for policy 0, policy_version 505536 (0.0108) [2024-06-15 17:44:45,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1035468800. Throughput: 0: 12026.3. Samples: 258928640. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:44:45,956][1648985] Avg episode reward: [(0, '142.420')] [2024-06-15 17:44:47,462][1652491] Updated weights for policy 0, policy_version 505648 (0.0028) [2024-06-15 17:44:49,318][1652491] Updated weights for policy 0, policy_version 505684 (0.0048) [2024-06-15 17:44:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 49152.2, 300 sec: 47541.4). Total num frames: 1035730944. Throughput: 0: 12060.4. Samples: 258996224. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:44:50,955][1648985] Avg episode reward: [(0, '124.440')] [2024-06-15 17:44:53,067][1652491] Updated weights for policy 0, policy_version 505744 (0.0013) [2024-06-15 17:44:55,743][1652491] Updated weights for policy 0, policy_version 505793 (0.0040) [2024-06-15 17:44:55,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 45874.9, 300 sec: 46985.9). Total num frames: 1035862016. Throughput: 0: 11855.6. Samples: 259030528. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:44:55,956][1648985] Avg episode reward: [(0, '119.980')] [2024-06-15 17:44:56,376][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000505824_1035927552.pth... 
[2024-06-15 17:44:56,539][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000500304_1024622592.pth [2024-06-15 17:44:57,362][1652491] Updated weights for policy 0, policy_version 505856 (0.0012) [2024-06-15 17:44:59,204][1652491] Updated weights for policy 0, policy_version 505918 (0.0118) [2024-06-15 17:45:00,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1036124160. Throughput: 0: 11787.4. Samples: 259094528. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:45:00,956][1648985] Avg episode reward: [(0, '144.390')] [2024-06-15 17:45:02,399][1652491] Updated weights for policy 0, policy_version 505975 (0.0036) [2024-06-15 17:45:05,678][1652491] Updated weights for policy 0, policy_version 506042 (0.0014) [2024-06-15 17:45:05,955][1648985] Fps is (10 sec: 52430.7, 60 sec: 47513.6, 300 sec: 47319.2). Total num frames: 1036386304. Throughput: 0: 11582.6. Samples: 259165184. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:45:05,956][1648985] Avg episode reward: [(0, '159.740')] [2024-06-15 17:45:08,210][1651469] Signal inference workers to stop experience collection... (26350 times) [2024-06-15 17:45:08,307][1652491] InferenceWorker_p0-w0: stopping experience collection (26350 times) [2024-06-15 17:45:08,309][1652491] Updated weights for policy 0, policy_version 506071 (0.0012) [2024-06-15 17:45:08,452][1651469] Signal inference workers to resume experience collection... (26350 times) [2024-06-15 17:45:08,453][1652491] InferenceWorker_p0-w0: resuming experience collection (26350 times) [2024-06-15 17:45:10,126][1652491] Updated weights for policy 0, policy_version 506131 (0.0013) [2024-06-15 17:45:10,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1036648448. Throughput: 0: 11605.3. Samples: 259200512. 
Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:45:10,955][1648985] Avg episode reward: [(0, '164.700')] [2024-06-15 17:45:13,768][1652491] Updated weights for policy 0, policy_version 506224 (0.0125) [2024-06-15 17:45:15,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1036779520. Throughput: 0: 11298.2. Samples: 259262976. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 17:45:15,955][1648985] Avg episode reward: [(0, '174.610')] [2024-06-15 17:45:16,662][1652491] Updated weights for policy 0, policy_version 506256 (0.0011) [2024-06-15 17:45:17,503][1652491] Updated weights for policy 0, policy_version 506292 (0.0022) [2024-06-15 17:45:20,251][1652491] Updated weights for policy 0, policy_version 506352 (0.0014) [2024-06-15 17:45:20,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1037041664. Throughput: 0: 11366.4. Samples: 259335168. Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:45:20,955][1648985] Avg episode reward: [(0, '169.920')] [2024-06-15 17:45:22,163][1652491] Updated weights for policy 0, policy_version 506416 (0.0013) [2024-06-15 17:45:24,832][1652491] Updated weights for policy 0, policy_version 506464 (0.0013) [2024-06-15 17:45:25,702][1652491] Updated weights for policy 0, policy_version 506496 (0.0013) [2024-06-15 17:45:25,957][1648985] Fps is (10 sec: 52416.2, 60 sec: 45873.3, 300 sec: 47318.8). Total num frames: 1037303808. Throughput: 0: 11491.0. Samples: 259369472. Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:45:25,959][1648985] Avg episode reward: [(0, '157.010')] [2024-06-15 17:45:28,640][1652491] Updated weights for policy 0, policy_version 506549 (0.0013) [2024-06-15 17:45:30,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 1037434880. Throughput: 0: 11389.2. Samples: 259441152. 
Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:45:30,955][1648985] Avg episode reward: [(0, '168.030')] [2024-06-15 17:45:32,015][1652491] Updated weights for policy 0, policy_version 506618 (0.0014) [2024-06-15 17:45:33,843][1652491] Updated weights for policy 0, policy_version 506688 (0.0137) [2024-06-15 17:45:35,955][1648985] Fps is (10 sec: 45886.1, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 1037762560. Throughput: 0: 11309.5. Samples: 259505152. Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:45:35,956][1648985] Avg episode reward: [(0, '162.160')] [2024-06-15 17:45:36,516][1652491] Updated weights for policy 0, policy_version 506747 (0.0012) [2024-06-15 17:45:39,755][1652491] Updated weights for policy 0, policy_version 506815 (0.0012) [2024-06-15 17:45:40,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1037959168. Throughput: 0: 11377.9. Samples: 259542528. Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:45:40,956][1648985] Avg episode reward: [(0, '160.630')] [2024-06-15 17:45:43,701][1652491] Updated weights for policy 0, policy_version 506869 (0.0013) [2024-06-15 17:45:44,638][1652491] Updated weights for policy 0, policy_version 506913 (0.0065) [2024-06-15 17:45:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 1038221312. Throughput: 0: 11548.5. Samples: 259614208. Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:45:45,955][1648985] Avg episode reward: [(0, '153.520')] [2024-06-15 17:45:46,913][1652491] Updated weights for policy 0, policy_version 506980 (0.0015) [2024-06-15 17:45:48,941][1652491] Updated weights for policy 0, policy_version 507024 (0.0014) [2024-06-15 17:45:50,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.1, 300 sec: 47097.1). Total num frames: 1038483456. Throughput: 0: 11559.8. Samples: 259685376. 
Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:45:50,957][1648985] Avg episode reward: [(0, '140.840')] [2024-06-15 17:45:53,800][1652491] Updated weights for policy 0, policy_version 507092 (0.0013) [2024-06-15 17:45:54,161][1651469] Signal inference workers to stop experience collection... (26400 times) [2024-06-15 17:45:54,225][1652491] InferenceWorker_p0-w0: stopping experience collection (26400 times) [2024-06-15 17:45:54,409][1651469] Signal inference workers to resume experience collection... (26400 times) [2024-06-15 17:45:54,409][1652491] InferenceWorker_p0-w0: resuming experience collection (26400 times) [2024-06-15 17:45:54,932][1652491] Updated weights for policy 0, policy_version 507136 (0.0013) [2024-06-15 17:45:55,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.7, 300 sec: 47319.2). Total num frames: 1038680064. Throughput: 0: 11719.1. Samples: 259727872. Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:45:55,956][1648985] Avg episode reward: [(0, '135.750')] [2024-06-15 17:45:56,313][1652491] Updated weights for policy 0, policy_version 507195 (0.0235) [2024-06-15 17:45:58,665][1652491] Updated weights for policy 0, policy_version 507256 (0.0015) [2024-06-15 17:46:00,768][1652491] Updated weights for policy 0, policy_version 507299 (0.0012) [2024-06-15 17:46:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 1038942208. Throughput: 0: 11741.8. Samples: 259791360. Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:46:00,956][1648985] Avg episode reward: [(0, '154.320')] [2024-06-15 17:46:04,493][1652491] Updated weights for policy 0, policy_version 507329 (0.0028) [2024-06-15 17:46:05,814][1652491] Updated weights for policy 0, policy_version 507385 (0.0010) [2024-06-15 17:46:05,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1039138816. Throughput: 0: 11821.5. Samples: 259867136. 
Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:46:05,956][1648985] Avg episode reward: [(0, '144.990')] [2024-06-15 17:46:07,169][1652491] Updated weights for policy 0, policy_version 507443 (0.0014) [2024-06-15 17:46:09,512][1652491] Updated weights for policy 0, policy_version 507504 (0.0013) [2024-06-15 17:46:10,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45875.1, 300 sec: 47097.1). Total num frames: 1039400960. Throughput: 0: 11765.2. Samples: 259898880. Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:46:10,956][1648985] Avg episode reward: [(0, '145.530')] [2024-06-15 17:46:12,411][1652491] Updated weights for policy 0, policy_version 507579 (0.0016) [2024-06-15 17:46:15,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1039532032. Throughput: 0: 11798.7. Samples: 259972096. Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:46:15,956][1648985] Avg episode reward: [(0, '157.250')] [2024-06-15 17:46:17,313][1652491] Updated weights for policy 0, policy_version 507648 (0.0013) [2024-06-15 17:46:18,548][1652491] Updated weights for policy 0, policy_version 507709 (0.0033) [2024-06-15 17:46:20,547][1652491] Updated weights for policy 0, policy_version 507771 (0.0013) [2024-06-15 17:46:20,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1039925248. Throughput: 0: 11844.3. Samples: 260038144. Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:46:20,955][1648985] Avg episode reward: [(0, '152.810')] [2024-06-15 17:46:23,821][1652491] Updated weights for policy 0, policy_version 507840 (0.0015) [2024-06-15 17:46:25,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45877.0, 300 sec: 46652.8). Total num frames: 1040056320. Throughput: 0: 11741.9. Samples: 260070912. 
Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:46:25,956][1648985] Avg episode reward: [(0, '157.940')] [2024-06-15 17:46:29,842][1652491] Updated weights for policy 0, policy_version 507941 (0.0012) [2024-06-15 17:46:30,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 1040318464. Throughput: 0: 11707.7. Samples: 260141056. Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:46:30,956][1648985] Avg episode reward: [(0, '162.340')] [2024-06-15 17:46:31,391][1652491] Updated weights for policy 0, policy_version 507984 (0.0013) [2024-06-15 17:46:33,723][1652491] Updated weights for policy 0, policy_version 508048 (0.0012) [2024-06-15 17:46:34,667][1652491] Updated weights for policy 0, policy_version 508096 (0.0015) [2024-06-15 17:46:35,960][1648985] Fps is (10 sec: 52405.0, 60 sec: 46963.9, 300 sec: 46874.2). Total num frames: 1040580608. Throughput: 0: 11661.1. Samples: 260210176. Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:46:35,961][1648985] Avg episode reward: [(0, '180.110')] [2024-06-15 17:46:39,381][1651469] Signal inference workers to stop experience collection... (26450 times) [2024-06-15 17:46:39,480][1652491] InferenceWorker_p0-w0: stopping experience collection (26450 times) [2024-06-15 17:46:39,694][1651469] Signal inference workers to resume experience collection... (26450 times) [2024-06-15 17:46:39,702][1652491] InferenceWorker_p0-w0: resuming experience collection (26450 times) [2024-06-15 17:46:40,962][1648985] Fps is (10 sec: 39293.2, 60 sec: 45869.7, 300 sec: 46762.7). Total num frames: 1040711680. Throughput: 0: 11683.1. Samples: 260253696. 
Policy #0 lag: (min: 15.0, avg: 114.7, max: 271.0) [2024-06-15 17:46:40,963][1648985] Avg episode reward: [(0, '163.170')] [2024-06-15 17:46:41,424][1652491] Updated weights for policy 0, policy_version 508180 (0.0137) [2024-06-15 17:46:42,645][1652491] Updated weights for policy 0, policy_version 508230 (0.0012) [2024-06-15 17:46:45,505][1652491] Updated weights for policy 0, policy_version 508304 (0.0012) [2024-06-15 17:46:45,955][1648985] Fps is (10 sec: 45895.7, 60 sec: 46967.4, 300 sec: 46875.7). Total num frames: 1041039360. Throughput: 0: 11537.1. Samples: 260310528. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:46:45,956][1648985] Avg episode reward: [(0, '150.630')] [2024-06-15 17:46:50,955][1648985] Fps is (10 sec: 39349.9, 60 sec: 43690.7, 300 sec: 46541.7). Total num frames: 1041104896. Throughput: 0: 11525.7. Samples: 260385792. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:46:50,956][1648985] Avg episode reward: [(0, '152.770')] [2024-06-15 17:46:51,484][1652491] Updated weights for policy 0, policy_version 508370 (0.0018) [2024-06-15 17:46:53,213][1652491] Updated weights for policy 0, policy_version 508433 (0.0078) [2024-06-15 17:46:54,346][1652491] Updated weights for policy 0, policy_version 508481 (0.0013) [2024-06-15 17:46:55,694][1652491] Updated weights for policy 0, policy_version 508542 (0.0012) [2024-06-15 17:46:55,955][1648985] Fps is (10 sec: 45873.7, 60 sec: 46967.2, 300 sec: 46763.7). Total num frames: 1041498112. Throughput: 0: 11366.3. Samples: 260410368. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:46:55,956][1648985] Avg episode reward: [(0, '161.060')] [2024-06-15 17:46:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000508544_1041498112.pth... 
[2024-06-15 17:46:56,016][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000503040_1030225920.pth [2024-06-15 17:46:57,978][1652491] Updated weights for policy 0, policy_version 508601 (0.0014) [2024-06-15 17:47:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 44783.0, 300 sec: 46652.7). Total num frames: 1041629184. Throughput: 0: 11298.1. Samples: 260480512. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:47:00,956][1648985] Avg episode reward: [(0, '156.150')] [2024-06-15 17:47:03,629][1652491] Updated weights for policy 0, policy_version 508641 (0.0014) [2024-06-15 17:47:05,306][1652491] Updated weights for policy 0, policy_version 508705 (0.0023) [2024-06-15 17:47:05,955][1648985] Fps is (10 sec: 39323.4, 60 sec: 45875.3, 300 sec: 46652.9). Total num frames: 1041891328. Throughput: 0: 11309.5. Samples: 260547072. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:47:05,956][1648985] Avg episode reward: [(0, '148.830')] [2024-06-15 17:47:06,126][1652491] Updated weights for policy 0, policy_version 508752 (0.0014) [2024-06-15 17:47:08,448][1652491] Updated weights for policy 0, policy_version 508816 (0.0013) [2024-06-15 17:47:10,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1042153472. Throughput: 0: 11298.1. Samples: 260579328. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:47:10,956][1648985] Avg episode reward: [(0, '150.930')] [2024-06-15 17:47:14,510][1652491] Updated weights for policy 0, policy_version 508880 (0.0014) [2024-06-15 17:47:15,926][1652491] Updated weights for policy 0, policy_version 508931 (0.0014) [2024-06-15 17:47:15,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 1042284544. Throughput: 0: 11434.7. Samples: 260655616. 
Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:47:15,955][1648985] Avg episode reward: [(0, '145.510')] [2024-06-15 17:47:18,038][1651469] Signal inference workers to stop experience collection... (26500 times) [2024-06-15 17:47:18,075][1652491] InferenceWorker_p0-w0: stopping experience collection (26500 times) [2024-06-15 17:47:18,273][1651469] Signal inference workers to resume experience collection... (26500 times) [2024-06-15 17:47:18,295][1652491] InferenceWorker_p0-w0: resuming experience collection (26500 times) [2024-06-15 17:47:18,297][1652491] Updated weights for policy 0, policy_version 509024 (0.0013) [2024-06-15 17:47:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 44782.9, 300 sec: 46652.8). Total num frames: 1042612224. Throughput: 0: 11185.5. Samples: 260713472. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:47:20,955][1648985] Avg episode reward: [(0, '135.670')] [2024-06-15 17:47:21,381][1652491] Updated weights for policy 0, policy_version 509104 (0.0161) [2024-06-15 17:47:25,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 43690.6, 300 sec: 46208.5). Total num frames: 1042677760. Throughput: 0: 10981.3. Samples: 260747776. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:47:25,956][1648985] Avg episode reward: [(0, '138.440')] [2024-06-15 17:47:27,836][1652491] Updated weights for policy 0, policy_version 509168 (0.0015) [2024-06-15 17:47:29,620][1652491] Updated weights for policy 0, policy_version 509236 (0.0013) [2024-06-15 17:47:30,957][1648985] Fps is (10 sec: 39312.0, 60 sec: 44781.1, 300 sec: 46430.2). Total num frames: 1043005440. Throughput: 0: 11081.4. Samples: 260809216. 
Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:47:30,958][1648985] Avg episode reward: [(0, '132.190')] [2024-06-15 17:47:31,242][1652491] Updated weights for policy 0, policy_version 509308 (0.0014) [2024-06-15 17:47:33,581][1652491] Updated weights for policy 0, policy_version 509360 (0.0013) [2024-06-15 17:47:35,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 43694.0, 300 sec: 46652.7). Total num frames: 1043202048. Throughput: 0: 10911.3. Samples: 260876800. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:47:35,955][1648985] Avg episode reward: [(0, '135.520')] [2024-06-15 17:47:40,238][1652491] Updated weights for policy 0, policy_version 509440 (0.0013) [2024-06-15 17:47:40,955][1648985] Fps is (10 sec: 36053.4, 60 sec: 44242.1, 300 sec: 46208.4). Total num frames: 1043365888. Throughput: 0: 11332.4. Samples: 260920320. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:47:40,956][1648985] Avg episode reward: [(0, '143.620')] [2024-06-15 17:47:42,382][1652491] Updated weights for policy 0, policy_version 509521 (0.0014) [2024-06-15 17:47:43,241][1652491] Updated weights for policy 0, policy_version 509566 (0.0012) [2024-06-15 17:47:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 44782.9, 300 sec: 46652.7). Total num frames: 1043726336. Throughput: 0: 10956.8. Samples: 260973568. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:47:45,956][1648985] Avg episode reward: [(0, '160.240')] [2024-06-15 17:47:50,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 1043759104. Throughput: 0: 11241.2. Samples: 261052928. 
Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:47:50,955][1648985] Avg episode reward: [(0, '140.030')] [2024-06-15 17:47:51,037][1652491] Updated weights for policy 0, policy_version 509651 (0.0014) [2024-06-15 17:47:52,490][1652491] Updated weights for policy 0, policy_version 509713 (0.0014) [2024-06-15 17:47:53,866][1652491] Updated weights for policy 0, policy_version 509778 (0.0011) [2024-06-15 17:47:55,460][1652491] Updated weights for policy 0, policy_version 509841 (0.0013) [2024-06-15 17:47:55,958][1648985] Fps is (10 sec: 45861.1, 60 sec: 44780.9, 300 sec: 46430.1). Total num frames: 1044185088. Throughput: 0: 11149.4. Samples: 261081088. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:47:55,959][1648985] Avg episode reward: [(0, '135.160')] [2024-06-15 17:48:00,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 43690.6, 300 sec: 46097.3). Total num frames: 1044250624. Throughput: 0: 10990.9. Samples: 261150208. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:48:00,956][1648985] Avg episode reward: [(0, '145.480')] [2024-06-15 17:48:01,921][1652491] Updated weights for policy 0, policy_version 509889 (0.0012) [2024-06-15 17:48:02,732][1651469] Signal inference workers to stop experience collection... (26550 times) [2024-06-15 17:48:02,766][1652491] InferenceWorker_p0-w0: stopping experience collection (26550 times) [2024-06-15 17:48:03,043][1651469] Signal inference workers to resume experience collection... (26550 times) [2024-06-15 17:48:03,044][1652491] InferenceWorker_p0-w0: resuming experience collection (26550 times) [2024-06-15 17:48:03,883][1652491] Updated weights for policy 0, policy_version 509968 (0.0012) [2024-06-15 17:48:05,508][1652491] Updated weights for policy 0, policy_version 510032 (0.0011) [2024-06-15 17:48:05,955][1648985] Fps is (10 sec: 39334.2, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 1044578304. Throughput: 0: 11138.8. Samples: 261214720. 
Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:48:05,955][1648985] Avg episode reward: [(0, '168.120')] [2024-06-15 17:48:06,978][1652491] Updated weights for policy 0, policy_version 510096 (0.0012) [2024-06-15 17:48:10,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 43690.7, 300 sec: 46541.7). Total num frames: 1044774912. Throughput: 0: 11104.7. Samples: 261247488. Policy #0 lag: (min: 15.0, avg: 157.2, max: 271.0) [2024-06-15 17:48:10,955][1648985] Avg episode reward: [(0, '155.330')] [2024-06-15 17:48:13,634][1652491] Updated weights for policy 0, policy_version 510145 (0.0011) [2024-06-15 17:48:15,462][1652491] Updated weights for policy 0, policy_version 510212 (0.0012) [2024-06-15 17:48:15,955][1648985] Fps is (10 sec: 36044.2, 60 sec: 44236.6, 300 sec: 45986.3). Total num frames: 1044938752. Throughput: 0: 11401.1. Samples: 261322240. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:48:15,956][1648985] Avg episode reward: [(0, '141.390')] [2024-06-15 17:48:16,970][1652491] Updated weights for policy 0, policy_version 510272 (0.0090) [2024-06-15 17:48:18,110][1652491] Updated weights for policy 0, policy_version 510325 (0.0012) [2024-06-15 17:48:19,622][1652491] Updated weights for policy 0, policy_version 510389 (0.0013) [2024-06-15 17:48:20,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 44782.8, 300 sec: 46652.8). Total num frames: 1045299200. Throughput: 0: 11218.5. Samples: 261381632. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:48:20,956][1648985] Avg episode reward: [(0, '133.010')] [2024-06-15 17:48:25,847][1652491] Updated weights for policy 0, policy_version 510433 (0.0011) [2024-06-15 17:48:25,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 44783.0, 300 sec: 45986.3). Total num frames: 1045364736. Throughput: 0: 11138.9. Samples: 261421568. 
Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:48:25,955][1648985] Avg episode reward: [(0, '138.790')] [2024-06-15 17:48:26,918][1652491] Updated weights for policy 0, policy_version 510480 (0.0014) [2024-06-15 17:48:28,133][1652491] Updated weights for policy 0, policy_version 510528 (0.0024) [2024-06-15 17:48:30,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45330.9, 300 sec: 46319.5). Total num frames: 1045725184. Throughput: 0: 11309.5. Samples: 261482496. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:48:30,956][1648985] Avg episode reward: [(0, '150.390')] [2024-06-15 17:48:31,083][1652491] Updated weights for policy 0, policy_version 510609 (0.0149) [2024-06-15 17:48:31,964][1652491] Updated weights for policy 0, policy_version 510656 (0.0013) [2024-06-15 17:48:35,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 1045823488. Throughput: 0: 11070.5. Samples: 261551104. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:48:35,957][1648985] Avg episode reward: [(0, '144.110')] [2024-06-15 17:48:37,745][1652491] Updated weights for policy 0, policy_version 510720 (0.0015) [2024-06-15 17:48:39,502][1652491] Updated weights for policy 0, policy_version 510776 (0.0014) [2024-06-15 17:48:40,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 1046118400. Throughput: 0: 11162.4. Samples: 261583360. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:48:40,956][1648985] Avg episode reward: [(0, '154.660')] [2024-06-15 17:48:40,959][1652491] Updated weights for policy 0, policy_version 510804 (0.0062) [2024-06-15 17:48:41,502][1651469] Signal inference workers to stop experience collection... (26600 times) [2024-06-15 17:48:41,564][1652491] InferenceWorker_p0-w0: stopping experience collection (26600 times) [2024-06-15 17:48:41,753][1651469] Signal inference workers to resume experience collection... 
(26600 times) [2024-06-15 17:48:41,754][1652491] InferenceWorker_p0-w0: resuming experience collection (26600 times) [2024-06-15 17:48:42,521][1652491] Updated weights for policy 0, policy_version 510883 (0.0180) [2024-06-15 17:48:45,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 1046347776. Throughput: 0: 11207.1. Samples: 261654528. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:48:45,956][1648985] Avg episode reward: [(0, '146.990')] [2024-06-15 17:48:47,840][1652491] Updated weights for policy 0, policy_version 510936 (0.0013) [2024-06-15 17:48:50,050][1652491] Updated weights for policy 0, policy_version 511008 (0.0097) [2024-06-15 17:48:50,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 47513.6, 300 sec: 45764.1). Total num frames: 1046609920. Throughput: 0: 11400.5. Samples: 261727744. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:48:50,955][1648985] Avg episode reward: [(0, '129.070')] [2024-06-15 17:48:51,602][1652491] Updated weights for policy 0, policy_version 511056 (0.0013) [2024-06-15 17:48:53,389][1652491] Updated weights for policy 0, policy_version 511136 (0.0090) [2024-06-15 17:48:55,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 44785.2, 300 sec: 46097.3). Total num frames: 1046872064. Throughput: 0: 11400.5. Samples: 261760512. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:48:55,956][1648985] Avg episode reward: [(0, '144.040')] [2024-06-15 17:48:56,000][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000511168_1046872064.pth... 
[2024-06-15 17:48:56,066][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000505824_1035927552.pth [2024-06-15 17:48:59,082][1652491] Updated weights for policy 0, policy_version 511203 (0.0016) [2024-06-15 17:49:00,753][1652491] Updated weights for policy 0, policy_version 511248 (0.0012) [2024-06-15 17:49:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.4, 300 sec: 45764.1). Total num frames: 1047035904. Throughput: 0: 11411.9. Samples: 261835776. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:49:00,956][1648985] Avg episode reward: [(0, '141.110')] [2024-06-15 17:49:02,169][1652491] Updated weights for policy 0, policy_version 511296 (0.0011) [2024-06-15 17:49:04,553][1652491] Updated weights for policy 0, policy_version 511362 (0.0013) [2024-06-15 17:49:05,532][1652491] Updated weights for policy 0, policy_version 511419 (0.0017) [2024-06-15 17:49:05,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.3, 300 sec: 46208.4). Total num frames: 1047396352. Throughput: 0: 11468.8. Samples: 261897728. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:49:05,956][1648985] Avg episode reward: [(0, '165.260')] [2024-06-15 17:49:10,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 44782.9, 300 sec: 45541.9). Total num frames: 1047461888. Throughput: 0: 11514.3. Samples: 261939712. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:49:10,957][1648985] Avg episode reward: [(0, '162.050')] [2024-06-15 17:49:11,465][1652491] Updated weights for policy 0, policy_version 511474 (0.0025) [2024-06-15 17:49:12,240][1652491] Updated weights for policy 0, policy_version 511504 (0.0011) [2024-06-15 17:49:14,230][1652491] Updated weights for policy 0, policy_version 511553 (0.0011) [2024-06-15 17:49:15,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 47513.7, 300 sec: 45875.2). Total num frames: 1047789568. Throughput: 0: 11662.2. Samples: 262007296. 
Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:49:15,955][1648985] Avg episode reward: [(0, '143.900')] [2024-06-15 17:49:16,320][1652491] Updated weights for policy 0, policy_version 511633 (0.0012) [2024-06-15 17:49:20,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 43690.8, 300 sec: 45319.8). Total num frames: 1047920640. Throughput: 0: 11662.3. Samples: 262075904. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:49:20,955][1648985] Avg episode reward: [(0, '136.300')] [2024-06-15 17:49:21,938][1652491] Updated weights for policy 0, policy_version 511712 (0.0014) [2024-06-15 17:49:23,869][1652491] Updated weights for policy 0, policy_version 511747 (0.0012) [2024-06-15 17:49:25,387][1652491] Updated weights for policy 0, policy_version 511805 (0.0144) [2024-06-15 17:49:25,779][1651469] Signal inference workers to stop experience collection... (26650 times) [2024-06-15 17:49:25,833][1652491] InferenceWorker_p0-w0: stopping experience collection (26650 times) [2024-06-15 17:49:25,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 46967.3, 300 sec: 45764.1). Total num frames: 1048182784. Throughput: 0: 11719.1. Samples: 262110720. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:49:25,956][1648985] Avg episode reward: [(0, '150.080')] [2024-06-15 17:49:26,090][1651469] Signal inference workers to resume experience collection... (26650 times) [2024-06-15 17:49:26,091][1652491] InferenceWorker_p0-w0: resuming experience collection (26650 times) [2024-06-15 17:49:26,960][1652491] Updated weights for policy 0, policy_version 511860 (0.0088) [2024-06-15 17:49:28,098][1652491] Updated weights for policy 0, policy_version 511912 (0.0014) [2024-06-15 17:49:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45329.0, 300 sec: 45542.0). Total num frames: 1048444928. Throughput: 0: 11673.6. Samples: 262179840. 
Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:49:30,956][1648985] Avg episode reward: [(0, '164.390')] [2024-06-15 17:49:32,072][1652491] Updated weights for policy 0, policy_version 511952 (0.0017) [2024-06-15 17:49:35,661][1652491] Updated weights for policy 0, policy_version 512018 (0.0015) [2024-06-15 17:49:35,962][1648985] Fps is (10 sec: 45843.0, 60 sec: 46961.9, 300 sec: 45540.9). Total num frames: 1048641536. Throughput: 0: 11637.6. Samples: 262251520. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:49:35,963][1648985] Avg episode reward: [(0, '176.270')] [2024-06-15 17:49:36,580][1652491] Updated weights for policy 0, policy_version 512058 (0.0011) [2024-06-15 17:49:38,017][1652491] Updated weights for policy 0, policy_version 512112 (0.0036) [2024-06-15 17:49:39,552][1652491] Updated weights for policy 0, policy_version 512189 (0.0012) [2024-06-15 17:49:40,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 47513.5, 300 sec: 45764.1). Total num frames: 1048969216. Throughput: 0: 11537.1. Samples: 262279680. Policy #0 lag: (min: 15.0, avg: 80.6, max: 271.0) [2024-06-15 17:49:40,956][1648985] Avg episode reward: [(0, '181.590')] [2024-06-15 17:49:44,453][1652491] Updated weights for policy 0, policy_version 512245 (0.0016) [2024-06-15 17:49:45,955][1648985] Fps is (10 sec: 45908.2, 60 sec: 45875.2, 300 sec: 45319.8). Total num frames: 1049100288. Throughput: 0: 11434.7. Samples: 262350336. 
Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:49:45,956][1648985] Avg episode reward: [(0, '143.750')] [2024-06-15 17:49:46,974][1652491] Updated weights for policy 0, policy_version 512274 (0.0012) [2024-06-15 17:49:47,896][1652491] Updated weights for policy 0, policy_version 512318 (0.0011) [2024-06-15 17:49:49,770][1652491] Updated weights for policy 0, policy_version 512400 (0.0149) [2024-06-15 17:49:50,810][1652491] Updated weights for policy 0, policy_version 512448 (0.0013) [2024-06-15 17:49:50,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.7, 300 sec: 46208.5). Total num frames: 1049493504. Throughput: 0: 11514.3. Samples: 262415872. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:49:50,956][1648985] Avg episode reward: [(0, '126.090')] [2024-06-15 17:49:55,901][1652491] Updated weights for policy 0, policy_version 512512 (0.0012) [2024-06-15 17:49:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 1049624576. Throughput: 0: 11514.3. Samples: 262457856. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:49:55,956][1648985] Avg episode reward: [(0, '141.410')] [2024-06-15 17:49:59,379][1652491] Updated weights for policy 0, policy_version 512579 (0.0013) [2024-06-15 17:50:00,540][1652491] Updated weights for policy 0, policy_version 512636 (0.0013) [2024-06-15 17:50:00,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 48059.8, 300 sec: 45875.2). Total num frames: 1049919488. Throughput: 0: 11525.7. Samples: 262525952. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:50:00,955][1648985] Avg episode reward: [(0, '150.870')] [2024-06-15 17:50:01,397][1652491] Updated weights for policy 0, policy_version 512676 (0.0012) [2024-06-15 17:50:05,802][1652491] Updated weights for policy 0, policy_version 512722 (0.0012) [2024-06-15 17:50:05,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44236.9, 300 sec: 45430.9). Total num frames: 1050050560. 
Throughput: 0: 11707.7. Samples: 262602752. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:50:05,956][1648985] Avg episode reward: [(0, '139.130')] [2024-06-15 17:50:06,077][1651469] Signal inference workers to stop experience collection... (26700 times) [2024-06-15 17:50:06,130][1652491] InferenceWorker_p0-w0: stopping experience collection (26700 times) [2024-06-15 17:50:06,315][1651469] Signal inference workers to resume experience collection... (26700 times) [2024-06-15 17:50:06,316][1652491] InferenceWorker_p0-w0: resuming experience collection (26700 times) [2024-06-15 17:50:08,048][1652491] Updated weights for policy 0, policy_version 512770 (0.0012) [2024-06-15 17:50:09,465][1652491] Updated weights for policy 0, policy_version 512832 (0.0021) [2024-06-15 17:50:10,974][1648985] Fps is (10 sec: 49057.2, 60 sec: 49136.4, 300 sec: 46205.4). Total num frames: 1050411008. Throughput: 0: 11759.7. Samples: 262640128. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:50:10,975][1648985] Avg episode reward: [(0, '122.330')] [2024-06-15 17:50:11,648][1652491] Updated weights for policy 0, policy_version 512900 (0.0014) [2024-06-15 17:50:12,864][1652491] Updated weights for policy 0, policy_version 512960 (0.0022) [2024-06-15 17:50:15,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 1050542080. Throughput: 0: 11753.3. Samples: 262708736. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:50:15,956][1648985] Avg episode reward: [(0, '116.330')] [2024-06-15 17:50:17,933][1652491] Updated weights for policy 0, policy_version 513008 (0.0036) [2024-06-15 17:50:20,388][1652491] Updated weights for policy 0, policy_version 513072 (0.0012) [2024-06-15 17:50:20,955][1648985] Fps is (10 sec: 39396.5, 60 sec: 48059.5, 300 sec: 45764.5). Total num frames: 1050804224. Throughput: 0: 11698.2. Samples: 262777856. 
Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:50:20,956][1648985] Avg episode reward: [(0, '118.640')] [2024-06-15 17:50:21,944][1652491] Updated weights for policy 0, policy_version 513143 (0.0013) [2024-06-15 17:50:24,149][1652491] Updated weights for policy 0, policy_version 513185 (0.0015) [2024-06-15 17:50:25,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.9, 300 sec: 46208.4). Total num frames: 1051066368. Throughput: 0: 11787.4. Samples: 262810112. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:50:25,956][1648985] Avg episode reward: [(0, '139.390')] [2024-06-15 17:50:28,635][1652491] Updated weights for policy 0, policy_version 513235 (0.0014) [2024-06-15 17:50:29,420][1652491] Updated weights for policy 0, policy_version 513270 (0.0012) [2024-06-15 17:50:30,461][1652491] Updated weights for policy 0, policy_version 513299 (0.0011) [2024-06-15 17:50:30,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 1051262976. Throughput: 0: 11935.3. Samples: 262887424. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:50:30,955][1648985] Avg episode reward: [(0, '152.370')] [2024-06-15 17:50:31,999][1652491] Updated weights for policy 0, policy_version 513360 (0.0011) [2024-06-15 17:50:34,681][1652491] Updated weights for policy 0, policy_version 513424 (0.0012) [2024-06-15 17:50:35,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 49157.8, 300 sec: 46208.4). Total num frames: 1051590656. Throughput: 0: 11935.3. Samples: 262952960. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:50:35,956][1648985] Avg episode reward: [(0, '158.450')] [2024-06-15 17:50:39,703][1652491] Updated weights for policy 0, policy_version 513504 (0.0013) [2024-06-15 17:50:40,832][1652491] Updated weights for policy 0, policy_version 513538 (0.0014) [2024-06-15 17:50:40,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 1051721728. 
Throughput: 0: 11946.7. Samples: 262995456. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:50:40,956][1648985] Avg episode reward: [(0, '161.940')] [2024-06-15 17:50:42,269][1652491] Updated weights for policy 0, policy_version 513600 (0.0029) [2024-06-15 17:50:43,947][1652491] Updated weights for policy 0, policy_version 513656 (0.0014) [2024-06-15 17:50:45,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 48605.9, 300 sec: 45875.2). Total num frames: 1052016640. Throughput: 0: 11980.8. Samples: 263065088. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:50:45,956][1648985] Avg episode reward: [(0, '150.470')] [2024-06-15 17:50:46,622][1652491] Updated weights for policy 0, policy_version 513726 (0.0015) [2024-06-15 17:50:50,436][1651469] Signal inference workers to stop experience collection... (26750 times) [2024-06-15 17:50:50,517][1652491] InferenceWorker_p0-w0: stopping experience collection (26750 times) [2024-06-15 17:50:50,633][1651469] Signal inference workers to resume experience collection... (26750 times) [2024-06-15 17:50:50,636][1652491] InferenceWorker_p0-w0: resuming experience collection (26750 times) [2024-06-15 17:50:50,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 45329.2, 300 sec: 45875.2). Total num frames: 1052213248. Throughput: 0: 11901.2. Samples: 263138304. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:50:50,955][1648985] Avg episode reward: [(0, '135.010')] [2024-06-15 17:50:51,067][1652491] Updated weights for policy 0, policy_version 513781 (0.0061) [2024-06-15 17:50:53,112][1652491] Updated weights for policy 0, policy_version 513847 (0.0020) [2024-06-15 17:50:54,698][1652491] Updated weights for policy 0, policy_version 513890 (0.0012) [2024-06-15 17:50:55,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 48059.7, 300 sec: 45986.3). Total num frames: 1052508160. Throughput: 0: 11735.5. Samples: 263168000. 
Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:50:55,956][1648985] Avg episode reward: [(0, '150.400')] [2024-06-15 17:50:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000513920_1052508160.pth... [2024-06-15 17:50:56,055][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000508544_1041498112.pth [2024-06-15 17:50:56,060][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000513920_1052508160.pth [2024-06-15 17:50:57,186][1652491] Updated weights for policy 0, policy_version 513922 (0.0012) [2024-06-15 17:51:00,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45329.0, 300 sec: 45764.1). Total num frames: 1052639232. Throughput: 0: 11730.5. Samples: 263236608. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:51:00,955][1648985] Avg episode reward: [(0, '147.960')] [2024-06-15 17:51:01,847][1652491] Updated weights for policy 0, policy_version 514000 (0.0024) [2024-06-15 17:51:03,473][1652491] Updated weights for policy 0, policy_version 514064 (0.0013) [2024-06-15 17:51:05,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 48059.8, 300 sec: 45875.2). Total num frames: 1052934144. Throughput: 0: 11650.9. Samples: 263302144. Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:51:05,955][1648985] Avg episode reward: [(0, '147.860')] [2024-06-15 17:51:06,019][1652491] Updated weights for policy 0, policy_version 514130 (0.0013) [2024-06-15 17:51:08,870][1652491] Updated weights for policy 0, policy_version 514192 (0.0011) [2024-06-15 17:51:09,799][1652491] Updated weights for policy 0, policy_version 514231 (0.0048) [2024-06-15 17:51:10,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 45889.7, 300 sec: 46208.4). Total num frames: 1053163520. Throughput: 0: 11707.7. Samples: 263336960. 
Policy #0 lag: (min: 15.0, avg: 120.4, max: 271.0) [2024-06-15 17:51:10,956][1648985] Avg episode reward: [(0, '145.400')] [2024-06-15 17:51:14,764][1652491] Updated weights for policy 0, policy_version 514288 (0.0013) [2024-06-15 17:51:15,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46421.4, 300 sec: 45430.9). Total num frames: 1053327360. Throughput: 0: 11605.3. Samples: 263409664. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:51:15,955][1648985] Avg episode reward: [(0, '138.730')] [2024-06-15 17:51:16,164][1652491] Updated weights for policy 0, policy_version 514339 (0.0017) [2024-06-15 17:51:17,285][1652491] Updated weights for policy 0, policy_version 514384 (0.0012) [2024-06-15 17:51:18,635][1652491] Updated weights for policy 0, policy_version 514432 (0.0012) [2024-06-15 17:51:20,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 48059.9, 300 sec: 46208.4). Total num frames: 1053687808. Throughput: 0: 11491.6. Samples: 263470080. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:51:20,956][1648985] Avg episode reward: [(0, '133.950')] [2024-06-15 17:51:25,146][1652491] Updated weights for policy 0, policy_version 514512 (0.0101) [2024-06-15 17:51:25,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45329.0, 300 sec: 45653.0). Total num frames: 1053786112. Throughput: 0: 11491.6. Samples: 263512576. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:51:25,956][1648985] Avg episode reward: [(0, '146.030')] [2024-06-15 17:51:26,535][1652491] Updated weights for policy 0, policy_version 514566 (0.0015) [2024-06-15 17:51:27,675][1652491] Updated weights for policy 0, policy_version 514618 (0.0013) [2024-06-15 17:51:29,807][1652491] Updated weights for policy 0, policy_version 514679 (0.0012) [2024-06-15 17:51:30,955][1648985] Fps is (10 sec: 39320.4, 60 sec: 46967.2, 300 sec: 45764.8). Total num frames: 1054081024. Throughput: 0: 11309.4. Samples: 263574016. 
Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:51:30,956][1648985] Avg episode reward: [(0, '149.910')] [2024-06-15 17:51:31,207][1652491] Updated weights for policy 0, policy_version 514711 (0.0011) [2024-06-15 17:51:35,980][1648985] Fps is (10 sec: 42493.1, 60 sec: 43672.7, 300 sec: 45761.4). Total num frames: 1054212096. Throughput: 0: 11451.1. Samples: 263653888. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:51:35,980][1648985] Avg episode reward: [(0, '147.640')] [2024-06-15 17:51:36,316][1651469] Signal inference workers to stop experience collection... (26800 times) [2024-06-15 17:51:36,385][1652491] InferenceWorker_p0-w0: stopping experience collection (26800 times) [2024-06-15 17:51:36,387][1652491] Updated weights for policy 0, policy_version 514758 (0.0018) [2024-06-15 17:51:36,492][1651469] Signal inference workers to resume experience collection... (26800 times) [2024-06-15 17:51:36,493][1652491] InferenceWorker_p0-w0: resuming experience collection (26800 times) [2024-06-15 17:51:38,252][1652491] Updated weights for policy 0, policy_version 514837 (0.0127) [2024-06-15 17:51:40,297][1652491] Updated weights for policy 0, policy_version 514881 (0.0012) [2024-06-15 17:51:40,955][1648985] Fps is (10 sec: 42599.8, 60 sec: 46421.4, 300 sec: 45653.1). Total num frames: 1054507008. Throughput: 0: 11389.2. Samples: 263680512. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:51:40,955][1648985] Avg episode reward: [(0, '153.290')] [2024-06-15 17:51:41,494][1652491] Updated weights for policy 0, policy_version 514930 (0.0012) [2024-06-15 17:51:42,267][1652491] Updated weights for policy 0, policy_version 514960 (0.0017) [2024-06-15 17:51:43,387][1652491] Updated weights for policy 0, policy_version 515008 (0.0038) [2024-06-15 17:51:45,955][1648985] Fps is (10 sec: 52559.5, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 1054736384. Throughput: 0: 11468.8. Samples: 263752704. 
Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:51:45,956][1648985] Avg episode reward: [(0, '157.990')] [2024-06-15 17:51:49,152][1652491] Updated weights for policy 0, policy_version 515072 (0.0018) [2024-06-15 17:51:50,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46421.3, 300 sec: 45764.2). Total num frames: 1054998528. Throughput: 0: 11605.3. Samples: 263824384. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:51:50,956][1648985] Avg episode reward: [(0, '179.550')] [2024-06-15 17:51:51,539][1652491] Updated weights for policy 0, policy_version 515152 (0.0013) [2024-06-15 17:51:52,479][1652491] Updated weights for policy 0, policy_version 515198 (0.0013) [2024-06-15 17:51:54,267][1652491] Updated weights for policy 0, policy_version 515264 (0.0013) [2024-06-15 17:51:55,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 1055260672. Throughput: 0: 11582.6. Samples: 263858176. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:51:55,956][1648985] Avg episode reward: [(0, '167.680')] [2024-06-15 17:51:59,777][1652491] Updated weights for policy 0, policy_version 515320 (0.0112) [2024-06-15 17:52:00,637][1652491] Updated weights for policy 0, policy_version 515350 (0.0013) [2024-06-15 17:52:00,956][1648985] Fps is (10 sec: 45874.0, 60 sec: 46967.2, 300 sec: 45986.2). Total num frames: 1055457280. Throughput: 0: 11707.6. Samples: 263936512. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:52:00,957][1648985] Avg episode reward: [(0, '171.810')] [2024-06-15 17:52:02,643][1652491] Updated weights for policy 0, policy_version 515424 (0.0033) [2024-06-15 17:52:03,966][1652491] Updated weights for policy 0, policy_version 515472 (0.0011) [2024-06-15 17:52:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.5, 300 sec: 46208.4). Total num frames: 1055784960. Throughput: 0: 11821.5. Samples: 264002048. 
Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:52:05,956][1648985] Avg episode reward: [(0, '164.660')] [2024-06-15 17:52:10,024][1652491] Updated weights for policy 0, policy_version 515552 (0.0180) [2024-06-15 17:52:10,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 1055916032. Throughput: 0: 11832.9. Samples: 264045056. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:52:10,956][1648985] Avg episode reward: [(0, '164.940')] [2024-06-15 17:52:12,475][1652491] Updated weights for policy 0, policy_version 515617 (0.0014) [2024-06-15 17:52:13,940][1652491] Updated weights for policy 0, policy_version 515652 (0.0011) [2024-06-15 17:52:14,924][1652491] Updated weights for policy 0, policy_version 515711 (0.0015) [2024-06-15 17:52:15,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48605.8, 300 sec: 46208.4). Total num frames: 1056243712. Throughput: 0: 11798.8. Samples: 264104960. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:52:15,956][1648985] Avg episode reward: [(0, '160.190')] [2024-06-15 17:52:16,264][1652491] Updated weights for policy 0, policy_version 515766 (0.0014) [2024-06-15 17:52:20,283][1651469] Signal inference workers to stop experience collection... (26850 times) [2024-06-15 17:52:20,324][1652491] InferenceWorker_p0-w0: stopping experience collection (26850 times) [2024-06-15 17:52:20,536][1651469] Signal inference workers to resume experience collection... (26850 times) [2024-06-15 17:52:20,537][1652491] InferenceWorker_p0-w0: resuming experience collection (26850 times) [2024-06-15 17:52:20,540][1652491] Updated weights for policy 0, policy_version 515792 (0.0015) [2024-06-15 17:52:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 44236.7, 300 sec: 46319.5). Total num frames: 1056342016. Throughput: 0: 11680.0. Samples: 264179200. 
Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:52:20,956][1648985] Avg episode reward: [(0, '162.490')] [2024-06-15 17:52:23,362][1652491] Updated weights for policy 0, policy_version 515856 (0.0014) [2024-06-15 17:52:25,189][1652491] Updated weights for policy 0, policy_version 515907 (0.0012) [2024-06-15 17:52:25,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48059.8, 300 sec: 46319.9). Total num frames: 1056669696. Throughput: 0: 11764.6. Samples: 264209920. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:52:25,956][1648985] Avg episode reward: [(0, '164.610')] [2024-06-15 17:52:26,288][1652491] Updated weights for policy 0, policy_version 515968 (0.0013) [2024-06-15 17:52:27,333][1652491] Updated weights for policy 0, policy_version 516029 (0.0014) [2024-06-15 17:52:30,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45875.4, 300 sec: 46208.4). Total num frames: 1056833536. Throughput: 0: 11844.2. Samples: 264285696. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:52:30,956][1648985] Avg episode reward: [(0, '142.860')] [2024-06-15 17:52:32,369][1652491] Updated weights for policy 0, policy_version 516092 (0.0012) [2024-06-15 17:52:34,790][1652491] Updated weights for policy 0, policy_version 516144 (0.0015) [2024-06-15 17:52:35,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48079.7, 300 sec: 46541.7). Total num frames: 1057095680. Throughput: 0: 11878.4. Samples: 264358912. Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:52:35,955][1648985] Avg episode reward: [(0, '130.700')] [2024-06-15 17:52:36,645][1652491] Updated weights for policy 0, policy_version 516195 (0.0014) [2024-06-15 17:52:37,791][1652491] Updated weights for policy 0, policy_version 516259 (0.0013) [2024-06-15 17:52:40,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 47513.4, 300 sec: 46208.4). Total num frames: 1057357824. Throughput: 0: 11821.5. Samples: 264390144. 
Policy #0 lag: (min: 0.0, avg: 74.1, max: 256.0) [2024-06-15 17:52:40,956][1648985] Avg episode reward: [(0, '130.000')] [2024-06-15 17:52:43,432][1652491] Updated weights for policy 0, policy_version 516324 (0.0140) [2024-06-15 17:52:45,287][1652491] Updated weights for policy 0, policy_version 516369 (0.0015) [2024-06-15 17:52:45,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.4, 300 sec: 46763.8). Total num frames: 1057554432. Throughput: 0: 11787.4. Samples: 264466944. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:52:45,956][1648985] Avg episode reward: [(0, '119.010')] [2024-06-15 17:52:47,702][1652491] Updated weights for policy 0, policy_version 516432 (0.0015) [2024-06-15 17:52:48,807][1652491] Updated weights for policy 0, policy_version 516480 (0.0018) [2024-06-15 17:52:50,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 48059.7, 300 sec: 46431.1). Total num frames: 1057882112. Throughput: 0: 11650.9. Samples: 264526336. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:52:50,956][1648985] Avg episode reward: [(0, '118.890')] [2024-06-15 17:52:54,339][1652491] Updated weights for policy 0, policy_version 516560 (0.0014) [2024-06-15 17:52:55,446][1652491] Updated weights for policy 0, policy_version 516603 (0.0012) [2024-06-15 17:52:55,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1058013184. Throughput: 0: 11582.6. Samples: 264566272. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:52:55,956][1648985] Avg episode reward: [(0, '108.510')] [2024-06-15 17:52:55,970][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000516608_1058013184.pth... 
[2024-06-15 17:52:56,175][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000511168_1046872064.pth [2024-06-15 17:52:57,440][1652491] Updated weights for policy 0, policy_version 516661 (0.0012) [2024-06-15 17:52:59,755][1652491] Updated weights for policy 0, policy_version 516732 (0.0013) [2024-06-15 17:53:00,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 1058340864. Throughput: 0: 11673.5. Samples: 264630272. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:53:00,956][1648985] Avg episode reward: [(0, '119.540')] [2024-06-15 17:53:01,039][1651469] Signal inference workers to stop experience collection... (26900 times) [2024-06-15 17:53:01,092][1652491] InferenceWorker_p0-w0: stopping experience collection (26900 times) [2024-06-15 17:53:01,097][1652491] Updated weights for policy 0, policy_version 516771 (0.0012) [2024-06-15 17:53:01,289][1651469] Signal inference workers to resume experience collection... (26900 times) [2024-06-15 17:53:01,290][1652491] InferenceWorker_p0-w0: resuming experience collection (26900 times) [2024-06-15 17:53:05,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 1058439168. Throughput: 0: 11628.1. Samples: 264702464. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:53:05,956][1648985] Avg episode reward: [(0, '123.560')] [2024-06-15 17:53:06,488][1652491] Updated weights for policy 0, policy_version 516848 (0.0012) [2024-06-15 17:53:08,415][1652491] Updated weights for policy 0, policy_version 516898 (0.0025) [2024-06-15 17:53:10,900][1652491] Updated weights for policy 0, policy_version 516960 (0.0106) [2024-06-15 17:53:10,955][1648985] Fps is (10 sec: 39322.5, 60 sec: 46967.6, 300 sec: 46763.8). Total num frames: 1058734080. Throughput: 0: 11639.5. Samples: 264733696. 
Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:53:10,956][1648985] Avg episode reward: [(0, '131.810')] [2024-06-15 17:53:12,616][1652491] Updated weights for policy 0, policy_version 517040 (0.0013) [2024-06-15 17:53:15,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 1058930688. Throughput: 0: 11548.5. Samples: 264805376. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:53:15,956][1648985] Avg episode reward: [(0, '138.020')] [2024-06-15 17:53:17,363][1652491] Updated weights for policy 0, policy_version 517091 (0.0013) [2024-06-15 17:53:20,275][1652491] Updated weights for policy 0, policy_version 517168 (0.0011) [2024-06-15 17:53:20,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 47513.8, 300 sec: 46874.9). Total num frames: 1059192832. Throughput: 0: 11366.4. Samples: 264870400. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:53:20,955][1648985] Avg episode reward: [(0, '146.400')] [2024-06-15 17:53:21,948][1652491] Updated weights for policy 0, policy_version 517185 (0.0012) [2024-06-15 17:53:24,108][1652491] Updated weights for policy 0, policy_version 517280 (0.0013) [2024-06-15 17:53:25,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 1059454976. Throughput: 0: 11423.3. Samples: 264904192. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:53:25,956][1648985] Avg episode reward: [(0, '146.610')] [2024-06-15 17:53:28,428][1652491] Updated weights for policy 0, policy_version 517332 (0.0013) [2024-06-15 17:53:30,846][1652491] Updated weights for policy 0, policy_version 517392 (0.0013) [2024-06-15 17:53:30,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 1059618816. Throughput: 0: 11366.4. Samples: 264978432. 
Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:53:30,955][1648985] Avg episode reward: [(0, '152.110')] [2024-06-15 17:53:32,047][1652491] Updated weights for policy 0, policy_version 517436 (0.0061) [2024-06-15 17:53:34,413][1652491] Updated weights for policy 0, policy_version 517491 (0.0013) [2024-06-15 17:53:35,836][1652491] Updated weights for policy 0, policy_version 517556 (0.0013) [2024-06-15 17:53:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 1059946496. Throughput: 0: 11468.8. Samples: 265042432. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:53:35,956][1648985] Avg episode reward: [(0, '166.750')] [2024-06-15 17:53:40,179][1652491] Updated weights for policy 0, policy_version 517602 (0.0118) [2024-06-15 17:53:40,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 1060110336. Throughput: 0: 11514.4. Samples: 265084416. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:53:40,955][1648985] Avg episode reward: [(0, '153.190')] [2024-06-15 17:53:41,537][1652491] Updated weights for policy 0, policy_version 517636 (0.0013) [2024-06-15 17:53:42,774][1652491] Updated weights for policy 0, policy_version 517696 (0.0013) [2024-06-15 17:53:45,417][1652491] Updated weights for policy 0, policy_version 517746 (0.0013) [2024-06-15 17:53:45,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 47513.5, 300 sec: 46763.8). Total num frames: 1060405248. Throughput: 0: 11719.1. Samples: 265157632. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:53:45,956][1648985] Avg episode reward: [(0, '136.930')] [2024-06-15 17:53:46,432][1651469] Signal inference workers to stop experience collection... 
(26950 times) [2024-06-15 17:53:46,534][1652491] InferenceWorker_p0-w0: stopping experience collection (26950 times) [2024-06-15 17:53:46,536][1652491] Updated weights for policy 0, policy_version 517800 (0.0012) [2024-06-15 17:53:46,650][1651469] Signal inference workers to resume experience collection... (26950 times) [2024-06-15 17:53:46,651][1652491] InferenceWorker_p0-w0: resuming experience collection (26950 times) [2024-06-15 17:53:50,675][1652491] Updated weights for policy 0, policy_version 517842 (0.0013) [2024-06-15 17:53:50,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 1060569088. Throughput: 0: 11741.9. Samples: 265230848. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:53:50,956][1648985] Avg episode reward: [(0, '132.580')] [2024-06-15 17:53:53,277][1652491] Updated weights for policy 0, policy_version 517936 (0.0013) [2024-06-15 17:53:55,868][1652491] Updated weights for policy 0, policy_version 518000 (0.0013) [2024-06-15 17:53:55,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 47513.8, 300 sec: 46874.9). Total num frames: 1060864000. Throughput: 0: 11764.6. Samples: 265263104. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:53:55,955][1648985] Avg episode reward: [(0, '134.160')] [2024-06-15 17:53:57,133][1652491] Updated weights for policy 0, policy_version 518051 (0.0015) [2024-06-15 17:54:00,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 1061027840. Throughput: 0: 11832.8. Samples: 265337856. 
Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:54:00,956][1648985] Avg episode reward: [(0, '147.090')] [2024-06-15 17:54:01,630][1652491] Updated weights for policy 0, policy_version 518112 (0.0013) [2024-06-15 17:54:03,260][1652491] Updated weights for policy 0, policy_version 518147 (0.0095) [2024-06-15 17:54:04,401][1652491] Updated weights for policy 0, policy_version 518201 (0.0015) [2024-06-15 17:54:05,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 1061322752. Throughput: 0: 12026.3. Samples: 265411584. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:54:05,956][1648985] Avg episode reward: [(0, '160.100')] [2024-06-15 17:54:06,790][1652491] Updated weights for policy 0, policy_version 518256 (0.0014) [2024-06-15 17:54:07,987][1652491] Updated weights for policy 0, policy_version 518306 (0.0011) [2024-06-15 17:54:08,524][1652491] Updated weights for policy 0, policy_version 518336 (0.0012) [2024-06-15 17:54:10,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 1061552128. Throughput: 0: 11923.9. Samples: 265440768. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 17:54:10,956][1648985] Avg episode reward: [(0, '148.350')] [2024-06-15 17:54:13,355][1652491] Updated weights for policy 0, policy_version 518394 (0.0016) [2024-06-15 17:54:15,201][1652491] Updated weights for policy 0, policy_version 518458 (0.0018) [2024-06-15 17:54:15,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1061814272. Throughput: 0: 12026.3. Samples: 265519616. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:54:15,956][1648985] Avg episode reward: [(0, '141.920')] [2024-06-15 17:54:18,003][1652491] Updated weights for policy 0, policy_version 518515 (0.0014) [2024-06-15 17:54:20,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48059.5, 300 sec: 47097.1). Total num frames: 1062076416. 
Throughput: 0: 11969.4. Samples: 265581056. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:54:20,956][1648985] Avg episode reward: [(0, '139.990')] [2024-06-15 17:54:23,694][1652491] Updated weights for policy 0, policy_version 518594 (0.0018) [2024-06-15 17:54:25,866][1652491] Updated weights for policy 0, policy_version 518672 (0.0019) [2024-06-15 17:54:25,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 1062240256. Throughput: 0: 11946.7. Samples: 265622016. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:54:25,955][1648985] Avg episode reward: [(0, '136.300')] [2024-06-15 17:54:28,601][1652491] Updated weights for policy 0, policy_version 518721 (0.0014) [2024-06-15 17:54:30,173][1651469] Signal inference workers to stop experience collection... (27000 times) [2024-06-15 17:54:30,194][1652491] Updated weights for policy 0, policy_version 518785 (0.0014) [2024-06-15 17:54:30,212][1652491] InferenceWorker_p0-w0: stopping experience collection (27000 times) [2024-06-15 17:54:30,462][1651469] Signal inference workers to resume experience collection... (27000 times) [2024-06-15 17:54:30,480][1652491] InferenceWorker_p0-w0: resuming experience collection (27000 times) [2024-06-15 17:54:30,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 48605.8, 300 sec: 47098.2). Total num frames: 1062535168. Throughput: 0: 11753.3. Samples: 265686528. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:54:30,956][1648985] Avg episode reward: [(0, '142.510')] [2024-06-15 17:54:35,399][1652491] Updated weights for policy 0, policy_version 518864 (0.0014) [2024-06-15 17:54:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 1062666240. Throughput: 0: 11673.6. Samples: 265756160. 
Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:54:35,955][1648985] Avg episode reward: [(0, '160.600')] [2024-06-15 17:54:37,286][1652491] Updated weights for policy 0, policy_version 518928 (0.0097) [2024-06-15 17:54:40,442][1652491] Updated weights for policy 0, policy_version 519008 (0.0027) [2024-06-15 17:54:40,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 1062961152. Throughput: 0: 11707.7. Samples: 265789952. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:54:40,956][1648985] Avg episode reward: [(0, '157.620')] [2024-06-15 17:54:45,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45329.2, 300 sec: 46208.4). Total num frames: 1063124992. Throughput: 0: 11423.3. Samples: 265851904. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:54:45,956][1648985] Avg episode reward: [(0, '157.620')] [2024-06-15 17:54:46,862][1652491] Updated weights for policy 0, policy_version 519105 (0.0013) [2024-06-15 17:54:48,165][1652491] Updated weights for policy 0, policy_version 519167 (0.0014) [2024-06-15 17:54:50,590][1652491] Updated weights for policy 0, policy_version 519230 (0.0014) [2024-06-15 17:54:50,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 1063387136. Throughput: 0: 11332.3. Samples: 265921536. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:54:50,956][1648985] Avg episode reward: [(0, '171.120')] [2024-06-15 17:54:52,842][1652491] Updated weights for policy 0, policy_version 519286 (0.0012) [2024-06-15 17:54:54,387][1652491] Updated weights for policy 0, policy_version 519355 (0.0012) [2024-06-15 17:54:55,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 46421.1, 300 sec: 46541.6). Total num frames: 1063649280. Throughput: 0: 11286.7. Samples: 265948672. 
Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:54:55,956][1648985] Avg episode reward: [(0, '166.280')] [2024-06-15 17:54:55,989][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000519360_1063649280.pth... [2024-06-15 17:54:56,054][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000513920_1052508160.pth [2024-06-15 17:54:59,572][1652491] Updated weights for policy 0, policy_version 519396 (0.0016) [2024-06-15 17:55:00,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.5, 300 sec: 46652.7). Total num frames: 1063813120. Throughput: 0: 11264.0. Samples: 266026496. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:55:00,956][1648985] Avg episode reward: [(0, '163.610')] [2024-06-15 17:55:01,152][1652491] Updated weights for policy 0, policy_version 519456 (0.0012) [2024-06-15 17:55:03,728][1652491] Updated weights for policy 0, policy_version 519509 (0.0013) [2024-06-15 17:55:04,865][1652491] Updated weights for policy 0, policy_version 519568 (0.0013) [2024-06-15 17:55:05,897][1652491] Updated weights for policy 0, policy_version 519611 (0.0013) [2024-06-15 17:55:05,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 46967.4, 300 sec: 46544.7). Total num frames: 1064140800. Throughput: 0: 11343.7. Samples: 266091520. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:55:05,956][1648985] Avg episode reward: [(0, '156.480')] [2024-06-15 17:55:10,232][1652491] Updated weights for policy 0, policy_version 519664 (0.0012) [2024-06-15 17:55:10,963][1648985] Fps is (10 sec: 49115.4, 60 sec: 45869.5, 300 sec: 46651.6). Total num frames: 1064304640. Throughput: 0: 11478.3. Samples: 266138624. 
Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:55:10,964][1648985] Avg episode reward: [(0, '139.180')] [2024-06-15 17:55:11,693][1652491] Updated weights for policy 0, policy_version 519715 (0.0011) [2024-06-15 17:55:14,927][1651469] Signal inference workers to stop experience collection... (27050 times) [2024-06-15 17:55:14,951][1652491] InferenceWorker_p0-w0: stopping experience collection (27050 times) [2024-06-15 17:55:15,107][1651469] Signal inference workers to resume experience collection... (27050 times) [2024-06-15 17:55:15,108][1652491] InferenceWorker_p0-w0: resuming experience collection (27050 times) [2024-06-15 17:55:15,453][1652491] Updated weights for policy 0, policy_version 519792 (0.0014) [2024-06-15 17:55:15,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 1064566784. Throughput: 0: 11571.2. Samples: 266207232. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:55:15,957][1648985] Avg episode reward: [(0, '135.350')] [2024-06-15 17:55:16,758][1652491] Updated weights for policy 0, policy_version 519856 (0.0012) [2024-06-15 17:55:20,955][1648985] Fps is (10 sec: 42630.2, 60 sec: 44236.9, 300 sec: 46319.5). Total num frames: 1064730624. Throughput: 0: 11719.1. Samples: 266283520. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:55:20,956][1648985] Avg episode reward: [(0, '128.210')] [2024-06-15 17:55:21,300][1652491] Updated weights for policy 0, policy_version 519920 (0.0027) [2024-06-15 17:55:23,289][1652491] Updated weights for policy 0, policy_version 519994 (0.0096) [2024-06-15 17:55:25,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45875.0, 300 sec: 46541.6). Total num frames: 1064992768. Throughput: 0: 11525.6. Samples: 266308608. 
Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:55:25,956][1648985] Avg episode reward: [(0, '131.210')] [2024-06-15 17:55:26,591][1652491] Updated weights for policy 0, policy_version 520052 (0.0013) [2024-06-15 17:55:30,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 44782.9, 300 sec: 46208.5). Total num frames: 1065222144. Throughput: 0: 11650.8. Samples: 266376192. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:55:30,956][1648985] Avg episode reward: [(0, '132.570')] [2024-06-15 17:55:32,054][1652491] Updated weights for policy 0, policy_version 520134 (0.0036) [2024-06-15 17:55:33,607][1652491] Updated weights for policy 0, policy_version 520192 (0.0044) [2024-06-15 17:55:34,879][1652491] Updated weights for policy 0, policy_version 520240 (0.0021) [2024-06-15 17:55:35,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 1065484288. Throughput: 0: 11605.3. Samples: 266443776. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:55:35,956][1648985] Avg episode reward: [(0, '135.210')] [2024-06-15 17:55:37,783][1652491] Updated weights for policy 0, policy_version 520276 (0.0014) [2024-06-15 17:55:39,788][1652491] Updated weights for policy 0, policy_version 520368 (0.0014) [2024-06-15 17:55:40,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 1065746432. Throughput: 0: 11810.2. Samples: 266480128. Policy #0 lag: (min: 6.0, avg: 98.4, max: 262.0) [2024-06-15 17:55:40,955][1648985] Avg episode reward: [(0, '143.180')] [2024-06-15 17:55:43,960][1652491] Updated weights for policy 0, policy_version 520419 (0.0020) [2024-06-15 17:55:45,621][1652491] Updated weights for policy 0, policy_version 520480 (0.0026) [2024-06-15 17:55:45,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 1065943040. Throughput: 0: 11650.9. Samples: 266550784. 
Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:55:45,956][1648985] Avg episode reward: [(0, '147.530')] [2024-06-15 17:55:49,404][1652491] Updated weights for policy 0, policy_version 520544 (0.0014) [2024-06-15 17:55:50,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 1066205184. Throughput: 0: 11571.2. Samples: 266612224. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:55:50,956][1648985] Avg episode reward: [(0, '136.140')] [2024-06-15 17:55:50,966][1652491] Updated weights for policy 0, policy_version 520617 (0.0041) [2024-06-15 17:55:55,147][1652491] Updated weights for policy 0, policy_version 520657 (0.0013) [2024-06-15 17:55:55,457][1651469] Signal inference workers to stop experience collection... (27100 times) [2024-06-15 17:55:55,516][1652491] InferenceWorker_p0-w0: stopping experience collection (27100 times) [2024-06-15 17:55:55,658][1651469] Signal inference workers to resume experience collection... (27100 times) [2024-06-15 17:55:55,659][1652491] InferenceWorker_p0-w0: resuming experience collection (27100 times) [2024-06-15 17:55:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45329.2, 300 sec: 46541.6). Total num frames: 1066369024. Throughput: 0: 11482.1. Samples: 266655232. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:55:55,956][1648985] Avg episode reward: [(0, '149.570')] [2024-06-15 17:55:57,373][1652491] Updated weights for policy 0, policy_version 520752 (0.0013) [2024-06-15 17:56:00,304][1652491] Updated weights for policy 0, policy_version 520805 (0.0013) [2024-06-15 17:56:00,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 1066663936. Throughput: 0: 11525.7. Samples: 266725888. 
Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:56:00,955][1648985] Avg episode reward: [(0, '145.650')] [2024-06-15 17:56:01,652][1652491] Updated weights for policy 0, policy_version 520880 (0.0197) [2024-06-15 17:56:05,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 44236.8, 300 sec: 46208.5). Total num frames: 1066795008. Throughput: 0: 11457.4. Samples: 266799104. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:56:05,956][1648985] Avg episode reward: [(0, '163.790')] [2024-06-15 17:56:07,595][1652491] Updated weights for policy 0, policy_version 520960 (0.0014) [2024-06-15 17:56:09,134][1652491] Updated weights for policy 0, policy_version 521021 (0.0012) [2024-06-15 17:56:10,956][1648985] Fps is (10 sec: 39319.4, 60 sec: 45880.5, 300 sec: 46541.6). Total num frames: 1067057152. Throughput: 0: 11400.5. Samples: 266821632. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:56:10,957][1648985] Avg episode reward: [(0, '161.110')] [2024-06-15 17:56:12,532][1652491] Updated weights for policy 0, policy_version 521088 (0.0094) [2024-06-15 17:56:13,952][1652491] Updated weights for policy 0, policy_version 521152 (0.0027) [2024-06-15 17:56:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 1067319296. Throughput: 0: 11434.7. Samples: 266890752. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:56:15,956][1648985] Avg episode reward: [(0, '159.740')] [2024-06-15 17:56:19,154][1652491] Updated weights for policy 0, policy_version 521201 (0.0012) [2024-06-15 17:56:20,522][1652491] Updated weights for policy 0, policy_version 521254 (0.0012) [2024-06-15 17:56:20,955][1648985] Fps is (10 sec: 49154.0, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 1067548672. Throughput: 0: 11537.1. Samples: 266962944. 
Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:56:20,955][1648985] Avg episode reward: [(0, '148.880')] [2024-06-15 17:56:23,270][1652491] Updated weights for policy 0, policy_version 521303 (0.0013) [2024-06-15 17:56:24,810][1652491] Updated weights for policy 0, policy_version 521362 (0.0012) [2024-06-15 17:56:25,763][1652491] Updated weights for policy 0, policy_version 521408 (0.0011) [2024-06-15 17:56:25,957][1648985] Fps is (10 sec: 52421.9, 60 sec: 47512.7, 300 sec: 46652.6). Total num frames: 1067843584. Throughput: 0: 11559.5. Samples: 267000320. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:56:25,958][1648985] Avg episode reward: [(0, '131.830')] [2024-06-15 17:56:30,141][1652491] Updated weights for policy 0, policy_version 521459 (0.0014) [2024-06-15 17:56:30,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 46421.4, 300 sec: 46767.8). Total num frames: 1068007424. Throughput: 0: 11650.9. Samples: 267075072. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:56:30,955][1648985] Avg episode reward: [(0, '151.350')] [2024-06-15 17:56:31,727][1652491] Updated weights for policy 0, policy_version 521528 (0.0013) [2024-06-15 17:56:34,537][1652491] Updated weights for policy 0, policy_version 521584 (0.0053) [2024-06-15 17:56:34,678][1651469] Signal inference workers to stop experience collection... (27150 times) [2024-06-15 17:56:34,772][1652491] InferenceWorker_p0-w0: stopping experience collection (27150 times) [2024-06-15 17:56:34,976][1651469] Signal inference workers to resume experience collection... (27150 times) [2024-06-15 17:56:34,977][1652491] InferenceWorker_p0-w0: resuming experience collection (27150 times) [2024-06-15 17:56:35,955][1648985] Fps is (10 sec: 45881.5, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 1068302336. Throughput: 0: 11571.2. Samples: 267132928. 
Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:56:35,955][1648985] Avg episode reward: [(0, '143.200')] [2024-06-15 17:56:36,547][1652491] Updated weights for policy 0, policy_version 521652 (0.0163) [2024-06-15 17:56:40,494][1652491] Updated weights for policy 0, policy_version 521680 (0.0017) [2024-06-15 17:56:40,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 44236.6, 300 sec: 46319.5). Total num frames: 1068400640. Throughput: 0: 11525.7. Samples: 267173888. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:56:40,956][1648985] Avg episode reward: [(0, '155.570')] [2024-06-15 17:56:42,027][1652491] Updated weights for policy 0, policy_version 521744 (0.0013) [2024-06-15 17:56:45,067][1652491] Updated weights for policy 0, policy_version 521808 (0.0091) [2024-06-15 17:56:45,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 46421.2, 300 sec: 46541.6). Total num frames: 1068728320. Throughput: 0: 11434.6. Samples: 267240448. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:56:45,956][1648985] Avg episode reward: [(0, '137.590')] [2024-06-15 17:56:47,907][1652491] Updated weights for policy 0, policy_version 521872 (0.0011) [2024-06-15 17:56:49,127][1652491] Updated weights for policy 0, policy_version 521915 (0.0011) [2024-06-15 17:56:50,955][1648985] Fps is (10 sec: 49153.3, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 1068892160. Throughput: 0: 11286.8. Samples: 267307008. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:56:50,955][1648985] Avg episode reward: [(0, '154.670')] [2024-06-15 17:56:53,061][1652491] Updated weights for policy 0, policy_version 521975 (0.0013) [2024-06-15 17:56:54,478][1652491] Updated weights for policy 0, policy_version 522039 (0.0012) [2024-06-15 17:56:55,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46967.4, 300 sec: 46541.7). Total num frames: 1069187072. Throughput: 0: 11525.8. Samples: 267340288. 
Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:56:55,956][1648985] Avg episode reward: [(0, '160.250')] [2024-06-15 17:56:56,177][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000522080_1069219840.pth... [2024-06-15 17:56:56,309][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000516608_1058013184.pth [2024-06-15 17:56:56,657][1652491] Updated weights for policy 0, policy_version 522096 (0.0013) [2024-06-15 17:56:59,288][1652491] Updated weights for policy 0, policy_version 522113 (0.0012) [2024-06-15 17:57:00,617][1652491] Updated weights for policy 0, policy_version 522174 (0.0133) [2024-06-15 17:57:00,955][1648985] Fps is (10 sec: 52427.1, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 1069416448. Throughput: 0: 11628.0. Samples: 267414016. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:57:00,956][1648985] Avg episode reward: [(0, '148.610')] [2024-06-15 17:57:03,809][1652491] Updated weights for policy 0, policy_version 522230 (0.0013) [2024-06-15 17:57:05,370][1652491] Updated weights for policy 0, policy_version 522273 (0.0015) [2024-06-15 17:57:05,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 1069678592. Throughput: 0: 11571.2. Samples: 267483648. Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:57:05,956][1648985] Avg episode reward: [(0, '130.050')] [2024-06-15 17:57:07,026][1652491] Updated weights for policy 0, policy_version 522337 (0.0014) [2024-06-15 17:57:09,800][1652491] Updated weights for policy 0, policy_version 522369 (0.0014) [2024-06-15 17:57:10,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 47514.0, 300 sec: 46319.5). Total num frames: 1069907968. Throughput: 0: 11662.6. Samples: 267525120. 
Policy #0 lag: (min: 15.0, avg: 110.4, max: 271.0) [2024-06-15 17:57:10,956][1648985] Avg episode reward: [(0, '140.100')] [2024-06-15 17:57:11,293][1652491] Updated weights for policy 0, policy_version 522431 (0.0143) [2024-06-15 17:57:14,684][1652491] Updated weights for policy 0, policy_version 522489 (0.0030) [2024-06-15 17:57:15,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46421.4, 300 sec: 46652.8). Total num frames: 1070104576. Throughput: 0: 11525.7. Samples: 267593728. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:57:15,955][1648985] Avg episode reward: [(0, '141.590')] [2024-06-15 17:57:16,348][1652491] Updated weights for policy 0, policy_version 522528 (0.0101) [2024-06-15 17:57:17,617][1652491] Updated weights for policy 0, policy_version 522592 (0.0037) [2024-06-15 17:57:17,718][1651469] Signal inference workers to stop experience collection... (27200 times) [2024-06-15 17:57:17,769][1652491] InferenceWorker_p0-w0: stopping experience collection (27200 times) [2024-06-15 17:57:17,923][1651469] Signal inference workers to resume experience collection... (27200 times) [2024-06-15 17:57:17,937][1652491] InferenceWorker_p0-w0: resuming experience collection (27200 times) [2024-06-15 17:57:20,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 1070333952. Throughput: 0: 11923.9. Samples: 267669504. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:57:20,956][1648985] Avg episode reward: [(0, '139.950')] [2024-06-15 17:57:22,297][1652491] Updated weights for policy 0, policy_version 522680 (0.0016) [2024-06-15 17:57:24,790][1652491] Updated weights for policy 0, policy_version 522708 (0.0011) [2024-06-15 17:57:25,751][1652491] Updated weights for policy 0, policy_version 522751 (0.0019) [2024-06-15 17:57:25,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 45876.1, 300 sec: 46652.7). Total num frames: 1070596096. Throughput: 0: 11810.2. Samples: 267705344. 
Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:57:25,956][1648985] Avg episode reward: [(0, '161.500')] [2024-06-15 17:57:27,847][1652491] Updated weights for policy 0, policy_version 522816 (0.0013) [2024-06-15 17:57:28,976][1652491] Updated weights for policy 0, policy_version 522875 (0.0012) [2024-06-15 17:57:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 1070858240. Throughput: 0: 11969.5. Samples: 267779072. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:57:30,956][1648985] Avg episode reward: [(0, '157.550')] [2024-06-15 17:57:32,931][1652491] Updated weights for policy 0, policy_version 522934 (0.0025) [2024-06-15 17:57:35,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 1071054848. Throughput: 0: 12037.7. Samples: 267848704. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:57:35,955][1648985] Avg episode reward: [(0, '149.060')] [2024-06-15 17:57:36,032][1652491] Updated weights for policy 0, policy_version 522992 (0.0013) [2024-06-15 17:57:37,416][1652491] Updated weights for policy 0, policy_version 523009 (0.0012) [2024-06-15 17:57:39,063][1652491] Updated weights for policy 0, policy_version 523073 (0.0013) [2024-06-15 17:57:40,193][1652491] Updated weights for policy 0, policy_version 523136 (0.0012) [2024-06-15 17:57:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 49698.3, 300 sec: 46874.9). Total num frames: 1071382528. Throughput: 0: 12049.1. Samples: 267882496. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:57:40,956][1648985] Avg episode reward: [(0, '153.120')] [2024-06-15 17:57:45,957][1648985] Fps is (10 sec: 45867.7, 60 sec: 46420.2, 300 sec: 46208.2). Total num frames: 1071513600. Throughput: 0: 11969.1. Samples: 267952640. 
Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:57:45,957][1648985] Avg episode reward: [(0, '150.700')] [2024-06-15 17:57:46,136][1652491] Updated weights for policy 0, policy_version 523218 (0.0015) [2024-06-15 17:57:47,088][1652491] Updated weights for policy 0, policy_version 523263 (0.0011) [2024-06-15 17:57:50,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48605.8, 300 sec: 46763.9). Total num frames: 1071808512. Throughput: 0: 11821.5. Samples: 268015616. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:57:50,955][1648985] Avg episode reward: [(0, '139.940')] [2024-06-15 17:57:50,998][1652491] Updated weights for policy 0, policy_version 523360 (0.0082) [2024-06-15 17:57:55,781][1652491] Updated weights for policy 0, policy_version 523408 (0.0013) [2024-06-15 17:57:55,955][1648985] Fps is (10 sec: 42604.9, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 1071939584. Throughput: 0: 11696.3. Samples: 268051456. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:57:55,956][1648985] Avg episode reward: [(0, '134.330')] [2024-06-15 17:57:57,212][1652491] Updated weights for policy 0, policy_version 523460 (0.0013) [2024-06-15 17:57:58,474][1652491] Updated weights for policy 0, policy_version 523510 (0.0010) [2024-06-15 17:58:00,889][1652491] Updated weights for policy 0, policy_version 523538 (0.0012) [2024-06-15 17:58:00,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46421.6, 300 sec: 46652.8). Total num frames: 1072201728. Throughput: 0: 11764.6. Samples: 268123136. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:58:00,955][1648985] Avg episode reward: [(0, '136.690')] [2024-06-15 17:58:01,704][1651469] Signal inference workers to stop experience collection... (27250 times) [2024-06-15 17:58:01,738][1652491] InferenceWorker_p0-w0: stopping experience collection (27250 times) [2024-06-15 17:58:01,858][1651469] Signal inference workers to resume experience collection... 
(27250 times) [2024-06-15 17:58:01,859][1652491] InferenceWorker_p0-w0: resuming experience collection (27250 times) [2024-06-15 17:58:02,371][1652491] Updated weights for policy 0, policy_version 523603 (0.0108) [2024-06-15 17:58:03,350][1652491] Updated weights for policy 0, policy_version 523646 (0.0011) [2024-06-15 17:58:05,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 1072431104. Throughput: 0: 11707.7. Samples: 268196352. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:58:05,956][1648985] Avg episode reward: [(0, '146.250')] [2024-06-15 17:58:07,452][1652491] Updated weights for policy 0, policy_version 523682 (0.0012) [2024-06-15 17:58:09,357][1652491] Updated weights for policy 0, policy_version 523760 (0.0040) [2024-06-15 17:58:10,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46421.4, 300 sec: 46652.7). Total num frames: 1072693248. Throughput: 0: 11650.9. Samples: 268229632. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:58:10,955][1648985] Avg episode reward: [(0, '148.520')] [2024-06-15 17:58:11,565][1652491] Updated weights for policy 0, policy_version 523809 (0.0013) [2024-06-15 17:58:13,132][1652491] Updated weights for policy 0, policy_version 523843 (0.0043) [2024-06-15 17:58:14,473][1652491] Updated weights for policy 0, policy_version 523899 (0.0016) [2024-06-15 17:58:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 1072955392. Throughput: 0: 11582.6. Samples: 268300288. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:58:15,955][1648985] Avg episode reward: [(0, '156.700')] [2024-06-15 17:58:19,114][1652491] Updated weights for policy 0, policy_version 523955 (0.0013) [2024-06-15 17:58:20,822][1652491] Updated weights for policy 0, policy_version 524032 (0.0031) [2024-06-15 17:58:20,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1073217536. 
Throughput: 0: 11593.9. Samples: 268370432. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:58:20,956][1648985] Avg episode reward: [(0, '151.670')] [2024-06-15 17:58:22,867][1652491] Updated weights for policy 0, policy_version 524086 (0.0018) [2024-06-15 17:58:25,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 47513.7, 300 sec: 46874.9). Total num frames: 1073446912. Throughput: 0: 11639.5. Samples: 268406272. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:58:25,956][1648985] Avg episode reward: [(0, '150.150')] [2024-06-15 17:58:26,087][1652491] Updated weights for policy 0, policy_version 524155 (0.0011) [2024-06-15 17:58:29,946][1652491] Updated weights for policy 0, policy_version 524215 (0.0096) [2024-06-15 17:58:30,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 1073676288. Throughput: 0: 11708.2. Samples: 268479488. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:58:30,956][1648985] Avg episode reward: [(0, '148.990')] [2024-06-15 17:58:31,448][1652491] Updated weights for policy 0, policy_version 524282 (0.0012) [2024-06-15 17:58:33,756][1652491] Updated weights for policy 0, policy_version 524347 (0.0013) [2024-06-15 17:58:35,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 1073872896. Throughput: 0: 11832.9. Samples: 268548096. Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:58:35,956][1648985] Avg episode reward: [(0, '154.630')] [2024-06-15 17:58:37,498][1652491] Updated weights for policy 0, policy_version 524404 (0.0012) [2024-06-15 17:58:40,452][1652491] Updated weights for policy 0, policy_version 524419 (0.0023) [2024-06-15 17:58:40,955][1648985] Fps is (10 sec: 36044.0, 60 sec: 44236.7, 300 sec: 46208.4). Total num frames: 1074036736. Throughput: 0: 11821.5. Samples: 268583424. 
Policy #0 lag: (min: 10.0, avg: 104.7, max: 266.0) [2024-06-15 17:58:40,956][1648985] Avg episode reward: [(0, '158.880')] [2024-06-15 17:58:43,021][1652491] Updated weights for policy 0, policy_version 524528 (0.0189) [2024-06-15 17:58:43,958][1651469] Signal inference workers to stop experience collection... (27300 times) [2024-06-15 17:58:44,067][1652491] InferenceWorker_p0-w0: stopping experience collection (27300 times) [2024-06-15 17:58:44,109][1651469] Signal inference workers to resume experience collection... (27300 times) [2024-06-15 17:58:44,109][1652491] InferenceWorker_p0-w0: resuming experience collection (27300 times) [2024-06-15 17:58:44,607][1652491] Updated weights for policy 0, policy_version 524579 (0.0013) [2024-06-15 17:58:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48061.0, 300 sec: 46874.9). Total num frames: 1074397184. Throughput: 0: 11662.2. Samples: 268647936. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:58:45,956][1648985] Avg episode reward: [(0, '161.540')] [2024-06-15 17:58:48,753][1652491] Updated weights for policy 0, policy_version 524625 (0.0011) [2024-06-15 17:58:50,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45328.9, 300 sec: 46319.5). Total num frames: 1074528256. Throughput: 0: 11616.7. Samples: 268719104. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:58:50,956][1648985] Avg episode reward: [(0, '183.190')] [2024-06-15 17:58:51,669][1652491] Updated weights for policy 0, policy_version 524673 (0.0012) [2024-06-15 17:58:53,435][1652491] Updated weights for policy 0, policy_version 524741 (0.0010) [2024-06-15 17:58:54,978][1652491] Updated weights for policy 0, policy_version 524803 (0.0013) [2024-06-15 17:58:55,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 49698.2, 300 sec: 47097.1). Total num frames: 1074921472. Throughput: 0: 11707.7. Samples: 268756480. 
Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:58:55,955][1648985] Avg episode reward: [(0, '184.040')] [2024-06-15 17:58:55,959][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000524864_1074921472.pth... [2024-06-15 17:58:56,063][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000519360_1063649280.pth [2024-06-15 17:59:00,050][1652491] Updated weights for policy 0, policy_version 524883 (0.0015) [2024-06-15 17:59:00,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 1075019776. Throughput: 0: 11696.4. Samples: 268826624. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:59:00,955][1648985] Avg episode reward: [(0, '169.210')] [2024-06-15 17:59:01,144][1652491] Updated weights for policy 0, policy_version 524923 (0.0039) [2024-06-15 17:59:04,164][1652491] Updated weights for policy 0, policy_version 524961 (0.0013) [2024-06-15 17:59:05,252][1652491] Updated weights for policy 0, policy_version 525011 (0.0095) [2024-06-15 17:59:05,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 1075281920. Throughput: 0: 11582.6. Samples: 268891648. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:59:05,955][1648985] Avg episode reward: [(0, '154.420')] [2024-06-15 17:59:06,711][1652491] Updated weights for policy 0, policy_version 525076 (0.0019) [2024-06-15 17:59:10,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 1075445760. Throughput: 0: 11548.4. Samples: 268925952. 
Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:59:10,956][1648985] Avg episode reward: [(0, '145.750')] [2024-06-15 17:59:11,078][1652491] Updated weights for policy 0, policy_version 525121 (0.0095) [2024-06-15 17:59:12,345][1652491] Updated weights for policy 0, policy_version 525172 (0.0013) [2024-06-15 17:59:15,397][1652491] Updated weights for policy 0, policy_version 525217 (0.0011) [2024-06-15 17:59:15,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 1075675136. Throughput: 0: 11571.2. Samples: 269000192. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:59:15,955][1648985] Avg episode reward: [(0, '132.570')] [2024-06-15 17:59:17,218][1652491] Updated weights for policy 0, policy_version 525296 (0.0028) [2024-06-15 17:59:18,438][1652491] Updated weights for policy 0, policy_version 525344 (0.0013) [2024-06-15 17:59:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 1075970048. Throughput: 0: 11468.8. Samples: 269064192. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:59:20,956][1648985] Avg episode reward: [(0, '140.010')] [2024-06-15 17:59:22,843][1652491] Updated weights for policy 0, policy_version 525395 (0.0014) [2024-06-15 17:59:23,553][1652491] Updated weights for policy 0, policy_version 525433 (0.0023) [2024-06-15 17:59:25,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 44782.9, 300 sec: 46097.3). Total num frames: 1076133888. Throughput: 0: 11514.3. Samples: 269101568. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:59:25,956][1648985] Avg episode reward: [(0, '140.090')] [2024-06-15 17:59:26,359][1652491] Updated weights for policy 0, policy_version 525474 (0.0014) [2024-06-15 17:59:26,906][1651469] Signal inference workers to stop experience collection... 
(27350 times) [2024-06-15 17:59:26,939][1652491] InferenceWorker_p0-w0: stopping experience collection (27350 times) [2024-06-15 17:59:27,003][1651469] Signal inference workers to resume experience collection... (27350 times) [2024-06-15 17:59:27,003][1652491] InferenceWorker_p0-w0: resuming experience collection (27350 times) [2024-06-15 17:59:27,447][1652491] Updated weights for policy 0, policy_version 525520 (0.0013) [2024-06-15 17:59:29,268][1652491] Updated weights for policy 0, policy_version 525600 (0.0118) [2024-06-15 17:59:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1076494336. Throughput: 0: 11480.2. Samples: 269164544. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:59:30,955][1648985] Avg episode reward: [(0, '152.650')] [2024-06-15 17:59:34,015][1652491] Updated weights for policy 0, policy_version 525664 (0.0136) [2024-06-15 17:59:35,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 1076625408. Throughput: 0: 11662.3. Samples: 269243904. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:59:35,956][1648985] Avg episode reward: [(0, '148.120')] [2024-06-15 17:59:37,693][1652491] Updated weights for policy 0, policy_version 525715 (0.0013) [2024-06-15 17:59:39,337][1652491] Updated weights for policy 0, policy_version 525777 (0.0107) [2024-06-15 17:59:40,708][1652491] Updated weights for policy 0, policy_version 525828 (0.0012) [2024-06-15 17:59:40,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 1076920320. Throughput: 0: 11537.0. Samples: 269275648. 
Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:59:40,956][1648985] Avg episode reward: [(0, '147.980')] [2024-06-15 17:59:41,831][1652491] Updated weights for policy 0, policy_version 525887 (0.0012) [2024-06-15 17:59:45,607][1652491] Updated weights for policy 0, policy_version 525949 (0.0049) [2024-06-15 17:59:45,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1077149696. Throughput: 0: 11537.0. Samples: 269345792. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:59:45,956][1648985] Avg episode reward: [(0, '152.720')] [2024-06-15 17:59:48,871][1652491] Updated weights for policy 0, policy_version 525985 (0.0013) [2024-06-15 17:59:50,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 47513.8, 300 sec: 46541.7). Total num frames: 1077379072. Throughput: 0: 11548.4. Samples: 269411328. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:59:50,955][1648985] Avg episode reward: [(0, '153.020')] [2024-06-15 17:59:51,138][1652491] Updated weights for policy 0, policy_version 526075 (0.0107) [2024-06-15 17:59:52,605][1652491] Updated weights for policy 0, policy_version 526137 (0.0011) [2024-06-15 17:59:55,970][1648985] Fps is (10 sec: 45806.2, 60 sec: 44771.6, 300 sec: 46761.4). Total num frames: 1077608448. Throughput: 0: 11533.2. Samples: 269445120. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 17:59:55,971][1648985] Avg episode reward: [(0, '149.380')] [2024-06-15 17:59:56,403][1652491] Updated weights for policy 0, policy_version 526202 (0.0015) [2024-06-15 18:00:00,455][1652491] Updated weights for policy 0, policy_version 526259 (0.0014) [2024-06-15 18:00:00,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 1077805056. Throughput: 0: 11650.8. Samples: 269524480. 
Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 18:00:00,956][1648985] Avg episode reward: [(0, '140.450')] [2024-06-15 18:00:01,111][1652491] Updated weights for policy 0, policy_version 526275 (0.0011) [2024-06-15 18:00:02,551][1652491] Updated weights for policy 0, policy_version 526335 (0.0012) [2024-06-15 18:00:04,449][1652491] Updated weights for policy 0, policy_version 526395 (0.0014) [2024-06-15 18:00:05,994][1648985] Fps is (10 sec: 45765.5, 60 sec: 46391.0, 300 sec: 46647.7). Total num frames: 1078067200. Throughput: 0: 11652.1. Samples: 269588992. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 18:00:05,995][1648985] Avg episode reward: [(0, '147.260')] [2024-06-15 18:00:07,055][1651469] Signal inference workers to stop experience collection... (27400 times) [2024-06-15 18:00:07,117][1652491] InferenceWorker_p0-w0: stopping experience collection (27400 times) [2024-06-15 18:00:07,135][1652491] Updated weights for policy 0, policy_version 526440 (0.0025) [2024-06-15 18:00:07,244][1651469] Signal inference workers to resume experience collection... (27400 times) [2024-06-15 18:00:07,245][1652491] InferenceWorker_p0-w0: resuming experience collection (27400 times) [2024-06-15 18:00:10,930][1652491] Updated weights for policy 0, policy_version 526501 (0.0019) [2024-06-15 18:00:10,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 1078263808. Throughput: 0: 11730.5. Samples: 269629440. Policy #0 lag: (min: 79.0, avg: 153.8, max: 335.0) [2024-06-15 18:00:10,955][1648985] Avg episode reward: [(0, '154.590')] [2024-06-15 18:00:13,428][1652491] Updated weights for policy 0, policy_version 526576 (0.0041) [2024-06-15 18:00:15,064][1652491] Updated weights for policy 0, policy_version 526640 (0.0097) [2024-06-15 18:00:15,955][1648985] Fps is (10 sec: 52635.0, 60 sec: 48605.8, 300 sec: 46986.0). Total num frames: 1078591488. Throughput: 0: 11730.5. Samples: 269692416. 
Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:00:15,955][1648985] Avg episode reward: [(0, '162.090')] [2024-06-15 18:00:18,354][1652491] Updated weights for policy 0, policy_version 526673 (0.0012) [2024-06-15 18:00:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 1078722560. Throughput: 0: 11855.7. Samples: 269777408. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:00:20,955][1648985] Avg episode reward: [(0, '152.500')] [2024-06-15 18:00:21,222][1652491] Updated weights for policy 0, policy_version 526736 (0.0014) [2024-06-15 18:00:23,383][1652491] Updated weights for policy 0, policy_version 526804 (0.0026) [2024-06-15 18:00:25,306][1652491] Updated weights for policy 0, policy_version 526880 (0.0015) [2024-06-15 18:00:25,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 49698.1, 300 sec: 47097.0). Total num frames: 1079115776. Throughput: 0: 11776.0. Samples: 269805568. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:00:25,956][1648985] Avg episode reward: [(0, '143.640')] [2024-06-15 18:00:28,681][1652491] Updated weights for policy 0, policy_version 526913 (0.0013) [2024-06-15 18:00:30,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1079246848. Throughput: 0: 11810.2. Samples: 269877248. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:00:30,956][1648985] Avg episode reward: [(0, '143.970')] [2024-06-15 18:00:32,132][1652491] Updated weights for policy 0, policy_version 526992 (0.0016) [2024-06-15 18:00:33,807][1652491] Updated weights for policy 0, policy_version 527057 (0.0023) [2024-06-15 18:00:35,860][1652491] Updated weights for policy 0, policy_version 527120 (0.0014) [2024-06-15 18:00:35,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 48605.9, 300 sec: 46763.8). Total num frames: 1079541760. Throughput: 0: 11946.7. Samples: 269948928. 
Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:00:35,955][1648985] Avg episode reward: [(0, '148.340')] [2024-06-15 18:00:40,165][1652491] Updated weights for policy 0, policy_version 527184 (0.0013) [2024-06-15 18:00:40,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 46967.4, 300 sec: 46763.8). Total num frames: 1079738368. Throughput: 0: 12075.8. Samples: 269988352. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:00:40,956][1648985] Avg episode reward: [(0, '158.370')] [2024-06-15 18:00:42,912][1652491] Updated weights for policy 0, policy_version 527234 (0.0014) [2024-06-15 18:00:43,982][1652491] Updated weights for policy 0, policy_version 527285 (0.0012) [2024-06-15 18:00:45,194][1652491] Updated weights for policy 0, policy_version 527316 (0.0012) [2024-06-15 18:00:45,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48059.9, 300 sec: 46874.9). Total num frames: 1080033280. Throughput: 0: 11980.8. Samples: 270063616. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:00:45,956][1648985] Avg episode reward: [(0, '151.570')] [2024-06-15 18:00:46,439][1652491] Updated weights for policy 0, policy_version 527376 (0.0015) [2024-06-15 18:00:50,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 1080197120. Throughput: 0: 12048.2. Samples: 270130688. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:00:50,956][1648985] Avg episode reward: [(0, '173.600')] [2024-06-15 18:00:50,995][1652491] Updated weights for policy 0, policy_version 527443 (0.0016) [2024-06-15 18:00:53,822][1651469] Signal inference workers to stop experience collection... (27450 times) [2024-06-15 18:00:53,850][1652491] InferenceWorker_p0-w0: stopping experience collection (27450 times) [2024-06-15 18:00:53,861][1652491] Updated weights for policy 0, policy_version 527507 (0.0129) [2024-06-15 18:00:54,024][1651469] Signal inference workers to resume experience collection... 
(27450 times) [2024-06-15 18:00:54,025][1652491] InferenceWorker_p0-w0: resuming experience collection (27450 times) [2024-06-15 18:00:55,955][1648985] Fps is (10 sec: 39320.4, 60 sec: 46979.2, 300 sec: 46652.7). Total num frames: 1080426496. Throughput: 0: 11992.1. Samples: 270169088. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:00:55,956][1648985] Avg episode reward: [(0, '164.430')] [2024-06-15 18:00:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000527552_1080426496.pth... [2024-06-15 18:00:56,028][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000522080_1069219840.pth [2024-06-15 18:00:56,744][1652491] Updated weights for policy 0, policy_version 527572 (0.0012) [2024-06-15 18:00:57,898][1652491] Updated weights for policy 0, policy_version 527618 (0.0013) [2024-06-15 18:00:59,038][1652491] Updated weights for policy 0, policy_version 527676 (0.0014) [2024-06-15 18:01:00,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1080688640. Throughput: 0: 12140.0. Samples: 270238720. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:01:00,956][1648985] Avg episode reward: [(0, '143.790')] [2024-06-15 18:01:02,682][1652491] Updated weights for policy 0, policy_version 527733 (0.0062) [2024-06-15 18:01:04,680][1652491] Updated weights for policy 0, policy_version 527748 (0.0012) [2024-06-15 18:01:05,955][1648985] Fps is (10 sec: 49153.5, 60 sec: 47544.7, 300 sec: 46986.1). Total num frames: 1080918016. Throughput: 0: 11832.9. Samples: 270309888. 
Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:01:05,955][1648985] Avg episode reward: [(0, '152.800')] [2024-06-15 18:01:06,040][1652491] Updated weights for policy 0, policy_version 527808 (0.0018) [2024-06-15 18:01:08,458][1652491] Updated weights for policy 0, policy_version 527872 (0.0039) [2024-06-15 18:01:09,733][1652491] Updated weights for policy 0, policy_version 527923 (0.0084) [2024-06-15 18:01:10,964][1648985] Fps is (10 sec: 52382.4, 60 sec: 49144.6, 300 sec: 47095.6). Total num frames: 1081212928. Throughput: 0: 11921.6. Samples: 270342144. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:01:10,964][1648985] Avg episode reward: [(0, '159.170')] [2024-06-15 18:01:13,693][1652491] Updated weights for policy 0, policy_version 527984 (0.0014) [2024-06-15 18:01:15,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 1081344000. Throughput: 0: 11935.3. Samples: 270414336. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:01:15,956][1648985] Avg episode reward: [(0, '155.010')] [2024-06-15 18:01:16,383][1652491] Updated weights for policy 0, policy_version 528022 (0.0012) [2024-06-15 18:01:18,519][1652491] Updated weights for policy 0, policy_version 528066 (0.0013) [2024-06-15 18:01:19,890][1652491] Updated weights for policy 0, policy_version 528128 (0.0119) [2024-06-15 18:01:21,026][1648985] Fps is (10 sec: 45591.4, 60 sec: 49093.7, 300 sec: 46863.8). Total num frames: 1081671680. Throughput: 0: 11825.5. Samples: 270481920. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:01:21,027][1648985] Avg episode reward: [(0, '161.940')] [2024-06-15 18:01:21,359][1652491] Updated weights for policy 0, policy_version 528190 (0.0013) [2024-06-15 18:01:25,248][1652491] Updated weights for policy 0, policy_version 528250 (0.0125) [2024-06-15 18:01:25,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.4, 300 sec: 46986.0). Total num frames: 1081868288. 
Throughput: 0: 11810.2. Samples: 270519808. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:01:25,955][1648985] Avg episode reward: [(0, '145.880')] [2024-06-15 18:01:27,801][1652491] Updated weights for policy 0, policy_version 528306 (0.0018) [2024-06-15 18:01:30,465][1652491] Updated weights for policy 0, policy_version 528368 (0.0013) [2024-06-15 18:01:30,955][1648985] Fps is (10 sec: 46204.3, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 1082130432. Throughput: 0: 11776.0. Samples: 270593536. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:01:30,956][1648985] Avg episode reward: [(0, '153.350')] [2024-06-15 18:01:32,126][1652491] Updated weights for policy 0, policy_version 528432 (0.0012) [2024-06-15 18:01:35,957][1648985] Fps is (10 sec: 39313.0, 60 sec: 45327.5, 300 sec: 46985.7). Total num frames: 1082261504. Throughput: 0: 11752.7. Samples: 270659584. Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:01:35,958][1648985] Avg episode reward: [(0, '143.610')] [2024-06-15 18:01:36,291][1652491] Updated weights for policy 0, policy_version 528456 (0.0012) [2024-06-15 18:01:36,953][1651469] Signal inference workers to stop experience collection... (27500 times) [2024-06-15 18:01:37,052][1652491] InferenceWorker_p0-w0: stopping experience collection (27500 times) [2024-06-15 18:01:37,307][1651469] Signal inference workers to resume experience collection... (27500 times) [2024-06-15 18:01:37,308][1652491] InferenceWorker_p0-w0: resuming experience collection (27500 times) [2024-06-15 18:01:37,558][1652491] Updated weights for policy 0, policy_version 528507 (0.0012) [2024-06-15 18:01:38,837][1652491] Updated weights for policy 0, policy_version 528545 (0.0012) [2024-06-15 18:01:40,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.5, 300 sec: 46763.8). Total num frames: 1082523648. Throughput: 0: 11605.4. Samples: 270691328. 
Policy #0 lag: (min: 37.0, avg: 127.8, max: 293.0) [2024-06-15 18:01:40,956][1648985] Avg episode reward: [(0, '156.700')] [2024-06-15 18:01:41,974][1652491] Updated weights for policy 0, policy_version 528593 (0.0012) [2024-06-15 18:01:43,256][1652491] Updated weights for policy 0, policy_version 528643 (0.0014) [2024-06-15 18:01:44,755][1652491] Updated weights for policy 0, policy_version 528698 (0.0014) [2024-06-15 18:01:45,955][1648985] Fps is (10 sec: 52439.9, 60 sec: 45875.2, 300 sec: 47097.0). Total num frames: 1082785792. Throughput: 0: 11446.1. Samples: 270753792. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:01:45,955][1648985] Avg episode reward: [(0, '163.410')] [2024-06-15 18:01:49,365][1652491] Updated weights for policy 0, policy_version 528752 (0.0013) [2024-06-15 18:01:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1082982400. Throughput: 0: 11377.8. Samples: 270821888. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:01:50,955][1648985] Avg episode reward: [(0, '175.000')] [2024-06-15 18:01:50,977][1652491] Updated weights for policy 0, policy_version 528816 (0.0013) [2024-06-15 18:01:53,373][1652491] Updated weights for policy 0, policy_version 528834 (0.0015) [2024-06-15 18:01:55,475][1652491] Updated weights for policy 0, policy_version 528913 (0.0049) [2024-06-15 18:01:55,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.7, 300 sec: 46874.9). Total num frames: 1083244544. Throughput: 0: 11653.2. Samples: 270866432. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:01:55,956][1648985] Avg episode reward: [(0, '162.470')] [2024-06-15 18:02:00,770][1652491] Updated weights for policy 0, policy_version 528995 (0.0015) [2024-06-15 18:02:00,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 1083375616. Throughput: 0: 11434.7. Samples: 270928896. 
Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:02:00,956][1648985] Avg episode reward: [(0, '161.290')] [2024-06-15 18:02:01,993][1652491] Updated weights for policy 0, policy_version 529041 (0.0013) [2024-06-15 18:02:04,449][1652491] Updated weights for policy 0, policy_version 529089 (0.0019) [2024-06-15 18:02:05,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1083703296. Throughput: 0: 11373.0. Samples: 270992896. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:02:05,955][1648985] Avg episode reward: [(0, '186.720')] [2024-06-15 18:02:06,632][1652491] Updated weights for policy 0, policy_version 529174 (0.0012) [2024-06-15 18:02:07,419][1652491] Updated weights for policy 0, policy_version 529216 (0.0089) [2024-06-15 18:02:10,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 43697.0, 300 sec: 46541.6). Total num frames: 1083834368. Throughput: 0: 11207.0. Samples: 271024128. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:02:10,956][1648985] Avg episode reward: [(0, '179.150')] [2024-06-15 18:02:13,243][1652491] Updated weights for policy 0, policy_version 529296 (0.0018) [2024-06-15 18:02:14,267][1652491] Updated weights for policy 0, policy_version 529337 (0.0011) [2024-06-15 18:02:15,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1084096512. Throughput: 0: 11116.1. Samples: 271093760. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:02:15,956][1648985] Avg episode reward: [(0, '182.980')] [2024-06-15 18:02:17,658][1652491] Updated weights for policy 0, policy_version 529398 (0.0102) [2024-06-15 18:02:18,094][1651469] Signal inference workers to stop experience collection... (27550 times) [2024-06-15 18:02:18,151][1652491] InferenceWorker_p0-w0: stopping experience collection (27550 times) [2024-06-15 18:02:18,397][1651469] Signal inference workers to resume experience collection... 
(27550 times) [2024-06-15 18:02:18,399][1652491] InferenceWorker_p0-w0: resuming experience collection (27550 times) [2024-06-15 18:02:18,638][1652491] Updated weights for policy 0, policy_version 529431 (0.0015) [2024-06-15 18:02:20,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 44836.2, 300 sec: 46652.8). Total num frames: 1084358656. Throughput: 0: 11230.4. Samples: 271164928. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:02:20,955][1648985] Avg episode reward: [(0, '167.810')] [2024-06-15 18:02:22,111][1652491] Updated weights for policy 0, policy_version 529477 (0.0013) [2024-06-15 18:02:24,190][1652491] Updated weights for policy 0, policy_version 529538 (0.0013) [2024-06-15 18:02:25,299][1652491] Updated weights for policy 0, policy_version 529595 (0.0014) [2024-06-15 18:02:25,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1084620800. Throughput: 0: 11343.7. Samples: 271201792. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:02:25,956][1648985] Avg episode reward: [(0, '184.630')] [2024-06-15 18:02:28,607][1652491] Updated weights for policy 0, policy_version 529651 (0.0033) [2024-06-15 18:02:30,178][1652491] Updated weights for policy 0, policy_version 529727 (0.0012) [2024-06-15 18:02:30,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 1084882944. Throughput: 0: 11389.2. Samples: 271266304. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:02:30,955][1648985] Avg episode reward: [(0, '167.450')] [2024-06-15 18:02:34,650][1652491] Updated weights for policy 0, policy_version 529788 (0.0013) [2024-06-15 18:02:35,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45876.8, 300 sec: 46208.4). Total num frames: 1085014016. Throughput: 0: 11502.9. Samples: 271339520. 
Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:02:35,956][1648985] Avg episode reward: [(0, '164.780')] [2024-06-15 18:02:37,209][1652491] Updated weights for policy 0, policy_version 529849 (0.0013) [2024-06-15 18:02:40,512][1652491] Updated weights for policy 0, policy_version 529904 (0.0012) [2024-06-15 18:02:40,955][1648985] Fps is (10 sec: 36044.1, 60 sec: 45329.1, 300 sec: 46541.9). Total num frames: 1085243392. Throughput: 0: 11286.7. Samples: 271374336. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:02:40,956][1648985] Avg episode reward: [(0, '154.580')] [2024-06-15 18:02:42,632][1652491] Updated weights for policy 0, policy_version 529972 (0.0011) [2024-06-15 18:02:45,659][1652491] Updated weights for policy 0, policy_version 530001 (0.0015) [2024-06-15 18:02:45,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 1085472768. Throughput: 0: 11309.5. Samples: 271437824. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:02:45,956][1648985] Avg episode reward: [(0, '163.650')] [2024-06-15 18:02:46,750][1652491] Updated weights for policy 0, policy_version 530047 (0.0013) [2024-06-15 18:02:49,331][1652491] Updated weights for policy 0, policy_version 530112 (0.0120) [2024-06-15 18:02:50,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 44783.0, 300 sec: 46541.7). Total num frames: 1085669376. Throughput: 0: 11355.0. Samples: 271503872. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:02:50,955][1648985] Avg episode reward: [(0, '153.460')] [2024-06-15 18:02:52,555][1652491] Updated weights for policy 0, policy_version 530166 (0.0015) [2024-06-15 18:02:54,117][1652491] Updated weights for policy 0, policy_version 530230 (0.0014) [2024-06-15 18:02:55,955][1648985] Fps is (10 sec: 45873.7, 60 sec: 44782.6, 300 sec: 46541.6). Total num frames: 1085931520. Throughput: 0: 11366.4. Samples: 271535616. 
Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:02:55,956][1648985] Avg episode reward: [(0, '153.700')] [2024-06-15 18:02:55,978][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000530240_1085931520.pth... [2024-06-15 18:02:56,017][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000524864_1074921472.pth [2024-06-15 18:02:57,850][1652491] Updated weights for policy 0, policy_version 530288 (0.0012) [2024-06-15 18:03:00,497][1652491] Updated weights for policy 0, policy_version 530357 (0.0088) [2024-06-15 18:03:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 1086193664. Throughput: 0: 11457.4. Samples: 271609344. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:03:00,955][1648985] Avg episode reward: [(0, '141.710')] [2024-06-15 18:03:03,407][1651469] Signal inference workers to stop experience collection... (27600 times) [2024-06-15 18:03:03,467][1652491] InferenceWorker_p0-w0: stopping experience collection (27600 times) [2024-06-15 18:03:03,656][1651469] Signal inference workers to resume experience collection... (27600 times) [2024-06-15 18:03:03,657][1652491] InferenceWorker_p0-w0: resuming experience collection (27600 times) [2024-06-15 18:03:03,771][1652491] Updated weights for policy 0, policy_version 530416 (0.0125) [2024-06-15 18:03:05,516][1652491] Updated weights for policy 0, policy_version 530485 (0.0012) [2024-06-15 18:03:05,955][1648985] Fps is (10 sec: 52430.6, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1086455808. Throughput: 0: 11207.1. Samples: 271669248. 
Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 18:03:05,956][1648985] Avg episode reward: [(0, '144.860')] [2024-06-15 18:03:09,366][1652491] Updated weights for policy 0, policy_version 530528 (0.0012) [2024-06-15 18:03:09,979][1652491] Updated weights for policy 0, policy_version 530557 (0.0011) [2024-06-15 18:03:10,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46421.5, 300 sec: 46319.5). Total num frames: 1086619648. Throughput: 0: 11355.0. Samples: 271712768. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:03:10,955][1648985] Avg episode reward: [(0, '150.610')] [2024-06-15 18:03:14,783][1652491] Updated weights for policy 0, policy_version 530640 (0.0012) [2024-06-15 18:03:15,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 1086816256. Throughput: 0: 11286.7. Samples: 271774208. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:03:15,955][1648985] Avg episode reward: [(0, '140.200')] [2024-06-15 18:03:16,832][1652491] Updated weights for policy 0, policy_version 530710 (0.0015) [2024-06-15 18:03:17,730][1652491] Updated weights for policy 0, policy_version 530750 (0.0014) [2024-06-15 18:03:20,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 44236.8, 300 sec: 45986.3). Total num frames: 1087012864. Throughput: 0: 11298.2. Samples: 271847936. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:03:20,956][1648985] Avg episode reward: [(0, '150.760')] [2024-06-15 18:03:22,007][1652491] Updated weights for policy 0, policy_version 530816 (0.0014) [2024-06-15 18:03:24,001][1652491] Updated weights for policy 0, policy_version 530867 (0.0014) [2024-06-15 18:03:25,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 44782.8, 300 sec: 46208.4). Total num frames: 1087307776. Throughput: 0: 11161.5. Samples: 271876608. 
Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:03:25,956][1648985] Avg episode reward: [(0, '165.920')] [2024-06-15 18:03:26,108][1652491] Updated weights for policy 0, policy_version 530928 (0.0095) [2024-06-15 18:03:27,833][1652491] Updated weights for policy 0, policy_version 531000 (0.0111) [2024-06-15 18:03:30,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 1087504384. Throughput: 0: 11332.2. Samples: 271947776. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:03:30,956][1648985] Avg episode reward: [(0, '179.660')] [2024-06-15 18:03:32,217][1652491] Updated weights for policy 0, policy_version 531040 (0.0011) [2024-06-15 18:03:35,342][1652491] Updated weights for policy 0, policy_version 531128 (0.0015) [2024-06-15 18:03:35,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.1, 300 sec: 46541.7). Total num frames: 1087766528. Throughput: 0: 11423.2. Samples: 272017920. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:03:35,956][1648985] Avg episode reward: [(0, '178.500')] [2024-06-15 18:03:37,332][1652491] Updated weights for policy 0, policy_version 531170 (0.0020) [2024-06-15 18:03:38,765][1652491] Updated weights for policy 0, policy_version 531236 (0.0025) [2024-06-15 18:03:40,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 1088028672. Throughput: 0: 11412.0. Samples: 272049152. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:03:40,955][1648985] Avg episode reward: [(0, '164.750')] [2024-06-15 18:03:42,805][1652491] Updated weights for policy 0, policy_version 531266 (0.0036) [2024-06-15 18:03:43,947][1652491] Updated weights for policy 0, policy_version 531322 (0.0017) [2024-06-15 18:03:45,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 1088225280. Throughput: 0: 11650.8. Samples: 272133632. 
Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:03:45,956][1648985] Avg episode reward: [(0, '161.020')] [2024-06-15 18:03:46,516][1652491] Updated weights for policy 0, policy_version 531392 (0.0013) [2024-06-15 18:03:47,048][1651469] Signal inference workers to stop experience collection... (27650 times) [2024-06-15 18:03:47,180][1652491] InferenceWorker_p0-w0: stopping experience collection (27650 times) [2024-06-15 18:03:47,306][1651469] Signal inference workers to resume experience collection... (27650 times) [2024-06-15 18:03:47,307][1652491] InferenceWorker_p0-w0: resuming experience collection (27650 times) [2024-06-15 18:03:48,650][1652491] Updated weights for policy 0, policy_version 531472 (0.0016) [2024-06-15 18:03:49,696][1652491] Updated weights for policy 0, policy_version 531520 (0.0013) [2024-06-15 18:03:50,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 1088552960. Throughput: 0: 11696.4. Samples: 272195584. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:03:50,956][1648985] Avg episode reward: [(0, '152.470')] [2024-06-15 18:03:54,877][1652491] Updated weights for policy 0, policy_version 531580 (0.0089) [2024-06-15 18:03:55,986][1648985] Fps is (10 sec: 45731.8, 60 sec: 45851.5, 300 sec: 46314.6). Total num frames: 1088684032. Throughput: 0: 11801.9. Samples: 272244224. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:03:55,987][1648985] Avg episode reward: [(0, '142.200')] [2024-06-15 18:03:57,198][1652491] Updated weights for policy 0, policy_version 531639 (0.0013) [2024-06-15 18:03:58,000][1652491] Updated weights for policy 0, policy_version 531664 (0.0033) [2024-06-15 18:03:59,653][1652491] Updated weights for policy 0, policy_version 531744 (0.0014) [2024-06-15 18:04:00,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 1089077248. Throughput: 0: 11855.7. Samples: 272307712. 
Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:04:00,955][1648985] Avg episode reward: [(0, '147.770')] [2024-06-15 18:04:04,658][1652491] Updated weights for policy 0, policy_version 531792 (0.0013) [2024-06-15 18:04:05,955][1648985] Fps is (10 sec: 52594.2, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1089208320. Throughput: 0: 12037.7. Samples: 272389632. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:04:05,955][1648985] Avg episode reward: [(0, '156.280')] [2024-06-15 18:04:06,941][1652491] Updated weights for policy 0, policy_version 531842 (0.0022) [2024-06-15 18:04:08,625][1652491] Updated weights for policy 0, policy_version 531905 (0.0012) [2024-06-15 18:04:10,848][1652491] Updated weights for policy 0, policy_version 532000 (0.0101) [2024-06-15 18:04:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48605.9, 300 sec: 46986.0). Total num frames: 1089536000. Throughput: 0: 12071.9. Samples: 272419840. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:04:10,956][1648985] Avg episode reward: [(0, '163.720')] [2024-06-15 18:04:15,548][1652491] Updated weights for policy 0, policy_version 532055 (0.0014) [2024-06-15 18:04:15,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 1089699840. Throughput: 0: 12253.9. Samples: 272499200. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:04:15,956][1648985] Avg episode reward: [(0, '140.580')] [2024-06-15 18:04:18,488][1652491] Updated weights for policy 0, policy_version 532128 (0.0013) [2024-06-15 18:04:19,789][1652491] Updated weights for policy 0, policy_version 532176 (0.0016) [2024-06-15 18:04:20,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 49151.8, 300 sec: 46874.9). Total num frames: 1089961984. Throughput: 0: 12083.2. Samples: 272561664. 
Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:04:20,956][1648985] Avg episode reward: [(0, '150.580')] [2024-06-15 18:04:21,541][1652491] Updated weights for policy 0, policy_version 532256 (0.0013) [2024-06-15 18:04:25,928][1652491] Updated weights for policy 0, policy_version 532304 (0.0020) [2024-06-15 18:04:25,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 47513.8, 300 sec: 46319.5). Total num frames: 1090158592. Throughput: 0: 12299.4. Samples: 272602624. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:04:25,956][1648985] Avg episode reward: [(0, '153.520')] [2024-06-15 18:04:26,051][1651469] Signal inference workers to stop experience collection... (27700 times) [2024-06-15 18:04:26,160][1652491] InferenceWorker_p0-w0: stopping experience collection (27700 times) [2024-06-15 18:04:26,265][1651469] Signal inference workers to resume experience collection... (27700 times) [2024-06-15 18:04:26,266][1652491] InferenceWorker_p0-w0: resuming experience collection (27700 times) [2024-06-15 18:04:26,909][1652491] Updated weights for policy 0, policy_version 532349 (0.0034) [2024-06-15 18:04:29,680][1652491] Updated weights for policy 0, policy_version 532400 (0.0013) [2024-06-15 18:04:30,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 49152.1, 300 sec: 46874.9). Total num frames: 1090453504. Throughput: 0: 12117.4. Samples: 272678912. Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:04:30,955][1648985] Avg episode reward: [(0, '158.330')] [2024-06-15 18:04:31,331][1652491] Updated weights for policy 0, policy_version 532470 (0.0014) [2024-06-15 18:04:32,897][1652491] Updated weights for policy 0, policy_version 532532 (0.0014) [2024-06-15 18:04:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48059.9, 300 sec: 46541.7). Total num frames: 1090650112. Throughput: 0: 12424.5. Samples: 272754688. 
Policy #0 lag: (min: 14.0, avg: 132.0, max: 267.0) [2024-06-15 18:04:35,956][1648985] Avg episode reward: [(0, '154.920')] [2024-06-15 18:04:36,578][1652491] Updated weights for policy 0, policy_version 532562 (0.0012) [2024-06-15 18:04:39,645][1652491] Updated weights for policy 0, policy_version 532625 (0.0013) [2024-06-15 18:04:40,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 1090912256. Throughput: 0: 12182.7. Samples: 272792064. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:04:40,956][1648985] Avg episode reward: [(0, '153.840')] [2024-06-15 18:04:41,857][1652491] Updated weights for policy 0, policy_version 532705 (0.0012) [2024-06-15 18:04:42,827][1652491] Updated weights for policy 0, policy_version 532752 (0.0014) [2024-06-15 18:04:43,716][1652491] Updated weights for policy 0, policy_version 532796 (0.0014) [2024-06-15 18:04:45,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 49152.1, 300 sec: 46763.8). Total num frames: 1091174400. Throughput: 0: 12106.0. Samples: 272852480. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:04:45,955][1648985] Avg episode reward: [(0, '153.140')] [2024-06-15 18:04:48,128][1652491] Updated weights for policy 0, policy_version 532864 (0.0012) [2024-06-15 18:04:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.5, 300 sec: 46655.1). Total num frames: 1091371008. Throughput: 0: 12060.4. Samples: 272932352. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:04:50,956][1648985] Avg episode reward: [(0, '147.230')] [2024-06-15 18:04:51,371][1652491] Updated weights for policy 0, policy_version 532918 (0.0015) [2024-06-15 18:04:52,897][1652491] Updated weights for policy 0, policy_version 532976 (0.0013) [2024-06-15 18:04:54,448][1652491] Updated weights for policy 0, policy_version 533040 (0.0014) [2024-06-15 18:04:55,955][1648985] Fps is (10 sec: 52427.1, 60 sec: 50270.4, 300 sec: 47097.0). Total num frames: 1091698688. 
Throughput: 0: 12049.0. Samples: 272962048. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:04:55,956][1648985] Avg episode reward: [(0, '164.510')] [2024-06-15 18:04:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000533056_1091698688.pth... [2024-06-15 18:04:56,066][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000527552_1080426496.pth [2024-06-15 18:04:58,060][1652491] Updated weights for policy 0, policy_version 533073 (0.0011) [2024-06-15 18:04:59,015][1652491] Updated weights for policy 0, policy_version 533113 (0.0011) [2024-06-15 18:05:00,960][1648985] Fps is (10 sec: 49128.4, 60 sec: 46417.6, 300 sec: 46769.3). Total num frames: 1091862528. Throughput: 0: 12218.4. Samples: 273049088. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:05:00,960][1648985] Avg episode reward: [(0, '169.170')] [2024-06-15 18:05:01,108][1652491] Updated weights for policy 0, policy_version 533153 (0.0026) [2024-06-15 18:05:02,640][1652491] Updated weights for policy 0, policy_version 533232 (0.0014) [2024-06-15 18:05:02,899][1651469] Signal inference workers to stop experience collection... (27750 times) [2024-06-15 18:05:02,974][1652491] InferenceWorker_p0-w0: stopping experience collection (27750 times) [2024-06-15 18:05:02,976][1651469] Signal inference workers to resume experience collection... (27750 times) [2024-06-15 18:05:03,001][1652491] InferenceWorker_p0-w0: resuming experience collection (27750 times) [2024-06-15 18:05:04,397][1652491] Updated weights for policy 0, policy_version 533307 (0.0013) [2024-06-15 18:05:05,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 50244.2, 300 sec: 47319.2). Total num frames: 1092222976. Throughput: 0: 12197.0. Samples: 273110528. 
Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:05:05,956][1648985] Avg episode reward: [(0, '164.890')] [2024-06-15 18:05:09,626][1652491] Updated weights for policy 0, policy_version 533360 (0.0011) [2024-06-15 18:05:10,955][1648985] Fps is (10 sec: 49175.6, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 1092354048. Throughput: 0: 12344.9. Samples: 273158144. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:05:10,956][1648985] Avg episode reward: [(0, '164.310')] [2024-06-15 18:05:11,803][1652491] Updated weights for policy 0, policy_version 533398 (0.0010) [2024-06-15 18:05:13,435][1652491] Updated weights for policy 0, policy_version 533473 (0.0117) [2024-06-15 18:05:15,441][1652491] Updated weights for policy 0, policy_version 533565 (0.0014) [2024-06-15 18:05:15,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 50790.3, 300 sec: 47541.3). Total num frames: 1092747264. Throughput: 0: 12083.2. Samples: 273222656. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:05:15,956][1648985] Avg episode reward: [(0, '163.690')] [2024-06-15 18:05:20,480][1652491] Updated weights for policy 0, policy_version 533623 (0.0011) [2024-06-15 18:05:20,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48606.0, 300 sec: 46652.8). Total num frames: 1092878336. Throughput: 0: 12060.5. Samples: 273297408. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:05:20,955][1648985] Avg episode reward: [(0, '155.990')] [2024-06-15 18:05:23,808][1652491] Updated weights for policy 0, policy_version 533696 (0.0086) [2024-06-15 18:05:25,913][1652491] Updated weights for policy 0, policy_version 533779 (0.0014) [2024-06-15 18:05:25,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 50244.3, 300 sec: 47208.1). Total num frames: 1093173248. Throughput: 0: 12037.7. Samples: 273333760. 
Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:05:25,956][1648985] Avg episode reward: [(0, '143.530')] [2024-06-15 18:05:30,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 46967.3, 300 sec: 46541.6). Total num frames: 1093271552. Throughput: 0: 12162.8. Samples: 273399808. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:05:30,956][1648985] Avg episode reward: [(0, '139.140')] [2024-06-15 18:05:31,171][1652491] Updated weights for policy 0, policy_version 533840 (0.0014) [2024-06-15 18:05:32,216][1652491] Updated weights for policy 0, policy_version 533882 (0.0013) [2024-06-15 18:05:34,283][1652491] Updated weights for policy 0, policy_version 533921 (0.0011) [2024-06-15 18:05:35,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 49698.1, 300 sec: 47097.1). Total num frames: 1093632000. Throughput: 0: 11958.0. Samples: 273470464. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:05:35,956][1648985] Avg episode reward: [(0, '150.700')] [2024-06-15 18:05:36,290][1652491] Updated weights for policy 0, policy_version 534016 (0.0014) [2024-06-15 18:05:37,669][1652491] Updated weights for policy 0, policy_version 534072 (0.0013) [2024-06-15 18:05:40,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1093795840. Throughput: 0: 11958.1. Samples: 273500160. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:05:40,955][1648985] Avg episode reward: [(0, '143.910')] [2024-06-15 18:05:43,318][1652491] Updated weights for policy 0, policy_version 534133 (0.0031) [2024-06-15 18:05:44,504][1652491] Updated weights for policy 0, policy_version 534160 (0.0011) [2024-06-15 18:05:44,610][1651469] Signal inference workers to stop experience collection... (27800 times) [2024-06-15 18:05:44,667][1652491] InferenceWorker_p0-w0: stopping experience collection (27800 times) [2024-06-15 18:05:44,779][1651469] Signal inference workers to resume experience collection... 
(27800 times) [2024-06-15 18:05:44,780][1652491] InferenceWorker_p0-w0: resuming experience collection (27800 times) [2024-06-15 18:05:45,232][1652491] Updated weights for policy 0, policy_version 534207 (0.0014) [2024-06-15 18:05:45,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48605.8, 300 sec: 47097.1). Total num frames: 1094090752. Throughput: 0: 11856.9. Samples: 273582592. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:05:45,955][1648985] Avg episode reward: [(0, '169.810')] [2024-06-15 18:05:46,740][1652491] Updated weights for policy 0, policy_version 534261 (0.0036) [2024-06-15 18:05:48,140][1652491] Updated weights for policy 0, policy_version 534304 (0.0016) [2024-06-15 18:05:50,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 49152.0, 300 sec: 47097.1). Total num frames: 1094320128. Throughput: 0: 12060.4. Samples: 273653248. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:05:50,956][1648985] Avg episode reward: [(0, '166.400')] [2024-06-15 18:05:53,187][1652491] Updated weights for policy 0, policy_version 534354 (0.0031) [2024-06-15 18:05:54,959][1652491] Updated weights for policy 0, policy_version 534405 (0.0112) [2024-06-15 18:05:55,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 47513.9, 300 sec: 46986.0). Total num frames: 1094549504. Throughput: 0: 11776.0. Samples: 273688064. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:05:55,955][1648985] Avg episode reward: [(0, '150.640')] [2024-06-15 18:05:56,112][1652491] Updated weights for policy 0, policy_version 534454 (0.0013) [2024-06-15 18:05:56,818][1652491] Updated weights for policy 0, policy_version 534482 (0.0057) [2024-06-15 18:05:58,747][1652491] Updated weights for policy 0, policy_version 534550 (0.0025) [2024-06-15 18:06:00,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 49702.2, 300 sec: 47208.1). Total num frames: 1094844416. Throughput: 0: 11889.8. Samples: 273757696. 
Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:06:00,955][1648985] Avg episode reward: [(0, '141.670')] [2024-06-15 18:06:03,789][1652491] Updated weights for policy 0, policy_version 534595 (0.0012) [2024-06-15 18:06:04,883][1652491] Updated weights for policy 0, policy_version 534647 (0.0013) [2024-06-15 18:06:05,956][1648985] Fps is (10 sec: 45870.3, 60 sec: 46420.6, 300 sec: 46765.1). Total num frames: 1095008256. Throughput: 0: 11957.8. Samples: 273835520. Policy #0 lag: (min: 18.0, avg: 103.3, max: 274.0) [2024-06-15 18:06:05,956][1648985] Avg episode reward: [(0, '149.480')] [2024-06-15 18:06:06,333][1652491] Updated weights for policy 0, policy_version 534693 (0.0015) [2024-06-15 18:06:07,732][1652491] Updated weights for policy 0, policy_version 534736 (0.0013) [2024-06-15 18:06:08,849][1652491] Updated weights for policy 0, policy_version 534784 (0.0013) [2024-06-15 18:06:10,684][1652491] Updated weights for policy 0, policy_version 534840 (0.0015) [2024-06-15 18:06:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1095368704. Throughput: 0: 11923.9. Samples: 273870336. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:06:10,956][1648985] Avg episode reward: [(0, '167.030')] [2024-06-15 18:06:15,692][1652491] Updated weights for policy 0, policy_version 534880 (0.0091) [2024-06-15 18:06:15,955][1648985] Fps is (10 sec: 42602.4, 60 sec: 44783.0, 300 sec: 46664.0). Total num frames: 1095434240. Throughput: 0: 12151.5. Samples: 273946624. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:06:15,955][1648985] Avg episode reward: [(0, '166.610')] [2024-06-15 18:06:17,590][1652491] Updated weights for policy 0, policy_version 534960 (0.0012) [2024-06-15 18:06:19,879][1652491] Updated weights for policy 0, policy_version 535015 (0.0018) [2024-06-15 18:06:20,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48605.8, 300 sec: 47208.1). Total num frames: 1095794688. 
Throughput: 0: 11980.8. Samples: 274009600. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:06:20,956][1648985] Avg episode reward: [(0, '169.500')] [2024-06-15 18:06:21,363][1652491] Updated weights for policy 0, policy_version 535072 (0.0013) [2024-06-15 18:06:22,236][1652491] Updated weights for policy 0, policy_version 535104 (0.0012) [2024-06-15 18:06:25,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45329.1, 300 sec: 46652.8). Total num frames: 1095892992. Throughput: 0: 12083.2. Samples: 274043904. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:06:25,955][1648985] Avg episode reward: [(0, '163.470')] [2024-06-15 18:06:27,503][1651469] Signal inference workers to stop experience collection... (27850 times) [2024-06-15 18:06:27,557][1652491] InferenceWorker_p0-w0: stopping experience collection (27850 times) [2024-06-15 18:06:27,799][1651469] Signal inference workers to resume experience collection... (27850 times) [2024-06-15 18:06:27,800][1652491] InferenceWorker_p0-w0: resuming experience collection (27850 times) [2024-06-15 18:06:27,978][1652491] Updated weights for policy 0, policy_version 535170 (0.0013) [2024-06-15 18:06:29,236][1652491] Updated weights for policy 0, policy_version 535229 (0.0012) [2024-06-15 18:06:30,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 48059.8, 300 sec: 47097.4). Total num frames: 1096155136. Throughput: 0: 11889.8. Samples: 274117632. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:06:30,956][1648985] Avg episode reward: [(0, '169.880')] [2024-06-15 18:06:32,423][1652491] Updated weights for policy 0, policy_version 535296 (0.0013) [2024-06-15 18:06:33,515][1652491] Updated weights for policy 0, policy_version 535346 (0.0014) [2024-06-15 18:06:35,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1096417280. Throughput: 0: 11730.5. Samples: 274181120. 
Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:06:35,955][1648985] Avg episode reward: [(0, '147.390')] [2024-06-15 18:06:38,225][1652491] Updated weights for policy 0, policy_version 535392 (0.0075) [2024-06-15 18:06:40,185][1652491] Updated weights for policy 0, policy_version 535478 (0.0013) [2024-06-15 18:06:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1096679424. Throughput: 0: 11798.7. Samples: 274219008. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:06:40,956][1648985] Avg episode reward: [(0, '161.170')] [2024-06-15 18:06:42,952][1652491] Updated weights for policy 0, policy_version 535520 (0.0126) [2024-06-15 18:06:44,352][1652491] Updated weights for policy 0, policy_version 535573 (0.0014) [2024-06-15 18:06:45,955][1648985] Fps is (10 sec: 52427.1, 60 sec: 47513.4, 300 sec: 47319.2). Total num frames: 1096941568. Throughput: 0: 11810.1. Samples: 274289152. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:06:45,956][1648985] Avg episode reward: [(0, '163.150')] [2024-06-15 18:06:48,816][1652491] Updated weights for policy 0, policy_version 535634 (0.0014) [2024-06-15 18:06:50,219][1652491] Updated weights for policy 0, policy_version 535699 (0.0051) [2024-06-15 18:06:50,906][1652491] Updated weights for policy 0, policy_version 535744 (0.0014) [2024-06-15 18:06:50,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 1097203712. Throughput: 0: 11764.9. Samples: 274364928. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:06:50,955][1648985] Avg episode reward: [(0, '152.490')] [2024-06-15 18:06:53,996][1652491] Updated weights for policy 0, policy_version 535810 (0.0021) [2024-06-15 18:06:55,252][1652491] Updated weights for policy 0, policy_version 535868 (0.0081) [2024-06-15 18:06:55,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 48605.5, 300 sec: 47763.5). Total num frames: 1097465856. 
Throughput: 0: 11798.7. Samples: 274401280. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:06:55,956][1648985] Avg episode reward: [(0, '140.660')] [2024-06-15 18:06:55,970][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000535872_1097465856.pth... [2024-06-15 18:06:56,048][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000530240_1085931520.pth [2024-06-15 18:07:00,204][1652491] Updated weights for policy 0, policy_version 535921 (0.0084) [2024-06-15 18:07:00,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46421.3, 300 sec: 47208.1). Total num frames: 1097629696. Throughput: 0: 11787.4. Samples: 274477056. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:07:00,956][1648985] Avg episode reward: [(0, '154.530')] [2024-06-15 18:07:01,703][1652491] Updated weights for policy 0, policy_version 535993 (0.0029) [2024-06-15 18:07:04,653][1652491] Updated weights for policy 0, policy_version 536059 (0.0013) [2024-06-15 18:07:04,934][1651469] Signal inference workers to stop experience collection... (27900 times) [2024-06-15 18:07:04,967][1652491] InferenceWorker_p0-w0: stopping experience collection (27900 times) [2024-06-15 18:07:05,224][1651469] Signal inference workers to resume experience collection... (27900 times) [2024-06-15 18:07:05,225][1652491] InferenceWorker_p0-w0: resuming experience collection (27900 times) [2024-06-15 18:07:05,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 48606.5, 300 sec: 47763.5). Total num frames: 1097924608. Throughput: 0: 11844.2. Samples: 274542592. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:07:05,956][1648985] Avg episode reward: [(0, '173.640')] [2024-06-15 18:07:06,149][1652491] Updated weights for policy 0, policy_version 536123 (0.0013) [2024-06-15 18:07:10,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45329.1, 300 sec: 47430.3). Total num frames: 1098088448. Throughput: 0: 11923.9. Samples: 274580480. 
Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:07:10,955][1648985] Avg episode reward: [(0, '176.160')] [2024-06-15 18:07:11,063][1652491] Updated weights for policy 0, policy_version 536191 (0.0014) [2024-06-15 18:07:12,853][1652491] Updated weights for policy 0, policy_version 536250 (0.0014) [2024-06-15 18:07:15,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 48605.9, 300 sec: 47430.3). Total num frames: 1098350592. Throughput: 0: 11969.4. Samples: 274656256. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:07:15,956][1648985] Avg episode reward: [(0, '178.940')] [2024-06-15 18:07:16,272][1652491] Updated weights for policy 0, policy_version 536320 (0.0013) [2024-06-15 18:07:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45329.1, 300 sec: 47097.1). Total num frames: 1098514432. Throughput: 0: 11923.9. Samples: 274717696. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:07:20,956][1648985] Avg episode reward: [(0, '191.650')] [2024-06-15 18:07:22,303][1652491] Updated weights for policy 0, policy_version 536421 (0.0013) [2024-06-15 18:07:23,470][1652491] Updated weights for policy 0, policy_version 536451 (0.0012) [2024-06-15 18:07:25,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 1098776576. Throughput: 0: 11810.1. Samples: 274750464. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:07:25,955][1648985] Avg episode reward: [(0, '162.650')] [2024-06-15 18:07:27,086][1652491] Updated weights for policy 0, policy_version 536518 (0.0013) [2024-06-15 18:07:29,087][1652491] Updated weights for policy 0, policy_version 536608 (0.0087) [2024-06-15 18:07:29,697][1652491] Updated weights for policy 0, policy_version 536638 (0.0012) [2024-06-15 18:07:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1099038720. Throughput: 0: 11719.2. Samples: 274816512. 
Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:07:30,955][1648985] Avg episode reward: [(0, '171.290')] [2024-06-15 18:07:33,807][1652491] Updated weights for policy 0, policy_version 536693 (0.0061) [2024-06-15 18:07:35,778][1652491] Updated weights for policy 0, policy_version 536758 (0.0012) [2024-06-15 18:07:35,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.6, 300 sec: 47652.5). Total num frames: 1099300864. Throughput: 0: 11571.2. Samples: 274885632. Policy #0 lag: (min: 15.0, avg: 166.5, max: 271.0) [2024-06-15 18:07:35,955][1648985] Avg episode reward: [(0, '161.380')] [2024-06-15 18:07:39,449][1652491] Updated weights for policy 0, policy_version 536816 (0.0014) [2024-06-15 18:07:40,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 46967.3, 300 sec: 47541.3). Total num frames: 1099497472. Throughput: 0: 11628.1. Samples: 274924544. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:07:40,956][1648985] Avg episode reward: [(0, '174.120')] [2024-06-15 18:07:41,220][1652491] Updated weights for policy 0, policy_version 536891 (0.0014) [2024-06-15 18:07:45,079][1652491] Updated weights for policy 0, policy_version 536952 (0.0014) [2024-06-15 18:07:45,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.5, 300 sec: 47652.4). Total num frames: 1099726848. Throughput: 0: 11548.4. Samples: 274996736. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:07:45,955][1648985] Avg episode reward: [(0, '156.350')] [2024-06-15 18:07:46,630][1652491] Updated weights for policy 0, policy_version 537015 (0.0109) [2024-06-15 18:07:49,697][1651469] Signal inference workers to stop experience collection... (27950 times) [2024-06-15 18:07:49,766][1652491] InferenceWorker_p0-w0: stopping experience collection (27950 times) [2024-06-15 18:07:49,969][1651469] Signal inference workers to resume experience collection... 
(27950 times) [2024-06-15 18:07:49,971][1652491] InferenceWorker_p0-w0: resuming experience collection (27950 times) [2024-06-15 18:07:50,148][1652491] Updated weights for policy 0, policy_version 537058 (0.0013) [2024-06-15 18:07:50,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1099956224. Throughput: 0: 11605.4. Samples: 275064832. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:07:50,955][1648985] Avg episode reward: [(0, '150.990')] [2024-06-15 18:07:51,198][1652491] Updated weights for policy 0, policy_version 537105 (0.0026) [2024-06-15 18:07:55,321][1652491] Updated weights for policy 0, policy_version 537168 (0.0015) [2024-06-15 18:07:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44783.1, 300 sec: 47319.2). Total num frames: 1100152832. Throughput: 0: 11548.4. Samples: 275100160. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:07:55,956][1648985] Avg episode reward: [(0, '153.740')] [2024-06-15 18:07:56,780][1652491] Updated weights for policy 0, policy_version 537232 (0.0099) [2024-06-15 18:07:57,969][1652491] Updated weights for policy 0, policy_version 537280 (0.0012) [2024-06-15 18:08:00,963][1648985] Fps is (10 sec: 42565.5, 60 sec: 45869.3, 300 sec: 47206.9). Total num frames: 1100382208. Throughput: 0: 11444.1. Samples: 275171328. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:08:00,963][1648985] Avg episode reward: [(0, '160.000')] [2024-06-15 18:08:02,193][1652491] Updated weights for policy 0, policy_version 537344 (0.0014) [2024-06-15 18:08:03,421][1652491] Updated weights for policy 0, policy_version 537397 (0.0023) [2024-06-15 18:08:05,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 47430.3). Total num frames: 1100611584. Throughput: 0: 11559.8. Samples: 275237888. 
Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:08:05,955][1648985] Avg episode reward: [(0, '153.070')] [2024-06-15 18:08:07,329][1652491] Updated weights for policy 0, policy_version 537446 (0.0013) [2024-06-15 18:08:08,673][1652491] Updated weights for policy 0, policy_version 537504 (0.0017) [2024-06-15 18:08:10,955][1648985] Fps is (10 sec: 49189.7, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 1100873728. Throughput: 0: 11593.9. Samples: 275272192. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:08:10,956][1648985] Avg episode reward: [(0, '131.110')] [2024-06-15 18:08:12,177][1652491] Updated weights for policy 0, policy_version 537542 (0.0036) [2024-06-15 18:08:14,103][1652491] Updated weights for policy 0, policy_version 537616 (0.0021) [2024-06-15 18:08:15,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 46421.2, 300 sec: 47874.6). Total num frames: 1101135872. Throughput: 0: 11571.2. Samples: 275337216. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:08:15,956][1648985] Avg episode reward: [(0, '146.030')] [2024-06-15 18:08:17,780][1652491] Updated weights for policy 0, policy_version 537680 (0.0012) [2024-06-15 18:08:18,771][1652491] Updated weights for policy 0, policy_version 537725 (0.0015) [2024-06-15 18:08:20,835][1652491] Updated weights for policy 0, policy_version 537789 (0.0013) [2024-06-15 18:08:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 47763.6). Total num frames: 1101398016. Throughput: 0: 11605.3. Samples: 275407872. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:08:20,956][1648985] Avg episode reward: [(0, '166.940')] [2024-06-15 18:08:24,843][1652491] Updated weights for policy 0, policy_version 537848 (0.0014) [2024-06-15 18:08:25,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 46421.3, 300 sec: 47652.5). Total num frames: 1101561856. Throughput: 0: 11582.6. Samples: 275445760. 
Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:08:25,956][1648985] Avg episode reward: [(0, '157.550')] [2024-06-15 18:08:26,632][1652491] Updated weights for policy 0, policy_version 537909 (0.0014) [2024-06-15 18:08:29,262][1652491] Updated weights for policy 0, policy_version 537936 (0.0032) [2024-06-15 18:08:30,320][1652491] Updated weights for policy 0, policy_version 537984 (0.0017) [2024-06-15 18:08:30,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 46421.2, 300 sec: 47652.5). Total num frames: 1101824000. Throughput: 0: 11502.9. Samples: 275514368. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:08:30,956][1648985] Avg episode reward: [(0, '154.490')] [2024-06-15 18:08:31,571][1651469] Signal inference workers to stop experience collection... (28000 times) [2024-06-15 18:08:31,642][1652491] InferenceWorker_p0-w0: stopping experience collection (28000 times) [2024-06-15 18:08:31,799][1651469] Signal inference workers to resume experience collection... (28000 times) [2024-06-15 18:08:31,801][1652491] InferenceWorker_p0-w0: resuming experience collection (28000 times) [2024-06-15 18:08:32,113][1652491] Updated weights for policy 0, policy_version 538044 (0.0027) [2024-06-15 18:08:35,833][1652491] Updated weights for policy 0, policy_version 538109 (0.0014) [2024-06-15 18:08:35,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1102053376. Throughput: 0: 11559.8. Samples: 275585024. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:08:35,956][1648985] Avg episode reward: [(0, '152.280')] [2024-06-15 18:08:38,009][1652491] Updated weights for policy 0, policy_version 538167 (0.0013) [2024-06-15 18:08:40,397][1652491] Updated weights for policy 0, policy_version 538213 (0.0013) [2024-06-15 18:08:40,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 46967.6, 300 sec: 47763.5). Total num frames: 1102315520. Throughput: 0: 11571.2. Samples: 275620864. 
Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:08:40,956][1648985] Avg episode reward: [(0, '155.360')] [2024-06-15 18:08:41,983][1652491] Updated weights for policy 0, policy_version 538288 (0.0014) [2024-06-15 18:08:45,955][1648985] Fps is (10 sec: 42597.2, 60 sec: 45874.9, 300 sec: 47208.1). Total num frames: 1102479360. Throughput: 0: 11812.1. Samples: 275702784. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:08:45,956][1648985] Avg episode reward: [(0, '167.580')] [2024-06-15 18:08:46,367][1652491] Updated weights for policy 0, policy_version 538338 (0.0031) [2024-06-15 18:08:47,878][1652491] Updated weights for policy 0, policy_version 538405 (0.0012) [2024-06-15 18:08:50,308][1652491] Updated weights for policy 0, policy_version 538449 (0.0014) [2024-06-15 18:08:50,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 47879.7). Total num frames: 1102807040. Throughput: 0: 11810.1. Samples: 275769344. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:08:50,956][1648985] Avg episode reward: [(0, '140.960')] [2024-06-15 18:08:52,170][1652491] Updated weights for policy 0, policy_version 538516 (0.0012) [2024-06-15 18:08:55,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 46967.3, 300 sec: 47097.0). Total num frames: 1102970880. Throughput: 0: 11878.4. Samples: 275806720. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:08:55,956][1648985] Avg episode reward: [(0, '127.990')] [2024-06-15 18:08:56,342][1652491] Updated weights for policy 0, policy_version 538580 (0.0013) [2024-06-15 18:08:56,510][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000538592_1103036416.pth... 
[2024-06-15 18:08:56,628][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000533056_1091698688.pth [2024-06-15 18:08:57,991][1652491] Updated weights for policy 0, policy_version 538625 (0.0013) [2024-06-15 18:08:59,104][1652491] Updated weights for policy 0, policy_version 538680 (0.0013) [2024-06-15 18:09:00,959][1648985] Fps is (10 sec: 49129.8, 60 sec: 48608.4, 300 sec: 47762.8). Total num frames: 1103298560. Throughput: 0: 12173.1. Samples: 275885056. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:09:00,960][1648985] Avg episode reward: [(0, '148.290')] [2024-06-15 18:09:01,110][1652491] Updated weights for policy 0, policy_version 538723 (0.0012) [2024-06-15 18:09:03,165][1652491] Updated weights for policy 0, policy_version 538784 (0.0013) [2024-06-15 18:09:05,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1103495168. Throughput: 0: 12253.8. Samples: 275959296. Policy #0 lag: (min: 29.0, avg: 126.8, max: 285.0) [2024-06-15 18:09:05,956][1648985] Avg episode reward: [(0, '158.380')] [2024-06-15 18:09:06,439][1652491] Updated weights for policy 0, policy_version 538823 (0.0016) [2024-06-15 18:09:09,514][1652491] Updated weights for policy 0, policy_version 538881 (0.0016) [2024-06-15 18:09:10,955][1648985] Fps is (10 sec: 42616.6, 60 sec: 47513.4, 300 sec: 47541.3). Total num frames: 1103724544. Throughput: 0: 12083.1. Samples: 275989504. 
Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:09:10,956][1648985] Avg episode reward: [(0, '145.860')] [2024-06-15 18:09:11,089][1652491] Updated weights for policy 0, policy_version 538945 (0.0012) [2024-06-15 18:09:12,210][1652491] Updated weights for policy 0, policy_version 539001 (0.0013) [2024-06-15 18:09:14,199][1652491] Updated weights for policy 0, policy_version 539028 (0.0013) [2024-06-15 18:09:14,894][1652491] Updated weights for policy 0, policy_version 539071 (0.0015) [2024-06-15 18:09:15,965][1648985] Fps is (10 sec: 52374.7, 60 sec: 48051.6, 300 sec: 47650.8). Total num frames: 1104019456. Throughput: 0: 12217.0. Samples: 276064256. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:09:15,966][1648985] Avg episode reward: [(0, '140.950')] [2024-06-15 18:09:16,892][1651469] Signal inference workers to stop experience collection... (28050 times) [2024-06-15 18:09:16,918][1652491] InferenceWorker_p0-w0: stopping experience collection (28050 times) [2024-06-15 18:09:17,145][1651469] Signal inference workers to resume experience collection... (28050 times) [2024-06-15 18:09:17,145][1652491] InferenceWorker_p0-w0: resuming experience collection (28050 times) [2024-06-15 18:09:18,216][1652491] Updated weights for policy 0, policy_version 539131 (0.0011) [2024-06-15 18:09:20,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 46967.4, 300 sec: 47652.5). Total num frames: 1104216064. Throughput: 0: 12299.4. Samples: 276138496. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:09:20,956][1648985] Avg episode reward: [(0, '155.860')] [2024-06-15 18:09:21,762][1652491] Updated weights for policy 0, policy_version 539201 (0.0013) [2024-06-15 18:09:24,308][1652491] Updated weights for policy 0, policy_version 539280 (0.0015) [2024-06-15 18:09:25,955][1648985] Fps is (10 sec: 52482.2, 60 sec: 49698.0, 300 sec: 47763.5). Total num frames: 1104543744. Throughput: 0: 12322.1. Samples: 276175360. 
Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:09:25,956][1648985] Avg episode reward: [(0, '160.310')] [2024-06-15 18:09:27,776][1652491] Updated weights for policy 0, policy_version 539329 (0.0013) [2024-06-15 18:09:29,210][1652491] Updated weights for policy 0, policy_version 539389 (0.0014) [2024-06-15 18:09:30,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 1104674816. Throughput: 0: 12174.3. Samples: 276250624. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:09:30,956][1648985] Avg episode reward: [(0, '176.040')] [2024-06-15 18:09:32,618][1652491] Updated weights for policy 0, policy_version 539456 (0.0093) [2024-06-15 18:09:34,164][1652491] Updated weights for policy 0, policy_version 539512 (0.0023) [2024-06-15 18:09:35,582][1652491] Updated weights for policy 0, policy_version 539538 (0.0013) [2024-06-15 18:09:35,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 49152.1, 300 sec: 47763.5). Total num frames: 1105002496. Throughput: 0: 12140.1. Samples: 276315648. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:09:35,955][1648985] Avg episode reward: [(0, '167.330')] [2024-06-15 18:09:36,259][1652491] Updated weights for policy 0, policy_version 539579 (0.0012) [2024-06-15 18:09:39,823][1652491] Updated weights for policy 0, policy_version 539621 (0.0012) [2024-06-15 18:09:40,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1105199104. Throughput: 0: 12299.4. Samples: 276360192. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:09:40,956][1648985] Avg episode reward: [(0, '160.120')] [2024-06-15 18:09:42,094][1652491] Updated weights for policy 0, policy_version 539664 (0.0013) [2024-06-15 18:09:44,066][1652491] Updated weights for policy 0, policy_version 539747 (0.0014) [2024-06-15 18:09:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 49698.4, 300 sec: 47763.5). Total num frames: 1105461248. 
Throughput: 0: 11993.4. Samples: 276424704. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:09:45,956][1648985] Avg episode reward: [(0, '151.250')] [2024-06-15 18:09:46,580][1652491] Updated weights for policy 0, policy_version 539811 (0.0147) [2024-06-15 18:09:50,393][1652491] Updated weights for policy 0, policy_version 539858 (0.0033) [2024-06-15 18:09:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 47513.6, 300 sec: 47319.3). Total num frames: 1105657856. Throughput: 0: 12060.5. Samples: 276502016. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:09:50,955][1648985] Avg episode reward: [(0, '138.490')] [2024-06-15 18:09:51,464][1652491] Updated weights for policy 0, policy_version 539904 (0.0011) [2024-06-15 18:09:54,057][1652491] Updated weights for policy 0, policy_version 539969 (0.0014) [2024-06-15 18:09:55,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 50244.4, 300 sec: 47875.4). Total num frames: 1105985536. Throughput: 0: 12197.0. Samples: 276538368. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:09:55,956][1648985] Avg episode reward: [(0, '131.360')] [2024-06-15 18:09:57,340][1651469] Signal inference workers to stop experience collection... (28100 times) [2024-06-15 18:09:57,377][1652491] Updated weights for policy 0, policy_version 540050 (0.0012) [2024-06-15 18:09:57,428][1652491] InferenceWorker_p0-w0: stopping experience collection (28100 times) [2024-06-15 18:09:57,613][1651469] Signal inference workers to resume experience collection... (28100 times) [2024-06-15 18:09:57,618][1652491] InferenceWorker_p0-w0: resuming experience collection (28100 times) [2024-06-15 18:10:00,890][1652491] Updated weights for policy 0, policy_version 540098 (0.0012) [2024-06-15 18:10:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46971.0, 300 sec: 47097.1). Total num frames: 1106116608. Throughput: 0: 12131.5. Samples: 276610048. 
Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:10:00,956][1648985] Avg episode reward: [(0, '123.700')] [2024-06-15 18:10:02,083][1652491] Updated weights for policy 0, policy_version 540151 (0.0012) [2024-06-15 18:10:03,317][1652491] Updated weights for policy 0, policy_version 540179 (0.0011) [2024-06-15 18:10:04,991][1652491] Updated weights for policy 0, policy_version 540235 (0.0014) [2024-06-15 18:10:05,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 49698.2, 300 sec: 47874.6). Total num frames: 1106477056. Throughput: 0: 12014.9. Samples: 276679168. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:10:05,956][1648985] Avg episode reward: [(0, '135.920')] [2024-06-15 18:10:08,136][1652491] Updated weights for policy 0, policy_version 540289 (0.0012) [2024-06-15 18:10:09,532][1652491] Updated weights for policy 0, policy_version 540349 (0.0018) [2024-06-15 18:10:10,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48606.0, 300 sec: 47097.1). Total num frames: 1106640896. Throughput: 0: 12037.7. Samples: 276717056. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:10:10,956][1648985] Avg episode reward: [(0, '152.810')] [2024-06-15 18:10:12,846][1652491] Updated weights for policy 0, policy_version 540407 (0.0014) [2024-06-15 18:10:14,985][1652491] Updated weights for policy 0, policy_version 540478 (0.0015) [2024-06-15 18:10:15,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48068.0, 300 sec: 47541.4). Total num frames: 1106903040. Throughput: 0: 11969.4. Samples: 276789248. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:10:15,956][1648985] Avg episode reward: [(0, '168.190')] [2024-06-15 18:10:16,904][1652491] Updated weights for policy 0, policy_version 540540 (0.0014) [2024-06-15 18:10:20,090][1652491] Updated weights for policy 0, policy_version 540604 (0.0014) [2024-06-15 18:10:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 49152.0, 300 sec: 47430.3). Total num frames: 1107165184. 
Throughput: 0: 12128.7. Samples: 276861440. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:10:20,956][1648985] Avg episode reward: [(0, '161.260')] [2024-06-15 18:10:23,545][1652491] Updated weights for policy 0, policy_version 540646 (0.0016) [2024-06-15 18:10:25,069][1652491] Updated weights for policy 0, policy_version 540707 (0.0016) [2024-06-15 18:10:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1107427328. Throughput: 0: 12049.0. Samples: 276902400. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:10:25,956][1648985] Avg episode reward: [(0, '178.180')] [2024-06-15 18:10:26,976][1652491] Updated weights for policy 0, policy_version 540768 (0.0040) [2024-06-15 18:10:29,769][1652491] Updated weights for policy 0, policy_version 540816 (0.0012) [2024-06-15 18:10:30,798][1652491] Updated weights for policy 0, policy_version 540864 (0.0013) [2024-06-15 18:10:30,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 50244.3, 300 sec: 47652.5). Total num frames: 1107689472. Throughput: 0: 12162.8. Samples: 276972032. Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:10:30,955][1648985] Avg episode reward: [(0, '169.950')] [2024-06-15 18:10:35,324][1652491] Updated weights for policy 0, policy_version 540931 (0.0056) [2024-06-15 18:10:35,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 1107886080. Throughput: 0: 12026.3. Samples: 277043200. 
Policy #0 lag: (min: 15.0, avg: 118.6, max: 271.0) [2024-06-15 18:10:35,955][1648985] Avg episode reward: [(0, '176.460')] [2024-06-15 18:10:36,403][1652491] Updated weights for policy 0, policy_version 540987 (0.0012) [2024-06-15 18:10:38,214][1652491] Updated weights for policy 0, policy_version 541028 (0.0013) [2024-06-15 18:10:40,744][1652491] Updated weights for policy 0, policy_version 541060 (0.0015) [2024-06-15 18:10:40,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 1108115456. Throughput: 0: 12015.0. Samples: 277079040. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:10:40,955][1648985] Avg episode reward: [(0, '181.110')] [2024-06-15 18:10:41,476][1651469] Signal inference workers to stop experience collection... (28150 times) [2024-06-15 18:10:41,610][1652491] InferenceWorker_p0-w0: stopping experience collection (28150 times) [2024-06-15 18:10:41,749][1651469] Signal inference workers to resume experience collection... (28150 times) [2024-06-15 18:10:41,750][1652491] InferenceWorker_p0-w0: resuming experience collection (28150 times) [2024-06-15 18:10:41,853][1652491] Updated weights for policy 0, policy_version 541108 (0.0017) [2024-06-15 18:10:44,563][1652491] Updated weights for policy 0, policy_version 541136 (0.0019) [2024-06-15 18:10:45,494][1652491] Updated weights for policy 0, policy_version 541184 (0.0013) [2024-06-15 18:10:45,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1108344832. Throughput: 0: 12208.3. Samples: 277159424. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:10:45,956][1648985] Avg episode reward: [(0, '150.330')] [2024-06-15 18:10:47,162][1652491] Updated weights for policy 0, policy_version 541246 (0.0012) [2024-06-15 18:10:50,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 49151.9, 300 sec: 47652.4). Total num frames: 1108606976. Throughput: 0: 12185.6. Samples: 277227520. 
Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:10:50,956][1648985] Avg episode reward: [(0, '152.020')] [2024-06-15 18:10:51,476][1652491] Updated weights for policy 0, policy_version 541314 (0.0013) [2024-06-15 18:10:52,621][1652491] Updated weights for policy 0, policy_version 541374 (0.0118) [2024-06-15 18:10:55,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.3, 300 sec: 47208.1). Total num frames: 1108770816. Throughput: 0: 12151.4. Samples: 277263872. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:10:55,956][1648985] Avg episode reward: [(0, '142.450')] [2024-06-15 18:10:56,275][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000541408_1108803584.pth... [2024-06-15 18:10:56,463][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000535872_1097465856.pth [2024-06-15 18:10:56,469][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000541408_1108803584.pth [2024-06-15 18:10:56,806][1652491] Updated weights for policy 0, policy_version 541424 (0.0014) [2024-06-15 18:10:58,427][1652491] Updated weights for policy 0, policy_version 541488 (0.0012) [2024-06-15 18:10:59,803][1652491] Updated weights for policy 0, policy_version 541525 (0.0012) [2024-06-15 18:11:00,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 50244.2, 300 sec: 47874.8). Total num frames: 1109131264. Throughput: 0: 12049.1. Samples: 277331456. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:11:00,956][1648985] Avg episode reward: [(0, '132.030')] [2024-06-15 18:11:02,798][1652491] Updated weights for policy 0, policy_version 541588 (0.0012) [2024-06-15 18:11:03,946][1652491] Updated weights for policy 0, policy_version 541632 (0.0012) [2024-06-15 18:11:05,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46421.2, 300 sec: 47097.0). Total num frames: 1109262336. Throughput: 0: 12060.4. Samples: 277404160. 
Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:11:05,956][1648985] Avg episode reward: [(0, '147.990')] [2024-06-15 18:11:09,158][1652491] Updated weights for policy 0, policy_version 541712 (0.0013) [2024-06-15 18:11:10,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 1109524480. Throughput: 0: 11821.5. Samples: 277434368. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:11:10,956][1648985] Avg episode reward: [(0, '134.570')] [2024-06-15 18:11:11,139][1652491] Updated weights for policy 0, policy_version 541776 (0.0017) [2024-06-15 18:11:12,134][1652491] Updated weights for policy 0, policy_version 541822 (0.0012) [2024-06-15 18:11:14,453][1652491] Updated weights for policy 0, policy_version 541874 (0.0012) [2024-06-15 18:11:15,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.8, 300 sec: 47430.3). Total num frames: 1109786624. Throughput: 0: 11798.8. Samples: 277502976. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:11:15,955][1648985] Avg episode reward: [(0, '144.180')] [2024-06-15 18:11:18,445][1652491] Updated weights for policy 0, policy_version 541914 (0.0100) [2024-06-15 18:11:20,015][1652491] Updated weights for policy 0, policy_version 541972 (0.0016) [2024-06-15 18:11:20,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1110048768. Throughput: 0: 11855.6. Samples: 277576704. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:11:20,956][1648985] Avg episode reward: [(0, '146.130')] [2024-06-15 18:11:22,705][1652491] Updated weights for policy 0, policy_version 542033 (0.0013) [2024-06-15 18:11:23,560][1652491] Updated weights for policy 0, policy_version 542076 (0.0090) [2024-06-15 18:11:24,410][1651469] Signal inference workers to stop experience collection... 
(28200 times) [2024-06-15 18:11:24,479][1652491] InferenceWorker_p0-w0: stopping experience collection (28200 times) [2024-06-15 18:11:24,659][1651469] Signal inference workers to resume experience collection... (28200 times) [2024-06-15 18:11:24,659][1652491] InferenceWorker_p0-w0: resuming experience collection (28200 times) [2024-06-15 18:11:24,904][1652491] Updated weights for policy 0, policy_version 542117 (0.0012) [2024-06-15 18:11:25,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1110310912. Throughput: 0: 11844.2. Samples: 277612032. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:11:25,956][1648985] Avg episode reward: [(0, '149.460')] [2024-06-15 18:11:30,520][1652491] Updated weights for policy 0, policy_version 542192 (0.0015) [2024-06-15 18:11:30,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.1, 300 sec: 47541.3). Total num frames: 1110441984. Throughput: 0: 11810.1. Samples: 277690880. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:11:30,956][1648985] Avg episode reward: [(0, '152.000')] [2024-06-15 18:11:32,017][1652491] Updated weights for policy 0, policy_version 542264 (0.0014) [2024-06-15 18:11:33,659][1652491] Updated weights for policy 0, policy_version 542305 (0.0013) [2024-06-15 18:11:35,156][1652491] Updated weights for policy 0, policy_version 542339 (0.0012) [2024-06-15 18:11:35,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 48059.8, 300 sec: 47763.5). Total num frames: 1110769664. Throughput: 0: 11662.3. Samples: 277752320. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:11:35,956][1648985] Avg episode reward: [(0, '170.800')] [2024-06-15 18:11:36,529][1652491] Updated weights for policy 0, policy_version 542392 (0.0012) [2024-06-15 18:11:40,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.2, 300 sec: 47319.2). Total num frames: 1110900736. Throughput: 0: 11912.5. Samples: 277799936. 
Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:11:40,956][1648985] Avg episode reward: [(0, '172.220')] [2024-06-15 18:11:41,240][1652491] Updated weights for policy 0, policy_version 542448 (0.0012) [2024-06-15 18:11:42,852][1652491] Updated weights for policy 0, policy_version 542519 (0.0013) [2024-06-15 18:11:44,238][1652491] Updated weights for policy 0, policy_version 542576 (0.0049) [2024-06-15 18:11:45,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1111228416. Throughput: 0: 11787.4. Samples: 277861888. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:11:45,956][1648985] Avg episode reward: [(0, '175.330')] [2024-06-15 18:11:47,142][1652491] Updated weights for policy 0, policy_version 542624 (0.0115) [2024-06-15 18:11:47,917][1652491] Updated weights for policy 0, policy_version 542656 (0.0012) [2024-06-15 18:11:50,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 1111359488. Throughput: 0: 12060.5. Samples: 277946880. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:11:50,955][1648985] Avg episode reward: [(0, '165.010')] [2024-06-15 18:11:52,393][1652491] Updated weights for policy 0, policy_version 542721 (0.0014) [2024-06-15 18:11:54,903][1652491] Updated weights for policy 0, policy_version 542832 (0.0013) [2024-06-15 18:11:55,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 49698.2, 300 sec: 47874.6). Total num frames: 1111752704. Throughput: 0: 11912.5. Samples: 277970432. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:11:55,956][1648985] Avg episode reward: [(0, '147.800')] [2024-06-15 18:11:58,777][1652491] Updated weights for policy 0, policy_version 542884 (0.0012) [2024-06-15 18:12:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 1111883776. Throughput: 0: 11935.3. Samples: 278040064. 
Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:12:00,956][1648985] Avg episode reward: [(0, '139.640')] [2024-06-15 18:12:02,913][1652491] Updated weights for policy 0, policy_version 542944 (0.0091) [2024-06-15 18:12:04,347][1652491] Updated weights for policy 0, policy_version 543024 (0.0016) [2024-06-15 18:12:05,369][1651469] Signal inference workers to stop experience collection... (28250 times) [2024-06-15 18:12:05,410][1652491] InferenceWorker_p0-w0: stopping experience collection (28250 times) [2024-06-15 18:12:05,609][1651469] Signal inference workers to resume experience collection... (28250 times) [2024-06-15 18:12:05,626][1652491] InferenceWorker_p0-w0: resuming experience collection (28250 times) [2024-06-15 18:12:05,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 49151.9, 300 sec: 47874.6). Total num frames: 1112211456. Throughput: 0: 11878.4. Samples: 278111232. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:12:05,956][1648985] Avg episode reward: [(0, '122.990')] [2024-06-15 18:12:06,220][1652491] Updated weights for policy 0, policy_version 543104 (0.0013) [2024-06-15 18:12:09,743][1652491] Updated weights for policy 0, policy_version 543159 (0.0026) [2024-06-15 18:12:10,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 47652.5). Total num frames: 1112408064. Throughput: 0: 11969.4. Samples: 278150656. Policy #0 lag: (min: 0.0, avg: 123.2, max: 256.0) [2024-06-15 18:12:10,955][1648985] Avg episode reward: [(0, '108.730')] [2024-06-15 18:12:14,044][1652491] Updated weights for policy 0, policy_version 543223 (0.0016) [2024-06-15 18:12:15,931][1652491] Updated weights for policy 0, policy_version 543285 (0.0013) [2024-06-15 18:12:15,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 1112637440. Throughput: 0: 11787.4. Samples: 278221312. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:12:15,956][1648985] Avg episode reward: [(0, '120.190')] [2024-06-15 18:12:17,036][1652491] Updated weights for policy 0, policy_version 543328 (0.0011) [2024-06-15 18:12:20,069][1652491] Updated weights for policy 0, policy_version 543365 (0.0017) [2024-06-15 18:12:20,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 46967.4, 300 sec: 47763.5). Total num frames: 1112866816. Throughput: 0: 11878.4. Samples: 278286848. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:12:20,956][1648985] Avg episode reward: [(0, '132.860')] [2024-06-15 18:12:24,371][1652491] Updated weights for policy 0, policy_version 543433 (0.0015) [2024-06-15 18:12:25,745][1652491] Updated weights for policy 0, policy_version 543489 (0.0014) [2024-06-15 18:12:25,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1113063424. Throughput: 0: 11776.0. Samples: 278329856. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:12:25,955][1648985] Avg episode reward: [(0, '139.630')] [2024-06-15 18:12:27,012][1652491] Updated weights for policy 0, policy_version 543552 (0.0012) [2024-06-15 18:12:30,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 1113325568. Throughput: 0: 11798.8. Samples: 278392832. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:12:30,955][1648985] Avg episode reward: [(0, '154.710')] [2024-06-15 18:12:31,315][1652491] Updated weights for policy 0, policy_version 543617 (0.0092) [2024-06-15 18:12:35,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 44782.9, 300 sec: 47319.2). Total num frames: 1113456640. Throughput: 0: 11582.6. Samples: 278468096. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:12:35,956][1648985] Avg episode reward: [(0, '149.170')] [2024-06-15 18:12:36,736][1652491] Updated weights for policy 0, policy_version 543696 (0.0034) [2024-06-15 18:12:38,640][1652491] Updated weights for policy 0, policy_version 543761 (0.0064) [2024-06-15 18:12:40,492][1652491] Updated weights for policy 0, policy_version 543840 (0.0026) [2024-06-15 18:12:40,955][1648985] Fps is (10 sec: 49149.7, 60 sec: 48605.7, 300 sec: 47763.5). Total num frames: 1113817088. Throughput: 0: 11593.9. Samples: 278492160. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:12:40,956][1648985] Avg episode reward: [(0, '149.850')] [2024-06-15 18:12:44,505][1652491] Updated weights for policy 0, policy_version 543904 (0.0119) [2024-06-15 18:12:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 47541.3). Total num frames: 1113980928. Throughput: 0: 11446.0. Samples: 278555136. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:12:45,956][1648985] Avg episode reward: [(0, '135.790')] [2024-06-15 18:12:49,519][1652491] Updated weights for policy 0, policy_version 543973 (0.0015) [2024-06-15 18:12:50,943][1651469] Signal inference workers to stop experience collection... (28300 times) [2024-06-15 18:12:50,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 46967.3, 300 sec: 47541.3). Total num frames: 1114177536. Throughput: 0: 11343.6. Samples: 278621696. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:12:50,956][1648985] Avg episode reward: [(0, '140.410')] [2024-06-15 18:12:50,975][1652491] InferenceWorker_p0-w0: stopping experience collection (28300 times) [2024-06-15 18:12:51,209][1651469] Signal inference workers to resume experience collection... 
(28300 times) [2024-06-15 18:12:51,210][1652491] InferenceWorker_p0-w0: resuming experience collection (28300 times) [2024-06-15 18:12:51,757][1652491] Updated weights for policy 0, policy_version 544067 (0.0012) [2024-06-15 18:12:52,648][1652491] Updated weights for policy 0, policy_version 544126 (0.0014) [2024-06-15 18:12:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44236.7, 300 sec: 47542.6). Total num frames: 1114406912. Throughput: 0: 11172.9. Samples: 278653440. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:12:55,956][1648985] Avg episode reward: [(0, '151.780')] [2024-06-15 18:12:56,414][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000544176_1114472448.pth... [2024-06-15 18:12:56,460][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000538592_1103036416.pth [2024-06-15 18:12:56,599][1652491] Updated weights for policy 0, policy_version 544180 (0.0012) [2024-06-15 18:13:00,955][1648985] Fps is (10 sec: 39322.9, 60 sec: 44783.0, 300 sec: 47319.2). Total num frames: 1114570752. Throughput: 0: 11320.9. Samples: 278730752. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:13:00,955][1648985] Avg episode reward: [(0, '147.760')] [2024-06-15 18:13:01,214][1652491] Updated weights for policy 0, policy_version 544240 (0.0119) [2024-06-15 18:13:02,917][1652491] Updated weights for policy 0, policy_version 544313 (0.0130) [2024-06-15 18:13:03,891][1652491] Updated weights for policy 0, policy_version 544355 (0.0020) [2024-06-15 18:13:05,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 44783.1, 300 sec: 47541.4). Total num frames: 1114898432. Throughput: 0: 11320.9. Samples: 278796288. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:13:05,956][1648985] Avg episode reward: [(0, '140.500')] [2024-06-15 18:13:06,744][1652491] Updated weights for policy 0, policy_version 544400 (0.0013) [2024-06-15 18:13:07,821][1652491] Updated weights for policy 0, policy_version 544445 (0.0013) [2024-06-15 18:13:10,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 1115029504. Throughput: 0: 11207.1. Samples: 278834176. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:13:10,955][1648985] Avg episode reward: [(0, '134.130')] [2024-06-15 18:13:13,079][1652491] Updated weights for policy 0, policy_version 544528 (0.0011) [2024-06-15 18:13:14,630][1652491] Updated weights for policy 0, policy_version 544594 (0.0012) [2024-06-15 18:13:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46421.3, 300 sec: 47541.4). Total num frames: 1115422720. Throughput: 0: 11241.2. Samples: 278898688. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:13:15,956][1648985] Avg episode reward: [(0, '130.360')] [2024-06-15 18:13:17,687][1652491] Updated weights for policy 0, policy_version 544658 (0.0013) [2024-06-15 18:13:20,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 44782.9, 300 sec: 47430.3). Total num frames: 1115553792. Throughput: 0: 11286.7. Samples: 278976000. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:13:20,956][1648985] Avg episode reward: [(0, '134.730')] [2024-06-15 18:13:22,778][1652491] Updated weights for policy 0, policy_version 544720 (0.0014) [2024-06-15 18:13:24,073][1652491] Updated weights for policy 0, policy_version 544774 (0.0014) [2024-06-15 18:13:25,841][1652491] Updated weights for policy 0, policy_version 544849 (0.0011) [2024-06-15 18:13:25,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.4, 300 sec: 47541.4). Total num frames: 1115848704. Throughput: 0: 11548.5. Samples: 279011840. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:13:25,955][1648985] Avg episode reward: [(0, '161.560')] [2024-06-15 18:13:28,227][1652491] Updated weights for policy 0, policy_version 544898 (0.0013) [2024-06-15 18:13:29,368][1652491] Updated weights for policy 0, policy_version 544959 (0.0013) [2024-06-15 18:13:30,955][1648985] Fps is (10 sec: 52430.7, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1116078080. Throughput: 0: 11639.5. Samples: 279078912. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:13:30,955][1648985] Avg episode reward: [(0, '176.340')] [2024-06-15 18:13:34,043][1651469] Signal inference workers to stop experience collection... (28350 times) [2024-06-15 18:13:34,090][1652491] InferenceWorker_p0-w0: stopping experience collection (28350 times) [2024-06-15 18:13:34,344][1651469] Signal inference workers to resume experience collection... (28350 times) [2024-06-15 18:13:34,345][1652491] InferenceWorker_p0-w0: resuming experience collection (28350 times) [2024-06-15 18:13:35,076][1652491] Updated weights for policy 0, policy_version 545024 (0.0015) [2024-06-15 18:13:35,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 1116274688. Throughput: 0: 11707.8. Samples: 279148544. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:13:35,956][1648985] Avg episode reward: [(0, '165.880')] [2024-06-15 18:13:37,881][1652491] Updated weights for policy 0, policy_version 545142 (0.0013) [2024-06-15 18:13:39,839][1652491] Updated weights for policy 0, policy_version 545168 (0.0020) [2024-06-15 18:13:40,955][1648985] Fps is (10 sec: 49150.1, 60 sec: 45875.3, 300 sec: 47763.5). Total num frames: 1116569600. Throughput: 0: 11650.8. Samples: 279177728. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 18:13:40,956][1648985] Avg episode reward: [(0, '151.950')] [2024-06-15 18:13:45,733][1652491] Updated weights for policy 0, policy_version 545233 (0.0013) [2024-06-15 18:13:45,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 44236.7, 300 sec: 46874.9). Total num frames: 1116635136. Throughput: 0: 11650.8. Samples: 279255040. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:13:45,956][1648985] Avg episode reward: [(0, '133.580')] [2024-06-15 18:13:47,231][1652491] Updated weights for policy 0, policy_version 545296 (0.0026) [2024-06-15 18:13:48,280][1652491] Updated weights for policy 0, policy_version 545351 (0.0013) [2024-06-15 18:13:49,291][1652491] Updated weights for policy 0, policy_version 545398 (0.0013) [2024-06-15 18:13:50,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 47513.8, 300 sec: 47652.5). Total num frames: 1117028352. Throughput: 0: 11741.9. Samples: 279324672. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:13:50,955][1648985] Avg episode reward: [(0, '148.970')] [2024-06-15 18:13:50,981][1652491] Updated weights for policy 0, policy_version 545425 (0.0016) [2024-06-15 18:13:55,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45329.1, 300 sec: 46875.6). Total num frames: 1117126656. Throughput: 0: 11798.7. Samples: 279365120. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:13:55,956][1648985] Avg episode reward: [(0, '142.530')] [2024-06-15 18:13:56,237][1652491] Updated weights for policy 0, policy_version 545489 (0.0024) [2024-06-15 18:13:57,844][1652491] Updated weights for policy 0, policy_version 545552 (0.0013) [2024-06-15 18:13:59,034][1652491] Updated weights for policy 0, policy_version 545603 (0.0013) [2024-06-15 18:14:00,330][1652491] Updated weights for policy 0, policy_version 545657 (0.0014) [2024-06-15 18:14:00,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 1117519872. 
Throughput: 0: 11730.5. Samples: 279426560. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:14:00,955][1648985] Avg episode reward: [(0, '151.480')] [2024-06-15 18:14:02,357][1652491] Updated weights for policy 0, policy_version 545696 (0.0011) [2024-06-15 18:14:05,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45875.3, 300 sec: 47208.2). Total num frames: 1117650944. Throughput: 0: 11855.7. Samples: 279509504. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:14:05,955][1648985] Avg episode reward: [(0, '165.110')] [2024-06-15 18:14:06,745][1652491] Updated weights for policy 0, policy_version 545744 (0.0013) [2024-06-15 18:14:07,876][1652491] Updated weights for policy 0, policy_version 545789 (0.0011) [2024-06-15 18:14:09,385][1652491] Updated weights for policy 0, policy_version 545830 (0.0013) [2024-06-15 18:14:10,899][1652491] Updated weights for policy 0, policy_version 545904 (0.0015) [2024-06-15 18:14:10,976][1648985] Fps is (10 sec: 49047.6, 60 sec: 49680.5, 300 sec: 47428.5). Total num frames: 1118011392. Throughput: 0: 11815.9. Samples: 279543808. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:14:10,977][1648985] Avg episode reward: [(0, '142.570')] [2024-06-15 18:14:12,340][1651469] Signal inference workers to stop experience collection... (28400 times) [2024-06-15 18:14:12,375][1652491] InferenceWorker_p0-w0: stopping experience collection (28400 times) [2024-06-15 18:14:12,585][1651469] Signal inference workers to resume experience collection... (28400 times) [2024-06-15 18:14:12,587][1652491] InferenceWorker_p0-w0: resuming experience collection (28400 times) [2024-06-15 18:14:13,262][1652491] Updated weights for policy 0, policy_version 545954 (0.0013) [2024-06-15 18:14:15,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 1118175232. Throughput: 0: 11901.1. Samples: 279614464. 
Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:14:15,956][1648985] Avg episode reward: [(0, '156.070')] [2024-06-15 18:14:17,482][1652491] Updated weights for policy 0, policy_version 546000 (0.0011) [2024-06-15 18:14:18,364][1652491] Updated weights for policy 0, policy_version 546045 (0.0024) [2024-06-15 18:14:20,116][1652491] Updated weights for policy 0, policy_version 546099 (0.0017) [2024-06-15 18:14:20,955][1648985] Fps is (10 sec: 49256.6, 60 sec: 49152.2, 300 sec: 47319.2). Total num frames: 1118502912. Throughput: 0: 12071.9. Samples: 279691776. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:14:20,955][1648985] Avg episode reward: [(0, '169.680')] [2024-06-15 18:14:21,367][1652491] Updated weights for policy 0, policy_version 546172 (0.0093) [2024-06-15 18:14:23,924][1652491] Updated weights for policy 0, policy_version 546239 (0.0205) [2024-06-15 18:14:25,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 47513.4, 300 sec: 47541.3). Total num frames: 1118699520. Throughput: 0: 12014.9. Samples: 279718400. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:14:25,956][1648985] Avg episode reward: [(0, '164.270')] [2024-06-15 18:14:28,612][1652491] Updated weights for policy 0, policy_version 546296 (0.0015) [2024-06-15 18:14:30,955][1648985] Fps is (10 sec: 45873.8, 60 sec: 48059.4, 300 sec: 47319.2). Total num frames: 1118961664. Throughput: 0: 12208.3. Samples: 279804416. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:14:30,956][1648985] Avg episode reward: [(0, '162.810')] [2024-06-15 18:14:31,238][1652491] Updated weights for policy 0, policy_version 546391 (0.0015) [2024-06-15 18:14:33,550][1652491] Updated weights for policy 0, policy_version 546451 (0.0023) [2024-06-15 18:14:34,597][1652491] Updated weights for policy 0, policy_version 546495 (0.0011) [2024-06-15 18:14:35,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 49152.1, 300 sec: 47541.4). Total num frames: 1119223808. 
Throughput: 0: 12151.5. Samples: 279871488. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:14:35,955][1648985] Avg episode reward: [(0, '157.040')] [2024-06-15 18:14:39,137][1652491] Updated weights for policy 0, policy_version 546559 (0.0013) [2024-06-15 18:14:40,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 46967.6, 300 sec: 47208.1). Total num frames: 1119387648. Throughput: 0: 12242.5. Samples: 279916032. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:14:40,956][1648985] Avg episode reward: [(0, '128.950')] [2024-06-15 18:14:41,980][1652491] Updated weights for policy 0, policy_version 546625 (0.0012) [2024-06-15 18:14:43,096][1652491] Updated weights for policy 0, policy_version 546686 (0.0012) [2024-06-15 18:14:45,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 51882.6, 300 sec: 47763.5). Total num frames: 1119748096. Throughput: 0: 12253.8. Samples: 279977984. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:14:45,956][1648985] Avg episode reward: [(0, '139.430')] [2024-06-15 18:14:49,731][1652491] Updated weights for policy 0, policy_version 546768 (0.0014) [2024-06-15 18:14:50,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 1119879168. Throughput: 0: 12128.7. Samples: 280055296. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:14:50,955][1648985] Avg episode reward: [(0, '157.640')] [2024-06-15 18:14:51,664][1652491] Updated weights for policy 0, policy_version 546819 (0.0012) [2024-06-15 18:14:53,257][1651469] Signal inference workers to stop experience collection... (28450 times) [2024-06-15 18:14:53,295][1652491] InferenceWorker_p0-w0: stopping experience collection (28450 times) [2024-06-15 18:14:53,302][1652491] Updated weights for policy 0, policy_version 546898 (0.0075) [2024-06-15 18:14:53,459][1651469] Signal inference workers to resume experience collection... 
(28450 times) [2024-06-15 18:14:53,460][1652491] InferenceWorker_p0-w0: resuming experience collection (28450 times) [2024-06-15 18:14:55,401][1652491] Updated weights for policy 0, policy_version 546963 (0.0013) [2024-06-15 18:14:55,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 51336.6, 300 sec: 47763.5). Total num frames: 1120206848. Throughput: 0: 12111.7. Samples: 280088576. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:14:55,955][1648985] Avg episode reward: [(0, '141.040')] [2024-06-15 18:14:56,320][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000547008_1120272384.pth... [2024-06-15 18:14:56,454][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000541408_1108803584.pth [2024-06-15 18:15:00,882][1652491] Updated weights for policy 0, policy_version 547024 (0.0014) [2024-06-15 18:15:00,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 46421.2, 300 sec: 46874.9). Total num frames: 1120305152. Throughput: 0: 12219.7. Samples: 280164352. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:15:00,956][1648985] Avg episode reward: [(0, '146.510')] [2024-06-15 18:15:02,117][1652491] Updated weights for policy 0, policy_version 547072 (0.0106) [2024-06-15 18:15:04,449][1652491] Updated weights for policy 0, policy_version 547139 (0.0013) [2024-06-15 18:15:05,542][1652491] Updated weights for policy 0, policy_version 547192 (0.0112) [2024-06-15 18:15:05,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 50244.2, 300 sec: 47541.4). Total num frames: 1120665600. Throughput: 0: 11878.4. Samples: 280226304. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:15:05,956][1648985] Avg episode reward: [(0, '136.960')] [2024-06-15 18:15:06,918][1652491] Updated weights for policy 0, policy_version 547232 (0.0035) [2024-06-15 18:15:10,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 46437.7, 300 sec: 47097.1). Total num frames: 1120796672. Throughput: 0: 12060.5. 
Samples: 280261120. Policy #0 lag: (min: 15.0, avg: 79.6, max: 271.0) [2024-06-15 18:15:10,955][1648985] Avg episode reward: [(0, '147.690')] [2024-06-15 18:15:12,939][1652491] Updated weights for policy 0, policy_version 547282 (0.0013) [2024-06-15 18:15:14,546][1652491] Updated weights for policy 0, policy_version 547332 (0.0024) [2024-06-15 18:15:15,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 1121058816. Throughput: 0: 11787.4. Samples: 280334848. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:15:15,956][1648985] Avg episode reward: [(0, '168.850')] [2024-06-15 18:15:16,463][1652491] Updated weights for policy 0, policy_version 547424 (0.0012) [2024-06-15 18:15:18,356][1652491] Updated weights for policy 0, policy_version 547488 (0.0016) [2024-06-15 18:15:20,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 46967.3, 300 sec: 47097.0). Total num frames: 1121320960. Throughput: 0: 11707.7. Samples: 280398336. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:15:20,956][1648985] Avg episode reward: [(0, '161.180')] [2024-06-15 18:15:23,797][1652491] Updated weights for policy 0, policy_version 547536 (0.0015) [2024-06-15 18:15:25,775][1652491] Updated weights for policy 0, policy_version 547587 (0.0013) [2024-06-15 18:15:25,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 45875.5, 300 sec: 46652.8). Total num frames: 1121452032. Throughput: 0: 11616.7. Samples: 280438784. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:15:25,955][1648985] Avg episode reward: [(0, '151.730')] [2024-06-15 18:15:27,522][1652491] Updated weights for policy 0, policy_version 547664 (0.0091) [2024-06-15 18:15:28,795][1652491] Updated weights for policy 0, policy_version 547728 (0.0012) [2024-06-15 18:15:30,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 1121845248. Throughput: 0: 11628.1. Samples: 280501248. 
Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:15:30,956][1648985] Avg episode reward: [(0, '137.450')] [2024-06-15 18:15:35,131][1652491] Updated weights for policy 0, policy_version 547796 (0.0016) [2024-06-15 18:15:35,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1121976320. Throughput: 0: 11594.0. Samples: 280577024. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:15:35,955][1648985] Avg episode reward: [(0, '153.620')] [2024-06-15 18:15:35,958][1652491] Updated weights for policy 0, policy_version 547840 (0.0015) [2024-06-15 18:15:38,042][1651469] Signal inference workers to stop experience collection... (28500 times) [2024-06-15 18:15:38,087][1652491] InferenceWorker_p0-w0: stopping experience collection (28500 times) [2024-06-15 18:15:38,330][1651469] Signal inference workers to resume experience collection... (28500 times) [2024-06-15 18:15:38,341][1652491] InferenceWorker_p0-w0: resuming experience collection (28500 times) [2024-06-15 18:15:39,191][1652491] Updated weights for policy 0, policy_version 547920 (0.0013) [2024-06-15 18:15:40,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 1122271232. Throughput: 0: 11514.3. Samples: 280606720. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:15:40,956][1648985] Avg episode reward: [(0, '150.320')] [2024-06-15 18:15:41,164][1652491] Updated weights for policy 0, policy_version 548000 (0.0019) [2024-06-15 18:15:45,956][1648985] Fps is (10 sec: 39320.0, 60 sec: 43690.6, 300 sec: 46652.7). Total num frames: 1122369536. Throughput: 0: 11332.2. Samples: 280674304. 
Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:15:45,957][1648985] Avg episode reward: [(0, '169.180')] [2024-06-15 18:15:47,385][1652491] Updated weights for policy 0, policy_version 548069 (0.0013) [2024-06-15 18:15:48,995][1652491] Updated weights for policy 0, policy_version 548099 (0.0013) [2024-06-15 18:15:50,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1122631680. Throughput: 0: 11366.4. Samples: 280737792. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:15:50,956][1648985] Avg episode reward: [(0, '157.560')] [2024-06-15 18:15:51,069][1652491] Updated weights for policy 0, policy_version 548176 (0.0014) [2024-06-15 18:15:52,120][1652491] Updated weights for policy 0, policy_version 548224 (0.0013) [2024-06-15 18:15:53,516][1652491] Updated weights for policy 0, policy_version 548287 (0.0015) [2024-06-15 18:15:55,955][1648985] Fps is (10 sec: 52430.5, 60 sec: 44782.9, 300 sec: 46652.8). Total num frames: 1122893824. Throughput: 0: 11309.5. Samples: 280770048. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:15:55,956][1648985] Avg episode reward: [(0, '159.910')] [2024-06-15 18:15:58,839][1652491] Updated weights for policy 0, policy_version 548347 (0.0042) [2024-06-15 18:16:00,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1123090432. Throughput: 0: 11423.3. Samples: 280848896. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:16:00,955][1648985] Avg episode reward: [(0, '144.760')] [2024-06-15 18:16:01,750][1652491] Updated weights for policy 0, policy_version 548416 (0.0015) [2024-06-15 18:16:03,360][1652491] Updated weights for policy 0, policy_version 548476 (0.0012) [2024-06-15 18:16:05,441][1652491] Updated weights for policy 0, policy_version 548528 (0.0027) [2024-06-15 18:16:05,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1123418112. 
Throughput: 0: 11309.6. Samples: 280907264. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:16:05,956][1648985] Avg episode reward: [(0, '144.780')] [2024-06-15 18:16:09,255][1652491] Updated weights for policy 0, policy_version 548580 (0.0013) [2024-06-15 18:16:10,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1123549184. Throughput: 0: 11309.5. Samples: 280947712. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:16:10,956][1648985] Avg episode reward: [(0, '141.690')] [2024-06-15 18:16:12,151][1652491] Updated weights for policy 0, policy_version 548640 (0.0012) [2024-06-15 18:16:13,120][1652491] Updated weights for policy 0, policy_version 548672 (0.0013) [2024-06-15 18:16:14,624][1652491] Updated weights for policy 0, policy_version 548732 (0.0026) [2024-06-15 18:16:15,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1123811328. Throughput: 0: 11355.0. Samples: 281012224. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:16:15,956][1648985] Avg episode reward: [(0, '139.580')] [2024-06-15 18:16:16,757][1652491] Updated weights for policy 0, policy_version 548768 (0.0021) [2024-06-15 18:16:19,977][1652491] Updated weights for policy 0, policy_version 548801 (0.0057) [2024-06-15 18:16:20,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 44783.1, 300 sec: 46430.6). Total num frames: 1124007936. Throughput: 0: 11355.0. Samples: 281088000. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:16:20,957][1648985] Avg episode reward: [(0, '139.160')] [2024-06-15 18:16:22,810][1651469] Signal inference workers to stop experience collection... (28550 times) [2024-06-15 18:16:22,903][1652491] InferenceWorker_p0-w0: stopping experience collection (28550 times) [2024-06-15 18:16:23,113][1651469] Signal inference workers to resume experience collection... 
(28550 times) [2024-06-15 18:16:23,114][1652491] InferenceWorker_p0-w0: resuming experience collection (28550 times) [2024-06-15 18:16:23,237][1652491] Updated weights for policy 0, policy_version 548880 (0.0135) [2024-06-15 18:16:24,356][1652491] Updated weights for policy 0, policy_version 548924 (0.0013) [2024-06-15 18:16:25,960][1648985] Fps is (10 sec: 49128.1, 60 sec: 47509.6, 300 sec: 46985.2). Total num frames: 1124302848. Throughput: 0: 11365.2. Samples: 281118208. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:16:25,960][1648985] Avg episode reward: [(0, '120.410')] [2024-06-15 18:16:26,013][1652491] Updated weights for policy 0, policy_version 548982 (0.0014) [2024-06-15 18:16:27,852][1652491] Updated weights for policy 0, policy_version 549024 (0.0110) [2024-06-15 18:16:30,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 43690.8, 300 sec: 46430.6). Total num frames: 1124466688. Throughput: 0: 11491.6. Samples: 281191424. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:16:30,956][1648985] Avg episode reward: [(0, '126.620')] [2024-06-15 18:16:31,602][1652491] Updated weights for policy 0, policy_version 549088 (0.0015) [2024-06-15 18:16:34,341][1652491] Updated weights for policy 0, policy_version 549137 (0.0019) [2024-06-15 18:16:35,224][1652491] Updated weights for policy 0, policy_version 549180 (0.0023) [2024-06-15 18:16:35,955][1648985] Fps is (10 sec: 45898.0, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 1124761600. Throughput: 0: 11764.6. Samples: 281267200. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:16:35,955][1648985] Avg episode reward: [(0, '143.590')] [2024-06-15 18:16:36,699][1652491] Updated weights for policy 0, policy_version 549232 (0.0034) [2024-06-15 18:16:37,982][1652491] Updated weights for policy 0, policy_version 549264 (0.0018) [2024-06-15 18:16:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45329.1, 300 sec: 46652.7). Total num frames: 1124990976. 
Throughput: 0: 11798.7. Samples: 281300992. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:16:40,956][1648985] Avg episode reward: [(0, '150.740')] [2024-06-15 18:16:41,595][1652491] Updated weights for policy 0, policy_version 549328 (0.0013) [2024-06-15 18:16:42,765][1652491] Updated weights for policy 0, policy_version 549369 (0.0039) [2024-06-15 18:16:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46967.7, 300 sec: 46874.9). Total num frames: 1125187584. Throughput: 0: 11662.2. Samples: 281373696. Policy #0 lag: (min: 63.0, avg: 151.3, max: 319.0) [2024-06-15 18:16:45,956][1648985] Avg episode reward: [(0, '144.990')] [2024-06-15 18:16:46,254][1652491] Updated weights for policy 0, policy_version 549432 (0.0011) [2024-06-15 18:16:47,046][1652491] Updated weights for policy 0, policy_version 549460 (0.0012) [2024-06-15 18:16:49,111][1652491] Updated weights for policy 0, policy_version 549520 (0.0013) [2024-06-15 18:16:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 1125515264. Throughput: 0: 11844.3. Samples: 281440256. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:16:50,956][1648985] Avg episode reward: [(0, '137.990')] [2024-06-15 18:16:52,975][1652491] Updated weights for policy 0, policy_version 549586 (0.0013) [2024-06-15 18:16:55,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 1125646336. Throughput: 0: 11719.1. Samples: 281475072. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:16:55,956][1648985] Avg episode reward: [(0, '127.840')] [2024-06-15 18:16:55,978][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000549632_1125646336.pth... 
[2024-06-15 18:16:56,025][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000544176_1114472448.pth [2024-06-15 18:16:57,600][1652491] Updated weights for policy 0, policy_version 549685 (0.0014) [2024-06-15 18:16:59,263][1652491] Updated weights for policy 0, policy_version 549755 (0.0015) [2024-06-15 18:17:00,960][1648985] Fps is (10 sec: 45851.0, 60 sec: 48055.5, 300 sec: 46651.9). Total num frames: 1125974016. Throughput: 0: 11911.2. Samples: 281548288. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:17:00,961][1648985] Avg episode reward: [(0, '129.970')] [2024-06-15 18:17:01,331][1652491] Updated weights for policy 0, policy_version 549815 (0.0013) [2024-06-15 18:17:04,979][1652491] Updated weights for policy 0, policy_version 549883 (0.0012) [2024-06-15 18:17:05,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1126170624. Throughput: 0: 11776.0. Samples: 281617920. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:17:05,956][1648985] Avg episode reward: [(0, '149.030')] [2024-06-15 18:17:07,855][1651469] Signal inference workers to stop experience collection... (28600 times) [2024-06-15 18:17:07,915][1652491] InferenceWorker_p0-w0: stopping experience collection (28600 times) [2024-06-15 18:17:08,045][1651469] Signal inference workers to resume experience collection... (28600 times) [2024-06-15 18:17:08,045][1652491] InferenceWorker_p0-w0: resuming experience collection (28600 times) [2024-06-15 18:17:08,885][1652491] Updated weights for policy 0, policy_version 549950 (0.0013) [2024-06-15 18:17:10,955][1648985] Fps is (10 sec: 45899.1, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 1126432768. Throughput: 0: 12016.2. Samples: 281658880. 
Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:17:10,956][1648985] Avg episode reward: [(0, '144.080')] [2024-06-15 18:17:11,145][1652491] Updated weights for policy 0, policy_version 550019 (0.0022) [2024-06-15 18:17:12,579][1652491] Updated weights for policy 0, policy_version 550080 (0.0013) [2024-06-15 18:17:15,723][1652491] Updated weights for policy 0, policy_version 550144 (0.0013) [2024-06-15 18:17:15,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 1126694912. Throughput: 0: 11923.9. Samples: 281728000. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:17:15,956][1648985] Avg episode reward: [(0, '149.790')] [2024-06-15 18:17:20,157][1652491] Updated weights for policy 0, policy_version 550213 (0.0013) [2024-06-15 18:17:20,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 1126891520. Throughput: 0: 11787.4. Samples: 281797632. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:17:20,956][1648985] Avg episode reward: [(0, '156.900')] [2024-06-15 18:17:21,377][1652491] Updated weights for policy 0, policy_version 550272 (0.0014) [2024-06-15 18:17:23,483][1652491] Updated weights for policy 0, policy_version 550320 (0.0013) [2024-06-15 18:17:25,419][1652491] Updated weights for policy 0, policy_version 550352 (0.0013) [2024-06-15 18:17:25,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 47517.5, 300 sec: 46874.9). Total num frames: 1127153664. Throughput: 0: 11741.9. Samples: 281829376. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:17:25,955][1648985] Avg episode reward: [(0, '152.060')] [2024-06-15 18:17:30,425][1652491] Updated weights for policy 0, policy_version 550416 (0.0014) [2024-06-15 18:17:30,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1127284736. Throughput: 0: 11855.6. Samples: 281907200. 
Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:17:30,956][1648985] Avg episode reward: [(0, '146.860')] [2024-06-15 18:17:32,343][1652491] Updated weights for policy 0, policy_version 550485 (0.0013) [2024-06-15 18:17:34,125][1652491] Updated weights for policy 0, policy_version 550531 (0.0019) [2024-06-15 18:17:35,384][1652491] Updated weights for policy 0, policy_version 550589 (0.0013) [2024-06-15 18:17:35,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 47513.7, 300 sec: 46763.9). Total num frames: 1127612416. Throughput: 0: 11730.5. Samples: 281968128. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:17:35,955][1648985] Avg episode reward: [(0, '163.900')] [2024-06-15 18:17:37,879][1652491] Updated weights for policy 0, policy_version 550651 (0.0013) [2024-06-15 18:17:40,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1127743488. Throughput: 0: 11741.9. Samples: 282003456. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:17:40,956][1648985] Avg episode reward: [(0, '143.300')] [2024-06-15 18:17:42,765][1652491] Updated weights for policy 0, policy_version 550689 (0.0013) [2024-06-15 18:17:44,406][1652491] Updated weights for policy 0, policy_version 550753 (0.0012) [2024-06-15 18:17:45,634][1652491] Updated weights for policy 0, policy_version 550816 (0.0022) [2024-06-15 18:17:45,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1128071168. Throughput: 0: 11640.8. Samples: 282072064. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:17:45,955][1648985] Avg episode reward: [(0, '139.960')] [2024-06-15 18:17:48,276][1651469] Signal inference workers to stop experience collection... 
(28650 times) [2024-06-15 18:17:48,326][1652491] InferenceWorker_p0-w0: stopping experience collection (28650 times) [2024-06-15 18:17:48,328][1652491] Updated weights for policy 0, policy_version 550850 (0.0013) [2024-06-15 18:17:48,582][1651469] Signal inference workers to resume experience collection... (28650 times) [2024-06-15 18:17:48,583][1652491] InferenceWorker_p0-w0: resuming experience collection (28650 times) [2024-06-15 18:17:49,592][1652491] Updated weights for policy 0, policy_version 550906 (0.0013) [2024-06-15 18:17:50,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 1128267776. Throughput: 0: 11753.2. Samples: 282146816. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:17:50,956][1648985] Avg episode reward: [(0, '128.640')] [2024-06-15 18:17:53,939][1652491] Updated weights for policy 0, policy_version 550961 (0.0013) [2024-06-15 18:17:55,676][1652491] Updated weights for policy 0, policy_version 551031 (0.0015) [2024-06-15 18:17:55,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 1128529920. Throughput: 0: 11662.2. Samples: 282183680. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:17:55,956][1648985] Avg episode reward: [(0, '126.420')] [2024-06-15 18:17:56,446][1652491] Updated weights for policy 0, policy_version 551062 (0.0021) [2024-06-15 18:18:00,044][1652491] Updated weights for policy 0, policy_version 551137 (0.0012) [2024-06-15 18:18:00,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 46971.6, 300 sec: 47097.1). Total num frames: 1128792064. Throughput: 0: 11616.7. Samples: 282250752. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:18:00,956][1648985] Avg episode reward: [(0, '139.510')] [2024-06-15 18:18:04,688][1652491] Updated weights for policy 0, policy_version 551170 (0.0039) [2024-06-15 18:18:05,955][1648985] Fps is (10 sec: 32768.0, 60 sec: 44782.9, 300 sec: 46874.9). 
Total num frames: 1128857600. Throughput: 0: 11628.1. Samples: 282320896. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:18:05,956][1648985] Avg episode reward: [(0, '151.270')] [2024-06-15 18:18:07,348][1652491] Updated weights for policy 0, policy_version 551270 (0.0013) [2024-06-15 18:18:09,103][1652491] Updated weights for policy 0, policy_version 551347 (0.0012) [2024-06-15 18:18:10,977][1652491] Updated weights for policy 0, policy_version 551376 (0.0015) [2024-06-15 18:18:10,986][1648985] Fps is (10 sec: 42464.9, 60 sec: 46397.1, 300 sec: 46758.8). Total num frames: 1129218048. Throughput: 0: 11392.6. Samples: 282342400. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:18:10,987][1648985] Avg episode reward: [(0, '163.790')] [2024-06-15 18:18:12,034][1652491] Updated weights for policy 0, policy_version 551424 (0.0032) [2024-06-15 18:18:15,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 46652.8). Total num frames: 1129316352. Throughput: 0: 11400.5. Samples: 282420224. Policy #0 lag: (min: 47.0, avg: 139.6, max: 303.0) [2024-06-15 18:18:15,956][1648985] Avg episode reward: [(0, '156.140')] [2024-06-15 18:18:18,159][1652491] Updated weights for policy 0, policy_version 551493 (0.0060) [2024-06-15 18:18:20,105][1652491] Updated weights for policy 0, policy_version 551568 (0.0015) [2024-06-15 18:18:20,955][1648985] Fps is (10 sec: 42733.0, 60 sec: 45875.3, 300 sec: 46763.8). Total num frames: 1129644032. Throughput: 0: 11343.6. Samples: 282478592. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:18:20,956][1648985] Avg episode reward: [(0, '146.610')] [2024-06-15 18:18:21,292][1652491] Updated weights for policy 0, policy_version 551616 (0.0019) [2024-06-15 18:18:23,507][1652491] Updated weights for policy 0, policy_version 551666 (0.0021) [2024-06-15 18:18:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 44782.9, 300 sec: 46652.7). Total num frames: 1129840640. Throughput: 0: 11400.5. 
Samples: 282516480. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:18:25,956][1648985] Avg episode reward: [(0, '145.170')] [2024-06-15 18:18:28,513][1652491] Updated weights for policy 0, policy_version 551715 (0.0012) [2024-06-15 18:18:29,291][1652491] Updated weights for policy 0, policy_version 551761 (0.0013) [2024-06-15 18:18:29,623][1651469] Signal inference workers to stop experience collection... (28700 times) [2024-06-15 18:18:29,668][1652491] InferenceWorker_p0-w0: stopping experience collection (28700 times) [2024-06-15 18:18:29,838][1651469] Signal inference workers to resume experience collection... (28700 times) [2024-06-15 18:18:29,850][1652491] InferenceWorker_p0-w0: resuming experience collection (28700 times) [2024-06-15 18:18:30,640][1652491] Updated weights for policy 0, policy_version 551824 (0.0014) [2024-06-15 18:18:30,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 1130135552. Throughput: 0: 11628.1. Samples: 282595328. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:18:30,955][1648985] Avg episode reward: [(0, '156.610')] [2024-06-15 18:18:32,762][1652491] Updated weights for policy 0, policy_version 551876 (0.0012) [2024-06-15 18:18:34,074][1652491] Updated weights for policy 0, policy_version 551936 (0.0013) [2024-06-15 18:18:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.1, 300 sec: 46763.9). Total num frames: 1130364928. Throughput: 0: 11514.4. Samples: 282664960. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:18:35,956][1648985] Avg episode reward: [(0, '155.550')] [2024-06-15 18:18:40,235][1652491] Updated weights for policy 0, policy_version 552000 (0.0035) [2024-06-15 18:18:40,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46967.6, 300 sec: 47208.2). Total num frames: 1130561536. Throughput: 0: 11594.0. Samples: 282705408. 
Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:18:40,955][1648985] Avg episode reward: [(0, '151.500')] [2024-06-15 18:18:41,680][1652491] Updated weights for policy 0, policy_version 552066 (0.0098) [2024-06-15 18:18:42,980][1652491] Updated weights for policy 0, policy_version 552123 (0.0014) [2024-06-15 18:18:45,524][1652491] Updated weights for policy 0, policy_version 552185 (0.0012) [2024-06-15 18:18:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 1130889216. Throughput: 0: 11502.9. Samples: 282768384. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:18:45,956][1648985] Avg episode reward: [(0, '148.200')] [2024-06-15 18:18:50,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 45329.1, 300 sec: 46986.0). Total num frames: 1130987520. Throughput: 0: 11548.4. Samples: 282840576. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:18:50,956][1648985] Avg episode reward: [(0, '152.750')] [2024-06-15 18:18:50,971][1652491] Updated weights for policy 0, policy_version 552242 (0.0014) [2024-06-15 18:18:52,020][1652491] Updated weights for policy 0, policy_version 552304 (0.0014) [2024-06-15 18:18:54,024][1652491] Updated weights for policy 0, policy_version 552379 (0.0019) [2024-06-15 18:18:55,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1131282432. Throughput: 0: 11567.9. Samples: 282862592. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:18:55,956][1648985] Avg episode reward: [(0, '156.890')] [2024-06-15 18:18:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000552384_1131282432.pth... 
[2024-06-15 18:18:56,148][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000547008_1120272384.pth [2024-06-15 18:18:57,260][1652491] Updated weights for policy 0, policy_version 552437 (0.0013) [2024-06-15 18:19:00,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 44236.8, 300 sec: 46763.8). Total num frames: 1131446272. Throughput: 0: 11764.6. Samples: 282949632. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:19:00,956][1648985] Avg episode reward: [(0, '144.310')] [2024-06-15 18:19:01,870][1652491] Updated weights for policy 0, policy_version 552505 (0.0014) [2024-06-15 18:19:03,880][1652491] Updated weights for policy 0, policy_version 552577 (0.0120) [2024-06-15 18:19:05,144][1652491] Updated weights for policy 0, policy_version 552640 (0.0012) [2024-06-15 18:19:05,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 49152.0, 300 sec: 46767.2). Total num frames: 1131806720. Throughput: 0: 11787.4. Samples: 283009024. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:19:05,956][1648985] Avg episode reward: [(0, '133.200')] [2024-06-15 18:19:08,893][1652491] Updated weights for policy 0, policy_version 552698 (0.0022) [2024-06-15 18:19:10,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45352.8, 300 sec: 46652.7). Total num frames: 1131937792. Throughput: 0: 11673.6. Samples: 283041792. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:19:10,956][1648985] Avg episode reward: [(0, '137.410')] [2024-06-15 18:19:12,950][1651469] Signal inference workers to stop experience collection... (28750 times) [2024-06-15 18:19:12,998][1652491] InferenceWorker_p0-w0: stopping experience collection (28750 times) [2024-06-15 18:19:13,191][1651469] Signal inference workers to resume experience collection... 
(28750 times) [2024-06-15 18:19:13,192][1652491] InferenceWorker_p0-w0: resuming experience collection (28750 times) [2024-06-15 18:19:13,194][1652491] Updated weights for policy 0, policy_version 552736 (0.0013) [2024-06-15 18:19:15,115][1652491] Updated weights for policy 0, policy_version 552818 (0.0014) [2024-06-15 18:19:15,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48605.9, 300 sec: 46541.7). Total num frames: 1132232704. Throughput: 0: 11480.2. Samples: 283111936. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:19:15,956][1648985] Avg episode reward: [(0, '129.990')] [2024-06-15 18:19:16,593][1652491] Updated weights for policy 0, policy_version 552886 (0.0012) [2024-06-15 18:19:19,628][1652491] Updated weights for policy 0, policy_version 552928 (0.0030) [2024-06-15 18:19:20,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 46967.2, 300 sec: 46652.7). Total num frames: 1132462080. Throughput: 0: 11548.4. Samples: 283184640. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:19:20,956][1648985] Avg episode reward: [(0, '125.960')] [2024-06-15 18:19:23,606][1652491] Updated weights for policy 0, policy_version 552963 (0.0012) [2024-06-15 18:19:25,540][1652491] Updated weights for policy 0, policy_version 553040 (0.0014) [2024-06-15 18:19:25,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 46421.2, 300 sec: 46319.5). Total num frames: 1132625920. Throughput: 0: 11605.3. Samples: 283227648. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:19:25,956][1648985] Avg episode reward: [(0, '138.360')] [2024-06-15 18:19:27,111][1652491] Updated weights for policy 0, policy_version 553104 (0.0119) [2024-06-15 18:19:30,650][1652491] Updated weights for policy 0, policy_version 553168 (0.0094) [2024-06-15 18:19:30,957][1648985] Fps is (10 sec: 45867.5, 60 sec: 46419.8, 300 sec: 46430.3). Total num frames: 1132920832. Throughput: 0: 11502.5. Samples: 283286016. 
Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:19:30,958][1648985] Avg episode reward: [(0, '138.160')] [2024-06-15 18:19:31,692][1652491] Updated weights for policy 0, policy_version 553215 (0.0035) [2024-06-15 18:19:35,955][1648985] Fps is (10 sec: 42599.6, 60 sec: 44783.0, 300 sec: 46319.5). Total num frames: 1133051904. Throughput: 0: 11628.1. Samples: 283363840. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:19:35,955][1648985] Avg episode reward: [(0, '151.170')] [2024-06-15 18:19:36,524][1652491] Updated weights for policy 0, policy_version 553265 (0.0014) [2024-06-15 18:19:37,917][1652491] Updated weights for policy 0, policy_version 553328 (0.0013) [2024-06-15 18:19:39,260][1652491] Updated weights for policy 0, policy_version 553382 (0.0012) [2024-06-15 18:19:40,999][1648985] Fps is (10 sec: 45685.1, 60 sec: 46933.3, 300 sec: 46201.6). Total num frames: 1133379584. Throughput: 0: 11673.7. Samples: 283388416. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:19:40,999][1648985] Avg episode reward: [(0, '148.740')] [2024-06-15 18:19:42,142][1652491] Updated weights for policy 0, policy_version 553424 (0.0014) [2024-06-15 18:19:45,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1133510656. Throughput: 0: 11468.8. Samples: 283465728. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 18:19:45,956][1648985] Avg episode reward: [(0, '146.550')] [2024-06-15 18:19:46,911][1652491] Updated weights for policy 0, policy_version 553488 (0.0145) [2024-06-15 18:19:48,387][1652491] Updated weights for policy 0, policy_version 553537 (0.0012) [2024-06-15 18:19:50,206][1652491] Updated weights for policy 0, policy_version 553616 (0.0092) [2024-06-15 18:19:50,331][1651469] Signal inference workers to stop experience collection... 
(28800 times) [2024-06-15 18:19:50,463][1652491] InferenceWorker_p0-w0: stopping experience collection (28800 times) [2024-06-15 18:19:50,577][1651469] Signal inference workers to resume experience collection... (28800 times) [2024-06-15 18:19:50,578][1652491] InferenceWorker_p0-w0: resuming experience collection (28800 times) [2024-06-15 18:19:50,955][1648985] Fps is (10 sec: 46075.5, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 1133838336. Throughput: 0: 11468.8. Samples: 283525120. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:19:50,956][1648985] Avg episode reward: [(0, '168.750')] [2024-06-15 18:19:53,768][1652491] Updated weights for policy 0, policy_version 553685 (0.0013) [2024-06-15 18:19:55,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 45875.0, 300 sec: 46541.6). Total num frames: 1134034944. Throughput: 0: 11559.8. Samples: 283561984. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:19:55,956][1648985] Avg episode reward: [(0, '180.900')] [2024-06-15 18:19:58,473][1652491] Updated weights for policy 0, policy_version 553744 (0.0075) [2024-06-15 18:20:00,138][1652491] Updated weights for policy 0, policy_version 553808 (0.0120) [2024-06-15 18:20:00,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 46421.2, 300 sec: 45986.3). Total num frames: 1134231552. Throughput: 0: 11571.2. Samples: 283632640. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:20:00,956][1648985] Avg episode reward: [(0, '170.790')] [2024-06-15 18:20:02,027][1652491] Updated weights for policy 0, policy_version 553874 (0.0013) [2024-06-15 18:20:05,120][1652491] Updated weights for policy 0, policy_version 553936 (0.0012) [2024-06-15 18:20:05,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1134526464. Throughput: 0: 11298.2. Samples: 283693056. 
Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:20:05,956][1648985] Avg episode reward: [(0, '150.520')] [2024-06-15 18:20:10,768][1652491] Updated weights for policy 0, policy_version 554003 (0.0016) [2024-06-15 18:20:10,955][1648985] Fps is (10 sec: 36045.3, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 1134592000. Throughput: 0: 11320.9. Samples: 283737088. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:20:10,956][1648985] Avg episode reward: [(0, '144.080')] [2024-06-15 18:20:12,317][1652491] Updated weights for policy 0, policy_version 554064 (0.0019) [2024-06-15 18:20:13,855][1652491] Updated weights for policy 0, policy_version 554128 (0.0011) [2024-06-15 18:20:14,983][1652491] Updated weights for policy 0, policy_version 554175 (0.0015) [2024-06-15 18:20:15,956][1648985] Fps is (10 sec: 42593.1, 60 sec: 45328.1, 300 sec: 46208.3). Total num frames: 1134952448. Throughput: 0: 11355.2. Samples: 283796992. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:20:15,958][1648985] Avg episode reward: [(0, '143.570')] [2024-06-15 18:20:17,167][1652491] Updated weights for policy 0, policy_version 554230 (0.0014) [2024-06-15 18:20:20,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 43690.9, 300 sec: 46208.4). Total num frames: 1135083520. Throughput: 0: 11457.4. Samples: 283879424. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:20:20,955][1648985] Avg episode reward: [(0, '142.740')] [2024-06-15 18:20:22,550][1652491] Updated weights for policy 0, policy_version 554288 (0.0014) [2024-06-15 18:20:24,564][1652491] Updated weights for policy 0, policy_version 554369 (0.0095) [2024-06-15 18:20:25,964][1648985] Fps is (10 sec: 52386.7, 60 sec: 47506.4, 300 sec: 46207.0). Total num frames: 1135476736. Throughput: 0: 11568.6. Samples: 283908608. 
Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:20:25,965][1648985] Avg episode reward: [(0, '145.960')] [2024-06-15 18:20:27,481][1652491] Updated weights for policy 0, policy_version 554436 (0.0013) [2024-06-15 18:20:28,684][1652491] Updated weights for policy 0, policy_version 554494 (0.0011) [2024-06-15 18:20:30,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 44784.5, 300 sec: 46208.4). Total num frames: 1135607808. Throughput: 0: 11218.5. Samples: 283970560. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:20:30,955][1648985] Avg episode reward: [(0, '149.410')] [2024-06-15 18:20:34,647][1651469] Signal inference workers to stop experience collection... (28850 times) [2024-06-15 18:20:34,702][1652491] InferenceWorker_p0-w0: stopping experience collection (28850 times) [2024-06-15 18:20:34,996][1651469] Signal inference workers to resume experience collection... (28850 times) [2024-06-15 18:20:34,997][1652491] InferenceWorker_p0-w0: resuming experience collection (28850 times) [2024-06-15 18:20:34,998][1652491] Updated weights for policy 0, policy_version 554560 (0.0140) [2024-06-15 18:20:35,955][1648985] Fps is (10 sec: 29518.9, 60 sec: 45329.1, 300 sec: 45764.1). Total num frames: 1135771648. Throughput: 0: 11423.3. Samples: 284039168. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:20:35,955][1648985] Avg episode reward: [(0, '156.490')] [2024-06-15 18:20:36,983][1652491] Updated weights for policy 0, policy_version 554640 (0.0023) [2024-06-15 18:20:37,841][1652491] Updated weights for policy 0, policy_version 554688 (0.0129) [2024-06-15 18:20:40,176][1652491] Updated weights for policy 0, policy_version 554739 (0.0012) [2024-06-15 18:20:40,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 45908.5, 300 sec: 46652.8). Total num frames: 1136132096. Throughput: 0: 11389.2. Samples: 284074496. 
Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:20:40,956][1648985] Avg episode reward: [(0, '155.350')] [2024-06-15 18:20:44,825][1652491] Updated weights for policy 0, policy_version 554769 (0.0019) [2024-06-15 18:20:45,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 1136263168. Throughput: 0: 11491.6. Samples: 284149760. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:20:45,956][1648985] Avg episode reward: [(0, '170.230')] [2024-06-15 18:20:46,399][1652491] Updated weights for policy 0, policy_version 554832 (0.0123) [2024-06-15 18:20:47,656][1652491] Updated weights for policy 0, policy_version 554883 (0.0013) [2024-06-15 18:20:48,682][1652491] Updated weights for policy 0, policy_version 554941 (0.0085) [2024-06-15 18:20:50,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 1136590848. Throughput: 0: 11446.0. Samples: 284208128. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:20:50,956][1648985] Avg episode reward: [(0, '156.960')] [2024-06-15 18:20:51,290][1652491] Updated weights for policy 0, policy_version 554997 (0.0013) [2024-06-15 18:20:55,323][1652491] Updated weights for policy 0, policy_version 555011 (0.0012) [2024-06-15 18:20:55,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 44783.1, 300 sec: 46208.4). Total num frames: 1136721920. Throughput: 0: 11446.0. Samples: 284252160. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:20:55,956][1648985] Avg episode reward: [(0, '151.790')] [2024-06-15 18:20:56,386][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000555056_1136754688.pth... 
[2024-06-15 18:20:56,444][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000549632_1125646336.pth [2024-06-15 18:20:56,833][1652491] Updated weights for policy 0, policy_version 555072 (0.0013) [2024-06-15 18:20:58,436][1652491] Updated weights for policy 0, policy_version 555136 (0.0013) [2024-06-15 18:20:59,929][1652491] Updated weights for policy 0, policy_version 555200 (0.0014) [2024-06-15 18:21:00,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 1137049600. Throughput: 0: 11446.4. Samples: 284312064. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:21:00,956][1648985] Avg episode reward: [(0, '142.920')] [2024-06-15 18:21:03,001][1652491] Updated weights for policy 0, policy_version 555258 (0.0029) [2024-06-15 18:21:05,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 1137180672. Throughput: 0: 11286.7. Samples: 284387328. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:21:05,956][1648985] Avg episode reward: [(0, '165.120')] [2024-06-15 18:21:08,635][1652491] Updated weights for policy 0, policy_version 555312 (0.0012) [2024-06-15 18:21:10,271][1652491] Updated weights for policy 0, policy_version 555376 (0.0012) [2024-06-15 18:21:10,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 48059.6, 300 sec: 46319.5). Total num frames: 1137475584. Throughput: 0: 11289.1. Samples: 284416512. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:21:10,956][1648985] Avg episode reward: [(0, '170.910')] [2024-06-15 18:21:11,912][1652491] Updated weights for policy 0, policy_version 555446 (0.0032) [2024-06-15 18:21:13,328][1651469] Signal inference workers to stop experience collection... (28900 times) [2024-06-15 18:21:13,408][1652491] InferenceWorker_p0-w0: stopping experience collection (28900 times) [2024-06-15 18:21:13,584][1651469] Signal inference workers to resume experience collection... 
(28900 times) [2024-06-15 18:21:13,584][1652491] InferenceWorker_p0-w0: resuming experience collection (28900 times) [2024-06-15 18:21:14,183][1652491] Updated weights for policy 0, policy_version 555494 (0.0034) [2024-06-15 18:21:15,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45876.2, 300 sec: 46430.6). Total num frames: 1137704960. Throughput: 0: 11275.4. Samples: 284477952. Policy #0 lag: (min: 47.0, avg: 110.3, max: 299.0) [2024-06-15 18:21:15,955][1648985] Avg episode reward: [(0, '178.740')] [2024-06-15 18:21:19,451][1652491] Updated weights for policy 0, policy_version 555537 (0.0015) [2024-06-15 18:21:20,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46421.3, 300 sec: 45987.0). Total num frames: 1137868800. Throughput: 0: 11355.0. Samples: 284550144. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:21:20,956][1648985] Avg episode reward: [(0, '161.400')] [2024-06-15 18:21:21,497][1652491] Updated weights for policy 0, policy_version 555637 (0.0116) [2024-06-15 18:21:23,083][1652491] Updated weights for policy 0, policy_version 555696 (0.0016) [2024-06-15 18:21:25,925][1652491] Updated weights for policy 0, policy_version 555771 (0.0016) [2024-06-15 18:21:25,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 45336.1, 300 sec: 46541.7). Total num frames: 1138196480. Throughput: 0: 11241.3. Samples: 284580352. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:21:25,956][1648985] Avg episode reward: [(0, '166.190')] [2024-06-15 18:21:30,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 44236.7, 300 sec: 45764.1). Total num frames: 1138262016. Throughput: 0: 11434.7. Samples: 284664320. 
Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:21:30,956][1648985] Avg episode reward: [(0, '172.370')] [2024-06-15 18:21:31,744][1652491] Updated weights for policy 0, policy_version 555843 (0.0014) [2024-06-15 18:21:33,011][1652491] Updated weights for policy 0, policy_version 555904 (0.0015) [2024-06-15 18:21:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 47513.5, 300 sec: 46208.4). Total num frames: 1138622464. Throughput: 0: 11502.9. Samples: 284725760. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:21:35,956][1648985] Avg episode reward: [(0, '173.960')] [2024-06-15 18:21:36,049][1652491] Updated weights for policy 0, policy_version 555969 (0.0019) [2024-06-15 18:21:37,530][1652491] Updated weights for policy 0, policy_version 556030 (0.0014) [2024-06-15 18:21:40,957][1648985] Fps is (10 sec: 49145.0, 60 sec: 43689.7, 300 sec: 45986.1). Total num frames: 1138753536. Throughput: 0: 11377.4. Samples: 284764160. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:21:40,957][1648985] Avg episode reward: [(0, '157.600')] [2024-06-15 18:21:43,083][1652491] Updated weights for policy 0, policy_version 556096 (0.0013) [2024-06-15 18:21:44,511][1652491] Updated weights for policy 0, policy_version 556160 (0.0014) [2024-06-15 18:21:45,800][1652491] Updated weights for policy 0, policy_version 556224 (0.0015) [2024-06-15 18:21:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.9, 300 sec: 46208.4). Total num frames: 1139146752. Throughput: 0: 11605.3. Samples: 284834304. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:21:45,955][1648985] Avg episode reward: [(0, '153.000')] [2024-06-15 18:21:50,955][1648985] Fps is (10 sec: 52436.5, 60 sec: 44783.0, 300 sec: 46208.5). Total num frames: 1139277824. Throughput: 0: 11787.4. Samples: 284917760. 
Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:21:50,955][1648985] Avg episode reward: [(0, '183.260')] [2024-06-15 18:21:52,279][1652491] Updated weights for policy 0, policy_version 556289 (0.0016) [2024-06-15 18:21:54,131][1652491] Updated weights for policy 0, policy_version 556377 (0.0092) [2024-06-15 18:21:54,371][1651469] Signal inference workers to stop experience collection... (28950 times) [2024-06-15 18:21:54,445][1652491] InferenceWorker_p0-w0: stopping experience collection (28950 times) [2024-06-15 18:21:54,626][1651469] Signal inference workers to resume experience collection... (28950 times) [2024-06-15 18:21:54,627][1652491] InferenceWorker_p0-w0: resuming experience collection (28950 times) [2024-06-15 18:21:55,819][1652491] Updated weights for policy 0, policy_version 556452 (0.0026) [2024-06-15 18:21:55,955][1648985] Fps is (10 sec: 45873.7, 60 sec: 48059.5, 300 sec: 46209.2). Total num frames: 1139605504. Throughput: 0: 11878.3. Samples: 284951040. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:21:55,956][1648985] Avg episode reward: [(0, '177.670')] [2024-06-15 18:21:57,969][1652491] Updated weights for policy 0, policy_version 556517 (0.0014) [2024-06-15 18:22:00,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 1139802112. Throughput: 0: 11980.8. Samples: 285017088. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:22:00,955][1648985] Avg episode reward: [(0, '175.920')] [2024-06-15 18:22:03,520][1652491] Updated weights for policy 0, policy_version 556560 (0.0101) [2024-06-15 18:22:04,864][1652491] Updated weights for policy 0, policy_version 556624 (0.0012) [2024-06-15 18:22:05,955][1648985] Fps is (10 sec: 45876.6, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 1140064256. Throughput: 0: 12071.8. Samples: 285093376. 
Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:22:05,956][1648985] Avg episode reward: [(0, '174.270')] [2024-06-15 18:22:06,465][1652491] Updated weights for policy 0, policy_version 556694 (0.0012) [2024-06-15 18:22:08,490][1652491] Updated weights for policy 0, policy_version 556754 (0.0017) [2024-06-15 18:22:10,962][1648985] Fps is (10 sec: 52392.7, 60 sec: 47508.3, 300 sec: 46207.4). Total num frames: 1140326400. Throughput: 0: 12001.8. Samples: 285120512. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:22:10,962][1648985] Avg episode reward: [(0, '171.340')] [2024-06-15 18:22:14,455][1652491] Updated weights for policy 0, policy_version 556817 (0.0014) [2024-06-15 18:22:15,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 46421.2, 300 sec: 46097.3). Total num frames: 1140490240. Throughput: 0: 11992.1. Samples: 285203968. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:22:15,956][1648985] Avg episode reward: [(0, '149.660')] [2024-06-15 18:22:16,426][1652491] Updated weights for policy 0, policy_version 556912 (0.0013) [2024-06-15 18:22:17,871][1652491] Updated weights for policy 0, policy_version 556976 (0.0013) [2024-06-15 18:22:18,804][1652491] Updated weights for policy 0, policy_version 557009 (0.0019) [2024-06-15 18:22:19,740][1652491] Updated weights for policy 0, policy_version 557056 (0.0013) [2024-06-15 18:22:20,955][1648985] Fps is (10 sec: 52464.2, 60 sec: 49698.1, 300 sec: 46430.6). Total num frames: 1140850688. Throughput: 0: 12094.6. Samples: 285270016. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:22:20,956][1648985] Avg episode reward: [(0, '153.790')] [2024-06-15 18:22:25,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 1140916224. Throughput: 0: 12151.8. Samples: 285310976. 
Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:22:25,956][1648985] Avg episode reward: [(0, '167.110')] [2024-06-15 18:22:26,487][1652491] Updated weights for policy 0, policy_version 557123 (0.0013) [2024-06-15 18:22:28,541][1652491] Updated weights for policy 0, policy_version 557216 (0.0103) [2024-06-15 18:22:30,424][1652491] Updated weights for policy 0, policy_version 557264 (0.0015) [2024-06-15 18:22:30,957][1648985] Fps is (10 sec: 45865.4, 60 sec: 50788.6, 300 sec: 46430.2). Total num frames: 1141309440. Throughput: 0: 11855.1. Samples: 285367808. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:22:30,958][1648985] Avg episode reward: [(0, '168.500')] [2024-06-15 18:22:35,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 1141374976. Throughput: 0: 11810.1. Samples: 285449216. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:22:35,955][1648985] Avg episode reward: [(0, '152.140')] [2024-06-15 18:22:36,825][1651469] Signal inference workers to stop experience collection... (29000 times) [2024-06-15 18:22:36,864][1652491] InferenceWorker_p0-w0: stopping experience collection (29000 times) [2024-06-15 18:22:37,090][1651469] Signal inference workers to resume experience collection... (29000 times) [2024-06-15 18:22:37,091][1652491] InferenceWorker_p0-w0: resuming experience collection (29000 times) [2024-06-15 18:22:37,299][1652491] Updated weights for policy 0, policy_version 557335 (0.0022) [2024-06-15 18:22:38,850][1652491] Updated weights for policy 0, policy_version 557408 (0.0014) [2024-06-15 18:22:40,412][1652491] Updated weights for policy 0, policy_version 557472 (0.0013) [2024-06-15 18:22:40,955][1648985] Fps is (10 sec: 42607.7, 60 sec: 49699.3, 300 sec: 46319.5). Total num frames: 1141735424. Throughput: 0: 11742.0. Samples: 285479424. 
Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:22:40,955][1648985] Avg episode reward: [(0, '140.630')] [2024-06-15 18:22:42,464][1652491] Updated weights for policy 0, policy_version 557536 (0.0015) [2024-06-15 18:22:45,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 1141899264. Throughput: 0: 11741.8. Samples: 285545472. Policy #0 lag: (min: 9.0, avg: 70.3, max: 265.0) [2024-06-15 18:22:45,955][1648985] Avg episode reward: [(0, '143.380')] [2024-06-15 18:22:48,376][1652491] Updated weights for policy 0, policy_version 557603 (0.0013) [2024-06-15 18:22:49,082][1652491] Updated weights for policy 0, policy_version 557648 (0.0054) [2024-06-15 18:22:50,235][1652491] Updated weights for policy 0, policy_version 557699 (0.0015) [2024-06-15 18:22:50,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 49151.8, 300 sec: 46430.6). Total num frames: 1142226944. Throughput: 0: 11639.4. Samples: 285617152. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:22:50,956][1648985] Avg episode reward: [(0, '139.750')] [2024-06-15 18:22:51,517][1652491] Updated weights for policy 0, policy_version 557759 (0.0115) [2024-06-15 18:22:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.7, 300 sec: 46208.4). Total num frames: 1142423552. Throughput: 0: 11789.2. Samples: 285650944. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:22:55,955][1648985] Avg episode reward: [(0, '159.100')] [2024-06-15 18:22:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000557824_1142423552.pth... 
[2024-06-15 18:22:56,031][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000552384_1131282432.pth [2024-06-15 18:22:58,599][1652491] Updated weights for policy 0, policy_version 557840 (0.0019) [2024-06-15 18:22:59,645][1652491] Updated weights for policy 0, policy_version 557889 (0.0013) [2024-06-15 18:23:00,955][1648985] Fps is (10 sec: 42599.8, 60 sec: 47513.6, 300 sec: 46763.8). Total num frames: 1142652928. Throughput: 0: 11798.8. Samples: 285734912. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:23:00,955][1648985] Avg episode reward: [(0, '157.260')] [2024-06-15 18:23:01,718][1652491] Updated weights for policy 0, policy_version 557968 (0.0039) [2024-06-15 18:23:03,954][1652491] Updated weights for policy 0, policy_version 558032 (0.0014) [2024-06-15 18:23:05,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 46546.6). Total num frames: 1142947840. Throughput: 0: 11673.6. Samples: 285795328. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:23:05,956][1648985] Avg episode reward: [(0, '188.020')] [2024-06-15 18:23:09,778][1652491] Updated weights for policy 0, policy_version 558083 (0.0013) [2024-06-15 18:23:10,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45334.2, 300 sec: 46541.7). Total num frames: 1143046144. Throughput: 0: 11810.2. Samples: 285842432. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:23:10,955][1648985] Avg episode reward: [(0, '170.990')] [2024-06-15 18:23:11,216][1652491] Updated weights for policy 0, policy_version 558146 (0.0013) [2024-06-15 18:23:12,773][1651469] Signal inference workers to stop experience collection... (29050 times) [2024-06-15 18:23:12,893][1652491] InferenceWorker_p0-w0: stopping experience collection (29050 times) [2024-06-15 18:23:12,898][1652491] Updated weights for policy 0, policy_version 558215 (0.0119) [2024-06-15 18:23:13,013][1651469] Signal inference workers to resume experience collection... 
(29050 times) [2024-06-15 18:23:13,013][1652491] InferenceWorker_p0-w0: resuming experience collection (29050 times) [2024-06-15 18:23:14,966][1652491] Updated weights for policy 0, policy_version 558275 (0.0113) [2024-06-15 18:23:15,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 49152.2, 300 sec: 46763.8). Total num frames: 1143439360. Throughput: 0: 11856.2. Samples: 285901312. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:23:15,955][1648985] Avg episode reward: [(0, '174.160')] [2024-06-15 18:23:16,273][1652491] Updated weights for policy 0, policy_version 558336 (0.0030) [2024-06-15 18:23:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1143472128. Throughput: 0: 11878.4. Samples: 285983744. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:23:20,956][1648985] Avg episode reward: [(0, '159.850')] [2024-06-15 18:23:22,504][1652491] Updated weights for policy 0, policy_version 558416 (0.0013) [2024-06-15 18:23:23,797][1652491] Updated weights for policy 0, policy_version 558468 (0.0087) [2024-06-15 18:23:24,816][1652491] Updated weights for policy 0, policy_version 558520 (0.0013) [2024-06-15 18:23:25,940][1652491] Updated weights for policy 0, policy_version 558560 (0.0086) [2024-06-15 18:23:25,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 50244.5, 300 sec: 46763.8). Total num frames: 1143930880. Throughput: 0: 11798.8. Samples: 286010368. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:23:25,955][1648985] Avg episode reward: [(0, '176.220')] [2024-06-15 18:23:30,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 44784.4, 300 sec: 46208.4). Total num frames: 1143996416. Throughput: 0: 12003.5. Samples: 286085632. 
Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:23:30,956][1648985] Avg episode reward: [(0, '177.690')] [2024-06-15 18:23:31,646][1652491] Updated weights for policy 0, policy_version 558593 (0.0015) [2024-06-15 18:23:33,889][1652491] Updated weights for policy 0, policy_version 558689 (0.0014) [2024-06-15 18:23:34,985][1652491] Updated weights for policy 0, policy_version 558736 (0.0148) [2024-06-15 18:23:35,784][1652491] Updated weights for policy 0, policy_version 558779 (0.0011) [2024-06-15 18:23:35,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 50244.3, 300 sec: 46874.9). Total num frames: 1144389632. Throughput: 0: 11844.3. Samples: 286150144. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:23:35,955][1648985] Avg episode reward: [(0, '188.860')] [2024-06-15 18:23:37,317][1652491] Updated weights for policy 0, policy_version 558837 (0.0120) [2024-06-15 18:23:40,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 1144520704. Throughput: 0: 11867.0. Samples: 286184960. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:23:40,956][1648985] Avg episode reward: [(0, '180.410')] [2024-06-15 18:23:43,983][1652491] Updated weights for policy 0, policy_version 558881 (0.0013) [2024-06-15 18:23:45,573][1652491] Updated weights for policy 0, policy_version 558944 (0.0012) [2024-06-15 18:23:45,955][1648985] Fps is (10 sec: 32767.7, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 1144717312. Throughput: 0: 11776.0. Samples: 286264832. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:23:45,956][1648985] Avg episode reward: [(0, '168.230')] [2024-06-15 18:23:47,461][1652491] Updated weights for policy 0, policy_version 559024 (0.0017) [2024-06-15 18:23:49,435][1652491] Updated weights for policy 0, policy_version 559099 (0.0012) [2024-06-15 18:23:50,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 1145044992. 
Throughput: 0: 11662.2. Samples: 286320128. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:23:50,956][1648985] Avg episode reward: [(0, '163.850')] [2024-06-15 18:23:55,225][1651469] Signal inference workers to stop experience collection... (29100 times) [2024-06-15 18:23:55,301][1652491] InferenceWorker_p0-w0: stopping experience collection (29100 times) [2024-06-15 18:23:55,505][1651469] Signal inference workers to resume experience collection... (29100 times) [2024-06-15 18:23:55,506][1652491] InferenceWorker_p0-w0: resuming experience collection (29100 times) [2024-06-15 18:23:55,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 44236.7, 300 sec: 46208.4). Total num frames: 1145077760. Throughput: 0: 11662.2. Samples: 286367232. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:23:55,956][1648985] Avg episode reward: [(0, '142.430')] [2024-06-15 18:23:56,960][1652491] Updated weights for policy 0, policy_version 559184 (0.0045) [2024-06-15 18:23:58,396][1652491] Updated weights for policy 0, policy_version 559235 (0.0011) [2024-06-15 18:24:00,113][1652491] Updated weights for policy 0, policy_version 559299 (0.0011) [2024-06-15 18:24:00,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 47513.5, 300 sec: 46430.6). Total num frames: 1145503744. Throughput: 0: 11628.1. Samples: 286424576. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:24:00,956][1648985] Avg episode reward: [(0, '157.570')] [2024-06-15 18:24:01,565][1652491] Updated weights for policy 0, policy_version 559360 (0.0127) [2024-06-15 18:24:05,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1145569280. Throughput: 0: 11457.4. Samples: 286499328. 
Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:24:05,956][1648985] Avg episode reward: [(0, '148.350')] [2024-06-15 18:24:08,398][1652491] Updated weights for policy 0, policy_version 559414 (0.0016) [2024-06-15 18:24:10,193][1652491] Updated weights for policy 0, policy_version 559488 (0.0012) [2024-06-15 18:24:10,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 1145864192. Throughput: 0: 11639.4. Samples: 286534144. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:24:10,956][1648985] Avg episode reward: [(0, '160.440')] [2024-06-15 18:24:12,810][1652491] Updated weights for policy 0, policy_version 559585 (0.0015) [2024-06-15 18:24:15,958][1648985] Fps is (10 sec: 52412.7, 60 sec: 44234.4, 300 sec: 46208.0). Total num frames: 1146093568. Throughput: 0: 11172.2. Samples: 286588416. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:24:15,959][1648985] Avg episode reward: [(0, '164.110')] [2024-06-15 18:24:19,356][1652491] Updated weights for policy 0, policy_version 559632 (0.0026) [2024-06-15 18:24:20,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 46421.4, 300 sec: 46208.5). Total num frames: 1146257408. Throughput: 0: 11389.2. Samples: 286662656. Policy #0 lag: (min: 76.0, avg: 178.3, max: 383.0) [2024-06-15 18:24:20,955][1648985] Avg episode reward: [(0, '170.200')] [2024-06-15 18:24:21,339][1652491] Updated weights for policy 0, policy_version 559718 (0.0013) [2024-06-15 18:24:22,968][1652491] Updated weights for policy 0, policy_version 559792 (0.0013) [2024-06-15 18:24:24,761][1652491] Updated weights for policy 0, policy_version 559856 (0.0029) [2024-06-15 18:24:25,955][1648985] Fps is (10 sec: 52445.5, 60 sec: 44782.8, 300 sec: 46430.9). Total num frames: 1146617856. Throughput: 0: 11229.9. Samples: 286690304. 
Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:24:25,956][1648985] Avg episode reward: [(0, '152.090')] [2024-06-15 18:24:30,955][1648985] Fps is (10 sec: 39320.2, 60 sec: 44236.7, 300 sec: 46097.3). Total num frames: 1146650624. Throughput: 0: 11241.2. Samples: 286770688. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:24:30,956][1648985] Avg episode reward: [(0, '159.670')] [2024-06-15 18:24:31,400][1652491] Updated weights for policy 0, policy_version 559920 (0.0013) [2024-06-15 18:24:32,230][1651469] Signal inference workers to stop experience collection... (29150 times) [2024-06-15 18:24:32,269][1652491] InferenceWorker_p0-w0: stopping experience collection (29150 times) [2024-06-15 18:24:32,462][1651469] Signal inference workers to resume experience collection... (29150 times) [2024-06-15 18:24:32,463][1652491] InferenceWorker_p0-w0: resuming experience collection (29150 times) [2024-06-15 18:24:33,574][1652491] Updated weights for policy 0, policy_version 560017 (0.0013) [2024-06-15 18:24:34,986][1652491] Updated weights for policy 0, policy_version 560080 (0.0014) [2024-06-15 18:24:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45329.0, 300 sec: 46548.5). Total num frames: 1147109376. Throughput: 0: 11150.3. Samples: 286821888. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:24:35,956][1648985] Avg episode reward: [(0, '163.380')] [2024-06-15 18:24:40,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 1147142144. Throughput: 0: 11059.1. Samples: 286864896. 
Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:24:40,956][1648985] Avg episode reward: [(0, '165.730')] [2024-06-15 18:24:41,415][1652491] Updated weights for policy 0, policy_version 560135 (0.0014) [2024-06-15 18:24:42,306][1652491] Updated weights for policy 0, policy_version 560180 (0.0026) [2024-06-15 18:24:44,012][1652491] Updated weights for policy 0, policy_version 560241 (0.0013) [2024-06-15 18:24:45,673][1652491] Updated weights for policy 0, policy_version 560320 (0.0014) [2024-06-15 18:24:45,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46967.6, 300 sec: 46430.6). Total num frames: 1147535360. Throughput: 0: 11343.7. Samples: 286935040. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:24:45,955][1648985] Avg episode reward: [(0, '186.520')] [2024-06-15 18:24:50,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 43690.8, 300 sec: 46208.5). Total num frames: 1147666432. Throughput: 0: 11264.0. Samples: 287006208. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:24:50,956][1648985] Avg episode reward: [(0, '195.920')] [2024-06-15 18:24:51,848][1652491] Updated weights for policy 0, policy_version 560386 (0.0013) [2024-06-15 18:24:54,996][1652491] Updated weights for policy 0, policy_version 560468 (0.0013) [2024-06-15 18:24:55,955][1648985] Fps is (10 sec: 36043.8, 60 sec: 46967.4, 300 sec: 46319.5). Total num frames: 1147895808. Throughput: 0: 11389.1. Samples: 287046656. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:24:55,956][1648985] Avg episode reward: [(0, '187.120')] [2024-06-15 18:24:56,308][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000560528_1147961344.pth... 
[2024-06-15 18:24:56,428][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000555056_1136754688.pth [2024-06-15 18:24:56,758][1652491] Updated weights for policy 0, policy_version 560544 (0.0091) [2024-06-15 18:24:58,575][1652491] Updated weights for policy 0, policy_version 560612 (0.0024) [2024-06-15 18:25:00,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 1148190720. Throughput: 0: 11515.1. Samples: 287106560. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:25:00,956][1648985] Avg episode reward: [(0, '171.850')] [2024-06-15 18:25:03,180][1652491] Updated weights for policy 0, policy_version 560642 (0.0013) [2024-06-15 18:25:05,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 45875.3, 300 sec: 46541.7). Total num frames: 1148321792. Throughput: 0: 11605.3. Samples: 287184896. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:25:05,955][1648985] Avg episode reward: [(0, '165.920')] [2024-06-15 18:25:06,289][1652491] Updated weights for policy 0, policy_version 560722 (0.0012) [2024-06-15 18:25:08,025][1652491] Updated weights for policy 0, policy_version 560800 (0.0013) [2024-06-15 18:25:09,427][1651469] Signal inference workers to stop experience collection... (29200 times) [2024-06-15 18:25:09,469][1652491] InferenceWorker_p0-w0: stopping experience collection (29200 times) [2024-06-15 18:25:09,683][1651469] Signal inference workers to resume experience collection... (29200 times) [2024-06-15 18:25:09,722][1652491] InferenceWorker_p0-w0: resuming experience collection (29200 times) [2024-06-15 18:25:10,355][1652491] Updated weights for policy 0, policy_version 560885 (0.0117) [2024-06-15 18:25:10,978][1648985] Fps is (10 sec: 52307.6, 60 sec: 47495.3, 300 sec: 46649.3). Total num frames: 1148715008. Throughput: 0: 11485.6. Samples: 287207424. 
Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:25:10,979][1648985] Avg episode reward: [(0, '149.380')] [2024-06-15 18:25:15,679][1652491] Updated weights for policy 0, policy_version 560944 (0.0013) [2024-06-15 18:25:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45877.7, 300 sec: 46652.7). Total num frames: 1148846080. Throughput: 0: 11503.0. Samples: 287288320. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:25:15,956][1648985] Avg episode reward: [(0, '155.410')] [2024-06-15 18:25:17,933][1652491] Updated weights for policy 0, policy_version 560978 (0.0014) [2024-06-15 18:25:19,464][1652491] Updated weights for policy 0, policy_version 561056 (0.0014) [2024-06-15 18:25:20,955][1648985] Fps is (10 sec: 45981.8, 60 sec: 48605.8, 300 sec: 46432.1). Total num frames: 1149173760. Throughput: 0: 11639.5. Samples: 287345664. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:25:20,956][1648985] Avg episode reward: [(0, '148.430')] [2024-06-15 18:25:20,977][1652491] Updated weights for policy 0, policy_version 561126 (0.0107) [2024-06-15 18:25:25,516][1652491] Updated weights for policy 0, policy_version 561157 (0.0013) [2024-06-15 18:25:25,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 1149272064. Throughput: 0: 11628.2. Samples: 287388160. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:25:25,956][1648985] Avg episode reward: [(0, '149.360')] [2024-06-15 18:25:26,581][1652491] Updated weights for policy 0, policy_version 561212 (0.0010) [2024-06-15 18:25:29,378][1652491] Updated weights for policy 0, policy_version 561270 (0.0012) [2024-06-15 18:25:30,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 49152.0, 300 sec: 46874.9). Total num frames: 1149599744. Throughput: 0: 11741.8. Samples: 287463424. 
Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:25:30,956][1648985] Avg episode reward: [(0, '162.980')] [2024-06-15 18:25:31,064][1652491] Updated weights for policy 0, policy_version 561344 (0.0107) [2024-06-15 18:25:32,147][1652491] Updated weights for policy 0, policy_version 561408 (0.0032) [2024-06-15 18:25:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 1149763584. Throughput: 0: 11719.1. Samples: 287533568. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:25:35,956][1648985] Avg episode reward: [(0, '181.710')] [2024-06-15 18:25:37,960][1652491] Updated weights for policy 0, policy_version 561466 (0.0014) [2024-06-15 18:25:40,757][1652491] Updated weights for policy 0, policy_version 561520 (0.0015) [2024-06-15 18:25:40,955][1648985] Fps is (10 sec: 39322.9, 60 sec: 47513.9, 300 sec: 46541.7). Total num frames: 1149992960. Throughput: 0: 11650.9. Samples: 287570944. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:25:40,955][1648985] Avg episode reward: [(0, '177.770')] [2024-06-15 18:25:42,688][1652491] Updated weights for policy 0, policy_version 561604 (0.0012) [2024-06-15 18:25:45,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 45874.9, 300 sec: 46430.6). Total num frames: 1150287872. Throughput: 0: 11605.3. Samples: 287628800. Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:25:45,956][1648985] Avg episode reward: [(0, '176.000')] [2024-06-15 18:25:48,608][1652491] Updated weights for policy 0, policy_version 561665 (0.0014) [2024-06-15 18:25:50,076][1652491] Updated weights for policy 0, policy_version 561720 (0.0012) [2024-06-15 18:25:50,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 1150418944. Throughput: 0: 11582.6. Samples: 287706112. 
Policy #0 lag: (min: 95.0, avg: 149.9, max: 351.0) [2024-06-15 18:25:50,955][1648985] Avg episode reward: [(0, '173.630')] [2024-06-15 18:25:52,315][1651469] Signal inference workers to stop experience collection... (29250 times) [2024-06-15 18:25:52,392][1652491] InferenceWorker_p0-w0: stopping experience collection (29250 times) [2024-06-15 18:25:52,733][1651469] Signal inference workers to resume experience collection... (29250 times) [2024-06-15 18:25:52,734][1652491] InferenceWorker_p0-w0: resuming experience collection (29250 times) [2024-06-15 18:25:52,737][1652491] Updated weights for policy 0, policy_version 561792 (0.0014) [2024-06-15 18:25:54,139][1652491] Updated weights for policy 0, policy_version 561841 (0.0012) [2024-06-15 18:25:55,047][1652491] Updated weights for policy 0, policy_version 561892 (0.0011) [2024-06-15 18:25:55,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 48606.0, 300 sec: 46652.7). Total num frames: 1150812160. Throughput: 0: 11656.8. Samples: 287731712. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:25:55,956][1648985] Avg episode reward: [(0, '172.030')] [2024-06-15 18:26:00,784][1652491] Updated weights for policy 0, policy_version 561952 (0.0018) [2024-06-15 18:26:00,967][1648985] Fps is (10 sec: 45821.8, 60 sec: 44774.3, 300 sec: 46428.8). Total num frames: 1150877696. Throughput: 0: 11556.8. Samples: 287808512. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:26:00,967][1648985] Avg episode reward: [(0, '161.890')] [2024-06-15 18:26:03,484][1652491] Updated weights for policy 0, policy_version 562032 (0.0013) [2024-06-15 18:26:04,781][1652491] Updated weights for policy 0, policy_version 562083 (0.0020) [2024-06-15 18:26:05,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48605.7, 300 sec: 46652.7). Total num frames: 1151238144. Throughput: 0: 11480.1. Samples: 287862272. 
Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:26:05,956][1648985] Avg episode reward: [(0, '158.640')] [2024-06-15 18:26:06,456][1652491] Updated weights for policy 0, policy_version 562173 (0.0013) [2024-06-15 18:26:10,955][1648985] Fps is (10 sec: 45927.6, 60 sec: 43707.4, 300 sec: 46208.4). Total num frames: 1151336448. Throughput: 0: 11446.0. Samples: 287903232. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:26:10,956][1648985] Avg episode reward: [(0, '154.760')] [2024-06-15 18:26:13,130][1652491] Updated weights for policy 0, policy_version 562230 (0.0012) [2024-06-15 18:26:14,089][1652491] Updated weights for policy 0, policy_version 562256 (0.0013) [2024-06-15 18:26:15,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 46421.3, 300 sec: 46652.7). Total num frames: 1151631360. Throughput: 0: 11377.8. Samples: 287975424. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:26:15,956][1648985] Avg episode reward: [(0, '133.780')] [2024-06-15 18:26:16,118][1652491] Updated weights for policy 0, policy_version 562336 (0.0013) [2024-06-15 18:26:17,486][1652491] Updated weights for policy 0, policy_version 562400 (0.0134) [2024-06-15 18:26:20,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 44783.0, 300 sec: 46319.5). Total num frames: 1151860736. Throughput: 0: 11286.8. Samples: 288041472. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:26:20,955][1648985] Avg episode reward: [(0, '127.640')] [2024-06-15 18:26:23,617][1652491] Updated weights for policy 0, policy_version 562454 (0.0014) [2024-06-15 18:26:24,644][1652491] Updated weights for policy 0, policy_version 562495 (0.0014) [2024-06-15 18:26:25,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1152024576. Throughput: 0: 11241.2. Samples: 288076800. 
Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:26:25,955][1648985] Avg episode reward: [(0, '128.650')] [2024-06-15 18:26:26,899][1652491] Updated weights for policy 0, policy_version 562548 (0.0012) [2024-06-15 18:26:28,384][1652491] Updated weights for policy 0, policy_version 562618 (0.0053) [2024-06-15 18:26:29,764][1652491] Updated weights for policy 0, policy_version 562683 (0.0014) [2024-06-15 18:26:30,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 46421.4, 300 sec: 46652.7). Total num frames: 1152385024. Throughput: 0: 11389.2. Samples: 288141312. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:26:30,956][1648985] Avg episode reward: [(0, '151.430')] [2024-06-15 18:26:34,407][1651469] Signal inference workers to stop experience collection... (29300 times) [2024-06-15 18:26:34,476][1652491] InferenceWorker_p0-w0: stopping experience collection (29300 times) [2024-06-15 18:26:34,683][1651469] Signal inference workers to resume experience collection... (29300 times) [2024-06-15 18:26:34,683][1652491] InferenceWorker_p0-w0: resuming experience collection (29300 times) [2024-06-15 18:26:35,702][1652491] Updated weights for policy 0, policy_version 562741 (0.0032) [2024-06-15 18:26:35,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45875.2, 300 sec: 46653.0). Total num frames: 1152516096. Throughput: 0: 11309.5. Samples: 288215040. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:26:35,956][1648985] Avg episode reward: [(0, '161.150')] [2024-06-15 18:26:38,036][1652491] Updated weights for policy 0, policy_version 562803 (0.0014) [2024-06-15 18:26:39,148][1652491] Updated weights for policy 0, policy_version 562866 (0.0091) [2024-06-15 18:26:40,831][1652491] Updated weights for policy 0, policy_version 562913 (0.0015) [2024-06-15 18:26:40,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 47513.5, 300 sec: 46430.6). Total num frames: 1152843776. Throughput: 0: 11377.8. Samples: 288243712. 
Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:26:40,956][1648985] Avg episode reward: [(0, '155.280')] [2024-06-15 18:26:45,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 43690.9, 300 sec: 46208.4). Total num frames: 1152909312. Throughput: 0: 11369.3. Samples: 288320000. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:26:45,956][1648985] Avg episode reward: [(0, '143.110')] [2024-06-15 18:26:46,001][1652491] Updated weights for policy 0, policy_version 562948 (0.0025) [2024-06-15 18:26:47,095][1652491] Updated weights for policy 0, policy_version 562998 (0.0014) [2024-06-15 18:26:48,194][1652491] Updated weights for policy 0, policy_version 563024 (0.0012) [2024-06-15 18:26:49,674][1652491] Updated weights for policy 0, policy_version 563088 (0.0015) [2024-06-15 18:26:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 1153302528. Throughput: 0: 11685.0. Samples: 288388096. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:26:50,956][1648985] Avg episode reward: [(0, '156.460')] [2024-06-15 18:26:51,281][1652491] Updated weights for policy 0, policy_version 563155 (0.0011) [2024-06-15 18:26:52,332][1652491] Updated weights for policy 0, policy_version 563198 (0.0017) [2024-06-15 18:26:55,955][1648985] Fps is (10 sec: 55703.9, 60 sec: 44236.6, 300 sec: 46319.4). Total num frames: 1153466368. Throughput: 0: 11650.8. Samples: 288427520. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:26:55,956][1648985] Avg episode reward: [(0, '171.780')] [2024-06-15 18:26:56,249][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000563232_1153499136.pth... 
[2024-06-15 18:26:56,417][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000557824_1142423552.pth [2024-06-15 18:26:57,083][1652491] Updated weights for policy 0, policy_version 563264 (0.0018) [2024-06-15 18:27:00,127][1652491] Updated weights for policy 0, policy_version 563328 (0.0032) [2024-06-15 18:27:00,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 48068.9, 300 sec: 46430.6). Total num frames: 1153761280. Throughput: 0: 11639.4. Samples: 288499200. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:27:00,956][1648985] Avg episode reward: [(0, '168.280')] [2024-06-15 18:27:01,477][1652491] Updated weights for policy 0, policy_version 563382 (0.0013) [2024-06-15 18:27:02,972][1652491] Updated weights for policy 0, policy_version 563427 (0.0031) [2024-06-15 18:27:05,955][1648985] Fps is (10 sec: 49154.0, 60 sec: 45329.2, 300 sec: 46209.5). Total num frames: 1153957888. Throughput: 0: 11844.3. Samples: 288574464. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:27:05,955][1648985] Avg episode reward: [(0, '143.790')] [2024-06-15 18:27:07,177][1652491] Updated weights for policy 0, policy_version 563488 (0.0011) [2024-06-15 18:27:09,920][1652491] Updated weights for policy 0, policy_version 563521 (0.0014) [2024-06-15 18:27:10,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 47513.7, 300 sec: 46430.6). Total num frames: 1154187264. Throughput: 0: 11912.5. Samples: 288612864. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:27:10,956][1648985] Avg episode reward: [(0, '142.230')] [2024-06-15 18:27:11,697][1652491] Updated weights for policy 0, policy_version 563601 (0.0107) [2024-06-15 18:27:12,793][1651469] Signal inference workers to stop experience collection... (29350 times) [2024-06-15 18:27:12,865][1652491] InferenceWorker_p0-w0: stopping experience collection (29350 times) [2024-06-15 18:27:12,968][1651469] Signal inference workers to resume experience collection... 
(29350 times) [2024-06-15 18:27:12,969][1652491] InferenceWorker_p0-w0: resuming experience collection (29350 times) [2024-06-15 18:27:13,059][1652491] Updated weights for policy 0, policy_version 563664 (0.0101) [2024-06-15 18:27:15,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 47513.5, 300 sec: 46208.4). Total num frames: 1154482176. Throughput: 0: 11935.3. Samples: 288678400. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:27:15,956][1648985] Avg episode reward: [(0, '149.710')] [2024-06-15 18:27:17,336][1652491] Updated weights for policy 0, policy_version 563714 (0.0021) [2024-06-15 18:27:18,759][1652491] Updated weights for policy 0, policy_version 563775 (0.0143) [2024-06-15 18:27:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 1154613248. Throughput: 0: 12060.4. Samples: 288757760. Policy #0 lag: (min: 53.0, avg: 138.0, max: 309.0) [2024-06-15 18:27:20,956][1648985] Avg episode reward: [(0, '153.920')] [2024-06-15 18:27:21,814][1652491] Updated weights for policy 0, policy_version 563830 (0.0013) [2024-06-15 18:27:23,699][1652491] Updated weights for policy 0, policy_version 563920 (0.0013) [2024-06-15 18:27:25,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 49698.1, 300 sec: 46430.9). Total num frames: 1155006464. Throughput: 0: 12049.1. Samples: 288785920. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:27:25,956][1648985] Avg episode reward: [(0, '169.070')] [2024-06-15 18:27:28,886][1652491] Updated weights for policy 0, policy_version 563994 (0.0011) [2024-06-15 18:27:29,837][1652491] Updated weights for policy 0, policy_version 564030 (0.0013) [2024-06-15 18:27:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 1155137536. Throughput: 0: 12003.6. Samples: 288860160. 
Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:27:30,956][1648985] Avg episode reward: [(0, '154.350')] [2024-06-15 18:27:32,610][1652491] Updated weights for policy 0, policy_version 564070 (0.0015) [2024-06-15 18:27:33,825][1652491] Updated weights for policy 0, policy_version 564129 (0.0012) [2024-06-15 18:27:35,047][1652491] Updated weights for policy 0, policy_version 564194 (0.0015) [2024-06-15 18:27:35,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 50244.3, 300 sec: 46763.8). Total num frames: 1155530752. Throughput: 0: 12094.6. Samples: 288932352. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:27:35,956][1648985] Avg episode reward: [(0, '151.270')] [2024-06-15 18:27:38,846][1652491] Updated weights for policy 0, policy_version 564227 (0.0013) [2024-06-15 18:27:40,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 1155661824. Throughput: 0: 12265.3. Samples: 288979456. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:27:40,956][1648985] Avg episode reward: [(0, '151.280')] [2024-06-15 18:27:43,132][1652491] Updated weights for policy 0, policy_version 564320 (0.0020) [2024-06-15 18:27:44,404][1652491] Updated weights for policy 0, policy_version 564371 (0.0012) [2024-06-15 18:27:45,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 51336.5, 300 sec: 46652.8). Total num frames: 1155989504. Throughput: 0: 12106.0. Samples: 289043968. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:27:45,956][1648985] Avg episode reward: [(0, '153.180')] [2024-06-15 18:27:46,347][1652491] Updated weights for policy 0, policy_version 564475 (0.0130) [2024-06-15 18:27:50,869][1652491] Updated weights for policy 0, policy_version 564518 (0.0013) [2024-06-15 18:27:50,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 1156120576. Throughput: 0: 12014.9. Samples: 289115136. 
Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:27:50,955][1648985] Avg episode reward: [(0, '155.600')] [2024-06-15 18:27:53,984][1652491] Updated weights for policy 0, policy_version 564569 (0.0012) [2024-06-15 18:27:54,275][1651469] Signal inference workers to stop experience collection... (29400 times) [2024-06-15 18:27:54,355][1652491] InferenceWorker_p0-w0: stopping experience collection (29400 times) [2024-06-15 18:27:54,566][1651469] Signal inference workers to resume experience collection... (29400 times) [2024-06-15 18:27:54,567][1652491] InferenceWorker_p0-w0: resuming experience collection (29400 times) [2024-06-15 18:27:55,418][1652491] Updated weights for policy 0, policy_version 564624 (0.0013) [2024-06-15 18:27:55,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 48606.1, 300 sec: 46541.7). Total num frames: 1156382720. Throughput: 0: 12049.1. Samples: 289155072. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:27:55,956][1648985] Avg episode reward: [(0, '162.540')] [2024-06-15 18:27:56,933][1652491] Updated weights for policy 0, policy_version 564695 (0.0013) [2024-06-15 18:28:00,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 1156579328. Throughput: 0: 12037.7. Samples: 289220096. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:28:00,955][1648985] Avg episode reward: [(0, '162.360')] [2024-06-15 18:28:02,026][1652491] Updated weights for policy 0, policy_version 564770 (0.0013) [2024-06-15 18:28:04,598][1652491] Updated weights for policy 0, policy_version 564804 (0.0016) [2024-06-15 18:28:05,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 1156808704. Throughput: 0: 11969.4. Samples: 289296384. 
Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:28:05,956][1648985] Avg episode reward: [(0, '154.950')] [2024-06-15 18:28:06,213][1652491] Updated weights for policy 0, policy_version 564865 (0.0012) [2024-06-15 18:28:07,799][1652491] Updated weights for policy 0, policy_version 564934 (0.0012) [2024-06-15 18:28:10,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 48605.7, 300 sec: 46319.5). Total num frames: 1157103616. Throughput: 0: 11901.1. Samples: 289321472. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:28:10,956][1648985] Avg episode reward: [(0, '146.680')] [2024-06-15 18:28:11,947][1652491] Updated weights for policy 0, policy_version 564995 (0.0019) [2024-06-15 18:28:13,040][1652491] Updated weights for policy 0, policy_version 565046 (0.0017) [2024-06-15 18:28:15,810][1652491] Updated weights for policy 0, policy_version 565088 (0.0091) [2024-06-15 18:28:15,966][1648985] Fps is (10 sec: 49097.2, 60 sec: 46958.8, 300 sec: 46873.1). Total num frames: 1157300224. Throughput: 0: 12057.4. Samples: 289402880. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:28:15,967][1648985] Avg episode reward: [(0, '150.390')] [2024-06-15 18:28:16,667][1652491] Updated weights for policy 0, policy_version 565117 (0.0009) [2024-06-15 18:28:18,637][1652491] Updated weights for policy 0, policy_version 565187 (0.0012) [2024-06-15 18:28:20,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 50244.3, 300 sec: 46430.6). Total num frames: 1157627904. Throughput: 0: 11901.1. Samples: 289467904. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:28:20,956][1648985] Avg episode reward: [(0, '158.900')] [2024-06-15 18:28:23,191][1652491] Updated weights for policy 0, policy_version 565250 (0.0120) [2024-06-15 18:28:25,960][1648985] Fps is (10 sec: 45904.8, 60 sec: 45871.6, 300 sec: 46652.0). Total num frames: 1157758976. Throughput: 0: 11661.0. Samples: 289504256. 
Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:28:25,960][1648985] Avg episode reward: [(0, '179.710')] [2024-06-15 18:28:26,460][1652491] Updated weights for policy 0, policy_version 565314 (0.0014) [2024-06-15 18:28:28,231][1652491] Updated weights for policy 0, policy_version 565377 (0.0014) [2024-06-15 18:28:29,449][1652491] Updated weights for policy 0, policy_version 565433 (0.0089) [2024-06-15 18:28:30,644][1652491] Updated weights for policy 0, policy_version 565477 (0.0014) [2024-06-15 18:28:30,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 49698.1, 300 sec: 46541.7). Total num frames: 1158119424. Throughput: 0: 11764.6. Samples: 289573376. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:28:30,956][1648985] Avg episode reward: [(0, '181.190')] [2024-06-15 18:28:34,939][1652491] Updated weights for policy 0, policy_version 565520 (0.0015) [2024-06-15 18:28:35,037][1651469] Signal inference workers to stop experience collection... (29450 times) [2024-06-15 18:28:35,096][1652491] InferenceWorker_p0-w0: stopping experience collection (29450 times) [2024-06-15 18:28:35,225][1651469] Signal inference workers to resume experience collection... (29450 times) [2024-06-15 18:28:35,226][1652491] InferenceWorker_p0-w0: resuming experience collection (29450 times) [2024-06-15 18:28:35,955][1648985] Fps is (10 sec: 52453.1, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1158283264. Throughput: 0: 11719.1. Samples: 289642496. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:28:35,956][1648985] Avg episode reward: [(0, '161.120')] [2024-06-15 18:28:37,739][1652491] Updated weights for policy 0, policy_version 565584 (0.0013) [2024-06-15 18:28:39,541][1652491] Updated weights for policy 0, policy_version 565648 (0.0014) [2024-06-15 18:28:40,335][1652491] Updated weights for policy 0, policy_version 565690 (0.0013) [2024-06-15 18:28:40,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 48059.6, 300 sec: 46874.9). 
Total num frames: 1158545408. Throughput: 0: 11673.5. Samples: 289680384. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:28:40,956][1648985] Avg episode reward: [(0, '147.490')] [2024-06-15 18:28:41,941][1652491] Updated weights for policy 0, policy_version 565731 (0.0015) [2024-06-15 18:28:45,568][1652491] Updated weights for policy 0, policy_version 565786 (0.0013) [2024-06-15 18:28:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 1158742016. Throughput: 0: 11855.6. Samples: 289753600. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:28:45,956][1648985] Avg episode reward: [(0, '139.880')] [2024-06-15 18:28:48,402][1652491] Updated weights for policy 0, policy_version 565828 (0.0012) [2024-06-15 18:28:50,672][1652491] Updated weights for policy 0, policy_version 565889 (0.0012) [2024-06-15 18:28:50,955][1648985] Fps is (10 sec: 42600.1, 60 sec: 47513.7, 300 sec: 47097.1). Total num frames: 1158971392. Throughput: 0: 11776.0. Samples: 289826304. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:28:50,955][1648985] Avg episode reward: [(0, '154.770')] [2024-06-15 18:28:51,834][1652491] Updated weights for policy 0, policy_version 565951 (0.0014) [2024-06-15 18:28:53,573][1652491] Updated weights for policy 0, policy_version 566016 (0.0012) [2024-06-15 18:28:55,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 1159200768. Throughput: 0: 11867.1. Samples: 289855488. Policy #0 lag: (min: 96.0, avg: 198.9, max: 303.0) [2024-06-15 18:28:55,955][1648985] Avg episode reward: [(0, '145.350')] [2024-06-15 18:28:55,970][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000566016_1159200768.pth... 
[2024-06-15 18:28:56,199][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000560528_1147961344.pth [2024-06-15 18:28:57,808][1652491] Updated weights for policy 0, policy_version 566079 (0.0014) [2024-06-15 18:29:00,877][1652491] Updated weights for policy 0, policy_version 566134 (0.0132) [2024-06-15 18:29:00,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 1159430144. Throughput: 0: 11608.2. Samples: 289925120. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:29:00,955][1648985] Avg episode reward: [(0, '136.660')] [2024-06-15 18:29:02,590][1652491] Updated weights for policy 0, policy_version 566160 (0.0117) [2024-06-15 18:29:04,563][1652491] Updated weights for policy 0, policy_version 566213 (0.0014) [2024-06-15 18:29:05,777][1652491] Updated weights for policy 0, policy_version 566266 (0.0013) [2024-06-15 18:29:05,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48605.9, 300 sec: 46986.0). Total num frames: 1159725056. Throughput: 0: 11571.2. Samples: 289988608. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:29:05,956][1648985] Avg episode reward: [(0, '129.580')] [2024-06-15 18:29:08,495][1652491] Updated weights for policy 0, policy_version 566320 (0.0011) [2024-06-15 18:29:10,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45875.3, 300 sec: 46653.2). Total num frames: 1159856128. Throughput: 0: 11595.2. Samples: 290025984. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:29:10,956][1648985] Avg episode reward: [(0, '147.030')] [2024-06-15 18:29:11,920][1652491] Updated weights for policy 0, policy_version 566371 (0.0041) [2024-06-15 18:29:14,784][1652491] Updated weights for policy 0, policy_version 566448 (0.0013) [2024-06-15 18:29:15,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 47522.5, 300 sec: 47097.0). Total num frames: 1160151040. Throughput: 0: 11616.7. Samples: 290096128. 
Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:29:15,956][1648985] Avg episode reward: [(0, '152.970')] [2024-06-15 18:29:16,171][1652491] Updated weights for policy 0, policy_version 566496 (0.0011) [2024-06-15 18:29:19,673][1652491] Updated weights for policy 0, policy_version 566561 (0.0014) [2024-06-15 18:29:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1160380416. Throughput: 0: 11605.4. Samples: 290164736. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:29:20,956][1648985] Avg episode reward: [(0, '155.970')] [2024-06-15 18:29:22,394][1651469] Signal inference workers to stop experience collection... (29500 times) [2024-06-15 18:29:22,460][1652491] InferenceWorker_p0-w0: stopping experience collection (29500 times) [2024-06-15 18:29:22,603][1651469] Signal inference workers to resume experience collection... (29500 times) [2024-06-15 18:29:22,603][1652491] InferenceWorker_p0-w0: resuming experience collection (29500 times) [2024-06-15 18:29:22,605][1652491] Updated weights for policy 0, policy_version 566608 (0.0012) [2024-06-15 18:29:25,377][1652491] Updated weights for policy 0, policy_version 566662 (0.0014) [2024-06-15 18:29:25,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46425.0, 300 sec: 47097.1). Total num frames: 1160544256. Throughput: 0: 11594.0. Samples: 290202112. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:29:25,955][1648985] Avg episode reward: [(0, '160.480')] [2024-06-15 18:29:26,749][1652491] Updated weights for policy 0, policy_version 566720 (0.0012) [2024-06-15 18:29:28,007][1652491] Updated weights for policy 0, policy_version 566774 (0.0012) [2024-06-15 18:29:30,941][1652491] Updated weights for policy 0, policy_version 566839 (0.0133) [2024-06-15 18:29:30,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1160871936. Throughput: 0: 11571.2. Samples: 290274304. 
Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:29:30,955][1648985] Avg episode reward: [(0, '165.000')] [2024-06-15 18:29:33,907][1652491] Updated weights for policy 0, policy_version 566870 (0.0012) [2024-06-15 18:29:35,963][1648985] Fps is (10 sec: 49112.3, 60 sec: 45869.1, 300 sec: 47095.8). Total num frames: 1161035776. Throughput: 0: 11523.6. Samples: 290344960. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:29:35,964][1648985] Avg episode reward: [(0, '170.530')] [2024-06-15 18:29:36,826][1652491] Updated weights for policy 0, policy_version 566928 (0.0013) [2024-06-15 18:29:38,107][1652491] Updated weights for policy 0, policy_version 566972 (0.0012) [2024-06-15 18:29:39,163][1652491] Updated weights for policy 0, policy_version 567024 (0.0023) [2024-06-15 18:29:40,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.4, 300 sec: 46652.7). Total num frames: 1161297920. Throughput: 0: 11537.1. Samples: 290374656. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:29:40,956][1648985] Avg episode reward: [(0, '157.830')] [2024-06-15 18:29:41,847][1652491] Updated weights for policy 0, policy_version 567092 (0.0012) [2024-06-15 18:29:45,553][1652491] Updated weights for policy 0, policy_version 567136 (0.0015) [2024-06-15 18:29:45,955][1648985] Fps is (10 sec: 45911.1, 60 sec: 45875.1, 300 sec: 46874.9). Total num frames: 1161494528. Throughput: 0: 11662.1. Samples: 290449920. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:29:45,956][1648985] Avg episode reward: [(0, '146.670')] [2024-06-15 18:29:48,441][1652491] Updated weights for policy 0, policy_version 567189 (0.0013) [2024-06-15 18:29:49,883][1652491] Updated weights for policy 0, policy_version 567248 (0.0025) [2024-06-15 18:29:50,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 47513.6, 300 sec: 47208.2). Total num frames: 1161822208. Throughput: 0: 11662.3. Samples: 290513408. 
Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:29:50,955][1648985] Avg episode reward: [(0, '149.210')] [2024-06-15 18:29:52,314][1652491] Updated weights for policy 0, policy_version 567312 (0.0012) [2024-06-15 18:29:55,780][1652491] Updated weights for policy 0, policy_version 567376 (0.0013) [2024-06-15 18:29:55,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1161986048. Throughput: 0: 11639.5. Samples: 290549760. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:29:55,956][1648985] Avg episode reward: [(0, '139.770')] [2024-06-15 18:29:59,815][1652491] Updated weights for policy 0, policy_version 567440 (0.0157) [2024-06-15 18:30:00,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 46421.3, 300 sec: 47097.0). Total num frames: 1162215424. Throughput: 0: 11764.6. Samples: 290625536. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:30:00,956][1648985] Avg episode reward: [(0, '158.290')] [2024-06-15 18:30:01,483][1652491] Updated weights for policy 0, policy_version 567506 (0.0014) [2024-06-15 18:30:02,071][1652491] Updated weights for policy 0, policy_version 567549 (0.0022) [2024-06-15 18:30:04,136][1651469] Signal inference workers to stop experience collection... (29550 times) [2024-06-15 18:30:04,174][1652491] Updated weights for policy 0, policy_version 567586 (0.0013) [2024-06-15 18:30:04,185][1652491] InferenceWorker_p0-w0: stopping experience collection (29550 times) [2024-06-15 18:30:04,358][1651469] Signal inference workers to resume experience collection... (29550 times) [2024-06-15 18:30:04,359][1652491] InferenceWorker_p0-w0: resuming experience collection (29550 times) [2024-06-15 18:30:05,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 45875.0, 300 sec: 46656.4). Total num frames: 1162477568. Throughput: 0: 11798.7. Samples: 290695680. 
Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:30:05,956][1648985] Avg episode reward: [(0, '179.720')] [2024-06-15 18:30:06,853][1652491] Updated weights for policy 0, policy_version 567648 (0.0015) [2024-06-15 18:30:10,764][1652491] Updated weights for policy 0, policy_version 567696 (0.0016) [2024-06-15 18:30:10,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1162641408. Throughput: 0: 11867.0. Samples: 290736128. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:30:10,956][1648985] Avg episode reward: [(0, '183.560')] [2024-06-15 18:30:12,875][1652491] Updated weights for policy 0, policy_version 567776 (0.0151) [2024-06-15 18:30:15,148][1652491] Updated weights for policy 0, policy_version 567840 (0.0030) [2024-06-15 18:30:15,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 1163001856. Throughput: 0: 11650.8. Samples: 290798592. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:30:15,956][1648985] Avg episode reward: [(0, '181.370')] [2024-06-15 18:30:17,705][1652491] Updated weights for policy 0, policy_version 567888 (0.0015) [2024-06-15 18:30:18,832][1652491] Updated weights for policy 0, policy_version 567927 (0.0013) [2024-06-15 18:30:20,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 1163132928. Throughput: 0: 11698.4. Samples: 290871296. Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:30:20,956][1648985] Avg episode reward: [(0, '178.810')] [2024-06-15 18:30:23,579][1652491] Updated weights for policy 0, policy_version 568000 (0.0013) [2024-06-15 18:30:25,154][1652491] Updated weights for policy 0, policy_version 568064 (0.0066) [2024-06-15 18:30:25,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 47513.6, 300 sec: 46763.9). Total num frames: 1163395072. Throughput: 0: 11616.7. Samples: 290897408. 
Policy #0 lag: (min: 9.0, avg: 124.8, max: 265.0) [2024-06-15 18:30:25,956][1648985] Avg episode reward: [(0, '172.970')] [2024-06-15 18:30:27,433][1652491] Updated weights for policy 0, policy_version 568127 (0.0014) [2024-06-15 18:30:30,663][1652491] Updated weights for policy 0, policy_version 568188 (0.0028) [2024-06-15 18:30:30,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1163657216. Throughput: 0: 11616.8. Samples: 290972672. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:30:30,955][1648985] Avg episode reward: [(0, '158.760')] [2024-06-15 18:30:34,840][1652491] Updated weights for policy 0, policy_version 568243 (0.0128) [2024-06-15 18:30:35,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46427.6, 300 sec: 46874.9). Total num frames: 1163821056. Throughput: 0: 11650.8. Samples: 291037696. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:30:35,955][1648985] Avg episode reward: [(0, '161.760')] [2024-06-15 18:30:36,557][1652491] Updated weights for policy 0, policy_version 568313 (0.0119) [2024-06-15 18:30:38,146][1652491] Updated weights for policy 0, policy_version 568368 (0.0012) [2024-06-15 18:30:40,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 45875.1, 300 sec: 46652.8). Total num frames: 1164050432. Throughput: 0: 11548.4. Samples: 291069440. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:30:40,956][1648985] Avg episode reward: [(0, '155.960')] [2024-06-15 18:30:42,699][1652491] Updated weights for policy 0, policy_version 568443 (0.0139) [2024-06-15 18:30:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45875.4, 300 sec: 46874.9). Total num frames: 1164247040. Throughput: 0: 11457.4. Samples: 291141120. 
Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:30:45,956][1648985] Avg episode reward: [(0, '160.840')] [2024-06-15 18:30:46,151][1652491] Updated weights for policy 0, policy_version 568483 (0.0083) [2024-06-15 18:30:48,342][1652491] Updated weights for policy 0, policy_version 568572 (0.0012) [2024-06-15 18:30:49,216][1651469] Signal inference workers to stop experience collection... (29600 times) [2024-06-15 18:30:49,266][1652491] InferenceWorker_p0-w0: stopping experience collection (29600 times) [2024-06-15 18:30:49,416][1651469] Signal inference workers to resume experience collection... (29600 times) [2024-06-15 18:30:49,417][1652491] InferenceWorker_p0-w0: resuming experience collection (29600 times) [2024-06-15 18:30:50,100][1652491] Updated weights for policy 0, policy_version 568624 (0.0016) [2024-06-15 18:30:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.1, 300 sec: 46652.8). Total num frames: 1164574720. Throughput: 0: 11252.7. Samples: 291202048. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:30:50,956][1648985] Avg episode reward: [(0, '154.790')] [2024-06-15 18:30:54,074][1652491] Updated weights for policy 0, policy_version 568700 (0.0030) [2024-06-15 18:30:55,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45329.0, 300 sec: 46876.7). Total num frames: 1164705792. Throughput: 0: 11104.7. Samples: 291235840. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:30:55,956][1648985] Avg episode reward: [(0, '161.340')] [2024-06-15 18:30:55,972][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000568704_1164705792.pth... 
[2024-06-15 18:30:56,043][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000563232_1153499136.pth [2024-06-15 18:30:56,049][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000568704_1164705792.pth [2024-06-15 18:30:58,389][1652491] Updated weights for policy 0, policy_version 568739 (0.0010) [2024-06-15 18:30:59,565][1652491] Updated weights for policy 0, policy_version 568787 (0.0014) [2024-06-15 18:31:00,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45875.1, 300 sec: 46541.7). Total num frames: 1164967936. Throughput: 0: 11218.5. Samples: 291303424. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:31:00,956][1648985] Avg episode reward: [(0, '164.430')] [2024-06-15 18:31:01,631][1652491] Updated weights for policy 0, policy_version 568864 (0.0012) [2024-06-15 18:31:04,982][1652491] Updated weights for policy 0, policy_version 568912 (0.0013) [2024-06-15 18:31:05,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45875.4, 300 sec: 47097.1). Total num frames: 1165230080. Throughput: 0: 11070.6. Samples: 291369472. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:31:05,955][1648985] Avg episode reward: [(0, '150.420')] [2024-06-15 18:31:09,150][1652491] Updated weights for policy 0, policy_version 568976 (0.0017) [2024-06-15 18:31:10,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1165393920. Throughput: 0: 11411.9. Samples: 291410944. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:31:10,956][1648985] Avg episode reward: [(0, '147.980')] [2024-06-15 18:31:11,573][1652491] Updated weights for policy 0, policy_version 569072 (0.0013) [2024-06-15 18:31:13,929][1652491] Updated weights for policy 0, policy_version 569145 (0.0012) [2024-06-15 18:31:15,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 46652.7). Total num frames: 1165623296. Throughput: 0: 10899.9. 
Samples: 291463168. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:31:15,956][1648985] Avg episode reward: [(0, '146.710')] [2024-06-15 18:31:17,896][1652491] Updated weights for policy 0, policy_version 569193 (0.0042) [2024-06-15 18:31:20,755][1652491] Updated weights for policy 0, policy_version 569219 (0.0012) [2024-06-15 18:31:20,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 43690.6, 300 sec: 46541.6). Total num frames: 1165754368. Throughput: 0: 11309.4. Samples: 291546624. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:31:20,956][1648985] Avg episode reward: [(0, '151.960')] [2024-06-15 18:31:22,038][1652491] Updated weights for policy 0, policy_version 569280 (0.0013) [2024-06-15 18:31:23,557][1652491] Updated weights for policy 0, policy_version 569338 (0.0011) [2024-06-15 18:31:24,608][1652491] Updated weights for policy 0, policy_version 569376 (0.0013) [2024-06-15 18:31:25,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1166147584. Throughput: 0: 11218.5. Samples: 291574272. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:31:25,956][1648985] Avg episode reward: [(0, '160.160')] [2024-06-15 18:31:28,719][1652491] Updated weights for policy 0, policy_version 569440 (0.0015) [2024-06-15 18:31:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 43690.5, 300 sec: 46652.7). Total num frames: 1166278656. Throughput: 0: 11229.8. Samples: 291646464. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:31:30,957][1648985] Avg episode reward: [(0, '168.260')] [2024-06-15 18:31:31,902][1652491] Updated weights for policy 0, policy_version 569494 (0.0017) [2024-06-15 18:31:33,014][1651469] Signal inference workers to stop experience collection... (29650 times) [2024-06-15 18:31:33,046][1652491] InferenceWorker_p0-w0: stopping experience collection (29650 times) [2024-06-15 18:31:33,407][1651469] Signal inference workers to resume experience collection... 
(29650 times) [2024-06-15 18:31:33,408][1652491] InferenceWorker_p0-w0: resuming experience collection (29650 times) [2024-06-15 18:31:34,308][1652491] Updated weights for policy 0, policy_version 569584 (0.0113) [2024-06-15 18:31:35,761][1652491] Updated weights for policy 0, policy_version 569620 (0.0013) [2024-06-15 18:31:35,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46421.4, 300 sec: 46652.8). Total num frames: 1166606336. Throughput: 0: 11275.4. Samples: 291709440. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:31:35,955][1648985] Avg episode reward: [(0, '146.020')] [2024-06-15 18:31:36,608][1652491] Updated weights for policy 0, policy_version 569664 (0.0017) [2024-06-15 18:31:40,702][1652491] Updated weights for policy 0, policy_version 569714 (0.0013) [2024-06-15 18:31:40,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 1166802944. Throughput: 0: 11423.3. Samples: 291749888. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:31:40,956][1648985] Avg episode reward: [(0, '138.420')] [2024-06-15 18:31:43,892][1652491] Updated weights for policy 0, policy_version 569760 (0.0016) [2024-06-15 18:31:45,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 1166999552. Throughput: 0: 11434.6. Samples: 291817984. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:31:45,956][1648985] Avg episode reward: [(0, '139.850')] [2024-06-15 18:31:46,577][1652491] Updated weights for policy 0, policy_version 569856 (0.0018) [2024-06-15 18:31:50,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 43690.7, 300 sec: 46541.7). Total num frames: 1167196160. Throughput: 0: 11320.9. Samples: 291878912. 
Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:31:50,955][1648985] Avg episode reward: [(0, '150.550')] [2024-06-15 18:31:51,495][1652491] Updated weights for policy 0, policy_version 569926 (0.0051) [2024-06-15 18:31:52,862][1652491] Updated weights for policy 0, policy_version 569977 (0.0011) [2024-06-15 18:31:55,955][1648985] Fps is (10 sec: 32768.3, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 1167327232. Throughput: 0: 11184.4. Samples: 291914240. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:31:55,956][1648985] Avg episode reward: [(0, '183.710')] [2024-06-15 18:31:56,654][1652491] Updated weights for policy 0, policy_version 570018 (0.0016) [2024-06-15 18:31:58,309][1652491] Updated weights for policy 0, policy_version 570083 (0.0012) [2024-06-15 18:31:59,828][1652491] Updated weights for policy 0, policy_version 570144 (0.0012) [2024-06-15 18:32:00,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1167720448. Throughput: 0: 11286.7. Samples: 291971072. Policy #0 lag: (min: 15.0, avg: 131.5, max: 271.0) [2024-06-15 18:32:00,956][1648985] Avg episode reward: [(0, '165.520')] [2024-06-15 18:32:02,709][1652491] Updated weights for policy 0, policy_version 570181 (0.0013) [2024-06-15 18:32:03,947][1652491] Updated weights for policy 0, policy_version 570232 (0.0013) [2024-06-15 18:32:05,965][1648985] Fps is (10 sec: 52376.3, 60 sec: 43683.3, 300 sec: 46317.9). Total num frames: 1167851520. Throughput: 0: 11204.7. Samples: 292050944. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:32:05,966][1648985] Avg episode reward: [(0, '145.330')] [2024-06-15 18:32:08,576][1652491] Updated weights for policy 0, policy_version 570263 (0.0024) [2024-06-15 18:32:10,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 44783.1, 300 sec: 46097.4). Total num frames: 1168080896. Throughput: 0: 11389.2. Samples: 292086784. 
Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:32:10,956][1648985] Avg episode reward: [(0, '121.260')] [2024-06-15 18:32:11,691][1652491] Updated weights for policy 0, policy_version 570387 (0.0078) [2024-06-15 18:32:14,641][1652491] Updated weights for policy 0, policy_version 570448 (0.0013) [2024-06-15 18:32:15,111][1651469] Signal inference workers to stop experience collection... (29700 times) [2024-06-15 18:32:15,150][1652491] InferenceWorker_p0-w0: stopping experience collection (29700 times) [2024-06-15 18:32:15,336][1651469] Signal inference workers to resume experience collection... (29700 times) [2024-06-15 18:32:15,342][1652491] InferenceWorker_p0-w0: resuming experience collection (29700 times) [2024-06-15 18:32:15,955][1648985] Fps is (10 sec: 52481.8, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1168375808. Throughput: 0: 11025.1. Samples: 292142592. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:32:15,955][1648985] Avg episode reward: [(0, '133.200')] [2024-06-15 18:32:20,226][1652491] Updated weights for policy 0, policy_version 570512 (0.0014) [2024-06-15 18:32:20,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45329.2, 300 sec: 45653.0). Total num frames: 1168474112. Throughput: 0: 11343.6. Samples: 292219904. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:32:20,956][1648985] Avg episode reward: [(0, '144.870')] [2024-06-15 18:32:21,387][1652491] Updated weights for policy 0, policy_version 570563 (0.0013) [2024-06-15 18:32:23,151][1652491] Updated weights for policy 0, policy_version 570640 (0.0013) [2024-06-15 18:32:23,969][1652491] Updated weights for policy 0, policy_version 570681 (0.0012) [2024-06-15 18:32:25,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 1168834560. Throughput: 0: 10991.0. Samples: 292244480. 
Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:32:25,955][1648985] Avg episode reward: [(0, '144.530')] [2024-06-15 18:32:26,180][1652491] Updated weights for policy 0, policy_version 570736 (0.0013) [2024-06-15 18:32:30,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 43690.8, 300 sec: 45319.8). Total num frames: 1168900096. Throughput: 0: 11320.9. Samples: 292327424. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:32:30,955][1648985] Avg episode reward: [(0, '152.320')] [2024-06-15 18:32:31,755][1652491] Updated weights for policy 0, policy_version 570785 (0.0016) [2024-06-15 18:32:33,238][1652491] Updated weights for policy 0, policy_version 570866 (0.0018) [2024-06-15 18:32:34,882][1652491] Updated weights for policy 0, policy_version 570939 (0.0020) [2024-06-15 18:32:35,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 44783.0, 300 sec: 46208.5). Total num frames: 1169293312. Throughput: 0: 11366.4. Samples: 292390400. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:32:35,956][1648985] Avg episode reward: [(0, '158.710')] [2024-06-15 18:32:37,237][1652491] Updated weights for policy 0, policy_version 570992 (0.0013) [2024-06-15 18:32:40,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 43690.8, 300 sec: 45542.0). Total num frames: 1169424384. Throughput: 0: 11468.8. Samples: 292430336. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:32:40,955][1648985] Avg episode reward: [(0, '142.450')] [2024-06-15 18:32:42,525][1652491] Updated weights for policy 0, policy_version 571025 (0.0013) [2024-06-15 18:32:43,724][1652491] Updated weights for policy 0, policy_version 571088 (0.0040) [2024-06-15 18:32:45,333][1652491] Updated weights for policy 0, policy_version 571152 (0.0013) [2024-06-15 18:32:45,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 1169752064. Throughput: 0: 11798.7. Samples: 292502016. 
Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:32:45,956][1648985] Avg episode reward: [(0, '143.610')] [2024-06-15 18:32:46,201][1652491] Updated weights for policy 0, policy_version 571195 (0.0012) [2024-06-15 18:32:47,891][1652491] Updated weights for policy 0, policy_version 571253 (0.0014) [2024-06-15 18:32:50,955][1648985] Fps is (10 sec: 52427.2, 60 sec: 45875.0, 300 sec: 45986.2). Total num frames: 1169948672. Throughput: 0: 11744.4. Samples: 292579328. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:32:50,956][1648985] Avg episode reward: [(0, '148.500')] [2024-06-15 18:32:52,637][1652491] Updated weights for policy 0, policy_version 571280 (0.0014) [2024-06-15 18:32:53,958][1652491] Updated weights for policy 0, policy_version 571331 (0.0012) [2024-06-15 18:32:55,007][1651469] Signal inference workers to stop experience collection... (29750 times) [2024-06-15 18:32:55,056][1652491] Updated weights for policy 0, policy_version 571379 (0.0011) [2024-06-15 18:32:55,067][1652491] InferenceWorker_p0-w0: stopping experience collection (29750 times) [2024-06-15 18:32:55,194][1651469] Signal inference workers to resume experience collection... (29750 times) [2024-06-15 18:32:55,196][1652491] InferenceWorker_p0-w0: resuming experience collection (29750 times) [2024-06-15 18:32:55,972][1648985] Fps is (10 sec: 49068.5, 60 sec: 48592.1, 300 sec: 46316.8). Total num frames: 1170243584. Throughput: 0: 11726.1. Samples: 292614656. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:32:55,973][1648985] Avg episode reward: [(0, '145.490')] [2024-06-15 18:32:56,436][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000571440_1170309120.pth... 
[2024-06-15 18:32:56,474][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000566016_1159200768.pth [2024-06-15 18:32:56,611][1652491] Updated weights for policy 0, policy_version 571448 (0.0016) [2024-06-15 18:32:58,463][1652491] Updated weights for policy 0, policy_version 571494 (0.0024) [2024-06-15 18:33:00,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 1170472960. Throughput: 0: 12049.0. Samples: 292684800. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:33:00,956][1648985] Avg episode reward: [(0, '149.730')] [2024-06-15 18:33:03,320][1652491] Updated weights for policy 0, policy_version 571536 (0.0012) [2024-06-15 18:33:05,002][1652491] Updated weights for policy 0, policy_version 571600 (0.0084) [2024-06-15 18:33:05,955][1648985] Fps is (10 sec: 45953.9, 60 sec: 47521.6, 300 sec: 46097.4). Total num frames: 1170702336. Throughput: 0: 11946.7. Samples: 292757504. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:33:05,955][1648985] Avg episode reward: [(0, '141.360')] [2024-06-15 18:33:06,599][1652491] Updated weights for policy 0, policy_version 571667 (0.0025) [2024-06-15 18:33:07,524][1652491] Updated weights for policy 0, policy_version 571709 (0.0148) [2024-06-15 18:33:09,851][1652491] Updated weights for policy 0, policy_version 571774 (0.0015) [2024-06-15 18:33:10,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48605.9, 300 sec: 46432.4). Total num frames: 1170997248. Throughput: 0: 12140.1. Samples: 292790784. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:33:10,956][1648985] Avg episode reward: [(0, '148.790')] [2024-06-15 18:33:15,237][1652491] Updated weights for policy 0, policy_version 571824 (0.0011) [2024-06-15 18:33:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 1171128320. Throughput: 0: 12037.7. Samples: 292869120. 
Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:33:15,955][1648985] Avg episode reward: [(0, '150.810')] [2024-06-15 18:33:17,668][1652491] Updated weights for policy 0, policy_version 571920 (0.0015) [2024-06-15 18:33:20,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 49152.1, 300 sec: 46320.3). Total num frames: 1171423232. Throughput: 0: 11855.6. Samples: 292923904. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:33:20,955][1648985] Avg episode reward: [(0, '162.400')] [2024-06-15 18:33:21,438][1652491] Updated weights for policy 0, policy_version 572016 (0.0105) [2024-06-15 18:33:25,955][1648985] Fps is (10 sec: 39319.8, 60 sec: 44782.7, 300 sec: 45430.8). Total num frames: 1171521536. Throughput: 0: 11753.1. Samples: 292959232. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:33:25,956][1648985] Avg episode reward: [(0, '161.120')] [2024-06-15 18:33:26,451][1652491] Updated weights for policy 0, policy_version 572049 (0.0023) [2024-06-15 18:33:28,587][1652491] Updated weights for policy 0, policy_version 572144 (0.0013) [2024-06-15 18:33:30,336][1652491] Updated weights for policy 0, policy_version 572218 (0.0013) [2024-06-15 18:33:30,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 50244.2, 300 sec: 46208.4). Total num frames: 1171914752. Throughput: 0: 11673.6. Samples: 293027328. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:33:30,956][1648985] Avg episode reward: [(0, '170.470')] [2024-06-15 18:33:32,743][1652491] Updated weights for policy 0, policy_version 572272 (0.0012) [2024-06-15 18:33:35,955][1648985] Fps is (10 sec: 52430.7, 60 sec: 45875.1, 300 sec: 45764.2). Total num frames: 1172045824. Throughput: 0: 11616.8. Samples: 293102080. Policy #0 lag: (min: 13.0, avg: 132.9, max: 269.0) [2024-06-15 18:33:35,955][1648985] Avg episode reward: [(0, '156.340')] [2024-06-15 18:33:37,979][1651469] Signal inference workers to stop experience collection... 
(29800 times) [2024-06-15 18:33:38,042][1652491] InferenceWorker_p0-w0: stopping experience collection (29800 times) [2024-06-15 18:33:38,148][1651469] Signal inference workers to resume experience collection... (29800 times) [2024-06-15 18:33:38,149][1652491] InferenceWorker_p0-w0: resuming experience collection (29800 times) [2024-06-15 18:33:38,151][1652491] Updated weights for policy 0, policy_version 572320 (0.0013) [2024-06-15 18:33:39,726][1652491] Updated weights for policy 0, policy_version 572387 (0.0013) [2024-06-15 18:33:40,913][1652491] Updated weights for policy 0, policy_version 572435 (0.0012) [2024-06-15 18:33:40,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48605.7, 300 sec: 46097.4). Total num frames: 1172340736. Throughput: 0: 11609.7. Samples: 293136896. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:33:40,956][1648985] Avg episode reward: [(0, '142.660')] [2024-06-15 18:33:43,218][1652491] Updated weights for policy 0, policy_version 572484 (0.0011) [2024-06-15 18:33:44,404][1652491] Updated weights for policy 0, policy_version 572541 (0.0013) [2024-06-15 18:33:45,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.5, 300 sec: 46097.3). Total num frames: 1172570112. Throughput: 0: 11411.9. Samples: 293198336. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:33:45,956][1648985] Avg episode reward: [(0, '143.640')] [2024-06-15 18:33:50,613][1652491] Updated weights for policy 0, policy_version 572611 (0.0013) [2024-06-15 18:33:50,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 46421.5, 300 sec: 45875.2). Total num frames: 1172733952. Throughput: 0: 11411.9. Samples: 293271040. 
Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:33:50,955][1648985] Avg episode reward: [(0, '153.140')] [2024-06-15 18:33:52,236][1652491] Updated weights for policy 0, policy_version 572677 (0.0015) [2024-06-15 18:33:53,454][1652491] Updated weights for policy 0, policy_version 572730 (0.0014) [2024-06-15 18:33:55,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46434.5, 300 sec: 46097.3). Total num frames: 1173028864. Throughput: 0: 11320.9. Samples: 293300224. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:33:55,956][1648985] Avg episode reward: [(0, '161.050')] [2024-06-15 18:33:56,199][1652491] Updated weights for policy 0, policy_version 572788 (0.0115) [2024-06-15 18:34:00,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 44783.0, 300 sec: 45542.0). Total num frames: 1173159936. Throughput: 0: 11275.4. Samples: 293376512. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:34:00,956][1648985] Avg episode reward: [(0, '149.860')] [2024-06-15 18:34:01,399][1652491] Updated weights for policy 0, policy_version 572851 (0.0089) [2024-06-15 18:34:02,932][1652491] Updated weights for policy 0, policy_version 572915 (0.0013) [2024-06-15 18:34:04,657][1652491] Updated weights for policy 0, policy_version 572984 (0.0014) [2024-06-15 18:34:05,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 1173487616. Throughput: 0: 11411.9. Samples: 293437440. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:34:05,956][1648985] Avg episode reward: [(0, '143.610')] [2024-06-15 18:34:07,631][1652491] Updated weights for policy 0, policy_version 573025 (0.0024) [2024-06-15 18:34:10,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 43690.6, 300 sec: 45653.0). Total num frames: 1173618688. Throughput: 0: 11457.5. Samples: 293474816. 
Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:34:10,956][1648985] Avg episode reward: [(0, '143.550')] [2024-06-15 18:34:12,240][1652491] Updated weights for policy 0, policy_version 573072 (0.0139) [2024-06-15 18:34:14,022][1652491] Updated weights for policy 0, policy_version 573143 (0.0012) [2024-06-15 18:34:15,540][1651469] Signal inference workers to stop experience collection... (29850 times) [2024-06-15 18:34:15,575][1652491] Updated weights for policy 0, policy_version 573201 (0.0014) [2024-06-15 18:34:15,612][1652491] InferenceWorker_p0-w0: stopping experience collection (29850 times) [2024-06-15 18:34:15,798][1651469] Signal inference workers to resume experience collection... (29850 times) [2024-06-15 18:34:15,800][1652491] InferenceWorker_p0-w0: resuming experience collection (29850 times) [2024-06-15 18:34:15,956][1648985] Fps is (10 sec: 45869.0, 60 sec: 46966.3, 300 sec: 45986.1). Total num frames: 1173946368. Throughput: 0: 11400.2. Samples: 293540352. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:34:15,957][1648985] Avg episode reward: [(0, '155.750')] [2024-06-15 18:34:16,613][1652491] Updated weights for policy 0, policy_version 573246 (0.0013) [2024-06-15 18:34:19,398][1652491] Updated weights for policy 0, policy_version 573296 (0.0014) [2024-06-15 18:34:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45329.0, 300 sec: 46097.3). Total num frames: 1174142976. Throughput: 0: 11252.6. Samples: 293608448. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:34:20,956][1648985] Avg episode reward: [(0, '179.970')] [2024-06-15 18:34:23,760][1652491] Updated weights for policy 0, policy_version 573344 (0.0012) [2024-06-15 18:34:24,556][1652491] Updated weights for policy 0, policy_version 573374 (0.0010) [2024-06-15 18:34:25,875][1652491] Updated weights for policy 0, policy_version 573424 (0.0012) [2024-06-15 18:34:25,955][1648985] Fps is (10 sec: 42603.5, 60 sec: 47513.7, 300 sec: 45764.1). 
Total num frames: 1174372352. Throughput: 0: 11366.4. Samples: 293648384. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:34:25,956][1648985] Avg episode reward: [(0, '190.660')] [2024-06-15 18:34:27,718][1652491] Updated weights for policy 0, policy_version 573494 (0.0071) [2024-06-15 18:34:29,852][1652491] Updated weights for policy 0, policy_version 573522 (0.0012) [2024-06-15 18:34:30,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45329.1, 300 sec: 46098.6). Total num frames: 1174634496. Throughput: 0: 11457.4. Samples: 293713920. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:34:30,955][1648985] Avg episode reward: [(0, '166.620')] [2024-06-15 18:34:31,103][1652491] Updated weights for policy 0, policy_version 573567 (0.0044) [2024-06-15 18:34:35,413][1652491] Updated weights for policy 0, policy_version 573625 (0.0121) [2024-06-15 18:34:35,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 1174798336. Throughput: 0: 11502.9. Samples: 293788672. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:34:35,956][1648985] Avg episode reward: [(0, '148.010')] [2024-06-15 18:34:36,914][1652491] Updated weights for policy 0, policy_version 573653 (0.0015) [2024-06-15 18:34:38,774][1652491] Updated weights for policy 0, policy_version 573728 (0.0029) [2024-06-15 18:34:40,817][1652491] Updated weights for policy 0, policy_version 573779 (0.0013) [2024-06-15 18:34:40,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.3, 300 sec: 46097.4). Total num frames: 1175093248. Throughput: 0: 11423.3. Samples: 293814272. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:34:40,956][1648985] Avg episode reward: [(0, '147.720')] [2024-06-15 18:34:41,857][1652491] Updated weights for policy 0, policy_version 573824 (0.0011) [2024-06-15 18:34:45,958][1648985] Fps is (10 sec: 39308.5, 60 sec: 43688.2, 300 sec: 45319.3). Total num frames: 1175191552. Throughput: 0: 11342.8. 
Samples: 293886976. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:34:45,959][1648985] Avg episode reward: [(0, '152.700')] [2024-06-15 18:34:47,249][1652491] Updated weights for policy 0, policy_version 573886 (0.0023) [2024-06-15 18:34:48,838][1652491] Updated weights for policy 0, policy_version 573936 (0.0017) [2024-06-15 18:34:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46967.4, 300 sec: 45986.3). Total num frames: 1175552000. Throughput: 0: 11309.5. Samples: 293946368. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:34:50,956][1648985] Avg episode reward: [(0, '144.520')] [2024-06-15 18:34:50,967][1652491] Updated weights for policy 0, policy_version 574006 (0.0130) [2024-06-15 18:34:52,783][1652491] Updated weights for policy 0, policy_version 574048 (0.0012) [2024-06-15 18:34:53,565][1652491] Updated weights for policy 0, policy_version 574080 (0.0014) [2024-06-15 18:34:55,955][1648985] Fps is (10 sec: 52445.5, 60 sec: 44782.8, 300 sec: 45764.1). Total num frames: 1175715840. Throughput: 0: 11332.2. Samples: 293984768. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:34:55,956][1648985] Avg episode reward: [(0, '150.910')] [2024-06-15 18:34:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000574080_1175715840.pth... [2024-06-15 18:34:56,038][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000568704_1164705792.pth [2024-06-15 18:34:57,990][1652491] Updated weights for policy 0, policy_version 574144 (0.0014) [2024-06-15 18:34:59,604][1652491] Updated weights for policy 0, policy_version 574206 (0.0021) [2024-06-15 18:35:00,942][1651469] Signal inference workers to stop experience collection... (29900 times) [2024-06-15 18:35:00,958][1648985] Fps is (10 sec: 42585.2, 60 sec: 46965.0, 300 sec: 45763.7). Total num frames: 1175977984. Throughput: 0: 11604.9. Samples: 294062592. 
Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:35:00,959][1648985] Avg episode reward: [(0, '144.130')] [2024-06-15 18:35:00,980][1652491] InferenceWorker_p0-w0: stopping experience collection (29900 times) [2024-06-15 18:35:01,096][1651469] Signal inference workers to resume experience collection... (29900 times) [2024-06-15 18:35:01,097][1652491] InferenceWorker_p0-w0: resuming experience collection (29900 times) [2024-06-15 18:35:02,087][1652491] Updated weights for policy 0, policy_version 574262 (0.0014) [2024-06-15 18:35:03,433][1652491] Updated weights for policy 0, policy_version 574305 (0.0013) [2024-06-15 18:35:05,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 1176240128. Throughput: 0: 11730.5. Samples: 294136320. Policy #0 lag: (min: 1.0, avg: 71.4, max: 257.0) [2024-06-15 18:35:05,956][1648985] Avg episode reward: [(0, '158.130')] [2024-06-15 18:35:07,898][1652491] Updated weights for policy 0, policy_version 574353 (0.0012) [2024-06-15 18:35:09,304][1652491] Updated weights for policy 0, policy_version 574401 (0.0014) [2024-06-15 18:35:10,955][1648985] Fps is (10 sec: 52444.5, 60 sec: 48059.7, 300 sec: 45764.1). Total num frames: 1176502272. Throughput: 0: 11662.2. Samples: 294173184. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:35:10,956][1648985] Avg episode reward: [(0, '132.840')] [2024-06-15 18:35:12,450][1652491] Updated weights for policy 0, policy_version 574484 (0.0014) [2024-06-15 18:35:13,933][1652491] Updated weights for policy 0, policy_version 574544 (0.0013) [2024-06-15 18:35:14,961][1652491] Updated weights for policy 0, policy_version 574585 (0.0012) [2024-06-15 18:35:15,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 46968.5, 300 sec: 46208.5). Total num frames: 1176764416. Throughput: 0: 11593.9. Samples: 294235648. 
Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:35:15,956][1648985] Avg episode reward: [(0, '140.600')] [2024-06-15 18:35:19,409][1652491] Updated weights for policy 0, policy_version 574626 (0.0014) [2024-06-15 18:35:20,943][1652491] Updated weights for policy 0, policy_version 574672 (0.0013) [2024-06-15 18:35:20,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 1176928256. Throughput: 0: 11730.5. Samples: 294316544. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:35:20,956][1648985] Avg episode reward: [(0, '140.860')] [2024-06-15 18:35:22,994][1652491] Updated weights for policy 0, policy_version 574736 (0.0013) [2024-06-15 18:35:24,404][1652491] Updated weights for policy 0, policy_version 574791 (0.0014) [2024-06-15 18:35:25,688][1652491] Updated weights for policy 0, policy_version 574848 (0.0015) [2024-06-15 18:35:25,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48605.8, 300 sec: 46208.4). Total num frames: 1177288704. Throughput: 0: 11867.0. Samples: 294348288. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:35:25,956][1648985] Avg episode reward: [(0, '167.720')] [2024-06-15 18:35:30,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 1177387008. Throughput: 0: 11936.2. Samples: 294424064. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:35:30,955][1648985] Avg episode reward: [(0, '165.450')] [2024-06-15 18:35:31,105][1652491] Updated weights for policy 0, policy_version 574911 (0.0103) [2024-06-15 18:35:32,811][1652491] Updated weights for policy 0, policy_version 574960 (0.0018) [2024-06-15 18:35:35,260][1652491] Updated weights for policy 0, policy_version 575041 (0.0014) [2024-06-15 18:35:35,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 49152.0, 300 sec: 46430.6). Total num frames: 1177747456. Throughput: 0: 11969.4. Samples: 294484992. 
Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:35:35,956][1648985] Avg episode reward: [(0, '157.030')] [2024-06-15 18:35:36,580][1652491] Updated weights for policy 0, policy_version 575100 (0.0011) [2024-06-15 18:35:40,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 45328.9, 300 sec: 45986.3). Total num frames: 1177812992. Throughput: 0: 12037.7. Samples: 294526464. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:35:40,956][1648985] Avg episode reward: [(0, '162.800')] [2024-06-15 18:35:41,959][1652491] Updated weights for policy 0, policy_version 575138 (0.0014) [2024-06-15 18:35:43,684][1652491] Updated weights for policy 0, policy_version 575184 (0.0013) [2024-06-15 18:35:44,700][1652491] Updated weights for policy 0, policy_version 575232 (0.0014) [2024-06-15 18:35:44,943][1651469] Signal inference workers to stop experience collection... (29950 times) [2024-06-15 18:35:44,978][1652491] InferenceWorker_p0-w0: stopping experience collection (29950 times) [2024-06-15 18:35:45,254][1651469] Signal inference workers to resume experience collection... (29950 times) [2024-06-15 18:35:45,274][1652491] InferenceWorker_p0-w0: resuming experience collection (29950 times) [2024-06-15 18:35:45,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 49154.7, 300 sec: 45986.3). Total num frames: 1178140672. Throughput: 0: 11902.0. Samples: 294598144. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:35:45,956][1648985] Avg episode reward: [(0, '150.900')] [2024-06-15 18:35:46,318][1652491] Updated weights for policy 0, policy_version 575285 (0.0037) [2024-06-15 18:35:47,876][1652491] Updated weights for policy 0, policy_version 575352 (0.0014) [2024-06-15 18:35:50,955][1648985] Fps is (10 sec: 52430.6, 60 sec: 46421.4, 300 sec: 46208.5). Total num frames: 1178337280. Throughput: 0: 11889.8. Samples: 294671360. 
Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:35:50,955][1648985] Avg episode reward: [(0, '138.410')] [2024-06-15 18:35:52,550][1652491] Updated weights for policy 0, policy_version 575396 (0.0014) [2024-06-15 18:35:54,957][1652491] Updated weights for policy 0, policy_version 575456 (0.0012) [2024-06-15 18:35:55,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 1178599424. Throughput: 0: 11901.2. Samples: 294708736. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:35:55,956][1648985] Avg episode reward: [(0, '138.980')] [2024-06-15 18:35:56,762][1652491] Updated weights for policy 0, policy_version 575546 (0.0017) [2024-06-15 18:35:58,056][1652491] Updated weights for policy 0, policy_version 575585 (0.0034) [2024-06-15 18:35:58,699][1652491] Updated weights for policy 0, policy_version 575616 (0.0011) [2024-06-15 18:36:00,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 48062.2, 300 sec: 46208.4). Total num frames: 1178861568. Throughput: 0: 12060.4. Samples: 294778368. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:36:00,956][1648985] Avg episode reward: [(0, '147.290')] [2024-06-15 18:36:03,823][1652491] Updated weights for policy 0, policy_version 575676 (0.0014) [2024-06-15 18:36:05,871][1652491] Updated weights for policy 0, policy_version 575743 (0.0013) [2024-06-15 18:36:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 1179123712. Throughput: 0: 11889.8. Samples: 294851584. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:36:05,956][1648985] Avg episode reward: [(0, '143.330')] [2024-06-15 18:36:07,714][1652491] Updated weights for policy 0, policy_version 575805 (0.0015) [2024-06-15 18:36:10,061][1652491] Updated weights for policy 0, policy_version 575861 (0.0013) [2024-06-15 18:36:10,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1179385856. 
Throughput: 0: 11889.8. Samples: 294883328. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:36:10,956][1648985] Avg episode reward: [(0, '129.460')] [2024-06-15 18:36:14,894][1652491] Updated weights for policy 0, policy_version 575904 (0.0011) [2024-06-15 18:36:15,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1179516928. Throughput: 0: 11867.0. Samples: 294958080. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:36:15,956][1648985] Avg episode reward: [(0, '124.400')] [2024-06-15 18:36:16,099][1652491] Updated weights for policy 0, policy_version 575952 (0.0013) [2024-06-15 18:36:17,435][1652491] Updated weights for policy 0, policy_version 576002 (0.0029) [2024-06-15 18:36:18,758][1652491] Updated weights for policy 0, policy_version 576057 (0.0012) [2024-06-15 18:36:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48605.9, 300 sec: 46430.6). Total num frames: 1179844608. Throughput: 0: 11832.9. Samples: 295017472. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:36:20,956][1648985] Avg episode reward: [(0, '134.500')] [2024-06-15 18:36:21,425][1652491] Updated weights for policy 0, policy_version 576119 (0.0014) [2024-06-15 18:36:25,423][1652491] Updated weights for policy 0, policy_version 576147 (0.0013) [2024-06-15 18:36:25,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 44783.1, 300 sec: 46430.6). Total num frames: 1179975680. Throughput: 0: 11946.7. Samples: 295064064. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:36:25,956][1648985] Avg episode reward: [(0, '141.550')] [2024-06-15 18:36:26,395][1652491] Updated weights for policy 0, policy_version 576192 (0.0034) [2024-06-15 18:36:28,964][1652491] Updated weights for policy 0, policy_version 576261 (0.0104) [2024-06-15 18:36:29,298][1651469] Signal inference workers to stop experience collection... 
(30000 times) [2024-06-15 18:36:29,358][1652491] InferenceWorker_p0-w0: stopping experience collection (30000 times) [2024-06-15 18:36:29,613][1651469] Signal inference workers to resume experience collection... (30000 times) [2024-06-15 18:36:29,615][1652491] InferenceWorker_p0-w0: resuming experience collection (30000 times) [2024-06-15 18:36:30,341][1652491] Updated weights for policy 0, policy_version 576320 (0.0015) [2024-06-15 18:36:30,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48605.8, 300 sec: 46430.6). Total num frames: 1180303360. Throughput: 0: 11753.2. Samples: 295127040. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:36:30,956][1648985] Avg episode reward: [(0, '133.920')] [2024-06-15 18:36:32,327][1652491] Updated weights for policy 0, policy_version 576379 (0.0013) [2024-06-15 18:36:35,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45329.1, 300 sec: 46319.5). Total num frames: 1180467200. Throughput: 0: 11946.6. Samples: 295208960. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:36:35,955][1648985] Avg episode reward: [(0, '143.440')] [2024-06-15 18:36:36,323][1652491] Updated weights for policy 0, policy_version 576425 (0.0014) [2024-06-15 18:36:38,644][1652491] Updated weights for policy 0, policy_version 576482 (0.0015) [2024-06-15 18:36:39,956][1652491] Updated weights for policy 0, policy_version 576528 (0.0042) [2024-06-15 18:36:40,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 49698.2, 300 sec: 46763.8). Total num frames: 1180794880. Throughput: 0: 11867.0. Samples: 295242752. Policy #0 lag: (min: 2.0, avg: 116.2, max: 258.0) [2024-06-15 18:36:40,956][1648985] Avg episode reward: [(0, '150.880')] [2024-06-15 18:36:41,053][1652491] Updated weights for policy 0, policy_version 576573 (0.0012) [2024-06-15 18:36:42,769][1652491] Updated weights for policy 0, policy_version 576625 (0.0017) [2024-06-15 18:36:45,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 46967.5, 300 sec: 46652.7). 
Total num frames: 1180958720. Throughput: 0: 11878.4. Samples: 295312896. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:36:45,955][1648985] Avg episode reward: [(0, '141.530')] [2024-06-15 18:36:46,842][1652491] Updated weights for policy 0, policy_version 576672 (0.0054) [2024-06-15 18:36:50,109][1652491] Updated weights for policy 0, policy_version 576752 (0.0124) [2024-06-15 18:36:50,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 48059.6, 300 sec: 47097.1). Total num frames: 1181220864. Throughput: 0: 11946.7. Samples: 295389184. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:36:50,956][1648985] Avg episode reward: [(0, '152.620')] [2024-06-15 18:36:51,394][1652491] Updated weights for policy 0, policy_version 576787 (0.0037) [2024-06-15 18:36:53,617][1652491] Updated weights for policy 0, policy_version 576880 (0.0115) [2024-06-15 18:36:55,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1181483008. Throughput: 0: 11810.1. Samples: 295414784. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:36:55,956][1648985] Avg episode reward: [(0, '148.810')] [2024-06-15 18:36:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000576896_1181483008.pth... [2024-06-15 18:36:56,070][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000571440_1170309120.pth [2024-06-15 18:36:58,010][1652491] Updated weights for policy 0, policy_version 576912 (0.0013) [2024-06-15 18:37:00,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46421.3, 300 sec: 46765.4). Total num frames: 1181646848. Throughput: 0: 11901.2. Samples: 295493632. 
Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:37:00,956][1648985] Avg episode reward: [(0, '141.560')] [2024-06-15 18:37:01,113][1652491] Updated weights for policy 0, policy_version 576977 (0.0093) [2024-06-15 18:37:02,739][1652491] Updated weights for policy 0, policy_version 577040 (0.0013) [2024-06-15 18:37:04,122][1652491] Updated weights for policy 0, policy_version 577093 (0.0012) [2024-06-15 18:37:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 1182007296. Throughput: 0: 11969.4. Samples: 295556096. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:37:05,956][1648985] Avg episode reward: [(0, '134.790')] [2024-06-15 18:37:08,820][1652491] Updated weights for policy 0, policy_version 577155 (0.0082) [2024-06-15 18:37:10,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1182138368. Throughput: 0: 11867.0. Samples: 295598080. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:37:10,956][1648985] Avg episode reward: [(0, '148.240')] [2024-06-15 18:37:12,449][1652491] Updated weights for policy 0, policy_version 577232 (0.0041) [2024-06-15 18:37:13,596][1652491] Updated weights for policy 0, policy_version 577280 (0.0014) [2024-06-15 18:37:13,703][1651469] Signal inference workers to stop experience collection... (30050 times) [2024-06-15 18:37:13,769][1652491] InferenceWorker_p0-w0: stopping experience collection (30050 times) [2024-06-15 18:37:13,980][1651469] Signal inference workers to resume experience collection... (30050 times) [2024-06-15 18:37:13,982][1652491] InferenceWorker_p0-w0: resuming experience collection (30050 times) [2024-06-15 18:37:15,131][1652491] Updated weights for policy 0, policy_version 577344 (0.0011) [2024-06-15 18:37:15,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 48606.0, 300 sec: 47319.2). Total num frames: 1182433280. Throughput: 0: 11935.3. Samples: 295664128. 
Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:37:15,955][1648985] Avg episode reward: [(0, '142.990')] [2024-06-15 18:37:16,557][1652491] Updated weights for policy 0, policy_version 577397 (0.0012) [2024-06-15 18:37:20,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1182597120. Throughput: 0: 11753.2. Samples: 295737856. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:37:20,956][1648985] Avg episode reward: [(0, '145.710')] [2024-06-15 18:37:21,224][1652491] Updated weights for policy 0, policy_version 577465 (0.0039) [2024-06-15 18:37:24,160][1652491] Updated weights for policy 0, policy_version 577520 (0.0011) [2024-06-15 18:37:25,955][1652491] Updated weights for policy 0, policy_version 577584 (0.0012) [2024-06-15 18:37:25,958][1648985] Fps is (10 sec: 42584.3, 60 sec: 48057.1, 300 sec: 47318.7). Total num frames: 1182859264. Throughput: 0: 11786.5. Samples: 295773184. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:37:25,959][1648985] Avg episode reward: [(0, '135.010')] [2024-06-15 18:37:27,658][1652491] Updated weights for policy 0, policy_version 577650 (0.0012) [2024-06-15 18:37:30,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1183055872. Throughput: 0: 11605.3. Samples: 295835136. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:37:30,956][1648985] Avg episode reward: [(0, '119.210')] [2024-06-15 18:37:32,486][1652491] Updated weights for policy 0, policy_version 577696 (0.0018) [2024-06-15 18:37:35,329][1652491] Updated weights for policy 0, policy_version 577744 (0.0011) [2024-06-15 18:37:35,955][1648985] Fps is (10 sec: 39334.3, 60 sec: 46421.2, 300 sec: 46874.9). Total num frames: 1183252480. Throughput: 0: 11639.4. Samples: 295912960. 
Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:37:35,956][1648985] Avg episode reward: [(0, '115.620')] [2024-06-15 18:37:36,575][1652491] Updated weights for policy 0, policy_version 577796 (0.0012) [2024-06-15 18:37:38,297][1652491] Updated weights for policy 0, policy_version 577861 (0.0079) [2024-06-15 18:37:40,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1183580160. Throughput: 0: 11628.1. Samples: 295938048. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:37:40,956][1648985] Avg episode reward: [(0, '125.860')] [2024-06-15 18:37:43,460][1652491] Updated weights for policy 0, policy_version 577923 (0.0020) [2024-06-15 18:37:45,970][1648985] Fps is (10 sec: 45805.1, 60 sec: 45863.4, 300 sec: 46650.3). Total num frames: 1183711232. Throughput: 0: 11464.9. Samples: 296009728. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:37:45,971][1648985] Avg episode reward: [(0, '133.000')] [2024-06-15 18:37:46,681][1652491] Updated weights for policy 0, policy_version 577985 (0.0121) [2024-06-15 18:37:47,759][1652491] Updated weights for policy 0, policy_version 578035 (0.0012) [2024-06-15 18:37:49,744][1652491] Updated weights for policy 0, policy_version 578112 (0.0135) [2024-06-15 18:37:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.4, 300 sec: 46766.5). Total num frames: 1184038912. Throughput: 0: 11400.5. Samples: 296069120. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:37:50,956][1648985] Avg episode reward: [(0, '128.490')] [2024-06-15 18:37:51,583][1652491] Updated weights for policy 0, policy_version 578170 (0.0017) [2024-06-15 18:37:55,809][1651469] Signal inference workers to stop experience collection... 
(30100 times) [2024-06-15 18:37:55,878][1652491] InferenceWorker_p0-w0: stopping experience collection (30100 times) [2024-06-15 18:37:55,888][1652491] Updated weights for policy 0, policy_version 578215 (0.0017) [2024-06-15 18:37:55,955][1648985] Fps is (10 sec: 45946.0, 60 sec: 44783.1, 300 sec: 46430.6). Total num frames: 1184169984. Throughput: 0: 11355.1. Samples: 296109056. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:37:55,955][1648985] Avg episode reward: [(0, '133.380')] [2024-06-15 18:37:55,988][1651469] Signal inference workers to resume experience collection... (30100 times) [2024-06-15 18:37:55,989][1652491] InferenceWorker_p0-w0: resuming experience collection (30100 times) [2024-06-15 18:37:58,705][1652491] Updated weights for policy 0, policy_version 578259 (0.0012) [2024-06-15 18:38:00,347][1652491] Updated weights for policy 0, policy_version 578336 (0.0121) [2024-06-15 18:38:00,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 1184464896. Throughput: 0: 11446.0. Samples: 296179200. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:38:00,956][1648985] Avg episode reward: [(0, '154.570')] [2024-06-15 18:38:02,091][1652491] Updated weights for policy 0, policy_version 578400 (0.0062) [2024-06-15 18:38:05,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1184628736. Throughput: 0: 11252.6. Samples: 296244224. Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:38:05,956][1648985] Avg episode reward: [(0, '145.040')] [2024-06-15 18:38:07,205][1652491] Updated weights for policy 0, policy_version 578452 (0.0013) [2024-06-15 18:38:09,723][1652491] Updated weights for policy 0, policy_version 578498 (0.0012) [2024-06-15 18:38:10,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45329.2, 300 sec: 46541.6). Total num frames: 1184858112. Throughput: 0: 11367.2. Samples: 296284672. 
Policy #0 lag: (min: 15.0, avg: 163.4, max: 271.0) [2024-06-15 18:38:10,956][1648985] Avg episode reward: [(0, '152.330')] [2024-06-15 18:38:11,657][1652491] Updated weights for policy 0, policy_version 578576 (0.0012) [2024-06-15 18:38:13,475][1652491] Updated weights for policy 0, policy_version 578641 (0.0016) [2024-06-15 18:38:15,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45329.0, 300 sec: 46541.7). Total num frames: 1185153024. Throughput: 0: 11116.1. Samples: 296335360. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:38:15,956][1648985] Avg episode reward: [(0, '158.960')] [2024-06-15 18:38:18,596][1652491] Updated weights for policy 0, policy_version 578689 (0.0012) [2024-06-15 18:38:20,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 44782.9, 300 sec: 46652.8). Total num frames: 1185284096. Throughput: 0: 11059.2. Samples: 296410624. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:38:20,956][1648985] Avg episode reward: [(0, '173.250')] [2024-06-15 18:38:22,118][1652491] Updated weights for policy 0, policy_version 578768 (0.0013) [2024-06-15 18:38:24,084][1652491] Updated weights for policy 0, policy_version 578848 (0.0017) [2024-06-15 18:38:25,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45877.7, 300 sec: 46430.6). Total num frames: 1185611776. Throughput: 0: 11150.2. Samples: 296439808. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:38:25,956][1648985] Avg episode reward: [(0, '181.880')] [2024-06-15 18:38:25,995][1652491] Updated weights for policy 0, policy_version 578913 (0.0012) [2024-06-15 18:38:30,599][1652491] Updated weights for policy 0, policy_version 578976 (0.0119) [2024-06-15 18:38:30,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1185775616. Throughput: 0: 11165.4. Samples: 296512000. 
Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:38:30,956][1648985] Avg episode reward: [(0, '165.130')] [2024-06-15 18:38:34,985][1652491] Updated weights for policy 0, policy_version 579042 (0.0012) [2024-06-15 18:38:35,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 1185972224. Throughput: 0: 11320.9. Samples: 296578560. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:38:35,956][1648985] Avg episode reward: [(0, '147.720')] [2024-06-15 18:38:36,409][1652491] Updated weights for policy 0, policy_version 579120 (0.0011) [2024-06-15 18:38:36,508][1651469] Signal inference workers to stop experience collection... (30150 times) [2024-06-15 18:38:36,542][1652491] InferenceWorker_p0-w0: stopping experience collection (30150 times) [2024-06-15 18:38:36,727][1651469] Signal inference workers to resume experience collection... (30150 times) [2024-06-15 18:38:36,739][1652491] InferenceWorker_p0-w0: resuming experience collection (30150 times) [2024-06-15 18:38:37,703][1652491] Updated weights for policy 0, policy_version 579184 (0.0015) [2024-06-15 18:38:40,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 1186201600. Throughput: 0: 11184.3. Samples: 296612352. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:38:40,956][1648985] Avg episode reward: [(0, '141.390')] [2024-06-15 18:38:41,431][1652491] Updated weights for policy 0, policy_version 579234 (0.0012) [2024-06-15 18:38:45,516][1652491] Updated weights for policy 0, policy_version 579298 (0.0037) [2024-06-15 18:38:45,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 45340.7, 300 sec: 46430.6). Total num frames: 1186430976. Throughput: 0: 11446.1. Samples: 296694272. 
Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:38:45,955][1648985] Avg episode reward: [(0, '135.110')] [2024-06-15 18:38:46,974][1652491] Updated weights for policy 0, policy_version 579360 (0.0013) [2024-06-15 18:38:48,481][1652491] Updated weights for policy 0, policy_version 579426 (0.0162) [2024-06-15 18:38:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 44782.8, 300 sec: 46430.6). Total num frames: 1186725888. Throughput: 0: 11423.2. Samples: 296758272. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:38:50,956][1648985] Avg episode reward: [(0, '146.830')] [2024-06-15 18:38:53,158][1652491] Updated weights for policy 0, policy_version 579507 (0.0017) [2024-06-15 18:38:55,955][1648985] Fps is (10 sec: 42597.0, 60 sec: 44782.7, 300 sec: 46430.5). Total num frames: 1186856960. Throughput: 0: 11366.3. Samples: 296796160. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:38:55,956][1648985] Avg episode reward: [(0, '169.780')] [2024-06-15 18:38:56,686][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000579552_1186922496.pth... [2024-06-15 18:38:56,830][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000574080_1175715840.pth [2024-06-15 18:38:58,140][1652491] Updated weights for policy 0, policy_version 579600 (0.0013) [2024-06-15 18:38:59,814][1652491] Updated weights for policy 0, policy_version 579684 (0.0013) [2024-06-15 18:39:00,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 46421.4, 300 sec: 46652.7). Total num frames: 1187250176. Throughput: 0: 11446.1. Samples: 296850432. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:39:00,956][1648985] Avg episode reward: [(0, '162.720')] [2024-06-15 18:39:04,259][1652491] Updated weights for policy 0, policy_version 579744 (0.0014) [2024-06-15 18:39:05,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 1187381248. Throughput: 0: 11628.1. 
Samples: 296933888. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:39:05,956][1648985] Avg episode reward: [(0, '152.600')] [2024-06-15 18:39:07,139][1652491] Updated weights for policy 0, policy_version 579785 (0.0012) [2024-06-15 18:39:08,648][1652491] Updated weights for policy 0, policy_version 579841 (0.0013) [2024-06-15 18:39:10,160][1652491] Updated weights for policy 0, policy_version 579904 (0.0045) [2024-06-15 18:39:10,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 46653.0). Total num frames: 1187708928. Throughput: 0: 11696.4. Samples: 296966144. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:39:10,955][1648985] Avg episode reward: [(0, '147.300')] [2024-06-15 18:39:11,354][1652491] Updated weights for policy 0, policy_version 579962 (0.0014) [2024-06-15 18:39:15,955][1648985] Fps is (10 sec: 49153.6, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 1187872768. Throughput: 0: 11776.0. Samples: 297041920. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:39:15,955][1648985] Avg episode reward: [(0, '142.490')] [2024-06-15 18:39:16,026][1652491] Updated weights for policy 0, policy_version 580017 (0.0013) [2024-06-15 18:39:18,137][1652491] Updated weights for policy 0, policy_version 580048 (0.0013) [2024-06-15 18:39:18,225][1651469] Signal inference workers to stop experience collection... (30200 times) [2024-06-15 18:39:18,285][1652491] InferenceWorker_p0-w0: stopping experience collection (30200 times) [2024-06-15 18:39:18,591][1651469] Signal inference workers to resume experience collection... (30200 times) [2024-06-15 18:39:18,597][1652491] InferenceWorker_p0-w0: resuming experience collection (30200 times) [2024-06-15 18:39:19,534][1652491] Updated weights for policy 0, policy_version 580096 (0.0012) [2024-06-15 18:39:20,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 48059.6, 300 sec: 46763.8). Total num frames: 1188167680. Throughput: 0: 11673.6. Samples: 297103872. 
Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:39:20,956][1648985] Avg episode reward: [(0, '137.750')] [2024-06-15 18:39:21,341][1652491] Updated weights for policy 0, policy_version 580176 (0.0013) [2024-06-15 18:39:25,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 1188298752. Throughput: 0: 11673.6. Samples: 297137664. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:39:25,956][1648985] Avg episode reward: [(0, '128.750')] [2024-06-15 18:39:26,338][1652491] Updated weights for policy 0, policy_version 580225 (0.0015) [2024-06-15 18:39:27,791][1652491] Updated weights for policy 0, policy_version 580277 (0.0013) [2024-06-15 18:39:30,014][1652491] Updated weights for policy 0, policy_version 580320 (0.0038) [2024-06-15 18:39:30,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 46421.3, 300 sec: 46652.7). Total num frames: 1188560896. Throughput: 0: 11480.2. Samples: 297210880. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:39:30,956][1648985] Avg episode reward: [(0, '143.290')] [2024-06-15 18:39:31,731][1652491] Updated weights for policy 0, policy_version 580371 (0.0018) [2024-06-15 18:39:33,327][1652491] Updated weights for policy 0, policy_version 580449 (0.0013) [2024-06-15 18:39:35,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 47513.5, 300 sec: 46541.6). Total num frames: 1188823040. Throughput: 0: 11650.8. Samples: 297282560. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:39:35,956][1648985] Avg episode reward: [(0, '144.140')] [2024-06-15 18:39:37,229][1652491] Updated weights for policy 0, policy_version 580481 (0.0012) [2024-06-15 18:39:38,615][1652491] Updated weights for policy 0, policy_version 580536 (0.0012) [2024-06-15 18:39:40,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.6, 300 sec: 46875.4). Total num frames: 1189019648. Throughput: 0: 11582.6. Samples: 297317376. 
Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:39:40,956][1648985] Avg episode reward: [(0, '137.280')] [2024-06-15 18:39:41,346][1652491] Updated weights for policy 0, policy_version 580602 (0.0015) [2024-06-15 18:39:43,751][1652491] Updated weights for policy 0, policy_version 580672 (0.0013) [2024-06-15 18:39:44,871][1652491] Updated weights for policy 0, policy_version 580720 (0.0013) [2024-06-15 18:39:45,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 48605.8, 300 sec: 46763.8). Total num frames: 1189347328. Throughput: 0: 11810.1. Samples: 297381888. Policy #0 lag: (min: 127.0, avg: 240.8, max: 431.0) [2024-06-15 18:39:45,956][1648985] Avg episode reward: [(0, '117.710')] [2024-06-15 18:39:49,104][1652491] Updated weights for policy 0, policy_version 580756 (0.0029) [2024-06-15 18:39:50,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1189478400. Throughput: 0: 11628.1. Samples: 297457152. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:39:50,956][1648985] Avg episode reward: [(0, '137.100')] [2024-06-15 18:39:51,499][1652491] Updated weights for policy 0, policy_version 580832 (0.0015) [2024-06-15 18:39:54,481][1652491] Updated weights for policy 0, policy_version 580896 (0.0016) [2024-06-15 18:39:55,769][1652491] Updated weights for policy 0, policy_version 580949 (0.0024) [2024-06-15 18:39:55,956][1648985] Fps is (10 sec: 42594.4, 60 sec: 48605.3, 300 sec: 46764.2). Total num frames: 1189773312. Throughput: 0: 11707.5. Samples: 297492992. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:39:55,957][1648985] Avg episode reward: [(0, '135.410')] [2024-06-15 18:39:56,740][1652491] Updated weights for policy 0, policy_version 580991 (0.0012) [2024-06-15 18:40:00,163][1651469] Signal inference workers to stop experience collection... 
(30250 times) [2024-06-15 18:40:00,216][1652491] InferenceWorker_p0-w0: stopping experience collection (30250 times) [2024-06-15 18:40:00,491][1651469] Signal inference workers to resume experience collection... (30250 times) [2024-06-15 18:40:00,492][1652491] InferenceWorker_p0-w0: resuming experience collection (30250 times) [2024-06-15 18:40:00,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 44782.7, 300 sec: 46430.5). Total num frames: 1189937152. Throughput: 0: 11605.2. Samples: 297564160. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:40:00,957][1648985] Avg episode reward: [(0, '141.220')] [2024-06-15 18:40:01,136][1652491] Updated weights for policy 0, policy_version 581044 (0.0037) [2024-06-15 18:40:02,403][1652491] Updated weights for policy 0, policy_version 581077 (0.0013) [2024-06-15 18:40:05,676][1652491] Updated weights for policy 0, policy_version 581136 (0.0031) [2024-06-15 18:40:05,955][1648985] Fps is (10 sec: 39325.3, 60 sec: 46421.5, 300 sec: 46319.5). Total num frames: 1190166528. Throughput: 0: 11719.2. Samples: 297631232. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:40:05,956][1648985] Avg episode reward: [(0, '155.020')] [2024-06-15 18:40:07,827][1652491] Updated weights for policy 0, policy_version 581232 (0.0013) [2024-06-15 18:40:10,955][1648985] Fps is (10 sec: 45877.1, 60 sec: 44783.0, 300 sec: 46208.5). Total num frames: 1190395904. Throughput: 0: 11571.2. Samples: 297658368. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:40:10,955][1648985] Avg episode reward: [(0, '162.880')] [2024-06-15 18:40:11,626][1652491] Updated weights for policy 0, policy_version 581266 (0.0014) [2024-06-15 18:40:13,664][1652491] Updated weights for policy 0, policy_version 581314 (0.0014) [2024-06-15 18:40:15,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 46421.2, 300 sec: 46541.7). Total num frames: 1190658048. Throughput: 0: 11502.9. Samples: 297728512. 
Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:40:15,956][1648985] Avg episode reward: [(0, '148.470')] [2024-06-15 18:40:16,773][1652491] Updated weights for policy 0, policy_version 581377 (0.0013) [2024-06-15 18:40:18,455][1652491] Updated weights for policy 0, policy_version 581456 (0.0114) [2024-06-15 18:40:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.5, 300 sec: 46208.5). Total num frames: 1190920192. Throughput: 0: 11537.2. Samples: 297801728. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:40:20,955][1648985] Avg episode reward: [(0, '156.190')] [2024-06-15 18:40:22,591][1652491] Updated weights for policy 0, policy_version 581523 (0.0015) [2024-06-15 18:40:23,524][1652491] Updated weights for policy 0, policy_version 581561 (0.0023) [2024-06-15 18:40:25,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 1191116800. Throughput: 0: 11582.6. Samples: 297838592. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:40:25,955][1648985] Avg episode reward: [(0, '152.370')] [2024-06-15 18:40:26,261][1652491] Updated weights for policy 0, policy_version 581632 (0.0018) [2024-06-15 18:40:29,654][1652491] Updated weights for policy 0, policy_version 581712 (0.0107) [2024-06-15 18:40:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 46430.6). Total num frames: 1191444480. Throughput: 0: 11616.7. Samples: 297904640. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:40:30,955][1648985] Avg episode reward: [(0, '154.040')] [2024-06-15 18:40:34,234][1652491] Updated weights for policy 0, policy_version 581777 (0.0016) [2024-06-15 18:40:35,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 1191575552. Throughput: 0: 11514.3. Samples: 297975296. 
Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:40:35,956][1648985] Avg episode reward: [(0, '158.830')] [2024-06-15 18:40:36,626][1652491] Updated weights for policy 0, policy_version 581840 (0.0042) [2024-06-15 18:40:40,227][1652491] Updated weights for policy 0, policy_version 581905 (0.0013) [2024-06-15 18:40:40,955][1648985] Fps is (10 sec: 36043.9, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 1191804928. Throughput: 0: 11457.6. Samples: 298008576. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:40:40,956][1648985] Avg episode reward: [(0, '164.680')] [2024-06-15 18:40:41,985][1652491] Updated weights for policy 0, policy_version 581984 (0.0013) [2024-06-15 18:40:42,131][1651469] Signal inference workers to stop experience collection... (30300 times) [2024-06-15 18:40:42,221][1652491] InferenceWorker_p0-w0: stopping experience collection (30300 times) [2024-06-15 18:40:42,418][1651469] Signal inference workers to resume experience collection... (30300 times) [2024-06-15 18:40:42,438][1652491] InferenceWorker_p0-w0: resuming experience collection (30300 times) [2024-06-15 18:40:42,747][1652491] Updated weights for policy 0, policy_version 582016 (0.0012) [2024-06-15 18:40:45,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1192067072. Throughput: 0: 11434.8. Samples: 298078720. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:40:45,956][1648985] Avg episode reward: [(0, '158.400')] [2024-06-15 18:40:45,987][1652491] Updated weights for policy 0, policy_version 582070 (0.0014) [2024-06-15 18:40:47,893][1652491] Updated weights for policy 0, policy_version 582115 (0.0012) [2024-06-15 18:40:50,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 1192230912. Throughput: 0: 11639.4. Samples: 298155008. 
Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:40:50,956][1648985] Avg episode reward: [(0, '156.810')] [2024-06-15 18:40:52,036][1652491] Updated weights for policy 0, policy_version 582182 (0.0014) [2024-06-15 18:40:53,431][1652491] Updated weights for policy 0, policy_version 582243 (0.0011) [2024-06-15 18:40:55,539][1652491] Updated weights for policy 0, policy_version 582289 (0.0014) [2024-06-15 18:40:55,955][1648985] Fps is (10 sec: 49149.9, 60 sec: 46421.8, 300 sec: 46430.5). Total num frames: 1192558592. Throughput: 0: 11707.6. Samples: 298185216. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:40:55,956][1648985] Avg episode reward: [(0, '158.420')] [2024-06-15 18:40:56,342][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000582336_1192624128.pth... [2024-06-15 18:40:56,399][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000576896_1181483008.pth [2024-06-15 18:40:58,445][1652491] Updated weights for policy 0, policy_version 582357 (0.0013) [2024-06-15 18:41:00,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 46967.7, 300 sec: 46208.4). Total num frames: 1192755200. Throughput: 0: 11855.7. Samples: 298262016. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:41:00,956][1648985] Avg episode reward: [(0, '154.990')] [2024-06-15 18:41:01,954][1652491] Updated weights for policy 0, policy_version 582402 (0.0010) [2024-06-15 18:41:03,611][1652491] Updated weights for policy 0, policy_version 582465 (0.0012) [2024-06-15 18:41:04,928][1652491] Updated weights for policy 0, policy_version 582524 (0.0013) [2024-06-15 18:41:05,955][1648985] Fps is (10 sec: 45876.9, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 1193017344. Throughput: 0: 11741.8. Samples: 298330112. 
Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:41:05,956][1648985] Avg episode reward: [(0, '144.900')] [2024-06-15 18:41:06,747][1652491] Updated weights for policy 0, policy_version 582561 (0.0060) [2024-06-15 18:41:09,214][1652491] Updated weights for policy 0, policy_version 582612 (0.0010) [2024-06-15 18:41:09,965][1652491] Updated weights for policy 0, policy_version 582655 (0.0041) [2024-06-15 18:41:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 1193279488. Throughput: 0: 11741.9. Samples: 298366976. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:41:10,956][1648985] Avg episode reward: [(0, '153.350')] [2024-06-15 18:41:15,048][1652491] Updated weights for policy 0, policy_version 582736 (0.0013) [2024-06-15 18:41:15,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 47513.7, 300 sec: 46319.5). Total num frames: 1193508864. Throughput: 0: 11935.3. Samples: 298441728. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:41:15,955][1648985] Avg episode reward: [(0, '159.040')] [2024-06-15 18:41:16,165][1652491] Updated weights for policy 0, policy_version 582782 (0.0013) [2024-06-15 18:41:17,690][1652491] Updated weights for policy 0, policy_version 582818 (0.0011) [2024-06-15 18:41:20,180][1652491] Updated weights for policy 0, policy_version 582896 (0.0099) [2024-06-15 18:41:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 1193803776. Throughput: 0: 11776.0. Samples: 298505216. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 18:41:20,955][1648985] Avg episode reward: [(0, '167.340')] [2024-06-15 18:41:25,249][1652491] Updated weights for policy 0, policy_version 582960 (0.0014) [2024-06-15 18:41:25,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 46967.3, 300 sec: 46208.4). Total num frames: 1193934848. Throughput: 0: 12037.7. Samples: 298550272. 
Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:41:25,956][1648985] Avg episode reward: [(0, '157.480')] [2024-06-15 18:41:26,147][1652491] Updated weights for policy 0, policy_version 582994 (0.0012) [2024-06-15 18:41:26,486][1651469] Signal inference workers to stop experience collection... (30350 times) [2024-06-15 18:41:26,548][1652491] InferenceWorker_p0-w0: stopping experience collection (30350 times) [2024-06-15 18:41:26,777][1651469] Signal inference workers to resume experience collection... (30350 times) [2024-06-15 18:41:26,788][1652491] InferenceWorker_p0-w0: resuming experience collection (30350 times) [2024-06-15 18:41:27,285][1652491] Updated weights for policy 0, policy_version 583040 (0.0012) [2024-06-15 18:41:28,714][1652491] Updated weights for policy 0, policy_version 583095 (0.0014) [2024-06-15 18:41:30,596][1652491] Updated weights for policy 0, policy_version 583136 (0.0012) [2024-06-15 18:41:30,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 1194295296. Throughput: 0: 12037.7. Samples: 298620416. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:41:30,955][1648985] Avg episode reward: [(0, '151.100')] [2024-06-15 18:41:31,392][1652491] Updated weights for policy 0, policy_version 583168 (0.0014) [2024-06-15 18:41:35,958][1648985] Fps is (10 sec: 45861.0, 60 sec: 46965.0, 300 sec: 46096.9). Total num frames: 1194393600. Throughput: 0: 12059.6. Samples: 298697728. 
Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:41:35,959][1648985] Avg episode reward: [(0, '149.670')] [2024-06-15 18:41:36,557][1652491] Updated weights for policy 0, policy_version 583232 (0.0127) [2024-06-15 18:41:37,334][1652491] Updated weights for policy 0, policy_version 583268 (0.0012) [2024-06-15 18:41:38,182][1652491] Updated weights for policy 0, policy_version 583299 (0.0012) [2024-06-15 18:41:39,671][1652491] Updated weights for policy 0, policy_version 583360 (0.0011) [2024-06-15 18:41:40,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 49698.2, 300 sec: 46874.9). Total num frames: 1194786816. Throughput: 0: 12140.2. Samples: 298731520. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:41:40,955][1648985] Avg episode reward: [(0, '145.950')] [2024-06-15 18:41:41,597][1652491] Updated weights for policy 0, policy_version 583419 (0.0014) [2024-06-15 18:41:45,955][1648985] Fps is (10 sec: 45889.8, 60 sec: 46421.2, 300 sec: 46208.4). Total num frames: 1194852352. Throughput: 0: 12151.5. Samples: 298808832. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:41:45,956][1648985] Avg episode reward: [(0, '150.870')] [2024-06-15 18:41:47,386][1652491] Updated weights for policy 0, policy_version 583488 (0.0016) [2024-06-15 18:41:49,705][1652491] Updated weights for policy 0, policy_version 583555 (0.0012) [2024-06-15 18:41:50,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 49698.3, 300 sec: 46541.7). Total num frames: 1195212800. Throughput: 0: 11878.4. Samples: 298864640. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:41:50,956][1648985] Avg episode reward: [(0, '139.700')] [2024-06-15 18:41:52,004][1652491] Updated weights for policy 0, policy_version 583638 (0.0012) [2024-06-15 18:41:55,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.7, 300 sec: 46541.7). Total num frames: 1195376640. Throughput: 0: 11730.5. Samples: 298894848. 
Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:41:55,956][1648985] Avg episode reward: [(0, '152.870')] [2024-06-15 18:41:58,404][1652491] Updated weights for policy 0, policy_version 583683 (0.0012) [2024-06-15 18:41:59,418][1652491] Updated weights for policy 0, policy_version 583744 (0.0013) [2024-06-15 18:42:00,978][1648985] Fps is (10 sec: 42500.0, 60 sec: 48041.2, 300 sec: 46204.8). Total num frames: 1195638784. Throughput: 0: 11906.4. Samples: 298977792. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:42:00,979][1648985] Avg episode reward: [(0, '153.610')] [2024-06-15 18:42:01,657][1652491] Updated weights for policy 0, policy_version 583824 (0.0015) [2024-06-15 18:42:03,802][1652491] Updated weights for policy 0, policy_version 583906 (0.0013) [2024-06-15 18:42:05,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 1195900928. Throughput: 0: 11935.3. Samples: 299042304. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:42:05,956][1648985] Avg episode reward: [(0, '167.430')] [2024-06-15 18:42:09,118][1651469] Signal inference workers to stop experience collection... (30400 times) [2024-06-15 18:42:09,170][1652491] InferenceWorker_p0-w0: stopping experience collection (30400 times) [2024-06-15 18:42:09,306][1651469] Signal inference workers to resume experience collection... (30400 times) [2024-06-15 18:42:09,307][1652491] InferenceWorker_p0-w0: resuming experience collection (30400 times) [2024-06-15 18:42:09,506][1652491] Updated weights for policy 0, policy_version 583956 (0.0033) [2024-06-15 18:42:10,756][1652491] Updated weights for policy 0, policy_version 584016 (0.0014) [2024-06-15 18:42:10,955][1648985] Fps is (10 sec: 42696.8, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 1196064768. Throughput: 0: 11912.5. Samples: 299086336. 
Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:42:10,956][1648985] Avg episode reward: [(0, '159.270')] [2024-06-15 18:42:12,828][1652491] Updated weights for policy 0, policy_version 584066 (0.0017) [2024-06-15 18:42:15,191][1652491] Updated weights for policy 0, policy_version 584160 (0.0022) [2024-06-15 18:42:15,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 1196392448. Throughput: 0: 11593.9. Samples: 299142144. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:42:15,956][1648985] Avg episode reward: [(0, '161.180')] [2024-06-15 18:42:15,984][1652491] Updated weights for policy 0, policy_version 584191 (0.0011) [2024-06-15 18:42:20,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 44782.9, 300 sec: 46209.0). Total num frames: 1196490752. Throughput: 0: 11765.5. Samples: 299227136. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:42:20,955][1648985] Avg episode reward: [(0, '149.490')] [2024-06-15 18:42:21,193][1652491] Updated weights for policy 0, policy_version 584244 (0.0013) [2024-06-15 18:42:22,600][1652491] Updated weights for policy 0, policy_version 584318 (0.0013) [2024-06-15 18:42:24,397][1652491] Updated weights for policy 0, policy_version 584371 (0.0013) [2024-06-15 18:42:25,873][1652491] Updated weights for policy 0, policy_version 584436 (0.0100) [2024-06-15 18:42:25,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 49698.2, 300 sec: 46986.0). Total num frames: 1196916736. Throughput: 0: 11662.2. Samples: 299256320. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:42:25,956][1648985] Avg episode reward: [(0, '156.920')] [2024-06-15 18:42:30,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 44236.6, 300 sec: 46430.6). Total num frames: 1196949504. Throughput: 0: 11867.0. Samples: 299342848. 
Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:42:30,956][1648985] Avg episode reward: [(0, '137.280')] [2024-06-15 18:42:31,343][1652491] Updated weights for policy 0, policy_version 584480 (0.0012) [2024-06-15 18:42:32,833][1652491] Updated weights for policy 0, policy_version 584545 (0.0026) [2024-06-15 18:42:34,616][1652491] Updated weights for policy 0, policy_version 584611 (0.0019) [2024-06-15 18:42:35,972][1648985] Fps is (10 sec: 49070.0, 60 sec: 50232.9, 300 sec: 46872.2). Total num frames: 1197408256. Throughput: 0: 11976.3. Samples: 299403776. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:42:35,972][1648985] Avg episode reward: [(0, '140.220')] [2024-06-15 18:42:36,149][1652491] Updated weights for policy 0, policy_version 584676 (0.0014) [2024-06-15 18:42:40,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 44782.9, 300 sec: 46655.2). Total num frames: 1197473792. Throughput: 0: 12208.4. Samples: 299444224. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:42:40,956][1648985] Avg episode reward: [(0, '117.530')] [2024-06-15 18:42:41,354][1652491] Updated weights for policy 0, policy_version 584706 (0.0012) [2024-06-15 18:42:43,019][1652491] Updated weights for policy 0, policy_version 584784 (0.0015) [2024-06-15 18:42:44,710][1652491] Updated weights for policy 0, policy_version 584833 (0.0013) [2024-06-15 18:42:45,074][1651469] Signal inference workers to stop experience collection... (30450 times) [2024-06-15 18:42:45,110][1652491] InferenceWorker_p0-w0: stopping experience collection (30450 times) [2024-06-15 18:42:45,281][1651469] Signal inference workers to resume experience collection... (30450 times) [2024-06-15 18:42:45,282][1652491] InferenceWorker_p0-w0: resuming experience collection (30450 times) [2024-06-15 18:42:45,969][1648985] Fps is (10 sec: 42611.0, 60 sec: 49686.7, 300 sec: 46761.6). Total num frames: 1197834240. Throughput: 0: 11960.5. Samples: 299515904. 
Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:42:45,969][1648985] Avg episode reward: [(0, '137.900')] [2024-06-15 18:42:46,422][1652491] Updated weights for policy 0, policy_version 584912 (0.0012) [2024-06-15 18:42:47,589][1652491] Updated weights for policy 0, policy_version 584956 (0.0011) [2024-06-15 18:42:50,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 46421.2, 300 sec: 46874.9). Total num frames: 1197998080. Throughput: 0: 12083.2. Samples: 299586048. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:42:50,956][1648985] Avg episode reward: [(0, '154.200')] [2024-06-15 18:42:54,131][1652491] Updated weights for policy 0, policy_version 585029 (0.0013) [2024-06-15 18:42:55,466][1652491] Updated weights for policy 0, policy_version 585089 (0.0013) [2024-06-15 18:42:55,955][1648985] Fps is (10 sec: 45938.3, 60 sec: 48605.8, 300 sec: 46874.9). Total num frames: 1198292992. Throughput: 0: 12003.6. Samples: 299626496. Policy #0 lag: (min: 15.0, avg: 90.0, max: 271.0) [2024-06-15 18:42:55,956][1648985] Avg episode reward: [(0, '165.890')] [2024-06-15 18:42:56,241][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000585120_1198325760.pth... [2024-06-15 18:42:56,392][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000579552_1186922496.pth [2024-06-15 18:42:58,206][1652491] Updated weights for policy 0, policy_version 585185 (0.0136) [2024-06-15 18:43:00,998][1648985] Fps is (10 sec: 52207.4, 60 sec: 48044.2, 300 sec: 47090.3). Total num frames: 1198522368. Throughput: 0: 12071.8. Samples: 299685888. 
Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:43:00,998][1648985] Avg episode reward: [(0, '137.900')] [2024-06-15 18:43:03,936][1652491] Updated weights for policy 0, policy_version 585232 (0.0012) [2024-06-15 18:43:05,277][1652491] Updated weights for policy 0, policy_version 585281 (0.0025) [2024-06-15 18:43:05,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46967.5, 300 sec: 46986.0). Total num frames: 1198718976. Throughput: 0: 11980.8. Samples: 299766272. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:43:05,956][1648985] Avg episode reward: [(0, '149.400')] [2024-06-15 18:43:06,650][1652491] Updated weights for policy 0, policy_version 585345 (0.0013) [2024-06-15 18:43:08,511][1652491] Updated weights for policy 0, policy_version 585410 (0.0013) [2024-06-15 18:43:09,811][1652491] Updated weights for policy 0, policy_version 585471 (0.0011) [2024-06-15 18:43:10,962][1648985] Fps is (10 sec: 52614.5, 60 sec: 49692.2, 300 sec: 47095.9). Total num frames: 1199046656. Throughput: 0: 11887.9. Samples: 299791360. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:43:10,963][1648985] Avg episode reward: [(0, '141.510')] [2024-06-15 18:43:15,713][1652491] Updated weights for policy 0, policy_version 585523 (0.0016) [2024-06-15 18:43:15,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1199144960. Throughput: 0: 11958.1. Samples: 299880960. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:43:15,955][1648985] Avg episode reward: [(0, '151.570')] [2024-06-15 18:43:17,387][1652491] Updated weights for policy 0, policy_version 585600 (0.0030) [2024-06-15 18:43:18,987][1652491] Updated weights for policy 0, policy_version 585656 (0.0014) [2024-06-15 18:43:19,963][1652491] Updated weights for policy 0, policy_version 585696 (0.0013) [2024-06-15 18:43:20,955][1648985] Fps is (10 sec: 52467.4, 60 sec: 51336.6, 300 sec: 47319.2). Total num frames: 1199570944. 
Throughput: 0: 11803.2. Samples: 299934720. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:43:20,955][1648985] Avg episode reward: [(0, '136.690')] [2024-06-15 18:43:25,970][1648985] Fps is (10 sec: 42533.2, 60 sec: 44225.6, 300 sec: 46761.4). Total num frames: 1199570944. Throughput: 0: 11919.9. Samples: 299980800. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:43:25,971][1648985] Avg episode reward: [(0, '176.100')] [2024-06-15 18:43:26,245][1652491] Updated weights for policy 0, policy_version 585744 (0.0014) [2024-06-15 18:43:26,431][1651469] Signal inference workers to stop experience collection... (30500 times) [2024-06-15 18:43:26,465][1652491] InferenceWorker_p0-w0: stopping experience collection (30500 times) [2024-06-15 18:43:26,574][1651469] Signal inference workers to resume experience collection... (30500 times) [2024-06-15 18:43:26,575][1652491] InferenceWorker_p0-w0: resuming experience collection (30500 times) [2024-06-15 18:43:27,511][1652491] Updated weights for policy 0, policy_version 585794 (0.0045) [2024-06-15 18:43:28,724][1652491] Updated weights for policy 0, policy_version 585856 (0.0027) [2024-06-15 18:43:30,180][1652491] Updated weights for policy 0, policy_version 585905 (0.0013) [2024-06-15 18:43:30,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 50790.5, 300 sec: 47541.4). Total num frames: 1199996928. Throughput: 0: 11825.2. Samples: 300047872. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:43:30,955][1648985] Avg episode reward: [(0, '161.270')] [2024-06-15 18:43:31,317][1652491] Updated weights for policy 0, policy_version 585956 (0.0013) [2024-06-15 18:43:35,955][1648985] Fps is (10 sec: 52509.0, 60 sec: 44795.5, 300 sec: 47097.1). Total num frames: 1200095232. Throughput: 0: 12094.6. Samples: 300130304. 
Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:43:35,956][1648985] Avg episode reward: [(0, '153.390')] [2024-06-15 18:43:37,779][1652491] Updated weights for policy 0, policy_version 586000 (0.0011) [2024-06-15 18:43:39,764][1652491] Updated weights for policy 0, policy_version 586080 (0.0116) [2024-06-15 18:43:40,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 48059.8, 300 sec: 47208.1). Total num frames: 1200357376. Throughput: 0: 11946.7. Samples: 300164096. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:43:40,955][1648985] Avg episode reward: [(0, '163.110')] [2024-06-15 18:43:41,107][1652491] Updated weights for policy 0, policy_version 586129 (0.0013) [2024-06-15 18:43:42,348][1652491] Updated weights for policy 0, policy_version 586179 (0.0012) [2024-06-15 18:43:43,336][1652491] Updated weights for policy 0, policy_version 586240 (0.0011) [2024-06-15 18:43:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46432.1, 300 sec: 47097.1). Total num frames: 1200619520. Throughput: 0: 12174.4. Samples: 300233216. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:43:45,955][1648985] Avg episode reward: [(0, '167.480')] [2024-06-15 18:43:49,924][1652491] Updated weights for policy 0, policy_version 586304 (0.0015) [2024-06-15 18:43:50,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 1200848896. Throughput: 0: 11992.1. Samples: 300305920. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:43:50,956][1648985] Avg episode reward: [(0, '175.890')] [2024-06-15 18:43:51,329][1652491] Updated weights for policy 0, policy_version 586368 (0.0014) [2024-06-15 18:43:53,550][1652491] Updated weights for policy 0, policy_version 586448 (0.0033) [2024-06-15 18:43:54,632][1652491] Updated weights for policy 0, policy_version 586496 (0.0015) [2024-06-15 18:43:55,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 47513.5, 300 sec: 47097.0). Total num frames: 1201143808. 
Throughput: 0: 11937.2. Samples: 300328448. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:43:55,956][1648985] Avg episode reward: [(0, '165.180')] [2024-06-15 18:44:00,961][1648985] Fps is (10 sec: 36025.4, 60 sec: 44810.6, 300 sec: 46874.1). Total num frames: 1201209344. Throughput: 0: 11763.2. Samples: 300410368. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:44:00,961][1648985] Avg episode reward: [(0, '138.420')] [2024-06-15 18:44:01,984][1652491] Updated weights for policy 0, policy_version 586576 (0.0013) [2024-06-15 18:44:02,482][1651469] Signal inference workers to stop experience collection... (30550 times) [2024-06-15 18:44:02,572][1652491] InferenceWorker_p0-w0: stopping experience collection (30550 times) [2024-06-15 18:44:02,690][1651469] Signal inference workers to resume experience collection... (30550 times) [2024-06-15 18:44:02,691][1652491] InferenceWorker_p0-w0: resuming experience collection (30550 times) [2024-06-15 18:44:03,356][1652491] Updated weights for policy 0, policy_version 586640 (0.0244) [2024-06-15 18:44:05,853][1652491] Updated weights for policy 0, policy_version 586726 (0.0013) [2024-06-15 18:44:05,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1201602560. Throughput: 0: 11582.6. Samples: 300455936. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:44:05,956][1648985] Avg episode reward: [(0, '130.580')] [2024-06-15 18:44:10,955][1648985] Fps is (10 sec: 45900.5, 60 sec: 43695.9, 300 sec: 46763.8). Total num frames: 1201668096. Throughput: 0: 11529.6. Samples: 300499456. 
Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:44:10,956][1648985] Avg episode reward: [(0, '160.410')] [2024-06-15 18:44:12,713][1652491] Updated weights for policy 0, policy_version 586769 (0.0013) [2024-06-15 18:44:14,945][1652491] Updated weights for policy 0, policy_version 586867 (0.0013) [2024-06-15 18:44:15,955][1648985] Fps is (10 sec: 36044.3, 60 sec: 46967.3, 300 sec: 46763.8). Total num frames: 1201963008. Throughput: 0: 11423.3. Samples: 300561920. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:44:15,956][1648985] Avg episode reward: [(0, '168.870')] [2024-06-15 18:44:16,625][1652491] Updated weights for policy 0, policy_version 586930 (0.0015) [2024-06-15 18:44:18,268][1652491] Updated weights for policy 0, policy_version 586996 (0.0105) [2024-06-15 18:44:20,957][1648985] Fps is (10 sec: 52419.1, 60 sec: 43689.2, 300 sec: 47096.8). Total num frames: 1202192384. Throughput: 0: 11036.0. Samples: 300626944. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:44:20,957][1648985] Avg episode reward: [(0, '181.290')] [2024-06-15 18:44:25,112][1652491] Updated weights for policy 0, policy_version 587043 (0.0081) [2024-06-15 18:44:25,955][1648985] Fps is (10 sec: 36045.5, 60 sec: 45886.9, 300 sec: 46652.8). Total num frames: 1202323456. Throughput: 0: 11161.6. Samples: 300666368. Policy #0 lag: (min: 47.0, avg: 191.8, max: 303.0) [2024-06-15 18:44:25,955][1648985] Avg episode reward: [(0, '167.600')] [2024-06-15 18:44:26,521][1652491] Updated weights for policy 0, policy_version 587106 (0.0013) [2024-06-15 18:44:28,575][1652491] Updated weights for policy 0, policy_version 587184 (0.0014) [2024-06-15 18:44:29,980][1652491] Updated weights for policy 0, policy_version 587234 (0.0013) [2024-06-15 18:44:30,955][1648985] Fps is (10 sec: 52437.7, 60 sec: 45328.9, 300 sec: 47097.1). Total num frames: 1202716672. Throughput: 0: 10922.6. Samples: 300724736. 
Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:44:30,956][1648985] Avg episode reward: [(0, '172.370')] [2024-06-15 18:44:35,780][1652491] Updated weights for policy 0, policy_version 587267 (0.0012) [2024-06-15 18:44:35,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 43690.7, 300 sec: 46430.6). Total num frames: 1202716672. Throughput: 0: 11116.1. Samples: 300806144. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:44:35,956][1648985] Avg episode reward: [(0, '160.100')] [2024-06-15 18:44:37,483][1652491] Updated weights for policy 0, policy_version 587348 (0.0013) [2024-06-15 18:44:39,005][1652491] Updated weights for policy 0, policy_version 587424 (0.0012) [2024-06-15 18:44:39,598][1651469] Signal inference workers to stop experience collection... (30600 times) [2024-06-15 18:44:39,618][1652491] InferenceWorker_p0-w0: stopping experience collection (30600 times) [2024-06-15 18:44:39,868][1651469] Signal inference workers to resume experience collection... (30600 times) [2024-06-15 18:44:39,885][1652491] InferenceWorker_p0-w0: resuming experience collection (30600 times) [2024-06-15 18:44:40,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1203175424. Throughput: 0: 11173.0. Samples: 300831232. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:44:40,955][1648985] Avg episode reward: [(0, '150.250')] [2024-06-15 18:44:41,038][1652491] Updated weights for policy 0, policy_version 587491 (0.0131) [2024-06-15 18:44:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 46652.8). Total num frames: 1203240960. Throughput: 0: 11015.0. Samples: 300905984. 
Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:44:45,956][1648985] Avg episode reward: [(0, '148.080')] [2024-06-15 18:44:47,558][1652491] Updated weights for policy 0, policy_version 587537 (0.0012) [2024-06-15 18:44:49,804][1652491] Updated weights for policy 0, policy_version 587632 (0.0127) [2024-06-15 18:44:50,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 45329.3, 300 sec: 46764.0). Total num frames: 1203568640. Throughput: 0: 11309.5. Samples: 300964864. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:44:50,955][1648985] Avg episode reward: [(0, '158.800')] [2024-06-15 18:44:52,198][1652491] Updated weights for policy 0, policy_version 587728 (0.0012) [2024-06-15 18:44:53,434][1652491] Updated weights for policy 0, policy_version 587774 (0.0011) [2024-06-15 18:44:55,955][1648985] Fps is (10 sec: 52426.7, 60 sec: 43690.6, 300 sec: 46874.9). Total num frames: 1203765248. Throughput: 0: 10934.0. Samples: 300991488. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:44:55,956][1648985] Avg episode reward: [(0, '158.940')] [2024-06-15 18:44:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000587776_1203765248.pth... [2024-06-15 18:44:56,081][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000582336_1192624128.pth [2024-06-15 18:45:00,160][1652491] Updated weights for policy 0, policy_version 587816 (0.0012) [2024-06-15 18:45:00,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 45333.3, 300 sec: 46652.8). Total num frames: 1203929088. Throughput: 0: 11320.9. Samples: 301071360. 
Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:45:00,955][1648985] Avg episode reward: [(0, '160.440')] [2024-06-15 18:45:01,521][1652491] Updated weights for policy 0, policy_version 587876 (0.0119) [2024-06-15 18:45:03,324][1652491] Updated weights for policy 0, policy_version 587952 (0.0080) [2024-06-15 18:45:05,315][1652491] Updated weights for policy 0, policy_version 588022 (0.0012) [2024-06-15 18:45:05,962][1648985] Fps is (10 sec: 52392.3, 60 sec: 44777.4, 300 sec: 47095.9). Total num frames: 1204289536. Throughput: 0: 11103.4. Samples: 301126656. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:45:05,963][1648985] Avg episode reward: [(0, '133.030')] [2024-06-15 18:45:10,955][1648985] Fps is (10 sec: 36043.9, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 1204289536. Throughput: 0: 11172.9. Samples: 301169152. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:45:10,956][1648985] Avg episode reward: [(0, '144.610')] [2024-06-15 18:45:11,452][1652491] Updated weights for policy 0, policy_version 588064 (0.0021) [2024-06-15 18:45:13,059][1652491] Updated weights for policy 0, policy_version 588128 (0.0012) [2024-06-15 18:45:14,032][1652491] Updated weights for policy 0, policy_version 588176 (0.0013) [2024-06-15 18:45:15,636][1652491] Updated weights for policy 0, policy_version 588240 (0.0012) [2024-06-15 18:45:15,959][1648985] Fps is (10 sec: 42611.5, 60 sec: 45872.0, 300 sec: 46763.1). Total num frames: 1204715520. Throughput: 0: 11422.2. Samples: 301238784. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:45:15,960][1648985] Avg episode reward: [(0, '133.620')] [2024-06-15 18:45:16,830][1652491] Updated weights for policy 0, policy_version 588282 (0.0010) [2024-06-15 18:45:20,959][1648985] Fps is (10 sec: 52407.1, 60 sec: 43688.9, 300 sec: 46429.9). Total num frames: 1204813824. Throughput: 0: 11331.2. Samples: 301316096. 
Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:45:20,960][1648985] Avg episode reward: [(0, '127.790')] [2024-06-15 18:45:21,696][1651469] Signal inference workers to stop experience collection... (30650 times) [2024-06-15 18:45:21,748][1652491] InferenceWorker_p0-w0: stopping experience collection (30650 times) [2024-06-15 18:45:21,943][1651469] Signal inference workers to resume experience collection... (30650 times) [2024-06-15 18:45:21,943][1652491] InferenceWorker_p0-w0: resuming experience collection (30650 times) [2024-06-15 18:45:22,285][1652491] Updated weights for policy 0, policy_version 588336 (0.0013) [2024-06-15 18:45:24,070][1652491] Updated weights for policy 0, policy_version 588411 (0.0081) [2024-06-15 18:45:25,323][1652491] Updated weights for policy 0, policy_version 588451 (0.0019) [2024-06-15 18:45:25,955][1648985] Fps is (10 sec: 49173.1, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1205207040. Throughput: 0: 11332.3. Samples: 301341184. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:45:25,955][1648985] Avg episode reward: [(0, '123.320')] [2024-06-15 18:45:27,158][1652491] Updated weights for policy 0, policy_version 588528 (0.0012) [2024-06-15 18:45:30,955][1648985] Fps is (10 sec: 52450.9, 60 sec: 43690.7, 300 sec: 46652.7). Total num frames: 1205338112. Throughput: 0: 11286.7. Samples: 301413888. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:45:30,957][1648985] Avg episode reward: [(0, '148.160')] [2024-06-15 18:45:33,210][1652491] Updated weights for policy 0, policy_version 588576 (0.0047) [2024-06-15 18:45:34,871][1652491] Updated weights for policy 0, policy_version 588640 (0.0014) [2024-06-15 18:45:35,944][1652491] Updated weights for policy 0, policy_version 588673 (0.0013) [2024-06-15 18:45:35,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 48059.6, 300 sec: 46763.8). Total num frames: 1205600256. Throughput: 0: 11480.1. Samples: 301481472. 
Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:45:35,956][1648985] Avg episode reward: [(0, '166.540')] [2024-06-15 18:45:37,465][1652491] Updated weights for policy 0, policy_version 588737 (0.0027) [2024-06-15 18:45:38,947][1652491] Updated weights for policy 0, policy_version 588800 (0.0013) [2024-06-15 18:45:40,960][1648985] Fps is (10 sec: 52402.8, 60 sec: 44779.1, 300 sec: 46763.0). Total num frames: 1205862400. Throughput: 0: 11410.7. Samples: 301505024. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:45:40,960][1648985] Avg episode reward: [(0, '168.410')] [2024-06-15 18:45:44,645][1652491] Updated weights for policy 0, policy_version 588858 (0.0014) [2024-06-15 18:45:45,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 46421.3, 300 sec: 46763.9). Total num frames: 1206026240. Throughput: 0: 11491.6. Samples: 301588480. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:45:45,955][1648985] Avg episode reward: [(0, '169.040')] [2024-06-15 18:45:46,211][1652491] Updated weights for policy 0, policy_version 588902 (0.0013) [2024-06-15 18:45:47,085][1652491] Updated weights for policy 0, policy_version 588944 (0.0014) [2024-06-15 18:45:48,607][1652491] Updated weights for policy 0, policy_version 589008 (0.0013) [2024-06-15 18:45:49,628][1652491] Updated weights for policy 0, policy_version 589055 (0.0014) [2024-06-15 18:45:50,955][1648985] Fps is (10 sec: 52456.0, 60 sec: 46967.4, 300 sec: 46875.0). Total num frames: 1206386688. Throughput: 0: 11777.9. Samples: 301656576. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:45:50,955][1648985] Avg episode reward: [(0, '141.520')] [2024-06-15 18:45:54,867][1652491] Updated weights for policy 0, policy_version 589115 (0.0118) [2024-06-15 18:45:55,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46421.6, 300 sec: 46763.8). Total num frames: 1206550528. Throughput: 0: 11912.6. Samples: 301705216. 
Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:45:55,956][1648985] Avg episode reward: [(0, '139.180')] [2024-06-15 18:45:56,599][1652491] Updated weights for policy 0, policy_version 589155 (0.0013) [2024-06-15 18:45:57,769][1652491] Updated weights for policy 0, policy_version 589192 (0.0014) [2024-06-15 18:45:58,425][1651469] Signal inference workers to stop experience collection... (30700 times) [2024-06-15 18:45:58,470][1652491] InferenceWorker_p0-w0: stopping experience collection (30700 times) [2024-06-15 18:45:58,665][1651469] Signal inference workers to resume experience collection... (30700 times) [2024-06-15 18:45:58,666][1652491] InferenceWorker_p0-w0: resuming experience collection (30700 times) [2024-06-15 18:45:59,554][1652491] Updated weights for policy 0, policy_version 589268 (0.0013) [2024-06-15 18:46:00,487][1652491] Updated weights for policy 0, policy_version 589312 (0.0013) [2024-06-15 18:46:00,962][1648985] Fps is (10 sec: 52389.7, 60 sec: 49692.0, 300 sec: 47095.9). Total num frames: 1206910976. Throughput: 0: 11695.6. Samples: 301765120. Policy #0 lag: (min: 143.0, avg: 261.0, max: 399.0) [2024-06-15 18:46:00,963][1648985] Avg episode reward: [(0, '132.930')] [2024-06-15 18:46:05,720][1652491] Updated weights for policy 0, policy_version 589371 (0.0017) [2024-06-15 18:46:05,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45880.8, 300 sec: 46652.7). Total num frames: 1207042048. Throughput: 0: 11868.2. Samples: 301850112. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:46:05,956][1648985] Avg episode reward: [(0, '158.020')] [2024-06-15 18:46:07,257][1652491] Updated weights for policy 0, policy_version 589411 (0.0011) [2024-06-15 18:46:09,095][1652491] Updated weights for policy 0, policy_version 589472 (0.0013) [2024-06-15 18:46:10,955][1648985] Fps is (10 sec: 49188.2, 60 sec: 51882.8, 300 sec: 47097.0). Total num frames: 1207402496. Throughput: 0: 11912.5. Samples: 301877248. 
Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:46:10,956][1648985] Avg episode reward: [(0, '163.250')] [2024-06-15 18:46:11,111][1652491] Updated weights for policy 0, policy_version 589562 (0.0021) [2024-06-15 18:46:15,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45332.3, 300 sec: 46208.4). Total num frames: 1207435264. Throughput: 0: 11855.7. Samples: 301947392. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:46:15,955][1648985] Avg episode reward: [(0, '176.610')] [2024-06-15 18:46:17,359][1652491] Updated weights for policy 0, policy_version 589623 (0.0018) [2024-06-15 18:46:19,159][1652491] Updated weights for policy 0, policy_version 589680 (0.0020) [2024-06-15 18:46:20,790][1652491] Updated weights for policy 0, policy_version 589714 (0.0025) [2024-06-15 18:46:20,955][1648985] Fps is (10 sec: 32767.9, 60 sec: 48609.4, 300 sec: 46763.8). Total num frames: 1207730176. Throughput: 0: 11821.5. Samples: 302013440. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:46:20,956][1648985] Avg episode reward: [(0, '164.410')] [2024-06-15 18:46:22,616][1652491] Updated weights for policy 0, policy_version 589792 (0.0166) [2024-06-15 18:46:25,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 45875.0, 300 sec: 46319.5). Total num frames: 1207959552. Throughput: 0: 11868.3. Samples: 302039040. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:46:25,956][1648985] Avg episode reward: [(0, '153.860')] [2024-06-15 18:46:28,064][1652491] Updated weights for policy 0, policy_version 589860 (0.0015) [2024-06-15 18:46:30,684][1652491] Updated weights for policy 0, policy_version 589936 (0.0014) [2024-06-15 18:46:30,978][1648985] Fps is (10 sec: 45769.5, 60 sec: 47495.4, 300 sec: 46760.7). Total num frames: 1208188928. Throughput: 0: 11838.2. Samples: 302121472. 
Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:46:30,979][1648985] Avg episode reward: [(0, '133.290')] [2024-06-15 18:46:31,999][1652491] Updated weights for policy 0, policy_version 589972 (0.0013) [2024-06-15 18:46:33,243][1652491] Updated weights for policy 0, policy_version 590036 (0.0015) [2024-06-15 18:46:35,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 48059.9, 300 sec: 46430.6). Total num frames: 1208483840. Throughput: 0: 11844.3. Samples: 302189568. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:46:35,955][1648985] Avg episode reward: [(0, '156.320')] [2024-06-15 18:46:38,435][1652491] Updated weights for policy 0, policy_version 590096 (0.0013) [2024-06-15 18:46:40,955][1648985] Fps is (10 sec: 49266.1, 60 sec: 46971.5, 300 sec: 46874.9). Total num frames: 1208680448. Throughput: 0: 11662.2. Samples: 302230016. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:46:40,955][1648985] Avg episode reward: [(0, '160.810')] [2024-06-15 18:46:41,091][1652491] Updated weights for policy 0, policy_version 590179 (0.0012) [2024-06-15 18:46:42,677][1651469] Signal inference workers to stop experience collection... (30750 times) [2024-06-15 18:46:42,723][1652491] InferenceWorker_p0-w0: stopping experience collection (30750 times) [2024-06-15 18:46:42,914][1651469] Signal inference workers to resume experience collection... (30750 times) [2024-06-15 18:46:42,919][1652491] InferenceWorker_p0-w0: resuming experience collection (30750 times) [2024-06-15 18:46:43,310][1652491] Updated weights for policy 0, policy_version 590240 (0.0012) [2024-06-15 18:46:45,705][1652491] Updated weights for policy 0, policy_version 590320 (0.0016) [2024-06-15 18:46:45,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 49151.9, 300 sec: 46652.7). Total num frames: 1208975360. Throughput: 0: 11698.2. Samples: 302291456. 
Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:46:45,956][1648985] Avg episode reward: [(0, '176.420')] [2024-06-15 18:46:50,373][1652491] Updated weights for policy 0, policy_version 590370 (0.0012) [2024-06-15 18:46:50,966][1648985] Fps is (10 sec: 42550.7, 60 sec: 45320.6, 300 sec: 46539.9). Total num frames: 1209106432. Throughput: 0: 11363.6. Samples: 302361600. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:46:50,967][1648985] Avg episode reward: [(0, '178.640')] [2024-06-15 18:46:51,369][1652491] Updated weights for policy 0, policy_version 590402 (0.0022) [2024-06-15 18:46:52,965][1652491] Updated weights for policy 0, policy_version 590462 (0.0013) [2024-06-15 18:46:55,980][1648985] Fps is (10 sec: 42494.6, 60 sec: 47494.2, 300 sec: 46652.5). Total num frames: 1209401344. Throughput: 0: 11496.7. Samples: 302394880. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:46:55,980][1648985] Avg episode reward: [(0, '171.220')] [2024-06-15 18:46:55,985][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000590528_1209401344.pth... [2024-06-15 18:46:56,032][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000585120_1198325760.pth [2024-06-15 18:46:56,522][1652491] Updated weights for policy 0, policy_version 590533 (0.0022) [2024-06-15 18:46:57,522][1652491] Updated weights for policy 0, policy_version 590586 (0.0012) [2024-06-15 18:47:00,955][1648985] Fps is (10 sec: 42646.0, 60 sec: 43696.0, 300 sec: 46208.4). Total num frames: 1209532416. Throughput: 0: 11548.4. Samples: 302467072. 
Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:47:00,956][1648985] Avg episode reward: [(0, '173.380')] [2024-06-15 18:47:02,541][1652491] Updated weights for policy 0, policy_version 590640 (0.0013) [2024-06-15 18:47:03,708][1652491] Updated weights for policy 0, policy_version 590675 (0.0012) [2024-06-15 18:47:04,884][1652491] Updated weights for policy 0, policy_version 590720 (0.0012) [2024-06-15 18:47:05,958][1648985] Fps is (10 sec: 42689.5, 60 sec: 46418.9, 300 sec: 46652.3). Total num frames: 1209827328. Throughput: 0: 11479.4. Samples: 302530048. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:47:05,959][1648985] Avg episode reward: [(0, '166.740')] [2024-06-15 18:47:06,843][1652491] Updated weights for policy 0, policy_version 590775 (0.0015) [2024-06-15 18:47:08,729][1652491] Updated weights for policy 0, policy_version 590839 (0.0015) [2024-06-15 18:47:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 1210056704. Throughput: 0: 11628.1. Samples: 302562304. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:47:10,956][1648985] Avg episode reward: [(0, '200.160')] [2024-06-15 18:47:14,121][1652491] Updated weights for policy 0, policy_version 590885 (0.0012) [2024-06-15 18:47:15,702][1652491] Updated weights for policy 0, policy_version 590944 (0.0012) [2024-06-15 18:47:15,955][1648985] Fps is (10 sec: 42611.0, 60 sec: 46967.3, 300 sec: 46652.7). Total num frames: 1210253312. Throughput: 0: 11429.1. Samples: 302635520. 
Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:47:15,956][1648985] Avg episode reward: [(0, '187.050')] [2024-06-15 18:47:17,584][1652491] Updated weights for policy 0, policy_version 590994 (0.0016) [2024-06-15 18:47:18,424][1652491] Updated weights for policy 0, policy_version 591039 (0.0014) [2024-06-15 18:47:20,177][1652491] Updated weights for policy 0, policy_version 591095 (0.0124) [2024-06-15 18:47:20,956][1648985] Fps is (10 sec: 52424.5, 60 sec: 47513.0, 300 sec: 46319.4). Total num frames: 1210580992. Throughput: 0: 11332.0. Samples: 302699520. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:47:20,956][1648985] Avg episode reward: [(0, '176.050')] [2024-06-15 18:47:25,789][1652491] Updated weights for policy 0, policy_version 591157 (0.0019) [2024-06-15 18:47:25,955][1648985] Fps is (10 sec: 42600.0, 60 sec: 45329.3, 300 sec: 46541.7). Total num frames: 1210679296. Throughput: 0: 11332.3. Samples: 302739968. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:47:25,955][1648985] Avg episode reward: [(0, '159.130')] [2024-06-15 18:47:26,400][1652491] Updated weights for policy 0, policy_version 591175 (0.0009) [2024-06-15 18:47:28,462][1651469] Signal inference workers to stop experience collection... (30800 times) [2024-06-15 18:47:28,527][1652491] InferenceWorker_p0-w0: stopping experience collection (30800 times) [2024-06-15 18:47:28,552][1652491] Updated weights for policy 0, policy_version 591235 (0.0012) [2024-06-15 18:47:28,704][1651469] Signal inference workers to resume experience collection... (30800 times) [2024-06-15 18:47:28,704][1652491] InferenceWorker_p0-w0: resuming experience collection (30800 times) [2024-06-15 18:47:29,763][1652491] Updated weights for policy 0, policy_version 591295 (0.0011) [2024-06-15 18:47:30,955][1648985] Fps is (10 sec: 49155.8, 60 sec: 48078.2, 300 sec: 46322.1). Total num frames: 1211072512. Throughput: 0: 11400.6. Samples: 302804480. 
Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:47:30,956][1648985] Avg episode reward: [(0, '154.410')] [2024-06-15 18:47:31,192][1652491] Updated weights for policy 0, policy_version 591350 (0.0031) [2024-06-15 18:47:35,957][1648985] Fps is (10 sec: 45867.5, 60 sec: 44235.6, 300 sec: 46319.3). Total num frames: 1211138048. Throughput: 0: 11630.6. Samples: 302884864. Policy #0 lag: (min: 1.0, avg: 76.6, max: 257.0) [2024-06-15 18:47:35,957][1648985] Avg episode reward: [(0, '137.880')] [2024-06-15 18:47:36,468][1652491] Updated weights for policy 0, policy_version 591395 (0.0012) [2024-06-15 18:47:37,373][1652491] Updated weights for policy 0, policy_version 591426 (0.0015) [2024-06-15 18:47:38,861][1652491] Updated weights for policy 0, policy_version 591486 (0.0015) [2024-06-15 18:47:40,945][1652491] Updated weights for policy 0, policy_version 591541 (0.0014) [2024-06-15 18:47:40,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 46421.2, 300 sec: 46210.6). Total num frames: 1211465728. Throughput: 0: 11486.4. Samples: 302911488. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:47:40,956][1648985] Avg episode reward: [(0, '136.840')] [2024-06-15 18:47:42,327][1652491] Updated weights for policy 0, policy_version 591600 (0.0013) [2024-06-15 18:47:45,960][1648985] Fps is (10 sec: 49135.6, 60 sec: 44233.3, 300 sec: 46207.7). Total num frames: 1211629568. Throughput: 0: 11444.8. Samples: 302982144. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:47:45,960][1648985] Avg episode reward: [(0, '151.390')] [2024-06-15 18:47:48,476][1652491] Updated weights for policy 0, policy_version 591670 (0.0088) [2024-06-15 18:47:49,756][1652491] Updated weights for policy 0, policy_version 591713 (0.0012) [2024-06-15 18:47:50,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46430.0, 300 sec: 46097.4). Total num frames: 1211891712. Throughput: 0: 11583.4. Samples: 303051264. 
Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:47:50,956][1648985] Avg episode reward: [(0, '165.710')] [2024-06-15 18:47:51,685][1652491] Updated weights for policy 0, policy_version 591776 (0.0012) [2024-06-15 18:47:53,049][1652491] Updated weights for policy 0, policy_version 591824 (0.0015) [2024-06-15 18:47:54,045][1652491] Updated weights for policy 0, policy_version 591869 (0.0012) [2024-06-15 18:47:55,955][1648985] Fps is (10 sec: 52453.8, 60 sec: 45893.9, 300 sec: 46215.1). Total num frames: 1212153856. Throughput: 0: 11525.7. Samples: 303080960. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:47:55,955][1648985] Avg episode reward: [(0, '178.510')] [2024-06-15 18:47:59,856][1652491] Updated weights for policy 0, policy_version 591936 (0.0013) [2024-06-15 18:48:00,974][1648985] Fps is (10 sec: 45787.3, 60 sec: 46952.4, 300 sec: 46205.4). Total num frames: 1212350464. Throughput: 0: 11555.0. Samples: 303155712. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:48:00,975][1648985] Avg episode reward: [(0, '171.320')] [2024-06-15 18:48:01,311][1652491] Updated weights for policy 0, policy_version 591999 (0.0030) [2024-06-15 18:48:04,373][1652491] Updated weights for policy 0, policy_version 592069 (0.0012) [2024-06-15 18:48:05,542][1652491] Updated weights for policy 0, policy_version 592126 (0.0013) [2024-06-15 18:48:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47516.1, 300 sec: 46209.6). Total num frames: 1212678144. Throughput: 0: 11651.1. Samples: 303223808. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:48:05,956][1648985] Avg episode reward: [(0, '177.150')] [2024-06-15 18:48:09,908][1652491] Updated weights for policy 0, policy_version 592177 (0.0033) [2024-06-15 18:48:10,696][1651469] Signal inference workers to stop experience collection... 
(30850 times) [2024-06-15 18:48:10,770][1652491] InferenceWorker_p0-w0: stopping experience collection (30850 times) [2024-06-15 18:48:10,946][1651469] Signal inference workers to resume experience collection... (30850 times) [2024-06-15 18:48:10,955][1648985] Fps is (10 sec: 49246.5, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 1212841984. Throughput: 0: 11605.3. Samples: 303262208. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:48:10,956][1648985] Avg episode reward: [(0, '163.400')] [2024-06-15 18:48:10,961][1652491] InferenceWorker_p0-w0: resuming experience collection (30850 times) [2024-06-15 18:48:11,591][1652491] Updated weights for policy 0, policy_version 592226 (0.0011) [2024-06-15 18:48:14,237][1652491] Updated weights for policy 0, policy_version 592273 (0.0011) [2024-06-15 18:48:15,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 47513.9, 300 sec: 45875.2). Total num frames: 1213104128. Throughput: 0: 11810.2. Samples: 303335936. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:48:15,955][1648985] Avg episode reward: [(0, '146.190')] [2024-06-15 18:48:16,732][1652491] Updated weights for policy 0, policy_version 592377 (0.0014) [2024-06-15 18:48:20,799][1652491] Updated weights for policy 0, policy_version 592441 (0.0014) [2024-06-15 18:48:20,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45875.8, 300 sec: 46655.2). Total num frames: 1213333504. Throughput: 0: 11412.3. Samples: 303398400. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:48:20,956][1648985] Avg episode reward: [(0, '146.280')] [2024-06-15 18:48:23,321][1652491] Updated weights for policy 0, policy_version 592506 (0.0015) [2024-06-15 18:48:25,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 46967.3, 300 sec: 45764.1). Total num frames: 1213497344. Throughput: 0: 11639.5. Samples: 303435264. 
Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:48:25,955][1648985] Avg episode reward: [(0, '158.310')] [2024-06-15 18:48:26,671][1652491] Updated weights for policy 0, policy_version 592569 (0.0015) [2024-06-15 18:48:28,602][1652491] Updated weights for policy 0, policy_version 592631 (0.0013) [2024-06-15 18:48:30,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 1213726720. Throughput: 0: 11572.5. Samples: 303502848. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:48:30,955][1648985] Avg episode reward: [(0, '154.940')] [2024-06-15 18:48:31,544][1652491] Updated weights for policy 0, policy_version 592673 (0.0014) [2024-06-15 18:48:34,434][1652491] Updated weights for policy 0, policy_version 592767 (0.0065) [2024-06-15 18:48:35,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47514.8, 300 sec: 46208.4). Total num frames: 1213988864. Throughput: 0: 11673.6. Samples: 303576576. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:48:35,956][1648985] Avg episode reward: [(0, '144.130')] [2024-06-15 18:48:38,071][1652491] Updated weights for policy 0, policy_version 592831 (0.0014) [2024-06-15 18:48:39,552][1652491] Updated weights for policy 0, policy_version 592880 (0.0012) [2024-06-15 18:48:40,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 1214251008. Throughput: 0: 11787.4. Samples: 303611392. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:48:40,956][1648985] Avg episode reward: [(0, '146.160')] [2024-06-15 18:48:42,082][1652491] Updated weights for policy 0, policy_version 592944 (0.0013) [2024-06-15 18:48:44,327][1652491] Updated weights for policy 0, policy_version 592982 (0.0014) [2024-06-15 18:48:45,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48063.6, 300 sec: 46319.5). Total num frames: 1214513152. Throughput: 0: 11712.7. Samples: 303682560. 
Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:48:45,956][1648985] Avg episode reward: [(0, '148.780')] [2024-06-15 18:48:49,530][1652491] Updated weights for policy 0, policy_version 593072 (0.0129) [2024-06-15 18:48:50,793][1652491] Updated weights for policy 0, policy_version 593121 (0.0013) [2024-06-15 18:48:50,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.6, 300 sec: 46097.4). Total num frames: 1214742528. Throughput: 0: 11673.6. Samples: 303749120. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:48:50,956][1648985] Avg episode reward: [(0, '160.680')] [2024-06-15 18:48:52,886][1652491] Updated weights for policy 0, policy_version 593157 (0.0012) [2024-06-15 18:48:55,324][1651469] Signal inference workers to stop experience collection... (30900 times) [2024-06-15 18:48:55,397][1652491] InferenceWorker_p0-w0: stopping experience collection (30900 times) [2024-06-15 18:48:55,399][1652491] Updated weights for policy 0, policy_version 593217 (0.0264) [2024-06-15 18:48:55,627][1651469] Signal inference workers to resume experience collection... (30900 times) [2024-06-15 18:48:55,627][1652491] InferenceWorker_p0-w0: resuming experience collection (30900 times) [2024-06-15 18:48:55,955][1648985] Fps is (10 sec: 42596.9, 60 sec: 46421.1, 300 sec: 46542.5). Total num frames: 1214939136. Throughput: 0: 11662.1. Samples: 303787008. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:48:55,956][1648985] Avg episode reward: [(0, '148.910')] [2024-06-15 18:48:56,336][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000593264_1215004672.pth... [2024-06-15 18:48:56,385][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000587776_1203765248.pth [2024-06-15 18:48:56,686][1652491] Updated weights for policy 0, policy_version 593280 (0.0014) [2024-06-15 18:49:00,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46982.6, 300 sec: 45986.3). Total num frames: 1215168512. 
Throughput: 0: 11685.0. Samples: 303861760. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:49:00,955][1648985] Avg episode reward: [(0, '166.080')] [2024-06-15 18:49:00,957][1652491] Updated weights for policy 0, policy_version 593344 (0.0114) [2024-06-15 18:49:04,572][1652491] Updated weights for policy 0, policy_version 593424 (0.0097) [2024-06-15 18:49:05,955][1648985] Fps is (10 sec: 49153.9, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1215430656. Throughput: 0: 11764.6. Samples: 303927808. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:49:05,956][1648985] Avg episode reward: [(0, '165.330')] [2024-06-15 18:49:06,604][1652491] Updated weights for policy 0, policy_version 593476 (0.0016) [2024-06-15 18:49:07,672][1652491] Updated weights for policy 0, policy_version 593530 (0.0012) [2024-06-15 18:49:10,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 46421.2, 300 sec: 46319.5). Total num frames: 1215627264. Throughput: 0: 11798.7. Samples: 303966208. Policy #0 lag: (min: 34.0, avg: 124.8, max: 277.0) [2024-06-15 18:49:10,956][1648985] Avg episode reward: [(0, '183.220')] [2024-06-15 18:49:11,273][1652491] Updated weights for policy 0, policy_version 593594 (0.0120) [2024-06-15 18:49:13,252][1652491] Updated weights for policy 0, policy_version 593664 (0.0013) [2024-06-15 18:49:15,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.3, 300 sec: 46430.9). Total num frames: 1215889408. Throughput: 0: 11844.3. Samples: 304035840. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:49:15,956][1648985] Avg episode reward: [(0, '163.630')] [2024-06-15 18:49:16,365][1652491] Updated weights for policy 0, policy_version 593722 (0.0014) [2024-06-15 18:49:18,627][1652491] Updated weights for policy 0, policy_version 593776 (0.0024) [2024-06-15 18:49:20,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1216086016. Throughput: 0: 11832.9. Samples: 304109056. 
Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:49:20,956][1648985] Avg episode reward: [(0, '156.660')] [2024-06-15 18:49:22,146][1652491] Updated weights for policy 0, policy_version 593812 (0.0013) [2024-06-15 18:49:23,421][1652491] Updated weights for policy 0, policy_version 593875 (0.0026) [2024-06-15 18:49:24,241][1652491] Updated weights for policy 0, policy_version 593919 (0.0013) [2024-06-15 18:49:25,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 1216348160. Throughput: 0: 11821.5. Samples: 304143360. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:49:25,956][1648985] Avg episode reward: [(0, '148.350')] [2024-06-15 18:49:26,953][1652491] Updated weights for policy 0, policy_version 593977 (0.0015) [2024-06-15 18:49:29,336][1652491] Updated weights for policy 0, policy_version 594032 (0.0142) [2024-06-15 18:49:30,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48059.5, 300 sec: 47097.0). Total num frames: 1216610304. Throughput: 0: 11787.3. Samples: 304212992. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:49:30,956][1648985] Avg episode reward: [(0, '148.180')] [2024-06-15 18:49:32,997][1652491] Updated weights for policy 0, policy_version 594064 (0.0047) [2024-06-15 18:49:34,800][1652491] Updated weights for policy 0, policy_version 594134 (0.0012) [2024-06-15 18:49:35,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 1216872448. Throughput: 0: 11912.5. Samples: 304285184. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:49:35,956][1648985] Avg episode reward: [(0, '154.870')] [2024-06-15 18:49:37,282][1652491] Updated weights for policy 0, policy_version 594181 (0.0121) [2024-06-15 18:49:40,071][1651469] Signal inference workers to stop experience collection... 
(30950 times) [2024-06-15 18:49:40,149][1652491] InferenceWorker_p0-w0: stopping experience collection (30950 times) [2024-06-15 18:49:40,150][1652491] Updated weights for policy 0, policy_version 594243 (0.0157) [2024-06-15 18:49:40,385][1651469] Signal inference workers to resume experience collection... (30950 times) [2024-06-15 18:49:40,387][1652491] InferenceWorker_p0-w0: resuming experience collection (30950 times) [2024-06-15 18:49:40,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1217069056. Throughput: 0: 11833.0. Samples: 304319488. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:49:40,955][1648985] Avg episode reward: [(0, '169.460')] [2024-06-15 18:49:44,071][1652491] Updated weights for policy 0, policy_version 594306 (0.0023) [2024-06-15 18:49:45,791][1652491] Updated weights for policy 0, policy_version 594386 (0.0014) [2024-06-15 18:49:45,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 1217298432. Throughput: 0: 11832.9. Samples: 304394240. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:49:45,956][1648985] Avg episode reward: [(0, '167.610')] [2024-06-15 18:49:47,829][1652491] Updated weights for policy 0, policy_version 594433 (0.0038) [2024-06-15 18:49:48,915][1652491] Updated weights for policy 0, policy_version 594487 (0.0013) [2024-06-15 18:49:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.3, 300 sec: 46652.8). Total num frames: 1217527808. Throughput: 0: 12071.8. Samples: 304471040. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:49:50,955][1648985] Avg episode reward: [(0, '162.990')] [2024-06-15 18:49:52,046][1652491] Updated weights for policy 0, policy_version 594549 (0.0017) [2024-06-15 18:49:55,503][1652491] Updated weights for policy 0, policy_version 594608 (0.0013) [2024-06-15 18:49:55,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 47513.8, 300 sec: 46986.0). 
Total num frames: 1217789952. Throughput: 0: 12071.9. Samples: 304509440. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:49:55,956][1648985] Avg episode reward: [(0, '163.140')] [2024-06-15 18:49:57,101][1652491] Updated weights for policy 0, policy_version 594683 (0.0014) [2024-06-15 18:49:58,964][1652491] Updated weights for policy 0, policy_version 594743 (0.0013) [2024-06-15 18:50:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 46653.9). Total num frames: 1218052096. Throughput: 0: 12014.9. Samples: 304576512. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:50:00,955][1648985] Avg episode reward: [(0, '160.570')] [2024-06-15 18:50:02,480][1652491] Updated weights for policy 0, policy_version 594785 (0.0011) [2024-06-15 18:50:05,633][1652491] Updated weights for policy 0, policy_version 594832 (0.0012) [2024-06-15 18:50:05,957][1648985] Fps is (10 sec: 42590.8, 60 sec: 46419.9, 300 sec: 47207.9). Total num frames: 1218215936. Throughput: 0: 12196.5. Samples: 304657920. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:50:05,957][1648985] Avg episode reward: [(0, '158.360')] [2024-06-15 18:50:07,192][1652491] Updated weights for policy 0, policy_version 594901 (0.0141) [2024-06-15 18:50:08,050][1652491] Updated weights for policy 0, policy_version 594944 (0.0041) [2024-06-15 18:50:09,310][1652491] Updated weights for policy 0, policy_version 595008 (0.0012) [2024-06-15 18:50:10,955][1648985] Fps is (10 sec: 52426.9, 60 sec: 49151.9, 300 sec: 46986.6). Total num frames: 1218576384. Throughput: 0: 12094.5. Samples: 304687616. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:50:10,956][1648985] Avg episode reward: [(0, '161.260')] [2024-06-15 18:50:13,731][1652491] Updated weights for policy 0, policy_version 595063 (0.0130) [2024-06-15 18:50:15,955][1648985] Fps is (10 sec: 49161.2, 60 sec: 46967.5, 300 sec: 47097.8). Total num frames: 1218707456. Throughput: 0: 12174.3. 
Samples: 304760832. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:50:15,955][1648985] Avg episode reward: [(0, '183.690')] [2024-06-15 18:50:16,852][1652491] Updated weights for policy 0, policy_version 595107 (0.0012) [2024-06-15 18:50:18,122][1652491] Updated weights for policy 0, policy_version 595160 (0.0012) [2024-06-15 18:50:18,772][1651469] Signal inference workers to stop experience collection... (31000 times) [2024-06-15 18:50:18,912][1652491] InferenceWorker_p0-w0: stopping experience collection (31000 times) [2024-06-15 18:50:18,990][1651469] Signal inference workers to resume experience collection... (31000 times) [2024-06-15 18:50:18,991][1652491] InferenceWorker_p0-w0: resuming experience collection (31000 times) [2024-06-15 18:50:19,506][1652491] Updated weights for policy 0, policy_version 595218 (0.0105) [2024-06-15 18:50:20,955][1648985] Fps is (10 sec: 52430.5, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 1219100672. Throughput: 0: 12151.5. Samples: 304832000. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:50:20,956][1648985] Avg episode reward: [(0, '197.940')] [2024-06-15 18:50:23,870][1652491] Updated weights for policy 0, policy_version 595283 (0.0013) [2024-06-15 18:50:24,876][1652491] Updated weights for policy 0, policy_version 595326 (0.0013) [2024-06-15 18:50:25,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.9, 300 sec: 47097.1). Total num frames: 1219231744. Throughput: 0: 12208.3. Samples: 304868864. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:50:25,956][1648985] Avg episode reward: [(0, '194.000')] [2024-06-15 18:50:28,501][1652491] Updated weights for policy 0, policy_version 595392 (0.0015) [2024-06-15 18:50:30,421][1652491] Updated weights for policy 0, policy_version 595472 (0.0012) [2024-06-15 18:50:30,956][1648985] Fps is (10 sec: 45873.2, 60 sec: 49151.8, 300 sec: 47319.2). Total num frames: 1219559424. Throughput: 0: 12003.4. Samples: 304934400. 
Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:50:30,956][1648985] Avg episode reward: [(0, '187.920')] [2024-06-15 18:50:35,624][1652491] Updated weights for policy 0, policy_version 595552 (0.0023) [2024-06-15 18:50:35,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.6, 300 sec: 46875.7). Total num frames: 1219690496. Throughput: 0: 11810.1. Samples: 305002496. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:50:35,956][1648985] Avg episode reward: [(0, '183.310')] [2024-06-15 18:50:39,046][1652491] Updated weights for policy 0, policy_version 595585 (0.0029) [2024-06-15 18:50:40,955][1648985] Fps is (10 sec: 32769.5, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 1219887104. Throughput: 0: 11889.8. Samples: 305044480. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:50:40,956][1648985] Avg episode reward: [(0, '182.060')] [2024-06-15 18:50:41,089][1652491] Updated weights for policy 0, policy_version 595652 (0.0012) [2024-06-15 18:50:43,143][1652491] Updated weights for policy 0, policy_version 595733 (0.0014) [2024-06-15 18:50:45,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 1220149248. Throughput: 0: 11537.1. Samples: 305095680. Policy #0 lag: (min: 47.0, avg: 143.4, max: 303.0) [2024-06-15 18:50:45,955][1648985] Avg episode reward: [(0, '187.590')] [2024-06-15 18:50:46,987][1652491] Updated weights for policy 0, policy_version 595793 (0.0014) [2024-06-15 18:50:50,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 45875.1, 300 sec: 46541.7). Total num frames: 1220280320. Throughput: 0: 11366.8. Samples: 305169408. 
Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:50:50,956][1648985] Avg episode reward: [(0, '169.050')] [2024-06-15 18:50:51,338][1652491] Updated weights for policy 0, policy_version 595842 (0.0018) [2024-06-15 18:50:52,808][1652491] Updated weights for policy 0, policy_version 595904 (0.0013) [2024-06-15 18:50:54,689][1652491] Updated weights for policy 0, policy_version 595969 (0.0012) [2024-06-15 18:50:55,814][1652491] Updated weights for policy 0, policy_version 596031 (0.0012) [2024-06-15 18:50:55,955][1648985] Fps is (10 sec: 52426.3, 60 sec: 48059.5, 300 sec: 46653.8). Total num frames: 1220673536. Throughput: 0: 11332.3. Samples: 305197568. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:50:55,956][1648985] Avg episode reward: [(0, '164.600')] [2024-06-15 18:50:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000596032_1220673536.pth... [2024-06-15 18:50:56,013][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000590528_1209401344.pth [2024-06-15 18:50:56,016][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000596032_1220673536.pth [2024-06-15 18:50:59,389][1652491] Updated weights for policy 0, policy_version 596087 (0.0014) [2024-06-15 18:51:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1220804608. Throughput: 0: 11252.6. Samples: 305267200. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:51:00,956][1648985] Avg episode reward: [(0, '159.930')] [2024-06-15 18:51:02,993][1652491] Updated weights for policy 0, policy_version 596112 (0.0012) [2024-06-15 18:51:03,611][1651469] Signal inference workers to stop experience collection... 
(31050 times) [2024-06-15 18:51:03,662][1652491] InferenceWorker_p0-w0: stopping experience collection (31050 times) [2024-06-15 18:51:03,952][1651469] Signal inference workers to resume experience collection... (31050 times) [2024-06-15 18:51:03,954][1652491] InferenceWorker_p0-w0: resuming experience collection (31050 times) [2024-06-15 18:51:05,431][1652491] Updated weights for policy 0, policy_version 596192 (0.0126) [2024-06-15 18:51:05,955][1648985] Fps is (10 sec: 36046.0, 60 sec: 46968.8, 300 sec: 46208.4). Total num frames: 1221033984. Throughput: 0: 11104.7. Samples: 305331712. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:51:05,956][1648985] Avg episode reward: [(0, '165.070')] [2024-06-15 18:51:07,009][1652491] Updated weights for policy 0, policy_version 596257 (0.0013) [2024-06-15 18:51:10,413][1652491] Updated weights for policy 0, policy_version 596323 (0.0012) [2024-06-15 18:51:10,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 45329.1, 300 sec: 46985.9). Total num frames: 1221296128. Throughput: 0: 11047.8. Samples: 305366016. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:51:10,956][1648985] Avg episode reward: [(0, '170.440')] [2024-06-15 18:51:14,552][1652491] Updated weights for policy 0, policy_version 596355 (0.0020) [2024-06-15 18:51:15,777][1652491] Updated weights for policy 0, policy_version 596413 (0.0014) [2024-06-15 18:51:15,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 1221459968. Throughput: 0: 11275.5. Samples: 305441792. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:51:15,955][1648985] Avg episode reward: [(0, '175.350')] [2024-06-15 18:51:17,853][1652491] Updated weights for policy 0, policy_version 596483 (0.0107) [2024-06-15 18:51:19,165][1652491] Updated weights for policy 0, policy_version 596535 (0.0011) [2024-06-15 18:51:20,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 43690.7, 300 sec: 46652.8). 
Total num frames: 1221722112. Throughput: 0: 11104.7. Samples: 305502208. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:51:20,956][1648985] Avg episode reward: [(0, '174.170')] [2024-06-15 18:51:22,526][1652491] Updated weights for policy 0, policy_version 596608 (0.0157) [2024-06-15 18:51:25,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 43690.6, 300 sec: 46323.1). Total num frames: 1221853184. Throughput: 0: 10990.9. Samples: 305539072. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:51:25,956][1648985] Avg episode reward: [(0, '155.500')] [2024-06-15 18:51:27,144][1652491] Updated weights for policy 0, policy_version 596656 (0.0013) [2024-06-15 18:51:28,861][1652491] Updated weights for policy 0, policy_version 596708 (0.0015) [2024-06-15 18:51:30,714][1652491] Updated weights for policy 0, policy_version 596789 (0.0013) [2024-06-15 18:51:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 44783.3, 300 sec: 46652.7). Total num frames: 1222246400. Throughput: 0: 11343.6. Samples: 305606144. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:51:30,956][1648985] Avg episode reward: [(0, '147.080')] [2024-06-15 18:51:33,510][1652491] Updated weights for policy 0, policy_version 596864 (0.0013) [2024-06-15 18:51:35,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 1222377472. Throughput: 0: 11377.8. Samples: 305681408. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:51:35,956][1648985] Avg episode reward: [(0, '141.400')] [2024-06-15 18:51:38,252][1652491] Updated weights for policy 0, policy_version 596927 (0.0013) [2024-06-15 18:51:40,739][1652491] Updated weights for policy 0, policy_version 597012 (0.0014) [2024-06-15 18:51:40,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.4, 300 sec: 46541.7). Total num frames: 1222705152. Throughput: 0: 11605.4. Samples: 305719808. 
Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:51:40,956][1648985] Avg episode reward: [(0, '151.470')] [2024-06-15 18:51:43,562][1652491] Updated weights for policy 0, policy_version 597058 (0.0016) [2024-06-15 18:51:44,283][1651469] Signal inference workers to stop experience collection... (31100 times) [2024-06-15 18:51:44,320][1652491] InferenceWorker_p0-w0: stopping experience collection (31100 times) [2024-06-15 18:51:44,496][1651469] Signal inference workers to resume experience collection... (31100 times) [2024-06-15 18:51:44,496][1652491] InferenceWorker_p0-w0: resuming experience collection (31100 times) [2024-06-15 18:51:45,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.1, 300 sec: 46765.6). Total num frames: 1222901760. Throughput: 0: 11446.1. Samples: 305782272. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:51:45,956][1648985] Avg episode reward: [(0, '160.680')] [2024-06-15 18:51:48,932][1652491] Updated weights for policy 0, policy_version 597139 (0.0014) [2024-06-15 18:51:50,156][1652491] Updated weights for policy 0, policy_version 597185 (0.0013) [2024-06-15 18:51:50,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 47513.7, 300 sec: 46545.5). Total num frames: 1223131136. Throughput: 0: 11616.7. Samples: 305854464. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:51:50,955][1648985] Avg episode reward: [(0, '168.420')] [2024-06-15 18:51:51,890][1652491] Updated weights for policy 0, policy_version 597270 (0.0017) [2024-06-15 18:51:55,635][1652491] Updated weights for policy 0, policy_version 597329 (0.0080) [2024-06-15 18:51:55,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 44783.1, 300 sec: 46874.9). Total num frames: 1223360512. Throughput: 0: 11537.1. Samples: 305885184. 
Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:51:55,956][1648985] Avg episode reward: [(0, '172.920')] [2024-06-15 18:51:59,834][1652491] Updated weights for policy 0, policy_version 597378 (0.0012) [2024-06-15 18:52:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45329.2, 300 sec: 46431.1). Total num frames: 1223524352. Throughput: 0: 11650.8. Samples: 305966080. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:52:00,956][1648985] Avg episode reward: [(0, '178.070')] [2024-06-15 18:52:01,846][1652491] Updated weights for policy 0, policy_version 597459 (0.0013) [2024-06-15 18:52:04,105][1652491] Updated weights for policy 0, policy_version 597552 (0.0013) [2024-06-15 18:52:05,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 46421.4, 300 sec: 46652.7). Total num frames: 1223819264. Throughput: 0: 11491.6. Samples: 306019328. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:52:05,956][1648985] Avg episode reward: [(0, '164.400')] [2024-06-15 18:52:08,555][1652491] Updated weights for policy 0, policy_version 597626 (0.0014) [2024-06-15 18:52:10,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44237.0, 300 sec: 46430.6). Total num frames: 1223950336. Throughput: 0: 11468.8. Samples: 306055168. Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:52:10,956][1648985] Avg episode reward: [(0, '157.240')] [2024-06-15 18:52:12,395][1652491] Updated weights for policy 0, policy_version 597665 (0.0119) [2024-06-15 18:52:14,104][1652491] Updated weights for policy 0, policy_version 597729 (0.0089) [2024-06-15 18:52:15,855][1652491] Updated weights for policy 0, policy_version 597793 (0.0013) [2024-06-15 18:52:15,955][1648985] Fps is (10 sec: 45873.6, 60 sec: 46967.1, 300 sec: 46430.7). Total num frames: 1224278016. Throughput: 0: 11468.7. Samples: 306122240. 
Policy #0 lag: (min: 15.0, avg: 141.4, max: 271.0) [2024-06-15 18:52:15,956][1648985] Avg episode reward: [(0, '153.100')] [2024-06-15 18:52:16,471][1652491] Updated weights for policy 0, policy_version 597824 (0.0013) [2024-06-15 18:52:20,223][1652491] Updated weights for policy 0, policy_version 597882 (0.0031) [2024-06-15 18:52:20,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.1, 300 sec: 46763.8). Total num frames: 1224474624. Throughput: 0: 11400.5. Samples: 306194432. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:52:20,956][1648985] Avg episode reward: [(0, '155.530')] [2024-06-15 18:52:23,782][1652491] Updated weights for policy 0, policy_version 597922 (0.0036) [2024-06-15 18:52:25,669][1652491] Updated weights for policy 0, policy_version 598000 (0.0013) [2024-06-15 18:52:25,955][1648985] Fps is (10 sec: 42599.9, 60 sec: 47513.7, 300 sec: 46208.4). Total num frames: 1224704000. Throughput: 0: 11468.8. Samples: 306235904. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:52:25,956][1648985] Avg episode reward: [(0, '157.040')] [2024-06-15 18:52:26,212][1651469] Signal inference workers to stop experience collection... (31150 times) [2024-06-15 18:52:26,253][1652491] InferenceWorker_p0-w0: stopping experience collection (31150 times) [2024-06-15 18:52:26,449][1651469] Signal inference workers to resume experience collection... (31150 times) [2024-06-15 18:52:26,450][1652491] InferenceWorker_p0-w0: resuming experience collection (31150 times) [2024-06-15 18:52:27,488][1652491] Updated weights for policy 0, policy_version 598079 (0.0013) [2024-06-15 18:52:30,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 44783.0, 300 sec: 46764.1). Total num frames: 1224933376. Throughput: 0: 11309.6. Samples: 306291200. 
Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:52:30,955][1648985] Avg episode reward: [(0, '156.630')] [2024-06-15 18:52:31,376][1652491] Updated weights for policy 0, policy_version 598139 (0.0013) [2024-06-15 18:52:35,855][1652491] Updated weights for policy 0, policy_version 598192 (0.0020) [2024-06-15 18:52:35,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45329.1, 300 sec: 46208.5). Total num frames: 1225097216. Throughput: 0: 11320.9. Samples: 306363904. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:52:35,956][1648985] Avg episode reward: [(0, '149.910')] [2024-06-15 18:52:37,586][1652491] Updated weights for policy 0, policy_version 598256 (0.0051) [2024-06-15 18:52:39,067][1652491] Updated weights for policy 0, policy_version 598332 (0.0112) [2024-06-15 18:52:40,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 44782.9, 300 sec: 46653.5). Total num frames: 1225392128. Throughput: 0: 11218.5. Samples: 306390016. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:52:40,956][1648985] Avg episode reward: [(0, '142.990')] [2024-06-15 18:52:42,517][1652491] Updated weights for policy 0, policy_version 598384 (0.0030) [2024-06-15 18:52:45,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 1225523200. Throughput: 0: 11173.0. Samples: 306468864. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:52:45,955][1648985] Avg episode reward: [(0, '146.860')] [2024-06-15 18:52:46,844][1652491] Updated weights for policy 0, policy_version 598419 (0.0033) [2024-06-15 18:52:48,133][1652491] Updated weights for policy 0, policy_version 598469 (0.0013) [2024-06-15 18:52:49,162][1652491] Updated weights for policy 0, policy_version 598515 (0.0014) [2024-06-15 18:52:50,283][1652491] Updated weights for policy 0, policy_version 598567 (0.0013) [2024-06-15 18:52:50,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.3, 300 sec: 46652.8). Total num frames: 1225916416. 
Throughput: 0: 11502.9. Samples: 306536960. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:52:50,955][1648985] Avg episode reward: [(0, '145.760')] [2024-06-15 18:52:52,287][1652491] Updated weights for policy 0, policy_version 598598 (0.0013) [2024-06-15 18:52:55,955][1648985] Fps is (10 sec: 52427.2, 60 sec: 44782.9, 300 sec: 46433.6). Total num frames: 1226047488. Throughput: 0: 11457.4. Samples: 306570752. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:52:55,956][1648985] Avg episode reward: [(0, '148.280')] [2024-06-15 18:52:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000598656_1226047488.pth... [2024-06-15 18:52:56,005][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000593264_1215004672.pth [2024-06-15 18:52:57,406][1652491] Updated weights for policy 0, policy_version 598657 (0.0012) [2024-06-15 18:52:58,482][1652491] Updated weights for policy 0, policy_version 598716 (0.0013) [2024-06-15 18:53:00,018][1652491] Updated weights for policy 0, policy_version 598768 (0.0014) [2024-06-15 18:53:00,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46967.5, 300 sec: 46319.5). Total num frames: 1226342400. Throughput: 0: 11571.3. Samples: 306642944. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:53:00,955][1648985] Avg episode reward: [(0, '153.610')] [2024-06-15 18:53:01,520][1652491] Updated weights for policy 0, policy_version 598832 (0.0012) [2024-06-15 18:53:03,983][1652491] Updated weights for policy 0, policy_version 598866 (0.0011) [2024-06-15 18:53:05,955][1648985] Fps is (10 sec: 52431.0, 60 sec: 45875.3, 300 sec: 46541.7). Total num frames: 1226571776. Throughput: 0: 11491.6. Samples: 306711552. 
Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:53:05,955][1648985] Avg episode reward: [(0, '170.880')] [2024-06-15 18:53:09,282][1652491] Updated weights for policy 0, policy_version 598930 (0.0011) [2024-06-15 18:53:10,473][1651469] Signal inference workers to stop experience collection... (31200 times) [2024-06-15 18:53:10,539][1652491] InferenceWorker_p0-w0: stopping experience collection (31200 times) [2024-06-15 18:53:10,724][1651469] Signal inference workers to resume experience collection... (31200 times) [2024-06-15 18:53:10,724][1652491] InferenceWorker_p0-w0: resuming experience collection (31200 times) [2024-06-15 18:53:10,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 1226735616. Throughput: 0: 11468.8. Samples: 306752000. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:53:10,956][1648985] Avg episode reward: [(0, '167.960')] [2024-06-15 18:53:10,956][1652491] Updated weights for policy 0, policy_version 598999 (0.0012) [2024-06-15 18:53:12,104][1652491] Updated weights for policy 0, policy_version 599056 (0.0013) [2024-06-15 18:53:14,940][1652491] Updated weights for policy 0, policy_version 599106 (0.0014) [2024-06-15 18:53:15,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 45875.5, 300 sec: 46430.6). Total num frames: 1227030528. Throughput: 0: 11707.7. Samples: 306818048. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:53:15,956][1648985] Avg episode reward: [(0, '169.110')] [2024-06-15 18:53:16,473][1652491] Updated weights for policy 0, policy_version 599168 (0.0011) [2024-06-15 18:53:20,530][1652491] Updated weights for policy 0, policy_version 599223 (0.0037) [2024-06-15 18:53:20,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45875.3, 300 sec: 46541.7). Total num frames: 1227227136. Throughput: 0: 11798.7. Samples: 306894848. 
Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:53:20,956][1648985] Avg episode reward: [(0, '178.220')] [2024-06-15 18:53:22,178][1652491] Updated weights for policy 0, policy_version 599266 (0.0015) [2024-06-15 18:53:23,548][1652491] Updated weights for policy 0, policy_version 599344 (0.0013) [2024-06-15 18:53:25,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 1227522048. Throughput: 0: 11901.2. Samples: 306925568. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:53:25,956][1648985] Avg episode reward: [(0, '175.820')] [2024-06-15 18:53:26,026][1652491] Updated weights for policy 0, policy_version 599392 (0.0015) [2024-06-15 18:53:30,043][1652491] Updated weights for policy 0, policy_version 599440 (0.0013) [2024-06-15 18:53:30,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 1227718656. Throughput: 0: 11969.4. Samples: 307007488. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:53:30,955][1648985] Avg episode reward: [(0, '171.400')] [2024-06-15 18:53:31,004][1652491] Updated weights for policy 0, policy_version 599481 (0.0012) [2024-06-15 18:53:32,455][1652491] Updated weights for policy 0, policy_version 599520 (0.0012) [2024-06-15 18:53:34,382][1652491] Updated weights for policy 0, policy_version 599615 (0.0205) [2024-06-15 18:53:35,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48605.9, 300 sec: 46652.8). Total num frames: 1228013568. Throughput: 0: 12026.3. Samples: 307078144. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:53:35,955][1648985] Avg episode reward: [(0, '181.860')] [2024-06-15 18:53:36,904][1652491] Updated weights for policy 0, policy_version 599671 (0.0011) [2024-06-15 18:53:40,917][1652491] Updated weights for policy 0, policy_version 599713 (0.0130) [2024-06-15 18:53:40,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 1228210176. 
Throughput: 0: 12185.6. Samples: 307119104. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:53:40,956][1648985] Avg episode reward: [(0, '169.710')] [2024-06-15 18:53:42,979][1652491] Updated weights for policy 0, policy_version 599760 (0.0012) [2024-06-15 18:53:44,042][1652491] Updated weights for policy 0, policy_version 599809 (0.0050) [2024-06-15 18:53:45,373][1652491] Updated weights for policy 0, policy_version 599872 (0.0013) [2024-06-15 18:53:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 50244.2, 300 sec: 46763.8). Total num frames: 1228537856. Throughput: 0: 12014.9. Samples: 307183616. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:53:45,956][1648985] Avg episode reward: [(0, '168.150')] [2024-06-15 18:53:48,174][1652491] Updated weights for policy 0, policy_version 599936 (0.0013) [2024-06-15 18:53:50,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 1228668928. Throughput: 0: 12174.2. Samples: 307259392. Policy #0 lag: (min: 48.0, avg: 165.9, max: 304.0) [2024-06-15 18:53:50,956][1648985] Avg episode reward: [(0, '164.250')] [2024-06-15 18:53:51,557][1651469] Signal inference workers to stop experience collection... (31250 times) [2024-06-15 18:53:51,608][1652491] InferenceWorker_p0-w0: stopping experience collection (31250 times) [2024-06-15 18:53:51,821][1651469] Signal inference workers to resume experience collection... (31250 times) [2024-06-15 18:53:51,821][1652491] InferenceWorker_p0-w0: resuming experience collection (31250 times) [2024-06-15 18:53:52,275][1652491] Updated weights for policy 0, policy_version 599984 (0.0019) [2024-06-15 18:53:54,981][1652491] Updated weights for policy 0, policy_version 600055 (0.0012) [2024-06-15 18:53:55,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 49152.0, 300 sec: 46874.8). Total num frames: 1228996608. Throughput: 0: 12060.4. Samples: 307294720. 
Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:53:55,956][1648985] Avg episode reward: [(0, '162.740')] [2024-06-15 18:53:56,075][1652491] Updated weights for policy 0, policy_version 600112 (0.0014) [2024-06-15 18:53:58,575][1652491] Updated weights for policy 0, policy_version 600153 (0.0128) [2024-06-15 18:53:59,284][1652491] Updated weights for policy 0, policy_version 600192 (0.0024) [2024-06-15 18:54:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 1229193216. Throughput: 0: 12197.0. Samples: 307366912. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:54:00,956][1648985] Avg episode reward: [(0, '176.480')] [2024-06-15 18:54:03,463][1652491] Updated weights for policy 0, policy_version 600256 (0.0014) [2024-06-15 18:54:05,937][1652491] Updated weights for policy 0, policy_version 600310 (0.0016) [2024-06-15 18:54:05,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 47513.4, 300 sec: 46763.8). Total num frames: 1229422592. Throughput: 0: 12014.9. Samples: 307435520. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:54:05,956][1648985] Avg episode reward: [(0, '197.000')] [2024-06-15 18:54:06,991][1652491] Updated weights for policy 0, policy_version 600368 (0.0012) [2024-06-15 18:54:09,394][1652491] Updated weights for policy 0, policy_version 600405 (0.0034) [2024-06-15 18:54:10,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 49698.1, 300 sec: 46874.9). Total num frames: 1229717504. Throughput: 0: 12276.6. Samples: 307478016. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:54:10,956][1648985] Avg episode reward: [(0, '200.340')] [2024-06-15 18:54:12,992][1652491] Updated weights for policy 0, policy_version 600464 (0.0020) [2024-06-15 18:54:14,101][1652491] Updated weights for policy 0, policy_version 600508 (0.0019) [2024-06-15 18:54:15,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 47513.5, 300 sec: 46763.8). Total num frames: 1229881344. 
Throughput: 0: 11992.2. Samples: 307547136. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:54:15,956][1648985] Avg episode reward: [(0, '171.410')] [2024-06-15 18:54:16,436][1652491] Updated weights for policy 0, policy_version 600546 (0.0023) [2024-06-15 18:54:18,024][1652491] Updated weights for policy 0, policy_version 600624 (0.0014) [2024-06-15 18:54:20,850][1652491] Updated weights for policy 0, policy_version 600688 (0.0012) [2024-06-15 18:54:20,973][1648985] Fps is (10 sec: 49065.5, 60 sec: 49683.6, 300 sec: 46983.2). Total num frames: 1230209024. Throughput: 0: 11919.2. Samples: 307614720. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:54:20,973][1648985] Avg episode reward: [(0, '149.160')] [2024-06-15 18:54:24,540][1652491] Updated weights for policy 0, policy_version 600736 (0.0022) [2024-06-15 18:54:25,389][1652491] Updated weights for policy 0, policy_version 600766 (0.0025) [2024-06-15 18:54:25,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 47513.5, 300 sec: 46652.8). Total num frames: 1230372864. Throughput: 0: 11958.1. Samples: 307657216. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:54:25,956][1648985] Avg episode reward: [(0, '148.190')] [2024-06-15 18:54:27,263][1652491] Updated weights for policy 0, policy_version 600816 (0.0014) [2024-06-15 18:54:28,544][1652491] Updated weights for policy 0, policy_version 600866 (0.0016) [2024-06-15 18:54:30,779][1652491] Updated weights for policy 0, policy_version 600912 (0.0012) [2024-06-15 18:54:30,957][1648985] Fps is (10 sec: 45947.7, 60 sec: 49150.5, 300 sec: 46763.6). Total num frames: 1230667776. Throughput: 0: 12128.2. Samples: 307729408. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:54:30,958][1648985] Avg episode reward: [(0, '146.370')] [2024-06-15 18:54:33,898][1651469] Signal inference workers to stop experience collection... 
(31300 times) [2024-06-15 18:54:33,946][1652491] InferenceWorker_p0-w0: stopping experience collection (31300 times) [2024-06-15 18:54:33,976][1652491] Updated weights for policy 0, policy_version 600964 (0.0014) [2024-06-15 18:54:34,173][1651469] Signal inference workers to resume experience collection... (31300 times) [2024-06-15 18:54:34,174][1652491] InferenceWorker_p0-w0: resuming experience collection (31300 times) [2024-06-15 18:54:35,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 1230897152. Throughput: 0: 12083.2. Samples: 307803136. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:54:35,956][1648985] Avg episode reward: [(0, '160.160')] [2024-06-15 18:54:36,551][1652491] Updated weights for policy 0, policy_version 601040 (0.0110) [2024-06-15 18:54:39,478][1652491] Updated weights for policy 0, policy_version 601120 (0.0012) [2024-06-15 18:54:40,955][1648985] Fps is (10 sec: 49161.2, 60 sec: 49152.2, 300 sec: 46986.0). Total num frames: 1231159296. Throughput: 0: 12037.8. Samples: 307836416. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:54:40,956][1648985] Avg episode reward: [(0, '163.350')] [2024-06-15 18:54:42,318][1652491] Updated weights for policy 0, policy_version 601171 (0.0015) [2024-06-15 18:54:43,195][1652491] Updated weights for policy 0, policy_version 601212 (0.0010) [2024-06-15 18:54:45,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 47513.4, 300 sec: 46985.9). Total num frames: 1231388672. Throughput: 0: 12231.1. Samples: 307917312. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:54:45,956][1648985] Avg episode reward: [(0, '165.430')] [2024-06-15 18:54:46,148][1652491] Updated weights for policy 0, policy_version 601276 (0.0020) [2024-06-15 18:54:48,009][1652491] Updated weights for policy 0, policy_version 601335 (0.0027) [2024-06-15 18:54:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 46874.9). Total num frames: 1231618048. 
Throughput: 0: 12060.5. Samples: 307978240. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:54:50,956][1648985] Avg episode reward: [(0, '167.090')] [2024-06-15 18:54:51,083][1652491] Updated weights for policy 0, policy_version 601380 (0.0013) [2024-06-15 18:54:53,128][1652491] Updated weights for policy 0, policy_version 601440 (0.0022) [2024-06-15 18:54:55,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 46967.6, 300 sec: 46652.7). Total num frames: 1231814656. Throughput: 0: 11912.5. Samples: 308014080. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:54:55,956][1648985] Avg episode reward: [(0, '159.670')] [2024-06-15 18:54:56,064][1652491] Updated weights for policy 0, policy_version 601474 (0.0011) [2024-06-15 18:54:56,283][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000601488_1231847424.pth... [2024-06-15 18:54:56,457][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000596032_1220673536.pth [2024-06-15 18:54:58,336][1652491] Updated weights for policy 0, policy_version 601568 (0.0119) [2024-06-15 18:55:00,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 48059.6, 300 sec: 46986.2). Total num frames: 1232076800. Throughput: 0: 12049.0. Samples: 308089344. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:55:00,957][1648985] Avg episode reward: [(0, '178.310')] [2024-06-15 18:55:01,418][1652491] Updated weights for policy 0, policy_version 601601 (0.0020) [2024-06-15 18:55:02,650][1652491] Updated weights for policy 0, policy_version 601664 (0.0024) [2024-06-15 18:55:04,162][1652491] Updated weights for policy 0, policy_version 601723 (0.0012) [2024-06-15 18:55:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48605.9, 300 sec: 46652.8). Total num frames: 1232338944. Throughput: 0: 12292.8. Samples: 308167680. 
Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:55:05,956][1648985] Avg episode reward: [(0, '183.060')] [2024-06-15 18:55:08,608][1652491] Updated weights for policy 0, policy_version 601792 (0.0013) [2024-06-15 18:55:09,885][1652491] Updated weights for policy 0, policy_version 601842 (0.0012) [2024-06-15 18:55:10,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1232601088. Throughput: 0: 12049.1. Samples: 308199424. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:55:10,956][1648985] Avg episode reward: [(0, '185.110')] [2024-06-15 18:55:12,227][1652491] Updated weights for policy 0, policy_version 601872 (0.0013) [2024-06-15 18:55:13,058][1652491] Updated weights for policy 0, policy_version 601913 (0.0013) [2024-06-15 18:55:14,194][1651469] Signal inference workers to stop experience collection... (31350 times) [2024-06-15 18:55:14,258][1652491] InferenceWorker_p0-w0: stopping experience collection (31350 times) [2024-06-15 18:55:14,437][1651469] Signal inference workers to resume experience collection... (31350 times) [2024-06-15 18:55:14,438][1652491] InferenceWorker_p0-w0: resuming experience collection (31350 times) [2024-06-15 18:55:14,622][1652491] Updated weights for policy 0, policy_version 601977 (0.0018) [2024-06-15 18:55:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 49698.2, 300 sec: 46652.8). Total num frames: 1232863232. Throughput: 0: 11958.5. Samples: 308267520. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:55:15,955][1648985] Avg episode reward: [(0, '157.780')] [2024-06-15 18:55:19,537][1652491] Updated weights for policy 0, policy_version 602034 (0.0013) [2024-06-15 18:55:20,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 48073.9, 300 sec: 46986.0). Total num frames: 1233092608. Throughput: 0: 12060.5. Samples: 308345856. 
Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:55:20,955][1648985] Avg episode reward: [(0, '143.710')] [2024-06-15 18:55:21,164][1652491] Updated weights for policy 0, policy_version 602109 (0.0034) [2024-06-15 18:55:24,138][1652491] Updated weights for policy 0, policy_version 602160 (0.0013) [2024-06-15 18:55:25,604][1652491] Updated weights for policy 0, policy_version 602224 (0.0012) [2024-06-15 18:55:25,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 50244.3, 300 sec: 46875.0). Total num frames: 1233387520. Throughput: 0: 12094.6. Samples: 308380672. Policy #0 lag: (min: 1.0, avg: 93.2, max: 257.0) [2024-06-15 18:55:25,956][1648985] Avg episode reward: [(0, '150.710')] [2024-06-15 18:55:30,460][1652491] Updated weights for policy 0, policy_version 602293 (0.0104) [2024-06-15 18:55:30,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48061.2, 300 sec: 46986.0). Total num frames: 1233551360. Throughput: 0: 12003.6. Samples: 308457472. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:55:30,956][1648985] Avg episode reward: [(0, '158.570')] [2024-06-15 18:55:31,638][1652491] Updated weights for policy 0, policy_version 602355 (0.0021) [2024-06-15 18:55:34,574][1652491] Updated weights for policy 0, policy_version 602388 (0.0011) [2024-06-15 18:55:35,820][1652491] Updated weights for policy 0, policy_version 602448 (0.0013) [2024-06-15 18:55:35,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 1233813504. Throughput: 0: 12197.0. Samples: 308527104. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:55:35,956][1648985] Avg episode reward: [(0, '173.030')] [2024-06-15 18:55:36,929][1652491] Updated weights for policy 0, policy_version 602496 (0.0023) [2024-06-15 18:55:40,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 47513.4, 300 sec: 46985.9). Total num frames: 1234010112. Throughput: 0: 12299.3. Samples: 308567552. 
Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:55:40,956][1648985] Avg episode reward: [(0, '152.180')] [2024-06-15 18:55:41,501][1652491] Updated weights for policy 0, policy_version 602576 (0.0016) [2024-06-15 18:55:42,759][1652491] Updated weights for policy 0, policy_version 602619 (0.0012) [2024-06-15 18:55:45,645][1652491] Updated weights for policy 0, policy_version 602688 (0.0013) [2024-06-15 18:55:45,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48606.1, 300 sec: 47541.4). Total num frames: 1234305024. Throughput: 0: 12231.2. Samples: 308639744. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:55:45,956][1648985] Avg episode reward: [(0, '139.410')] [2024-06-15 18:55:46,911][1652491] Updated weights for policy 0, policy_version 602752 (0.0019) [2024-06-15 18:55:50,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 47513.6, 300 sec: 46763.9). Total num frames: 1234468864. Throughput: 0: 12060.4. Samples: 308710400. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:55:50,956][1648985] Avg episode reward: [(0, '146.930')] [2024-06-15 18:55:52,110][1652491] Updated weights for policy 0, policy_version 602803 (0.0013) [2024-06-15 18:55:53,976][1652491] Updated weights for policy 0, policy_version 602873 (0.0013) [2024-06-15 18:55:55,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 48059.6, 300 sec: 47097.1). Total num frames: 1234698240. Throughput: 0: 11855.6. Samples: 308732928. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:55:55,956][1648985] Avg episode reward: [(0, '174.450')] [2024-06-15 18:55:56,459][1651469] Signal inference workers to stop experience collection... (31400 times) [2024-06-15 18:55:56,527][1652491] InferenceWorker_p0-w0: stopping experience collection (31400 times) [2024-06-15 18:55:56,680][1651469] Signal inference workers to resume experience collection... 
(31400 times) [2024-06-15 18:55:56,680][1652491] InferenceWorker_p0-w0: resuming experience collection (31400 times) [2024-06-15 18:55:57,618][1652491] Updated weights for policy 0, policy_version 602941 (0.0013) [2024-06-15 18:55:59,129][1652491] Updated weights for policy 0, policy_version 603007 (0.0013) [2024-06-15 18:56:00,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48059.9, 300 sec: 47208.1). Total num frames: 1234960384. Throughput: 0: 11923.9. Samples: 308804096. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:56:00,956][1648985] Avg episode reward: [(0, '176.150')] [2024-06-15 18:56:03,478][1652491] Updated weights for policy 0, policy_version 603071 (0.0014) [2024-06-15 18:56:05,328][1652491] Updated weights for policy 0, policy_version 603130 (0.0012) [2024-06-15 18:56:05,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48059.7, 300 sec: 47208.2). Total num frames: 1235222528. Throughput: 0: 11662.2. Samples: 308870656. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:56:05,955][1648985] Avg episode reward: [(0, '153.320')] [2024-06-15 18:56:08,275][1652491] Updated weights for policy 0, policy_version 603173 (0.0012) [2024-06-15 18:56:09,991][1652491] Updated weights for policy 0, policy_version 603263 (0.0111) [2024-06-15 18:56:10,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48059.6, 300 sec: 47541.3). Total num frames: 1235484672. Throughput: 0: 11764.6. Samples: 308910080. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:56:10,956][1648985] Avg episode reward: [(0, '147.180')] [2024-06-15 18:56:14,674][1652491] Updated weights for policy 0, policy_version 603317 (0.0013) [2024-06-15 18:56:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.3, 300 sec: 47208.2). Total num frames: 1235648512. Throughput: 0: 11537.1. Samples: 308976640. 
Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:56:15,956][1648985] Avg episode reward: [(0, '141.970')] [2024-06-15 18:56:16,499][1652491] Updated weights for policy 0, policy_version 603376 (0.0033) [2024-06-15 18:56:19,505][1652491] Updated weights for policy 0, policy_version 603427 (0.0013) [2024-06-15 18:56:20,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 47513.6, 300 sec: 47763.6). Total num frames: 1235943424. Throughput: 0: 11605.3. Samples: 309049344. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:56:20,955][1648985] Avg episode reward: [(0, '142.590')] [2024-06-15 18:56:21,066][1652491] Updated weights for policy 0, policy_version 603491 (0.0013) [2024-06-15 18:56:24,782][1652491] Updated weights for policy 0, policy_version 603537 (0.0015) [2024-06-15 18:56:25,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1236140032. Throughput: 0: 11605.4. Samples: 309089792. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:56:25,955][1648985] Avg episode reward: [(0, '135.770')] [2024-06-15 18:56:26,329][1652491] Updated weights for policy 0, policy_version 603586 (0.0013) [2024-06-15 18:56:27,526][1652491] Updated weights for policy 0, policy_version 603642 (0.0026) [2024-06-15 18:56:30,573][1652491] Updated weights for policy 0, policy_version 603702 (0.0012) [2024-06-15 18:56:30,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1236402176. Throughput: 0: 11616.7. Samples: 309162496. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:56:30,956][1648985] Avg episode reward: [(0, '143.640')] [2024-06-15 18:56:31,876][1652491] Updated weights for policy 0, policy_version 603760 (0.0010) [2024-06-15 18:56:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1236566016. Throughput: 0: 11639.5. Samples: 309234176. 
Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:56:35,955][1648985] Avg episode reward: [(0, '151.370')] [2024-06-15 18:56:35,964][1652491] Updated weights for policy 0, policy_version 603808 (0.0014) [2024-06-15 18:56:37,775][1652491] Updated weights for policy 0, policy_version 603860 (0.0013) [2024-06-15 18:56:40,471][1651469] Signal inference workers to stop experience collection... (31450 times) [2024-06-15 18:56:40,519][1652491] InferenceWorker_p0-w0: stopping experience collection (31450 times) [2024-06-15 18:56:40,520][1652491] Updated weights for policy 0, policy_version 603923 (0.0012) [2024-06-15 18:56:40,783][1651469] Signal inference workers to resume experience collection... (31450 times) [2024-06-15 18:56:40,784][1652491] InferenceWorker_p0-w0: resuming experience collection (31450 times) [2024-06-15 18:56:40,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 47513.9, 300 sec: 47319.3). Total num frames: 1236860928. Throughput: 0: 11787.5. Samples: 309263360. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:56:40,955][1648985] Avg episode reward: [(0, '161.390')] [2024-06-15 18:56:41,926][1652491] Updated weights for policy 0, policy_version 603984 (0.0013) [2024-06-15 18:56:45,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 45875.1, 300 sec: 47208.1). Total num frames: 1237057536. Throughput: 0: 11992.1. Samples: 309343744. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:56:45,956][1648985] Avg episode reward: [(0, '166.190')] [2024-06-15 18:56:46,795][1652491] Updated weights for policy 0, policy_version 604064 (0.0014) [2024-06-15 18:56:48,213][1652491] Updated weights for policy 0, policy_version 604102 (0.0015) [2024-06-15 18:56:49,360][1652491] Updated weights for policy 0, policy_version 604158 (0.0015) [2024-06-15 18:56:50,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 48059.8, 300 sec: 47430.3). Total num frames: 1237352448. Throughput: 0: 12094.6. Samples: 309414912. 
Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:56:50,956][1648985] Avg episode reward: [(0, '159.070')] [2024-06-15 18:56:51,731][1652491] Updated weights for policy 0, policy_version 604215 (0.0014) [2024-06-15 18:56:53,093][1652491] Updated weights for policy 0, policy_version 604256 (0.0011) [2024-06-15 18:56:55,956][1648985] Fps is (10 sec: 52426.4, 60 sec: 48059.4, 300 sec: 47652.3). Total num frames: 1237581824. Throughput: 0: 11901.0. Samples: 309445632. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:56:55,956][1648985] Avg episode reward: [(0, '165.970')] [2024-06-15 18:56:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000604288_1237581824.pth... [2024-06-15 18:56:56,011][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000598656_1226047488.pth [2024-06-15 18:56:58,248][1652491] Updated weights for policy 0, policy_version 604307 (0.0016) [2024-06-15 18:57:00,644][1652491] Updated weights for policy 0, policy_version 604404 (0.0076) [2024-06-15 18:57:00,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1237843968. Throughput: 0: 12003.5. Samples: 309516800. Policy #0 lag: (min: 0.0, avg: 85.5, max: 256.0) [2024-06-15 18:57:00,955][1648985] Avg episode reward: [(0, '167.180')] [2024-06-15 18:57:03,110][1652491] Updated weights for policy 0, policy_version 604454 (0.0032) [2024-06-15 18:57:04,779][1652491] Updated weights for policy 0, policy_version 604528 (0.0013) [2024-06-15 18:57:05,955][1648985] Fps is (10 sec: 52432.3, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1238106112. Throughput: 0: 11946.7. Samples: 309586944. 
Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:57:05,956][1648985] Avg episode reward: [(0, '171.660')] [2024-06-15 18:57:09,679][1652491] Updated weights for policy 0, policy_version 604578 (0.0013) [2024-06-15 18:57:10,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.4, 300 sec: 47430.3). Total num frames: 1238269952. Throughput: 0: 11946.6. Samples: 309627392. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:57:10,956][1648985] Avg episode reward: [(0, '169.470')] [2024-06-15 18:57:11,767][1652491] Updated weights for policy 0, policy_version 604665 (0.0014) [2024-06-15 18:57:14,663][1652491] Updated weights for policy 0, policy_version 604708 (0.0011) [2024-06-15 18:57:15,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 1238532096. Throughput: 0: 11730.5. Samples: 309690368. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:57:15,956][1648985] Avg episode reward: [(0, '178.590')] [2024-06-15 18:57:16,149][1652491] Updated weights for policy 0, policy_version 604769 (0.0011) [2024-06-15 18:57:20,448][1652491] Updated weights for policy 0, policy_version 604832 (0.0011) [2024-06-15 18:57:20,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.2, 300 sec: 47541.4). Total num frames: 1238728704. Throughput: 0: 11787.3. Samples: 309764608. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:57:20,956][1648985] Avg episode reward: [(0, '178.350')] [2024-06-15 18:57:21,746][1652491] Updated weights for policy 0, policy_version 604880 (0.0013) [2024-06-15 18:57:25,184][1651469] Signal inference workers to stop experience collection... (31500 times) [2024-06-15 18:57:25,217][1652491] InferenceWorker_p0-w0: stopping experience collection (31500 times) [2024-06-15 18:57:25,439][1651469] Signal inference workers to resume experience collection... 
(31500 times) [2024-06-15 18:57:25,440][1652491] InferenceWorker_p0-w0: resuming experience collection (31500 times) [2024-06-15 18:57:25,720][1652491] Updated weights for policy 0, policy_version 604960 (0.0015) [2024-06-15 18:57:25,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 46967.3, 300 sec: 47541.3). Total num frames: 1238958080. Throughput: 0: 11810.0. Samples: 309794816. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:57:25,956][1648985] Avg episode reward: [(0, '160.250')] [2024-06-15 18:57:27,322][1652491] Updated weights for policy 0, policy_version 605024 (0.0013) [2024-06-15 18:57:28,267][1652491] Updated weights for policy 0, policy_version 605056 (0.0021) [2024-06-15 18:57:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45875.2, 300 sec: 47652.4). Total num frames: 1239154688. Throughput: 0: 11605.4. Samples: 309865984. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:57:30,956][1648985] Avg episode reward: [(0, '162.150')] [2024-06-15 18:57:32,504][1652491] Updated weights for policy 0, policy_version 605110 (0.0023) [2024-06-15 18:57:33,696][1652491] Updated weights for policy 0, policy_version 605153 (0.0013) [2024-06-15 18:57:34,417][1652491] Updated weights for policy 0, policy_version 605184 (0.0013) [2024-06-15 18:57:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 47513.4, 300 sec: 47541.3). Total num frames: 1239416832. Throughput: 0: 11605.3. Samples: 309937152. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:57:35,956][1648985] Avg episode reward: [(0, '146.720')] [2024-06-15 18:57:37,395][1652491] Updated weights for policy 0, policy_version 605242 (0.0111) [2024-06-15 18:57:38,421][1652491] Updated weights for policy 0, policy_version 605283 (0.0011) [2024-06-15 18:57:40,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 1239678976. Throughput: 0: 11662.4. Samples: 309970432. 
Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:57:40,955][1648985] Avg episode reward: [(0, '130.930')] [2024-06-15 18:57:43,396][1652491] Updated weights for policy 0, policy_version 605344 (0.0013) [2024-06-15 18:57:44,898][1652491] Updated weights for policy 0, policy_version 605408 (0.0015) [2024-06-15 18:57:45,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1239941120. Throughput: 0: 11685.0. Samples: 310042624. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:57:45,956][1648985] Avg episode reward: [(0, '143.800')] [2024-06-15 18:57:47,738][1652491] Updated weights for policy 0, policy_version 605458 (0.0014) [2024-06-15 18:57:49,611][1652491] Updated weights for policy 0, policy_version 605525 (0.0013) [2024-06-15 18:57:50,466][1652491] Updated weights for policy 0, policy_version 605568 (0.0013) [2024-06-15 18:57:50,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1240203264. Throughput: 0: 11559.8. Samples: 310107136. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:57:50,956][1648985] Avg episode reward: [(0, '150.100')] [2024-06-15 18:57:55,304][1652491] Updated weights for policy 0, policy_version 605637 (0.0013) [2024-06-15 18:57:55,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46421.6, 300 sec: 47541.3). Total num frames: 1240367104. Throughput: 0: 11662.2. Samples: 310152192. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:57:55,956][1648985] Avg episode reward: [(0, '149.910')] [2024-06-15 18:57:56,899][1652491] Updated weights for policy 0, policy_version 605696 (0.0012) [2024-06-15 18:57:59,549][1652491] Updated weights for policy 0, policy_version 605753 (0.0013) [2024-06-15 18:58:00,955][1648985] Fps is (10 sec: 42596.8, 60 sec: 46421.0, 300 sec: 47652.4). Total num frames: 1240629248. Throughput: 0: 11684.9. Samples: 310216192. 
Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:58:00,957][1648985] Avg episode reward: [(0, '152.580')] [2024-06-15 18:58:01,239][1652491] Updated weights for policy 0, policy_version 605796 (0.0064) [2024-06-15 18:58:05,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 44782.9, 300 sec: 47652.4). Total num frames: 1240793088. Throughput: 0: 11650.9. Samples: 310288896. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:58:05,956][1648985] Avg episode reward: [(0, '155.070')] [2024-06-15 18:58:06,046][1652491] Updated weights for policy 0, policy_version 605863 (0.0014) [2024-06-15 18:58:06,675][1651469] Signal inference workers to stop experience collection... (31550 times) [2024-06-15 18:58:06,725][1652491] InferenceWorker_p0-w0: stopping experience collection (31550 times) [2024-06-15 18:58:06,872][1651469] Signal inference workers to resume experience collection... (31550 times) [2024-06-15 18:58:06,873][1652491] InferenceWorker_p0-w0: resuming experience collection (31550 times) [2024-06-15 18:58:07,229][1652491] Updated weights for policy 0, policy_version 605920 (0.0013) [2024-06-15 18:58:10,136][1652491] Updated weights for policy 0, policy_version 605968 (0.0011) [2024-06-15 18:58:10,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 46967.4, 300 sec: 47652.4). Total num frames: 1241088000. Throughput: 0: 11707.7. Samples: 310321664. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:58:10,956][1648985] Avg episode reward: [(0, '170.520')] [2024-06-15 18:58:11,061][1652491] Updated weights for policy 0, policy_version 606016 (0.0015) [2024-06-15 18:58:13,283][1652491] Updated weights for policy 0, policy_version 606080 (0.0113) [2024-06-15 18:58:15,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 45328.9, 300 sec: 47541.3). Total num frames: 1241251840. Throughput: 0: 11776.0. Samples: 310395904. 
Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:58:15,956][1648985] Avg episode reward: [(0, '178.380')] [2024-06-15 18:58:17,238][1652491] Updated weights for policy 0, policy_version 606133 (0.0043) [2024-06-15 18:58:18,581][1652491] Updated weights for policy 0, policy_version 606202 (0.0013) [2024-06-15 18:58:20,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1241546752. Throughput: 0: 11821.6. Samples: 310469120. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:58:20,956][1648985] Avg episode reward: [(0, '152.420')] [2024-06-15 18:58:21,504][1652491] Updated weights for policy 0, policy_version 606258 (0.0132) [2024-06-15 18:58:24,045][1652491] Updated weights for policy 0, policy_version 606320 (0.0017) [2024-06-15 18:58:25,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 46967.6, 300 sec: 47652.5). Total num frames: 1241776128. Throughput: 0: 11810.1. Samples: 310501888. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:58:25,956][1648985] Avg episode reward: [(0, '168.900')] [2024-06-15 18:58:27,605][1652491] Updated weights for policy 0, policy_version 606368 (0.0013) [2024-06-15 18:58:29,313][1652491] Updated weights for policy 0, policy_version 606448 (0.0015) [2024-06-15 18:58:30,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1242038272. Throughput: 0: 11730.5. Samples: 310570496. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:58:30,956][1648985] Avg episode reward: [(0, '168.300')] [2024-06-15 18:58:31,838][1652491] Updated weights for policy 0, policy_version 606480 (0.0013) [2024-06-15 18:58:32,775][1652491] Updated weights for policy 0, policy_version 606526 (0.0017) [2024-06-15 18:58:34,717][1652491] Updated weights for policy 0, policy_version 606584 (0.0129) [2024-06-15 18:58:35,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.9, 300 sec: 47763.5). Total num frames: 1242300416. 
Throughput: 0: 11969.4. Samples: 310645760. Policy #0 lag: (min: 31.0, avg: 148.2, max: 287.0) [2024-06-15 18:58:35,956][1648985] Avg episode reward: [(0, '154.940')] [2024-06-15 18:58:38,595][1652491] Updated weights for policy 0, policy_version 606627 (0.0124) [2024-06-15 18:58:40,476][1652491] Updated weights for policy 0, policy_version 606710 (0.0012) [2024-06-15 18:58:40,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1242562560. Throughput: 0: 11776.1. Samples: 310682112. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:58:40,955][1648985] Avg episode reward: [(0, '148.410')] [2024-06-15 18:58:44,030][1652491] Updated weights for policy 0, policy_version 606768 (0.0129) [2024-06-15 18:58:45,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 1242726400. Throughput: 0: 11764.7. Samples: 310745600. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:58:45,956][1648985] Avg episode reward: [(0, '151.220')] [2024-06-15 18:58:46,237][1652491] Updated weights for policy 0, policy_version 606817 (0.0013) [2024-06-15 18:58:50,334][1652491] Updated weights for policy 0, policy_version 606866 (0.0028) [2024-06-15 18:58:50,594][1651469] Signal inference workers to stop experience collection... (31600 times) [2024-06-15 18:58:50,639][1652491] InferenceWorker_p0-w0: stopping experience collection (31600 times) [2024-06-15 18:58:50,837][1651469] Signal inference workers to resume experience collection... (31600 times) [2024-06-15 18:58:50,838][1652491] InferenceWorker_p0-w0: resuming experience collection (31600 times) [2024-06-15 18:58:50,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 45329.1, 300 sec: 47208.2). Total num frames: 1242923008. Throughput: 0: 11719.1. Samples: 310816256. 
Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:58:50,956][1648985] Avg episode reward: [(0, '140.470')] [2024-06-15 18:58:52,316][1652491] Updated weights for policy 0, policy_version 606944 (0.0013) [2024-06-15 18:58:54,657][1652491] Updated weights for policy 0, policy_version 606981 (0.0046) [2024-06-15 18:58:55,853][1652491] Updated weights for policy 0, policy_version 607040 (0.0014) [2024-06-15 18:58:55,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 47513.5, 300 sec: 47541.3). Total num frames: 1243217920. Throughput: 0: 11639.4. Samples: 310845440. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:58:55,956][1648985] Avg episode reward: [(0, '135.620')] [2024-06-15 18:58:55,973][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000607040_1243217920.pth... [2024-06-15 18:58:56,034][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000601488_1231847424.pth [2024-06-15 18:58:58,225][1652491] Updated weights for policy 0, policy_version 607102 (0.0041) [2024-06-15 18:59:00,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 45329.2, 300 sec: 47208.1). Total num frames: 1243348992. Throughput: 0: 11582.6. Samples: 310917120. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:59:00,956][1648985] Avg episode reward: [(0, '144.140')] [2024-06-15 18:59:03,571][1652491] Updated weights for policy 0, policy_version 607168 (0.0084) [2024-06-15 18:59:04,748][1652491] Updated weights for policy 0, policy_version 607229 (0.0013) [2024-06-15 18:59:05,955][1648985] Fps is (10 sec: 42599.6, 60 sec: 47513.6, 300 sec: 47208.1). Total num frames: 1243643904. Throughput: 0: 11423.3. Samples: 310983168. 
Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:59:05,956][1648985] Avg episode reward: [(0, '155.310')] [2024-06-15 18:59:06,988][1652491] Updated weights for policy 0, policy_version 607296 (0.0013) [2024-06-15 18:59:09,533][1652491] Updated weights for policy 0, policy_version 607358 (0.0011) [2024-06-15 18:59:10,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 46421.5, 300 sec: 47430.3). Total num frames: 1243873280. Throughput: 0: 11332.3. Samples: 311011840. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:59:10,956][1648985] Avg episode reward: [(0, '161.180')] [2024-06-15 18:59:14,332][1652491] Updated weights for policy 0, policy_version 607408 (0.0123) [2024-06-15 18:59:15,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 47513.7, 300 sec: 47099.9). Total num frames: 1244102656. Throughput: 0: 11446.0. Samples: 311085568. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:59:15,956][1648985] Avg episode reward: [(0, '149.310')] [2024-06-15 18:59:16,185][1652491] Updated weights for policy 0, policy_version 607482 (0.0013) [2024-06-15 18:59:18,065][1652491] Updated weights for policy 0, policy_version 607522 (0.0050) [2024-06-15 18:59:20,861][1652491] Updated weights for policy 0, policy_version 607587 (0.0017) [2024-06-15 18:59:20,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46421.3, 300 sec: 47319.2). Total num frames: 1244332032. Throughput: 0: 11264.0. Samples: 311152640. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:59:20,956][1648985] Avg episode reward: [(0, '147.060')] [2024-06-15 18:59:24,919][1652491] Updated weights for policy 0, policy_version 607653 (0.0038) [2024-06-15 18:59:25,856][1652491] Updated weights for policy 0, policy_version 607696 (0.0013) [2024-06-15 18:59:25,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46421.3, 300 sec: 47097.3). Total num frames: 1244561408. Throughput: 0: 11332.2. Samples: 311192064. 
Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:59:25,956][1648985] Avg episode reward: [(0, '158.550')] [2024-06-15 18:59:28,617][1652491] Updated weights for policy 0, policy_version 607751 (0.0014) [2024-06-15 18:59:29,875][1652491] Updated weights for policy 0, policy_version 607808 (0.0117) [2024-06-15 18:59:30,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1244790784. Throughput: 0: 11366.4. Samples: 311257088. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:59:30,956][1648985] Avg episode reward: [(0, '172.990')] [2024-06-15 18:59:35,578][1652491] Updated weights for policy 0, policy_version 607873 (0.0016) [2024-06-15 18:59:35,894][1651469] Signal inference workers to stop experience collection... (31650 times) [2024-06-15 18:59:35,924][1652491] InferenceWorker_p0-w0: stopping experience collection (31650 times) [2024-06-15 18:59:35,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 44236.7, 300 sec: 46763.8). Total num frames: 1244954624. Throughput: 0: 11502.9. Samples: 311333888. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:59:35,956][1648985] Avg episode reward: [(0, '164.890')] [2024-06-15 18:59:36,101][1651469] Signal inference workers to resume experience collection... (31650 times) [2024-06-15 18:59:36,102][1652491] InferenceWorker_p0-w0: resuming experience collection (31650 times) [2024-06-15 18:59:36,922][1652491] Updated weights for policy 0, policy_version 607937 (0.0082) [2024-06-15 18:59:38,012][1652491] Updated weights for policy 0, policy_version 607992 (0.0013) [2024-06-15 18:59:40,223][1652491] Updated weights for policy 0, policy_version 608048 (0.0015) [2024-06-15 18:59:40,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 45875.0, 300 sec: 47208.2). Total num frames: 1245315072. Throughput: 0: 11616.8. Samples: 311368192. 
Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:59:40,956][1648985] Avg episode reward: [(0, '172.130')] [2024-06-15 18:59:42,483][1652491] Updated weights for policy 0, policy_version 608070 (0.0014) [2024-06-15 18:59:45,967][1648985] Fps is (10 sec: 49092.0, 60 sec: 45319.8, 300 sec: 46872.9). Total num frames: 1245446144. Throughput: 0: 11636.3. Samples: 311440896. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:59:45,968][1648985] Avg episode reward: [(0, '178.030')] [2024-06-15 18:59:46,621][1652491] Updated weights for policy 0, policy_version 608129 (0.0019) [2024-06-15 18:59:47,848][1652491] Updated weights for policy 0, policy_version 608192 (0.0013) [2024-06-15 18:59:49,102][1652491] Updated weights for policy 0, policy_version 608256 (0.0014) [2024-06-15 18:59:50,545][1652491] Updated weights for policy 0, policy_version 608311 (0.0017) [2024-06-15 18:59:50,955][1648985] Fps is (10 sec: 52430.4, 60 sec: 48606.0, 300 sec: 47541.4). Total num frames: 1245839360. Throughput: 0: 11753.3. Samples: 311512064. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:59:50,955][1648985] Avg episode reward: [(0, '173.440')] [2024-06-15 18:59:54,185][1652491] Updated weights for policy 0, policy_version 608368 (0.0014) [2024-06-15 18:59:55,955][1648985] Fps is (10 sec: 52492.3, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 1245970432. Throughput: 0: 12071.8. Samples: 311555072. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 18:59:55,956][1648985] Avg episode reward: [(0, '162.170')] [2024-06-15 18:59:58,584][1652491] Updated weights for policy 0, policy_version 608436 (0.0014) [2024-06-15 18:59:59,990][1652491] Updated weights for policy 0, policy_version 608499 (0.0014) [2024-06-15 19:00:00,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 48606.0, 300 sec: 47208.1). Total num frames: 1246265344. Throughput: 0: 11923.9. Samples: 311622144. 
Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 19:00:00,956][1648985] Avg episode reward: [(0, '133.890')] [2024-06-15 19:00:01,426][1652491] Updated weights for policy 0, policy_version 608567 (0.0018) [2024-06-15 19:00:05,637][1652491] Updated weights for policy 0, policy_version 608624 (0.0111) [2024-06-15 19:00:05,955][1648985] Fps is (10 sec: 52430.4, 60 sec: 47513.7, 300 sec: 47097.1). Total num frames: 1246494720. Throughput: 0: 12037.7. Samples: 311694336. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 19:00:05,955][1648985] Avg episode reward: [(0, '122.990')] [2024-06-15 19:00:09,105][1652491] Updated weights for policy 0, policy_version 608688 (0.0014) [2024-06-15 19:00:10,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 1246724096. Throughput: 0: 11958.1. Samples: 311730176. Policy #0 lag: (min: 7.0, avg: 91.9, max: 263.0) [2024-06-15 19:00:10,955][1648985] Avg episode reward: [(0, '149.260')] [2024-06-15 19:00:10,986][1652491] Updated weights for policy 0, policy_version 608760 (0.0034) [2024-06-15 19:00:12,234][1652491] Updated weights for policy 0, policy_version 608807 (0.0012) [2024-06-15 19:00:15,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 1246887936. Throughput: 0: 12105.9. Samples: 311801856. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:00:15,956][1648985] Avg episode reward: [(0, '140.340')] [2024-06-15 19:00:16,318][1651469] Signal inference workers to stop experience collection... (31700 times) [2024-06-15 19:00:16,402][1652491] InferenceWorker_p0-w0: stopping experience collection (31700 times) [2024-06-15 19:00:16,537][1651469] Signal inference workers to resume experience collection... 
(31700 times) [2024-06-15 19:00:16,538][1652491] InferenceWorker_p0-w0: resuming experience collection (31700 times) [2024-06-15 19:00:16,540][1652491] Updated weights for policy 0, policy_version 608864 (0.0014) [2024-06-15 19:00:17,194][1652491] Updated weights for policy 0, policy_version 608893 (0.0013) [2024-06-15 19:00:20,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 1247150080. Throughput: 0: 11980.8. Samples: 311873024. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:00:20,956][1648985] Avg episode reward: [(0, '152.040')] [2024-06-15 19:00:21,246][1652491] Updated weights for policy 0, policy_version 608976 (0.0014) [2024-06-15 19:00:23,030][1652491] Updated weights for policy 0, policy_version 609043 (0.0011) [2024-06-15 19:00:25,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 1247412224. Throughput: 0: 11707.7. Samples: 311895040. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:00:25,958][1648985] Avg episode reward: [(0, '152.600')] [2024-06-15 19:00:27,996][1652491] Updated weights for policy 0, policy_version 609106 (0.0035) [2024-06-15 19:00:29,145][1652491] Updated weights for policy 0, policy_version 609151 (0.0012) [2024-06-15 19:00:30,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.1, 300 sec: 46541.7). Total num frames: 1247543296. Throughput: 0: 11824.7. Samples: 311972864. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:00:30,956][1648985] Avg episode reward: [(0, '145.220')] [2024-06-15 19:00:32,298][1652491] Updated weights for policy 0, policy_version 609216 (0.0014) [2024-06-15 19:00:33,555][1652491] Updated weights for policy 0, policy_version 609268 (0.0011) [2024-06-15 19:00:35,192][1652491] Updated weights for policy 0, policy_version 609335 (0.0011) [2024-06-15 19:00:35,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 49698.2, 300 sec: 47208.2). Total num frames: 1247936512. 
Throughput: 0: 11696.3. Samples: 312038400. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:00:35,956][1648985] Avg episode reward: [(0, '128.260')] [2024-06-15 19:00:39,496][1652491] Updated weights for policy 0, policy_version 609378 (0.0012) [2024-06-15 19:00:40,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 1248067584. Throughput: 0: 11764.7. Samples: 312084480. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:00:40,956][1648985] Avg episode reward: [(0, '153.770')] [2024-06-15 19:00:42,227][1652491] Updated weights for policy 0, policy_version 609463 (0.0018) [2024-06-15 19:00:43,762][1652491] Updated weights for policy 0, policy_version 609507 (0.0012) [2024-06-15 19:00:45,113][1652491] Updated weights for policy 0, policy_version 609575 (0.0013) [2024-06-15 19:00:45,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 50254.6, 300 sec: 47430.3). Total num frames: 1248460800. Throughput: 0: 11776.0. Samples: 312152064. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:00:45,956][1648985] Avg episode reward: [(0, '182.760')] [2024-06-15 19:00:50,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45328.9, 300 sec: 46986.0). Total num frames: 1248559104. Throughput: 0: 11776.0. Samples: 312224256. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:00:50,956][1648985] Avg episode reward: [(0, '162.750')] [2024-06-15 19:00:50,985][1652491] Updated weights for policy 0, policy_version 609661 (0.0013) [2024-06-15 19:00:53,355][1652491] Updated weights for policy 0, policy_version 609726 (0.0013) [2024-06-15 19:00:55,674][1651469] Signal inference workers to stop experience collection... 
(31750 times) [2024-06-15 19:00:55,759][1652491] InferenceWorker_p0-w0: stopping experience collection (31750 times) [2024-06-15 19:00:55,763][1652491] Updated weights for policy 0, policy_version 609798 (0.0013) [2024-06-15 19:00:55,898][1651469] Signal inference workers to resume experience collection... (31750 times) [2024-06-15 19:00:55,899][1652491] InferenceWorker_p0-w0: resuming experience collection (31750 times) [2024-06-15 19:00:55,955][1648985] Fps is (10 sec: 42596.7, 60 sec: 48605.8, 300 sec: 47208.1). Total num frames: 1248886784. Throughput: 0: 11787.3. Samples: 312260608. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:00:55,956][1648985] Avg episode reward: [(0, '170.720')] [2024-06-15 19:00:56,259][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000609824_1248919552.pth... [2024-06-15 19:00:56,418][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000604288_1237581824.pth [2024-06-15 19:00:56,983][1652491] Updated weights for policy 0, policy_version 609853 (0.0013) [2024-06-15 19:01:00,956][1648985] Fps is (10 sec: 42597.0, 60 sec: 45328.8, 300 sec: 46652.7). Total num frames: 1248985088. Throughput: 0: 11787.3. Samples: 312332288. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:01:00,957][1648985] Avg episode reward: [(0, '178.490')] [2024-06-15 19:01:01,929][1652491] Updated weights for policy 0, policy_version 609893 (0.0024) [2024-06-15 19:01:03,681][1652491] Updated weights for policy 0, policy_version 609939 (0.0011) [2024-06-15 19:01:05,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 45874.9, 300 sec: 46652.7). Total num frames: 1249247232. Throughput: 0: 11673.5. Samples: 312398336. 
Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:01:05,956][1648985] Avg episode reward: [(0, '157.490')] [2024-06-15 19:01:05,957][1652491] Updated weights for policy 0, policy_version 610000 (0.0142) [2024-06-15 19:01:07,696][1652491] Updated weights for policy 0, policy_version 610065 (0.0032) [2024-06-15 19:01:10,955][1648985] Fps is (10 sec: 52430.8, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 1249509376. Throughput: 0: 11946.7. Samples: 312432640. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:01:10,956][1648985] Avg episode reward: [(0, '137.940')] [2024-06-15 19:01:12,553][1652491] Updated weights for policy 0, policy_version 610128 (0.0013) [2024-06-15 19:01:13,728][1652491] Updated weights for policy 0, policy_version 610174 (0.0012) [2024-06-15 19:01:15,287][1652491] Updated weights for policy 0, policy_version 610228 (0.0030) [2024-06-15 19:01:15,955][1648985] Fps is (10 sec: 52430.5, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 1249771520. Throughput: 0: 11889.8. Samples: 312507904. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:01:15,956][1648985] Avg episode reward: [(0, '150.250')] [2024-06-15 19:01:16,948][1652491] Updated weights for policy 0, policy_version 610289 (0.0032) [2024-06-15 19:01:18,197][1652491] Updated weights for policy 0, policy_version 610352 (0.0014) [2024-06-15 19:01:20,964][1648985] Fps is (10 sec: 52382.1, 60 sec: 48052.6, 300 sec: 47095.6). Total num frames: 1250033664. Throughput: 0: 12149.1. Samples: 312585216. 
Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:01:20,965][1648985] Avg episode reward: [(0, '162.870')] [2024-06-15 19:01:23,504][1652491] Updated weights for policy 0, policy_version 610393 (0.0072) [2024-06-15 19:01:24,612][1652491] Updated weights for policy 0, policy_version 610435 (0.0012) [2024-06-15 19:01:25,823][1652491] Updated weights for policy 0, policy_version 610496 (0.0015) [2024-06-15 19:01:25,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.9, 300 sec: 47097.1). Total num frames: 1250295808. Throughput: 0: 11958.1. Samples: 312622592. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:01:25,955][1648985] Avg episode reward: [(0, '145.870')] [2024-06-15 19:01:28,507][1652491] Updated weights for policy 0, policy_version 610562 (0.0011) [2024-06-15 19:01:29,764][1652491] Updated weights for policy 0, policy_version 610623 (0.0013) [2024-06-15 19:01:30,955][1648985] Fps is (10 sec: 52475.4, 60 sec: 50244.3, 300 sec: 47430.3). Total num frames: 1250557952. Throughput: 0: 11787.4. Samples: 312682496. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:01:30,956][1648985] Avg episode reward: [(0, '123.640')] [2024-06-15 19:01:35,671][1652491] Updated weights for policy 0, policy_version 610674 (0.0014) [2024-06-15 19:01:35,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 45329.1, 300 sec: 46763.8). Total num frames: 1250656256. Throughput: 0: 11935.3. Samples: 312761344. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:01:35,956][1648985] Avg episode reward: [(0, '157.290')] [2024-06-15 19:01:36,497][1652491] Updated weights for policy 0, policy_version 610706 (0.0017) [2024-06-15 19:01:38,525][1652491] Updated weights for policy 0, policy_version 610768 (0.0118) [2024-06-15 19:01:38,537][1651469] Signal inference workers to stop experience collection... 
(31800 times) [2024-06-15 19:01:38,637][1652491] InferenceWorker_p0-w0: stopping experience collection (31800 times) [2024-06-15 19:01:38,811][1651469] Signal inference workers to resume experience collection... (31800 times) [2024-06-15 19:01:38,812][1652491] InferenceWorker_p0-w0: resuming experience collection (31800 times) [2024-06-15 19:01:39,958][1652491] Updated weights for policy 0, policy_version 610822 (0.0015) [2024-06-15 19:01:40,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 49698.1, 300 sec: 47430.3). Total num frames: 1251049472. Throughput: 0: 11810.2. Samples: 312792064. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:01:40,956][1648985] Avg episode reward: [(0, '170.770')] [2024-06-15 19:01:41,056][1652491] Updated weights for policy 0, policy_version 610872 (0.0011) [2024-06-15 19:01:45,687][1652491] Updated weights for policy 0, policy_version 610916 (0.0011) [2024-06-15 19:01:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45329.0, 300 sec: 46874.9). Total num frames: 1251180544. Throughput: 0: 11980.9. Samples: 312871424. Policy #0 lag: (min: 46.0, avg: 208.3, max: 345.0) [2024-06-15 19:01:45,956][1648985] Avg episode reward: [(0, '164.230')] [2024-06-15 19:01:47,072][1652491] Updated weights for policy 0, policy_version 610946 (0.0015) [2024-06-15 19:01:48,307][1652491] Updated weights for policy 0, policy_version 611005 (0.0031) [2024-06-15 19:01:50,351][1652491] Updated weights for policy 0, policy_version 611072 (0.0012) [2024-06-15 19:01:50,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 49152.1, 300 sec: 47208.2). Total num frames: 1251508224. Throughput: 0: 11946.8. Samples: 312935936. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:01:50,955][1648985] Avg episode reward: [(0, '153.800')] [2024-06-15 19:01:51,700][1652491] Updated weights for policy 0, policy_version 611126 (0.0011) [2024-06-15 19:01:55,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.4, 300 sec: 46763.8). 
Total num frames: 1251639296. Throughput: 0: 12151.4. Samples: 312979456. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:01:55,956][1648985] Avg episode reward: [(0, '157.240')] [2024-06-15 19:01:56,439][1652491] Updated weights for policy 0, policy_version 611184 (0.0023) [2024-06-15 19:01:57,862][1652491] Updated weights for policy 0, policy_version 611236 (0.0011) [2024-06-15 19:02:00,564][1652491] Updated weights for policy 0, policy_version 611328 (0.0013) [2024-06-15 19:02:00,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 50244.5, 300 sec: 47097.0). Total num frames: 1251999744. Throughput: 0: 12105.9. Samples: 313052672. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:02:00,956][1648985] Avg episode reward: [(0, '165.800')] [2024-06-15 19:02:02,360][1652491] Updated weights for policy 0, policy_version 611391 (0.0115) [2024-06-15 19:02:05,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48060.0, 300 sec: 46986.0). Total num frames: 1252130816. Throughput: 0: 12097.0. Samples: 313129472. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:02:05,956][1648985] Avg episode reward: [(0, '174.730')] [2024-06-15 19:02:07,474][1652491] Updated weights for policy 0, policy_version 611450 (0.0013) [2024-06-15 19:02:09,026][1652491] Updated weights for policy 0, policy_version 611508 (0.0012) [2024-06-15 19:02:10,927][1652491] Updated weights for policy 0, policy_version 611538 (0.0013) [2024-06-15 19:02:10,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 48605.9, 300 sec: 47097.1). Total num frames: 1252425728. Throughput: 0: 11958.1. Samples: 313160704. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:02:10,955][1648985] Avg episode reward: [(0, '183.870')] [2024-06-15 19:02:12,377][1652491] Updated weights for policy 0, policy_version 611602 (0.0012) [2024-06-15 19:02:15,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 1252655104. Throughput: 0: 12197.0. 
Samples: 313231360. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:02:15,956][1648985] Avg episode reward: [(0, '135.400')] [2024-06-15 19:02:17,760][1652491] Updated weights for policy 0, policy_version 611651 (0.0013) [2024-06-15 19:02:19,334][1652491] Updated weights for policy 0, policy_version 611728 (0.0014) [2024-06-15 19:02:19,467][1651469] Signal inference workers to stop experience collection... (31850 times) [2024-06-15 19:02:19,501][1652491] InferenceWorker_p0-w0: stopping experience collection (31850 times) [2024-06-15 19:02:19,693][1651469] Signal inference workers to resume experience collection... (31850 times) [2024-06-15 19:02:19,694][1652491] InferenceWorker_p0-w0: resuming experience collection (31850 times) [2024-06-15 19:02:20,960][1648985] Fps is (10 sec: 49127.7, 60 sec: 48063.0, 300 sec: 47318.5). Total num frames: 1252917248. Throughput: 0: 12025.0. Samples: 313302528. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:02:20,961][1648985] Avg episode reward: [(0, '141.030')] [2024-06-15 19:02:21,900][1652491] Updated weights for policy 0, policy_version 611779 (0.0016) [2024-06-15 19:02:23,637][1652491] Updated weights for policy 0, policy_version 611841 (0.0012) [2024-06-15 19:02:24,792][1652491] Updated weights for policy 0, policy_version 611899 (0.0015) [2024-06-15 19:02:25,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1253179392. Throughput: 0: 12083.2. Samples: 313335808. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:02:25,955][1648985] Avg episode reward: [(0, '118.910')] [2024-06-15 19:02:30,289][1652491] Updated weights for policy 0, policy_version 611968 (0.0124) [2024-06-15 19:02:30,955][1648985] Fps is (10 sec: 42618.8, 60 sec: 46421.3, 300 sec: 47208.2). Total num frames: 1253343232. Throughput: 0: 12049.1. Samples: 313413632. 
Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:02:30,956][1648985] Avg episode reward: [(0, '136.730')] [2024-06-15 19:02:31,620][1652491] Updated weights for policy 0, policy_version 612023 (0.0014) [2024-06-15 19:02:33,365][1652491] Updated weights for policy 0, policy_version 612055 (0.0014) [2024-06-15 19:02:35,163][1652491] Updated weights for policy 0, policy_version 612128 (0.0106) [2024-06-15 19:02:35,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 50790.4, 300 sec: 47541.4). Total num frames: 1253703680. Throughput: 0: 12014.9. Samples: 313476608. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:02:35,956][1648985] Avg episode reward: [(0, '139.450')] [2024-06-15 19:02:40,011][1652491] Updated weights for policy 0, policy_version 612176 (0.0012) [2024-06-15 19:02:40,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1253801984. Throughput: 0: 12037.7. Samples: 313521152. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:02:40,956][1648985] Avg episode reward: [(0, '162.220')] [2024-06-15 19:02:40,968][1652491] Updated weights for policy 0, policy_version 612224 (0.0032) [2024-06-15 19:02:41,923][1652491] Updated weights for policy 0, policy_version 612272 (0.0013) [2024-06-15 19:02:43,736][1652491] Updated weights for policy 0, policy_version 612305 (0.0030) [2024-06-15 19:02:45,157][1652491] Updated weights for policy 0, policy_version 612369 (0.0014) [2024-06-15 19:02:45,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 50244.3, 300 sec: 47430.3). Total num frames: 1254195200. Throughput: 0: 11912.6. Samples: 313588736. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:02:45,956][1648985] Avg episode reward: [(0, '145.670')] [2024-06-15 19:02:50,655][1652491] Updated weights for policy 0, policy_version 612421 (0.0012) [2024-06-15 19:02:50,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.1, 300 sec: 47097.1). Total num frames: 1254260736. 
Throughput: 0: 12049.0. Samples: 313671680. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:02:50,956][1648985] Avg episode reward: [(0, '156.530')] [2024-06-15 19:02:52,389][1652491] Updated weights for policy 0, policy_version 612496 (0.0103) [2024-06-15 19:02:53,240][1652491] Updated weights for policy 0, policy_version 612541 (0.0040) [2024-06-15 19:02:54,756][1652491] Updated weights for policy 0, policy_version 612582 (0.0134) [2024-06-15 19:02:55,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1254653952. Throughput: 0: 12083.2. Samples: 313704448. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:02:55,956][1648985] Avg episode reward: [(0, '145.790')] [2024-06-15 19:02:56,195][1652491] Updated weights for policy 0, policy_version 612640 (0.0013) [2024-06-15 19:02:56,327][1651469] Signal inference workers to stop experience collection... (31900 times) [2024-06-15 19:02:56,397][1652491] InferenceWorker_p0-w0: stopping experience collection (31900 times) [2024-06-15 19:02:56,575][1651469] Signal inference workers to resume experience collection... (31900 times) [2024-06-15 19:02:56,576][1652491] InferenceWorker_p0-w0: resuming experience collection (31900 times) [2024-06-15 19:02:56,577][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000612656_1254719488.pth... [2024-06-15 19:02:56,637][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000607040_1243217920.pth [2024-06-15 19:02:56,924][1652491] Updated weights for policy 0, policy_version 612672 (0.0012) [2024-06-15 19:03:00,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 1254752256. Throughput: 0: 12071.8. Samples: 313774592. 
Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:03:00,956][1648985] Avg episode reward: [(0, '143.600')] [2024-06-15 19:03:03,874][1652491] Updated weights for policy 0, policy_version 612752 (0.0014) [2024-06-15 19:03:04,885][1652491] Updated weights for policy 0, policy_version 612797 (0.0012) [2024-06-15 19:03:05,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 48605.9, 300 sec: 47319.3). Total num frames: 1255047168. Throughput: 0: 11970.7. Samples: 313841152. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:03:05,955][1648985] Avg episode reward: [(0, '133.280')] [2024-06-15 19:03:06,394][1652491] Updated weights for policy 0, policy_version 612835 (0.0028) [2024-06-15 19:03:07,951][1652491] Updated weights for policy 0, policy_version 612897 (0.0013) [2024-06-15 19:03:10,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 47513.5, 300 sec: 47541.4). Total num frames: 1255276544. Throughput: 0: 11844.2. Samples: 313868800. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:03:10,956][1648985] Avg episode reward: [(0, '132.060')] [2024-06-15 19:03:14,072][1652491] Updated weights for policy 0, policy_version 612960 (0.0108) [2024-06-15 19:03:15,437][1652491] Updated weights for policy 0, policy_version 613010 (0.0040) [2024-06-15 19:03:15,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46967.5, 300 sec: 47208.1). Total num frames: 1255473152. Throughput: 0: 11764.6. Samples: 313943040. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:03:15,955][1648985] Avg episode reward: [(0, '128.740')] [2024-06-15 19:03:16,293][1652491] Updated weights for policy 0, policy_version 613051 (0.0013) [2024-06-15 19:03:18,764][1652491] Updated weights for policy 0, policy_version 613110 (0.0079) [2024-06-15 19:03:20,147][1652491] Updated weights for policy 0, policy_version 613177 (0.0014) [2024-06-15 19:03:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48063.6, 300 sec: 47541.4). Total num frames: 1255800832. 
Throughput: 0: 11673.6. Samples: 314001920. Policy #0 lag: (min: 15.0, avg: 114.0, max: 272.0) [2024-06-15 19:03:20,956][1648985] Avg episode reward: [(0, '133.100')] [2024-06-15 19:03:25,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 44782.9, 300 sec: 46874.9). Total num frames: 1255866368. Throughput: 0: 11616.7. Samples: 314043904. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:03:25,956][1648985] Avg episode reward: [(0, '145.400')] [2024-06-15 19:03:26,065][1652491] Updated weights for policy 0, policy_version 613232 (0.0030) [2024-06-15 19:03:27,426][1652491] Updated weights for policy 0, policy_version 613284 (0.0013) [2024-06-15 19:03:28,739][1652491] Updated weights for policy 0, policy_version 613328 (0.0011) [2024-06-15 19:03:29,824][1652491] Updated weights for policy 0, policy_version 613376 (0.0015) [2024-06-15 19:03:30,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 48605.9, 300 sec: 47319.2). Total num frames: 1256259584. Throughput: 0: 11673.6. Samples: 314114048. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:03:30,956][1648985] Avg episode reward: [(0, '138.100')] [2024-06-15 19:03:31,220][1652491] Updated weights for policy 0, policy_version 613439 (0.0014) [2024-06-15 19:03:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 46652.7). Total num frames: 1256325120. Throughput: 0: 11491.6. Samples: 314188800. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:03:35,956][1648985] Avg episode reward: [(0, '144.390')] [2024-06-15 19:03:38,134][1652491] Updated weights for policy 0, policy_version 613536 (0.0014) [2024-06-15 19:03:40,033][1651469] Signal inference workers to stop experience collection... 
(31950 times) [2024-06-15 19:03:40,108][1652491] InferenceWorker_p0-w0: stopping experience collection (31950 times) [2024-06-15 19:03:40,112][1652491] Updated weights for policy 0, policy_version 613571 (0.0027) [2024-06-15 19:03:40,297][1651469] Signal inference workers to resume experience collection... (31950 times) [2024-06-15 19:03:40,298][1652491] InferenceWorker_p0-w0: resuming experience collection (31950 times) [2024-06-15 19:03:40,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 47513.6, 300 sec: 47208.1). Total num frames: 1256652800. Throughput: 0: 11252.6. Samples: 314210816. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:03:40,956][1648985] Avg episode reward: [(0, '129.290')] [2024-06-15 19:03:41,532][1652491] Updated weights for policy 0, policy_version 613632 (0.0056) [2024-06-15 19:03:42,594][1652491] Updated weights for policy 0, policy_version 613687 (0.0012) [2024-06-15 19:03:45,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 44236.6, 300 sec: 47208.1). Total num frames: 1256849408. Throughput: 0: 11366.3. Samples: 314286080. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:03:45,956][1648985] Avg episode reward: [(0, '122.780')] [2024-06-15 19:03:48,534][1652491] Updated weights for policy 0, policy_version 613746 (0.0013) [2024-06-15 19:03:49,853][1652491] Updated weights for policy 0, policy_version 613796 (0.0011) [2024-06-15 19:03:50,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 47513.8, 300 sec: 47097.1). Total num frames: 1257111552. Throughput: 0: 11514.3. Samples: 314359296. 
Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:03:50,955][1648985] Avg episode reward: [(0, '140.300')] [2024-06-15 19:03:51,376][1652491] Updated weights for policy 0, policy_version 613825 (0.0010) [2024-06-15 19:03:52,959][1652491] Updated weights for policy 0, policy_version 613889 (0.0012) [2024-06-15 19:03:54,070][1652491] Updated weights for policy 0, policy_version 613948 (0.0011) [2024-06-15 19:03:55,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45329.0, 300 sec: 47541.4). Total num frames: 1257373696. Throughput: 0: 11559.8. Samples: 314388992. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:03:55,956][1648985] Avg episode reward: [(0, '149.030')] [2024-06-15 19:03:59,440][1652491] Updated weights for policy 0, policy_version 614000 (0.0014) [2024-06-15 19:04:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.6, 300 sec: 47208.2). Total num frames: 1257570304. Throughput: 0: 11502.9. Samples: 314460672. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:04:00,955][1648985] Avg episode reward: [(0, '148.060')] [2024-06-15 19:04:01,404][1652491] Updated weights for policy 0, policy_version 614080 (0.0012) [2024-06-15 19:04:04,443][1652491] Updated weights for policy 0, policy_version 614145 (0.0138) [2024-06-15 19:04:05,837][1652491] Updated weights for policy 0, policy_version 614208 (0.0015) [2024-06-15 19:04:05,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1257897984. Throughput: 0: 11594.0. Samples: 314523648. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:04:05,956][1648985] Avg episode reward: [(0, '132.590')] [2024-06-15 19:04:10,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 44783.0, 300 sec: 46986.0). Total num frames: 1257963520. Throughput: 0: 11525.7. Samples: 314562560. 
Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:04:10,956][1648985] Avg episode reward: [(0, '131.750')] [2024-06-15 19:04:11,702][1652491] Updated weights for policy 0, policy_version 614275 (0.0014) [2024-06-15 19:04:12,958][1652491] Updated weights for policy 0, policy_version 614333 (0.0110) [2024-06-15 19:04:15,404][1652491] Updated weights for policy 0, policy_version 614373 (0.0012) [2024-06-15 19:04:15,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1258291200. Throughput: 0: 11514.3. Samples: 314632192. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:04:15,955][1648985] Avg episode reward: [(0, '133.280')] [2024-06-15 19:04:16,681][1652491] Updated weights for policy 0, policy_version 614432 (0.0093) [2024-06-15 19:04:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 46986.0). Total num frames: 1258422272. Throughput: 0: 11389.2. Samples: 314701312. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:04:20,955][1648985] Avg episode reward: [(0, '156.290')] [2024-06-15 19:04:21,760][1651469] Signal inference workers to stop experience collection... (32000 times) [2024-06-15 19:04:21,797][1652491] InferenceWorker_p0-w0: stopping experience collection (32000 times) [2024-06-15 19:04:21,944][1651469] Signal inference workers to resume experience collection... (32000 times) [2024-06-15 19:04:21,945][1652491] InferenceWorker_p0-w0: resuming experience collection (32000 times) [2024-06-15 19:04:22,534][1652491] Updated weights for policy 0, policy_version 614523 (0.0015) [2024-06-15 19:04:24,357][1652491] Updated weights for policy 0, policy_version 614582 (0.0013) [2024-06-15 19:04:25,955][1648985] Fps is (10 sec: 39320.2, 60 sec: 46967.3, 300 sec: 47097.0). Total num frames: 1258684416. Throughput: 0: 11502.9. Samples: 314728448. 
Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:04:25,956][1648985] Avg episode reward: [(0, '151.520')] [2024-06-15 19:04:26,828][1652491] Updated weights for policy 0, policy_version 614624 (0.0010) [2024-06-15 19:04:28,359][1652491] Updated weights for policy 0, policy_version 614688 (0.0013) [2024-06-15 19:04:30,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 44782.9, 300 sec: 47430.3). Total num frames: 1258946560. Throughput: 0: 11434.7. Samples: 314800640. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:04:30,956][1648985] Avg episode reward: [(0, '141.520')] [2024-06-15 19:04:32,859][1652491] Updated weights for policy 0, policy_version 614736 (0.0037) [2024-06-15 19:04:35,040][1652491] Updated weights for policy 0, policy_version 614832 (0.0130) [2024-06-15 19:04:35,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1259208704. Throughput: 0: 11389.1. Samples: 314871808. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:04:35,956][1648985] Avg episode reward: [(0, '130.770')] [2024-06-15 19:04:37,677][1652491] Updated weights for policy 0, policy_version 614883 (0.0012) [2024-06-15 19:04:38,793][1652491] Updated weights for policy 0, policy_version 614928 (0.0012) [2024-06-15 19:04:40,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 47543.4). Total num frames: 1259470848. Throughput: 0: 11582.6. Samples: 314910208. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:04:40,956][1648985] Avg episode reward: [(0, '153.310')] [2024-06-15 19:04:43,516][1652491] Updated weights for policy 0, policy_version 614979 (0.0013) [2024-06-15 19:04:44,639][1652491] Updated weights for policy 0, policy_version 615036 (0.0012) [2024-06-15 19:04:45,954][1652491] Updated weights for policy 0, policy_version 615074 (0.0016) [2024-06-15 19:04:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.7, 300 sec: 46874.9). Total num frames: 1259667456. 
Throughput: 0: 11582.6. Samples: 314981888. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:04:45,956][1648985] Avg episode reward: [(0, '151.890')] [2024-06-15 19:04:47,929][1652491] Updated weights for policy 0, policy_version 615139 (0.0013) [2024-06-15 19:04:49,853][1652491] Updated weights for policy 0, policy_version 615188 (0.0030) [2024-06-15 19:04:50,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48059.5, 300 sec: 47541.4). Total num frames: 1259995136. Throughput: 0: 11821.5. Samples: 315055616. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:04:50,956][1648985] Avg episode reward: [(0, '143.270')] [2024-06-15 19:04:53,879][1652491] Updated weights for policy 0, policy_version 615248 (0.0012) [2024-06-15 19:04:55,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.3, 300 sec: 46986.0). Total num frames: 1260126208. Throughput: 0: 11901.1. Samples: 315098112. Policy #0 lag: (min: 9.0, avg: 77.2, max: 265.0) [2024-06-15 19:04:55,956][1648985] Avg episode reward: [(0, '152.290')] [2024-06-15 19:04:56,216][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000615312_1260158976.pth... [2024-06-15 19:04:56,216][1652491] Updated weights for policy 0, policy_version 615312 (0.0011) [2024-06-15 19:04:56,347][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000609824_1248919552.pth [2024-06-15 19:04:58,002][1652491] Updated weights for policy 0, policy_version 615377 (0.0013) [2024-06-15 19:05:00,517][1652491] Updated weights for policy 0, policy_version 615440 (0.0142) [2024-06-15 19:05:00,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 48059.6, 300 sec: 47319.2). Total num frames: 1260453888. Throughput: 0: 11878.4. Samples: 315166720. 
Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:05:00,956][1648985] Avg episode reward: [(0, '147.100')] [2024-06-15 19:05:01,523][1652491] Updated weights for policy 0, policy_version 615486 (0.0025) [2024-06-15 19:05:04,518][1651469] Signal inference workers to stop experience collection... (32050 times) [2024-06-15 19:05:04,570][1652491] InferenceWorker_p0-w0: stopping experience collection (32050 times) [2024-06-15 19:05:04,764][1651469] Signal inference workers to resume experience collection... (32050 times) [2024-06-15 19:05:04,771][1652491] InferenceWorker_p0-w0: resuming experience collection (32050 times) [2024-06-15 19:05:05,423][1652491] Updated weights for policy 0, policy_version 615545 (0.0014) [2024-06-15 19:05:05,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.1, 300 sec: 47208.1). Total num frames: 1260650496. Throughput: 0: 12049.0. Samples: 315243520. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:05:05,956][1648985] Avg episode reward: [(0, '141.460')] [2024-06-15 19:05:07,524][1652491] Updated weights for policy 0, policy_version 615610 (0.0014) [2024-06-15 19:05:09,428][1652491] Updated weights for policy 0, policy_version 615680 (0.0014) [2024-06-15 19:05:10,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 49698.1, 300 sec: 47652.4). Total num frames: 1260945408. Throughput: 0: 12231.2. Samples: 315278848. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:05:10,956][1648985] Avg episode reward: [(0, '152.020')] [2024-06-15 19:05:11,964][1652491] Updated weights for policy 0, policy_version 615735 (0.0012) [2024-06-15 19:05:15,246][1652491] Updated weights for policy 0, policy_version 615760 (0.0012) [2024-06-15 19:05:15,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46967.3, 300 sec: 47319.2). Total num frames: 1261109248. Throughput: 0: 12333.5. Samples: 315355648. 
Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:05:15,956][1648985] Avg episode reward: [(0, '157.580')] [2024-06-15 19:05:17,429][1652491] Updated weights for policy 0, policy_version 615809 (0.0014) [2024-06-15 19:05:18,857][1652491] Updated weights for policy 0, policy_version 615872 (0.0013) [2024-06-15 19:05:20,428][1652491] Updated weights for policy 0, policy_version 615935 (0.0108) [2024-06-15 19:05:20,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 50244.2, 300 sec: 47541.4). Total num frames: 1261436928. Throughput: 0: 12231.1. Samples: 315422208. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:05:20,956][1648985] Avg episode reward: [(0, '140.880')] [2024-06-15 19:05:22,469][1652491] Updated weights for policy 0, policy_version 615985 (0.0015) [2024-06-15 19:05:25,965][1648985] Fps is (10 sec: 45828.9, 60 sec: 48051.7, 300 sec: 47539.7). Total num frames: 1261568000. Throughput: 0: 12216.9. Samples: 315460096. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:05:25,966][1648985] Avg episode reward: [(0, '128.680')] [2024-06-15 19:05:26,420][1652491] Updated weights for policy 0, policy_version 616018 (0.0027) [2024-06-15 19:05:27,266][1652491] Updated weights for policy 0, policy_version 616063 (0.0013) [2024-06-15 19:05:29,703][1652491] Updated weights for policy 0, policy_version 616112 (0.0015) [2024-06-15 19:05:30,926][1652491] Updated weights for policy 0, policy_version 616176 (0.0012) [2024-06-15 19:05:30,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 49698.2, 300 sec: 47430.3). Total num frames: 1261928448. Throughput: 0: 12299.4. Samples: 315535360. 
Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:05:30,955][1648985] Avg episode reward: [(0, '137.270')] [2024-06-15 19:05:31,961][1652491] Updated weights for policy 0, policy_version 616193 (0.0011) [2024-06-15 19:05:33,248][1652491] Updated weights for policy 0, policy_version 616252 (0.0014) [2024-06-15 19:05:35,955][1648985] Fps is (10 sec: 52483.5, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1262092288. Throughput: 0: 12390.5. Samples: 315613184. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:05:35,955][1648985] Avg episode reward: [(0, '148.250')] [2024-06-15 19:05:36,908][1652491] Updated weights for policy 0, policy_version 616316 (0.0015) [2024-06-15 19:05:40,664][1652491] Updated weights for policy 0, policy_version 616388 (0.0013) [2024-06-15 19:05:40,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 48605.7, 300 sec: 47208.1). Total num frames: 1262387200. Throughput: 0: 12333.5. Samples: 315653120. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:05:40,956][1648985] Avg episode reward: [(0, '147.250')] [2024-06-15 19:05:41,748][1652491] Updated weights for policy 0, policy_version 616445 (0.0012) [2024-06-15 19:05:44,092][1652491] Updated weights for policy 0, policy_version 616501 (0.0026) [2024-06-15 19:05:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 49152.0, 300 sec: 47652.5). Total num frames: 1262616576. Throughput: 0: 12162.9. Samples: 315714048. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:05:45,956][1648985] Avg episode reward: [(0, '162.150')] [2024-06-15 19:05:47,299][1651469] Signal inference workers to stop experience collection... (32100 times) [2024-06-15 19:05:47,314][1652491] InferenceWorker_p0-w0: stopping experience collection (32100 times) [2024-06-15 19:05:47,333][1652491] Updated weights for policy 0, policy_version 616531 (0.0012) [2024-06-15 19:05:47,526][1651469] Signal inference workers to resume experience collection... 
(32100 times) [2024-06-15 19:05:47,542][1652491] InferenceWorker_p0-w0: resuming experience collection (32100 times) [2024-06-15 19:05:50,955][1648985] Fps is (10 sec: 36045.5, 60 sec: 45875.3, 300 sec: 46986.0). Total num frames: 1262747648. Throughput: 0: 12117.4. Samples: 315788800. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:05:50,956][1648985] Avg episode reward: [(0, '178.300')] [2024-06-15 19:05:52,567][1652491] Updated weights for policy 0, policy_version 616640 (0.0015) [2024-06-15 19:05:53,795][1652491] Updated weights for policy 0, policy_version 616692 (0.0014) [2024-06-15 19:05:55,412][1652491] Updated weights for policy 0, policy_version 616723 (0.0025) [2024-06-15 19:05:55,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 49152.1, 300 sec: 47763.6). Total num frames: 1263075328. Throughput: 0: 11935.3. Samples: 315815936. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:05:55,955][1648985] Avg episode reward: [(0, '176.090')] [2024-06-15 19:05:58,006][1652491] Updated weights for policy 0, policy_version 616771 (0.0016) [2024-06-15 19:05:59,250][1652491] Updated weights for policy 0, policy_version 616832 (0.0013) [2024-06-15 19:06:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1263271936. Throughput: 0: 11901.2. Samples: 315891200. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:06:00,956][1648985] Avg episode reward: [(0, '167.080')] [2024-06-15 19:06:03,521][1652491] Updated weights for policy 0, policy_version 616896 (0.0031) [2024-06-15 19:06:05,136][1652491] Updated weights for policy 0, policy_version 616956 (0.0011) [2024-06-15 19:06:05,956][1648985] Fps is (10 sec: 45873.2, 60 sec: 48059.5, 300 sec: 47541.3). Total num frames: 1263534080. Throughput: 0: 11935.2. Samples: 315959296. 
Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:06:05,956][1648985] Avg episode reward: [(0, '149.690')] [2024-06-15 19:06:07,026][1652491] Updated weights for policy 0, policy_version 616996 (0.0014) [2024-06-15 19:06:10,073][1652491] Updated weights for policy 0, policy_version 617086 (0.0087) [2024-06-15 19:06:10,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 1263796224. Throughput: 0: 11869.7. Samples: 315994112. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:06:10,955][1648985] Avg episode reward: [(0, '134.420')] [2024-06-15 19:06:15,004][1652491] Updated weights for policy 0, policy_version 617155 (0.0019) [2024-06-15 19:06:15,955][1648985] Fps is (10 sec: 45877.1, 60 sec: 48059.9, 300 sec: 47320.6). Total num frames: 1263992832. Throughput: 0: 11787.4. Samples: 316065792. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:06:15,955][1648985] Avg episode reward: [(0, '124.600')] [2024-06-15 19:06:16,425][1652491] Updated weights for policy 0, policy_version 617216 (0.0157) [2024-06-15 19:06:19,315][1652491] Updated weights for policy 0, policy_version 617278 (0.0014) [2024-06-15 19:06:20,959][1648985] Fps is (10 sec: 49134.1, 60 sec: 47510.8, 300 sec: 47429.7). Total num frames: 1264287744. Throughput: 0: 11411.0. Samples: 316126720. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:06:20,959][1648985] Avg episode reward: [(0, '147.010')] [2024-06-15 19:06:25,266][1652491] Updated weights for policy 0, policy_version 617345 (0.0016) [2024-06-15 19:06:25,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 46975.6, 300 sec: 46874.9). Total num frames: 1264386048. Throughput: 0: 11355.1. Samples: 316164096. 
Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:06:25,955][1648985] Avg episode reward: [(0, '162.330')] [2024-06-15 19:06:26,978][1652491] Updated weights for policy 0, policy_version 617424 (0.0011) [2024-06-15 19:06:28,098][1652491] Updated weights for policy 0, policy_version 617467 (0.0011) [2024-06-15 19:06:30,416][1651469] Signal inference workers to stop experience collection... (32150 times) [2024-06-15 19:06:30,483][1652491] InferenceWorker_p0-w0: stopping experience collection (32150 times) [2024-06-15 19:06:30,659][1651469] Signal inference workers to resume experience collection... (32150 times) [2024-06-15 19:06:30,659][1652491] InferenceWorker_p0-w0: resuming experience collection (32150 times) [2024-06-15 19:06:30,663][1652491] Updated weights for policy 0, policy_version 617520 (0.0015) [2024-06-15 19:06:30,955][1648985] Fps is (10 sec: 39335.8, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1264680960. Throughput: 0: 11605.3. Samples: 316236288. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:06:30,956][1648985] Avg episode reward: [(0, '152.230')] [2024-06-15 19:06:32,149][1652491] Updated weights for policy 0, policy_version 617597 (0.0014) [2024-06-15 19:06:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 46763.9). Total num frames: 1264844800. Throughput: 0: 11639.5. Samples: 316312576. Policy #0 lag: (min: 15.0, avg: 145.9, max: 271.0) [2024-06-15 19:06:35,955][1648985] Avg episode reward: [(0, '156.360')] [2024-06-15 19:06:36,890][1652491] Updated weights for policy 0, policy_version 617651 (0.0014) [2024-06-15 19:06:38,598][1652491] Updated weights for policy 0, policy_version 617719 (0.0015) [2024-06-15 19:06:40,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 45329.1, 300 sec: 47208.1). Total num frames: 1265106944. Throughput: 0: 11707.7. Samples: 316342784. 
Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:06:40,956][1648985] Avg episode reward: [(0, '143.460')] [2024-06-15 19:06:41,752][1652491] Updated weights for policy 0, policy_version 617765 (0.0013) [2024-06-15 19:06:42,967][1652491] Updated weights for policy 0, policy_version 617808 (0.0015) [2024-06-15 19:06:43,790][1652491] Updated weights for policy 0, policy_version 617854 (0.0014) [2024-06-15 19:06:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1265369088. Throughput: 0: 11662.2. Samples: 316416000. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:06:45,955][1648985] Avg episode reward: [(0, '141.310')] [2024-06-15 19:06:47,584][1652491] Updated weights for policy 0, policy_version 617907 (0.0013) [2024-06-15 19:06:49,268][1652491] Updated weights for policy 0, policy_version 617974 (0.0011) [2024-06-15 19:06:50,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 48059.8, 300 sec: 47430.3). Total num frames: 1265631232. Throughput: 0: 11878.5. Samples: 316493824. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:06:50,955][1648985] Avg episode reward: [(0, '144.920')] [2024-06-15 19:06:51,843][1652491] Updated weights for policy 0, policy_version 618016 (0.0022) [2024-06-15 19:06:52,619][1652491] Updated weights for policy 0, policy_version 618047 (0.0013) [2024-06-15 19:06:54,872][1652491] Updated weights for policy 0, policy_version 618107 (0.0012) [2024-06-15 19:06:55,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 1265893376. Throughput: 0: 11867.0. Samples: 316528128. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:06:55,956][1648985] Avg episode reward: [(0, '152.670')] [2024-06-15 19:06:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000618112_1265893376.pth... 
[2024-06-15 19:06:56,012][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000612656_1254719488.pth [2024-06-15 19:06:57,792][1652491] Updated weights for policy 0, policy_version 618160 (0.0114) [2024-06-15 19:06:58,589][1652491] Updated weights for policy 0, policy_version 618197 (0.0012) [2024-06-15 19:07:00,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1266155520. Throughput: 0: 11787.3. Samples: 316596224. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:07:00,956][1648985] Avg episode reward: [(0, '150.660')] [2024-06-15 19:07:03,199][1652491] Updated weights for policy 0, policy_version 618272 (0.0088) [2024-06-15 19:07:04,966][1652491] Updated weights for policy 0, policy_version 618322 (0.0013) [2024-06-15 19:07:05,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 47514.0, 300 sec: 47319.2). Total num frames: 1266384896. Throughput: 0: 11959.0. Samples: 316664832. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:07:05,955][1648985] Avg episode reward: [(0, '166.890')] [2024-06-15 19:07:06,067][1652491] Updated weights for policy 0, policy_version 618368 (0.0013) [2024-06-15 19:07:09,828][1652491] Updated weights for policy 0, policy_version 618451 (0.0014) [2024-06-15 19:07:10,115][1651469] Signal inference workers to stop experience collection... (32200 times) [2024-06-15 19:07:10,174][1652491] InferenceWorker_p0-w0: stopping experience collection (32200 times) [2024-06-15 19:07:10,316][1651469] Signal inference workers to resume experience collection... (32200 times) [2024-06-15 19:07:10,317][1652491] InferenceWorker_p0-w0: resuming experience collection (32200 times) [2024-06-15 19:07:10,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1266679808. Throughput: 0: 11992.2. Samples: 316703744. 
Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:07:10,956][1648985] Avg episode reward: [(0, '168.030')] [2024-06-15 19:07:13,694][1652491] Updated weights for policy 0, policy_version 618512 (0.0012) [2024-06-15 19:07:14,838][1652491] Updated weights for policy 0, policy_version 618558 (0.0013) [2024-06-15 19:07:15,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.5, 300 sec: 47097.8). Total num frames: 1266810880. Throughput: 0: 11935.3. Samples: 316773376. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:07:15,956][1648985] Avg episode reward: [(0, '153.880')] [2024-06-15 19:07:17,158][1652491] Updated weights for policy 0, policy_version 618620 (0.0014) [2024-06-15 19:07:20,464][1652491] Updated weights for policy 0, policy_version 618688 (0.0013) [2024-06-15 19:07:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46970.3, 300 sec: 47208.1). Total num frames: 1267105792. Throughput: 0: 11867.0. Samples: 316846592. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:07:20,956][1648985] Avg episode reward: [(0, '149.960')] [2024-06-15 19:07:24,436][1652491] Updated weights for policy 0, policy_version 618768 (0.0069) [2024-06-15 19:07:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 49152.0, 300 sec: 47430.3). Total num frames: 1267335168. Throughput: 0: 12037.7. Samples: 316884480. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:07:25,955][1648985] Avg episode reward: [(0, '137.950')] [2024-06-15 19:07:26,946][1652491] Updated weights for policy 0, policy_version 618817 (0.0013) [2024-06-15 19:07:28,477][1652491] Updated weights for policy 0, policy_version 618879 (0.0013) [2024-06-15 19:07:30,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 1267499008. Throughput: 0: 11855.6. Samples: 316949504. 
Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:07:30,955][1648985] Avg episode reward: [(0, '162.170')] [2024-06-15 19:07:32,054][1652491] Updated weights for policy 0, policy_version 618944 (0.0013) [2024-06-15 19:07:35,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 1267728384. Throughput: 0: 11616.7. Samples: 317016576. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:07:35,956][1648985] Avg episode reward: [(0, '178.440')] [2024-06-15 19:07:36,328][1652491] Updated weights for policy 0, policy_version 619024 (0.0014) [2024-06-15 19:07:38,675][1652491] Updated weights for policy 0, policy_version 619076 (0.0013) [2024-06-15 19:07:39,870][1652491] Updated weights for policy 0, policy_version 619136 (0.0013) [2024-06-15 19:07:40,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 1267990528. Throughput: 0: 11639.5. Samples: 317051904. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:07:40,956][1648985] Avg episode reward: [(0, '169.160')] [2024-06-15 19:07:43,312][1652491] Updated weights for policy 0, policy_version 619200 (0.0119) [2024-06-15 19:07:44,884][1652491] Updated weights for policy 0, policy_version 619256 (0.0033) [2024-06-15 19:07:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1268252672. Throughput: 0: 11628.1. Samples: 317119488. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:07:45,955][1648985] Avg episode reward: [(0, '177.890')] [2024-06-15 19:07:48,668][1652491] Updated weights for policy 0, policy_version 619322 (0.0027) [2024-06-15 19:07:50,261][1652491] Updated weights for policy 0, policy_version 619361 (0.0014) [2024-06-15 19:07:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 1268514816. Throughput: 0: 11639.5. Samples: 317188608. 
Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:07:50,956][1648985] Avg episode reward: [(0, '167.540')] [2024-06-15 19:07:54,174][1652491] Updated weights for policy 0, policy_version 619428 (0.0119) [2024-06-15 19:07:54,863][1651469] Signal inference workers to stop experience collection... (32250 times) [2024-06-15 19:07:54,895][1652491] InferenceWorker_p0-w0: stopping experience collection (32250 times) [2024-06-15 19:07:55,150][1651469] Signal inference workers to resume experience collection... (32250 times) [2024-06-15 19:07:55,153][1652491] InferenceWorker_p0-w0: resuming experience collection (32250 times) [2024-06-15 19:07:55,955][1648985] Fps is (10 sec: 45873.9, 60 sec: 46967.3, 300 sec: 47319.2). Total num frames: 1268711424. Throughput: 0: 11719.0. Samples: 317231104. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:07:55,956][1648985] Avg episode reward: [(0, '158.620')] [2024-06-15 19:07:56,212][1652491] Updated weights for policy 0, policy_version 619512 (0.0015) [2024-06-15 19:07:59,914][1652491] Updated weights for policy 0, policy_version 619558 (0.0014) [2024-06-15 19:08:00,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45875.3, 300 sec: 46986.0). Total num frames: 1268908032. Throughput: 0: 11662.2. Samples: 317298176. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:08:00,956][1648985] Avg episode reward: [(0, '146.290')] [2024-06-15 19:08:01,164][1652491] Updated weights for policy 0, policy_version 619602 (0.0128) [2024-06-15 19:08:05,344][1652491] Updated weights for policy 0, policy_version 619668 (0.0022) [2024-06-15 19:08:05,955][1648985] Fps is (10 sec: 42599.9, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1269137408. Throughput: 0: 11446.1. Samples: 317361664. 
Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:08:05,955][1648985] Avg episode reward: [(0, '150.010')] [2024-06-15 19:08:07,469][1652491] Updated weights for policy 0, policy_version 619747 (0.0014) [2024-06-15 19:08:10,958][1648985] Fps is (10 sec: 39312.1, 60 sec: 43688.9, 300 sec: 46874.5). Total num frames: 1269301248. Throughput: 0: 11274.8. Samples: 317391872. Policy #0 lag: (min: 7.0, avg: 90.8, max: 263.0) [2024-06-15 19:08:10,958][1648985] Avg episode reward: [(0, '144.410')] [2024-06-15 19:08:11,594][1652491] Updated weights for policy 0, policy_version 619794 (0.0012) [2024-06-15 19:08:12,798][1652491] Updated weights for policy 0, policy_version 619844 (0.0013) [2024-06-15 19:08:15,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1269563392. Throughput: 0: 11377.8. Samples: 317461504. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:08:15,956][1648985] Avg episode reward: [(0, '160.300')] [2024-06-15 19:08:16,536][1652491] Updated weights for policy 0, policy_version 619907 (0.0017) [2024-06-15 19:08:18,073][1652491] Updated weights for policy 0, policy_version 619984 (0.0012) [2024-06-15 19:08:19,357][1652491] Updated weights for policy 0, policy_version 620032 (0.0012) [2024-06-15 19:08:20,957][1648985] Fps is (10 sec: 52433.1, 60 sec: 45327.9, 300 sec: 47319.0). Total num frames: 1269825536. Throughput: 0: 11525.3. Samples: 317535232. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:08:20,957][1648985] Avg episode reward: [(0, '165.960')] [2024-06-15 19:08:24,185][1652491] Updated weights for policy 0, policy_version 620098 (0.0014) [2024-06-15 19:08:25,352][1652491] Updated weights for policy 0, policy_version 620155 (0.0012) [2024-06-15 19:08:25,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 45875.0, 300 sec: 46874.9). Total num frames: 1270087680. Throughput: 0: 11559.8. Samples: 317572096. 
Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:08:25,956][1648985] Avg episode reward: [(0, '162.200')] [2024-06-15 19:08:28,360][1652491] Updated weights for policy 0, policy_version 620194 (0.0013) [2024-06-15 19:08:29,922][1652491] Updated weights for policy 0, policy_version 620258 (0.0012) [2024-06-15 19:08:30,955][1648985] Fps is (10 sec: 52436.4, 60 sec: 47513.5, 300 sec: 47541.4). Total num frames: 1270349824. Throughput: 0: 11491.5. Samples: 317636608. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:08:30,956][1648985] Avg episode reward: [(0, '160.520')] [2024-06-15 19:08:33,255][1652491] Updated weights for policy 0, policy_version 620308 (0.0013) [2024-06-15 19:08:34,415][1652491] Updated weights for policy 0, policy_version 620357 (0.0045) [2024-06-15 19:08:35,161][1651469] Signal inference workers to stop experience collection... (32300 times) [2024-06-15 19:08:35,217][1652491] InferenceWorker_p0-w0: stopping experience collection (32300 times) [2024-06-15 19:08:35,390][1651469] Signal inference workers to resume experience collection... (32300 times) [2024-06-15 19:08:35,391][1652491] InferenceWorker_p0-w0: resuming experience collection (32300 times) [2024-06-15 19:08:35,658][1652491] Updated weights for policy 0, policy_version 620415 (0.0016) [2024-06-15 19:08:35,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1270611968. Throughput: 0: 11571.2. Samples: 317709312. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:08:35,956][1648985] Avg episode reward: [(0, '150.620')] [2024-06-15 19:08:40,868][1652491] Updated weights for policy 0, policy_version 620484 (0.0019) [2024-06-15 19:08:40,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1270743040. Throughput: 0: 11571.3. Samples: 317751808. 
Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:08:40,956][1648985] Avg episode reward: [(0, '149.660')] [2024-06-15 19:08:44,312][1652491] Updated weights for policy 0, policy_version 620547 (0.0013) [2024-06-15 19:08:45,469][1652491] Updated weights for policy 0, policy_version 620608 (0.0012) [2024-06-15 19:08:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.3, 300 sec: 47208.1). Total num frames: 1271037952. Throughput: 0: 11559.8. Samples: 317818368. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:08:45,956][1648985] Avg episode reward: [(0, '167.200')] [2024-06-15 19:08:50,605][1652491] Updated weights for policy 0, policy_version 620704 (0.0015) [2024-06-15 19:08:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 46874.9). Total num frames: 1271201792. Throughput: 0: 11764.6. Samples: 317891072. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:08:50,956][1648985] Avg episode reward: [(0, '164.420')] [2024-06-15 19:08:51,544][1652491] Updated weights for policy 0, policy_version 620741 (0.0017) [2024-06-15 19:08:52,621][1652491] Updated weights for policy 0, policy_version 620787 (0.0010) [2024-06-15 19:08:55,177][1652491] Updated weights for policy 0, policy_version 620821 (0.0011) [2024-06-15 19:08:55,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46967.6, 300 sec: 47319.2). Total num frames: 1271529472. Throughput: 0: 12015.6. Samples: 317932544. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:08:55,956][1648985] Avg episode reward: [(0, '161.440')] [2024-06-15 19:08:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000620864_1271529472.pth... 
[2024-06-15 19:08:56,042][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000615312_1260158976.pth [2024-06-15 19:08:56,980][1652491] Updated weights for policy 0, policy_version 620884 (0.0025) [2024-06-15 19:08:57,808][1652491] Updated weights for policy 0, policy_version 620927 (0.0013) [2024-06-15 19:09:00,860][1652491] Updated weights for policy 0, policy_version 620964 (0.0015) [2024-06-15 19:09:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 1271726080. Throughput: 0: 12003.5. Samples: 318001664. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:09:00,956][1648985] Avg episode reward: [(0, '159.280')] [2024-06-15 19:09:02,208][1652491] Updated weights for policy 0, policy_version 621013 (0.0012) [2024-06-15 19:09:05,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46967.4, 300 sec: 47430.3). Total num frames: 1271955456. Throughput: 0: 12095.0. Samples: 318079488. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:09:05,956][1648985] Avg episode reward: [(0, '141.570')] [2024-06-15 19:09:06,369][1652491] Updated weights for policy 0, policy_version 621088 (0.0012) [2024-06-15 19:09:06,996][1652491] Updated weights for policy 0, policy_version 621119 (0.0014) [2024-06-15 19:09:08,627][1652491] Updated weights for policy 0, policy_version 621169 (0.0012) [2024-06-15 19:09:10,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48607.8, 300 sec: 47208.1). Total num frames: 1272217600. Throughput: 0: 11946.7. Samples: 318109696. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:09:10,956][1648985] Avg episode reward: [(0, '128.740')] [2024-06-15 19:09:11,625][1652491] Updated weights for policy 0, policy_version 621232 (0.0013) [2024-06-15 19:09:14,075][1652491] Updated weights for policy 0, policy_version 621300 (0.0015) [2024-06-15 19:09:15,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1272446976. 
Throughput: 0: 11992.2. Samples: 318176256. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:09:15,956][1648985] Avg episode reward: [(0, '131.420')] [2024-06-15 19:09:17,419][1652491] Updated weights for policy 0, policy_version 621360 (0.0012) [2024-06-15 19:09:19,244][1651469] Signal inference workers to stop experience collection... (32350 times) [2024-06-15 19:09:19,315][1652491] InferenceWorker_p0-w0: stopping experience collection (32350 times) [2024-06-15 19:09:19,579][1651469] Signal inference workers to resume experience collection... (32350 times) [2024-06-15 19:09:19,580][1652491] InferenceWorker_p0-w0: resuming experience collection (32350 times) [2024-06-15 19:09:20,118][1652491] Updated weights for policy 0, policy_version 621410 (0.0013) [2024-06-15 19:09:20,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 48061.0, 300 sec: 47541.4). Total num frames: 1272709120. Throughput: 0: 12060.4. Samples: 318252032. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:09:20,956][1648985] Avg episode reward: [(0, '139.350')] [2024-06-15 19:09:21,613][1652491] Updated weights for policy 0, policy_version 621446 (0.0013) [2024-06-15 19:09:24,310][1652491] Updated weights for policy 0, policy_version 621506 (0.0015) [2024-06-15 19:09:25,684][1652491] Updated weights for policy 0, policy_version 621560 (0.0013) [2024-06-15 19:09:25,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1272971264. Throughput: 0: 11867.0. Samples: 318285824. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:09:25,956][1648985] Avg episode reward: [(0, '139.410')] [2024-06-15 19:09:28,088][1652491] Updated weights for policy 0, policy_version 621600 (0.0012) [2024-06-15 19:09:30,560][1652491] Updated weights for policy 0, policy_version 621635 (0.0012) [2024-06-15 19:09:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.4, 300 sec: 47208.1). Total num frames: 1273135104. Throughput: 0: 11935.3. 
Samples: 318355456. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:09:30,956][1648985] Avg episode reward: [(0, '132.520')] [2024-06-15 19:09:31,925][1652491] Updated weights for policy 0, policy_version 621694 (0.0011) [2024-06-15 19:09:34,021][1652491] Updated weights for policy 0, policy_version 621752 (0.0013) [2024-06-15 19:09:35,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1273430016. Throughput: 0: 11810.2. Samples: 318422528. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:09:35,955][1648985] Avg episode reward: [(0, '150.190')] [2024-06-15 19:09:36,007][1652491] Updated weights for policy 0, policy_version 621795 (0.0011) [2024-06-15 19:09:38,764][1652491] Updated weights for policy 0, policy_version 621856 (0.0013) [2024-06-15 19:09:40,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 1273626624. Throughput: 0: 11730.5. Samples: 318460416. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:09:40,955][1648985] Avg episode reward: [(0, '163.390')] [2024-06-15 19:09:42,758][1652491] Updated weights for policy 0, policy_version 621936 (0.0022) [2024-06-15 19:09:44,707][1652491] Updated weights for policy 0, policy_version 622000 (0.0024) [2024-06-15 19:09:45,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 47513.7, 300 sec: 47097.1). Total num frames: 1273888768. Throughput: 0: 11787.4. Samples: 318532096. Policy #0 lag: (min: 15.0, avg: 105.6, max: 271.0) [2024-06-15 19:09:45,956][1648985] Avg episode reward: [(0, '136.700')] [2024-06-15 19:09:46,826][1652491] Updated weights for policy 0, policy_version 622034 (0.0026) [2024-06-15 19:09:49,635][1652491] Updated weights for policy 0, policy_version 622096 (0.0013) [2024-06-15 19:09:50,528][1652491] Updated weights for policy 0, policy_version 622144 (0.0012) [2024-06-15 19:09:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 49152.2, 300 sec: 47541.4). 
Total num frames: 1274150912. Throughput: 0: 11741.9. Samples: 318607872. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:09:50,955][1648985] Avg episode reward: [(0, '142.110')] [2024-06-15 19:09:53,787][1652491] Updated weights for policy 0, policy_version 622202 (0.0014) [2024-06-15 19:09:55,086][1652491] Updated weights for policy 0, policy_version 622256 (0.0014) [2024-06-15 19:09:55,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1274413056. Throughput: 0: 11923.9. Samples: 318646272. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:09:55,956][1648985] Avg episode reward: [(0, '153.380')] [2024-06-15 19:09:57,922][1652491] Updated weights for policy 0, policy_version 622308 (0.0067) [2024-06-15 19:10:00,789][1652491] Updated weights for policy 0, policy_version 622368 (0.0108) [2024-06-15 19:10:00,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 1274609664. Throughput: 0: 12060.5. Samples: 318718976. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:10:00,956][1648985] Avg episode reward: [(0, '165.020')] [2024-06-15 19:10:03,728][1651469] Signal inference workers to stop experience collection... (32400 times) [2024-06-15 19:10:03,765][1652491] InferenceWorker_p0-w0: stopping experience collection (32400 times) [2024-06-15 19:10:03,789][1652491] Updated weights for policy 0, policy_version 622418 (0.0015) [2024-06-15 19:10:03,926][1651469] Signal inference workers to resume experience collection... (32400 times) [2024-06-15 19:10:03,927][1652491] InferenceWorker_p0-w0: resuming experience collection (32400 times) [2024-06-15 19:10:05,009][1652491] Updated weights for policy 0, policy_version 622468 (0.0015) [2024-06-15 19:10:05,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 49152.0, 300 sec: 47319.2). Total num frames: 1274904576. Throughput: 0: 11844.3. Samples: 318785024. 
Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:10:05,956][1648985] Avg episode reward: [(0, '175.150')] [2024-06-15 19:10:08,818][1652491] Updated weights for policy 0, policy_version 622560 (0.0057) [2024-06-15 19:10:09,502][1652491] Updated weights for policy 0, policy_version 622590 (0.0012) [2024-06-15 19:10:10,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 47513.7, 300 sec: 47319.2). Total num frames: 1275068416. Throughput: 0: 11901.2. Samples: 318821376. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:10:10,955][1648985] Avg episode reward: [(0, '160.900')] [2024-06-15 19:10:12,212][1652491] Updated weights for policy 0, policy_version 622655 (0.0013) [2024-06-15 19:10:15,363][1652491] Updated weights for policy 0, policy_version 622715 (0.0013) [2024-06-15 19:10:15,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 1275330560. Throughput: 0: 12049.0. Samples: 318897664. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:10:15,956][1648985] Avg episode reward: [(0, '140.880')] [2024-06-15 19:10:17,041][1652491] Updated weights for policy 0, policy_version 622768 (0.0016) [2024-06-15 19:10:20,543][1652491] Updated weights for policy 0, policy_version 622836 (0.0014) [2024-06-15 19:10:20,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 48059.6, 300 sec: 47543.0). Total num frames: 1275592704. Throughput: 0: 12060.4. Samples: 318965248. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:10:20,956][1648985] Avg episode reward: [(0, '153.590')] [2024-06-15 19:10:22,639][1652491] Updated weights for policy 0, policy_version 622866 (0.0012) [2024-06-15 19:10:25,026][1652491] Updated weights for policy 0, policy_version 622918 (0.0031) [2024-06-15 19:10:25,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 47513.5, 300 sec: 47097.0). Total num frames: 1275822080. Throughput: 0: 12083.1. Samples: 319004160. 
Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:10:25,956][1648985] Avg episode reward: [(0, '177.690')] [2024-06-15 19:10:26,112][1652491] Updated weights for policy 0, policy_version 622975 (0.0012) [2024-06-15 19:10:28,145][1652491] Updated weights for policy 0, policy_version 623038 (0.0014) [2024-06-15 19:10:30,962][1648985] Fps is (10 sec: 49117.1, 60 sec: 49146.0, 300 sec: 47429.1). Total num frames: 1276084224. Throughput: 0: 12115.4. Samples: 319077376. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:10:30,963][1648985] Avg episode reward: [(0, '152.390')] [2024-06-15 19:10:31,001][1652491] Updated weights for policy 0, policy_version 623096 (0.0015) [2024-06-15 19:10:33,789][1652491] Updated weights for policy 0, policy_version 623152 (0.0013) [2024-06-15 19:10:35,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 1276280832. Throughput: 0: 12151.4. Samples: 319154688. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:10:35,956][1648985] Avg episode reward: [(0, '150.210')] [2024-06-15 19:10:36,490][1652491] Updated weights for policy 0, policy_version 623217 (0.0014) [2024-06-15 19:10:38,554][1652491] Updated weights for policy 0, policy_version 623266 (0.0042) [2024-06-15 19:10:40,955][1648985] Fps is (10 sec: 42629.9, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1276510208. Throughput: 0: 11889.8. Samples: 319181312. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:10:40,955][1648985] Avg episode reward: [(0, '147.100')] [2024-06-15 19:10:41,936][1652491] Updated weights for policy 0, policy_version 623344 (0.0037) [2024-06-15 19:10:44,523][1652491] Updated weights for policy 0, policy_version 623376 (0.0017) [2024-06-15 19:10:45,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 48059.6, 300 sec: 47541.3). Total num frames: 1276772352. Throughput: 0: 11923.9. Samples: 319255552. 
Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:10:45,956][1648985] Avg episode reward: [(0, '154.380')] [2024-06-15 19:10:46,540][1652491] Updated weights for policy 0, policy_version 623425 (0.0125) [2024-06-15 19:10:47,471][1652491] Updated weights for policy 0, policy_version 623488 (0.0034) [2024-06-15 19:10:49,063][1651469] Signal inference workers to stop experience collection... (32450 times) [2024-06-15 19:10:49,110][1652491] InferenceWorker_p0-w0: stopping experience collection (32450 times) [2024-06-15 19:10:49,389][1651469] Signal inference workers to resume experience collection... (32450 times) [2024-06-15 19:10:49,391][1652491] InferenceWorker_p0-w0: resuming experience collection (32450 times) [2024-06-15 19:10:50,439][1652491] Updated weights for policy 0, policy_version 623552 (0.0013) [2024-06-15 19:10:50,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48059.6, 300 sec: 47319.2). Total num frames: 1277034496. Throughput: 0: 12060.4. Samples: 319327744. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:10:50,956][1648985] Avg episode reward: [(0, '151.200')] [2024-06-15 19:10:52,759][1652491] Updated weights for policy 0, policy_version 623600 (0.0023) [2024-06-15 19:10:54,936][1652491] Updated weights for policy 0, policy_version 623648 (0.0029) [2024-06-15 19:10:55,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1277296640. Throughput: 0: 12174.2. Samples: 319369216. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:10:55,956][1648985] Avg episode reward: [(0, '151.640')] [2024-06-15 19:10:55,967][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000623680_1277296640.pth... 
[2024-06-15 19:10:56,054][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000618112_1265893376.pth [2024-06-15 19:10:56,063][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000623680_1277296640.pth [2024-06-15 19:10:57,348][1652491] Updated weights for policy 0, policy_version 623696 (0.0019) [2024-06-15 19:10:59,577][1652491] Updated weights for policy 0, policy_version 623747 (0.0015) [2024-06-15 19:11:00,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 48605.8, 300 sec: 47430.3). Total num frames: 1277526016. Throughput: 0: 12049.1. Samples: 319439872. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:11:00,956][1648985] Avg episode reward: [(0, '152.000')] [2024-06-15 19:11:00,997][1652491] Updated weights for policy 0, policy_version 623805 (0.0023) [2024-06-15 19:11:03,781][1652491] Updated weights for policy 0, policy_version 623862 (0.0012) [2024-06-15 19:11:05,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 47513.6, 300 sec: 47319.2). Total num frames: 1277755392. Throughput: 0: 12060.5. Samples: 319507968. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:11:05,955][1648985] Avg episode reward: [(0, '143.090')] [2024-06-15 19:11:06,529][1652491] Updated weights for policy 0, policy_version 623930 (0.0015) [2024-06-15 19:11:09,260][1652491] Updated weights for policy 0, policy_version 623968 (0.0014) [2024-06-15 19:11:10,956][1648985] Fps is (10 sec: 42597.4, 60 sec: 48059.4, 300 sec: 47319.2). Total num frames: 1277952000. Throughput: 0: 12049.0. Samples: 319546368. 
Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:11:10,956][1648985] Avg episode reward: [(0, '134.580')] [2024-06-15 19:11:11,503][1652491] Updated weights for policy 0, policy_version 624032 (0.0025) [2024-06-15 19:11:14,697][1652491] Updated weights for policy 0, policy_version 624081 (0.0017) [2024-06-15 19:11:15,411][1652491] Updated weights for policy 0, policy_version 624121 (0.0061) [2024-06-15 19:11:15,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 48060.0, 300 sec: 47208.7). Total num frames: 1278214144. Throughput: 0: 11971.4. Samples: 319616000. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:11:15,955][1648985] Avg episode reward: [(0, '149.010')] [2024-06-15 19:11:16,853][1652491] Updated weights for policy 0, policy_version 624161 (0.0013) [2024-06-15 19:11:20,215][1652491] Updated weights for policy 0, policy_version 624208 (0.0014) [2024-06-15 19:11:20,955][1648985] Fps is (10 sec: 49153.5, 60 sec: 47513.7, 300 sec: 47652.4). Total num frames: 1278443520. Throughput: 0: 11935.3. Samples: 319691776. Policy #0 lag: (min: 15.0, avg: 134.8, max: 271.0) [2024-06-15 19:11:20,956][1648985] Avg episode reward: [(0, '158.460')] [2024-06-15 19:11:21,176][1652491] Updated weights for policy 0, policy_version 624247 (0.0011) [2024-06-15 19:11:22,363][1652491] Updated weights for policy 0, policy_version 624304 (0.0012) [2024-06-15 19:11:25,175][1652491] Updated weights for policy 0, policy_version 624343 (0.0100) [2024-06-15 19:11:25,775][1652491] Updated weights for policy 0, policy_version 624384 (0.0015) [2024-06-15 19:11:25,955][1648985] Fps is (10 sec: 52426.6, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 1278738432. Throughput: 0: 12128.6. Samples: 319727104. 
Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:11:25,956][1648985] Avg episode reward: [(0, '169.500')] [2024-06-15 19:11:27,426][1652491] Updated weights for policy 0, policy_version 624437 (0.0013) [2024-06-15 19:11:30,788][1652491] Updated weights for policy 0, policy_version 624467 (0.0024) [2024-06-15 19:11:30,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47519.4, 300 sec: 47763.5). Total num frames: 1278935040. Throughput: 0: 12253.9. Samples: 319806976. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:11:30,956][1648985] Avg episode reward: [(0, '161.280')] [2024-06-15 19:11:31,675][1651469] Signal inference workers to stop experience collection... (32500 times) [2024-06-15 19:11:31,705][1652491] InferenceWorker_p0-w0: stopping experience collection (32500 times) [2024-06-15 19:11:31,906][1651469] Signal inference workers to resume experience collection... (32500 times) [2024-06-15 19:11:31,907][1652491] InferenceWorker_p0-w0: resuming experience collection (32500 times) [2024-06-15 19:11:32,113][1652491] Updated weights for policy 0, policy_version 624531 (0.0179) [2024-06-15 19:11:35,366][1652491] Updated weights for policy 0, policy_version 624579 (0.0013) [2024-06-15 19:11:35,955][1648985] Fps is (10 sec: 45876.5, 60 sec: 48605.9, 300 sec: 47763.6). Total num frames: 1279197184. Throughput: 0: 12162.9. Samples: 319875072. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:11:35,955][1648985] Avg episode reward: [(0, '161.340')] [2024-06-15 19:11:36,257][1652491] Updated weights for policy 0, policy_version 624632 (0.0014) [2024-06-15 19:11:37,722][1652491] Updated weights for policy 0, policy_version 624688 (0.0015) [2024-06-15 19:11:40,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 48059.5, 300 sec: 47541.3). Total num frames: 1279393792. Throughput: 0: 12094.5. Samples: 319913472. 
Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:11:40,956][1648985] Avg episode reward: [(0, '154.020')] [2024-06-15 19:11:41,721][1652491] Updated weights for policy 0, policy_version 624721 (0.0014) [2024-06-15 19:11:43,184][1652491] Updated weights for policy 0, policy_version 624788 (0.0011) [2024-06-15 19:11:44,110][1652491] Updated weights for policy 0, policy_version 624828 (0.0032) [2024-06-15 19:11:45,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 48605.9, 300 sec: 47652.4). Total num frames: 1279688704. Throughput: 0: 12208.3. Samples: 319989248. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:11:45,956][1648985] Avg episode reward: [(0, '144.690')] [2024-06-15 19:11:46,858][1652491] Updated weights for policy 0, policy_version 624896 (0.0014) [2024-06-15 19:11:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1279918080. Throughput: 0: 12208.3. Samples: 320057344. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:11:50,956][1648985] Avg episode reward: [(0, '133.910')] [2024-06-15 19:11:52,438][1652491] Updated weights for policy 0, policy_version 624961 (0.0023) [2024-06-15 19:11:54,072][1652491] Updated weights for policy 0, policy_version 625040 (0.0016) [2024-06-15 19:11:55,110][1652491] Updated weights for policy 0, policy_version 625086 (0.0020) [2024-06-15 19:11:55,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1280180224. Throughput: 0: 12219.8. Samples: 320096256. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:11:55,956][1648985] Avg episode reward: [(0, '154.340')] [2024-06-15 19:11:58,352][1652491] Updated weights for policy 0, policy_version 625152 (0.0012) [2024-06-15 19:11:59,515][1652491] Updated weights for policy 0, policy_version 625216 (0.0014) [2024-06-15 19:12:00,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 48606.0, 300 sec: 47652.4). Total num frames: 1280442368. 
Throughput: 0: 12140.1. Samples: 320162304. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:12:00,956][1648985] Avg episode reward: [(0, '154.630')] [2024-06-15 19:12:04,587][1652491] Updated weights for policy 0, policy_version 625284 (0.0015) [2024-06-15 19:12:05,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 49151.8, 300 sec: 47541.3). Total num frames: 1280704512. Throughput: 0: 12083.2. Samples: 320235520. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:12:05,956][1648985] Avg episode reward: [(0, '181.280')] [2024-06-15 19:12:07,823][1652491] Updated weights for policy 0, policy_version 625360 (0.0014) [2024-06-15 19:12:10,256][1652491] Updated weights for policy 0, policy_version 625429 (0.0013) [2024-06-15 19:12:10,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 49698.4, 300 sec: 47874.6). Total num frames: 1280933888. Throughput: 0: 12083.3. Samples: 320270848. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:12:10,956][1648985] Avg episode reward: [(0, '180.190')] [2024-06-15 19:12:14,011][1652491] Updated weights for policy 0, policy_version 625476 (0.0012) [2024-06-15 19:12:14,601][1651469] Signal inference workers to stop experience collection... (32550 times) [2024-06-15 19:12:14,653][1652491] InferenceWorker_p0-w0: stopping experience collection (32550 times) [2024-06-15 19:12:14,778][1651469] Signal inference workers to resume experience collection... (32550 times) [2024-06-15 19:12:14,807][1652491] InferenceWorker_p0-w0: resuming experience collection (32550 times) [2024-06-15 19:12:15,810][1652491] Updated weights for policy 0, policy_version 625539 (0.0040) [2024-06-15 19:12:15,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 48059.5, 300 sec: 47430.3). Total num frames: 1281097728. Throughput: 0: 11992.1. Samples: 320346624. 
Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:12:15,956][1648985] Avg episode reward: [(0, '173.360')] [2024-06-15 19:12:17,031][1652491] Updated weights for policy 0, policy_version 625587 (0.0014) [2024-06-15 19:12:19,506][1652491] Updated weights for policy 0, policy_version 625648 (0.0012) [2024-06-15 19:12:20,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 47652.4). Total num frames: 1281392640. Throughput: 0: 12117.3. Samples: 320420352. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:12:20,956][1648985] Avg episode reward: [(0, '155.740')] [2024-06-15 19:12:21,184][1652491] Updated weights for policy 0, policy_version 625689 (0.0013) [2024-06-15 19:12:21,883][1652491] Updated weights for policy 0, policy_version 625728 (0.0013) [2024-06-15 19:12:25,217][1652491] Updated weights for policy 0, policy_version 625789 (0.0016) [2024-06-15 19:12:25,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 47874.6). Total num frames: 1281622016. Throughput: 0: 12140.1. Samples: 320459776. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:12:25,956][1648985] Avg episode reward: [(0, '144.710')] [2024-06-15 19:12:27,294][1652491] Updated weights for policy 0, policy_version 625840 (0.0011) [2024-06-15 19:12:29,837][1652491] Updated weights for policy 0, policy_version 625872 (0.0089) [2024-06-15 19:12:30,978][1648985] Fps is (10 sec: 49041.5, 60 sec: 49133.5, 300 sec: 47982.0). Total num frames: 1281884160. Throughput: 0: 12020.3. Samples: 320530432. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:12:30,978][1648985] Avg episode reward: [(0, '161.750')] [2024-06-15 19:12:31,774][1652491] Updated weights for policy 0, policy_version 625921 (0.0012) [2024-06-15 19:12:33,085][1652491] Updated weights for policy 0, policy_version 625984 (0.0014) [2024-06-15 19:12:35,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48059.6, 300 sec: 47763.5). Total num frames: 1282080768. 
Throughput: 0: 12049.1. Samples: 320599552. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:12:35,956][1648985] Avg episode reward: [(0, '150.390')] [2024-06-15 19:12:36,320][1652491] Updated weights for policy 0, policy_version 626048 (0.0022) [2024-06-15 19:12:38,893][1652491] Updated weights for policy 0, policy_version 626111 (0.0012) [2024-06-15 19:12:40,955][1648985] Fps is (10 sec: 39411.1, 60 sec: 48060.0, 300 sec: 47541.4). Total num frames: 1282277376. Throughput: 0: 11821.5. Samples: 320628224. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:12:40,955][1648985] Avg episode reward: [(0, '153.100')] [2024-06-15 19:12:42,157][1652491] Updated weights for policy 0, policy_version 626168 (0.0013) [2024-06-15 19:12:44,228][1652491] Updated weights for policy 0, policy_version 626234 (0.0012) [2024-06-15 19:12:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 1282539520. Throughput: 0: 11958.0. Samples: 320700416. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:12:45,956][1648985] Avg episode reward: [(0, '150.440')] [2024-06-15 19:12:47,665][1652491] Updated weights for policy 0, policy_version 626304 (0.0111) [2024-06-15 19:12:50,574][1652491] Updated weights for policy 0, policy_version 626365 (0.0016) [2024-06-15 19:12:50,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48059.8, 300 sec: 47763.6). Total num frames: 1282801664. Throughput: 0: 11901.2. Samples: 320771072. Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:12:50,956][1648985] Avg episode reward: [(0, '148.910')] [2024-06-15 19:12:52,461][1652491] Updated weights for policy 0, policy_version 626402 (0.0013) [2024-06-15 19:12:54,397][1652491] Updated weights for policy 0, policy_version 626453 (0.0013) [2024-06-15 19:12:55,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 48059.5, 300 sec: 47985.6). Total num frames: 1283063808. Throughput: 0: 11958.0. Samples: 320808960. 
Policy #0 lag: (min: 47.0, avg: 160.2, max: 287.0) [2024-06-15 19:12:55,956][1648985] Avg episode reward: [(0, '152.740')] [2024-06-15 19:12:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000626496_1283063808.pth... [2024-06-15 19:12:56,031][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000620864_1271529472.pth [2024-06-15 19:12:57,873][1652491] Updated weights for policy 0, policy_version 626512 (0.0016) [2024-06-15 19:12:59,150][1652491] Updated weights for policy 0, policy_version 626559 (0.0012) [2024-06-15 19:13:00,553][1651469] Signal inference workers to stop experience collection... (32600 times) [2024-06-15 19:13:00,611][1652491] InferenceWorker_p0-w0: stopping experience collection (32600 times) [2024-06-15 19:13:00,732][1651469] Signal inference workers to resume experience collection... (32600 times) [2024-06-15 19:13:00,735][1652491] InferenceWorker_p0-w0: resuming experience collection (32600 times) [2024-06-15 19:13:00,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.4, 300 sec: 47874.6). Total num frames: 1283260416. Throughput: 0: 11867.0. Samples: 320880640. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:13:00,956][1648985] Avg episode reward: [(0, '152.990')] [2024-06-15 19:13:01,008][1652491] Updated weights for policy 0, policy_version 626600 (0.0021) [2024-06-15 19:13:02,837][1652491] Updated weights for policy 0, policy_version 626644 (0.0031) [2024-06-15 19:13:05,381][1652491] Updated weights for policy 0, policy_version 626704 (0.0094) [2024-06-15 19:13:05,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 48208.2). Total num frames: 1283522560. Throughput: 0: 11764.6. Samples: 320949760. 
Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:13:05,956][1648985] Avg episode reward: [(0, '163.470')] [2024-06-15 19:13:09,504][1652491] Updated weights for policy 0, policy_version 626768 (0.0012) [2024-06-15 19:13:10,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 1283719168. Throughput: 0: 11798.8. Samples: 320990720. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:13:10,956][1648985] Avg episode reward: [(0, '152.560')] [2024-06-15 19:13:12,323][1652491] Updated weights for policy 0, policy_version 626872 (0.0014) [2024-06-15 19:13:13,868][1652491] Updated weights for policy 0, policy_version 626912 (0.0016) [2024-06-15 19:13:15,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 48059.9, 300 sec: 47986.0). Total num frames: 1283981312. Throughput: 0: 11611.2. Samples: 321052672. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:13:15,955][1648985] Avg episode reward: [(0, '157.900')] [2024-06-15 19:13:16,964][1652491] Updated weights for policy 0, policy_version 626979 (0.0094) [2024-06-15 19:13:20,887][1652491] Updated weights for policy 0, policy_version 627012 (0.0012) [2024-06-15 19:13:20,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45329.1, 300 sec: 47541.4). Total num frames: 1284112384. Throughput: 0: 11844.3. Samples: 321132544. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:13:20,956][1648985] Avg episode reward: [(0, '162.190')] [2024-06-15 19:13:23,339][1652491] Updated weights for policy 0, policy_version 627107 (0.0014) [2024-06-15 19:13:25,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 1284407296. Throughput: 0: 11684.9. Samples: 321154048. 
Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:13:25,956][1648985] Avg episode reward: [(0, '156.910')] [2024-06-15 19:13:26,213][1652491] Updated weights for policy 0, policy_version 627168 (0.0022) [2024-06-15 19:13:28,198][1652491] Updated weights for policy 0, policy_version 627234 (0.0016) [2024-06-15 19:13:30,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45892.5, 300 sec: 47541.4). Total num frames: 1284636672. Throughput: 0: 11764.6. Samples: 321229824. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:13:30,955][1648985] Avg episode reward: [(0, '156.610')] [2024-06-15 19:13:32,503][1652491] Updated weights for policy 0, policy_version 627280 (0.0014) [2024-06-15 19:13:35,122][1652491] Updated weights for policy 0, policy_version 627376 (0.0014) [2024-06-15 19:13:35,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 1284898816. Throughput: 0: 11548.5. Samples: 321290752. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:13:35,955][1648985] Avg episode reward: [(0, '158.310')] [2024-06-15 19:13:38,139][1652491] Updated weights for policy 0, policy_version 627440 (0.0068) [2024-06-15 19:13:40,177][1652491] Updated weights for policy 0, policy_version 627488 (0.0014) [2024-06-15 19:13:40,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 1285160960. Throughput: 0: 11594.0. Samples: 321330688. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:13:40,955][1648985] Avg episode reward: [(0, '139.270')] [2024-06-15 19:13:43,181][1652491] Updated weights for policy 0, policy_version 627521 (0.0013) [2024-06-15 19:13:44,688][1651469] Signal inference workers to stop experience collection... 
(32650 times) [2024-06-15 19:13:44,711][1652491] Updated weights for policy 0, policy_version 627585 (0.0013) [2024-06-15 19:13:44,745][1652491] InferenceWorker_p0-w0: stopping experience collection (32650 times) [2024-06-15 19:13:45,013][1651469] Signal inference workers to resume experience collection... (32650 times) [2024-06-15 19:13:45,014][1652491] InferenceWorker_p0-w0: resuming experience collection (32650 times) [2024-06-15 19:13:45,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 47513.5, 300 sec: 48096.7). Total num frames: 1285390336. Throughput: 0: 11548.4. Samples: 321400320. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:13:45,956][1648985] Avg episode reward: [(0, '151.220')] [2024-06-15 19:13:46,007][1652491] Updated weights for policy 0, policy_version 627644 (0.0012) [2024-06-15 19:13:49,831][1652491] Updated weights for policy 0, policy_version 627700 (0.0012) [2024-06-15 19:13:50,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 1285586944. Throughput: 0: 11571.2. Samples: 321470464. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:13:50,956][1648985] Avg episode reward: [(0, '163.410')] [2024-06-15 19:13:51,186][1652491] Updated weights for policy 0, policy_version 627748 (0.0020) [2024-06-15 19:13:54,832][1652491] Updated weights for policy 0, policy_version 627808 (0.0012) [2024-06-15 19:13:55,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 45875.4, 300 sec: 47763.5). Total num frames: 1285816320. Throughput: 0: 11525.7. Samples: 321509376. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:13:55,956][1648985] Avg episode reward: [(0, '141.760')] [2024-06-15 19:13:56,784][1652491] Updated weights for policy 0, policy_version 627888 (0.0013) [2024-06-15 19:14:00,334][1652491] Updated weights for policy 0, policy_version 627922 (0.0013) [2024-06-15 19:14:00,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46421.4, 300 sec: 47763.5). 
Total num frames: 1286045696. Throughput: 0: 11753.2. Samples: 321581568. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:14:00,956][1648985] Avg episode reward: [(0, '126.380')] [2024-06-15 19:14:01,643][1652491] Updated weights for policy 0, policy_version 627985 (0.0010) [2024-06-15 19:14:02,394][1652491] Updated weights for policy 0, policy_version 628031 (0.0011) [2024-06-15 19:14:05,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45329.1, 300 sec: 47541.4). Total num frames: 1286242304. Throughput: 0: 11571.2. Samples: 321653248. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:14:05,956][1648985] Avg episode reward: [(0, '148.620')] [2024-06-15 19:14:06,512][1652491] Updated weights for policy 0, policy_version 628086 (0.0012) [2024-06-15 19:14:07,918][1652491] Updated weights for policy 0, policy_version 628148 (0.0080) [2024-06-15 19:14:10,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1286471680. Throughput: 0: 11776.0. Samples: 321683968. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:14:10,956][1648985] Avg episode reward: [(0, '166.030')] [2024-06-15 19:14:11,730][1652491] Updated weights for policy 0, policy_version 628195 (0.0012) [2024-06-15 19:14:13,634][1652491] Updated weights for policy 0, policy_version 628281 (0.0091) [2024-06-15 19:14:15,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45875.1, 300 sec: 47541.4). Total num frames: 1286733824. Throughput: 0: 11662.2. Samples: 321754624. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:14:15,956][1648985] Avg episode reward: [(0, '165.240')] [2024-06-15 19:14:17,306][1652491] Updated weights for policy 0, policy_version 628323 (0.0015) [2024-06-15 19:14:19,453][1652491] Updated weights for policy 0, policy_version 628386 (0.0015) [2024-06-15 19:14:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1286995968. Throughput: 0: 11821.5. 
Samples: 321822720. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:14:20,956][1648985] Avg episode reward: [(0, '159.730')] [2024-06-15 19:14:23,407][1652491] Updated weights for policy 0, policy_version 628464 (0.0015) [2024-06-15 19:14:25,040][1652491] Updated weights for policy 0, policy_version 628530 (0.0013) [2024-06-15 19:14:25,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 47513.8, 300 sec: 47874.6). Total num frames: 1287258112. Throughput: 0: 11639.5. Samples: 321854464. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:14:25,955][1648985] Avg episode reward: [(0, '159.890')] [2024-06-15 19:14:28,516][1652491] Updated weights for policy 0, policy_version 628576 (0.0011) [2024-06-15 19:14:28,652][1651469] Signal inference workers to stop experience collection... (32700 times) [2024-06-15 19:14:28,693][1652491] InferenceWorker_p0-w0: stopping experience collection (32700 times) [2024-06-15 19:14:28,904][1651469] Signal inference workers to resume experience collection... (32700 times) [2024-06-15 19:14:28,905][1652491] InferenceWorker_p0-w0: resuming experience collection (32700 times) [2024-06-15 19:14:30,206][1652491] Updated weights for policy 0, policy_version 628625 (0.0015) [2024-06-15 19:14:30,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 1287487488. Throughput: 0: 11730.5. Samples: 321928192. Policy #0 lag: (min: 15.0, avg: 120.8, max: 271.0) [2024-06-15 19:14:30,956][1648985] Avg episode reward: [(0, '138.710')] [2024-06-15 19:14:34,221][1652491] Updated weights for policy 0, policy_version 628688 (0.0015) [2024-06-15 19:14:35,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.4, 300 sec: 47652.5). Total num frames: 1287684096. Throughput: 0: 11628.1. Samples: 321993728. 
Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:14:35,956][1648985] Avg episode reward: [(0, '147.610')] [2024-06-15 19:14:36,283][1652491] Updated weights for policy 0, policy_version 628768 (0.0012) [2024-06-15 19:14:39,125][1652491] Updated weights for policy 0, policy_version 628819 (0.0012) [2024-06-15 19:14:40,078][1652491] Updated weights for policy 0, policy_version 628860 (0.0021) [2024-06-15 19:14:40,956][1648985] Fps is (10 sec: 42596.1, 60 sec: 45874.7, 300 sec: 47541.3). Total num frames: 1287913472. Throughput: 0: 11662.1. Samples: 322034176. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:14:40,956][1648985] Avg episode reward: [(0, '150.530')] [2024-06-15 19:14:41,764][1652491] Updated weights for policy 0, policy_version 628914 (0.0014) [2024-06-15 19:14:45,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45329.3, 300 sec: 47319.2). Total num frames: 1288110080. Throughput: 0: 11650.9. Samples: 322105856. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:14:45,955][1648985] Avg episode reward: [(0, '148.840')] [2024-06-15 19:14:46,020][1652491] Updated weights for policy 0, policy_version 628962 (0.0013) [2024-06-15 19:14:47,773][1652491] Updated weights for policy 0, policy_version 629026 (0.0013) [2024-06-15 19:14:50,592][1652491] Updated weights for policy 0, policy_version 629072 (0.0011) [2024-06-15 19:14:50,955][1648985] Fps is (10 sec: 42600.3, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 1288339456. Throughput: 0: 11514.3. Samples: 322171392. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:14:50,956][1648985] Avg episode reward: [(0, '159.660')] [2024-06-15 19:14:51,725][1652491] Updated weights for policy 0, policy_version 629120 (0.0011) [2024-06-15 19:14:53,974][1652491] Updated weights for policy 0, policy_version 629177 (0.0017) [2024-06-15 19:14:55,955][1648985] Fps is (10 sec: 45873.3, 60 sec: 45875.0, 300 sec: 47319.2). Total num frames: 1288568832. 
Throughput: 0: 11434.6. Samples: 322198528. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:14:55,956][1648985] Avg episode reward: [(0, '154.640')] [2024-06-15 19:14:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000629184_1288568832.pth... [2024-06-15 19:14:56,030][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000623680_1277296640.pth [2024-06-15 19:14:58,292][1652491] Updated weights for policy 0, policy_version 629235 (0.0014) [2024-06-15 19:15:00,001][1652491] Updated weights for policy 0, policy_version 629301 (0.0012) [2024-06-15 19:15:00,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 46421.4, 300 sec: 47208.1). Total num frames: 1288830976. Throughput: 0: 11332.3. Samples: 322264576. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:15:00,956][1648985] Avg episode reward: [(0, '166.870')] [2024-06-15 19:15:02,645][1652491] Updated weights for policy 0, policy_version 629331 (0.0012) [2024-06-15 19:15:04,744][1652491] Updated weights for policy 0, policy_version 629392 (0.0013) [2024-06-15 19:15:05,662][1652491] Updated weights for policy 0, policy_version 629434 (0.0014) [2024-06-15 19:15:05,961][1648985] Fps is (10 sec: 52407.7, 60 sec: 47510.2, 300 sec: 47540.7). Total num frames: 1289093120. Throughput: 0: 11422.2. Samples: 322336768. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:15:05,962][1648985] Avg episode reward: [(0, '168.900')] [2024-06-15 19:15:08,931][1652491] Updated weights for policy 0, policy_version 629488 (0.0014) [2024-06-15 19:15:10,384][1652491] Updated weights for policy 0, policy_version 629542 (0.0011) [2024-06-15 19:15:10,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48059.6, 300 sec: 47541.4). Total num frames: 1289355264. Throughput: 0: 11616.6. Samples: 322377216. 
Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:15:10,956][1648985] Avg episode reward: [(0, '163.460')] [2024-06-15 19:15:13,368][1651469] Signal inference workers to stop experience collection... (32750 times) [2024-06-15 19:15:13,400][1652491] InferenceWorker_p0-w0: stopping experience collection (32750 times) [2024-06-15 19:15:13,639][1651469] Signal inference workers to resume experience collection... (32750 times) [2024-06-15 19:15:13,640][1652491] InferenceWorker_p0-w0: resuming experience collection (32750 times) [2024-06-15 19:15:14,815][1652491] Updated weights for policy 0, policy_version 629620 (0.0147) [2024-06-15 19:15:15,955][1648985] Fps is (10 sec: 39338.0, 60 sec: 45875.1, 300 sec: 47097.1). Total num frames: 1289486336. Throughput: 0: 11389.1. Samples: 322440704. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:15:15,956][1648985] Avg episode reward: [(0, '153.790')] [2024-06-15 19:15:16,505][1652491] Updated weights for policy 0, policy_version 629664 (0.0015) [2024-06-15 19:15:19,403][1652491] Updated weights for policy 0, policy_version 629698 (0.0015) [2024-06-15 19:15:20,944][1652491] Updated weights for policy 0, policy_version 629760 (0.0013) [2024-06-15 19:15:20,955][1648985] Fps is (10 sec: 39322.5, 60 sec: 45875.2, 300 sec: 47208.2). Total num frames: 1289748480. Throughput: 0: 11525.7. Samples: 322512384. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:15:20,955][1648985] Avg episode reward: [(0, '150.450')] [2024-06-15 19:15:22,374][1652491] Updated weights for policy 0, policy_version 629812 (0.0012) [2024-06-15 19:15:25,532][1652491] Updated weights for policy 0, policy_version 629872 (0.0028) [2024-06-15 19:15:25,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.0, 300 sec: 47209.3). Total num frames: 1290010624. Throughput: 0: 11355.1. Samples: 322545152. 
Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:15:25,956][1648985] Avg episode reward: [(0, '148.400')] [2024-06-15 19:15:27,479][1652491] Updated weights for policy 0, policy_version 629920 (0.0014) [2024-06-15 19:15:30,910][1652491] Updated weights for policy 0, policy_version 629984 (0.0014) [2024-06-15 19:15:30,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 45329.0, 300 sec: 47208.1). Total num frames: 1290207232. Throughput: 0: 11457.4. Samples: 322621440. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:15:30,956][1648985] Avg episode reward: [(0, '143.510')] [2024-06-15 19:15:32,078][1652491] Updated weights for policy 0, policy_version 630034 (0.0014) [2024-06-15 19:15:35,145][1652491] Updated weights for policy 0, policy_version 630083 (0.0011) [2024-06-15 19:15:35,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 46421.2, 300 sec: 47319.2). Total num frames: 1290469376. Throughput: 0: 11605.4. Samples: 322693632. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:15:35,956][1648985] Avg episode reward: [(0, '159.700')] [2024-06-15 19:15:36,160][1652491] Updated weights for policy 0, policy_version 630144 (0.0013) [2024-06-15 19:15:39,015][1652491] Updated weights for policy 0, policy_version 630202 (0.0013) [2024-06-15 19:15:40,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45875.5, 300 sec: 47097.1). Total num frames: 1290665984. Throughput: 0: 11696.4. Samples: 322724864. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:15:40,956][1648985] Avg episode reward: [(0, '170.060')] [2024-06-15 19:15:42,181][1652491] Updated weights for policy 0, policy_version 630268 (0.0013) [2024-06-15 19:15:43,907][1652491] Updated weights for policy 0, policy_version 630308 (0.0011) [2024-06-15 19:15:45,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 46967.2, 300 sec: 47097.0). Total num frames: 1290928128. Throughput: 0: 11832.8. Samples: 322797056. 
Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:15:45,956][1648985] Avg episode reward: [(0, '174.600')] [2024-06-15 19:15:47,147][1652491] Updated weights for policy 0, policy_version 630390 (0.0014) [2024-06-15 19:15:49,723][1652491] Updated weights for policy 0, policy_version 630435 (0.0013) [2024-06-15 19:15:50,962][1648985] Fps is (10 sec: 52390.4, 60 sec: 47507.7, 300 sec: 47095.9). Total num frames: 1291190272. Throughput: 0: 11854.8. Samples: 322870272. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:15:50,963][1648985] Avg episode reward: [(0, '155.550')] [2024-06-15 19:15:52,193][1652491] Updated weights for policy 0, policy_version 630480 (0.0015) [2024-06-15 19:15:54,034][1652491] Updated weights for policy 0, policy_version 630560 (0.0017) [2024-06-15 19:15:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 47208.1). Total num frames: 1291452416. Throughput: 0: 11832.9. Samples: 322909696. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:15:55,956][1648985] Avg episode reward: [(0, '162.700')] [2024-06-15 19:15:57,391][1652491] Updated weights for policy 0, policy_version 630608 (0.0015) [2024-06-15 19:15:57,498][1651469] Signal inference workers to stop experience collection... (32800 times) [2024-06-15 19:15:57,544][1652491] InferenceWorker_p0-w0: stopping experience collection (32800 times) [2024-06-15 19:15:57,730][1651469] Signal inference workers to resume experience collection... (32800 times) [2024-06-15 19:15:57,731][1652491] InferenceWorker_p0-w0: resuming experience collection (32800 times) [2024-06-15 19:15:59,686][1652491] Updated weights for policy 0, policy_version 630657 (0.0012) [2024-06-15 19:16:00,955][1648985] Fps is (10 sec: 49188.7, 60 sec: 47513.6, 300 sec: 47208.1). Total num frames: 1291681792. Throughput: 0: 12003.6. Samples: 322980864. 
Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:16:00,956][1648985] Avg episode reward: [(0, '168.120')] [2024-06-15 19:16:01,181][1652491] Updated weights for policy 0, policy_version 630714 (0.0012) [2024-06-15 19:16:04,059][1652491] Updated weights for policy 0, policy_version 630754 (0.0080) [2024-06-15 19:16:05,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 46970.9, 300 sec: 47319.3). Total num frames: 1291911168. Throughput: 0: 11730.5. Samples: 323040256. Policy #0 lag: (min: 7.0, avg: 98.8, max: 263.0) [2024-06-15 19:16:05,956][1648985] Avg episode reward: [(0, '157.780')] [2024-06-15 19:16:06,223][1652491] Updated weights for policy 0, policy_version 630848 (0.0016) [2024-06-15 19:16:10,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.3, 300 sec: 47097.0). Total num frames: 1292107776. Throughput: 0: 11832.9. Samples: 323077632. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:16:10,956][1648985] Avg episode reward: [(0, '151.540')] [2024-06-15 19:16:11,316][1652491] Updated weights for policy 0, policy_version 630914 (0.0019) [2024-06-15 19:16:12,719][1652491] Updated weights for policy 0, policy_version 630973 (0.0034) [2024-06-15 19:16:15,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1292271616. Throughput: 0: 11707.7. Samples: 323148288. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:16:15,956][1648985] Avg episode reward: [(0, '150.850')] [2024-06-15 19:16:16,932][1652491] Updated weights for policy 0, policy_version 631040 (0.0016) [2024-06-15 19:16:18,211][1652491] Updated weights for policy 0, policy_version 631103 (0.0013) [2024-06-15 19:16:20,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.2, 300 sec: 46763.9). Total num frames: 1292533760. Throughput: 0: 11616.7. Samples: 323216384. 
Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:16:20,956][1648985] Avg episode reward: [(0, '159.190')] [2024-06-15 19:16:21,464][1652491] Updated weights for policy 0, policy_version 631155 (0.0095) [2024-06-15 19:16:23,489][1652491] Updated weights for policy 0, policy_version 631200 (0.0040) [2024-06-15 19:16:25,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 1292763136. Throughput: 0: 11628.1. Samples: 323248128. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:16:25,956][1648985] Avg episode reward: [(0, '147.290')] [2024-06-15 19:16:28,145][1652491] Updated weights for policy 0, policy_version 631293 (0.0014) [2024-06-15 19:16:30,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 46967.3, 300 sec: 46874.9). Total num frames: 1293025280. Throughput: 0: 11514.3. Samples: 323315200. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:16:30,956][1648985] Avg episode reward: [(0, '173.400')] [2024-06-15 19:16:31,935][1652491] Updated weights for policy 0, policy_version 631362 (0.0012) [2024-06-15 19:16:32,874][1652491] Updated weights for policy 0, policy_version 631420 (0.0015) [2024-06-15 19:16:35,355][1652491] Updated weights for policy 0, policy_version 631486 (0.0013) [2024-06-15 19:16:35,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 1293287424. Throughput: 0: 11493.5. Samples: 323387392. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:16:35,956][1648985] Avg episode reward: [(0, '181.180')] [2024-06-15 19:16:39,468][1652491] Updated weights for policy 0, policy_version 631545 (0.0016) [2024-06-15 19:16:40,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 1293484032. Throughput: 0: 11366.4. Samples: 323421184. 
Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:16:40,956][1648985] Avg episode reward: [(0, '176.780')] [2024-06-15 19:16:41,489][1652491] Updated weights for policy 0, policy_version 631605 (0.0013) [2024-06-15 19:16:43,396][1651469] Signal inference workers to stop experience collection... (32850 times) [2024-06-15 19:16:43,439][1652491] InferenceWorker_p0-w0: stopping experience collection (32850 times) [2024-06-15 19:16:43,586][1651469] Signal inference workers to resume experience collection... (32850 times) [2024-06-15 19:16:43,587][1652491] InferenceWorker_p0-w0: resuming experience collection (32850 times) [2024-06-15 19:16:43,772][1652491] Updated weights for policy 0, policy_version 631651 (0.0012) [2024-06-15 19:16:45,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.5, 300 sec: 46763.9). Total num frames: 1293713408. Throughput: 0: 11298.2. Samples: 323489280. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:16:45,956][1648985] Avg episode reward: [(0, '165.110')] [2024-06-15 19:16:46,645][1652491] Updated weights for policy 0, policy_version 631732 (0.0012) [2024-06-15 19:16:49,980][1652491] Updated weights for policy 0, policy_version 631765 (0.0013) [2024-06-15 19:16:50,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 45881.0, 300 sec: 46652.8). Total num frames: 1293942784. Throughput: 0: 11605.4. Samples: 323562496. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:16:50,956][1648985] Avg episode reward: [(0, '150.430')] [2024-06-15 19:16:51,069][1652491] Updated weights for policy 0, policy_version 631811 (0.0067) [2024-06-15 19:16:52,203][1652491] Updated weights for policy 0, policy_version 631864 (0.0014) [2024-06-15 19:16:55,006][1652491] Updated weights for policy 0, policy_version 631926 (0.0013) [2024-06-15 19:16:55,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45875.4, 300 sec: 46652.7). Total num frames: 1294204928. Throughput: 0: 11548.4. Samples: 323597312. 
Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:16:55,956][1648985] Avg episode reward: [(0, '145.020')] [2024-06-15 19:16:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000631936_1294204928.pth... [2024-06-15 19:16:56,029][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000626496_1283063808.pth [2024-06-15 19:16:57,466][1652491] Updated weights for policy 0, policy_version 631968 (0.0013) [2024-06-15 19:16:57,987][1652491] Updated weights for policy 0, policy_version 631994 (0.0011) [2024-06-15 19:17:00,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 1294401536. Throughput: 0: 11741.9. Samples: 323676672. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:17:00,955][1648985] Avg episode reward: [(0, '153.780')] [2024-06-15 19:17:01,078][1652491] Updated weights for policy 0, policy_version 632035 (0.0089) [2024-06-15 19:17:01,857][1652491] Updated weights for policy 0, policy_version 632080 (0.0013) [2024-06-15 19:17:04,381][1652491] Updated weights for policy 0, policy_version 632144 (0.0013) [2024-06-15 19:17:05,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.4, 300 sec: 46763.8). Total num frames: 1294729216. Throughput: 0: 11719.1. Samples: 323743744. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:17:05,956][1648985] Avg episode reward: [(0, '168.380')] [2024-06-15 19:17:07,629][1652491] Updated weights for policy 0, policy_version 632193 (0.0055) [2024-06-15 19:17:08,865][1652491] Updated weights for policy 0, policy_version 632251 (0.0013) [2024-06-15 19:17:10,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 1294860288. Throughput: 0: 11787.4. Samples: 323778560. 
Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:17:10,956][1648985] Avg episode reward: [(0, '171.560')] [2024-06-15 19:17:11,768][1652491] Updated weights for policy 0, policy_version 632290 (0.0012) [2024-06-15 19:17:12,833][1652491] Updated weights for policy 0, policy_version 632337 (0.0013) [2024-06-15 19:17:15,107][1652491] Updated weights for policy 0, policy_version 632387 (0.0012) [2024-06-15 19:17:15,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 49151.9, 300 sec: 46874.9). Total num frames: 1295220736. Throughput: 0: 12026.3. Samples: 323856384. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:17:15,956][1648985] Avg episode reward: [(0, '171.420')] [2024-06-15 19:17:16,115][1652491] Updated weights for policy 0, policy_version 632448 (0.0018) [2024-06-15 19:17:20,002][1652491] Updated weights for policy 0, policy_version 632504 (0.0027) [2024-06-15 19:17:20,968][1648985] Fps is (10 sec: 52363.4, 60 sec: 47503.8, 300 sec: 46650.8). Total num frames: 1295384576. Throughput: 0: 12102.6. Samples: 323932160. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:17:20,968][1648985] Avg episode reward: [(0, '168.580')] [2024-06-15 19:17:22,248][1652491] Updated weights for policy 0, policy_version 632566 (0.0015) [2024-06-15 19:17:23,208][1652491] Updated weights for policy 0, policy_version 632593 (0.0012) [2024-06-15 19:17:24,065][1652491] Updated weights for policy 0, policy_version 632640 (0.0015) [2024-06-15 19:17:25,569][1651469] Signal inference workers to stop experience collection... (32900 times) [2024-06-15 19:17:25,653][1652491] InferenceWorker_p0-w0: stopping experience collection (32900 times) [2024-06-15 19:17:25,792][1651469] Signal inference workers to resume experience collection... (32900 times) [2024-06-15 19:17:25,793][1652491] InferenceWorker_p0-w0: resuming experience collection (32900 times) [2024-06-15 19:17:25,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 48605.9, 300 sec: 46767.4). 
Total num frames: 1295679488. Throughput: 0: 12094.6. Samples: 323965440. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:17:25,955][1648985] Avg episode reward: [(0, '160.920')] [2024-06-15 19:17:29,660][1652491] Updated weights for policy 0, policy_version 632720 (0.0014) [2024-06-15 19:17:30,854][1652491] Updated weights for policy 0, policy_version 632762 (0.0021) [2024-06-15 19:17:30,955][1648985] Fps is (10 sec: 49213.9, 60 sec: 47513.9, 300 sec: 46763.9). Total num frames: 1295876096. Throughput: 0: 12413.2. Samples: 324047872. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:17:30,955][1648985] Avg episode reward: [(0, '164.040')] [2024-06-15 19:17:31,773][1652491] Updated weights for policy 0, policy_version 632800 (0.0013) [2024-06-15 19:17:34,447][1652491] Updated weights for policy 0, policy_version 632864 (0.0014) [2024-06-15 19:17:35,956][1648985] Fps is (10 sec: 49148.2, 60 sec: 48059.1, 300 sec: 47096.9). Total num frames: 1296171008. Throughput: 0: 12265.0. Samples: 324114432. Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:17:35,957][1648985] Avg episode reward: [(0, '158.190')] [2024-06-15 19:17:36,263][1652491] Updated weights for policy 0, policy_version 632897 (0.0011) [2024-06-15 19:17:40,593][1652491] Updated weights for policy 0, policy_version 632976 (0.0013) [2024-06-15 19:17:40,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 47513.6, 300 sec: 46763.8). Total num frames: 1296334848. Throughput: 0: 12265.2. Samples: 324149248. 
Policy #0 lag: (min: 47.0, avg: 152.9, max: 303.0) [2024-06-15 19:17:40,956][1648985] Avg episode reward: [(0, '168.790')] [2024-06-15 19:17:41,755][1652491] Updated weights for policy 0, policy_version 633024 (0.0017) [2024-06-15 19:17:43,348][1652491] Updated weights for policy 0, policy_version 633088 (0.0047) [2024-06-15 19:17:45,863][1652491] Updated weights for policy 0, policy_version 633147 (0.0035) [2024-06-15 19:17:45,955][1648985] Fps is (10 sec: 52433.0, 60 sec: 49698.1, 300 sec: 47097.1). Total num frames: 1296695296. Throughput: 0: 12185.6. Samples: 324225024. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:17:45,956][1648985] Avg episode reward: [(0, '156.890')] [2024-06-15 19:17:47,960][1652491] Updated weights for policy 0, policy_version 633205 (0.0012) [2024-06-15 19:17:50,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 1296826368. Throughput: 0: 12253.9. Samples: 324295168. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:17:50,956][1648985] Avg episode reward: [(0, '160.190')] [2024-06-15 19:17:51,864][1652491] Updated weights for policy 0, policy_version 633233 (0.0014) [2024-06-15 19:17:53,768][1652491] Updated weights for policy 0, policy_version 633297 (0.0014) [2024-06-15 19:17:55,721][1652491] Updated weights for policy 0, policy_version 633345 (0.0015) [2024-06-15 19:17:55,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 1297088512. Throughput: 0: 12265.2. Samples: 324330496. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:17:55,956][1648985] Avg episode reward: [(0, '158.800')] [2024-06-15 19:17:57,725][1652491] Updated weights for policy 0, policy_version 633409 (0.0012) [2024-06-15 19:17:59,045][1652491] Updated weights for policy 0, policy_version 633464 (0.0011) [2024-06-15 19:18:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 49151.9, 300 sec: 46874.9). Total num frames: 1297350656. 
Throughput: 0: 12026.4. Samples: 324397568. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:18:00,956][1648985] Avg episode reward: [(0, '165.090')] [2024-06-15 19:18:03,492][1652491] Updated weights for policy 0, policy_version 633504 (0.0013) [2024-06-15 19:18:04,761][1652491] Updated weights for policy 0, policy_version 633541 (0.0021) [2024-06-15 19:18:05,742][1652491] Updated weights for policy 0, policy_version 633597 (0.0015) [2024-06-15 19:18:05,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1297612800. Throughput: 0: 11938.6. Samples: 324469248. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:18:05,956][1648985] Avg episode reward: [(0, '151.330')] [2024-06-15 19:18:09,068][1652491] Updated weights for policy 0, policy_version 633664 (0.0014) [2024-06-15 19:18:09,174][1651469] Signal inference workers to stop experience collection... (32950 times) [2024-06-15 19:18:09,230][1652491] InferenceWorker_p0-w0: stopping experience collection (32950 times) [2024-06-15 19:18:09,522][1651469] Signal inference workers to resume experience collection... (32950 times) [2024-06-15 19:18:09,523][1652491] InferenceWorker_p0-w0: resuming experience collection (32950 times) [2024-06-15 19:18:10,729][1652491] Updated weights for policy 0, policy_version 633718 (0.0015) [2024-06-15 19:18:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 50244.2, 300 sec: 47097.0). Total num frames: 1297874944. Throughput: 0: 11980.8. Samples: 324504576. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:18:10,956][1648985] Avg episode reward: [(0, '147.600')] [2024-06-15 19:18:14,790][1652491] Updated weights for policy 0, policy_version 633766 (0.0013) [2024-06-15 19:18:15,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1298006016. Throughput: 0: 11628.0. Samples: 324571136. 
Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:18:15,956][1648985] Avg episode reward: [(0, '130.950')] [2024-06-15 19:18:16,407][1652491] Updated weights for policy 0, policy_version 633808 (0.0013) [2024-06-15 19:18:17,294][1652491] Updated weights for policy 0, policy_version 633856 (0.0012) [2024-06-15 19:18:20,463][1652491] Updated weights for policy 0, policy_version 633906 (0.0010) [2024-06-15 19:18:20,960][1648985] Fps is (10 sec: 39303.2, 60 sec: 48065.9, 300 sec: 46985.3). Total num frames: 1298268160. Throughput: 0: 11718.1. Samples: 324641792. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:18:20,960][1648985] Avg episode reward: [(0, '120.810')] [2024-06-15 19:18:22,239][1652491] Updated weights for policy 0, policy_version 633978 (0.0120) [2024-06-15 19:18:25,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 46421.2, 300 sec: 46874.9). Total num frames: 1298464768. Throughput: 0: 11719.1. Samples: 324676608. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:18:25,956][1648985] Avg episode reward: [(0, '136.030')] [2024-06-15 19:18:26,208][1652491] Updated weights for policy 0, policy_version 634042 (0.0012) [2024-06-15 19:18:29,032][1652491] Updated weights for policy 0, policy_version 634109 (0.0047) [2024-06-15 19:18:30,955][1648985] Fps is (10 sec: 42618.6, 60 sec: 46967.4, 300 sec: 46763.8). Total num frames: 1298694144. Throughput: 0: 11639.5. Samples: 324748800. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:18:30,956][1648985] Avg episode reward: [(0, '146.130')] [2024-06-15 19:18:31,874][1652491] Updated weights for policy 0, policy_version 634176 (0.0026) [2024-06-15 19:18:33,221][1652491] Updated weights for policy 0, policy_version 634224 (0.0011) [2024-06-15 19:18:35,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 45875.8, 300 sec: 46652.7). Total num frames: 1298923520. Throughput: 0: 11559.8. Samples: 324815360. 
Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:18:35,955][1648985] Avg episode reward: [(0, '157.410')] [2024-06-15 19:18:36,828][1652491] Updated weights for policy 0, policy_version 634245 (0.0012) [2024-06-15 19:18:39,655][1652491] Updated weights for policy 0, policy_version 634306 (0.0012) [2024-06-15 19:18:40,912][1652491] Updated weights for policy 0, policy_version 634364 (0.0012) [2024-06-15 19:18:40,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 1299152896. Throughput: 0: 11503.0. Samples: 324848128. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:18:40,956][1648985] Avg episode reward: [(0, '163.530')] [2024-06-15 19:18:43,018][1652491] Updated weights for policy 0, policy_version 634432 (0.0013) [2024-06-15 19:18:44,552][1652491] Updated weights for policy 0, policy_version 634492 (0.0013) [2024-06-15 19:18:45,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 1299447808. Throughput: 0: 11366.4. Samples: 324909056. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:18:45,956][1648985] Avg episode reward: [(0, '178.090')] [2024-06-15 19:18:48,706][1652491] Updated weights for policy 0, policy_version 634544 (0.0015) [2024-06-15 19:18:50,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1299578880. Throughput: 0: 11548.4. Samples: 324988928. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:18:50,957][1648985] Avg episode reward: [(0, '173.280')] [2024-06-15 19:18:51,488][1652491] Updated weights for policy 0, policy_version 634576 (0.0011) [2024-06-15 19:18:52,612][1652491] Updated weights for policy 0, policy_version 634620 (0.0013) [2024-06-15 19:18:54,300][1651469] Signal inference workers to stop experience collection... 
(33000 times) [2024-06-15 19:18:54,338][1652491] InferenceWorker_p0-w0: stopping experience collection (33000 times) [2024-06-15 19:18:54,579][1651469] Signal inference workers to resume experience collection... (33000 times) [2024-06-15 19:18:54,580][1652491] InferenceWorker_p0-w0: resuming experience collection (33000 times) [2024-06-15 19:18:54,753][1652491] Updated weights for policy 0, policy_version 634675 (0.0014) [2024-06-15 19:18:55,957][1648985] Fps is (10 sec: 45865.0, 60 sec: 46965.8, 300 sec: 46985.6). Total num frames: 1299906560. Throughput: 0: 11502.4. Samples: 325022208. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:18:55,958][1648985] Avg episode reward: [(0, '161.150')] [2024-06-15 19:18:56,580][1652491] Updated weights for policy 0, policy_version 634751 (0.0015) [2024-06-15 19:18:56,591][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000634752_1299972096.pth... [2024-06-15 19:18:56,626][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000629184_1288568832.pth [2024-06-15 19:18:59,390][1652491] Updated weights for policy 0, policy_version 634812 (0.0012) [2024-06-15 19:19:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 1300103168. Throughput: 0: 11502.9. Samples: 325088768. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:19:00,956][1648985] Avg episode reward: [(0, '162.300')] [2024-06-15 19:19:03,727][1652491] Updated weights for policy 0, policy_version 634877 (0.0016) [2024-06-15 19:19:05,955][1648985] Fps is (10 sec: 42608.3, 60 sec: 45329.1, 300 sec: 46986.0). Total num frames: 1300332544. Throughput: 0: 11526.9. Samples: 325160448. 
Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:19:05,955][1648985] Avg episode reward: [(0, '139.580')] [2024-06-15 19:19:06,024][1652491] Updated weights for policy 0, policy_version 634929 (0.0013) [2024-06-15 19:19:07,777][1652491] Updated weights for policy 0, policy_version 634997 (0.0014) [2024-06-15 19:19:10,394][1652491] Updated weights for policy 0, policy_version 635040 (0.0012) [2024-06-15 19:19:10,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 45329.1, 300 sec: 46986.0). Total num frames: 1300594688. Throughput: 0: 11423.3. Samples: 325190656. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:19:10,956][1648985] Avg episode reward: [(0, '161.480')] [2024-06-15 19:19:11,039][1652491] Updated weights for policy 0, policy_version 635067 (0.0010) [2024-06-15 19:19:15,330][1652491] Updated weights for policy 0, policy_version 635120 (0.0013) [2024-06-15 19:19:15,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1300758528. Throughput: 0: 11548.4. Samples: 325268480. Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:19:15,956][1648985] Avg episode reward: [(0, '155.070')] [2024-06-15 19:19:16,379][1652491] Updated weights for policy 0, policy_version 635153 (0.0032) [2024-06-15 19:19:18,371][1652491] Updated weights for policy 0, policy_version 635233 (0.0059) [2024-06-15 19:19:20,814][1652491] Updated weights for policy 0, policy_version 635280 (0.0012) [2024-06-15 19:19:20,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 46424.9, 300 sec: 46763.8). Total num frames: 1301053440. Throughput: 0: 11537.0. Samples: 325334528. 
Policy #0 lag: (min: 63.0, avg: 186.1, max: 319.0) [2024-06-15 19:19:20,957][1648985] Avg episode reward: [(0, '162.430')] [2024-06-15 19:19:21,618][1652491] Updated weights for policy 0, policy_version 635320 (0.0013) [2024-06-15 19:19:25,810][1652491] Updated weights for policy 0, policy_version 635360 (0.0012) [2024-06-15 19:19:25,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 45875.4, 300 sec: 46541.7). Total num frames: 1301217280. Throughput: 0: 11730.5. Samples: 325376000. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:19:25,955][1648985] Avg episode reward: [(0, '140.940')] [2024-06-15 19:19:27,576][1652491] Updated weights for policy 0, policy_version 635428 (0.0014) [2024-06-15 19:19:29,168][1652491] Updated weights for policy 0, policy_version 635493 (0.0010) [2024-06-15 19:19:30,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.5, 300 sec: 46985.9). Total num frames: 1301544960. Throughput: 0: 11719.1. Samples: 325436416. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:19:30,956][1648985] Avg episode reward: [(0, '150.540')] [2024-06-15 19:19:31,545][1652491] Updated weights for policy 0, policy_version 635536 (0.0012) [2024-06-15 19:19:32,545][1652491] Updated weights for policy 0, policy_version 635580 (0.0019) [2024-06-15 19:19:35,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1301676032. Throughput: 0: 11798.8. Samples: 325519872. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:19:35,956][1648985] Avg episode reward: [(0, '162.980')] [2024-06-15 19:19:37,385][1652491] Updated weights for policy 0, policy_version 635642 (0.0015) [2024-06-15 19:19:38,092][1651469] Signal inference workers to stop experience collection... (33050 times) [2024-06-15 19:19:38,147][1652491] InferenceWorker_p0-w0: stopping experience collection (33050 times) [2024-06-15 19:19:38,370][1651469] Signal inference workers to resume experience collection... 
(33050 times) [2024-06-15 19:19:38,382][1652491] InferenceWorker_p0-w0: resuming experience collection (33050 times) [2024-06-15 19:19:38,986][1652491] Updated weights for policy 0, policy_version 635696 (0.0016) [2024-06-15 19:19:40,607][1652491] Updated weights for policy 0, policy_version 635762 (0.0110) [2024-06-15 19:19:40,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48605.8, 300 sec: 47319.2). Total num frames: 1302069248. Throughput: 0: 11742.4. Samples: 325550592. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:19:40,956][1648985] Avg episode reward: [(0, '172.810')] [2024-06-15 19:19:42,610][1652491] Updated weights for policy 0, policy_version 635792 (0.0011) [2024-06-15 19:19:43,669][1652491] Updated weights for policy 0, policy_version 635840 (0.0013) [2024-06-15 19:19:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1302200320. Throughput: 0: 11844.3. Samples: 325621760. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:19:45,956][1648985] Avg episode reward: [(0, '185.020')] [2024-06-15 19:19:49,093][1652491] Updated weights for policy 0, policy_version 635906 (0.0014) [2024-06-15 19:19:50,619][1652491] Updated weights for policy 0, policy_version 635984 (0.0013) [2024-06-15 19:19:50,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 48605.7, 300 sec: 47208.1). Total num frames: 1302495232. Throughput: 0: 11866.9. Samples: 325694464. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:19:50,956][1648985] Avg episode reward: [(0, '159.860')] [2024-06-15 19:19:51,787][1652491] Updated weights for policy 0, policy_version 636030 (0.0051) [2024-06-15 19:19:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46969.2, 300 sec: 47097.0). Total num frames: 1302724608. Throughput: 0: 12049.1. Samples: 325732864. 
Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:19:55,956][1648985] Avg episode reward: [(0, '143.020')] [2024-06-15 19:19:57,798][1652491] Updated weights for policy 0, policy_version 636112 (0.0012) [2024-06-15 19:19:58,941][1652491] Updated weights for policy 0, policy_version 636159 (0.0013) [2024-06-15 19:20:00,911][1652491] Updated weights for policy 0, policy_version 636220 (0.0011) [2024-06-15 19:20:00,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 47513.7, 300 sec: 46986.7). Total num frames: 1302953984. Throughput: 0: 11878.4. Samples: 325803008. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:20:00,956][1648985] Avg episode reward: [(0, '137.900')] [2024-06-15 19:20:02,388][1652491] Updated weights for policy 0, policy_version 636281 (0.0013) [2024-06-15 19:20:05,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46967.3, 300 sec: 46763.8). Total num frames: 1303150592. Throughput: 0: 12014.9. Samples: 325875200. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:20:05,956][1648985] Avg episode reward: [(0, '156.310')] [2024-06-15 19:20:06,186][1652491] Updated weights for policy 0, policy_version 636322 (0.0012) [2024-06-15 19:20:09,315][1652491] Updated weights for policy 0, policy_version 636391 (0.0016) [2024-06-15 19:20:10,221][1652491] Updated weights for policy 0, policy_version 636419 (0.0012) [2024-06-15 19:20:10,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 47513.5, 300 sec: 47319.2). Total num frames: 1303445504. Throughput: 0: 11889.7. Samples: 325911040. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:20:10,956][1648985] Avg episode reward: [(0, '152.530')] [2024-06-15 19:20:11,542][1652491] Updated weights for policy 0, policy_version 636472 (0.0014) [2024-06-15 19:20:13,229][1652491] Updated weights for policy 0, policy_version 636534 (0.0013) [2024-06-15 19:20:15,955][1648985] Fps is (10 sec: 49153.2, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1303642112. 
Throughput: 0: 12015.0. Samples: 325977088. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:20:15,955][1648985] Avg episode reward: [(0, '150.950')] [2024-06-15 19:20:17,824][1652491] Updated weights for policy 0, policy_version 636592 (0.0013) [2024-06-15 19:20:20,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1303838720. Throughput: 0: 11787.4. Samples: 326050304. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:20:20,956][1648985] Avg episode reward: [(0, '157.700')] [2024-06-15 19:20:20,993][1651469] Signal inference workers to stop experience collection... (33100 times) [2024-06-15 19:20:21,027][1652491] InferenceWorker_p0-w0: stopping experience collection (33100 times) [2024-06-15 19:20:21,300][1651469] Signal inference workers to resume experience collection... (33100 times) [2024-06-15 19:20:21,301][1652491] InferenceWorker_p0-w0: resuming experience collection (33100 times) [2024-06-15 19:20:21,303][1652491] Updated weights for policy 0, policy_version 636656 (0.0013) [2024-06-15 19:20:22,978][1652491] Updated weights for policy 0, policy_version 636720 (0.0012) [2024-06-15 19:20:24,819][1652491] Updated weights for policy 0, policy_version 636784 (0.0121) [2024-06-15 19:20:25,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 49151.9, 300 sec: 47319.2). Total num frames: 1304166400. Throughput: 0: 11650.8. Samples: 326074880. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:20:25,956][1648985] Avg episode reward: [(0, '174.760')] [2024-06-15 19:20:29,660][1652491] Updated weights for policy 0, policy_version 636818 (0.0013) [2024-06-15 19:20:30,548][1652491] Updated weights for policy 0, policy_version 636864 (0.0011) [2024-06-15 19:20:30,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 1304297472. Throughput: 0: 11741.9. Samples: 326150144. 
Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:20:30,956][1648985] Avg episode reward: [(0, '161.410')] [2024-06-15 19:20:34,139][1652491] Updated weights for policy 0, policy_version 636963 (0.0082) [2024-06-15 19:20:35,382][1652491] Updated weights for policy 0, policy_version 637008 (0.0012) [2024-06-15 19:20:35,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 49151.9, 300 sec: 47319.2). Total num frames: 1304625152. Throughput: 0: 11400.6. Samples: 326207488. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:20:35,956][1648985] Avg episode reward: [(0, '147.680')] [2024-06-15 19:20:40,956][1648985] Fps is (10 sec: 39320.6, 60 sec: 43690.5, 300 sec: 46652.7). Total num frames: 1304690688. Throughput: 0: 11457.4. Samples: 326248448. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:20:40,957][1648985] Avg episode reward: [(0, '149.470')] [2024-06-15 19:20:41,068][1652491] Updated weights for policy 0, policy_version 637057 (0.0069) [2024-06-15 19:20:42,861][1652491] Updated weights for policy 0, policy_version 637121 (0.0013) [2024-06-15 19:20:43,956][1652491] Updated weights for policy 0, policy_version 637173 (0.0012) [2024-06-15 19:20:45,208][1652491] Updated weights for policy 0, policy_version 637220 (0.0014) [2024-06-15 19:20:45,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48059.8, 300 sec: 47098.3). Total num frames: 1305083904. Throughput: 0: 11423.3. Samples: 326317056. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:20:45,956][1648985] Avg episode reward: [(0, '147.710')] [2024-06-15 19:20:46,680][1652491] Updated weights for policy 0, policy_version 637265 (0.0012) [2024-06-15 19:20:47,739][1652491] Updated weights for policy 0, policy_version 637310 (0.0013) [2024-06-15 19:20:50,955][1648985] Fps is (10 sec: 52430.5, 60 sec: 45329.3, 300 sec: 46652.8). Total num frames: 1305214976. Throughput: 0: 11571.3. Samples: 326395904. 
Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:20:50,955][1648985] Avg episode reward: [(0, '139.980')] [2024-06-15 19:20:53,089][1652491] Updated weights for policy 0, policy_version 637369 (0.0014) [2024-06-15 19:20:55,002][1652491] Updated weights for policy 0, policy_version 637435 (0.0134) [2024-06-15 19:20:55,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.6, 300 sec: 46986.0). Total num frames: 1305542656. Throughput: 0: 11548.5. Samples: 326430720. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:20:55,955][1648985] Avg episode reward: [(0, '125.290')] [2024-06-15 19:20:56,200][1652491] Updated weights for policy 0, policy_version 637480 (0.0014) [2024-06-15 19:20:56,335][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000637488_1305575424.pth... [2024-06-15 19:20:56,383][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000631936_1294204928.pth [2024-06-15 19:20:57,488][1652491] Updated weights for policy 0, policy_version 637520 (0.0013) [2024-06-15 19:20:58,802][1652491] Updated weights for policy 0, policy_version 637564 (0.0013) [2024-06-15 19:21:00,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 46421.2, 300 sec: 46874.9). Total num frames: 1305739264. Throughput: 0: 11639.4. Samples: 326500864. Policy #0 lag: (min: 10.0, avg: 90.3, max: 266.0) [2024-06-15 19:21:00,956][1648985] Avg episode reward: [(0, '136.620')] [2024-06-15 19:21:03,788][1651469] Signal inference workers to stop experience collection... (33150 times) [2024-06-15 19:21:03,838][1652491] InferenceWorker_p0-w0: stopping experience collection (33150 times) [2024-06-15 19:21:03,992][1651469] Signal inference workers to resume experience collection... 
(33150 times) [2024-06-15 19:21:03,992][1652491] InferenceWorker_p0-w0: resuming experience collection (33150 times) [2024-06-15 19:21:04,181][1652491] Updated weights for policy 0, policy_version 637617 (0.0013) [2024-06-15 19:21:05,764][1652491] Updated weights for policy 0, policy_version 637686 (0.0015) [2024-06-15 19:21:05,978][1648985] Fps is (10 sec: 45769.0, 60 sec: 47495.4, 300 sec: 47093.4). Total num frames: 1306001408. Throughput: 0: 11610.7. Samples: 326573056. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:21:05,979][1648985] Avg episode reward: [(0, '138.940')] [2024-06-15 19:21:06,878][1652491] Updated weights for policy 0, policy_version 637744 (0.0014) [2024-06-15 19:21:07,961][1652491] Updated weights for policy 0, policy_version 637766 (0.0056) [2024-06-15 19:21:08,974][1652491] Updated weights for policy 0, policy_version 637812 (0.0014) [2024-06-15 19:21:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 1306263552. Throughput: 0: 11821.5. Samples: 326606848. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:21:10,956][1648985] Avg episode reward: [(0, '141.070')] [2024-06-15 19:21:13,727][1652491] Updated weights for policy 0, policy_version 637843 (0.0012) [2024-06-15 19:21:14,468][1652491] Updated weights for policy 0, policy_version 637887 (0.0014) [2024-06-15 19:21:15,661][1652491] Updated weights for policy 0, policy_version 637936 (0.0023) [2024-06-15 19:21:15,955][1648985] Fps is (10 sec: 52550.9, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1306525696. Throughput: 0: 12003.6. Samples: 326690304. 
Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:21:15,955][1648985] Avg episode reward: [(0, '142.410')] [2024-06-15 19:21:17,162][1652491] Updated weights for policy 0, policy_version 638001 (0.0014) [2024-06-15 19:21:19,167][1652491] Updated weights for policy 0, policy_version 638048 (0.0015) [2024-06-15 19:21:20,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 49151.8, 300 sec: 47541.3). Total num frames: 1306787840. Throughput: 0: 12287.9. Samples: 326760448. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:21:20,956][1648985] Avg episode reward: [(0, '152.280')] [2024-06-15 19:21:23,875][1652491] Updated weights for policy 0, policy_version 638083 (0.0012) [2024-06-15 19:21:25,253][1652491] Updated weights for policy 0, policy_version 638144 (0.0016) [2024-06-15 19:21:25,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.5, 300 sec: 47319.3). Total num frames: 1306984448. Throughput: 0: 12424.6. Samples: 326807552. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:21:25,955][1648985] Avg episode reward: [(0, '149.780')] [2024-06-15 19:21:26,521][1652491] Updated weights for policy 0, policy_version 638208 (0.0013) [2024-06-15 19:21:29,614][1652491] Updated weights for policy 0, policy_version 638304 (0.0014) [2024-06-15 19:21:30,955][1648985] Fps is (10 sec: 52430.4, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1307312128. Throughput: 0: 12288.0. Samples: 326870016. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:21:30,955][1648985] Avg episode reward: [(0, '158.410')] [2024-06-15 19:21:35,842][1652491] Updated weights for policy 0, policy_version 638390 (0.0058) [2024-06-15 19:21:35,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.3, 300 sec: 47208.1). Total num frames: 1307410432. Throughput: 0: 12231.1. Samples: 326946304. 
Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:21:35,956][1648985] Avg episode reward: [(0, '146.890')] [2024-06-15 19:21:36,788][1652491] Updated weights for policy 0, policy_version 638432 (0.0041) [2024-06-15 19:21:37,740][1652491] Updated weights for policy 0, policy_version 638469 (0.0026) [2024-06-15 19:21:40,230][1652491] Updated weights for policy 0, policy_version 638530 (0.0106) [2024-06-15 19:21:40,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 51336.8, 300 sec: 47652.4). Total num frames: 1307770880. Throughput: 0: 12197.0. Samples: 326979584. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:21:40,955][1648985] Avg episode reward: [(0, '157.800')] [2024-06-15 19:21:41,675][1652491] Updated weights for policy 0, policy_version 638592 (0.0013) [2024-06-15 19:21:44,989][1651469] Signal inference workers to stop experience collection... (33200 times) [2024-06-15 19:21:45,021][1652491] InferenceWorker_p0-w0: stopping experience collection (33200 times) [2024-06-15 19:21:45,313][1651469] Signal inference workers to resume experience collection... (33200 times) [2024-06-15 19:21:45,314][1652491] InferenceWorker_p0-w0: resuming experience collection (33200 times) [2024-06-15 19:21:45,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1307901952. Throughput: 0: 12367.7. Samples: 327057408. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:21:45,955][1648985] Avg episode reward: [(0, '152.940')] [2024-06-15 19:21:46,552][1652491] Updated weights for policy 0, policy_version 638651 (0.0016) [2024-06-15 19:21:48,992][1652491] Updated weights for policy 0, policy_version 638736 (0.0015) [2024-06-15 19:21:50,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 50244.2, 300 sec: 47541.4). Total num frames: 1308229632. Throughput: 0: 12100.8. Samples: 327117312. 
Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:21:50,956][1648985] Avg episode reward: [(0, '165.300')] [2024-06-15 19:21:52,961][1652491] Updated weights for policy 0, policy_version 638816 (0.0129) [2024-06-15 19:21:55,955][1648985] Fps is (10 sec: 45873.1, 60 sec: 46967.1, 300 sec: 47319.1). Total num frames: 1308360704. Throughput: 0: 12049.0. Samples: 327149056. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:21:55,956][1648985] Avg episode reward: [(0, '173.600')] [2024-06-15 19:21:57,193][1652491] Updated weights for policy 0, policy_version 638851 (0.0059) [2024-06-15 19:21:58,380][1652491] Updated weights for policy 0, policy_version 638910 (0.0126) [2024-06-15 19:21:59,979][1652491] Updated weights for policy 0, policy_version 638962 (0.0018) [2024-06-15 19:22:00,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 48606.0, 300 sec: 47208.2). Total num frames: 1308655616. Throughput: 0: 11855.7. Samples: 327223808. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:22:00,956][1648985] Avg episode reward: [(0, '169.450')] [2024-06-15 19:22:01,456][1652491] Updated weights for policy 0, policy_version 639031 (0.0015) [2024-06-15 19:22:04,026][1652491] Updated weights for policy 0, policy_version 639101 (0.0015) [2024-06-15 19:22:05,955][1648985] Fps is (10 sec: 52430.7, 60 sec: 48078.3, 300 sec: 47541.3). Total num frames: 1308884992. Throughput: 0: 11901.2. Samples: 327296000. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:22:05,956][1648985] Avg episode reward: [(0, '159.550')] [2024-06-15 19:22:09,851][1652491] Updated weights for policy 0, policy_version 639184 (0.0079) [2024-06-15 19:22:10,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 48059.8, 300 sec: 47208.2). Total num frames: 1309147136. Throughput: 0: 11764.6. Samples: 327336960. 
Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:22:10,956][1648985] Avg episode reward: [(0, '158.730')] [2024-06-15 19:22:11,131][1652491] Updated weights for policy 0, policy_version 639238 (0.0127) [2024-06-15 19:22:13,592][1652491] Updated weights for policy 0, policy_version 639299 (0.0018) [2024-06-15 19:22:14,870][1652491] Updated weights for policy 0, policy_version 639352 (0.0011) [2024-06-15 19:22:15,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 47543.4). Total num frames: 1309409280. Throughput: 0: 11867.0. Samples: 327404032. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:22:15,956][1648985] Avg episode reward: [(0, '152.930')] [2024-06-15 19:22:20,075][1652491] Updated weights for policy 0, policy_version 639397 (0.0033) [2024-06-15 19:22:20,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46421.6, 300 sec: 47097.1). Total num frames: 1309573120. Throughput: 0: 11878.4. Samples: 327480832. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:22:20,955][1648985] Avg episode reward: [(0, '172.990')] [2024-06-15 19:22:21,415][1652491] Updated weights for policy 0, policy_version 639464 (0.0013) [2024-06-15 19:22:22,806][1651469] Signal inference workers to stop experience collection... (33250 times) [2024-06-15 19:22:22,836][1652491] InferenceWorker_p0-w0: stopping experience collection (33250 times) [2024-06-15 19:22:23,047][1651469] Signal inference workers to resume experience collection... (33250 times) [2024-06-15 19:22:23,048][1652491] InferenceWorker_p0-w0: resuming experience collection (33250 times) [2024-06-15 19:22:23,251][1652491] Updated weights for policy 0, policy_version 639551 (0.0123) [2024-06-15 19:22:25,531][1652491] Updated weights for policy 0, policy_version 639604 (0.0014) [2024-06-15 19:22:25,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 49152.0, 300 sec: 47652.4). Total num frames: 1309933568. Throughput: 0: 11832.9. Samples: 327512064. 
Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:22:25,955][1648985] Avg episode reward: [(0, '179.280')] [2024-06-15 19:22:30,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 44782.8, 300 sec: 46875.0). Total num frames: 1309999104. Throughput: 0: 11923.9. Samples: 327593984. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:22:30,956][1648985] Avg episode reward: [(0, '154.630')] [2024-06-15 19:22:30,959][1652491] Updated weights for policy 0, policy_version 639654 (0.0014) [2024-06-15 19:22:32,603][1652491] Updated weights for policy 0, policy_version 639728 (0.0013) [2024-06-15 19:22:34,426][1652491] Updated weights for policy 0, policy_version 639800 (0.0013) [2024-06-15 19:22:35,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 49151.9, 300 sec: 47541.4). Total num frames: 1310359552. Throughput: 0: 11935.3. Samples: 327654400. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:22:35,956][1648985] Avg episode reward: [(0, '160.240')] [2024-06-15 19:22:36,410][1652491] Updated weights for policy 0, policy_version 639841 (0.0011) [2024-06-15 19:22:40,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 44782.9, 300 sec: 46652.7). Total num frames: 1310457856. Throughput: 0: 11992.3. Samples: 327688704. Policy #0 lag: (min: 15.0, avg: 74.4, max: 271.0) [2024-06-15 19:22:40,956][1648985] Avg episode reward: [(0, '148.170')] [2024-06-15 19:22:42,375][1652491] Updated weights for policy 0, policy_version 639892 (0.0012) [2024-06-15 19:22:43,305][1652491] Updated weights for policy 0, policy_version 639939 (0.0013) [2024-06-15 19:22:45,069][1652491] Updated weights for policy 0, policy_version 640016 (0.0015) [2024-06-15 19:22:45,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 48605.8, 300 sec: 47430.3). Total num frames: 1310818304. Throughput: 0: 11958.0. Samples: 327761920. 
Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:22:45,956][1648985] Avg episode reward: [(0, '143.980')] [2024-06-15 19:22:46,186][1652491] Updated weights for policy 0, policy_version 640061 (0.0011) [2024-06-15 19:22:48,276][1652491] Updated weights for policy 0, policy_version 640121 (0.0013) [2024-06-15 19:22:50,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 1310982144. Throughput: 0: 11923.9. Samples: 327832576. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:22:50,955][1648985] Avg episode reward: [(0, '153.760')] [2024-06-15 19:22:53,634][1652491] Updated weights for policy 0, policy_version 640176 (0.0013) [2024-06-15 19:22:55,090][1652491] Updated weights for policy 0, policy_version 640242 (0.0012) [2024-06-15 19:22:55,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48606.2, 300 sec: 47208.1). Total num frames: 1311277056. Throughput: 0: 11821.5. Samples: 327868928. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:22:55,956][1648985] Avg episode reward: [(0, '156.900')] [2024-06-15 19:22:56,439][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000640304_1311342592.pth... [2024-06-15 19:22:56,479][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000634752_1299972096.pth [2024-06-15 19:22:56,590][1652491] Updated weights for policy 0, policy_version 640305 (0.0014) [2024-06-15 19:22:58,170][1652491] Updated weights for policy 0, policy_version 640327 (0.0013) [2024-06-15 19:23:00,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 47513.4, 300 sec: 47097.0). Total num frames: 1311506432. Throughput: 0: 11855.6. Samples: 327937536. 
Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:23:00,956][1648985] Avg episode reward: [(0, '149.810')] [2024-06-15 19:23:02,956][1652491] Updated weights for policy 0, policy_version 640387 (0.0014) [2024-06-15 19:23:04,102][1652491] Updated weights for policy 0, policy_version 640437 (0.0011) [2024-06-15 19:23:04,382][1651469] Signal inference workers to stop experience collection... (33300 times) [2024-06-15 19:23:04,431][1652491] InferenceWorker_p0-w0: stopping experience collection (33300 times) [2024-06-15 19:23:04,611][1651469] Signal inference workers to resume experience collection... (33300 times) [2024-06-15 19:23:04,612][1652491] InferenceWorker_p0-w0: resuming experience collection (33300 times) [2024-06-15 19:23:05,829][1652491] Updated weights for policy 0, policy_version 640516 (0.0014) [2024-06-15 19:23:05,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1311768576. Throughput: 0: 11889.8. Samples: 328015872. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:23:05,956][1648985] Avg episode reward: [(0, '141.980')] [2024-06-15 19:23:07,002][1652491] Updated weights for policy 0, policy_version 640573 (0.0014) [2024-06-15 19:23:10,397][1652491] Updated weights for policy 0, policy_version 640635 (0.0012) [2024-06-15 19:23:10,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1312030720. Throughput: 0: 11958.0. Samples: 328050176. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:23:10,956][1648985] Avg episode reward: [(0, '153.710')] [2024-06-15 19:23:15,065][1652491] Updated weights for policy 0, policy_version 640688 (0.0012) [2024-06-15 19:23:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.4, 300 sec: 47208.9). Total num frames: 1312194560. Throughput: 0: 11832.9. Samples: 328126464. 
Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:23:15,955][1648985] Avg episode reward: [(0, '160.630')] [2024-06-15 19:23:16,934][1652491] Updated weights for policy 0, policy_version 640754 (0.0015) [2024-06-15 19:23:18,421][1652491] Updated weights for policy 0, policy_version 640801 (0.0013) [2024-06-15 19:23:20,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 47652.5). Total num frames: 1312522240. Throughput: 0: 11650.9. Samples: 328178688. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:23:20,956][1648985] Avg episode reward: [(0, '165.620')] [2024-06-15 19:23:21,057][1652491] Updated weights for policy 0, policy_version 640891 (0.0015) [2024-06-15 19:23:25,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 43690.6, 300 sec: 46986.0). Total num frames: 1312555008. Throughput: 0: 11810.1. Samples: 328220160. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:23:25,955][1648985] Avg episode reward: [(0, '174.680')] [2024-06-15 19:23:26,427][1652491] Updated weights for policy 0, policy_version 640931 (0.0012) [2024-06-15 19:23:28,807][1652491] Updated weights for policy 0, policy_version 641026 (0.0118) [2024-06-15 19:23:30,135][1652491] Updated weights for policy 0, policy_version 641088 (0.0014) [2024-06-15 19:23:30,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 49152.1, 300 sec: 47541.4). Total num frames: 1312948224. Throughput: 0: 11559.8. Samples: 328282112. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:23:30,956][1648985] Avg episode reward: [(0, '182.250')] [2024-06-15 19:23:32,568][1652491] Updated weights for policy 0, policy_version 641152 (0.0015) [2024-06-15 19:23:35,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45329.2, 300 sec: 47208.1). Total num frames: 1313079296. Throughput: 0: 11753.2. Samples: 328361472. 
Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:23:35,955][1648985] Avg episode reward: [(0, '178.520')] [2024-06-15 19:23:38,618][1652491] Updated weights for policy 0, policy_version 641219 (0.0013) [2024-06-15 19:23:40,409][1652491] Updated weights for policy 0, policy_version 641296 (0.0013) [2024-06-15 19:23:40,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 49151.9, 300 sec: 47319.2). Total num frames: 1313406976. Throughput: 0: 11753.2. Samples: 328397824. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:23:40,956][1648985] Avg episode reward: [(0, '182.340')] [2024-06-15 19:23:41,228][1652491] Updated weights for policy 0, policy_version 641333 (0.0013) [2024-06-15 19:23:42,692][1652491] Updated weights for policy 0, policy_version 641376 (0.0014) [2024-06-15 19:23:42,820][1651469] Signal inference workers to stop experience collection... (33350 times) [2024-06-15 19:23:42,871][1652491] InferenceWorker_p0-w0: stopping experience collection (33350 times) [2024-06-15 19:23:43,044][1651469] Signal inference workers to resume experience collection... (33350 times) [2024-06-15 19:23:43,046][1652491] InferenceWorker_p0-w0: resuming experience collection (33350 times) [2024-06-15 19:23:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46421.4, 300 sec: 47541.4). Total num frames: 1313603584. Throughput: 0: 11730.5. Samples: 328465408. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:23:45,956][1648985] Avg episode reward: [(0, '178.830')] [2024-06-15 19:23:48,378][1652491] Updated weights for policy 0, policy_version 641429 (0.0016) [2024-06-15 19:23:49,854][1652491] Updated weights for policy 0, policy_version 641488 (0.0010) [2024-06-15 19:23:50,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.6, 300 sec: 47319.6). Total num frames: 1313865728. Throughput: 0: 11673.6. Samples: 328541184. 
Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:23:50,956][1648985] Avg episode reward: [(0, '186.590')] [2024-06-15 19:23:51,081][1652491] Updated weights for policy 0, policy_version 641538 (0.0014) [2024-06-15 19:23:52,639][1652491] Updated weights for policy 0, policy_version 641600 (0.0013) [2024-06-15 19:23:54,261][1652491] Updated weights for policy 0, policy_version 641659 (0.0014) [2024-06-15 19:23:55,957][1648985] Fps is (10 sec: 52421.6, 60 sec: 47512.6, 300 sec: 47541.2). Total num frames: 1314127872. Throughput: 0: 11514.0. Samples: 328568320. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:23:55,957][1648985] Avg episode reward: [(0, '167.170')] [2024-06-15 19:24:00,177][1652491] Updated weights for policy 0, policy_version 641700 (0.0014) [2024-06-15 19:24:00,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 45875.3, 300 sec: 47208.1). Total num frames: 1314258944. Throughput: 0: 11798.7. Samples: 328657408. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:24:00,955][1648985] Avg episode reward: [(0, '146.320')] [2024-06-15 19:24:01,476][1652491] Updated weights for policy 0, policy_version 641760 (0.0012) [2024-06-15 19:24:03,601][1652491] Updated weights for policy 0, policy_version 641840 (0.0014) [2024-06-15 19:24:05,625][1652491] Updated weights for policy 0, policy_version 641911 (0.0015) [2024-06-15 19:24:05,955][1648985] Fps is (10 sec: 52435.9, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 1314652160. Throughput: 0: 11810.1. Samples: 328710144. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:24:05,956][1648985] Avg episode reward: [(0, '147.210')] [2024-06-15 19:24:10,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 1314652160. Throughput: 0: 11855.6. Samples: 328753664. 
Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:24:10,955][1648985] Avg episode reward: [(0, '147.070')] [2024-06-15 19:24:11,751][1652491] Updated weights for policy 0, policy_version 641956 (0.0022) [2024-06-15 19:24:13,616][1652491] Updated weights for policy 0, policy_version 642022 (0.0038) [2024-06-15 19:24:15,324][1652491] Updated weights for policy 0, policy_version 642096 (0.0067) [2024-06-15 19:24:15,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 47513.4, 300 sec: 47430.3). Total num frames: 1315045376. Throughput: 0: 11878.4. Samples: 328816640. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:24:15,956][1648985] Avg episode reward: [(0, '150.810')] [2024-06-15 19:24:17,294][1652491] Updated weights for policy 0, policy_version 642144 (0.0014) [2024-06-15 19:24:20,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 44236.8, 300 sec: 47319.2). Total num frames: 1315176448. Throughput: 0: 11912.5. Samples: 328897536. Policy #0 lag: (min: 10.0, avg: 75.5, max: 266.0) [2024-06-15 19:24:20,956][1648985] Avg episode reward: [(0, '168.010')] [2024-06-15 19:24:21,507][1652491] Updated weights for policy 0, policy_version 642192 (0.0013) [2024-06-15 19:24:23,808][1652491] Updated weights for policy 0, policy_version 642272 (0.0014) [2024-06-15 19:24:24,863][1651469] Signal inference workers to stop experience collection... (33400 times) [2024-06-15 19:24:24,940][1652491] InferenceWorker_p0-w0: stopping experience collection (33400 times) [2024-06-15 19:24:25,048][1651469] Signal inference workers to resume experience collection... (33400 times) [2024-06-15 19:24:25,049][1652491] InferenceWorker_p0-w0: resuming experience collection (33400 times) [2024-06-15 19:24:25,195][1652491] Updated weights for policy 0, policy_version 642322 (0.0012) [2024-06-15 19:24:25,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 49698.1, 300 sec: 47430.3). Total num frames: 1315536896. Throughput: 0: 11650.9. Samples: 328922112. 
Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:24:25,956][1648985] Avg episode reward: [(0, '170.520')] [2024-06-15 19:24:26,056][1652491] Updated weights for policy 0, policy_version 642366 (0.0028) [2024-06-15 19:24:29,447][1652491] Updated weights for policy 0, policy_version 642429 (0.0012) [2024-06-15 19:24:30,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 45875.0, 300 sec: 47541.3). Total num frames: 1315700736. Throughput: 0: 11684.9. Samples: 328991232. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:24:30,956][1648985] Avg episode reward: [(0, '171.920')] [2024-06-15 19:24:33,010][1652491] Updated weights for policy 0, policy_version 642469 (0.0011) [2024-06-15 19:24:34,967][1652491] Updated weights for policy 0, policy_version 642557 (0.0019) [2024-06-15 19:24:35,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1315962880. Throughput: 0: 11525.7. Samples: 329059840. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:24:35,955][1648985] Avg episode reward: [(0, '165.940')] [2024-06-15 19:24:37,966][1652491] Updated weights for policy 0, policy_version 642624 (0.0016) [2024-06-15 19:24:40,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1316225024. Throughput: 0: 11730.8. Samples: 329096192. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:24:40,956][1648985] Avg episode reward: [(0, '171.340')] [2024-06-15 19:24:40,963][1652491] Updated weights for policy 0, policy_version 642688 (0.0013) [2024-06-15 19:24:44,361][1652491] Updated weights for policy 0, policy_version 642744 (0.0035) [2024-06-15 19:24:45,504][1652491] Updated weights for policy 0, policy_version 642784 (0.0013) [2024-06-15 19:24:45,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 47513.6, 300 sec: 47319.3). Total num frames: 1316454400. Throughput: 0: 11377.8. Samples: 329169408. 
Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:24:45,956][1648985] Avg episode reward: [(0, '178.040')] [2024-06-15 19:24:47,793][1652491] Updated weights for policy 0, policy_version 642848 (0.0020) [2024-06-15 19:24:50,956][1648985] Fps is (10 sec: 39318.2, 60 sec: 45874.6, 300 sec: 47096.9). Total num frames: 1316618240. Throughput: 0: 11855.4. Samples: 329243648. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:24:50,957][1648985] Avg episode reward: [(0, '167.990')] [2024-06-15 19:24:51,682][1652491] Updated weights for policy 0, policy_version 642914 (0.0088) [2024-06-15 19:24:55,628][1652491] Updated weights for policy 0, policy_version 642996 (0.0093) [2024-06-15 19:24:55,955][1648985] Fps is (10 sec: 42596.9, 60 sec: 45875.9, 300 sec: 47208.1). Total num frames: 1316880384. Throughput: 0: 11650.7. Samples: 329277952. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:24:55,956][1648985] Avg episode reward: [(0, '163.400')] [2024-06-15 19:24:56,214][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000643024_1316913152.pth... [2024-06-15 19:24:56,365][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000637488_1305575424.pth [2024-06-15 19:24:57,230][1652491] Updated weights for policy 0, policy_version 643072 (0.0012) [2024-06-15 19:24:59,397][1652491] Updated weights for policy 0, policy_version 643136 (0.0014) [2024-06-15 19:25:00,955][1648985] Fps is (10 sec: 52433.6, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1317142528. Throughput: 0: 11753.3. Samples: 329345536. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:25:00,956][1648985] Avg episode reward: [(0, '162.050')] [2024-06-15 19:25:03,049][1652491] Updated weights for policy 0, policy_version 643198 (0.0011) [2024-06-15 19:25:05,955][1648985] Fps is (10 sec: 45876.7, 60 sec: 44782.9, 300 sec: 47097.1). Total num frames: 1317339136. Throughput: 0: 11730.5. 
Samples: 329425408. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:25:05,956][1648985] Avg episode reward: [(0, '179.310')] [2024-06-15 19:25:06,565][1652491] Updated weights for policy 0, policy_version 643264 (0.0013) [2024-06-15 19:25:07,596][1651469] Signal inference workers to stop experience collection... (33450 times) [2024-06-15 19:25:07,644][1652491] InferenceWorker_p0-w0: stopping experience collection (33450 times) [2024-06-15 19:25:07,745][1651469] Signal inference workers to resume experience collection... (33450 times) [2024-06-15 19:25:07,747][1652491] InferenceWorker_p0-w0: resuming experience collection (33450 times) [2024-06-15 19:25:07,916][1652491] Updated weights for policy 0, policy_version 643318 (0.0012) [2024-06-15 19:25:10,738][1652491] Updated weights for policy 0, policy_version 643385 (0.0015) [2024-06-15 19:25:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1317666816. Throughput: 0: 11889.8. Samples: 329457152. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:25:10,956][1648985] Avg episode reward: [(0, '168.630')] [2024-06-15 19:25:14,357][1652491] Updated weights for policy 0, policy_version 643451 (0.0013) [2024-06-15 19:25:15,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 46421.5, 300 sec: 47430.3). Total num frames: 1317830656. Throughput: 0: 11958.1. Samples: 329529344. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:25:15,955][1648985] Avg episode reward: [(0, '173.900')] [2024-06-15 19:25:16,851][1652491] Updated weights for policy 0, policy_version 643508 (0.0106) [2024-06-15 19:25:18,203][1652491] Updated weights for policy 0, policy_version 643569 (0.0016) [2024-06-15 19:25:20,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1318060032. Throughput: 0: 12037.7. Samples: 329601536. 
Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:25:20,957][1648985] Avg episode reward: [(0, '176.840')] [2024-06-15 19:25:21,264][1652491] Updated weights for policy 0, policy_version 643608 (0.0013) [2024-06-15 19:25:22,104][1652491] Updated weights for policy 0, policy_version 643647 (0.0012) [2024-06-15 19:25:25,328][1652491] Updated weights for policy 0, policy_version 643708 (0.0017) [2024-06-15 19:25:25,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 46421.3, 300 sec: 47541.4). Total num frames: 1318322176. Throughput: 0: 12083.2. Samples: 329639936. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:25:25,956][1648985] Avg episode reward: [(0, '178.920')] [2024-06-15 19:25:27,619][1652491] Updated weights for policy 0, policy_version 643760 (0.0013) [2024-06-15 19:25:29,423][1652491] Updated weights for policy 0, policy_version 643837 (0.0012) [2024-06-15 19:25:30,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48060.0, 300 sec: 47319.2). Total num frames: 1318584320. Throughput: 0: 11753.2. Samples: 329698304. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:25:30,956][1648985] Avg episode reward: [(0, '182.800')] [2024-06-15 19:25:33,495][1652491] Updated weights for policy 0, policy_version 643897 (0.0012) [2024-06-15 19:25:35,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.1, 300 sec: 47541.4). Total num frames: 1318715392. Throughput: 0: 11924.1. Samples: 329780224. 
Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:25:35,956][1648985] Avg episode reward: [(0, '171.870')] [2024-06-15 19:25:37,010][1652491] Updated weights for policy 0, policy_version 643940 (0.0071) [2024-06-15 19:25:39,104][1652491] Updated weights for policy 0, policy_version 644019 (0.0013) [2024-06-15 19:25:40,156][1652491] Updated weights for policy 0, policy_version 644064 (0.0039) [2024-06-15 19:25:40,850][1652491] Updated weights for policy 0, policy_version 644096 (0.0013) [2024-06-15 19:25:40,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1319108608. Throughput: 0: 11730.6. Samples: 329805824. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:25:40,956][1648985] Avg episode reward: [(0, '173.800')] [2024-06-15 19:25:44,894][1652491] Updated weights for policy 0, policy_version 644160 (0.0019) [2024-06-15 19:25:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46421.3, 300 sec: 47541.4). Total num frames: 1319239680. Throughput: 0: 11832.9. Samples: 329878016. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:25:45,956][1648985] Avg episode reward: [(0, '173.590')] [2024-06-15 19:25:48,969][1652491] Updated weights for policy 0, policy_version 644222 (0.0012) [2024-06-15 19:25:50,007][1651469] Signal inference workers to stop experience collection... (33500 times) [2024-06-15 19:25:50,031][1652491] InferenceWorker_p0-w0: stopping experience collection (33500 times) [2024-06-15 19:25:50,288][1651469] Signal inference workers to resume experience collection... (33500 times) [2024-06-15 19:25:50,289][1652491] InferenceWorker_p0-w0: resuming experience collection (33500 times) [2024-06-15 19:25:50,873][1652491] Updated weights for policy 0, policy_version 644291 (0.0075) [2024-06-15 19:25:50,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 48060.5, 300 sec: 47319.2). Total num frames: 1319501824. Throughput: 0: 11434.7. Samples: 329939968. 
Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:25:50,956][1648985] Avg episode reward: [(0, '162.640')] [2024-06-15 19:25:52,130][1652491] Updated weights for policy 0, policy_version 644352 (0.0084) [2024-06-15 19:25:55,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.6, 300 sec: 47208.2). Total num frames: 1319665664. Throughput: 0: 11468.8. Samples: 329973248. Policy #0 lag: (min: 15.0, avg: 91.0, max: 271.0) [2024-06-15 19:25:55,956][1648985] Avg episode reward: [(0, '152.930')] [2024-06-15 19:25:56,693][1652491] Updated weights for policy 0, policy_version 644401 (0.0012) [2024-06-15 19:25:59,907][1652491] Updated weights for policy 0, policy_version 644449 (0.0012) [2024-06-15 19:26:00,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 47100.8). Total num frames: 1319895040. Throughput: 0: 11616.7. Samples: 330052096. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:26:00,956][1648985] Avg episode reward: [(0, '132.090')] [2024-06-15 19:26:01,586][1652491] Updated weights for policy 0, policy_version 644514 (0.0017) [2024-06-15 19:26:03,750][1652491] Updated weights for policy 0, policy_version 644608 (0.0013) [2024-06-15 19:26:05,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 1320157184. Throughput: 0: 11355.1. Samples: 330112512. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:26:05,955][1648985] Avg episode reward: [(0, '146.150')] [2024-06-15 19:26:08,256][1652491] Updated weights for policy 0, policy_version 644662 (0.0014) [2024-06-15 19:26:10,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 43690.6, 300 sec: 46652.7). Total num frames: 1320288256. Throughput: 0: 11332.3. Samples: 330149888. 
Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:26:10,956][1648985] Avg episode reward: [(0, '172.990')] [2024-06-15 19:26:11,906][1652491] Updated weights for policy 0, policy_version 644691 (0.0019) [2024-06-15 19:26:13,798][1652491] Updated weights for policy 0, policy_version 644768 (0.0011) [2024-06-15 19:26:15,769][1652491] Updated weights for policy 0, policy_version 644854 (0.0017) [2024-06-15 19:26:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 1320681472. Throughput: 0: 11400.6. Samples: 330211328. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:26:15,955][1648985] Avg episode reward: [(0, '174.220')] [2024-06-15 19:26:19,469][1652491] Updated weights for policy 0, policy_version 644912 (0.0012) [2024-06-15 19:26:20,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 1320812544. Throughput: 0: 11343.7. Samples: 330290688. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:26:20,956][1648985] Avg episode reward: [(0, '160.390')] [2024-06-15 19:26:22,384][1652491] Updated weights for policy 0, policy_version 644933 (0.0011) [2024-06-15 19:26:24,499][1652491] Updated weights for policy 0, policy_version 645024 (0.0012) [2024-06-15 19:26:25,956][1648985] Fps is (10 sec: 42593.7, 60 sec: 46420.6, 300 sec: 46763.6). Total num frames: 1321107456. Throughput: 0: 11479.9. Samples: 330322432. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:26:25,957][1648985] Avg episode reward: [(0, '149.060')] [2024-06-15 19:26:26,043][1652491] Updated weights for policy 0, policy_version 645088 (0.0016) [2024-06-15 19:26:29,820][1652491] Updated weights for policy 0, policy_version 645152 (0.0020) [2024-06-15 19:26:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 1321336832. Throughput: 0: 11468.8. Samples: 330394112. 
Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:26:30,956][1648985] Avg episode reward: [(0, '160.380')] [2024-06-15 19:26:33,363][1651469] Signal inference workers to stop experience collection... (33550 times) [2024-06-15 19:26:33,397][1652491] InferenceWorker_p0-w0: stopping experience collection (33550 times) [2024-06-15 19:26:33,406][1652491] Updated weights for policy 0, policy_version 645187 (0.0014) [2024-06-15 19:26:33,626][1651469] Signal inference workers to resume experience collection... (33550 times) [2024-06-15 19:26:33,627][1652491] InferenceWorker_p0-w0: resuming experience collection (33550 times) [2024-06-15 19:26:34,912][1652491] Updated weights for policy 0, policy_version 645264 (0.0014) [2024-06-15 19:26:35,955][1648985] Fps is (10 sec: 49157.4, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 1321598976. Throughput: 0: 11719.1. Samples: 330467328. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:26:35,955][1648985] Avg episode reward: [(0, '161.210')] [2024-06-15 19:26:36,012][1652491] Updated weights for policy 0, policy_version 645315 (0.0012) [2024-06-15 19:26:40,190][1652491] Updated weights for policy 0, policy_version 645377 (0.0142) [2024-06-15 19:26:40,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 47097.0). Total num frames: 1321795584. Throughput: 0: 11764.6. Samples: 330502656. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:26:40,956][1648985] Avg episode reward: [(0, '172.750')] [2024-06-15 19:26:41,250][1652491] Updated weights for policy 0, policy_version 645433 (0.0014) [2024-06-15 19:26:44,763][1652491] Updated weights for policy 0, policy_version 645475 (0.0014) [2024-06-15 19:26:45,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 1322024960. Throughput: 0: 11741.8. Samples: 330580480. 
Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:26:45,956][1648985] Avg episode reward: [(0, '169.950')] [2024-06-15 19:26:46,577][1652491] Updated weights for policy 0, policy_version 645568 (0.0136) [2024-06-15 19:26:47,776][1652491] Updated weights for policy 0, policy_version 645632 (0.0188) [2024-06-15 19:26:50,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1322254336. Throughput: 0: 11958.0. Samples: 330650624. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:26:50,956][1648985] Avg episode reward: [(0, '160.260')] [2024-06-15 19:26:52,862][1652491] Updated weights for policy 0, policy_version 645685 (0.0013) [2024-06-15 19:26:53,222][1652491] Updated weights for policy 0, policy_version 645696 (0.0011) [2024-06-15 19:26:55,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.3, 300 sec: 46874.8). Total num frames: 1322483712. Throughput: 0: 11901.1. Samples: 330685440. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:26:55,956][1648985] Avg episode reward: [(0, '168.490')] [2024-06-15 19:26:56,106][1652491] Updated weights for policy 0, policy_version 645755 (0.0049) [2024-06-15 19:26:56,292][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000645760_1322516480.pth... [2024-06-15 19:26:56,453][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000640304_1311342592.pth [2024-06-15 19:26:57,776][1652491] Updated weights for policy 0, policy_version 645812 (0.0015) [2024-06-15 19:26:58,930][1652491] Updated weights for policy 0, policy_version 645872 (0.0149) [2024-06-15 19:27:00,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1322778624. Throughput: 0: 11958.0. Samples: 330749440. 
Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:27:00,956][1648985] Avg episode reward: [(0, '173.870')] [2024-06-15 19:27:04,160][1652491] Updated weights for policy 0, policy_version 645936 (0.0092) [2024-06-15 19:27:05,955][1648985] Fps is (10 sec: 45876.8, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 1322942464. Throughput: 0: 11935.3. Samples: 330827776. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:27:05,955][1648985] Avg episode reward: [(0, '173.600')] [2024-06-15 19:27:06,146][1652491] Updated weights for policy 0, policy_version 645984 (0.0021) [2024-06-15 19:27:07,610][1652491] Updated weights for policy 0, policy_version 646026 (0.0013) [2024-06-15 19:27:08,490][1652491] Updated weights for policy 0, policy_version 646068 (0.0014) [2024-06-15 19:27:09,082][1651469] Signal inference workers to stop experience collection... (33600 times) [2024-06-15 19:27:09,143][1652491] InferenceWorker_p0-w0: stopping experience collection (33600 times) [2024-06-15 19:27:09,410][1651469] Signal inference workers to resume experience collection... (33600 times) [2024-06-15 19:27:09,413][1652491] InferenceWorker_p0-w0: resuming experience collection (33600 times) [2024-06-15 19:27:10,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 1323302912. Throughput: 0: 12060.7. Samples: 330865152. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:27:10,955][1648985] Avg episode reward: [(0, '171.380')] [2024-06-15 19:27:13,535][1652491] Updated weights for policy 0, policy_version 646146 (0.0112) [2024-06-15 19:27:14,998][1652491] Updated weights for policy 0, policy_version 646206 (0.0117) [2024-06-15 19:27:15,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 1323433984. Throughput: 0: 12151.5. Samples: 330940928. 
Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:27:15,956][1648985] Avg episode reward: [(0, '142.080')] [2024-06-15 19:27:17,060][1652491] Updated weights for policy 0, policy_version 646271 (0.0012) [2024-06-15 19:27:19,883][1652491] Updated weights for policy 0, policy_version 646368 (0.0013) [2024-06-15 19:27:20,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 1323827200. Throughput: 0: 11958.0. Samples: 331005440. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:27:20,955][1648985] Avg episode reward: [(0, '135.870')] [2024-06-15 19:27:25,391][1652491] Updated weights for policy 0, policy_version 646448 (0.0014) [2024-06-15 19:27:25,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47514.4, 300 sec: 47319.2). Total num frames: 1323958272. Throughput: 0: 12174.2. Samples: 331050496. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:27:25,956][1648985] Avg episode reward: [(0, '139.600')] [2024-06-15 19:27:27,052][1652491] Updated weights for policy 0, policy_version 646482 (0.0012) [2024-06-15 19:27:28,762][1652491] Updated weights for policy 0, policy_version 646544 (0.0014) [2024-06-15 19:27:30,261][1652491] Updated weights for policy 0, policy_version 646608 (0.0024) [2024-06-15 19:27:30,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 49698.1, 300 sec: 47319.2). Total num frames: 1324318720. Throughput: 0: 11901.2. Samples: 331116032. Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:27:30,956][1648985] Avg episode reward: [(0, '146.240')] [2024-06-15 19:27:35,007][1652491] Updated weights for policy 0, policy_version 646657 (0.0013) [2024-06-15 19:27:35,692][1652491] Updated weights for policy 0, policy_version 646706 (0.0012) [2024-06-15 19:27:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.6, 300 sec: 47541.4). Total num frames: 1324482560. Throughput: 0: 12344.9. Samples: 331206144. 
Policy #0 lag: (min: 7.0, avg: 84.2, max: 263.0) [2024-06-15 19:27:35,956][1648985] Avg episode reward: [(0, '150.970')] [2024-06-15 19:27:36,800][1652491] Updated weights for policy 0, policy_version 646740 (0.0014) [2024-06-15 19:27:38,272][1652491] Updated weights for policy 0, policy_version 646793 (0.0028) [2024-06-15 19:27:39,496][1652491] Updated weights for policy 0, policy_version 646848 (0.0014) [2024-06-15 19:27:40,751][1652491] Updated weights for policy 0, policy_version 646903 (0.0101) [2024-06-15 19:27:40,955][1648985] Fps is (10 sec: 55705.4, 60 sec: 51336.5, 300 sec: 47652.4). Total num frames: 1324875776. Throughput: 0: 12288.1. Samples: 331238400. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:27:40,956][1648985] Avg episode reward: [(0, '153.040')] [2024-06-15 19:27:45,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 47513.7, 300 sec: 47097.0). Total num frames: 1324875776. Throughput: 0: 12572.5. Samples: 331315200. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:27:45,956][1648985] Avg episode reward: [(0, '176.520')] [2024-06-15 19:27:46,263][1652491] Updated weights for policy 0, policy_version 646933 (0.0016) [2024-06-15 19:27:47,383][1652491] Updated weights for policy 0, policy_version 646977 (0.0013) [2024-06-15 19:27:48,355][1652491] Updated weights for policy 0, policy_version 647031 (0.0063) [2024-06-15 19:27:49,243][1651469] Signal inference workers to stop experience collection... (33650 times) [2024-06-15 19:27:49,299][1652491] Updated weights for policy 0, policy_version 647058 (0.0012) [2024-06-15 19:27:49,322][1652491] InferenceWorker_p0-w0: stopping experience collection (33650 times) [2024-06-15 19:27:49,601][1651469] Signal inference workers to resume experience collection... (33650 times) [2024-06-15 19:27:49,602][1652491] InferenceWorker_p0-w0: resuming experience collection (33650 times) [2024-06-15 19:27:50,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 50790.4, 300 sec: 47541.4). 
Total num frames: 1325301760. Throughput: 0: 12310.7. Samples: 331381760. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:27:50,956][1648985] Avg episode reward: [(0, '186.230')] [2024-06-15 19:27:51,565][1652491] Updated weights for policy 0, policy_version 647167 (0.0014) [2024-06-15 19:27:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48606.0, 300 sec: 47097.1). Total num frames: 1325400064. Throughput: 0: 12390.4. Samples: 331422720. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:27:55,956][1648985] Avg episode reward: [(0, '187.440')] [2024-06-15 19:27:57,575][1652491] Updated weights for policy 0, policy_version 647216 (0.0014) [2024-06-15 19:27:58,856][1652491] Updated weights for policy 0, policy_version 647267 (0.0022) [2024-06-15 19:28:00,431][1652491] Updated weights for policy 0, policy_version 647344 (0.0147) [2024-06-15 19:28:00,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 50244.4, 300 sec: 47541.4). Total num frames: 1325793280. Throughput: 0: 12344.9. Samples: 331496448. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:28:00,956][1648985] Avg episode reward: [(0, '170.820')] [2024-06-15 19:28:01,576][1652491] Updated weights for policy 0, policy_version 647379 (0.0015) [2024-06-15 19:28:05,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 49697.8, 300 sec: 47097.0). Total num frames: 1325924352. Throughput: 0: 12640.6. Samples: 331574272. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:28:05,956][1648985] Avg episode reward: [(0, '170.850')] [2024-06-15 19:28:08,097][1652491] Updated weights for policy 0, policy_version 647456 (0.0013) [2024-06-15 19:28:09,826][1652491] Updated weights for policy 0, policy_version 647536 (0.0015) [2024-06-15 19:28:10,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 47652.4). Total num frames: 1326252032. Throughput: 0: 12447.3. Samples: 331610624. 
Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:28:10,956][1648985] Avg episode reward: [(0, '157.230')] [2024-06-15 19:28:11,447][1652491] Updated weights for policy 0, policy_version 647609 (0.0013) [2024-06-15 19:28:13,073][1652491] Updated weights for policy 0, policy_version 647675 (0.0014) [2024-06-15 19:28:15,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 50244.2, 300 sec: 47208.1). Total num frames: 1326448640. Throughput: 0: 12435.9. Samples: 331675648. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:28:15,956][1648985] Avg episode reward: [(0, '146.370')] [2024-06-15 19:28:19,899][1652491] Updated weights for policy 0, policy_version 647729 (0.0014) [2024-06-15 19:28:20,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46967.4, 300 sec: 47763.5). Total num frames: 1326645248. Throughput: 0: 12083.2. Samples: 331749888. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:28:20,955][1648985] Avg episode reward: [(0, '135.900')] [2024-06-15 19:28:21,627][1652491] Updated weights for policy 0, policy_version 647810 (0.0020) [2024-06-15 19:28:22,829][1652491] Updated weights for policy 0, policy_version 647872 (0.0021) [2024-06-15 19:28:24,058][1652491] Updated weights for policy 0, policy_version 647927 (0.0012) [2024-06-15 19:28:25,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1326972928. Throughput: 0: 12071.8. Samples: 331781632. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:28:25,956][1648985] Avg episode reward: [(0, '145.170')] [2024-06-15 19:28:29,200][1651469] Signal inference workers to stop experience collection... (33700 times) [2024-06-15 19:28:29,280][1652491] InferenceWorker_p0-w0: stopping experience collection (33700 times) [2024-06-15 19:28:29,534][1651469] Signal inference workers to resume experience collection... 
(33700 times) [2024-06-15 19:28:29,535][1652491] InferenceWorker_p0-w0: resuming experience collection (33700 times) [2024-06-15 19:28:30,293][1652491] Updated weights for policy 0, policy_version 647984 (0.0012) [2024-06-15 19:28:30,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.4, 300 sec: 47541.4). Total num frames: 1327104000. Throughput: 0: 12276.7. Samples: 331867648. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:28:30,956][1648985] Avg episode reward: [(0, '162.830')] [2024-06-15 19:28:31,846][1652491] Updated weights for policy 0, policy_version 648055 (0.0020) [2024-06-15 19:28:32,880][1652491] Updated weights for policy 0, policy_version 648101 (0.0042) [2024-06-15 19:28:34,125][1652491] Updated weights for policy 0, policy_version 648162 (0.0055) [2024-06-15 19:28:35,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 47763.5). Total num frames: 1327497216. Throughput: 0: 12265.3. Samples: 331933696. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:28:35,955][1648985] Avg episode reward: [(0, '162.330')] [2024-06-15 19:28:40,267][1652491] Updated weights for policy 0, policy_version 648224 (0.0014) [2024-06-15 19:28:40,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 45329.2, 300 sec: 47430.3). Total num frames: 1327595520. Throughput: 0: 12344.9. Samples: 331978240. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:28:40,955][1648985] Avg episode reward: [(0, '166.710')] [2024-06-15 19:28:42,513][1652491] Updated weights for policy 0, policy_version 648304 (0.0014) [2024-06-15 19:28:43,728][1652491] Updated weights for policy 0, policy_version 648352 (0.0012) [2024-06-15 19:28:45,373][1652491] Updated weights for policy 0, policy_version 648417 (0.0026) [2024-06-15 19:28:45,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 52428.9, 300 sec: 47985.7). Total num frames: 1328021504. Throughput: 0: 12026.3. Samples: 332037632. 
Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:28:45,956][1648985] Avg episode reward: [(0, '166.170')] [2024-06-15 19:28:50,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45329.1, 300 sec: 47097.3). Total num frames: 1328021504. Throughput: 0: 12197.1. Samples: 332123136. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:28:50,956][1648985] Avg episode reward: [(0, '142.140')] [2024-06-15 19:28:51,453][1652491] Updated weights for policy 0, policy_version 648480 (0.0012) [2024-06-15 19:28:53,790][1652491] Updated weights for policy 0, policy_version 648571 (0.0027) [2024-06-15 19:28:55,141][1652491] Updated weights for policy 0, policy_version 648624 (0.0012) [2024-06-15 19:28:55,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 50790.3, 300 sec: 48096.7). Total num frames: 1328447488. Throughput: 0: 11912.5. Samples: 332146688. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:28:55,956][1648985] Avg episode reward: [(0, '142.150')] [2024-06-15 19:28:56,417][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000648688_1328513024.pth... [2024-06-15 19:28:56,467][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000643024_1316913152.pth [2024-06-15 19:28:56,657][1652491] Updated weights for policy 0, policy_version 648699 (0.0015) [2024-06-15 19:29:00,955][1648985] Fps is (10 sec: 52427.1, 60 sec: 45875.0, 300 sec: 47097.0). Total num frames: 1328545792. Throughput: 0: 12071.8. Samples: 332218880. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:29:00,956][1648985] Avg episode reward: [(0, '145.850')] [2024-06-15 19:29:04,373][1651469] Signal inference workers to stop experience collection... 
(33750 times) [2024-06-15 19:29:04,400][1652491] Updated weights for policy 0, policy_version 648785 (0.0019) [2024-06-15 19:29:04,459][1652491] InferenceWorker_p0-w0: stopping experience collection (33750 times) [2024-06-15 19:29:04,657][1651469] Signal inference workers to resume experience collection... (33750 times) [2024-06-15 19:29:04,658][1652491] InferenceWorker_p0-w0: resuming experience collection (33750 times) [2024-06-15 19:29:05,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 48606.1, 300 sec: 48096.8). Total num frames: 1328840704. Throughput: 0: 11844.3. Samples: 332282880. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:29:05,956][1648985] Avg episode reward: [(0, '149.860')] [2024-06-15 19:29:06,184][1652491] Updated weights for policy 0, policy_version 648864 (0.0013) [2024-06-15 19:29:07,588][1652491] Updated weights for policy 0, policy_version 648929 (0.0023) [2024-06-15 19:29:10,955][1648985] Fps is (10 sec: 52430.5, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1329070080. Throughput: 0: 11878.4. Samples: 332316160. Policy #0 lag: (min: 15.0, avg: 118.3, max: 271.0) [2024-06-15 19:29:10,956][1648985] Avg episode reward: [(0, '172.900')] [2024-06-15 19:29:13,796][1652491] Updated weights for policy 0, policy_version 648977 (0.0013) [2024-06-15 19:29:15,695][1652491] Updated weights for policy 0, policy_version 649059 (0.0013) [2024-06-15 19:29:15,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46967.6, 300 sec: 47763.5). Total num frames: 1329266688. Throughput: 0: 11764.6. Samples: 332397056. 
Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:29:15,956][1648985] Avg episode reward: [(0, '172.170')] [2024-06-15 19:29:16,442][1652491] Updated weights for policy 0, policy_version 649088 (0.0013) [2024-06-15 19:29:17,873][1652491] Updated weights for policy 0, policy_version 649140 (0.0125) [2024-06-15 19:29:19,266][1652491] Updated weights for policy 0, policy_version 649209 (0.0015) [2024-06-15 19:29:20,955][1648985] Fps is (10 sec: 52427.2, 60 sec: 49151.7, 300 sec: 47652.4). Total num frames: 1329594368. Throughput: 0: 11707.6. Samples: 332460544. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:29:20,956][1648985] Avg episode reward: [(0, '144.820')] [2024-06-15 19:29:25,476][1652491] Updated weights for policy 0, policy_version 649268 (0.0014) [2024-06-15 19:29:25,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1329725440. Throughput: 0: 11719.1. Samples: 332505600. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:29:25,956][1648985] Avg episode reward: [(0, '147.900')] [2024-06-15 19:29:26,980][1652491] Updated weights for policy 0, policy_version 649336 (0.0013) [2024-06-15 19:29:28,130][1652491] Updated weights for policy 0, policy_version 649363 (0.0012) [2024-06-15 19:29:29,270][1652491] Updated weights for policy 0, policy_version 649411 (0.0015) [2024-06-15 19:29:30,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 50244.2, 300 sec: 47985.7). Total num frames: 1330118656. Throughput: 0: 11741.8. Samples: 332566016. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:29:30,956][1648985] Avg episode reward: [(0, '155.760')] [2024-06-15 19:29:34,806][1652491] Updated weights for policy 0, policy_version 649474 (0.0049) [2024-06-15 19:29:35,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45329.0, 300 sec: 47430.3). Total num frames: 1330216960. Throughput: 0: 11548.4. Samples: 332642816. 
Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:29:35,956][1648985] Avg episode reward: [(0, '152.550')] [2024-06-15 19:29:36,269][1652491] Updated weights for policy 0, policy_version 649532 (0.0013) [2024-06-15 19:29:37,906][1652491] Updated weights for policy 0, policy_version 649597 (0.0012) [2024-06-15 19:29:40,269][1652491] Updated weights for policy 0, policy_version 649660 (0.0096) [2024-06-15 19:29:40,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 1330511872. Throughput: 0: 11753.3. Samples: 332675584. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:29:40,956][1648985] Avg episode reward: [(0, '144.650')] [2024-06-15 19:29:41,198][1651469] Signal inference workers to stop experience collection... (33800 times) [2024-06-15 19:29:41,230][1652491] InferenceWorker_p0-w0: stopping experience collection (33800 times) [2024-06-15 19:29:41,438][1651469] Signal inference workers to resume experience collection... (33800 times) [2024-06-15 19:29:41,439][1652491] InferenceWorker_p0-w0: resuming experience collection (33800 times) [2024-06-15 19:29:41,612][1652491] Updated weights for policy 0, policy_version 649699 (0.0014) [2024-06-15 19:29:45,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 43690.7, 300 sec: 47541.5). Total num frames: 1330642944. Throughput: 0: 11639.6. Samples: 332742656. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:29:45,955][1648985] Avg episode reward: [(0, '127.360')] [2024-06-15 19:29:46,575][1652491] Updated weights for policy 0, policy_version 649735 (0.0017) [2024-06-15 19:29:47,862][1652491] Updated weights for policy 0, policy_version 649787 (0.0011) [2024-06-15 19:29:49,159][1652491] Updated weights for policy 0, policy_version 649828 (0.0014) [2024-06-15 19:29:50,186][1652491] Updated weights for policy 0, policy_version 649872 (0.0013) [2024-06-15 19:29:50,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 49698.2, 300 sec: 47874.7). 
Total num frames: 1331003392. Throughput: 0: 11730.5. Samples: 332810752. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:29:50,956][1648985] Avg episode reward: [(0, '122.230')] [2024-06-15 19:29:51,993][1652491] Updated weights for policy 0, policy_version 649925 (0.0014) [2024-06-15 19:29:53,068][1652491] Updated weights for policy 0, policy_version 649982 (0.0014) [2024-06-15 19:29:55,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45329.2, 300 sec: 47541.4). Total num frames: 1331167232. Throughput: 0: 11878.4. Samples: 332850688. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:29:55,956][1648985] Avg episode reward: [(0, '129.940')] [2024-06-15 19:29:58,763][1652491] Updated weights for policy 0, policy_version 650042 (0.0015) [2024-06-15 19:30:00,435][1652491] Updated weights for policy 0, policy_version 650105 (0.0093) [2024-06-15 19:30:00,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48060.0, 300 sec: 47763.5). Total num frames: 1331429376. Throughput: 0: 11696.4. Samples: 332923392. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:30:00,956][1648985] Avg episode reward: [(0, '134.030')] [2024-06-15 19:30:02,232][1652491] Updated weights for policy 0, policy_version 650168 (0.0019) [2024-06-15 19:30:03,949][1652491] Updated weights for policy 0, policy_version 650238 (0.0024) [2024-06-15 19:30:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1331691520. Throughput: 0: 11787.5. Samples: 332990976. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:30:05,955][1648985] Avg episode reward: [(0, '135.100')] [2024-06-15 19:30:09,620][1652491] Updated weights for policy 0, policy_version 650297 (0.0015) [2024-06-15 19:30:10,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46421.2, 300 sec: 47541.3). Total num frames: 1331855360. Throughput: 0: 11673.6. Samples: 333030912. 
Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:30:10,956][1648985] Avg episode reward: [(0, '151.730')] [2024-06-15 19:30:11,400][1652491] Updated weights for policy 0, policy_version 650338 (0.0015) [2024-06-15 19:30:12,557][1652491] Updated weights for policy 0, policy_version 650384 (0.0013) [2024-06-15 19:30:14,139][1652491] Updated weights for policy 0, policy_version 650432 (0.0014) [2024-06-15 19:30:15,293][1652491] Updated weights for policy 0, policy_version 650488 (0.0013) [2024-06-15 19:30:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 49152.1, 300 sec: 47985.7). Total num frames: 1332215808. Throughput: 0: 11616.7. Samples: 333088768. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:30:15,955][1648985] Avg episode reward: [(0, '164.240')] [2024-06-15 19:30:20,915][1652491] Updated weights for policy 0, policy_version 650544 (0.0013) [2024-06-15 19:30:20,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 45329.3, 300 sec: 47430.3). Total num frames: 1332314112. Throughput: 0: 11605.3. Samples: 333165056. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:30:20,956][1648985] Avg episode reward: [(0, '148.380')] [2024-06-15 19:30:22,844][1652491] Updated weights for policy 0, policy_version 650581 (0.0026) [2024-06-15 19:30:24,493][1652491] Updated weights for policy 0, policy_version 650656 (0.0015) [2024-06-15 19:30:25,167][1651469] Signal inference workers to stop experience collection... (33850 times) [2024-06-15 19:30:25,218][1652491] InferenceWorker_p0-w0: stopping experience collection (33850 times) [2024-06-15 19:30:25,509][1651469] Signal inference workers to resume experience collection... (33850 times) [2024-06-15 19:30:25,510][1652491] InferenceWorker_p0-w0: resuming experience collection (33850 times) [2024-06-15 19:30:25,747][1652491] Updated weights for policy 0, policy_version 650695 (0.0014) [2024-06-15 19:30:25,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48605.9, 300 sec: 47652.4). 
Total num frames: 1332641792. Throughput: 0: 11685.0. Samples: 333201408. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:30:25,956][1648985] Avg episode reward: [(0, '138.620')] [2024-06-15 19:30:26,632][1652491] Updated weights for policy 0, policy_version 650742 (0.0083) [2024-06-15 19:30:30,998][1648985] Fps is (10 sec: 45677.8, 60 sec: 44205.0, 300 sec: 47645.5). Total num frames: 1332772864. Throughput: 0: 11764.7. Samples: 333272576. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:30:30,999][1648985] Avg episode reward: [(0, '130.410')] [2024-06-15 19:30:31,260][1652491] Updated weights for policy 0, policy_version 650784 (0.0014) [2024-06-15 19:30:33,770][1652491] Updated weights for policy 0, policy_version 650832 (0.0013) [2024-06-15 19:30:35,533][1652491] Updated weights for policy 0, policy_version 650901 (0.0013) [2024-06-15 19:30:35,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 47513.7, 300 sec: 47319.2). Total num frames: 1333067776. Throughput: 0: 11650.8. Samples: 333335040. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:30:35,955][1648985] Avg episode reward: [(0, '135.200')] [2024-06-15 19:30:36,771][1652491] Updated weights for policy 0, policy_version 650945 (0.0022) [2024-06-15 19:30:40,955][1648985] Fps is (10 sec: 49365.3, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1333264384. Throughput: 0: 11502.9. Samples: 333368320. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:30:40,956][1648985] Avg episode reward: [(0, '127.680')] [2024-06-15 19:30:42,037][1652491] Updated weights for policy 0, policy_version 651010 (0.0095) [2024-06-15 19:30:43,261][1652491] Updated weights for policy 0, policy_version 651072 (0.0014) [2024-06-15 19:30:45,930][1652491] Updated weights for policy 0, policy_version 651135 (0.0014) [2024-06-15 19:30:45,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1333526528. Throughput: 0: 11639.5. 
Samples: 333447168. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:30:45,956][1648985] Avg episode reward: [(0, '137.050')] [2024-06-15 19:30:47,883][1652491] Updated weights for policy 0, policy_version 651186 (0.0116) [2024-06-15 19:30:49,301][1652491] Updated weights for policy 0, policy_version 651256 (0.0012) [2024-06-15 19:30:50,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.3, 300 sec: 47874.6). Total num frames: 1333788672. Throughput: 0: 11696.4. Samples: 333517312. Policy #0 lag: (min: 15.0, avg: 63.9, max: 267.0) [2024-06-15 19:30:50,955][1648985] Avg episode reward: [(0, '154.230')] [2024-06-15 19:30:53,503][1652491] Updated weights for policy 0, policy_version 651300 (0.0048) [2024-06-15 19:30:55,548][1652491] Updated weights for policy 0, policy_version 651346 (0.0015) [2024-06-15 19:30:55,955][1648985] Fps is (10 sec: 45873.6, 60 sec: 46967.2, 300 sec: 47763.5). Total num frames: 1333985280. Throughput: 0: 11616.7. Samples: 333553664. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:30:55,956][1648985] Avg episode reward: [(0, '172.880')] [2024-06-15 19:30:56,421][1652491] Updated weights for policy 0, policy_version 651392 (0.0011) [2024-06-15 19:30:56,470][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000651392_1334050816.pth... [2024-06-15 19:30:56,544][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000645760_1322516480.pth [2024-06-15 19:30:56,548][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000651392_1334050816.pth [2024-06-15 19:30:58,885][1652491] Updated weights for policy 0, policy_version 651456 (0.0011) [2024-06-15 19:30:59,936][1652491] Updated weights for policy 0, policy_version 651506 (0.0011) [2024-06-15 19:31:00,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1334312960. Throughput: 0: 11878.4. Samples: 333623296. 
Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:31:00,955][1648985] Avg episode reward: [(0, '168.890')] [2024-06-15 19:31:03,660][1652491] Updated weights for policy 0, policy_version 651541 (0.0011) [2024-06-15 19:31:05,955][1648985] Fps is (10 sec: 45877.0, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1334444032. Throughput: 0: 11923.9. Samples: 333701632. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:31:05,956][1648985] Avg episode reward: [(0, '154.350')] [2024-06-15 19:31:06,765][1652491] Updated weights for policy 0, policy_version 651585 (0.0015) [2024-06-15 19:31:08,024][1652491] Updated weights for policy 0, policy_version 651645 (0.0013) [2024-06-15 19:31:08,391][1651469] Signal inference workers to stop experience collection... (33900 times) [2024-06-15 19:31:08,432][1652491] InferenceWorker_p0-w0: stopping experience collection (33900 times) [2024-06-15 19:31:08,620][1651469] Signal inference workers to resume experience collection... (33900 times) [2024-06-15 19:31:08,621][1652491] InferenceWorker_p0-w0: resuming experience collection (33900 times) [2024-06-15 19:31:09,527][1652491] Updated weights for policy 0, policy_version 651699 (0.0012) [2024-06-15 19:31:10,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 49152.1, 300 sec: 47874.6). Total num frames: 1334804480. Throughput: 0: 11844.3. Samples: 333734400. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:31:10,956][1648985] Avg episode reward: [(0, '166.220')] [2024-06-15 19:31:11,040][1652491] Updated weights for policy 0, policy_version 651770 (0.0019) [2024-06-15 19:31:15,801][1652491] Updated weights for policy 0, policy_version 651840 (0.0012) [2024-06-15 19:31:15,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1334968320. Throughput: 0: 11969.5. Samples: 333810688. 
Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:31:15,956][1648985] Avg episode reward: [(0, '150.040')] [2024-06-15 19:31:19,969][1652491] Updated weights for policy 0, policy_version 651920 (0.0036) [2024-06-15 19:31:20,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 48605.6, 300 sec: 47874.7). Total num frames: 1335230464. Throughput: 0: 11866.9. Samples: 333869056. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:31:20,956][1648985] Avg episode reward: [(0, '143.900')] [2024-06-15 19:31:21,421][1652491] Updated weights for policy 0, policy_version 651985 (0.0012) [2024-06-15 19:31:22,309][1652491] Updated weights for policy 0, policy_version 652028 (0.0022) [2024-06-15 19:31:25,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45875.2, 300 sec: 47652.4). Total num frames: 1335394304. Throughput: 0: 12071.8. Samples: 333911552. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:31:25,956][1648985] Avg episode reward: [(0, '161.480')] [2024-06-15 19:31:28,939][1652491] Updated weights for policy 0, policy_version 652101 (0.0118) [2024-06-15 19:31:30,788][1652491] Updated weights for policy 0, policy_version 652176 (0.0219) [2024-06-15 19:31:30,955][1648985] Fps is (10 sec: 42599.9, 60 sec: 48094.4, 300 sec: 47652.4). Total num frames: 1335656448. Throughput: 0: 12003.6. Samples: 333987328. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:31:30,955][1648985] Avg episode reward: [(0, '164.480')] [2024-06-15 19:31:32,332][1652491] Updated weights for policy 0, policy_version 652256 (0.0014) [2024-06-15 19:31:35,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 46967.3, 300 sec: 47763.5). Total num frames: 1335885824. Throughput: 0: 12083.1. Samples: 334061056. 
Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:31:35,956][1648985] Avg episode reward: [(0, '157.980')] [2024-06-15 19:31:36,374][1652491] Updated weights for policy 0, policy_version 652304 (0.0013) [2024-06-15 19:31:39,554][1652491] Updated weights for policy 0, policy_version 652354 (0.0013) [2024-06-15 19:31:40,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 47513.5, 300 sec: 47763.5). Total num frames: 1336115200. Throughput: 0: 12151.5. Samples: 334100480. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:31:40,956][1648985] Avg episode reward: [(0, '150.770')] [2024-06-15 19:31:41,385][1652491] Updated weights for policy 0, policy_version 652419 (0.0013) [2024-06-15 19:31:42,974][1652491] Updated weights for policy 0, policy_version 652501 (0.0014) [2024-06-15 19:31:45,998][1648985] Fps is (10 sec: 52203.8, 60 sec: 48025.1, 300 sec: 47978.6). Total num frames: 1336410112. Throughput: 0: 12014.7. Samples: 334164480. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:31:45,999][1648985] Avg episode reward: [(0, '147.730')] [2024-06-15 19:31:46,559][1652491] Updated weights for policy 0, policy_version 652548 (0.0013) [2024-06-15 19:31:46,839][1651469] Signal inference workers to stop experience collection... (33950 times) [2024-06-15 19:31:46,879][1652491] InferenceWorker_p0-w0: stopping experience collection (33950 times) [2024-06-15 19:31:47,089][1651469] Signal inference workers to resume experience collection... (33950 times) [2024-06-15 19:31:47,089][1652491] InferenceWorker_p0-w0: resuming experience collection (33950 times) [2024-06-15 19:31:47,713][1652491] Updated weights for policy 0, policy_version 652604 (0.0019) [2024-06-15 19:31:50,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46421.3, 300 sec: 47763.6). Total num frames: 1336573952. Throughput: 0: 12083.2. Samples: 334245376. 
Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:31:50,956][1648985] Avg episode reward: [(0, '143.020')] [2024-06-15 19:31:51,237][1652491] Updated weights for policy 0, policy_version 652642 (0.0016) [2024-06-15 19:31:52,305][1652491] Updated weights for policy 0, policy_version 652688 (0.0013) [2024-06-15 19:31:54,117][1652491] Updated weights for policy 0, policy_version 652757 (0.0014) [2024-06-15 19:31:55,955][1648985] Fps is (10 sec: 52656.1, 60 sec: 49152.2, 300 sec: 47985.7). Total num frames: 1336934400. Throughput: 0: 11912.5. Samples: 334270464. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:31:55,956][1648985] Avg episode reward: [(0, '154.100')] [2024-06-15 19:31:58,079][1652491] Updated weights for policy 0, policy_version 652835 (0.0013) [2024-06-15 19:32:00,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45875.1, 300 sec: 47874.6). Total num frames: 1337065472. Throughput: 0: 11958.0. Samples: 334348800. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:32:00,956][1648985] Avg episode reward: [(0, '169.050')] [2024-06-15 19:32:01,884][1652491] Updated weights for policy 0, policy_version 652882 (0.0012) [2024-06-15 19:32:03,314][1652491] Updated weights for policy 0, policy_version 652929 (0.0012) [2024-06-15 19:32:04,694][1652491] Updated weights for policy 0, policy_version 652995 (0.0014) [2024-06-15 19:32:05,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 49698.0, 300 sec: 47874.6). Total num frames: 1337425920. Throughput: 0: 12060.5. Samples: 334411776. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:32:05,956][1648985] Avg episode reward: [(0, '170.550')] [2024-06-15 19:32:06,015][1652491] Updated weights for policy 0, policy_version 653053 (0.0034) [2024-06-15 19:32:09,261][1652491] Updated weights for policy 0, policy_version 653115 (0.0012) [2024-06-15 19:32:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 1337589760. 
Throughput: 0: 12003.6. Samples: 334451712. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:32:10,956][1648985] Avg episode reward: [(0, '156.860')] [2024-06-15 19:32:14,015][1652491] Updated weights for policy 0, policy_version 653173 (0.0014) [2024-06-15 19:32:15,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1337851904. Throughput: 0: 11855.6. Samples: 334520832. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:32:15,956][1648985] Avg episode reward: [(0, '150.230')] [2024-06-15 19:32:16,457][1652491] Updated weights for policy 0, policy_version 653265 (0.0013) [2024-06-15 19:32:17,340][1652491] Updated weights for policy 0, policy_version 653305 (0.0012) [2024-06-15 19:32:20,668][1652491] Updated weights for policy 0, policy_version 653344 (0.0011) [2024-06-15 19:32:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46967.7, 300 sec: 47763.5). Total num frames: 1338048512. Throughput: 0: 11707.8. Samples: 334587904. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:32:20,955][1648985] Avg episode reward: [(0, '140.050')] [2024-06-15 19:32:25,287][1652491] Updated weights for policy 0, policy_version 653408 (0.0011) [2024-06-15 19:32:25,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 47513.6, 300 sec: 47208.1). Total num frames: 1338245120. Throughput: 0: 11685.0. Samples: 334626304. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:32:25,956][1648985] Avg episode reward: [(0, '138.060')] [2024-06-15 19:32:26,806][1652491] Updated weights for policy 0, policy_version 653472 (0.0020) [2024-06-15 19:32:27,353][1651469] Signal inference workers to stop experience collection... (34000 times) [2024-06-15 19:32:27,426][1652491] InferenceWorker_p0-w0: stopping experience collection (34000 times) [2024-06-15 19:32:27,753][1651469] Signal inference workers to resume experience collection... 
(34000 times) [2024-06-15 19:32:27,782][1652491] InferenceWorker_p0-w0: resuming experience collection (34000 times) [2024-06-15 19:32:29,168][1652491] Updated weights for policy 0, policy_version 653562 (0.0097) [2024-06-15 19:32:30,976][1648985] Fps is (10 sec: 45780.0, 60 sec: 47497.1, 300 sec: 47538.0). Total num frames: 1338507264. Throughput: 0: 11497.3. Samples: 334681600. Policy #0 lag: (min: 11.0, avg: 106.9, max: 267.0) [2024-06-15 19:32:30,977][1648985] Avg episode reward: [(0, '142.740')] [2024-06-15 19:32:32,928][1652491] Updated weights for policy 0, policy_version 653624 (0.0012) [2024-06-15 19:32:35,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 1338638336. Throughput: 0: 11343.7. Samples: 334755840. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:32:35,955][1648985] Avg episode reward: [(0, '146.460')] [2024-06-15 19:32:37,278][1652491] Updated weights for policy 0, policy_version 653680 (0.0013) [2024-06-15 19:32:38,800][1652491] Updated weights for policy 0, policy_version 653747 (0.0012) [2024-06-15 19:32:39,666][1652491] Updated weights for policy 0, policy_version 653763 (0.0012) [2024-06-15 19:32:40,955][1648985] Fps is (10 sec: 49254.3, 60 sec: 48059.8, 300 sec: 47874.6). Total num frames: 1338998784. Throughput: 0: 11525.7. Samples: 334789120. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:32:40,955][1648985] Avg episode reward: [(0, '153.040')] [2024-06-15 19:32:41,022][1652491] Updated weights for policy 0, policy_version 653823 (0.0014) [2024-06-15 19:32:44,364][1652491] Updated weights for policy 0, policy_version 653882 (0.0014) [2024-06-15 19:32:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45908.2, 300 sec: 46986.0). Total num frames: 1339162624. Throughput: 0: 11298.1. Samples: 334857216. 
Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:32:45,956][1648985] Avg episode reward: [(0, '151.450')] [2024-06-15 19:32:47,679][1652491] Updated weights for policy 0, policy_version 653907 (0.0014) [2024-06-15 19:32:49,052][1652491] Updated weights for policy 0, policy_version 653956 (0.0015) [2024-06-15 19:32:50,042][1652491] Updated weights for policy 0, policy_version 654009 (0.0012) [2024-06-15 19:32:50,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 1339457536. Throughput: 0: 11571.2. Samples: 334932480. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:32:50,956][1648985] Avg episode reward: [(0, '172.040')] [2024-06-15 19:32:51,372][1652491] Updated weights for policy 0, policy_version 654053 (0.0013) [2024-06-15 19:32:54,842][1652491] Updated weights for policy 0, policy_version 654128 (0.0015) [2024-06-15 19:32:55,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 1339686912. Throughput: 0: 11434.7. Samples: 334966272. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:32:55,955][1648985] Avg episode reward: [(0, '173.440')] [2024-06-15 19:32:55,959][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000654144_1339686912.pth... [2024-06-15 19:32:56,026][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000648688_1328513024.pth [2024-06-15 19:32:58,639][1652491] Updated weights for policy 0, policy_version 654176 (0.0016) [2024-06-15 19:32:59,252][1652491] Updated weights for policy 0, policy_version 654206 (0.0011) [2024-06-15 19:33:00,736][1652491] Updated weights for policy 0, policy_version 654263 (0.0018) [2024-06-15 19:33:00,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1339949056. Throughput: 0: 11571.2. Samples: 335041536. 
Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:33:00,956][1648985] Avg episode reward: [(0, '176.570')] [2024-06-15 19:33:03,013][1652491] Updated weights for policy 0, policy_version 654336 (0.0015) [2024-06-15 19:33:05,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 45875.1, 300 sec: 47208.1). Total num frames: 1340178432. Throughput: 0: 11571.1. Samples: 335108608. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:33:05,956][1648985] Avg episode reward: [(0, '183.210')] [2024-06-15 19:33:08,671][1652491] Updated weights for policy 0, policy_version 654403 (0.0012) [2024-06-15 19:33:09,824][1652491] Updated weights for policy 0, policy_version 654464 (0.0011) [2024-06-15 19:33:10,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.3, 300 sec: 47208.2). Total num frames: 1340375040. Throughput: 0: 11650.8. Samples: 335150592. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:33:10,956][1648985] Avg episode reward: [(0, '175.100')] [2024-06-15 19:33:11,684][1652491] Updated weights for policy 0, policy_version 654521 (0.0014) [2024-06-15 19:33:12,526][1651469] Signal inference workers to stop experience collection... (34050 times) [2024-06-15 19:33:12,584][1652491] InferenceWorker_p0-w0: stopping experience collection (34050 times) [2024-06-15 19:33:12,832][1651469] Signal inference workers to resume experience collection... (34050 times) [2024-06-15 19:33:12,861][1652491] InferenceWorker_p0-w0: resuming experience collection (34050 times) [2024-06-15 19:33:13,294][1652491] Updated weights for policy 0, policy_version 654562 (0.0011) [2024-06-15 19:33:15,635][1652491] Updated weights for policy 0, policy_version 654624 (0.0013) [2024-06-15 19:33:15,955][1648985] Fps is (10 sec: 49153.2, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1340669952. Throughput: 0: 12077.4. Samples: 335224832. 
Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:33:15,955][1648985] Avg episode reward: [(0, '166.900')] [2024-06-15 19:33:19,872][1652491] Updated weights for policy 0, policy_version 654673 (0.0013) [2024-06-15 19:33:20,651][1652491] Updated weights for policy 0, policy_version 654717 (0.0011) [2024-06-15 19:33:20,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 1340866560. Throughput: 0: 12026.3. Samples: 335297024. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:33:20,956][1648985] Avg episode reward: [(0, '132.460')] [2024-06-15 19:33:22,357][1652491] Updated weights for policy 0, policy_version 654784 (0.0103) [2024-06-15 19:33:24,599][1652491] Updated weights for policy 0, policy_version 654844 (0.0104) [2024-06-15 19:33:25,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 48605.9, 300 sec: 47652.5). Total num frames: 1341161472. Throughput: 0: 12128.7. Samples: 335334912. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:33:25,955][1648985] Avg episode reward: [(0, '150.940')] [2024-06-15 19:33:26,526][1652491] Updated weights for policy 0, policy_version 654896 (0.0013) [2024-06-15 19:33:30,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46983.7, 300 sec: 46874.9). Total num frames: 1341325312. Throughput: 0: 12265.3. Samples: 335409152. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:33:30,956][1648985] Avg episode reward: [(0, '154.790')] [2024-06-15 19:33:31,087][1652491] Updated weights for policy 0, policy_version 654946 (0.0014) [2024-06-15 19:33:32,867][1652491] Updated weights for policy 0, policy_version 655029 (0.0015) [2024-06-15 19:33:35,636][1652491] Updated weights for policy 0, policy_version 655059 (0.0011) [2024-06-15 19:33:35,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 49152.0, 300 sec: 47430.3). Total num frames: 1341587456. Throughput: 0: 12094.6. Samples: 335476736. 
Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:33:35,955][1648985] Avg episode reward: [(0, '147.620')] [2024-06-15 19:33:37,164][1652491] Updated weights for policy 0, policy_version 655120 (0.0012) [2024-06-15 19:33:40,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 46421.2, 300 sec: 46652.7). Total num frames: 1341784064. Throughput: 0: 12060.4. Samples: 335508992. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:33:40,956][1648985] Avg episode reward: [(0, '128.580')] [2024-06-15 19:33:41,789][1652491] Updated weights for policy 0, policy_version 655184 (0.0012) [2024-06-15 19:33:42,803][1652491] Updated weights for policy 0, policy_version 655232 (0.0012) [2024-06-15 19:33:45,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1342046208. Throughput: 0: 11923.9. Samples: 335578112. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:33:45,956][1648985] Avg episode reward: [(0, '142.620')] [2024-06-15 19:33:46,384][1652491] Updated weights for policy 0, policy_version 655312 (0.0183) [2024-06-15 19:33:47,289][1652491] Updated weights for policy 0, policy_version 655353 (0.0012) [2024-06-15 19:33:48,520][1652491] Updated weights for policy 0, policy_version 655397 (0.0012) [2024-06-15 19:33:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 1342308352. Throughput: 0: 12106.0. Samples: 335653376. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:33:50,956][1648985] Avg episode reward: [(0, '152.950')] [2024-06-15 19:33:53,316][1652491] Updated weights for policy 0, policy_version 655456 (0.0013) [2024-06-15 19:33:54,515][1651469] Signal inference workers to stop experience collection... (34100 times) [2024-06-15 19:33:54,561][1652491] InferenceWorker_p0-w0: stopping experience collection (34100 times) [2024-06-15 19:33:54,785][1651469] Signal inference workers to resume experience collection... 
(34100 times) [2024-06-15 19:33:54,791][1652491] InferenceWorker_p0-w0: resuming experience collection (34100 times) [2024-06-15 19:33:55,465][1652491] Updated weights for policy 0, policy_version 655544 (0.0013) [2024-06-15 19:33:55,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1342570496. Throughput: 0: 11901.2. Samples: 335686144. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:33:55,955][1648985] Avg episode reward: [(0, '175.280')] [2024-06-15 19:33:58,652][1652491] Updated weights for policy 0, policy_version 655584 (0.0012) [2024-06-15 19:34:00,364][1652491] Updated weights for policy 0, policy_version 655648 (0.0011) [2024-06-15 19:34:00,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 47513.7, 300 sec: 47319.2). Total num frames: 1342799872. Throughput: 0: 11730.5. Samples: 335752704. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:34:00,955][1648985] Avg episode reward: [(0, '185.780')] [2024-06-15 19:34:05,124][1652491] Updated weights for policy 0, policy_version 655684 (0.0014) [2024-06-15 19:34:05,955][1648985] Fps is (10 sec: 32767.3, 60 sec: 45329.1, 300 sec: 46874.9). Total num frames: 1342898176. Throughput: 0: 11582.6. Samples: 335818240. Policy #0 lag: (min: 1.0, avg: 130.9, max: 257.0) [2024-06-15 19:34:05,956][1648985] Avg episode reward: [(0, '182.400')] [2024-06-15 19:34:06,769][1652491] Updated weights for policy 0, policy_version 655747 (0.0011) [2024-06-15 19:34:07,981][1652491] Updated weights for policy 0, policy_version 655802 (0.0016) [2024-06-15 19:34:10,176][1652491] Updated weights for policy 0, policy_version 655840 (0.0011) [2024-06-15 19:34:10,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 46967.5, 300 sec: 47208.1). Total num frames: 1343193088. Throughput: 0: 11548.4. Samples: 335854592. 
Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:34:10,956][1648985] Avg episode reward: [(0, '182.620')] [2024-06-15 19:34:11,535][1652491] Updated weights for policy 0, policy_version 655889 (0.0009) [2024-06-15 19:34:15,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 44782.9, 300 sec: 46652.8). Total num frames: 1343356928. Throughput: 0: 11514.3. Samples: 335927296. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:34:15,955][1648985] Avg episode reward: [(0, '177.430')] [2024-06-15 19:34:16,632][1652491] Updated weights for policy 0, policy_version 655938 (0.0012) [2024-06-15 19:34:17,965][1652491] Updated weights for policy 0, policy_version 656006 (0.0043) [2024-06-15 19:34:19,249][1652491] Updated weights for policy 0, policy_version 656064 (0.0109) [2024-06-15 19:34:20,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1343684608. Throughput: 0: 11525.7. Samples: 335995392. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:34:20,956][1648985] Avg episode reward: [(0, '178.530')] [2024-06-15 19:34:21,782][1652491] Updated weights for policy 0, policy_version 656130 (0.0011) [2024-06-15 19:34:23,091][1652491] Updated weights for policy 0, policy_version 656192 (0.0013) [2024-06-15 19:34:25,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45329.1, 300 sec: 46652.8). Total num frames: 1343881216. Throughput: 0: 11559.9. Samples: 336029184. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:34:25,955][1648985] Avg episode reward: [(0, '165.020')] [2024-06-15 19:34:29,285][1652491] Updated weights for policy 0, policy_version 656256 (0.0076) [2024-06-15 19:34:30,725][1652491] Updated weights for policy 0, policy_version 656308 (0.0012) [2024-06-15 19:34:30,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1344110592. Throughput: 0: 11559.9. Samples: 336098304. 
Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:34:30,955][1648985] Avg episode reward: [(0, '162.290')] [2024-06-15 19:34:32,475][1652491] Updated weights for policy 0, policy_version 656368 (0.0013) [2024-06-15 19:34:33,543][1652491] Updated weights for policy 0, policy_version 656405 (0.0012) [2024-06-15 19:34:34,692][1652491] Updated weights for policy 0, policy_version 656446 (0.0013) [2024-06-15 19:34:35,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 1344405504. Throughput: 0: 11332.3. Samples: 336163328. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:34:35,956][1648985] Avg episode reward: [(0, '174.340')] [2024-06-15 19:34:40,082][1651469] Signal inference workers to stop experience collection... (34150 times) [2024-06-15 19:34:40,155][1652491] InferenceWorker_p0-w0: stopping experience collection (34150 times) [2024-06-15 19:34:40,288][1651469] Signal inference workers to resume experience collection... (34150 times) [2024-06-15 19:34:40,288][1652491] InferenceWorker_p0-w0: resuming experience collection (34150 times) [2024-06-15 19:34:40,885][1652491] Updated weights for policy 0, policy_version 656496 (0.0013) [2024-06-15 19:34:40,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45329.1, 300 sec: 46986.0). Total num frames: 1344503808. Throughput: 0: 11468.7. Samples: 336202240. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:34:40,956][1648985] Avg episode reward: [(0, '162.820')] [2024-06-15 19:34:42,722][1652491] Updated weights for policy 0, policy_version 656567 (0.0014) [2024-06-15 19:34:43,511][1652491] Updated weights for policy 0, policy_version 656596 (0.0012) [2024-06-15 19:34:45,110][1652491] Updated weights for policy 0, policy_version 656657 (0.0011) [2024-06-15 19:34:45,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 47513.8, 300 sec: 47097.1). Total num frames: 1344897024. Throughput: 0: 11389.2. Samples: 336265216. 
Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:34:45,955][1648985] Avg episode reward: [(0, '150.980')] [2024-06-15 19:34:46,083][1652491] Updated weights for policy 0, policy_version 656698 (0.0011) [2024-06-15 19:34:50,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 44236.8, 300 sec: 46763.8). Total num frames: 1344962560. Throughput: 0: 11696.4. Samples: 336344576. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:34:50,956][1648985] Avg episode reward: [(0, '151.380')] [2024-06-15 19:34:51,824][1652491] Updated weights for policy 0, policy_version 656768 (0.0022) [2024-06-15 19:34:54,051][1652491] Updated weights for policy 0, policy_version 656834 (0.0025) [2024-06-15 19:34:55,228][1652491] Updated weights for policy 0, policy_version 656890 (0.0014) [2024-06-15 19:34:55,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 46421.2, 300 sec: 47208.1). Total num frames: 1345355776. Throughput: 0: 11537.0. Samples: 336373760. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:34:55,956][1648985] Avg episode reward: [(0, '154.260')] [2024-06-15 19:34:56,662][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000656944_1345421312.pth... [2024-06-15 19:34:56,664][1652491] Updated weights for policy 0, policy_version 656944 (0.0017) [2024-06-15 19:34:56,692][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000651392_1334050816.pth [2024-06-15 19:35:00,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 44236.7, 300 sec: 46652.7). Total num frames: 1345454080. Throughput: 0: 11616.7. Samples: 336450048. 
Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:35:00,956][1648985] Avg episode reward: [(0, '157.350')] [2024-06-15 19:35:01,476][1652491] Updated weights for policy 0, policy_version 656978 (0.0011) [2024-06-15 19:35:03,818][1652491] Updated weights for policy 0, policy_version 657040 (0.0013) [2024-06-15 19:35:05,879][1652491] Updated weights for policy 0, policy_version 657145 (0.0014) [2024-06-15 19:35:05,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 49152.0, 300 sec: 47430.3). Total num frames: 1345847296. Throughput: 0: 11525.7. Samples: 336514048. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:35:05,956][1648985] Avg episode reward: [(0, '162.350')] [2024-06-15 19:35:07,753][1652491] Updated weights for policy 0, policy_version 657185 (0.0012) [2024-06-15 19:35:10,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.3, 300 sec: 46652.7). Total num frames: 1345978368. Throughput: 0: 11616.7. Samples: 336551936. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:35:10,956][1648985] Avg episode reward: [(0, '152.150')] [2024-06-15 19:35:11,779][1652491] Updated weights for policy 0, policy_version 657221 (0.0016) [2024-06-15 19:35:12,767][1652491] Updated weights for policy 0, policy_version 657280 (0.0035) [2024-06-15 19:35:15,269][1652491] Updated weights for policy 0, policy_version 657344 (0.0014) [2024-06-15 19:35:15,758][1651469] Signal inference workers to stop experience collection... (34200 times) [2024-06-15 19:35:15,796][1652491] InferenceWorker_p0-w0: stopping experience collection (34200 times) [2024-06-15 19:35:15,950][1651469] Signal inference workers to resume experience collection... (34200 times) [2024-06-15 19:35:15,951][1652491] InferenceWorker_p0-w0: resuming experience collection (34200 times) [2024-06-15 19:35:15,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 49152.0, 300 sec: 47430.3). Total num frames: 1346306048. Throughput: 0: 11787.4. Samples: 336628736. 
Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:35:15,955][1648985] Avg episode reward: [(0, '168.060')] [2024-06-15 19:35:16,472][1652491] Updated weights for policy 0, policy_version 657400 (0.0021) [2024-06-15 19:35:18,956][1652491] Updated weights for policy 0, policy_version 657456 (0.0012) [2024-06-15 19:35:19,316][1652491] Updated weights for policy 0, policy_version 657472 (0.0010) [2024-06-15 19:35:20,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 1346502656. Throughput: 0: 11992.2. Samples: 336702976. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:35:20,956][1648985] Avg episode reward: [(0, '174.970')] [2024-06-15 19:35:23,555][1652491] Updated weights for policy 0, policy_version 657533 (0.0019) [2024-06-15 19:35:25,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 47513.5, 300 sec: 47326.2). Total num frames: 1346732032. Throughput: 0: 12049.1. Samples: 336744448. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:35:25,955][1648985] Avg episode reward: [(0, '170.490')] [2024-06-15 19:35:26,160][1652491] Updated weights for policy 0, policy_version 657603 (0.0012) [2024-06-15 19:35:27,283][1652491] Updated weights for policy 0, policy_version 657664 (0.0013) [2024-06-15 19:35:29,336][1652491] Updated weights for policy 0, policy_version 657728 (0.0022) [2024-06-15 19:35:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48605.8, 300 sec: 47319.2). Total num frames: 1347026944. Throughput: 0: 12014.9. Samples: 336805888. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:35:30,956][1648985] Avg episode reward: [(0, '159.940')] [2024-06-15 19:35:35,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 46421.3, 300 sec: 47208.1). Total num frames: 1347190784. Throughput: 0: 12140.1. Samples: 336890880. 
Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:35:35,956][1648985] Avg episode reward: [(0, '163.640')] [2024-06-15 19:35:36,165][1652491] Updated weights for policy 0, policy_version 657824 (0.0014) [2024-06-15 19:35:37,402][1652491] Updated weights for policy 0, policy_version 657876 (0.0012) [2024-06-15 19:35:39,580][1652491] Updated weights for policy 0, policy_version 657945 (0.0113) [2024-06-15 19:35:40,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 50790.5, 300 sec: 47541.4). Total num frames: 1347551232. Throughput: 0: 12094.6. Samples: 336918016. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:35:40,956][1648985] Avg episode reward: [(0, '153.990')] [2024-06-15 19:35:45,156][1652491] Updated weights for policy 0, policy_version 658016 (0.0034) [2024-06-15 19:35:45,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45875.0, 300 sec: 46986.0). Total num frames: 1347649536. Throughput: 0: 12197.0. Samples: 336998912. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 19:35:45,956][1648985] Avg episode reward: [(0, '163.060')] [2024-06-15 19:35:47,527][1652491] Updated weights for policy 0, policy_version 658096 (0.0015) [2024-06-15 19:35:49,265][1652491] Updated weights for policy 0, policy_version 658176 (0.0023) [2024-06-15 19:35:50,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 50790.4, 300 sec: 47541.4). Total num frames: 1348009984. Throughput: 0: 11946.7. Samples: 337051648. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:35:50,956][1648985] Avg episode reward: [(0, '155.060')] [2024-06-15 19:35:51,454][1652491] Updated weights for policy 0, policy_version 658240 (0.0124) [2024-06-15 19:35:55,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45329.0, 300 sec: 46652.7). Total num frames: 1348075520. Throughput: 0: 11923.9. Samples: 337088512. 
Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:35:55,956][1648985] Avg episode reward: [(0, '134.310')] [2024-06-15 19:35:58,594][1652491] Updated weights for policy 0, policy_version 658310 (0.0023) [2024-06-15 19:35:59,575][1651469] Signal inference workers to stop experience collection... (34250 times) [2024-06-15 19:35:59,627][1652491] InferenceWorker_p0-w0: stopping experience collection (34250 times) [2024-06-15 19:35:59,886][1651469] Signal inference workers to resume experience collection... (34250 times) [2024-06-15 19:35:59,890][1652491] InferenceWorker_p0-w0: resuming experience collection (34250 times) [2024-06-15 19:36:00,065][1652491] Updated weights for policy 0, policy_version 658370 (0.0013) [2024-06-15 19:36:00,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 49152.1, 300 sec: 47319.2). Total num frames: 1348403200. Throughput: 0: 11810.1. Samples: 337160192. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:36:00,955][1648985] Avg episode reward: [(0, '131.830')] [2024-06-15 19:36:01,728][1652491] Updated weights for policy 0, policy_version 658433 (0.0012) [2024-06-15 19:36:05,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 1348599808. Throughput: 0: 11525.7. Samples: 337221632. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:36:05,956][1648985] Avg episode reward: [(0, '145.710')] [2024-06-15 19:36:09,104][1652491] Updated weights for policy 0, policy_version 658501 (0.0112) [2024-06-15 19:36:10,719][1652491] Updated weights for policy 0, policy_version 658561 (0.0020) [2024-06-15 19:36:10,955][1648985] Fps is (10 sec: 32766.9, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 1348730880. Throughput: 0: 11537.0. Samples: 337263616. 
Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:36:10,956][1648985] Avg episode reward: [(0, '163.100')] [2024-06-15 19:36:12,536][1652491] Updated weights for policy 0, policy_version 658640 (0.0012) [2024-06-15 19:36:14,750][1652491] Updated weights for policy 0, policy_version 658727 (0.0265) [2024-06-15 19:36:15,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 1349124096. Throughput: 0: 11343.7. Samples: 337316352. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:36:15,956][1648985] Avg episode reward: [(0, '165.560')] [2024-06-15 19:36:20,568][1652491] Updated weights for policy 0, policy_version 658758 (0.0014) [2024-06-15 19:36:20,955][1648985] Fps is (10 sec: 42599.6, 60 sec: 44236.8, 300 sec: 46652.8). Total num frames: 1349156864. Throughput: 0: 11377.8. Samples: 337402880. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:36:20,956][1648985] Avg episode reward: [(0, '159.650')] [2024-06-15 19:36:21,488][1652491] Updated weights for policy 0, policy_version 658810 (0.0013) [2024-06-15 19:36:22,695][1652491] Updated weights for policy 0, policy_version 658853 (0.0017) [2024-06-15 19:36:23,592][1652491] Updated weights for policy 0, policy_version 658901 (0.0013) [2024-06-15 19:36:25,067][1652491] Updated weights for policy 0, policy_version 658962 (0.0026) [2024-06-15 19:36:25,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 1349615616. Throughput: 0: 11446.1. Samples: 337433088. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:36:25,955][1648985] Avg episode reward: [(0, '151.730')] [2024-06-15 19:36:26,132][1652491] Updated weights for policy 0, policy_version 659008 (0.0036) [2024-06-15 19:36:30,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 43690.6, 300 sec: 46652.7). Total num frames: 1349648384. Throughput: 0: 11389.1. Samples: 337511424. 
Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:36:30,956][1648985] Avg episode reward: [(0, '152.660')] [2024-06-15 19:36:32,213][1652491] Updated weights for policy 0, policy_version 659068 (0.0012) [2024-06-15 19:36:33,788][1652491] Updated weights for policy 0, policy_version 659122 (0.0013) [2024-06-15 19:36:35,686][1652491] Updated weights for policy 0, policy_version 659205 (0.0013) [2024-06-15 19:36:35,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 1350074368. Throughput: 0: 11548.5. Samples: 337571328. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:36:35,956][1648985] Avg episode reward: [(0, '151.430')] [2024-06-15 19:36:36,450][1651469] Signal inference workers to stop experience collection... (34300 times) [2024-06-15 19:36:36,516][1652491] InferenceWorker_p0-w0: stopping experience collection (34300 times) [2024-06-15 19:36:36,778][1651469] Signal inference workers to resume experience collection... (34300 times) [2024-06-15 19:36:36,779][1652491] InferenceWorker_p0-w0: resuming experience collection (34300 times) [2024-06-15 19:36:37,210][1652491] Updated weights for policy 0, policy_version 659263 (0.0015) [2024-06-15 19:36:40,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 43690.7, 300 sec: 46659.6). Total num frames: 1350172672. Throughput: 0: 11559.9. Samples: 337608704. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:36:40,955][1648985] Avg episode reward: [(0, '147.080')] [2024-06-15 19:36:43,424][1652491] Updated weights for policy 0, policy_version 659320 (0.0012) [2024-06-15 19:36:44,788][1652491] Updated weights for policy 0, policy_version 659361 (0.0013) [2024-06-15 19:36:45,701][1652491] Updated weights for policy 0, policy_version 659408 (0.0030) [2024-06-15 19:36:45,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 1350467584. Throughput: 0: 11730.5. Samples: 337688064. 
Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:36:45,956][1648985] Avg episode reward: [(0, '151.780')] [2024-06-15 19:36:46,743][1652491] Updated weights for policy 0, policy_version 659456 (0.0043) [2024-06-15 19:36:48,289][1652491] Updated weights for policy 0, policy_version 659517 (0.0014) [2024-06-15 19:36:50,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 44782.9, 300 sec: 46652.8). Total num frames: 1350696960. Throughput: 0: 11878.4. Samples: 337756160. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:36:50,956][1648985] Avg episode reward: [(0, '156.040')] [2024-06-15 19:36:54,272][1652491] Updated weights for policy 0, policy_version 659568 (0.0101) [2024-06-15 19:36:55,804][1652491] Updated weights for policy 0, policy_version 659618 (0.0015) [2024-06-15 19:36:55,955][1648985] Fps is (10 sec: 42596.7, 60 sec: 46967.3, 300 sec: 46874.8). Total num frames: 1350893568. Throughput: 0: 11832.9. Samples: 337796096. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:36:55,956][1648985] Avg episode reward: [(0, '163.130')] [2024-06-15 19:36:56,397][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000659648_1350959104.pth... [2024-06-15 19:36:56,549][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000654144_1339686912.pth [2024-06-15 19:36:57,426][1652491] Updated weights for policy 0, policy_version 659696 (0.0089) [2024-06-15 19:36:59,138][1652491] Updated weights for policy 0, policy_version 659760 (0.0016) [2024-06-15 19:37:00,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46967.4, 300 sec: 46763.8). Total num frames: 1351221248. Throughput: 0: 11889.8. Samples: 337851392. 
Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:37:00,955][1648985] Avg episode reward: [(0, '168.100')] [2024-06-15 19:37:05,347][1652491] Updated weights for policy 0, policy_version 659811 (0.0015) [2024-06-15 19:37:05,959][1648985] Fps is (10 sec: 45858.9, 60 sec: 45872.2, 300 sec: 46652.1). Total num frames: 1351352320. Throughput: 0: 11684.0. Samples: 337928704. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:37:05,960][1648985] Avg episode reward: [(0, '149.150')] [2024-06-15 19:37:07,665][1652491] Updated weights for policy 0, policy_version 659904 (0.0014) [2024-06-15 19:37:09,083][1652491] Updated weights for policy 0, policy_version 659969 (0.0014) [2024-06-15 19:37:10,428][1652491] Updated weights for policy 0, policy_version 660026 (0.0012) [2024-06-15 19:37:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 50244.5, 300 sec: 47097.1). Total num frames: 1351745536. Throughput: 0: 11628.1. Samples: 337956352. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:37:10,955][1648985] Avg episode reward: [(0, '149.150')] [2024-06-15 19:37:15,955][1648985] Fps is (10 sec: 39336.8, 60 sec: 43690.6, 300 sec: 46430.6). Total num frames: 1351745536. Throughput: 0: 11571.2. Samples: 338032128. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:37:15,956][1648985] Avg episode reward: [(0, '136.110')] [2024-06-15 19:37:16,930][1652491] Updated weights for policy 0, policy_version 660089 (0.0125) [2024-06-15 19:37:19,383][1652491] Updated weights for policy 0, policy_version 660169 (0.0013) [2024-06-15 19:37:19,603][1651469] Signal inference workers to stop experience collection... (34350 times) [2024-06-15 19:37:19,653][1652491] InferenceWorker_p0-w0: stopping experience collection (34350 times) [2024-06-15 19:37:19,803][1651469] Signal inference workers to resume experience collection... 
(34350 times) [2024-06-15 19:37:19,804][1652491] InferenceWorker_p0-w0: resuming experience collection (34350 times) [2024-06-15 19:37:20,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 50244.2, 300 sec: 47208.1). Total num frames: 1352171520. Throughput: 0: 11628.1. Samples: 338094592. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:37:20,956][1648985] Avg episode reward: [(0, '151.950')] [2024-06-15 19:37:20,977][1652491] Updated weights for policy 0, policy_version 660243 (0.0015) [2024-06-15 19:37:22,036][1652491] Updated weights for policy 0, policy_version 660287 (0.0013) [2024-06-15 19:37:25,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44236.7, 300 sec: 46656.0). Total num frames: 1352269824. Throughput: 0: 11650.8. Samples: 338132992. Policy #0 lag: (min: 28.0, avg: 123.4, max: 284.0) [2024-06-15 19:37:25,956][1648985] Avg episode reward: [(0, '151.440')] [2024-06-15 19:37:28,432][1652491] Updated weights for policy 0, policy_version 660352 (0.0015) [2024-06-15 19:37:30,062][1652491] Updated weights for policy 0, policy_version 660403 (0.0014) [2024-06-15 19:37:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 49152.1, 300 sec: 47319.2). Total num frames: 1352597504. Throughput: 0: 11468.8. Samples: 338204160. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:37:30,956][1648985] Avg episode reward: [(0, '159.480')] [2024-06-15 19:37:31,241][1652491] Updated weights for policy 0, policy_version 660478 (0.0018) [2024-06-15 19:37:33,046][1652491] Updated weights for policy 0, policy_version 660538 (0.0011) [2024-06-15 19:37:35,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45329.1, 300 sec: 46763.8). Total num frames: 1352794112. Throughput: 0: 11480.2. Samples: 338272768. 
Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:37:35,956][1648985] Avg episode reward: [(0, '176.660')] [2024-06-15 19:37:39,853][1652491] Updated weights for policy 0, policy_version 660578 (0.0013) [2024-06-15 19:37:40,955][1648985] Fps is (10 sec: 32768.1, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1352925184. Throughput: 0: 11571.3. Samples: 338316800. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:37:40,956][1648985] Avg episode reward: [(0, '172.440')] [2024-06-15 19:37:41,864][1652491] Updated weights for policy 0, policy_version 660657 (0.0018) [2024-06-15 19:37:43,125][1652491] Updated weights for policy 0, policy_version 660720 (0.0013) [2024-06-15 19:37:44,764][1652491] Updated weights for policy 0, policy_version 660770 (0.0012) [2024-06-15 19:37:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 1353318400. Throughput: 0: 11457.4. Samples: 338366976. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:37:45,956][1648985] Avg episode reward: [(0, '170.340')] [2024-06-15 19:37:50,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 1353318400. Throughput: 0: 11538.1. Samples: 338447872. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:37:50,955][1648985] Avg episode reward: [(0, '145.250')] [2024-06-15 19:37:51,665][1652491] Updated weights for policy 0, policy_version 660819 (0.0013) [2024-06-15 19:37:52,958][1652491] Updated weights for policy 0, policy_version 660880 (0.0012) [2024-06-15 19:37:54,564][1652491] Updated weights for policy 0, policy_version 660960 (0.0013) [2024-06-15 19:37:55,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 47513.8, 300 sec: 46763.8). Total num frames: 1353744384. Throughput: 0: 11571.2. Samples: 338477056. 
Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:37:55,956][1648985] Avg episode reward: [(0, '148.530')] [2024-06-15 19:37:56,687][1652491] Updated weights for policy 0, policy_version 661043 (0.0148) [2024-06-15 19:38:00,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 43690.6, 300 sec: 46319.5). Total num frames: 1353842688. Throughput: 0: 11298.1. Samples: 338540544. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:38:00,956][1648985] Avg episode reward: [(0, '143.050')] [2024-06-15 19:38:02,861][1651469] Signal inference workers to stop experience collection... (34400 times) [2024-06-15 19:38:02,924][1652491] Updated weights for policy 0, policy_version 661090 (0.0115) [2024-06-15 19:38:02,988][1652491] InferenceWorker_p0-w0: stopping experience collection (34400 times) [2024-06-15 19:38:03,181][1651469] Signal inference workers to resume experience collection... (34400 times) [2024-06-15 19:38:03,182][1652491] InferenceWorker_p0-w0: resuming experience collection (34400 times) [2024-06-15 19:38:04,273][1652491] Updated weights for policy 0, policy_version 661137 (0.0013) [2024-06-15 19:38:05,514][1652491] Updated weights for policy 0, policy_version 661200 (0.0014) [2024-06-15 19:38:05,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46970.5, 300 sec: 46763.8). Total num frames: 1354170368. Throughput: 0: 11446.1. Samples: 338609664. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:38:05,956][1648985] Avg episode reward: [(0, '158.300')] [2024-06-15 19:38:06,519][1652491] Updated weights for policy 0, policy_version 661248 (0.0013) [2024-06-15 19:38:10,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 43690.5, 300 sec: 46430.6). Total num frames: 1354366976. Throughput: 0: 11377.7. Samples: 338644992. 
Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:38:10,956][1648985] Avg episode reward: [(0, '172.660')] [2024-06-15 19:38:13,076][1652491] Updated weights for policy 0, policy_version 661322 (0.0013) [2024-06-15 19:38:14,226][1652491] Updated weights for policy 0, policy_version 661371 (0.0012) [2024-06-15 19:38:15,417][1652491] Updated weights for policy 0, policy_version 661408 (0.0012) [2024-06-15 19:38:15,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 1354596352. Throughput: 0: 11571.2. Samples: 338724864. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:38:15,956][1648985] Avg episode reward: [(0, '180.190')] [2024-06-15 19:38:17,149][1652491] Updated weights for policy 0, policy_version 661480 (0.0013) [2024-06-15 19:38:18,996][1652491] Updated weights for policy 0, policy_version 661568 (0.0015) [2024-06-15 19:38:20,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1354891264. Throughput: 0: 11457.4. Samples: 338788352. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:38:20,956][1648985] Avg episode reward: [(0, '172.500')] [2024-06-15 19:38:25,605][1652491] Updated weights for policy 0, policy_version 661631 (0.0014) [2024-06-15 19:38:25,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 1355022336. Throughput: 0: 11332.3. Samples: 338826752. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:38:25,956][1648985] Avg episode reward: [(0, '156.530')] [2024-06-15 19:38:28,523][1652491] Updated weights for policy 0, policy_version 661712 (0.0012) [2024-06-15 19:38:30,828][1652491] Updated weights for policy 0, policy_version 661808 (0.0012) [2024-06-15 19:38:30,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1355382784. Throughput: 0: 11434.7. Samples: 338881536. 
Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:38:30,956][1648985] Avg episode reward: [(0, '132.510')] [2024-06-15 19:38:35,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 1355415552. Throughput: 0: 11411.9. Samples: 338961408. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:38:35,956][1648985] Avg episode reward: [(0, '144.110')] [2024-06-15 19:38:36,304][1652491] Updated weights for policy 0, policy_version 661844 (0.0012) [2024-06-15 19:38:38,115][1652491] Updated weights for policy 0, policy_version 661891 (0.0013) [2024-06-15 19:38:40,702][1652491] Updated weights for policy 0, policy_version 662000 (0.0094) [2024-06-15 19:38:40,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 1355776000. Throughput: 0: 11514.4. Samples: 338995200. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:38:40,955][1648985] Avg episode reward: [(0, '149.520')] [2024-06-15 19:38:41,288][1651469] Signal inference workers to stop experience collection... (34450 times) [2024-06-15 19:38:41,344][1652491] InferenceWorker_p0-w0: stopping experience collection (34450 times) [2024-06-15 19:38:41,504][1651469] Signal inference workers to resume experience collection... (34450 times) [2024-06-15 19:38:41,504][1652491] InferenceWorker_p0-w0: resuming experience collection (34450 times) [2024-06-15 19:38:42,224][1652491] Updated weights for policy 0, policy_version 662064 (0.0011) [2024-06-15 19:38:45,956][1648985] Fps is (10 sec: 52427.5, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 1355939840. Throughput: 0: 11480.1. Samples: 339057152. 
Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:38:45,957][1648985] Avg episode reward: [(0, '155.090')] [2024-06-15 19:38:47,816][1652491] Updated weights for policy 0, policy_version 662098 (0.0013) [2024-06-15 19:38:49,569][1652491] Updated weights for policy 0, policy_version 662164 (0.0023) [2024-06-15 19:38:50,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 1356201984. Throughput: 0: 11502.9. Samples: 339127296. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:38:50,956][1648985] Avg episode reward: [(0, '134.770')] [2024-06-15 19:38:52,024][1652491] Updated weights for policy 0, policy_version 662259 (0.0012) [2024-06-15 19:38:53,449][1652491] Updated weights for policy 0, policy_version 662306 (0.0011) [2024-06-15 19:38:55,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45328.9, 300 sec: 46319.4). Total num frames: 1356464128. Throughput: 0: 11264.0. Samples: 339151872. Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:38:55,956][1648985] Avg episode reward: [(0, '146.380')] [2024-06-15 19:38:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000662336_1356464128.pth... [2024-06-15 19:38:56,070][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000656944_1345421312.pth [2024-06-15 19:38:58,982][1652491] Updated weights for policy 0, policy_version 662352 (0.0012) [2024-06-15 19:39:00,682][1652491] Updated weights for policy 0, policy_version 662416 (0.0118) [2024-06-15 19:39:00,967][1648985] Fps is (10 sec: 42550.6, 60 sec: 46412.7, 300 sec: 46539.9). Total num frames: 1356627968. Throughput: 0: 11352.2. Samples: 339235840. 
Policy #0 lag: (min: 3.0, avg: 67.2, max: 259.0) [2024-06-15 19:39:00,968][1648985] Avg episode reward: [(0, '158.080')] [2024-06-15 19:39:02,187][1652491] Updated weights for policy 0, policy_version 662466 (0.0013) [2024-06-15 19:39:03,409][1652491] Updated weights for policy 0, policy_version 662519 (0.0014) [2024-06-15 19:39:04,353][1652491] Updated weights for policy 0, policy_version 662548 (0.0010) [2024-06-15 19:39:05,955][1648985] Fps is (10 sec: 52430.8, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 1356988416. Throughput: 0: 11286.8. Samples: 339296256. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:39:05,956][1648985] Avg episode reward: [(0, '154.040')] [2024-06-15 19:39:09,730][1652491] Updated weights for policy 0, policy_version 662595 (0.0071) [2024-06-15 19:39:10,751][1652491] Updated weights for policy 0, policy_version 662647 (0.0013) [2024-06-15 19:39:10,955][1648985] Fps is (10 sec: 49207.3, 60 sec: 45875.4, 300 sec: 46652.7). Total num frames: 1357119488. Throughput: 0: 11411.9. Samples: 339340288. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:39:10,956][1648985] Avg episode reward: [(0, '150.120')] [2024-06-15 19:39:11,884][1652491] Updated weights for policy 0, policy_version 662688 (0.0068) [2024-06-15 19:39:14,011][1652491] Updated weights for policy 0, policy_version 662752 (0.0013) [2024-06-15 19:39:15,272][1652491] Updated weights for policy 0, policy_version 662800 (0.0012) [2024-06-15 19:39:15,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 1357447168. Throughput: 0: 11741.8. Samples: 339409920. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:39:15,956][1648985] Avg episode reward: [(0, '153.290')] [2024-06-15 19:39:16,539][1652491] Updated weights for policy 0, policy_version 662848 (0.0011) [2024-06-15 19:39:20,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1357611008. 
Throughput: 0: 11594.0. Samples: 339483136. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:39:20,956][1648985] Avg episode reward: [(0, '159.800')] [2024-06-15 19:39:21,181][1652491] Updated weights for policy 0, policy_version 662911 (0.0013) [2024-06-15 19:39:23,338][1652491] Updated weights for policy 0, policy_version 662966 (0.0015) [2024-06-15 19:39:25,665][1651469] Signal inference workers to stop experience collection... (34500 times) [2024-06-15 19:39:25,716][1652491] InferenceWorker_p0-w0: stopping experience collection (34500 times) [2024-06-15 19:39:25,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 1357840384. Throughput: 0: 11605.3. Samples: 339517440. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:39:25,956][1648985] Avg episode reward: [(0, '143.870')] [2024-06-15 19:39:26,000][1651469] Signal inference workers to resume experience collection... (34500 times) [2024-06-15 19:39:26,002][1652491] InferenceWorker_p0-w0: resuming experience collection (34500 times) [2024-06-15 19:39:26,474][1652491] Updated weights for policy 0, policy_version 663040 (0.0012) [2024-06-15 19:39:28,253][1652491] Updated weights for policy 0, policy_version 663098 (0.0013) [2024-06-15 19:39:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44236.9, 300 sec: 46208.4). Total num frames: 1358036992. Throughput: 0: 11559.9. Samples: 339577344. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:39:30,955][1648985] Avg episode reward: [(0, '164.340')] [2024-06-15 19:39:32,858][1652491] Updated weights for policy 0, policy_version 663160 (0.0012) [2024-06-15 19:39:35,181][1652491] Updated weights for policy 0, policy_version 663202 (0.0040) [2024-06-15 19:39:35,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 1358299136. Throughput: 0: 11673.6. Samples: 339652608. 
Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:39:35,955][1648985] Avg episode reward: [(0, '156.090')] [2024-06-15 19:39:37,575][1652491] Updated weights for policy 0, policy_version 663291 (0.0014) [2024-06-15 19:39:39,408][1652491] Updated weights for policy 0, policy_version 663344 (0.0013) [2024-06-15 19:39:40,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 46421.2, 300 sec: 46319.5). Total num frames: 1358561280. Throughput: 0: 11764.7. Samples: 339681280. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:39:40,956][1648985] Avg episode reward: [(0, '150.420')] [2024-06-15 19:39:43,309][1652491] Updated weights for policy 0, policy_version 663392 (0.0013) [2024-06-15 19:39:45,309][1652491] Updated weights for policy 0, policy_version 663429 (0.0013) [2024-06-15 19:39:45,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.6, 300 sec: 46763.8). Total num frames: 1358757888. Throughput: 0: 11608.2. Samples: 339758080. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:39:45,957][1648985] Avg episode reward: [(0, '166.670')] [2024-06-15 19:39:46,469][1652491] Updated weights for policy 0, policy_version 663488 (0.0012) [2024-06-15 19:39:49,050][1652491] Updated weights for policy 0, policy_version 663542 (0.0014) [2024-06-15 19:39:50,618][1652491] Updated weights for policy 0, policy_version 663608 (0.0103) [2024-06-15 19:39:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 1359085568. Throughput: 0: 11650.9. Samples: 339820544. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:39:50,956][1648985] Avg episode reward: [(0, '161.000')] [2024-06-15 19:39:54,755][1652491] Updated weights for policy 0, policy_version 663664 (0.0013) [2024-06-15 19:39:55,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 1359216640. Throughput: 0: 11605.3. Samples: 339862528. 
Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:39:55,957][1648985] Avg episode reward: [(0, '151.380')] [2024-06-15 19:39:56,614][1652491] Updated weights for policy 0, policy_version 663712 (0.0013) [2024-06-15 19:40:00,066][1652491] Updated weights for policy 0, policy_version 663801 (0.0013) [2024-06-15 19:40:00,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 47522.5, 300 sec: 46208.4). Total num frames: 1359478784. Throughput: 0: 11616.8. Samples: 339932672. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:40:00,956][1648985] Avg episode reward: [(0, '162.410')] [2024-06-15 19:40:02,001][1652491] Updated weights for policy 0, policy_version 663862 (0.0015) [2024-06-15 19:40:05,487][1652491] Updated weights for policy 0, policy_version 663920 (0.0056) [2024-06-15 19:40:05,955][1648985] Fps is (10 sec: 52430.6, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1359740928. Throughput: 0: 11628.1. Samples: 340006400. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:40:05,955][1648985] Avg episode reward: [(0, '169.810')] [2024-06-15 19:40:07,686][1652491] Updated weights for policy 0, policy_version 663974 (0.0013) [2024-06-15 19:40:10,233][1652491] Updated weights for policy 0, policy_version 664032 (0.0012) [2024-06-15 19:40:10,377][1651469] Signal inference workers to stop experience collection... (34550 times) [2024-06-15 19:40:10,414][1652491] InferenceWorker_p0-w0: stopping experience collection (34550 times) [2024-06-15 19:40:10,592][1651469] Signal inference workers to resume experience collection... (34550 times) [2024-06-15 19:40:10,593][1652491] InferenceWorker_p0-w0: resuming experience collection (34550 times) [2024-06-15 19:40:10,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 1359970304. Throughput: 0: 11650.8. Samples: 340041728. 
Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:40:10,956][1648985] Avg episode reward: [(0, '139.300')] [2024-06-15 19:40:11,019][1652491] Updated weights for policy 0, policy_version 664064 (0.0018) [2024-06-15 19:40:12,798][1652491] Updated weights for policy 0, policy_version 664118 (0.0012) [2024-06-15 19:40:15,396][1652491] Updated weights for policy 0, policy_version 664150 (0.0014) [2024-06-15 19:40:15,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 1360232448. Throughput: 0: 12071.8. Samples: 340120576. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:40:15,956][1648985] Avg episode reward: [(0, '163.730')] [2024-06-15 19:40:16,168][1652491] Updated weights for policy 0, policy_version 664191 (0.0015) [2024-06-15 19:40:18,302][1652491] Updated weights for policy 0, policy_version 664256 (0.0013) [2024-06-15 19:40:20,544][1652491] Updated weights for policy 0, policy_version 664318 (0.0013) [2024-06-15 19:40:20,955][1648985] Fps is (10 sec: 55706.1, 60 sec: 48605.9, 300 sec: 46763.8). Total num frames: 1360527360. Throughput: 0: 12060.4. Samples: 340195328. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:40:20,955][1648985] Avg episode reward: [(0, '169.080')] [2024-06-15 19:40:23,188][1652491] Updated weights for policy 0, policy_version 664377 (0.0015) [2024-06-15 19:40:25,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 1360691200. Throughput: 0: 12208.4. Samples: 340230656. 
Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:40:25,955][1648985] Avg episode reward: [(0, '171.940')] [2024-06-15 19:40:26,505][1652491] Updated weights for policy 0, policy_version 664432 (0.0014) [2024-06-15 19:40:28,750][1652491] Updated weights for policy 0, policy_version 664465 (0.0011) [2024-06-15 19:40:29,533][1652491] Updated weights for policy 0, policy_version 664510 (0.0012) [2024-06-15 19:40:30,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 48605.8, 300 sec: 46652.8). Total num frames: 1360953344. Throughput: 0: 12094.6. Samples: 340302336. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:40:30,956][1648985] Avg episode reward: [(0, '151.380')] [2024-06-15 19:40:31,510][1652491] Updated weights for policy 0, policy_version 664560 (0.0011) [2024-06-15 19:40:33,408][1652491] Updated weights for policy 0, policy_version 664592 (0.0012) [2024-06-15 19:40:35,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 1361182720. Throughput: 0: 12390.4. Samples: 340378112. Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:40:35,956][1648985] Avg episode reward: [(0, '151.700')] [2024-06-15 19:40:36,872][1652491] Updated weights for policy 0, policy_version 664672 (0.0012) [2024-06-15 19:40:39,429][1652491] Updated weights for policy 0, policy_version 664724 (0.0012) [2024-06-15 19:40:40,336][1652491] Updated weights for policy 0, policy_version 664763 (0.0013) [2024-06-15 19:40:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 1361444864. Throughput: 0: 12231.2. Samples: 340412928. 
Policy #0 lag: (min: 64.0, avg: 225.3, max: 327.0) [2024-06-15 19:40:40,956][1648985] Avg episode reward: [(0, '151.330')] [2024-06-15 19:40:42,780][1652491] Updated weights for policy 0, policy_version 664816 (0.0023) [2024-06-15 19:40:44,566][1652491] Updated weights for policy 0, policy_version 664833 (0.0012) [2024-06-15 19:40:45,588][1652491] Updated weights for policy 0, policy_version 664894 (0.0015) [2024-06-15 19:40:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 49152.0, 300 sec: 46430.6). Total num frames: 1361707008. Throughput: 0: 12128.7. Samples: 340478464. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:40:45,956][1648985] Avg episode reward: [(0, '158.280')] [2024-06-15 19:40:48,116][1652491] Updated weights for policy 0, policy_version 664945 (0.0045) [2024-06-15 19:40:50,806][1652491] Updated weights for policy 0, policy_version 664992 (0.0122) [2024-06-15 19:40:50,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 1361903616. Throughput: 0: 12197.0. Samples: 340555264. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:40:50,956][1648985] Avg episode reward: [(0, '159.220')] [2024-06-15 19:40:52,432][1652491] Updated weights for policy 0, policy_version 665027 (0.0014) [2024-06-15 19:40:53,643][1652491] Updated weights for policy 0, policy_version 665079 (0.0013) [2024-06-15 19:40:54,912][1651469] Signal inference workers to stop experience collection... (34600 times) [2024-06-15 19:40:55,022][1652491] InferenceWorker_p0-w0: stopping experience collection (34600 times) [2024-06-15 19:40:55,024][1652491] Updated weights for policy 0, policy_version 665109 (0.0011) [2024-06-15 19:40:55,174][1651469] Signal inference workers to resume experience collection... (34600 times) [2024-06-15 19:40:55,175][1652491] InferenceWorker_p0-w0: resuming experience collection (34600 times) [2024-06-15 19:40:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 50244.5, 300 sec: 46874.9). 
Total num frames: 1362231296. Throughput: 0: 12265.3. Samples: 340593664. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:40:55,955][1648985] Avg episode reward: [(0, '165.630')] [2024-06-15 19:40:55,976][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000665152_1362231296.pth... [2024-06-15 19:40:56,059][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000659648_1350959104.pth [2024-06-15 19:40:58,558][1652491] Updated weights for policy 0, policy_version 665168 (0.0016) [2024-06-15 19:40:59,563][1652491] Updated weights for policy 0, policy_version 665212 (0.0012) [2024-06-15 19:41:00,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48605.8, 300 sec: 46763.8). Total num frames: 1362395136. Throughput: 0: 12083.2. Samples: 340664320. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:41:00,956][1648985] Avg episode reward: [(0, '148.340')] [2024-06-15 19:41:01,427][1652491] Updated weights for policy 0, policy_version 665264 (0.0013) [2024-06-15 19:41:04,731][1652491] Updated weights for policy 0, policy_version 665339 (0.0013) [2024-06-15 19:41:05,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 48059.6, 300 sec: 47097.1). Total num frames: 1362624512. Throughput: 0: 12037.6. Samples: 340737024. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:41:05,956][1648985] Avg episode reward: [(0, '136.210')] [2024-06-15 19:41:06,720][1652491] Updated weights for policy 0, policy_version 665379 (0.0012) [2024-06-15 19:41:09,192][1652491] Updated weights for policy 0, policy_version 665415 (0.0013) [2024-06-15 19:41:10,499][1652491] Updated weights for policy 0, policy_version 665472 (0.0013) [2024-06-15 19:41:10,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 48606.0, 300 sec: 46652.8). Total num frames: 1362886656. Throughput: 0: 12117.4. Samples: 340775936. 
Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:41:10,955][1648985] Avg episode reward: [(0, '148.920')] [2024-06-15 19:41:11,857][1652491] Updated weights for policy 0, policy_version 665530 (0.0012) [2024-06-15 19:41:15,606][1652491] Updated weights for policy 0, policy_version 665584 (0.0013) [2024-06-15 19:41:15,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 1363116032. Throughput: 0: 12094.6. Samples: 340846592. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:41:15,955][1648985] Avg episode reward: [(0, '162.440')] [2024-06-15 19:41:17,339][1652491] Updated weights for policy 0, policy_version 665618 (0.0014) [2024-06-15 19:41:18,213][1652491] Updated weights for policy 0, policy_version 665664 (0.0014) [2024-06-15 19:41:20,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 1363345408. Throughput: 0: 12026.3. Samples: 340919296. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:41:20,955][1648985] Avg episode reward: [(0, '163.880')] [2024-06-15 19:41:21,204][1652491] Updated weights for policy 0, policy_version 665715 (0.0012) [2024-06-15 19:41:22,977][1652491] Updated weights for policy 0, policy_version 665782 (0.0011) [2024-06-15 19:41:25,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 48059.7, 300 sec: 47208.2). Total num frames: 1363574784. Throughput: 0: 11889.8. Samples: 340947968. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:41:25,956][1648985] Avg episode reward: [(0, '168.520')] [2024-06-15 19:41:26,515][1652491] Updated weights for policy 0, policy_version 665832 (0.0028) [2024-06-15 19:41:29,373][1652491] Updated weights for policy 0, policy_version 665893 (0.0022) [2024-06-15 19:41:30,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 1363804160. Throughput: 0: 12014.9. Samples: 341019136. 
Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:41:30,956][1648985] Avg episode reward: [(0, '171.930')] [2024-06-15 19:41:31,414][1652491] Updated weights for policy 0, policy_version 665936 (0.0011) [2024-06-15 19:41:32,429][1652491] Updated weights for policy 0, policy_version 665977 (0.0023) [2024-06-15 19:41:34,482][1652491] Updated weights for policy 0, policy_version 666036 (0.0019) [2024-06-15 19:41:35,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 1364066304. Throughput: 0: 11901.1. Samples: 341090816. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:41:35,956][1648985] Avg episode reward: [(0, '157.980')] [2024-06-15 19:41:37,308][1652491] Updated weights for policy 0, policy_version 666080 (0.0012) [2024-06-15 19:41:37,886][1652491] Updated weights for policy 0, policy_version 666110 (0.0013) [2024-06-15 19:41:40,371][1651469] Signal inference workers to stop experience collection... (34650 times) [2024-06-15 19:41:40,407][1652491] InferenceWorker_p0-w0: stopping experience collection (34650 times) [2024-06-15 19:41:40,691][1651469] Signal inference workers to resume experience collection... (34650 times) [2024-06-15 19:41:40,692][1652491] InferenceWorker_p0-w0: resuming experience collection (34650 times) [2024-06-15 19:41:40,925][1652491] Updated weights for policy 0, policy_version 666169 (0.0021) [2024-06-15 19:41:40,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.7, 300 sec: 46874.9). Total num frames: 1364295680. Throughput: 0: 11855.6. Samples: 341127168. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:41:40,956][1648985] Avg episode reward: [(0, '156.390')] [2024-06-15 19:41:43,610][1652491] Updated weights for policy 0, policy_version 666210 (0.0013) [2024-06-15 19:41:45,416][1652491] Updated weights for policy 0, policy_version 666274 (0.0012) [2024-06-15 19:41:45,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.6, 300 sec: 46986.0). 
Total num frames: 1364557824. Throughput: 0: 11696.4. Samples: 341190656. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:41:45,955][1648985] Avg episode reward: [(0, '164.480')] [2024-06-15 19:41:48,723][1652491] Updated weights for policy 0, policy_version 666306 (0.0011) [2024-06-15 19:41:49,822][1652491] Updated weights for policy 0, policy_version 666360 (0.0013) [2024-06-15 19:41:50,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 1364721664. Throughput: 0: 11696.4. Samples: 341263360. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:41:50,956][1648985] Avg episode reward: [(0, '185.610')] [2024-06-15 19:41:52,506][1652491] Updated weights for policy 0, policy_version 666418 (0.0013) [2024-06-15 19:41:54,773][1652491] Updated weights for policy 0, policy_version 666472 (0.0019) [2024-06-15 19:41:55,859][1652491] Updated weights for policy 0, policy_version 666512 (0.0144) [2024-06-15 19:41:55,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 1365016576. Throughput: 0: 11650.8. Samples: 341300224. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:41:55,956][1648985] Avg episode reward: [(0, '180.220')] [2024-06-15 19:41:56,944][1652491] Updated weights for policy 0, policy_version 666556 (0.0012) [2024-06-15 19:42:00,537][1652491] Updated weights for policy 0, policy_version 666608 (0.0013) [2024-06-15 19:42:00,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 47097.7). Total num frames: 1365245952. Throughput: 0: 11605.3. Samples: 341368832. 
Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:42:00,956][1648985] Avg episode reward: [(0, '175.930')] [2024-06-15 19:42:03,928][1652491] Updated weights for policy 0, policy_version 666672 (0.0014) [2024-06-15 19:42:05,555][1652491] Updated weights for policy 0, policy_version 666705 (0.0013) [2024-06-15 19:42:05,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 1365442560. Throughput: 0: 11491.5. Samples: 341436416. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:42:05,956][1648985] Avg episode reward: [(0, '158.120')] [2024-06-15 19:42:06,883][1652491] Updated weights for policy 0, policy_version 666753 (0.0091) [2024-06-15 19:42:08,299][1652491] Updated weights for policy 0, policy_version 666813 (0.0010) [2024-06-15 19:42:10,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 1365704704. Throughput: 0: 11605.3. Samples: 341470208. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:42:10,956][1648985] Avg episode reward: [(0, '154.730')] [2024-06-15 19:42:11,042][1652491] Updated weights for policy 0, policy_version 666864 (0.0015) [2024-06-15 19:42:14,183][1652491] Updated weights for policy 0, policy_version 666914 (0.0013) [2024-06-15 19:42:15,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 1365901312. Throughput: 0: 11776.0. Samples: 341549056. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:42:15,955][1648985] Avg episode reward: [(0, '152.860')] [2024-06-15 19:42:16,138][1652491] Updated weights for policy 0, policy_version 666960 (0.0019) [2024-06-15 19:42:17,264][1652491] Updated weights for policy 0, policy_version 667006 (0.0014) [2024-06-15 19:42:18,947][1652491] Updated weights for policy 0, policy_version 667056 (0.0012) [2024-06-15 19:42:20,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.3, 300 sec: 47097.0). Total num frames: 1366163456. 
Throughput: 0: 11844.2. Samples: 341623808. Policy #0 lag: (min: 15.0, avg: 129.3, max: 271.0) [2024-06-15 19:42:20,956][1648985] Avg episode reward: [(0, '148.770')] [2024-06-15 19:42:21,489][1652491] Updated weights for policy 0, policy_version 667104 (0.0091) [2024-06-15 19:42:24,660][1652491] Updated weights for policy 0, policy_version 667153 (0.0012) [2024-06-15 19:42:24,949][1651469] Signal inference workers to stop experience collection... (34700 times) [2024-06-15 19:42:24,989][1652491] InferenceWorker_p0-w0: stopping experience collection (34700 times) [2024-06-15 19:42:25,139][1651469] Signal inference workers to resume experience collection... (34700 times) [2024-06-15 19:42:25,140][1652491] InferenceWorker_p0-w0: resuming experience collection (34700 times) [2024-06-15 19:42:25,402][1652491] Updated weights for policy 0, policy_version 667196 (0.0015) [2024-06-15 19:42:25,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 47513.7, 300 sec: 46874.9). Total num frames: 1366425600. Throughput: 0: 11912.6. Samples: 341663232. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:42:25,955][1648985] Avg episode reward: [(0, '142.160')] [2024-06-15 19:42:26,796][1652491] Updated weights for policy 0, policy_version 667248 (0.0014) [2024-06-15 19:42:28,353][1652491] Updated weights for policy 0, policy_version 667269 (0.0012) [2024-06-15 19:42:30,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 48059.5, 300 sec: 47097.0). Total num frames: 1366687744. Throughput: 0: 12049.0. Samples: 341732864. 
Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:42:30,957][1648985] Avg episode reward: [(0, '150.530')] [2024-06-15 19:42:31,507][1652491] Updated weights for policy 0, policy_version 667344 (0.0014) [2024-06-15 19:42:32,403][1652491] Updated weights for policy 0, policy_version 667387 (0.0010) [2024-06-15 19:42:35,435][1652491] Updated weights for policy 0, policy_version 667447 (0.0012) [2024-06-15 19:42:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 1366949888. Throughput: 0: 12299.4. Samples: 341816832. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:42:35,955][1648985] Avg episode reward: [(0, '151.650')] [2024-06-15 19:42:36,542][1652491] Updated weights for policy 0, policy_version 667474 (0.0031) [2024-06-15 19:42:38,497][1652491] Updated weights for policy 0, policy_version 667526 (0.0016) [2024-06-15 19:42:39,560][1652491] Updated weights for policy 0, policy_version 667581 (0.0012) [2024-06-15 19:42:40,955][1648985] Fps is (10 sec: 52430.6, 60 sec: 48605.9, 300 sec: 47097.1). Total num frames: 1367212032. Throughput: 0: 12288.0. Samples: 341853184. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:42:40,956][1648985] Avg episode reward: [(0, '163.420')] [2024-06-15 19:42:42,711][1652491] Updated weights for policy 0, policy_version 667644 (0.0021) [2024-06-15 19:42:45,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46967.5, 300 sec: 47652.4). Total num frames: 1367375872. Throughput: 0: 12401.8. Samples: 341926912. 
Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:42:45,955][1648985] Avg episode reward: [(0, '162.310')] [2024-06-15 19:42:47,040][1652491] Updated weights for policy 0, policy_version 667713 (0.0013) [2024-06-15 19:42:47,937][1652491] Updated weights for policy 0, policy_version 667763 (0.0012) [2024-06-15 19:42:49,885][1652491] Updated weights for policy 0, policy_version 667797 (0.0012) [2024-06-15 19:42:50,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 50244.3, 300 sec: 47430.3). Total num frames: 1367736320. Throughput: 0: 12413.1. Samples: 341995008. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:42:50,956][1648985] Avg episode reward: [(0, '172.970')] [2024-06-15 19:42:53,568][1652491] Updated weights for policy 0, policy_version 667888 (0.0015) [2024-06-15 19:42:55,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 1367867392. Throughput: 0: 12413.1. Samples: 342028800. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:42:55,956][1648985] Avg episode reward: [(0, '164.480')] [2024-06-15 19:42:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000667904_1367867392.pth... [2024-06-15 19:42:56,004][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000662336_1356464128.pth [2024-06-15 19:42:57,307][1652491] Updated weights for policy 0, policy_version 667938 (0.0012) [2024-06-15 19:42:58,951][1652491] Updated weights for policy 0, policy_version 668016 (0.0013) [2024-06-15 19:43:00,670][1652491] Updated weights for policy 0, policy_version 668037 (0.0012) [2024-06-15 19:43:00,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 48606.0, 300 sec: 47430.3). Total num frames: 1368162304. Throughput: 0: 12288.0. Samples: 342102016. 
Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:43:00,955][1648985] Avg episode reward: [(0, '168.040')] [2024-06-15 19:43:01,594][1652491] Updated weights for policy 0, policy_version 668086 (0.0122) [2024-06-15 19:43:04,713][1652491] Updated weights for policy 0, policy_version 668128 (0.0084) [2024-06-15 19:43:05,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 1368391680. Throughput: 0: 12219.8. Samples: 342173696. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:43:05,955][1648985] Avg episode reward: [(0, '164.440')] [2024-06-15 19:43:07,320][1652491] Updated weights for policy 0, policy_version 668161 (0.0012) [2024-06-15 19:43:08,111][1651469] Signal inference workers to stop experience collection... (34750 times) [2024-06-15 19:43:08,222][1652491] InferenceWorker_p0-w0: stopping experience collection (34750 times) [2024-06-15 19:43:08,359][1651469] Signal inference workers to resume experience collection... (34750 times) [2024-06-15 19:43:08,360][1652491] InferenceWorker_p0-w0: resuming experience collection (34750 times) [2024-06-15 19:43:08,753][1652491] Updated weights for policy 0, policy_version 668224 (0.0013) [2024-06-15 19:43:10,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 49152.0, 300 sec: 47652.5). Total num frames: 1368653824. Throughput: 0: 12128.7. Samples: 342209024. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:43:10,956][1648985] Avg episode reward: [(0, '173.820')] [2024-06-15 19:43:12,229][1652491] Updated weights for policy 0, policy_version 668320 (0.0013) [2024-06-15 19:43:15,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 49152.0, 300 sec: 47319.2). Total num frames: 1368850432. Throughput: 0: 12197.1. Samples: 342281728. 
Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:43:15,955][1648985] Avg episode reward: [(0, '168.300')] [2024-06-15 19:43:16,075][1652491] Updated weights for policy 0, policy_version 668392 (0.0015) [2024-06-15 19:43:18,891][1652491] Updated weights for policy 0, policy_version 668437 (0.0013) [2024-06-15 19:43:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 49152.1, 300 sec: 47763.5). Total num frames: 1369112576. Throughput: 0: 11832.9. Samples: 342349312. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:43:20,955][1648985] Avg episode reward: [(0, '169.790')] [2024-06-15 19:43:21,003][1652491] Updated weights for policy 0, policy_version 668528 (0.0014) [2024-06-15 19:43:23,444][1652491] Updated weights for policy 0, policy_version 668604 (0.0012) [2024-06-15 19:43:25,955][1648985] Fps is (10 sec: 45873.5, 60 sec: 48059.4, 300 sec: 47208.1). Total num frames: 1369309184. Throughput: 0: 11730.4. Samples: 342381056. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:43:25,956][1648985] Avg episode reward: [(0, '189.550')] [2024-06-15 19:43:27,556][1652491] Updated weights for policy 0, policy_version 668665 (0.0168) [2024-06-15 19:43:30,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 46421.6, 300 sec: 47652.4). Total num frames: 1369473024. Throughput: 0: 11741.8. Samples: 342455296. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:43:30,956][1648985] Avg episode reward: [(0, '172.430')] [2024-06-15 19:43:31,175][1652491] Updated weights for policy 0, policy_version 668709 (0.0012) [2024-06-15 19:43:32,434][1652491] Updated weights for policy 0, policy_version 668757 (0.0015) [2024-06-15 19:43:34,129][1652491] Updated weights for policy 0, policy_version 668805 (0.0013) [2024-06-15 19:43:35,248][1652491] Updated weights for policy 0, policy_version 668860 (0.0014) [2024-06-15 19:43:35,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 48059.6, 300 sec: 47652.4). Total num frames: 1369833472. 
Throughput: 0: 11616.7. Samples: 342517760. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:43:35,956][1648985] Avg episode reward: [(0, '162.050')] [2024-06-15 19:43:38,996][1652491] Updated weights for policy 0, policy_version 668912 (0.0044) [2024-06-15 19:43:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45875.1, 300 sec: 47541.4). Total num frames: 1369964544. Throughput: 0: 11662.2. Samples: 342553600. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:43:40,956][1648985] Avg episode reward: [(0, '172.990')] [2024-06-15 19:43:42,715][1652491] Updated weights for policy 0, policy_version 668962 (0.0031) [2024-06-15 19:43:45,008][1652491] Updated weights for policy 0, policy_version 669056 (0.0012) [2024-06-15 19:43:45,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 47513.4, 300 sec: 47541.3). Total num frames: 1370226688. Throughput: 0: 11389.1. Samples: 342614528. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:43:45,956][1648985] Avg episode reward: [(0, '176.410')] [2024-06-15 19:43:47,137][1652491] Updated weights for policy 0, policy_version 669113 (0.0015) [2024-06-15 19:43:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 47319.3). Total num frames: 1370423296. Throughput: 0: 11343.6. Samples: 342684160. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:43:50,956][1648985] Avg episode reward: [(0, '179.410')] [2024-06-15 19:43:51,169][1652491] Updated weights for policy 0, policy_version 669175 (0.0014) [2024-06-15 19:43:53,928][1651469] Signal inference workers to stop experience collection... (34800 times) [2024-06-15 19:43:53,956][1652491] InferenceWorker_p0-w0: stopping experience collection (34800 times) [2024-06-15 19:43:54,174][1651469] Signal inference workers to resume experience collection... 
(34800 times) [2024-06-15 19:43:54,176][1652491] InferenceWorker_p0-w0: resuming experience collection (34800 times) [2024-06-15 19:43:54,647][1652491] Updated weights for policy 0, policy_version 669216 (0.0011) [2024-06-15 19:43:55,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 47543.2). Total num frames: 1370652672. Throughput: 0: 11537.0. Samples: 342728192. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:43:55,956][1648985] Avg episode reward: [(0, '170.750')] [2024-06-15 19:43:56,763][1652491] Updated weights for policy 0, policy_version 669296 (0.0013) [2024-06-15 19:43:57,851][1652491] Updated weights for policy 0, policy_version 669329 (0.0013) [2024-06-15 19:43:58,992][1652491] Updated weights for policy 0, policy_version 669376 (0.0012) [2024-06-15 19:44:00,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45329.0, 300 sec: 47097.1). Total num frames: 1370882048. Throughput: 0: 11104.7. Samples: 342781440. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 19:44:00,956][1648985] Avg episode reward: [(0, '159.870')] [2024-06-15 19:44:02,897][1652491] Updated weights for policy 0, policy_version 669440 (0.0051) [2024-06-15 19:44:05,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 44782.7, 300 sec: 47319.2). Total num frames: 1371078656. Throughput: 0: 11411.8. Samples: 342862848. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:44:05,956][1648985] Avg episode reward: [(0, '152.750')] [2024-06-15 19:44:06,748][1652491] Updated weights for policy 0, policy_version 669509 (0.0014) [2024-06-15 19:44:07,786][1652491] Updated weights for policy 0, policy_version 669558 (0.0013) [2024-06-15 19:44:09,629][1652491] Updated weights for policy 0, policy_version 669628 (0.0014) [2024-06-15 19:44:10,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45875.1, 300 sec: 47319.2). Total num frames: 1371406336. Throughput: 0: 11309.6. Samples: 342889984. 
Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:44:10,956][1648985] Avg episode reward: [(0, '150.020')] [2024-06-15 19:44:15,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 44782.7, 300 sec: 47208.1). Total num frames: 1371537408. Throughput: 0: 11263.9. Samples: 342962176. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:44:15,956][1648985] Avg episode reward: [(0, '144.010')] [2024-06-15 19:44:17,310][1652491] Updated weights for policy 0, policy_version 669728 (0.0014) [2024-06-15 19:44:19,492][1652491] Updated weights for policy 0, policy_version 669818 (0.0011) [2024-06-15 19:44:20,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 46421.4, 300 sec: 47652.5). Total num frames: 1371897856. Throughput: 0: 11218.5. Samples: 343022592. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:44:20,955][1648985] Avg episode reward: [(0, '143.800')] [2024-06-15 19:44:21,102][1652491] Updated weights for policy 0, policy_version 669877 (0.0014) [2024-06-15 19:44:25,381][1652491] Updated weights for policy 0, policy_version 669923 (0.0017) [2024-06-15 19:44:25,966][1648985] Fps is (10 sec: 49098.7, 60 sec: 45320.9, 300 sec: 47428.5). Total num frames: 1372028928. Throughput: 0: 11363.6. Samples: 343065088. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:44:25,967][1648985] Avg episode reward: [(0, '167.720')] [2024-06-15 19:44:28,606][1652491] Updated weights for policy 0, policy_version 669970 (0.0017) [2024-06-15 19:44:30,586][1652491] Updated weights for policy 0, policy_version 670070 (0.0013) [2024-06-15 19:44:30,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1372323840. Throughput: 0: 11525.7. Samples: 343133184. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:44:30,956][1648985] Avg episode reward: [(0, '165.320')] [2024-06-15 19:44:31,422][1651469] Signal inference workers to stop experience collection... 
(34850 times) [2024-06-15 19:44:31,487][1652491] InferenceWorker_p0-w0: stopping experience collection (34850 times) [2024-06-15 19:44:31,670][1651469] Signal inference workers to resume experience collection... (34850 times) [2024-06-15 19:44:31,670][1652491] InferenceWorker_p0-w0: resuming experience collection (34850 times) [2024-06-15 19:44:32,067][1652491] Updated weights for policy 0, policy_version 670128 (0.0014) [2024-06-15 19:44:35,955][1648985] Fps is (10 sec: 42645.3, 60 sec: 43690.6, 300 sec: 47097.0). Total num frames: 1372454912. Throughput: 0: 11707.7. Samples: 343211008. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:44:35,956][1648985] Avg episode reward: [(0, '151.840')] [2024-06-15 19:44:36,750][1652491] Updated weights for policy 0, policy_version 670177 (0.0012) [2024-06-15 19:44:39,684][1652491] Updated weights for policy 0, policy_version 670240 (0.0015) [2024-06-15 19:44:40,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 47430.3). Total num frames: 1372749824. Throughput: 0: 11457.5. Samples: 343243776. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:44:40,956][1648985] Avg episode reward: [(0, '144.230')] [2024-06-15 19:44:41,188][1652491] Updated weights for policy 0, policy_version 670304 (0.0013) [2024-06-15 19:44:42,059][1652491] Updated weights for policy 0, policy_version 670352 (0.0014) [2024-06-15 19:44:42,926][1652491] Updated weights for policy 0, policy_version 670394 (0.0012) [2024-06-15 19:44:45,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 1372979200. Throughput: 0: 11844.3. Samples: 343314432. 
Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:44:45,956][1648985] Avg episode reward: [(0, '145.230')] [2024-06-15 19:44:48,340][1652491] Updated weights for policy 0, policy_version 670457 (0.0013) [2024-06-15 19:44:49,783][1652491] Updated weights for policy 0, policy_version 670487 (0.0026) [2024-06-15 19:44:50,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 46967.3, 300 sec: 47541.4). Total num frames: 1373241344. Throughput: 0: 11685.0. Samples: 343388672. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:44:50,956][1648985] Avg episode reward: [(0, '144.130')] [2024-06-15 19:44:51,854][1652491] Updated weights for policy 0, policy_version 670576 (0.0255) [2024-06-15 19:44:53,272][1652491] Updated weights for policy 0, policy_version 670624 (0.0013) [2024-06-15 19:44:55,955][1648985] Fps is (10 sec: 52426.8, 60 sec: 47513.4, 300 sec: 47541.3). Total num frames: 1373503488. Throughput: 0: 11673.5. Samples: 343415296. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:44:55,956][1648985] Avg episode reward: [(0, '176.370')] [2024-06-15 19:44:55,970][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000670656_1373503488.pth... [2024-06-15 19:44:56,033][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000665152_1362231296.pth [2024-06-15 19:44:59,300][1652491] Updated weights for policy 0, policy_version 670688 (0.0015) [2024-06-15 19:44:59,916][1652491] Updated weights for policy 0, policy_version 670720 (0.0011) [2024-06-15 19:45:00,955][1648985] Fps is (10 sec: 39322.5, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1373634560. Throughput: 0: 11878.5. Samples: 343496704. 
Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:45:00,956][1648985] Avg episode reward: [(0, '202.530')] [2024-06-15 19:45:02,086][1652491] Updated weights for policy 0, policy_version 670771 (0.0013) [2024-06-15 19:45:03,632][1652491] Updated weights for policy 0, policy_version 670847 (0.0088) [2024-06-15 19:45:04,929][1652491] Updated weights for policy 0, policy_version 670903 (0.0028) [2024-06-15 19:45:05,955][1648985] Fps is (10 sec: 52431.4, 60 sec: 49152.3, 300 sec: 47652.5). Total num frames: 1374027776. Throughput: 0: 11946.7. Samples: 343560192. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:45:05,956][1648985] Avg episode reward: [(0, '198.280')] [2024-06-15 19:45:10,294][1652491] Updated weights for policy 0, policy_version 670945 (0.0013) [2024-06-15 19:45:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.3, 300 sec: 47208.1). Total num frames: 1374158848. Throughput: 0: 11983.8. Samples: 343604224. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:45:10,956][1648985] Avg episode reward: [(0, '185.350')] [2024-06-15 19:45:11,809][1652491] Updated weights for policy 0, policy_version 670993 (0.0021) [2024-06-15 19:45:13,451][1651469] Signal inference workers to stop experience collection... (34900 times) [2024-06-15 19:45:13,520][1652491] InferenceWorker_p0-w0: stopping experience collection (34900 times) [2024-06-15 19:45:13,778][1651469] Signal inference workers to resume experience collection... (34900 times) [2024-06-15 19:45:13,779][1652491] InferenceWorker_p0-w0: resuming experience collection (34900 times) [2024-06-15 19:45:13,941][1652491] Updated weights for policy 0, policy_version 671075 (0.0088) [2024-06-15 19:45:15,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 49152.3, 300 sec: 47319.2). Total num frames: 1374486528. Throughput: 0: 11912.5. Samples: 343669248. 
Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:45:15,956][1648985] Avg episode reward: [(0, '169.900')] [2024-06-15 19:45:16,311][1652491] Updated weights for policy 0, policy_version 671162 (0.0016) [2024-06-15 19:45:20,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 44236.7, 300 sec: 46986.0). Total num frames: 1374552064. Throughput: 0: 11901.2. Samples: 343746560. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:45:20,956][1648985] Avg episode reward: [(0, '155.600')] [2024-06-15 19:45:21,881][1652491] Updated weights for policy 0, policy_version 671222 (0.0014) [2024-06-15 19:45:23,430][1652491] Updated weights for policy 0, policy_version 671266 (0.0012) [2024-06-15 19:45:24,851][1652491] Updated weights for policy 0, policy_version 671328 (0.0013) [2024-06-15 19:45:25,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48614.9, 300 sec: 47430.3). Total num frames: 1374945280. Throughput: 0: 11810.1. Samples: 343775232. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:45:25,956][1648985] Avg episode reward: [(0, '150.180')] [2024-06-15 19:45:27,012][1652491] Updated weights for policy 0, policy_version 671378 (0.0015) [2024-06-15 19:45:28,092][1652491] Updated weights for policy 0, policy_version 671424 (0.0011) [2024-06-15 19:45:30,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 45875.0, 300 sec: 47097.0). Total num frames: 1375076352. Throughput: 0: 11707.7. Samples: 343841280. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:45:30,956][1648985] Avg episode reward: [(0, '162.190')] [2024-06-15 19:45:33,989][1652491] Updated weights for policy 0, policy_version 671487 (0.0017) [2024-06-15 19:45:35,751][1652491] Updated weights for policy 0, policy_version 671552 (0.0027) [2024-06-15 19:45:35,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1375338496. Throughput: 0: 11605.4. Samples: 343910912. 
Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:45:35,956][1648985] Avg episode reward: [(0, '155.100')] [2024-06-15 19:45:37,236][1652491] Updated weights for policy 0, policy_version 671614 (0.0018) [2024-06-15 19:45:38,791][1652491] Updated weights for policy 0, policy_version 671671 (0.0013) [2024-06-15 19:45:40,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 47513.5, 300 sec: 47097.0). Total num frames: 1375600640. Throughput: 0: 11662.3. Samples: 343940096. Policy #0 lag: (min: 63.0, avg: 173.1, max: 319.0) [2024-06-15 19:45:40,956][1648985] Avg episode reward: [(0, '143.890')] [2024-06-15 19:45:44,773][1652491] Updated weights for policy 0, policy_version 671699 (0.0030) [2024-06-15 19:45:45,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45875.1, 300 sec: 46874.9). Total num frames: 1375731712. Throughput: 0: 11775.9. Samples: 344026624. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:45:45,956][1648985] Avg episode reward: [(0, '127.630')] [2024-06-15 19:45:46,226][1652491] Updated weights for policy 0, policy_version 671767 (0.0013) [2024-06-15 19:45:48,065][1652491] Updated weights for policy 0, policy_version 671842 (0.0077) [2024-06-15 19:45:49,972][1652491] Updated weights for policy 0, policy_version 671920 (0.0018) [2024-06-15 19:45:50,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.9, 300 sec: 47097.1). Total num frames: 1376124928. Throughput: 0: 11571.2. Samples: 344080896. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:45:50,955][1648985] Avg episode reward: [(0, '130.070')] [2024-06-15 19:45:55,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 43690.9, 300 sec: 46541.7). Total num frames: 1376124928. Throughput: 0: 11548.4. Samples: 344123904. 
Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:45:55,956][1648985] Avg episode reward: [(0, '134.360')] [2024-06-15 19:45:56,503][1652491] Updated weights for policy 0, policy_version 671969 (0.0014) [2024-06-15 19:45:57,302][1651469] Signal inference workers to stop experience collection... (34950 times) [2024-06-15 19:45:57,337][1652491] InferenceWorker_p0-w0: stopping experience collection (34950 times) [2024-06-15 19:45:57,506][1651469] Signal inference workers to resume experience collection... (34950 times) [2024-06-15 19:45:57,507][1652491] InferenceWorker_p0-w0: resuming experience collection (34950 times) [2024-06-15 19:45:57,703][1652491] Updated weights for policy 0, policy_version 672023 (0.0028) [2024-06-15 19:45:59,142][1652491] Updated weights for policy 0, policy_version 672080 (0.0013) [2024-06-15 19:46:00,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48605.9, 300 sec: 47208.2). Total num frames: 1376550912. Throughput: 0: 11468.8. Samples: 344185344. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:46:00,955][1648985] Avg episode reward: [(0, '147.390')] [2024-06-15 19:46:01,272][1652491] Updated weights for policy 0, policy_version 672160 (0.0011) [2024-06-15 19:46:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43690.5, 300 sec: 46652.7). Total num frames: 1376649216. Throughput: 0: 11423.3. Samples: 344260608. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:46:05,956][1648985] Avg episode reward: [(0, '164.940')] [2024-06-15 19:46:08,287][1652491] Updated weights for policy 0, policy_version 672213 (0.0013) [2024-06-15 19:46:09,969][1652491] Updated weights for policy 0, policy_version 672288 (0.0013) [2024-06-15 19:46:10,959][1648985] Fps is (10 sec: 36031.6, 60 sec: 45872.5, 300 sec: 46763.2). Total num frames: 1376911360. Throughput: 0: 11513.4. Samples: 344293376. 
Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:46:10,959][1648985] Avg episode reward: [(0, '159.570')] [2024-06-15 19:46:12,424][1652491] Updated weights for policy 0, policy_version 672368 (0.0013) [2024-06-15 19:46:14,144][1652491] Updated weights for policy 0, policy_version 672425 (0.0012) [2024-06-15 19:46:15,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 44782.9, 300 sec: 46874.9). Total num frames: 1377173504. Throughput: 0: 11116.2. Samples: 344341504. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:46:15,955][1648985] Avg episode reward: [(0, '154.050')] [2024-06-15 19:46:20,036][1652491] Updated weights for policy 0, policy_version 672480 (0.0012) [2024-06-15 19:46:20,955][1648985] Fps is (10 sec: 39335.8, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 1377304576. Throughput: 0: 11286.8. Samples: 344418816. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:46:20,956][1648985] Avg episode reward: [(0, '172.250')] [2024-06-15 19:46:22,144][1652491] Updated weights for policy 0, policy_version 672544 (0.0094) [2024-06-15 19:46:23,852][1652491] Updated weights for policy 0, policy_version 672608 (0.0093) [2024-06-15 19:46:24,987][1652491] Updated weights for policy 0, policy_version 672656 (0.0012) [2024-06-15 19:46:25,948][1652491] Updated weights for policy 0, policy_version 672694 (0.0013) [2024-06-15 19:46:25,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45329.1, 300 sec: 46986.0). Total num frames: 1377665024. Throughput: 0: 11195.8. Samples: 344443904. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:46:25,955][1648985] Avg episode reward: [(0, '168.590')] [2024-06-15 19:46:30,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1377697792. Throughput: 0: 10888.5. Samples: 344516608. 
Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:46:30,956][1648985] Avg episode reward: [(0, '142.450')] [2024-06-15 19:46:31,571][1652491] Updated weights for policy 0, policy_version 672737 (0.0013) [2024-06-15 19:46:33,393][1652491] Updated weights for policy 0, policy_version 672784 (0.0106) [2024-06-15 19:46:35,489][1652491] Updated weights for policy 0, policy_version 672868 (0.0014) [2024-06-15 19:46:35,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45329.1, 300 sec: 46652.7). Total num frames: 1378058240. Throughput: 0: 11093.3. Samples: 344580096. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:46:35,956][1648985] Avg episode reward: [(0, '133.980')] [2024-06-15 19:46:35,983][1651469] Signal inference workers to stop experience collection... (35000 times) [2024-06-15 19:46:36,066][1652491] InferenceWorker_p0-w0: stopping experience collection (35000 times) [2024-06-15 19:46:36,128][1651469] Signal inference workers to resume experience collection... (35000 times) [2024-06-15 19:46:36,129][1652491] InferenceWorker_p0-w0: resuming experience collection (35000 times) [2024-06-15 19:46:36,159][1652491] Updated weights for policy 0, policy_version 672896 (0.0046) [2024-06-15 19:46:37,224][1652491] Updated weights for policy 0, policy_version 672954 (0.0012) [2024-06-15 19:46:40,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 43690.7, 300 sec: 46319.5). Total num frames: 1378222080. Throughput: 0: 11104.7. Samples: 344623616. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:46:40,956][1648985] Avg episode reward: [(0, '151.100')] [2024-06-15 19:46:42,701][1652491] Updated weights for policy 0, policy_version 673023 (0.0013) [2024-06-15 19:46:45,246][1652491] Updated weights for policy 0, policy_version 673078 (0.0014) [2024-06-15 19:46:45,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.5, 300 sec: 46763.8). Total num frames: 1378516992. Throughput: 0: 11332.2. Samples: 344695296. 
Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:46:45,956][1648985] Avg episode reward: [(0, '172.720')] [2024-06-15 19:46:46,719][1652491] Updated weights for policy 0, policy_version 673138 (0.0014) [2024-06-15 19:46:48,197][1652491] Updated weights for policy 0, policy_version 673205 (0.0011) [2024-06-15 19:46:50,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 46541.7). Total num frames: 1378746368. Throughput: 0: 11218.5. Samples: 344765440. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:46:50,956][1648985] Avg episode reward: [(0, '163.100')] [2024-06-15 19:46:53,381][1652491] Updated weights for policy 0, policy_version 673232 (0.0160) [2024-06-15 19:46:55,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 1378910208. Throughput: 0: 11299.0. Samples: 344801792. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:46:55,956][1648985] Avg episode reward: [(0, '155.500')] [2024-06-15 19:46:56,206][1652491] Updated weights for policy 0, policy_version 673312 (0.0021) [2024-06-15 19:46:56,667][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000673328_1378975744.pth... [2024-06-15 19:46:56,833][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000667904_1367867392.pth [2024-06-15 19:46:58,252][1652491] Updated weights for policy 0, policy_version 673392 (0.0094) [2024-06-15 19:46:59,308][1652491] Updated weights for policy 0, policy_version 673426 (0.0012) [2024-06-15 19:47:00,234][1652491] Updated weights for policy 0, policy_version 673472 (0.0014) [2024-06-15 19:47:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45329.0, 300 sec: 46874.9). Total num frames: 1379270656. Throughput: 0: 11457.4. Samples: 344857088. 
Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:47:00,955][1648985] Avg episode reward: [(0, '147.870')] [2024-06-15 19:47:05,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 1379401728. Throughput: 0: 11571.2. Samples: 344939520. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:47:05,955][1648985] Avg episode reward: [(0, '150.280')] [2024-06-15 19:47:05,958][1652491] Updated weights for policy 0, policy_version 673536 (0.0028) [2024-06-15 19:47:07,384][1652491] Updated weights for policy 0, policy_version 673588 (0.0012) [2024-06-15 19:47:09,244][1652491] Updated weights for policy 0, policy_version 673659 (0.0012) [2024-06-15 19:47:10,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47516.5, 300 sec: 46986.0). Total num frames: 1379762176. Throughput: 0: 11650.8. Samples: 344968192. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:47:10,955][1648985] Avg episode reward: [(0, '156.970')] [2024-06-15 19:47:11,124][1652491] Updated weights for policy 0, policy_version 673728 (0.0117) [2024-06-15 19:47:15,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 44236.7, 300 sec: 46319.5). Total num frames: 1379827712. Throughput: 0: 11821.5. Samples: 345048576. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:47:15,956][1648985] Avg episode reward: [(0, '154.470')] [2024-06-15 19:47:17,206][1652491] Updated weights for policy 0, policy_version 673793 (0.0014) [2024-06-15 19:47:17,948][1651469] Signal inference workers to stop experience collection... (35050 times) [2024-06-15 19:47:17,977][1652491] InferenceWorker_p0-w0: stopping experience collection (35050 times) [2024-06-15 19:47:18,202][1651469] Signal inference workers to resume experience collection... 
(35050 times) [2024-06-15 19:47:18,203][1652491] InferenceWorker_p0-w0: resuming experience collection (35050 times) [2024-06-15 19:47:18,816][1652491] Updated weights for policy 0, policy_version 673860 (0.0014) [2024-06-15 19:47:20,382][1652491] Updated weights for policy 0, policy_version 673920 (0.0017) [2024-06-15 19:47:20,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 1380188160. Throughput: 0: 11855.7. Samples: 345113600. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:47:20,955][1648985] Avg episode reward: [(0, '158.940')] [2024-06-15 19:47:25,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 44236.8, 300 sec: 46208.5). Total num frames: 1380319232. Throughput: 0: 11798.8. Samples: 345154560. Policy #0 lag: (min: 13.0, avg: 77.3, max: 269.0) [2024-06-15 19:47:25,956][1648985] Avg episode reward: [(0, '156.830')] [2024-06-15 19:47:26,788][1652491] Updated weights for policy 0, policy_version 674016 (0.0030) [2024-06-15 19:47:27,604][1652491] Updated weights for policy 0, policy_version 674048 (0.0014) [2024-06-15 19:47:30,436][1652491] Updated weights for policy 0, policy_version 674128 (0.0013) [2024-06-15 19:47:30,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 49152.1, 300 sec: 46430.5). Total num frames: 1380646912. Throughput: 0: 11719.1. Samples: 345222656. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:47:30,956][1648985] Avg episode reward: [(0, '152.020')] [2024-06-15 19:47:31,631][1652491] Updated weights for policy 0, policy_version 674175 (0.0028) [2024-06-15 19:47:32,881][1652491] Updated weights for policy 0, policy_version 674224 (0.0011) [2024-06-15 19:47:35,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 1380843520. Throughput: 0: 11787.4. Samples: 345295872. 
Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:47:35,956][1648985] Avg episode reward: [(0, '167.340')] [2024-06-15 19:47:37,583][1652491] Updated weights for policy 0, policy_version 674258 (0.0010) [2024-06-15 19:47:40,403][1652491] Updated weights for policy 0, policy_version 674336 (0.0014) [2024-06-15 19:47:40,968][1648985] Fps is (10 sec: 42544.0, 60 sec: 47503.4, 300 sec: 46428.6). Total num frames: 1381072896. Throughput: 0: 11784.0. Samples: 345332224. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:47:40,969][1648985] Avg episode reward: [(0, '162.920')] [2024-06-15 19:47:41,780][1652491] Updated weights for policy 0, policy_version 674387 (0.0013) [2024-06-15 19:47:42,769][1652491] Updated weights for policy 0, policy_version 674429 (0.0012) [2024-06-15 19:47:44,165][1652491] Updated weights for policy 0, policy_version 674480 (0.0013) [2024-06-15 19:47:45,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 1381367808. Throughput: 0: 11935.3. Samples: 345394176. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:47:45,956][1648985] Avg episode reward: [(0, '145.770')] [2024-06-15 19:47:49,027][1652491] Updated weights for policy 0, policy_version 674516 (0.0014) [2024-06-15 19:47:49,811][1652491] Updated weights for policy 0, policy_version 674560 (0.0121) [2024-06-15 19:47:50,955][1648985] Fps is (10 sec: 52495.9, 60 sec: 47513.5, 300 sec: 46541.7). Total num frames: 1381597184. Throughput: 0: 11855.6. Samples: 345473024. 
Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:47:50,956][1648985] Avg episode reward: [(0, '170.790')] [2024-06-15 19:47:51,502][1652491] Updated weights for policy 0, policy_version 674631 (0.0017) [2024-06-15 19:47:52,800][1652491] Updated weights for policy 0, policy_version 674685 (0.0013) [2024-06-15 19:47:55,549][1652491] Updated weights for policy 0, policy_version 674742 (0.0021) [2024-06-15 19:47:55,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 49698.2, 300 sec: 46541.7). Total num frames: 1381892096. Throughput: 0: 11958.0. Samples: 345506304. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:47:55,956][1648985] Avg episode reward: [(0, '185.320')] [2024-06-15 19:48:00,924][1651469] Signal inference workers to stop experience collection... (35100 times) [2024-06-15 19:48:00,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 1381957632. Throughput: 0: 11855.7. Samples: 345582080. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:48:00,956][1648985] Avg episode reward: [(0, '170.310')] [2024-06-15 19:48:00,980][1652491] InferenceWorker_p0-w0: stopping experience collection (35100 times) [2024-06-15 19:48:01,009][1652491] Updated weights for policy 0, policy_version 674789 (0.0016) [2024-06-15 19:48:01,201][1651469] Signal inference workers to resume experience collection... (35100 times) [2024-06-15 19:48:01,203][1652491] InferenceWorker_p0-w0: resuming experience collection (35100 times) [2024-06-15 19:48:02,684][1652491] Updated weights for policy 0, policy_version 674864 (0.0122) [2024-06-15 19:48:04,198][1652491] Updated weights for policy 0, policy_version 674913 (0.0033) [2024-06-15 19:48:05,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 48059.8, 300 sec: 46208.5). Total num frames: 1382285312. Throughput: 0: 11707.8. Samples: 345640448. 
Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:48:05,955][1648985] Avg episode reward: [(0, '157.040')] [2024-06-15 19:48:06,344][1652491] Updated weights for policy 0, policy_version 674963 (0.0012) [2024-06-15 19:48:07,448][1652491] Updated weights for policy 0, policy_version 675006 (0.0012) [2024-06-15 19:48:10,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 44236.8, 300 sec: 45986.3). Total num frames: 1382416384. Throughput: 0: 11548.5. Samples: 345674240. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:48:10,956][1648985] Avg episode reward: [(0, '165.820')] [2024-06-15 19:48:13,238][1652491] Updated weights for policy 0, policy_version 675072 (0.0014) [2024-06-15 19:48:14,463][1652491] Updated weights for policy 0, policy_version 675120 (0.0017) [2024-06-15 19:48:15,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 48606.0, 300 sec: 46208.4). Total num frames: 1382744064. Throughput: 0: 11537.1. Samples: 345741824. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:48:15,956][1648985] Avg episode reward: [(0, '172.330')] [2024-06-15 19:48:16,394][1652491] Updated weights for policy 0, policy_version 675189 (0.0012) [2024-06-15 19:48:17,829][1652491] Updated weights for policy 0, policy_version 675219 (0.0014) [2024-06-15 19:48:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 1382940672. Throughput: 0: 11446.0. Samples: 345810944. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:48:20,956][1648985] Avg episode reward: [(0, '162.550')] [2024-06-15 19:48:24,033][1652491] Updated weights for policy 0, policy_version 675280 (0.0013) [2024-06-15 19:48:25,657][1652491] Updated weights for policy 0, policy_version 675348 (0.0037) [2024-06-15 19:48:25,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46967.6, 300 sec: 46319.5). Total num frames: 1383137280. Throughput: 0: 11506.3. Samples: 345849856. 
Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:48:25,955][1648985] Avg episode reward: [(0, '157.390')] [2024-06-15 19:48:26,911][1652491] Updated weights for policy 0, policy_version 675393 (0.0017) [2024-06-15 19:48:28,568][1652491] Updated weights for policy 0, policy_version 675456 (0.0020) [2024-06-15 19:48:30,119][1652491] Updated weights for policy 0, policy_version 675518 (0.0012) [2024-06-15 19:48:30,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 1383464960. Throughput: 0: 11286.7. Samples: 345902080. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:48:30,956][1648985] Avg episode reward: [(0, '151.380')] [2024-06-15 19:48:35,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 1383530496. Throughput: 0: 11332.3. Samples: 345982976. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:48:35,956][1648985] Avg episode reward: [(0, '152.390')] [2024-06-15 19:48:36,183][1652491] Updated weights for policy 0, policy_version 675571 (0.0120) [2024-06-15 19:48:37,065][1652491] Updated weights for policy 0, policy_version 675616 (0.0030) [2024-06-15 19:48:38,648][1652491] Updated weights for policy 0, policy_version 675667 (0.0012) [2024-06-15 19:48:38,983][1651469] Signal inference workers to stop experience collection... (35150 times) [2024-06-15 19:48:39,050][1652491] InferenceWorker_p0-w0: stopping experience collection (35150 times) [2024-06-15 19:48:39,174][1651469] Signal inference workers to resume experience collection... (35150 times) [2024-06-15 19:48:39,175][1652491] InferenceWorker_p0-w0: resuming experience collection (35150 times) [2024-06-15 19:48:40,080][1652491] Updated weights for policy 0, policy_version 675728 (0.0012) [2024-06-15 19:48:40,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 47523.8, 300 sec: 46430.6). Total num frames: 1383923712. Throughput: 0: 11332.3. Samples: 346016256. 
Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:48:40,956][1648985] Avg episode reward: [(0, '151.260')] [2024-06-15 19:48:41,356][1652491] Updated weights for policy 0, policy_version 675774 (0.0012) [2024-06-15 19:48:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 1383989248. Throughput: 0: 11366.4. Samples: 346093568. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:48:45,956][1648985] Avg episode reward: [(0, '131.450')] [2024-06-15 19:48:46,937][1652491] Updated weights for policy 0, policy_version 675826 (0.0014) [2024-06-15 19:48:47,843][1652491] Updated weights for policy 0, policy_version 675860 (0.0053) [2024-06-15 19:48:48,913][1652491] Updated weights for policy 0, policy_version 675920 (0.0104) [2024-06-15 19:48:50,129][1652491] Updated weights for policy 0, policy_version 675975 (0.0014) [2024-06-15 19:48:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.7, 300 sec: 46763.8). Total num frames: 1384448000. Throughput: 0: 11605.3. Samples: 346162688. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:48:50,956][1648985] Avg episode reward: [(0, '147.720')] [2024-06-15 19:48:51,437][1652491] Updated weights for policy 0, policy_version 676032 (0.0012) [2024-06-15 19:48:55,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1384513536. Throughput: 0: 11798.7. Samples: 346205184. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:48:55,956][1648985] Avg episode reward: [(0, '151.660')] [2024-06-15 19:48:55,965][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000676032_1384513536.pth... 
[2024-06-15 19:48:56,018][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000670656_1373503488.pth [2024-06-15 19:48:57,468][1652491] Updated weights for policy 0, policy_version 676080 (0.0066) [2024-06-15 19:48:59,087][1652491] Updated weights for policy 0, policy_version 676144 (0.0021) [2024-06-15 19:49:00,488][1652491] Updated weights for policy 0, policy_version 676195 (0.0012) [2024-06-15 19:49:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48605.9, 300 sec: 46763.9). Total num frames: 1384873984. Throughput: 0: 11878.4. Samples: 346276352. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:49:00,956][1648985] Avg episode reward: [(0, '141.060')] [2024-06-15 19:49:02,285][1652491] Updated weights for policy 0, policy_version 676279 (0.0014) [2024-06-15 19:49:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.1, 300 sec: 46208.5). Total num frames: 1385037824. Throughput: 0: 12105.9. Samples: 346355712. Policy #0 lag: (min: 6.0, avg: 75.1, max: 262.0) [2024-06-15 19:49:05,956][1648985] Avg episode reward: [(0, '139.190')] [2024-06-15 19:49:07,497][1652491] Updated weights for policy 0, policy_version 676309 (0.0032) [2024-06-15 19:49:09,257][1652491] Updated weights for policy 0, policy_version 676371 (0.0028) [2024-06-15 19:49:10,695][1652491] Updated weights for policy 0, policy_version 676437 (0.0014) [2024-06-15 19:49:10,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 49151.9, 300 sec: 46874.9). Total num frames: 1385365504. Throughput: 0: 12037.6. Samples: 346391552. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:49:10,956][1648985] Avg episode reward: [(0, '154.960')] [2024-06-15 19:49:12,064][1652491] Updated weights for policy 0, policy_version 676512 (0.0012) [2024-06-15 19:49:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.5, 300 sec: 46319.5). Total num frames: 1385562112. Throughput: 0: 12379.1. Samples: 346459136. 
Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:49:15,956][1648985] Avg episode reward: [(0, '143.520')] [2024-06-15 19:49:18,419][1652491] Updated weights for policy 0, policy_version 676576 (0.0013) [2024-06-15 19:49:19,904][1651469] Signal inference workers to stop experience collection... (35200 times) [2024-06-15 19:49:19,904][1652491] Updated weights for policy 0, policy_version 676609 (0.0015) [2024-06-15 19:49:20,003][1652491] InferenceWorker_p0-w0: stopping experience collection (35200 times) [2024-06-15 19:49:20,131][1651469] Signal inference workers to resume experience collection... (35200 times) [2024-06-15 19:49:20,132][1652491] InferenceWorker_p0-w0: resuming experience collection (35200 times) [2024-06-15 19:49:20,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 47513.7, 300 sec: 46654.5). Total num frames: 1385791488. Throughput: 0: 12242.5. Samples: 346533888. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:49:20,955][1648985] Avg episode reward: [(0, '167.870')] [2024-06-15 19:49:21,042][1652491] Updated weights for policy 0, policy_version 676672 (0.0011) [2024-06-15 19:49:22,868][1652491] Updated weights for policy 0, policy_version 676755 (0.0013) [2024-06-15 19:49:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 49152.0, 300 sec: 46652.8). Total num frames: 1386086400. Throughput: 0: 12197.0. Samples: 346565120. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:49:25,956][1648985] Avg episode reward: [(0, '184.420')] [2024-06-15 19:49:28,587][1652491] Updated weights for policy 0, policy_version 676817 (0.0013) [2024-06-15 19:49:30,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46421.5, 300 sec: 46763.9). Total num frames: 1386250240. Throughput: 0: 12185.6. Samples: 346641920. 
Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:49:30,956][1648985] Avg episode reward: [(0, '162.460')] [2024-06-15 19:49:31,153][1652491] Updated weights for policy 0, policy_version 676884 (0.0042) [2024-06-15 19:49:32,295][1652491] Updated weights for policy 0, policy_version 676944 (0.0011) [2024-06-15 19:49:33,959][1652491] Updated weights for policy 0, policy_version 677011 (0.0012) [2024-06-15 19:49:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 51336.5, 300 sec: 46986.0). Total num frames: 1386610688. Throughput: 0: 12219.7. Samples: 346712576. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:49:35,955][1648985] Avg episode reward: [(0, '151.100')] [2024-06-15 19:49:38,766][1652491] Updated weights for policy 0, policy_version 677074 (0.0014) [2024-06-15 19:49:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 1386741760. Throughput: 0: 12219.8. Samples: 346755072. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:49:40,956][1648985] Avg episode reward: [(0, '148.490')] [2024-06-15 19:49:41,721][1652491] Updated weights for policy 0, policy_version 677136 (0.0013) [2024-06-15 19:49:43,179][1652491] Updated weights for policy 0, policy_version 677200 (0.0013) [2024-06-15 19:49:44,173][1652491] Updated weights for policy 0, policy_version 677248 (0.0127) [2024-06-15 19:49:45,021][1652491] Updated weights for policy 0, policy_version 677298 (0.0012) [2024-06-15 19:49:45,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 52428.8, 300 sec: 47097.1). Total num frames: 1387134976. Throughput: 0: 12197.0. Samples: 346825216. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:49:45,956][1648985] Avg episode reward: [(0, '151.830')] [2024-06-15 19:49:49,114][1652491] Updated weights for policy 0, policy_version 677344 (0.0016) [2024-06-15 19:49:50,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.4, 300 sec: 46652.8). Total num frames: 1387266048. 
Throughput: 0: 12094.6. Samples: 346899968. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:49:50,956][1648985] Avg episode reward: [(0, '144.230')] [2024-06-15 19:49:53,016][1652491] Updated weights for policy 0, policy_version 677408 (0.0024) [2024-06-15 19:49:54,528][1652491] Updated weights for policy 0, policy_version 677476 (0.0013) [2024-06-15 19:49:55,452][1652491] Updated weights for policy 0, policy_version 677508 (0.0029) [2024-06-15 19:49:55,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 51336.5, 300 sec: 47319.2). Total num frames: 1387593728. Throughput: 0: 12060.4. Samples: 346934272. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:49:55,956][1648985] Avg episode reward: [(0, '145.560')] [2024-06-15 19:49:55,972][1651469] Signal inference workers to stop experience collection... (35250 times) [2024-06-15 19:49:56,005][1652491] InferenceWorker_p0-w0: stopping experience collection (35250 times) [2024-06-15 19:49:56,127][1651469] Signal inference workers to resume experience collection... (35250 times) [2024-06-15 19:49:56,128][1652491] InferenceWorker_p0-w0: resuming experience collection (35250 times) [2024-06-15 19:49:56,351][1652491] Updated weights for policy 0, policy_version 677568 (0.0013) [2024-06-15 19:50:00,709][1652491] Updated weights for policy 0, policy_version 677631 (0.0019) [2024-06-15 19:50:00,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48605.9, 300 sec: 46652.7). Total num frames: 1387790336. Throughput: 0: 12265.2. Samples: 347011072. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:50:00,956][1648985] Avg episode reward: [(0, '130.950')] [2024-06-15 19:50:04,704][1652491] Updated weights for policy 0, policy_version 677696 (0.0013) [2024-06-15 19:50:05,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 49698.1, 300 sec: 46986.0). Total num frames: 1388019712. Throughput: 0: 12071.8. Samples: 347077120. 
Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:50:05,956][1648985] Avg episode reward: [(0, '136.460')] [2024-06-15 19:50:06,151][1652491] Updated weights for policy 0, policy_version 677755 (0.0013) [2024-06-15 19:50:07,388][1652491] Updated weights for policy 0, policy_version 677809 (0.0021) [2024-06-15 19:50:10,666][1652491] Updated weights for policy 0, policy_version 677840 (0.0012) [2024-06-15 19:50:10,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 1388216320. Throughput: 0: 12208.3. Samples: 347114496. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:50:10,956][1648985] Avg episode reward: [(0, '155.270')] [2024-06-15 19:50:15,235][1652491] Updated weights for policy 0, policy_version 677924 (0.0014) [2024-06-15 19:50:15,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 1388445696. Throughput: 0: 12162.8. Samples: 347189248. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:50:15,956][1648985] Avg episode reward: [(0, '161.560')] [2024-06-15 19:50:16,705][1652491] Updated weights for policy 0, policy_version 677987 (0.0011) [2024-06-15 19:50:17,958][1652491] Updated weights for policy 0, policy_version 678048 (0.0125) [2024-06-15 19:50:20,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48605.8, 300 sec: 46652.8). Total num frames: 1388707840. Throughput: 0: 12060.4. Samples: 347255296. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:50:20,955][1648985] Avg episode reward: [(0, '159.630')] [2024-06-15 19:50:23,054][1652491] Updated weights for policy 0, policy_version 678128 (0.0012) [2024-06-15 19:50:25,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 46421.4, 300 sec: 46763.9). Total num frames: 1388871680. Throughput: 0: 11855.7. Samples: 347288576. 
Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:50:25,955][1648985] Avg episode reward: [(0, '156.580')] [2024-06-15 19:50:26,580][1652491] Updated weights for policy 0, policy_version 678192 (0.0013) [2024-06-15 19:50:28,780][1652491] Updated weights for policy 0, policy_version 678288 (0.0012) [2024-06-15 19:50:29,820][1652491] Updated weights for policy 0, policy_version 678330 (0.0017) [2024-06-15 19:50:30,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 49698.1, 300 sec: 47097.1). Total num frames: 1389232128. Throughput: 0: 11707.7. Samples: 347352064. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:50:30,956][1648985] Avg episode reward: [(0, '150.200')] [2024-06-15 19:50:34,965][1652491] Updated weights for policy 0, policy_version 678390 (0.0012) [2024-06-15 19:50:35,964][1648985] Fps is (10 sec: 49108.6, 60 sec: 45868.5, 300 sec: 46651.4). Total num frames: 1389363200. Throughput: 0: 11728.2. Samples: 347427840. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:50:35,964][1648985] Avg episode reward: [(0, '161.260')] [2024-06-15 19:50:37,338][1652491] Updated weights for policy 0, policy_version 678448 (0.0014) [2024-06-15 19:50:38,271][1651469] Signal inference workers to stop experience collection... (35300 times) [2024-06-15 19:50:38,348][1652491] InferenceWorker_p0-w0: stopping experience collection (35300 times) [2024-06-15 19:50:38,354][1652491] Updated weights for policy 0, policy_version 678487 (0.0016) [2024-06-15 19:50:38,461][1651469] Signal inference workers to resume experience collection... (35300 times) [2024-06-15 19:50:38,462][1652491] InferenceWorker_p0-w0: resuming experience collection (35300 times) [2024-06-15 19:50:40,377][1652491] Updated weights for policy 0, policy_version 678562 (0.0119) [2024-06-15 19:50:40,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 49698.1, 300 sec: 47430.3). Total num frames: 1389723648. Throughput: 0: 11730.5. Samples: 347462144. 
Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:50:40,955][1648985] Avg episode reward: [(0, '151.030')] [2024-06-15 19:50:44,540][1652491] Updated weights for policy 0, policy_version 678597 (0.0022) [2024-06-15 19:50:45,750][1652491] Updated weights for policy 0, policy_version 678648 (0.0011) [2024-06-15 19:50:45,955][1648985] Fps is (10 sec: 52473.7, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1389887488. Throughput: 0: 11776.0. Samples: 347540992. Policy #0 lag: (min: 15.0, avg: 86.2, max: 271.0) [2024-06-15 19:50:45,956][1648985] Avg episode reward: [(0, '129.630')] [2024-06-15 19:50:48,319][1652491] Updated weights for policy 0, policy_version 678704 (0.0030) [2024-06-15 19:50:49,845][1652491] Updated weights for policy 0, policy_version 678782 (0.0103) [2024-06-15 19:50:50,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 1390182400. Throughput: 0: 11832.9. Samples: 347609600. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:50:50,956][1648985] Avg episode reward: [(0, '134.690')] [2024-06-15 19:50:51,530][1652491] Updated weights for policy 0, policy_version 678837 (0.0013) [2024-06-15 19:50:55,145][1652491] Updated weights for policy 0, policy_version 678866 (0.0011) [2024-06-15 19:50:55,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1390379008. Throughput: 0: 11946.7. Samples: 347652096. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:50:55,956][1648985] Avg episode reward: [(0, '143.610')] [2024-06-15 19:50:56,085][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000678912_1390411776.pth... 
[2024-06-15 19:50:56,158][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000673328_1378975744.pth [2024-06-15 19:50:56,163][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000678912_1390411776.pth [2024-06-15 19:50:58,133][1652491] Updated weights for policy 0, policy_version 678936 (0.0138) [2024-06-15 19:50:59,387][1652491] Updated weights for policy 0, policy_version 678984 (0.0013) [2024-06-15 19:51:00,488][1652491] Updated weights for policy 0, policy_version 679040 (0.0014) [2024-06-15 19:51:00,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1390673920. Throughput: 0: 11832.9. Samples: 347721728. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:51:00,955][1648985] Avg episode reward: [(0, '147.280')] [2024-06-15 19:51:02,669][1652491] Updated weights for policy 0, policy_version 679093 (0.0013) [2024-06-15 19:51:05,753][1652491] Updated weights for policy 0, policy_version 679139 (0.0107) [2024-06-15 19:51:05,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 47513.6, 300 sec: 47319.8). Total num frames: 1390870528. Throughput: 0: 12105.9. Samples: 347800064. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:51:05,956][1648985] Avg episode reward: [(0, '145.340')] [2024-06-15 19:51:08,764][1652491] Updated weights for policy 0, policy_version 679200 (0.0013) [2024-06-15 19:51:09,584][1652491] Updated weights for policy 0, policy_version 679231 (0.0012) [2024-06-15 19:51:10,886][1652491] Updated weights for policy 0, policy_version 679280 (0.0013) [2024-06-15 19:51:10,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 49151.9, 300 sec: 47430.3). Total num frames: 1391165440. Throughput: 0: 12026.2. Samples: 347829760. 
Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:51:10,956][1648985] Avg episode reward: [(0, '128.070')] [2024-06-15 19:51:13,103][1652491] Updated weights for policy 0, policy_version 679329 (0.0012) [2024-06-15 19:51:15,547][1652491] Updated weights for policy 0, policy_version 679362 (0.0012) [2024-06-15 19:51:15,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 48606.1, 300 sec: 47652.5). Total num frames: 1391362048. Throughput: 0: 12379.1. Samples: 347909120. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:51:15,955][1648985] Avg episode reward: [(0, '135.080')] [2024-06-15 19:51:16,768][1652491] Updated weights for policy 0, policy_version 679420 (0.0012) [2024-06-15 19:51:19,577][1652491] Updated weights for policy 0, policy_version 679472 (0.0014) [2024-06-15 19:51:20,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 1391591424. Throughput: 0: 12290.4. Samples: 347980800. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:51:20,956][1648985] Avg episode reward: [(0, '160.370')] [2024-06-15 19:51:21,473][1652491] Updated weights for policy 0, policy_version 679520 (0.0014) [2024-06-15 19:51:23,446][1651469] Signal inference workers to stop experience collection... (35350 times) [2024-06-15 19:51:23,490][1652491] InferenceWorker_p0-w0: stopping experience collection (35350 times) [2024-06-15 19:51:23,658][1651469] Signal inference workers to resume experience collection... (35350 times) [2024-06-15 19:51:23,659][1652491] InferenceWorker_p0-w0: resuming experience collection (35350 times) [2024-06-15 19:51:23,996][1652491] Updated weights for policy 0, policy_version 679584 (0.0012) [2024-06-15 19:51:25,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 49698.0, 300 sec: 47985.7). Total num frames: 1391853568. Throughput: 0: 12219.7. Samples: 348012032. 
Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:51:25,956][1648985] Avg episode reward: [(0, '155.110')] [2024-06-15 19:51:27,430][1652491] Updated weights for policy 0, policy_version 679648 (0.0015) [2024-06-15 19:51:29,910][1652491] Updated weights for policy 0, policy_version 679696 (0.0014) [2024-06-15 19:51:30,961][1648985] Fps is (10 sec: 49124.9, 60 sec: 47509.2, 300 sec: 47540.5). Total num frames: 1392082944. Throughput: 0: 12150.0. Samples: 348087808. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:51:30,961][1648985] Avg episode reward: [(0, '146.830')] [2024-06-15 19:51:31,049][1652491] Updated weights for policy 0, policy_version 679740 (0.0013) [2024-06-15 19:51:32,935][1652491] Updated weights for policy 0, policy_version 679792 (0.0013) [2024-06-15 19:51:34,780][1652491] Updated weights for policy 0, policy_version 679826 (0.0012) [2024-06-15 19:51:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 50251.5, 300 sec: 47985.7). Total num frames: 1392377856. Throughput: 0: 12071.8. Samples: 348152832. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:51:35,956][1648985] Avg episode reward: [(0, '141.870')] [2024-06-15 19:51:39,679][1652491] Updated weights for policy 0, policy_version 679931 (0.0099) [2024-06-15 19:51:40,955][1648985] Fps is (10 sec: 42622.0, 60 sec: 46421.3, 300 sec: 47430.3). Total num frames: 1392508928. Throughput: 0: 11969.4. Samples: 348190720. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:51:40,956][1648985] Avg episode reward: [(0, '175.910')] [2024-06-15 19:51:42,046][1652491] Updated weights for policy 0, policy_version 679968 (0.0032) [2024-06-15 19:51:43,564][1652491] Updated weights for policy 0, policy_version 680016 (0.0022) [2024-06-15 19:51:44,613][1652491] Updated weights for policy 0, policy_version 680061 (0.0012) [2024-06-15 19:51:45,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 48605.9, 300 sec: 47652.4). Total num frames: 1392803840. 
Throughput: 0: 11855.6. Samples: 348255232. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:51:45,956][1648985] Avg episode reward: [(0, '169.990')] [2024-06-15 19:51:46,467][1652491] Updated weights for policy 0, policy_version 680113 (0.0012) [2024-06-15 19:51:50,055][1652491] Updated weights for policy 0, policy_version 680146 (0.0012) [2024-06-15 19:51:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.7, 300 sec: 47874.6). Total num frames: 1393033216. Throughput: 0: 11821.5. Samples: 348332032. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:51:50,956][1648985] Avg episode reward: [(0, '171.030')] [2024-06-15 19:51:52,406][1652491] Updated weights for policy 0, policy_version 680208 (0.0026) [2024-06-15 19:51:54,143][1652491] Updated weights for policy 0, policy_version 680261 (0.0022) [2024-06-15 19:51:55,412][1652491] Updated weights for policy 0, policy_version 680316 (0.0011) [2024-06-15 19:51:55,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48605.7, 300 sec: 47541.3). Total num frames: 1393295360. Throughput: 0: 11889.8. Samples: 348364800. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:51:55,956][1648985] Avg episode reward: [(0, '155.140')] [2024-06-15 19:51:57,655][1652491] Updated weights for policy 0, policy_version 680378 (0.0023) [2024-06-15 19:52:00,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.4, 300 sec: 47763.5). Total num frames: 1393491968. Throughput: 0: 11855.6. Samples: 348442624. 
Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:52:00,956][1648985] Avg episode reward: [(0, '149.230')] [2024-06-15 19:52:01,130][1652491] Updated weights for policy 0, policy_version 680442 (0.0012) [2024-06-15 19:52:03,329][1652491] Updated weights for policy 0, policy_version 680485 (0.0020) [2024-06-15 19:52:04,696][1652491] Updated weights for policy 0, policy_version 680516 (0.0036) [2024-06-15 19:52:05,955][1648985] Fps is (10 sec: 49153.7, 60 sec: 48606.0, 300 sec: 47541.4). Total num frames: 1393786880. Throughput: 0: 11832.9. Samples: 348513280. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:52:05,955][1648985] Avg episode reward: [(0, '146.860')] [2024-06-15 19:52:06,268][1652491] Updated weights for policy 0, policy_version 680574 (0.0013) [2024-06-15 19:52:08,505][1651469] Signal inference workers to stop experience collection... (35400 times) [2024-06-15 19:52:08,551][1652491] InferenceWorker_p0-w0: stopping experience collection (35400 times) [2024-06-15 19:52:08,737][1651469] Signal inference workers to resume experience collection... (35400 times) [2024-06-15 19:52:08,738][1652491] InferenceWorker_p0-w0: resuming experience collection (35400 times) [2024-06-15 19:52:08,957][1652491] Updated weights for policy 0, policy_version 680637 (0.0013) [2024-06-15 19:52:10,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 46421.4, 300 sec: 47874.6). Total num frames: 1393950720. Throughput: 0: 11844.2. Samples: 348545024. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:52:10,956][1648985] Avg episode reward: [(0, '157.120')] [2024-06-15 19:52:11,827][1652491] Updated weights for policy 0, policy_version 680695 (0.0020) [2024-06-15 19:52:15,008][1652491] Updated weights for policy 0, policy_version 680761 (0.0016) [2024-06-15 19:52:15,962][1648985] Fps is (10 sec: 42566.1, 60 sec: 47507.6, 300 sec: 47540.2). Total num frames: 1394212864. Throughput: 0: 11786.9. Samples: 348618240. 
Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:52:15,963][1648985] Avg episode reward: [(0, '140.350')] [2024-06-15 19:52:17,578][1652491] Updated weights for policy 0, policy_version 680820 (0.0011) [2024-06-15 19:52:19,982][1652491] Updated weights for policy 0, policy_version 680864 (0.0011) [2024-06-15 19:52:20,961][1648985] Fps is (10 sec: 52399.5, 60 sec: 48055.2, 300 sec: 47984.8). Total num frames: 1394475008. Throughput: 0: 12024.8. Samples: 348694016. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:52:20,961][1648985] Avg episode reward: [(0, '134.940')] [2024-06-15 19:52:21,076][1652491] Updated weights for policy 0, policy_version 680897 (0.0015) [2024-06-15 19:52:22,329][1652491] Updated weights for policy 0, policy_version 680958 (0.0153) [2024-06-15 19:52:25,955][1648985] Fps is (10 sec: 49188.6, 60 sec: 47513.6, 300 sec: 47652.5). Total num frames: 1394704384. Throughput: 0: 11889.8. Samples: 348725760. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 19:52:25,956][1648985] Avg episode reward: [(0, '160.200')] [2024-06-15 19:52:26,102][1652491] Updated weights for policy 0, policy_version 681024 (0.0014) [2024-06-15 19:52:28,945][1652491] Updated weights for policy 0, policy_version 681080 (0.0012) [2024-06-15 19:52:30,962][1648985] Fps is (10 sec: 42592.3, 60 sec: 46966.2, 300 sec: 47651.3). Total num frames: 1394900992. Throughput: 0: 12058.6. Samples: 348797952. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:52:30,963][1648985] Avg episode reward: [(0, '158.400')] [2024-06-15 19:52:31,469][1652491] Updated weights for policy 0, policy_version 681122 (0.0013) [2024-06-15 19:52:33,045][1652491] Updated weights for policy 0, policy_version 681185 (0.0014) [2024-06-15 19:52:35,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.2, 300 sec: 47654.5). Total num frames: 1395130368. Throughput: 0: 11878.4. Samples: 348866560. 
Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:52:35,956][1648985] Avg episode reward: [(0, '154.100')] [2024-06-15 19:52:36,196][1652491] Updated weights for policy 0, policy_version 681237 (0.0040) [2024-06-15 19:52:38,714][1652491] Updated weights for policy 0, policy_version 681296 (0.0013) [2024-06-15 19:52:39,708][1652491] Updated weights for policy 0, policy_version 681343 (0.0013) [2024-06-15 19:52:40,966][1648985] Fps is (10 sec: 49135.6, 60 sec: 48051.4, 300 sec: 47539.7). Total num frames: 1395392512. Throughput: 0: 12023.6. Samples: 348905984. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:52:40,966][1648985] Avg episode reward: [(0, '158.830')] [2024-06-15 19:52:43,343][1652491] Updated weights for policy 0, policy_version 681408 (0.0046) [2024-06-15 19:52:44,811][1652491] Updated weights for policy 0, policy_version 681464 (0.0013) [2024-06-15 19:52:45,973][1648985] Fps is (10 sec: 52335.8, 60 sec: 47499.7, 300 sec: 47649.6). Total num frames: 1395654656. Throughput: 0: 11748.6. Samples: 348971520. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:52:45,973][1648985] Avg episode reward: [(0, '161.800')] [2024-06-15 19:52:47,493][1652491] Updated weights for policy 0, policy_version 681520 (0.0082) [2024-06-15 19:52:50,810][1652491] Updated weights for policy 0, policy_version 681584 (0.0013) [2024-06-15 19:52:50,955][1648985] Fps is (10 sec: 49202.7, 60 sec: 47513.4, 300 sec: 47430.3). Total num frames: 1395884032. Throughput: 0: 11821.4. Samples: 349045248. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:52:50,956][1648985] Avg episode reward: [(0, '152.450')] [2024-06-15 19:52:53,334][1651469] Signal inference workers to stop experience collection... (35450 times) [2024-06-15 19:52:53,405][1652491] InferenceWorker_p0-w0: stopping experience collection (35450 times) [2024-06-15 19:52:53,624][1651469] Signal inference workers to resume experience collection... 
(35450 times) [2024-06-15 19:52:53,625][1652491] InferenceWorker_p0-w0: resuming experience collection (35450 times) [2024-06-15 19:52:53,627][1652491] Updated weights for policy 0, policy_version 681632 (0.0073) [2024-06-15 19:52:55,587][1652491] Updated weights for policy 0, policy_version 681712 (0.0136) [2024-06-15 19:52:55,955][1648985] Fps is (10 sec: 52521.7, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 1396178944. Throughput: 0: 11980.8. Samples: 349084160. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:52:55,956][1648985] Avg episode reward: [(0, '157.530')] [2024-06-15 19:52:55,964][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000681728_1396178944.pth... [2024-06-15 19:52:56,033][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000676032_1384513536.pth [2024-06-15 19:52:58,695][1652491] Updated weights for policy 0, policy_version 681762 (0.0024) [2024-06-15 19:53:00,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46967.4, 300 sec: 47541.3). Total num frames: 1396310016. Throughput: 0: 11880.4. Samples: 349152768. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:53:00,956][1648985] Avg episode reward: [(0, '147.870')] [2024-06-15 19:53:01,536][1652491] Updated weights for policy 0, policy_version 681824 (0.0107) [2024-06-15 19:53:04,456][1652491] Updated weights for policy 0, policy_version 681872 (0.0014) [2024-06-15 19:53:05,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 46967.4, 300 sec: 48096.8). Total num frames: 1396604928. Throughput: 0: 11675.1. Samples: 349219328. 
Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:53:05,955][1648985] Avg episode reward: [(0, '154.290')] [2024-06-15 19:53:06,489][1652491] Updated weights for policy 0, policy_version 681953 (0.0013) [2024-06-15 19:53:09,582][1652491] Updated weights for policy 0, policy_version 682003 (0.0021) [2024-06-15 19:53:10,430][1652491] Updated weights for policy 0, policy_version 682045 (0.0013) [2024-06-15 19:53:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 1396834304. Throughput: 0: 11719.1. Samples: 349253120. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:53:10,956][1648985] Avg episode reward: [(0, '159.400')] [2024-06-15 19:53:12,957][1652491] Updated weights for policy 0, policy_version 682105 (0.0015) [2024-06-15 19:53:15,757][1652491] Updated weights for policy 0, policy_version 682146 (0.0011) [2024-06-15 19:53:15,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46973.4, 300 sec: 47763.5). Total num frames: 1397030912. Throughput: 0: 11891.7. Samples: 349332992. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:53:15,955][1648985] Avg episode reward: [(0, '149.770')] [2024-06-15 19:53:17,838][1652491] Updated weights for policy 0, policy_version 682231 (0.0088) [2024-06-15 19:53:20,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 46425.8, 300 sec: 47874.6). Total num frames: 1397260288. Throughput: 0: 11741.9. Samples: 349394944. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:53:20,955][1648985] Avg episode reward: [(0, '159.690')] [2024-06-15 19:53:21,792][1652491] Updated weights for policy 0, policy_version 682295 (0.0052) [2024-06-15 19:53:23,995][1652491] Updated weights for policy 0, policy_version 682352 (0.0011) [2024-06-15 19:53:25,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46421.3, 300 sec: 47541.4). Total num frames: 1397489664. Throughput: 0: 11676.3. Samples: 349431296. 
Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:53:25,956][1648985] Avg episode reward: [(0, '159.100')] [2024-06-15 19:53:27,066][1652491] Updated weights for policy 0, policy_version 682401 (0.0014) [2024-06-15 19:53:28,417][1652491] Updated weights for policy 0, policy_version 682466 (0.0013) [2024-06-15 19:53:30,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 47519.2, 300 sec: 48207.8). Total num frames: 1397751808. Throughput: 0: 11837.6. Samples: 349504000. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:53:30,956][1648985] Avg episode reward: [(0, '162.310')] [2024-06-15 19:53:32,213][1652491] Updated weights for policy 0, policy_version 682513 (0.0025) [2024-06-15 19:53:33,124][1652491] Updated weights for policy 0, policy_version 682555 (0.0015) [2024-06-15 19:53:34,324][1652491] Updated weights for policy 0, policy_version 682592 (0.0011) [2024-06-15 19:53:34,430][1651469] Signal inference workers to stop experience collection... (35500 times) [2024-06-15 19:53:34,478][1652491] InferenceWorker_p0-w0: stopping experience collection (35500 times) [2024-06-15 19:53:34,699][1651469] Signal inference workers to resume experience collection... (35500 times) [2024-06-15 19:53:34,700][1652491] InferenceWorker_p0-w0: resuming experience collection (35500 times) [2024-06-15 19:53:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 1398013952. Throughput: 0: 11844.3. Samples: 349578240. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:53:35,956][1648985] Avg episode reward: [(0, '176.650')] [2024-06-15 19:53:36,946][1652491] Updated weights for policy 0, policy_version 682630 (0.0020) [2024-06-15 19:53:38,711][1652491] Updated weights for policy 0, policy_version 682694 (0.0110) [2024-06-15 19:53:39,789][1652491] Updated weights for policy 0, policy_version 682747 (0.0015) [2024-06-15 19:53:40,961][1648985] Fps is (10 sec: 52397.1, 60 sec: 48063.3, 300 sec: 48429.0). 
Total num frames: 1398276096. Throughput: 0: 11808.6. Samples: 349615616. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:53:40,962][1648985] Avg episode reward: [(0, '159.230')] [2024-06-15 19:53:43,218][1652491] Updated weights for policy 0, policy_version 682816 (0.0101) [2024-06-15 19:53:45,882][1652491] Updated weights for policy 0, policy_version 682872 (0.0013) [2024-06-15 19:53:45,955][1648985] Fps is (10 sec: 49150.6, 60 sec: 47527.5, 300 sec: 47652.4). Total num frames: 1398505472. Throughput: 0: 11878.3. Samples: 349687296. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:53:45,956][1648985] Avg episode reward: [(0, '157.840')] [2024-06-15 19:53:49,168][1652491] Updated weights for policy 0, policy_version 682944 (0.0014) [2024-06-15 19:53:50,955][1648985] Fps is (10 sec: 52460.5, 60 sec: 48606.0, 300 sec: 48430.0). Total num frames: 1398800384. Throughput: 0: 11969.4. Samples: 349757952. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:53:50,956][1648985] Avg episode reward: [(0, '155.070')] [2024-06-15 19:53:53,658][1652491] Updated weights for policy 0, policy_version 683031 (0.0028) [2024-06-15 19:53:55,600][1652491] Updated weights for policy 0, policy_version 683088 (0.0013) [2024-06-15 19:53:55,955][1648985] Fps is (10 sec: 45876.8, 60 sec: 46421.4, 300 sec: 47763.5). Total num frames: 1398964224. Throughput: 0: 11980.8. Samples: 349792256. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:53:55,956][1648985] Avg episode reward: [(0, '160.430')] [2024-06-15 19:53:59,106][1652491] Updated weights for policy 0, policy_version 683137 (0.0032) [2024-06-15 19:54:00,464][1652491] Updated weights for policy 0, policy_version 683191 (0.0011) [2024-06-15 19:54:00,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1399193600. Throughput: 0: 11798.7. Samples: 349863936. 
Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:54:00,956][1648985] Avg episode reward: [(0, '166.140')] [2024-06-15 19:54:02,014][1652491] Updated weights for policy 0, policy_version 683233 (0.0013) [2024-06-15 19:54:05,861][1652491] Updated weights for policy 0, policy_version 683317 (0.0016) [2024-06-15 19:54:05,972][1648985] Fps is (10 sec: 45795.7, 60 sec: 46953.8, 300 sec: 47649.7). Total num frames: 1399422976. Throughput: 0: 11851.1. Samples: 349928448. Policy #0 lag: (min: 13.0, avg: 123.7, max: 269.0) [2024-06-15 19:54:05,973][1648985] Avg episode reward: [(0, '149.080')] [2024-06-15 19:54:07,427][1652491] Updated weights for policy 0, policy_version 683360 (0.0123) [2024-06-15 19:54:08,305][1652491] Updated weights for policy 0, policy_version 683392 (0.0012) [2024-06-15 19:54:10,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 1399586816. Throughput: 0: 11821.5. Samples: 349963264. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:54:10,955][1648985] Avg episode reward: [(0, '155.220')] [2024-06-15 19:54:12,389][1652491] Updated weights for policy 0, policy_version 683457 (0.0106) [2024-06-15 19:54:13,584][1652491] Updated weights for policy 0, policy_version 683514 (0.0014) [2024-06-15 19:54:15,955][1648985] Fps is (10 sec: 42672.7, 60 sec: 46967.5, 300 sec: 47652.4). Total num frames: 1399848960. Throughput: 0: 11776.0. Samples: 350033920. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:54:15,955][1648985] Avg episode reward: [(0, '154.740')] [2024-06-15 19:54:18,680][1652491] Updated weights for policy 0, policy_version 683587 (0.0112) [2024-06-15 19:54:20,977][1648985] Fps is (10 sec: 52313.3, 60 sec: 47496.1, 300 sec: 47537.8). Total num frames: 1400111104. Throughput: 0: 11633.8. Samples: 350102016. 
Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:54:20,978][1648985] Avg episode reward: [(0, '146.610')] [2024-06-15 19:54:22,472][1652491] Updated weights for policy 0, policy_version 683649 (0.0012) [2024-06-15 19:54:22,695][1651469] Signal inference workers to stop experience collection... (35550 times) [2024-06-15 19:54:22,764][1652491] InferenceWorker_p0-w0: stopping experience collection (35550 times) [2024-06-15 19:54:22,885][1651469] Signal inference workers to resume experience collection... (35550 times) [2024-06-15 19:54:22,886][1652491] InferenceWorker_p0-w0: resuming experience collection (35550 times) [2024-06-15 19:54:23,798][1652491] Updated weights for policy 0, policy_version 683716 (0.0012) [2024-06-15 19:54:25,021][1652491] Updated weights for policy 0, policy_version 683775 (0.0013) [2024-06-15 19:54:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.8, 300 sec: 47874.6). Total num frames: 1400373248. Throughput: 0: 11754.8. Samples: 350144512. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:54:25,956][1648985] Avg episode reward: [(0, '155.680')] [2024-06-15 19:54:27,477][1652491] Updated weights for policy 0, policy_version 683836 (0.0014) [2024-06-15 19:54:30,452][1652491] Updated weights for policy 0, policy_version 683895 (0.0014) [2024-06-15 19:54:30,972][1648985] Fps is (10 sec: 52454.7, 60 sec: 48046.1, 300 sec: 47538.6). Total num frames: 1400635392. Throughput: 0: 11714.7. Samples: 350214656. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:54:30,973][1648985] Avg episode reward: [(0, '161.320')] [2024-06-15 19:54:33,999][1652491] Updated weights for policy 0, policy_version 683952 (0.0012) [2024-06-15 19:54:35,567][1652491] Updated weights for policy 0, policy_version 684004 (0.0012) [2024-06-15 19:54:35,963][1648985] Fps is (10 sec: 49115.6, 60 sec: 47507.7, 300 sec: 47873.4). Total num frames: 1400864768. Throughput: 0: 11705.8. Samples: 350284800. 
Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:54:35,963][1648985] Avg episode reward: [(0, '156.360')] [2024-06-15 19:54:37,579][1652491] Updated weights for policy 0, policy_version 684052 (0.0012) [2024-06-15 19:54:38,368][1652491] Updated weights for policy 0, policy_version 684095 (0.0013) [2024-06-15 19:54:40,955][1648985] Fps is (10 sec: 42671.0, 60 sec: 46426.0, 300 sec: 47208.1). Total num frames: 1401061376. Throughput: 0: 11696.3. Samples: 350318592. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:54:40,956][1648985] Avg episode reward: [(0, '161.750')] [2024-06-15 19:54:41,772][1652491] Updated weights for policy 0, policy_version 684157 (0.0013) [2024-06-15 19:54:45,184][1652491] Updated weights for policy 0, policy_version 684208 (0.0012) [2024-06-15 19:54:45,955][1648985] Fps is (10 sec: 45909.7, 60 sec: 46967.8, 300 sec: 47652.5). Total num frames: 1401323520. Throughput: 0: 11821.5. Samples: 350395904. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:54:45,955][1648985] Avg episode reward: [(0, '162.010')] [2024-06-15 19:54:46,502][1652491] Updated weights for policy 0, policy_version 684258 (0.0012) [2024-06-15 19:54:48,892][1652491] Updated weights for policy 0, policy_version 684323 (0.0011) [2024-06-15 19:54:50,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45875.1, 300 sec: 47319.2). Total num frames: 1401552896. Throughput: 0: 11860.2. Samples: 350461952. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:54:50,956][1648985] Avg episode reward: [(0, '159.100')] [2024-06-15 19:54:52,111][1652491] Updated weights for policy 0, policy_version 684384 (0.0110) [2024-06-15 19:54:55,547][1652491] Updated weights for policy 0, policy_version 684432 (0.0023) [2024-06-15 19:54:55,955][1648985] Fps is (10 sec: 39320.2, 60 sec: 45875.0, 300 sec: 47208.1). Total num frames: 1401716736. Throughput: 0: 11935.2. Samples: 350500352. 
Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:54:55,956][1648985] Avg episode reward: [(0, '155.240')] [2024-06-15 19:54:56,445][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000684464_1401782272.pth... [2024-06-15 19:54:56,570][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000678912_1390411776.pth [2024-06-15 19:54:57,493][1652491] Updated weights for policy 0, policy_version 684499 (0.0014) [2024-06-15 19:54:58,470][1652491] Updated weights for policy 0, policy_version 684542 (0.0012) [2024-06-15 19:55:00,352][1652491] Updated weights for policy 0, policy_version 684595 (0.0027) [2024-06-15 19:55:00,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 1402077184. Throughput: 0: 11832.9. Samples: 350566400. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:55:00,956][1648985] Avg episode reward: [(0, '178.710')] [2024-06-15 19:55:03,671][1651469] Signal inference workers to stop experience collection... (35600 times) [2024-06-15 19:55:03,754][1652491] InferenceWorker_p0-w0: stopping experience collection (35600 times) [2024-06-15 19:55:03,756][1652491] Updated weights for policy 0, policy_version 684645 (0.0028) [2024-06-15 19:55:03,910][1651469] Signal inference workers to resume experience collection... (35600 times) [2024-06-15 19:55:03,911][1652491] InferenceWorker_p0-w0: resuming experience collection (35600 times) [2024-06-15 19:55:05,955][1648985] Fps is (10 sec: 49153.8, 60 sec: 46434.8, 300 sec: 47430.3). Total num frames: 1402208256. Throughput: 0: 12020.8. Samples: 350642688. 
Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:55:05,956][1648985] Avg episode reward: [(0, '181.720')] [2024-06-15 19:55:07,003][1652491] Updated weights for policy 0, policy_version 684688 (0.0013) [2024-06-15 19:55:08,824][1652491] Updated weights for policy 0, policy_version 684756 (0.0123) [2024-06-15 19:55:10,671][1652491] Updated weights for policy 0, policy_version 684817 (0.0100) [2024-06-15 19:55:10,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 49152.0, 300 sec: 47763.6). Total num frames: 1402535936. Throughput: 0: 11685.0. Samples: 350670336. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:55:10,955][1648985] Avg episode reward: [(0, '148.830')] [2024-06-15 19:55:11,691][1652491] Updated weights for policy 0, policy_version 684860 (0.0012) [2024-06-15 19:55:15,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1402732544. Throughput: 0: 11700.8. Samples: 350740992. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:55:15,956][1648985] Avg episode reward: [(0, '151.130')] [2024-06-15 19:55:18,191][1652491] Updated weights for policy 0, policy_version 684930 (0.0017) [2024-06-15 19:55:20,111][1652491] Updated weights for policy 0, policy_version 684994 (0.0013) [2024-06-15 19:55:20,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46984.7, 300 sec: 47652.4). Total num frames: 1402929152. Throughput: 0: 11618.6. Samples: 350807552. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:55:20,956][1648985] Avg episode reward: [(0, '143.270')] [2024-06-15 19:55:21,616][1652491] Updated weights for policy 0, policy_version 685062 (0.0013) [2024-06-15 19:55:22,916][1652491] Updated weights for policy 0, policy_version 685116 (0.0031) [2024-06-15 19:55:25,974][1648985] Fps is (10 sec: 45787.6, 60 sec: 46952.5, 300 sec: 47316.1). Total num frames: 1403191296. Throughput: 0: 11657.3. Samples: 350843392. 
Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:55:25,975][1648985] Avg episode reward: [(0, '146.030')] [2024-06-15 19:55:29,451][1652491] Updated weights for policy 0, policy_version 685190 (0.0021) [2024-06-15 19:55:30,728][1652491] Updated weights for policy 0, policy_version 685241 (0.0013) [2024-06-15 19:55:30,960][1648985] Fps is (10 sec: 45854.1, 60 sec: 45884.8, 300 sec: 47542.0). Total num frames: 1403387904. Throughput: 0: 11683.8. Samples: 350921728. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:55:30,960][1648985] Avg episode reward: [(0, '141.020')] [2024-06-15 19:55:32,605][1652491] Updated weights for policy 0, policy_version 685312 (0.0144) [2024-06-15 19:55:33,947][1652491] Updated weights for policy 0, policy_version 685369 (0.0034) [2024-06-15 19:55:35,955][1648985] Fps is (10 sec: 45963.3, 60 sec: 46427.1, 300 sec: 47208.1). Total num frames: 1403650048. Throughput: 0: 11787.4. Samples: 350992384. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:55:35,956][1648985] Avg episode reward: [(0, '143.870')] [2024-06-15 19:55:36,896][1652491] Updated weights for policy 0, policy_version 685408 (0.0013) [2024-06-15 19:55:40,562][1652491] Updated weights for policy 0, policy_version 685456 (0.0014) [2024-06-15 19:55:40,957][1648985] Fps is (10 sec: 45888.2, 60 sec: 46420.0, 300 sec: 47319.0). Total num frames: 1403846656. Throughput: 0: 11718.7. Samples: 351027712. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:55:40,957][1648985] Avg episode reward: [(0, '142.700')] [2024-06-15 19:55:42,069][1652491] Updated weights for policy 0, policy_version 685508 (0.0013) [2024-06-15 19:55:43,893][1652491] Updated weights for policy 0, policy_version 685571 (0.0012) [2024-06-15 19:55:44,997][1652491] Updated weights for policy 0, policy_version 685626 (0.0014) [2024-06-15 19:55:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 1404174336. 
Throughput: 0: 11639.5. Samples: 351090176. Policy #0 lag: (min: 31.0, avg: 163.1, max: 287.0) [2024-06-15 19:55:45,956][1648985] Avg episode reward: [(0, '148.940')] [2024-06-15 19:55:47,185][1651469] Signal inference workers to stop experience collection... (35650 times) [2024-06-15 19:55:47,241][1652491] InferenceWorker_p0-w0: stopping experience collection (35650 times) [2024-06-15 19:55:47,450][1651469] Signal inference workers to resume experience collection... (35650 times) [2024-06-15 19:55:47,466][1652491] InferenceWorker_p0-w0: resuming experience collection (35650 times) [2024-06-15 19:55:48,262][1652491] Updated weights for policy 0, policy_version 685690 (0.0012) [2024-06-15 19:55:50,955][1648985] Fps is (10 sec: 45883.3, 60 sec: 45875.3, 300 sec: 47208.1). Total num frames: 1404305408. Throughput: 0: 11741.8. Samples: 351171072. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:55:50,956][1648985] Avg episode reward: [(0, '150.370')] [2024-06-15 19:55:52,579][1652491] Updated weights for policy 0, policy_version 685748 (0.0099) [2024-06-15 19:55:53,674][1652491] Updated weights for policy 0, policy_version 685780 (0.0015) [2024-06-15 19:55:55,268][1652491] Updated weights for policy 0, policy_version 685840 (0.0011) [2024-06-15 19:55:55,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 49152.1, 300 sec: 47430.3). Total num frames: 1404665856. Throughput: 0: 11867.0. Samples: 351204352. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:55:55,956][1648985] Avg episode reward: [(0, '131.170')] [2024-06-15 19:55:56,234][1652491] Updated weights for policy 0, policy_version 685888 (0.0020) [2024-06-15 19:55:59,402][1652491] Updated weights for policy 0, policy_version 685948 (0.0020) [2024-06-15 19:56:00,970][1648985] Fps is (10 sec: 52348.7, 60 sec: 45863.5, 300 sec: 47316.8). Total num frames: 1404829696. Throughput: 0: 11851.6. Samples: 351274496. 
Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:56:00,971][1648985] Avg episode reward: [(0, '129.690')] [2024-06-15 19:56:03,589][1652491] Updated weights for policy 0, policy_version 686000 (0.0014) [2024-06-15 19:56:05,325][1652491] Updated weights for policy 0, policy_version 686033 (0.0013) [2024-06-15 19:56:05,955][1648985] Fps is (10 sec: 36045.4, 60 sec: 46967.5, 300 sec: 46986.0). Total num frames: 1405026304. Throughput: 0: 11855.7. Samples: 351341056. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:56:05,955][1648985] Avg episode reward: [(0, '129.110')] [2024-06-15 19:56:07,413][1652491] Updated weights for policy 0, policy_version 686116 (0.0013) [2024-06-15 19:56:10,075][1652491] Updated weights for policy 0, policy_version 686146 (0.0013) [2024-06-15 19:56:10,970][1648985] Fps is (10 sec: 45875.8, 60 sec: 45863.5, 300 sec: 47205.7). Total num frames: 1405288448. Throughput: 0: 11777.0. Samples: 351373312. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:56:10,971][1648985] Avg episode reward: [(0, '130.220')] [2024-06-15 19:56:11,422][1652491] Updated weights for policy 0, policy_version 686207 (0.0012) [2024-06-15 19:56:15,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 45875.1, 300 sec: 47097.0). Total num frames: 1405485056. Throughput: 0: 11572.3. Samples: 351442432. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:56:15,956][1648985] Avg episode reward: [(0, '119.160')] [2024-06-15 19:56:16,818][1652491] Updated weights for policy 0, policy_version 686273 (0.0119) [2024-06-15 19:56:18,300][1652491] Updated weights for policy 0, policy_version 686336 (0.0011) [2024-06-15 19:56:19,500][1652491] Updated weights for policy 0, policy_version 686391 (0.0015) [2024-06-15 19:56:20,960][1648985] Fps is (10 sec: 45920.1, 60 sec: 46963.2, 300 sec: 47096.2). Total num frames: 1405747200. Throughput: 0: 11501.6. Samples: 351510016. 
Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:56:20,961][1648985] Avg episode reward: [(0, '144.000')] [2024-06-15 19:56:22,411][1652491] Updated weights for policy 0, policy_version 686459 (0.0012) [2024-06-15 19:56:25,764][1652491] Updated weights for policy 0, policy_version 686497 (0.0012) [2024-06-15 19:56:25,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 45889.9, 300 sec: 46986.9). Total num frames: 1405943808. Throughput: 0: 11548.9. Samples: 351547392. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:56:25,956][1648985] Avg episode reward: [(0, '158.640')] [2024-06-15 19:56:28,069][1652491] Updated weights for policy 0, policy_version 686548 (0.0014) [2024-06-15 19:56:29,059][1652491] Updated weights for policy 0, policy_version 686596 (0.0015) [2024-06-15 19:56:29,364][1651469] Signal inference workers to stop experience collection... (35700 times) [2024-06-15 19:56:29,399][1652491] InferenceWorker_p0-w0: stopping experience collection (35700 times) [2024-06-15 19:56:29,646][1651469] Signal inference workers to resume experience collection... (35700 times) [2024-06-15 19:56:29,647][1652491] InferenceWorker_p0-w0: resuming experience collection (35700 times) [2024-06-15 19:56:30,298][1652491] Updated weights for policy 0, policy_version 686656 (0.0019) [2024-06-15 19:56:30,955][1648985] Fps is (10 sec: 52457.4, 60 sec: 48063.4, 300 sec: 47097.1). Total num frames: 1406271488. Throughput: 0: 11730.5. Samples: 351618048. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:56:30,955][1648985] Avg episode reward: [(0, '167.730')] [2024-06-15 19:56:33,052][1652491] Updated weights for policy 0, policy_version 686710 (0.0125) [2024-06-15 19:56:35,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46421.3, 300 sec: 47208.1). Total num frames: 1406435328. Throughput: 0: 11594.0. Samples: 351692800. 
Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:56:35,956][1648985] Avg episode reward: [(0, '168.960')] [2024-06-15 19:56:36,011][1652491] Updated weights for policy 0, policy_version 686752 (0.0016) [2024-06-15 19:56:38,774][1652491] Updated weights for policy 0, policy_version 686800 (0.0013) [2024-06-15 19:56:40,657][1652491] Updated weights for policy 0, policy_version 686864 (0.0013) [2024-06-15 19:56:40,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 47515.0, 300 sec: 47097.1). Total num frames: 1406697472. Throughput: 0: 11685.0. Samples: 351730176. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:56:40,955][1648985] Avg episode reward: [(0, '165.600')] [2024-06-15 19:56:43,157][1652491] Updated weights for policy 0, policy_version 686916 (0.0012) [2024-06-15 19:56:44,553][1652491] Updated weights for policy 0, policy_version 686976 (0.0012) [2024-06-15 19:56:45,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1406926848. Throughput: 0: 11529.6. Samples: 351793152. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:56:45,955][1648985] Avg episode reward: [(0, '149.170')] [2024-06-15 19:56:47,363][1652491] Updated weights for policy 0, policy_version 687027 (0.0014) [2024-06-15 19:56:50,871][1652491] Updated weights for policy 0, policy_version 687088 (0.0013) [2024-06-15 19:56:50,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 1407156224. Throughput: 0: 11673.6. Samples: 351866368. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:56:50,956][1648985] Avg episode reward: [(0, '138.430')] [2024-06-15 19:56:52,870][1652491] Updated weights for policy 0, policy_version 687139 (0.0018) [2024-06-15 19:56:54,512][1652491] Updated weights for policy 0, policy_version 687200 (0.0141) [2024-06-15 19:56:55,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 46421.3, 300 sec: 47319.2). Total num frames: 1407451136. 
Throughput: 0: 11848.2. Samples: 351906304. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:56:55,956][1648985] Avg episode reward: [(0, '147.770')] [2024-06-15 19:56:55,980][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000687232_1407451136.pth... [2024-06-15 19:56:56,039][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000681728_1396178944.pth [2024-06-15 19:56:57,762][1652491] Updated weights for policy 0, policy_version 687250 (0.0029) [2024-06-15 19:56:58,531][1652491] Updated weights for policy 0, policy_version 687287 (0.0013) [2024-06-15 19:57:00,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46433.2, 300 sec: 46874.9). Total num frames: 1407614976. Throughput: 0: 11901.2. Samples: 351977984. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:57:00,955][1648985] Avg episode reward: [(0, '145.870')] [2024-06-15 19:57:01,490][1652491] Updated weights for policy 0, policy_version 687349 (0.0015) [2024-06-15 19:57:04,159][1652491] Updated weights for policy 0, policy_version 687419 (0.0012) [2024-06-15 19:57:05,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48605.7, 300 sec: 47430.3). Total num frames: 1407942656. Throughput: 0: 11868.4. Samples: 352044032. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:57:05,956][1648985] Avg episode reward: [(0, '133.600')] [2024-06-15 19:57:06,041][1652491] Updated weights for policy 0, policy_version 687478 (0.0013) [2024-06-15 19:57:08,320][1652491] Updated weights for policy 0, policy_version 687508 (0.0023) [2024-06-15 19:57:09,274][1652491] Updated weights for policy 0, policy_version 687552 (0.0077) [2024-06-15 19:57:10,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 46979.3, 300 sec: 47098.2). Total num frames: 1408106496. Throughput: 0: 11912.5. Samples: 352083456. 
Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:57:10,956][1648985] Avg episode reward: [(0, '147.720')] [2024-06-15 19:57:12,275][1652491] Updated weights for policy 0, policy_version 687600 (0.0034) [2024-06-15 19:57:14,506][1652491] Updated weights for policy 0, policy_version 687648 (0.0014) [2024-06-15 19:57:15,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48059.8, 300 sec: 47098.0). Total num frames: 1408368640. Throughput: 0: 12014.9. Samples: 352158720. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:57:15,956][1648985] Avg episode reward: [(0, '155.770')] [2024-06-15 19:57:16,001][1651469] Signal inference workers to stop experience collection... (35750 times) [2024-06-15 19:57:16,041][1652491] InferenceWorker_p0-w0: stopping experience collection (35750 times) [2024-06-15 19:57:16,249][1651469] Signal inference workers to resume experience collection... (35750 times) [2024-06-15 19:57:16,250][1652491] InferenceWorker_p0-w0: resuming experience collection (35750 times) [2024-06-15 19:57:16,437][1652491] Updated weights for policy 0, policy_version 687699 (0.0013) [2024-06-15 19:57:18,541][1652491] Updated weights for policy 0, policy_version 687747 (0.0013) [2024-06-15 19:57:19,892][1652491] Updated weights for policy 0, policy_version 687801 (0.0013) [2024-06-15 19:57:20,962][1648985] Fps is (10 sec: 52391.3, 60 sec: 48058.3, 300 sec: 47207.0). Total num frames: 1408630784. Throughput: 0: 11944.8. Samples: 352230400. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:57:20,963][1648985] Avg episode reward: [(0, '135.180')] [2024-06-15 19:57:22,872][1652491] Updated weights for policy 0, policy_version 687840 (0.0022) [2024-06-15 19:57:24,329][1652491] Updated weights for policy 0, policy_version 687888 (0.0019) [2024-06-15 19:57:25,225][1652491] Updated weights for policy 0, policy_version 687927 (0.0016) [2024-06-15 19:57:25,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 49152.0, 300 sec: 47431.4). 
Total num frames: 1408892928. Throughput: 0: 12026.3. Samples: 352271360. Policy #0 lag: (min: 15.0, avg: 133.6, max: 271.0) [2024-06-15 19:57:25,956][1648985] Avg episode reward: [(0, '135.160')] [2024-06-15 19:57:27,575][1652491] Updated weights for policy 0, policy_version 687970 (0.0012) [2024-06-15 19:57:30,218][1652491] Updated weights for policy 0, policy_version 688032 (0.0012) [2024-06-15 19:57:30,979][1648985] Fps is (10 sec: 52343.5, 60 sec: 48040.9, 300 sec: 47537.6). Total num frames: 1409155072. Throughput: 0: 12145.1. Samples: 352339968. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:57:30,979][1648985] Avg episode reward: [(0, '132.300')] [2024-06-15 19:57:33,166][1652491] Updated weights for policy 0, policy_version 688080 (0.0025) [2024-06-15 19:57:35,698][1652491] Updated weights for policy 0, policy_version 688146 (0.0014) [2024-06-15 19:57:35,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48605.8, 300 sec: 47320.9). Total num frames: 1409351680. Throughput: 0: 12162.8. Samples: 352413696. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:57:35,956][1648985] Avg episode reward: [(0, '137.460')] [2024-06-15 19:57:36,570][1652491] Updated weights for policy 0, policy_version 688188 (0.0015) [2024-06-15 19:57:38,903][1652491] Updated weights for policy 0, policy_version 688246 (0.0019) [2024-06-15 19:57:40,956][1648985] Fps is (10 sec: 42693.8, 60 sec: 48058.8, 300 sec: 47210.8). Total num frames: 1409581056. Throughput: 0: 11980.5. Samples: 352445440. 
Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:57:40,957][1648985] Avg episode reward: [(0, '147.360')] [2024-06-15 19:57:41,269][1652491] Updated weights for policy 0, policy_version 688291 (0.0014) [2024-06-15 19:57:43,805][1652491] Updated weights for policy 0, policy_version 688336 (0.0013) [2024-06-15 19:57:44,657][1652491] Updated weights for policy 0, policy_version 688383 (0.0013) [2024-06-15 19:57:45,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 48059.8, 300 sec: 47208.2). Total num frames: 1409810432. Throughput: 0: 12060.5. Samples: 352520704. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:57:45,955][1648985] Avg episode reward: [(0, '160.270')] [2024-06-15 19:57:47,481][1652491] Updated weights for policy 0, policy_version 688448 (0.0013) [2024-06-15 19:57:50,800][1652491] Updated weights for policy 0, policy_version 688512 (0.0142) [2024-06-15 19:57:50,956][1648985] Fps is (10 sec: 49151.5, 60 sec: 48604.9, 300 sec: 47096.9). Total num frames: 1410072576. Throughput: 0: 12048.8. Samples: 352586240. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:57:50,957][1648985] Avg episode reward: [(0, '155.540')] [2024-06-15 19:57:53,102][1652491] Updated weights for policy 0, policy_version 688572 (0.0013) [2024-06-15 19:57:55,414][1652491] Updated weights for policy 0, policy_version 688613 (0.0013) [2024-06-15 19:57:55,958][1648985] Fps is (10 sec: 52410.6, 60 sec: 48057.1, 300 sec: 47540.8). Total num frames: 1410334720. Throughput: 0: 11957.2. Samples: 352621568. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:57:55,959][1648985] Avg episode reward: [(0, '155.240')] [2024-06-15 19:57:57,833][1652491] Updated weights for policy 0, policy_version 688658 (0.0013) [2024-06-15 19:58:00,955][1648985] Fps is (10 sec: 42602.9, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1410498560. Throughput: 0: 11980.8. Samples: 352697856. 
Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:58:00,956][1648985] Avg episode reward: [(0, '150.210')] [2024-06-15 19:58:01,135][1652491] Updated weights for policy 0, policy_version 688728 (0.0122) [2024-06-15 19:58:01,407][1651469] Signal inference workers to stop experience collection... (35800 times) [2024-06-15 19:58:01,497][1652491] InferenceWorker_p0-w0: stopping experience collection (35800 times) [2024-06-15 19:58:01,782][1651469] Signal inference workers to resume experience collection... (35800 times) [2024-06-15 19:58:01,783][1652491] InferenceWorker_p0-w0: resuming experience collection (35800 times) [2024-06-15 19:58:02,198][1652491] Updated weights for policy 0, policy_version 688768 (0.0012) [2024-06-15 19:58:03,993][1652491] Updated weights for policy 0, policy_version 688830 (0.0095) [2024-06-15 19:58:05,955][1648985] Fps is (10 sec: 39334.8, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1410727936. Throughput: 0: 11925.8. Samples: 352766976. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:58:05,956][1648985] Avg episode reward: [(0, '150.890')] [2024-06-15 19:58:09,112][1652491] Updated weights for policy 0, policy_version 688912 (0.0012) [2024-06-15 19:58:10,966][1648985] Fps is (10 sec: 49097.5, 60 sec: 48050.7, 300 sec: 47317.4). Total num frames: 1410990080. Throughput: 0: 11761.7. Samples: 352800768. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:58:10,967][1648985] Avg episode reward: [(0, '149.310')] [2024-06-15 19:58:12,046][1652491] Updated weights for policy 0, policy_version 688976 (0.0016) [2024-06-15 19:58:13,129][1652491] Updated weights for policy 0, policy_version 689019 (0.0016) [2024-06-15 19:58:14,680][1652491] Updated weights for policy 0, policy_version 689057 (0.0015) [2024-06-15 19:58:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 47430.3). Total num frames: 1411252224. Throughput: 0: 11793.5. Samples: 352870400. 
Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:58:15,955][1648985] Avg episode reward: [(0, '158.770')] [2024-06-15 19:58:17,651][1652491] Updated weights for policy 0, policy_version 689120 (0.0013) [2024-06-15 19:58:20,511][1652491] Updated weights for policy 0, policy_version 689184 (0.0016) [2024-06-15 19:58:20,955][1648985] Fps is (10 sec: 49207.1, 60 sec: 47519.2, 300 sec: 47430.3). Total num frames: 1411481600. Throughput: 0: 11753.3. Samples: 352942592. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:58:20,956][1648985] Avg episode reward: [(0, '146.810')] [2024-06-15 19:58:23,331][1652491] Updated weights for policy 0, policy_version 689218 (0.0014) [2024-06-15 19:58:25,144][1652491] Updated weights for policy 0, policy_version 689296 (0.0015) [2024-06-15 19:58:25,956][1648985] Fps is (10 sec: 45873.0, 60 sec: 46967.2, 300 sec: 47319.2). Total num frames: 1411710976. Throughput: 0: 11878.6. Samples: 352979968. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:58:25,956][1648985] Avg episode reward: [(0, '158.550')] [2024-06-15 19:58:26,357][1652491] Updated weights for policy 0, policy_version 689341 (0.0018) [2024-06-15 19:58:30,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45893.2, 300 sec: 47097.1). Total num frames: 1411907584. Throughput: 0: 11571.2. Samples: 353041408. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:58:30,955][1648985] Avg episode reward: [(0, '149.750')] [2024-06-15 19:58:31,681][1652491] Updated weights for policy 0, policy_version 689409 (0.0144) [2024-06-15 19:58:32,642][1652491] Updated weights for policy 0, policy_version 689466 (0.0014) [2024-06-15 19:58:35,757][1652491] Updated weights for policy 0, policy_version 689523 (0.0013) [2024-06-15 19:58:35,955][1648985] Fps is (10 sec: 45876.9, 60 sec: 46967.5, 300 sec: 47098.0). Total num frames: 1412169728. Throughput: 0: 11810.4. Samples: 353117696. 
Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:58:35,956][1648985] Avg episode reward: [(0, '154.870')] [2024-06-15 19:58:37,108][1652491] Updated weights for policy 0, policy_version 689589 (0.0015) [2024-06-15 19:58:39,468][1652491] Updated weights for policy 0, policy_version 689648 (0.0140) [2024-06-15 19:58:40,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 47514.4, 300 sec: 47208.2). Total num frames: 1412431872. Throughput: 0: 11822.4. Samples: 353153536. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:58:40,956][1648985] Avg episode reward: [(0, '152.210')] [2024-06-15 19:58:43,723][1652491] Updated weights for policy 0, policy_version 689703 (0.0151) [2024-06-15 19:58:45,876][1652491] Updated weights for policy 0, policy_version 689746 (0.0014) [2024-06-15 19:58:45,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1412595712. Throughput: 0: 11821.6. Samples: 353229824. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:58:45,955][1648985] Avg episode reward: [(0, '165.750')] [2024-06-15 19:58:46,259][1651469] Signal inference workers to stop experience collection... (35850 times) [2024-06-15 19:58:46,368][1652491] InferenceWorker_p0-w0: stopping experience collection (35850 times) [2024-06-15 19:58:46,570][1651469] Signal inference workers to resume experience collection... (35850 times) [2024-06-15 19:58:46,572][1652491] InferenceWorker_p0-w0: resuming experience collection (35850 times) [2024-06-15 19:58:48,087][1652491] Updated weights for policy 0, policy_version 689840 (0.0014) [2024-06-15 19:58:50,307][1652491] Updated weights for policy 0, policy_version 689888 (0.0014) [2024-06-15 19:58:50,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48060.7, 300 sec: 47430.3). Total num frames: 1412956160. Throughput: 0: 11673.6. Samples: 353292288. 
Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:58:50,956][1648985] Avg episode reward: [(0, '156.080')] [2024-06-15 19:58:54,825][1652491] Updated weights for policy 0, policy_version 689952 (0.0015) [2024-06-15 19:58:55,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 45877.6, 300 sec: 47097.0). Total num frames: 1413087232. Throughput: 0: 11756.1. Samples: 353329664. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:58:55,956][1648985] Avg episode reward: [(0, '161.130')] [2024-06-15 19:58:55,970][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000689984_1413087232.pth... [2024-06-15 19:58:56,050][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000684464_1401782272.pth [2024-06-15 19:58:57,711][1652491] Updated weights for policy 0, policy_version 690019 (0.0014) [2024-06-15 19:58:59,559][1652491] Updated weights for policy 0, policy_version 690109 (0.0102) [2024-06-15 19:59:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 47513.7, 300 sec: 47210.9). Total num frames: 1413349376. Throughput: 0: 11662.2. Samples: 353395200. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:59:00,956][1648985] Avg episode reward: [(0, '165.240')] [2024-06-15 19:59:02,555][1652491] Updated weights for policy 0, policy_version 690176 (0.0014) [2024-06-15 19:59:05,955][1648985] Fps is (10 sec: 39322.7, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1413480448. Throughput: 0: 11650.9. Samples: 353466880. Policy #0 lag: (min: 3.0, avg: 123.9, max: 259.0) [2024-06-15 19:59:05,956][1648985] Avg episode reward: [(0, '167.300')] [2024-06-15 19:59:07,635][1652491] Updated weights for policy 0, policy_version 690238 (0.0011) [2024-06-15 19:59:09,989][1652491] Updated weights for policy 0, policy_version 690304 (0.0122) [2024-06-15 19:59:10,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46430.1, 300 sec: 47208.1). Total num frames: 1413775360. Throughput: 0: 11605.4. 
Samples: 353502208. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 19:59:10,955][1648985] Avg episode reward: [(0, '138.770')] [2024-06-15 19:59:11,646][1652491] Updated weights for policy 0, policy_version 690363 (0.0013) [2024-06-15 19:59:14,194][1652491] Updated weights for policy 0, policy_version 690423 (0.0021) [2024-06-15 19:59:15,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 47100.6). Total num frames: 1414004736. Throughput: 0: 11571.2. Samples: 353562112. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 19:59:15,956][1648985] Avg episode reward: [(0, '143.520')] [2024-06-15 19:59:19,035][1652491] Updated weights for policy 0, policy_version 690464 (0.0012) [2024-06-15 19:59:20,313][1652491] Updated weights for policy 0, policy_version 690498 (0.0012) [2024-06-15 19:59:20,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44783.0, 300 sec: 46763.8). Total num frames: 1414168576. Throughput: 0: 11457.4. Samples: 353633280. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 19:59:20,955][1648985] Avg episode reward: [(0, '151.190')] [2024-06-15 19:59:22,269][1652491] Updated weights for policy 0, policy_version 690580 (0.0014) [2024-06-15 19:59:23,272][1652491] Updated weights for policy 0, policy_version 690620 (0.0012) [2024-06-15 19:59:25,976][1648985] Fps is (10 sec: 49046.9, 60 sec: 46405.1, 300 sec: 46985.3). Total num frames: 1414496256. Throughput: 0: 11235.9. Samples: 353659392. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 19:59:25,977][1648985] Avg episode reward: [(0, '161.890')] [2024-06-15 19:59:26,037][1652491] Updated weights for policy 0, policy_version 690682 (0.0013) [2024-06-15 19:59:30,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 44782.8, 300 sec: 46542.8). Total num frames: 1414594560. Throughput: 0: 11241.2. Samples: 353735680. 
Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 19:59:30,956][1648985] Avg episode reward: [(0, '157.730')] [2024-06-15 19:59:31,536][1652491] Updated weights for policy 0, policy_version 690745 (0.0015) [2024-06-15 19:59:32,196][1651469] Signal inference workers to stop experience collection... (35900 times) [2024-06-15 19:59:32,275][1652491] InferenceWorker_p0-w0: stopping experience collection (35900 times) [2024-06-15 19:59:32,492][1651469] Signal inference workers to resume experience collection... (35900 times) [2024-06-15 19:59:32,493][1652491] InferenceWorker_p0-w0: resuming experience collection (35900 times) [2024-06-15 19:59:32,940][1652491] Updated weights for policy 0, policy_version 690784 (0.0013) [2024-06-15 19:59:34,346][1652491] Updated weights for policy 0, policy_version 690848 (0.0100) [2024-06-15 19:59:35,955][1648985] Fps is (10 sec: 42689.7, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1414922240. Throughput: 0: 11195.7. Samples: 353796096. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 19:59:35,956][1648985] Avg episode reward: [(0, '159.900')] [2024-06-15 19:59:36,696][1652491] Updated weights for policy 0, policy_version 690898 (0.0014) [2024-06-15 19:59:40,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 43690.8, 300 sec: 46541.7). Total num frames: 1415053312. Throughput: 0: 11195.8. Samples: 353833472. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 19:59:40,955][1648985] Avg episode reward: [(0, '165.860')] [2024-06-15 19:59:41,100][1652491] Updated weights for policy 0, policy_version 690960 (0.0119) [2024-06-15 19:59:42,129][1652491] Updated weights for policy 0, policy_version 691007 (0.0012) [2024-06-15 19:59:44,575][1652491] Updated weights for policy 0, policy_version 691060 (0.0013) [2024-06-15 19:59:45,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46421.2, 300 sec: 46874.9). Total num frames: 1415380992. Throughput: 0: 11446.0. Samples: 353910272. 
Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 19:59:45,956][1648985] Avg episode reward: [(0, '151.350')] [2024-06-15 19:59:46,332][1652491] Updated weights for policy 0, policy_version 691129 (0.0100) [2024-06-15 19:59:47,717][1652491] Updated weights for policy 0, policy_version 691170 (0.0014) [2024-06-15 19:59:50,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 46986.0). Total num frames: 1415577600. Throughput: 0: 11355.0. Samples: 353977856. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 19:59:50,956][1648985] Avg episode reward: [(0, '146.170')] [2024-06-15 19:59:52,320][1652491] Updated weights for policy 0, policy_version 691208 (0.0014) [2024-06-15 19:59:54,456][1652491] Updated weights for policy 0, policy_version 691266 (0.0096) [2024-06-15 19:59:55,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 1415806976. Throughput: 0: 11423.3. Samples: 354016256. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 19:59:55,956][1648985] Avg episode reward: [(0, '128.780')] [2024-06-15 19:59:56,642][1652491] Updated weights for policy 0, policy_version 691345 (0.0012) [2024-06-15 19:59:57,753][1652491] Updated weights for policy 0, policy_version 691389 (0.0021) [2024-06-15 19:59:59,478][1652491] Updated weights for policy 0, policy_version 691448 (0.0014) [2024-06-15 20:00:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 47097.0). Total num frames: 1416101888. Throughput: 0: 11389.1. Samples: 354074624. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 20:00:00,956][1648985] Avg episode reward: [(0, '120.440')] [2024-06-15 20:00:04,814][1652491] Updated weights for policy 0, policy_version 691504 (0.0016) [2024-06-15 20:00:05,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 1416232960. Throughput: 0: 11548.4. Samples: 354152960. 
Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 20:00:05,956][1648985] Avg episode reward: [(0, '125.040')] [2024-06-15 20:00:06,425][1652491] Updated weights for policy 0, policy_version 691536 (0.0012) [2024-06-15 20:00:08,512][1652491] Updated weights for policy 0, policy_version 691616 (0.0012) [2024-06-15 20:00:10,449][1651469] Signal inference workers to stop experience collection... (35950 times) [2024-06-15 20:00:10,506][1652491] InferenceWorker_p0-w0: stopping experience collection (35950 times) [2024-06-15 20:00:10,519][1652491] Updated weights for policy 0, policy_version 691684 (0.0015) [2024-06-15 20:00:10,708][1651469] Signal inference workers to resume experience collection... (35950 times) [2024-06-15 20:00:10,709][1652491] InferenceWorker_p0-w0: resuming experience collection (35950 times) [2024-06-15 20:00:10,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 1416593408. Throughput: 0: 11588.1. Samples: 354180608. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 20:00:10,956][1648985] Avg episode reward: [(0, '140.790')] [2024-06-15 20:00:15,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 44236.7, 300 sec: 46541.6). Total num frames: 1416658944. Throughput: 0: 11480.2. Samples: 354252288. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 20:00:15,956][1648985] Avg episode reward: [(0, '147.520')] [2024-06-15 20:00:16,209][1652491] Updated weights for policy 0, policy_version 691744 (0.0013) [2024-06-15 20:00:17,920][1652491] Updated weights for policy 0, policy_version 691778 (0.0024) [2024-06-15 20:00:20,397][1652491] Updated weights for policy 0, policy_version 691872 (0.0041) [2024-06-15 20:00:20,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 46967.3, 300 sec: 46766.8). Total num frames: 1416986624. Throughput: 0: 11514.3. Samples: 354314240. 
Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 20:00:20,956][1648985] Avg episode reward: [(0, '135.400')] [2024-06-15 20:00:22,498][1652491] Updated weights for policy 0, policy_version 691952 (0.0012) [2024-06-15 20:00:22,884][1652491] Updated weights for policy 0, policy_version 691968 (0.0011) [2024-06-15 20:00:25,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 44252.6, 300 sec: 46653.5). Total num frames: 1417150464. Throughput: 0: 11457.4. Samples: 354349056. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 20:00:25,955][1648985] Avg episode reward: [(0, '148.600')] [2024-06-15 20:00:27,942][1652491] Updated weights for policy 0, policy_version 692031 (0.0087) [2024-06-15 20:00:30,758][1652491] Updated weights for policy 0, policy_version 692085 (0.0118) [2024-06-15 20:00:30,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 1417379840. Throughput: 0: 11548.5. Samples: 354429952. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 20:00:30,956][1648985] Avg episode reward: [(0, '154.640')] [2024-06-15 20:00:32,844][1652491] Updated weights for policy 0, policy_version 692165 (0.0013) [2024-06-15 20:00:34,048][1652491] Updated weights for policy 0, policy_version 692212 (0.0013) [2024-06-15 20:00:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46875.2). Total num frames: 1417674752. Throughput: 0: 11366.4. Samples: 354489344. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 20:00:35,955][1648985] Avg episode reward: [(0, '159.730')] [2024-06-15 20:00:39,039][1652491] Updated weights for policy 0, policy_version 692256 (0.0011) [2024-06-15 20:00:40,458][1652491] Updated weights for policy 0, policy_version 692289 (0.0056) [2024-06-15 20:00:40,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46421.2, 300 sec: 46319.5). Total num frames: 1417838592. Throughput: 0: 11468.8. Samples: 354532352. 
Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 20:00:40,956][1648985] Avg episode reward: [(0, '154.040')] [2024-06-15 20:00:42,320][1652491] Updated weights for policy 0, policy_version 692356 (0.0012) [2024-06-15 20:00:44,166][1652491] Updated weights for policy 0, policy_version 692432 (0.0012) [2024-06-15 20:00:45,171][1652491] Updated weights for policy 0, policy_version 692476 (0.0012) [2024-06-15 20:00:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.6, 300 sec: 47097.1). Total num frames: 1418199040. Throughput: 0: 11366.4. Samples: 354586112. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 20:00:45,955][1648985] Avg episode reward: [(0, '163.410')] [2024-06-15 20:00:50,963][1648985] Fps is (10 sec: 42563.9, 60 sec: 44776.8, 300 sec: 46096.1). Total num frames: 1418264576. Throughput: 0: 11455.3. Samples: 354668544. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:00:50,964][1648985] Avg episode reward: [(0, '165.150')] [2024-06-15 20:00:51,234][1652491] Updated weights for policy 0, policy_version 692544 (0.0025) [2024-06-15 20:00:53,673][1652491] Updated weights for policy 0, policy_version 692608 (0.0014) [2024-06-15 20:00:53,847][1651469] Signal inference workers to stop experience collection... (36000 times) [2024-06-15 20:00:53,896][1652491] InferenceWorker_p0-w0: stopping experience collection (36000 times) [2024-06-15 20:00:54,155][1651469] Signal inference workers to resume experience collection... (36000 times) [2024-06-15 20:00:54,158][1652491] InferenceWorker_p0-w0: resuming experience collection (36000 times) [2024-06-15 20:00:55,438][1652491] Updated weights for policy 0, policy_version 692672 (0.0013) [2024-06-15 20:00:55,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 46967.3, 300 sec: 46766.2). Total num frames: 1418625024. Throughput: 0: 11446.0. Samples: 354695680. 
Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:00:55,956][1648985] Avg episode reward: [(0, '148.580')] [2024-06-15 20:00:56,467][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000692720_1418690560.pth... [2024-06-15 20:00:56,507][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000687232_1407451136.pth [2024-06-15 20:00:56,722][1652491] Updated weights for policy 0, policy_version 692730 (0.0012) [2024-06-15 20:01:00,955][1648985] Fps is (10 sec: 45911.7, 60 sec: 43690.5, 300 sec: 46430.5). Total num frames: 1418723328. Throughput: 0: 11491.5. Samples: 354769408. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:01:00,956][1648985] Avg episode reward: [(0, '137.770')] [2024-06-15 20:01:02,263][1652491] Updated weights for policy 0, policy_version 692784 (0.0043) [2024-06-15 20:01:04,372][1652491] Updated weights for policy 0, policy_version 692851 (0.0014) [2024-06-15 20:01:05,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 47513.5, 300 sec: 46766.2). Total num frames: 1419083776. Throughput: 0: 11514.3. Samples: 354832384. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:01:05,956][1648985] Avg episode reward: [(0, '120.200')] [2024-06-15 20:01:06,567][1652491] Updated weights for policy 0, policy_version 692930 (0.0015) [2024-06-15 20:01:08,003][1652491] Updated weights for policy 0, policy_version 692989 (0.0013) [2024-06-15 20:01:10,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 44236.8, 300 sec: 46652.8). Total num frames: 1419247616. Throughput: 0: 11343.6. Samples: 354859520. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:01:10,956][1648985] Avg episode reward: [(0, '117.040')] [2024-06-15 20:01:14,719][1652491] Updated weights for policy 0, policy_version 693056 (0.0114) [2024-06-15 20:01:15,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46967.5, 300 sec: 46542.5). Total num frames: 1419476992. Throughput: 0: 11286.7. 
Samples: 354937856. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:01:15,956][1648985] Avg episode reward: [(0, '134.560')] [2024-06-15 20:01:16,325][1652491] Updated weights for policy 0, policy_version 693121 (0.0116) [2024-06-15 20:01:18,169][1652491] Updated weights for policy 0, policy_version 693185 (0.0014) [2024-06-15 20:01:19,487][1652491] Updated weights for policy 0, policy_version 693241 (0.0010) [2024-06-15 20:01:20,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46421.5, 300 sec: 46874.9). Total num frames: 1419771904. Throughput: 0: 11366.4. Samples: 355000832. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:01:20,955][1648985] Avg episode reward: [(0, '148.060')] [2024-06-15 20:01:25,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 1419870208. Throughput: 0: 11389.2. Samples: 355044864. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:01:25,955][1648985] Avg episode reward: [(0, '132.120')] [2024-06-15 20:01:25,983][1652491] Updated weights for policy 0, policy_version 693304 (0.0016) [2024-06-15 20:01:26,977][1652491] Updated weights for policy 0, policy_version 693344 (0.0014) [2024-06-15 20:01:28,318][1652491] Updated weights for policy 0, policy_version 693394 (0.0022) [2024-06-15 20:01:30,014][1652491] Updated weights for policy 0, policy_version 693472 (0.0013) [2024-06-15 20:01:30,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48605.8, 300 sec: 46986.0). Total num frames: 1420296192. Throughput: 0: 11434.6. Samples: 355100672. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:01:30,956][1648985] Avg episode reward: [(0, '120.700')] [2024-06-15 20:01:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 46097.4). Total num frames: 1420296192. Throughput: 0: 11482.3. Samples: 355185152. 
Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:01:35,956][1648985] Avg episode reward: [(0, '139.730')] [2024-06-15 20:01:36,019][1652491] Updated weights for policy 0, policy_version 693507 (0.0012) [2024-06-15 20:01:36,424][1651469] Signal inference workers to stop experience collection... (36050 times) [2024-06-15 20:01:36,486][1652491] InferenceWorker_p0-w0: stopping experience collection (36050 times) [2024-06-15 20:01:36,678][1651469] Signal inference workers to resume experience collection... (36050 times) [2024-06-15 20:01:36,679][1652491] InferenceWorker_p0-w0: resuming experience collection (36050 times) [2024-06-15 20:01:37,353][1652491] Updated weights for policy 0, policy_version 693563 (0.0016) [2024-06-15 20:01:38,935][1652491] Updated weights for policy 0, policy_version 693617 (0.0013) [2024-06-15 20:01:40,363][1652491] Updated weights for policy 0, policy_version 693664 (0.0017) [2024-06-15 20:01:40,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 46967.6, 300 sec: 46541.7). Total num frames: 1420656640. Throughput: 0: 11480.2. Samples: 355212288. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:01:40,956][1648985] Avg episode reward: [(0, '154.550')] [2024-06-15 20:01:42,191][1652491] Updated weights for policy 0, policy_version 693744 (0.0013) [2024-06-15 20:01:45,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 43690.5, 300 sec: 46319.5). Total num frames: 1420820480. Throughput: 0: 11491.6. Samples: 355286528. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:01:45,956][1648985] Avg episode reward: [(0, '146.820')] [2024-06-15 20:01:47,109][1652491] Updated weights for policy 0, policy_version 693776 (0.0011) [2024-06-15 20:01:49,984][1652491] Updated weights for policy 0, policy_version 693862 (0.0015) [2024-06-15 20:01:50,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46974.0, 300 sec: 46208.5). Total num frames: 1421082624. Throughput: 0: 11525.7. Samples: 355351040. 
Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:01:50,955][1648985] Avg episode reward: [(0, '141.200')] [2024-06-15 20:01:51,308][1652491] Updated weights for policy 0, policy_version 693920 (0.0015) [2024-06-15 20:01:53,302][1652491] Updated weights for policy 0, policy_version 694000 (0.0013) [2024-06-15 20:01:55,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 1421344768. Throughput: 0: 11525.7. Samples: 355378176. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:01:55,955][1648985] Avg episode reward: [(0, '145.140')] [2024-06-15 20:01:59,257][1652491] Updated weights for policy 0, policy_version 694064 (0.0013) [2024-06-15 20:02:00,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46421.6, 300 sec: 45986.3). Total num frames: 1421508608. Throughput: 0: 11537.1. Samples: 355457024. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:02:00,955][1648985] Avg episode reward: [(0, '131.430')] [2024-06-15 20:02:01,985][1652491] Updated weights for policy 0, policy_version 694141 (0.0013) [2024-06-15 20:02:03,965][1652491] Updated weights for policy 0, policy_version 694208 (0.0012) [2024-06-15 20:02:05,437][1652491] Updated weights for policy 0, policy_version 694266 (0.0044) [2024-06-15 20:02:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46421.5, 300 sec: 46652.8). Total num frames: 1421869056. Throughput: 0: 11434.7. Samples: 355515392. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:02:05,955][1648985] Avg episode reward: [(0, '152.180')] [2024-06-15 20:02:10,827][1652491] Updated weights for policy 0, policy_version 694327 (0.0100) [2024-06-15 20:02:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 1421967360. Throughput: 0: 11457.4. Samples: 355560448. 
Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:02:10,956][1648985] Avg episode reward: [(0, '148.910')] [2024-06-15 20:02:13,687][1652491] Updated weights for policy 0, policy_version 694399 (0.0014) [2024-06-15 20:02:15,524][1652491] Updated weights for policy 0, policy_version 694469 (0.0105) [2024-06-15 20:02:15,707][1651469] Signal inference workers to stop experience collection... (36100 times) [2024-06-15 20:02:15,753][1652491] InferenceWorker_p0-w0: stopping experience collection (36100 times) [2024-06-15 20:02:15,955][1648985] Fps is (10 sec: 42597.1, 60 sec: 46967.4, 300 sec: 46320.6). Total num frames: 1422295040. Throughput: 0: 11559.8. Samples: 355620864. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:02:15,956][1648985] Avg episode reward: [(0, '158.170')] [2024-06-15 20:02:16,006][1651469] Signal inference workers to resume experience collection... (36100 times) [2024-06-15 20:02:16,007][1652491] InferenceWorker_p0-w0: resuming experience collection (36100 times) [2024-06-15 20:02:16,573][1652491] Updated weights for policy 0, policy_version 694526 (0.0060) [2024-06-15 20:02:20,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 1422393344. Throughput: 0: 11491.5. Samples: 355702272. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:02:20,956][1648985] Avg episode reward: [(0, '173.490')] [2024-06-15 20:02:22,173][1652491] Updated weights for policy 0, policy_version 694576 (0.0118) [2024-06-15 20:02:24,203][1652491] Updated weights for policy 0, policy_version 694640 (0.0015) [2024-06-15 20:02:25,291][1652491] Updated weights for policy 0, policy_version 694672 (0.0013) [2024-06-15 20:02:25,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 48059.6, 300 sec: 46101.0). Total num frames: 1422753792. Throughput: 0: 11457.4. Samples: 355727872. 
Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:02:25,956][1648985] Avg episode reward: [(0, '170.940')] [2024-06-15 20:02:26,880][1652491] Updated weights for policy 0, policy_version 694752 (0.0093) [2024-06-15 20:02:30,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 1422917632. Throughput: 0: 11514.4. Samples: 355804672. Policy #0 lag: (min: 14.0, avg: 82.6, max: 270.0) [2024-06-15 20:02:30,956][1648985] Avg episode reward: [(0, '170.850')] [2024-06-15 20:02:32,868][1652491] Updated weights for policy 0, policy_version 694816 (0.0013) [2024-06-15 20:02:34,077][1652491] Updated weights for policy 0, policy_version 694866 (0.0019) [2024-06-15 20:02:35,057][1652491] Updated weights for policy 0, policy_version 694912 (0.0023) [2024-06-15 20:02:35,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 48605.9, 300 sec: 46208.6). Total num frames: 1423212544. Throughput: 0: 11696.3. Samples: 355877376. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:02:35,956][1648985] Avg episode reward: [(0, '148.170')] [2024-06-15 20:02:36,864][1652491] Updated weights for policy 0, policy_version 694960 (0.0012) [2024-06-15 20:02:38,482][1652491] Updated weights for policy 0, policy_version 695039 (0.0015) [2024-06-15 20:02:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 1423441920. Throughput: 0: 11776.0. Samples: 355908096. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:02:40,956][1648985] Avg episode reward: [(0, '170.560')] [2024-06-15 20:02:45,001][1652491] Updated weights for policy 0, policy_version 695120 (0.0068) [2024-06-15 20:02:45,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48059.9, 300 sec: 46208.6). Total num frames: 1423704064. Throughput: 0: 11662.2. Samples: 355981824. 
Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:02:45,956][1648985] Avg episode reward: [(0, '176.180')] [2024-06-15 20:02:47,737][1652491] Updated weights for policy 0, policy_version 695200 (0.0013) [2024-06-15 20:02:49,629][1652491] Updated weights for policy 0, policy_version 695294 (0.0015) [2024-06-15 20:02:50,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.7, 300 sec: 46209.0). Total num frames: 1423966208. Throughput: 0: 11867.0. Samples: 356049408. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:02:50,955][1648985] Avg episode reward: [(0, '164.620')] [2024-06-15 20:02:55,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 45329.0, 300 sec: 45986.3). Total num frames: 1424064512. Throughput: 0: 11832.9. Samples: 356092928. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:02:55,956][1648985] Avg episode reward: [(0, '153.660')] [2024-06-15 20:02:56,517][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000695376_1424130048.pth... [2024-06-15 20:02:56,715][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000689984_1413087232.pth [2024-06-15 20:02:57,201][1652491] Updated weights for policy 0, policy_version 695397 (0.0015) [2024-06-15 20:02:58,879][1652491] Updated weights for policy 0, policy_version 695459 (0.0012) [2024-06-15 20:03:00,549][1652491] Updated weights for policy 0, policy_version 695504 (0.0013) [2024-06-15 20:03:00,947][1651469] Signal inference workers to stop experience collection... (36150 times) [2024-06-15 20:03:00,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48605.9, 300 sec: 46430.6). Total num frames: 1424424960. Throughput: 0: 11776.1. Samples: 356150784. 
Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:03:00,955][1648985] Avg episode reward: [(0, '152.730')] [2024-06-15 20:03:01,017][1652491] InferenceWorker_p0-w0: stopping experience collection (36150 times) [2024-06-15 20:03:01,185][1651469] Signal inference workers to resume experience collection... (36150 times) [2024-06-15 20:03:01,186][1652491] InferenceWorker_p0-w0: resuming experience collection (36150 times) [2024-06-15 20:03:05,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 43690.6, 300 sec: 45765.9). Total num frames: 1424490496. Throughput: 0: 11673.6. Samples: 356227584. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:03:05,956][1648985] Avg episode reward: [(0, '158.480')] [2024-06-15 20:03:06,426][1652491] Updated weights for policy 0, policy_version 695557 (0.0015) [2024-06-15 20:03:08,092][1652491] Updated weights for policy 0, policy_version 695625 (0.0012) [2024-06-15 20:03:09,658][1652491] Updated weights for policy 0, policy_version 695685 (0.0013) [2024-06-15 20:03:10,833][1652491] Updated weights for policy 0, policy_version 695742 (0.0012) [2024-06-15 20:03:10,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 48605.8, 300 sec: 46208.4). Total num frames: 1424883712. Throughput: 0: 11787.4. Samples: 356258304. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:03:10,956][1648985] Avg episode reward: [(0, '152.660')] [2024-06-15 20:03:15,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45329.2, 300 sec: 45875.2). Total num frames: 1425014784. Throughput: 0: 11525.7. Samples: 356323328. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:03:15,956][1648985] Avg episode reward: [(0, '143.940')] [2024-06-15 20:03:18,301][1652491] Updated weights for policy 0, policy_version 695810 (0.0013) [2024-06-15 20:03:20,072][1652491] Updated weights for policy 0, policy_version 695890 (0.0013) [2024-06-15 20:03:20,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 47513.5, 300 sec: 45875.2). 
Total num frames: 1425244160. Throughput: 0: 11355.0. Samples: 356388352. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:03:20,956][1648985] Avg episode reward: [(0, '143.100')] [2024-06-15 20:03:21,526][1652491] Updated weights for policy 0, policy_version 695952 (0.0013) [2024-06-15 20:03:22,691][1652491] Updated weights for policy 0, policy_version 695999 (0.0011) [2024-06-15 20:03:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 1425539072. Throughput: 0: 11446.1. Samples: 356423168. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:03:25,956][1648985] Avg episode reward: [(0, '128.330')] [2024-06-15 20:03:29,929][1652491] Updated weights for policy 0, policy_version 696080 (0.0013) [2024-06-15 20:03:30,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 1425670144. Throughput: 0: 11594.0. Samples: 356503552. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:03:30,956][1648985] Avg episode reward: [(0, '117.750')] [2024-06-15 20:03:31,901][1652491] Updated weights for policy 0, policy_version 696176 (0.0013) [2024-06-15 20:03:32,794][1652491] Updated weights for policy 0, policy_version 696224 (0.0014) [2024-06-15 20:03:34,530][1652491] Updated weights for policy 0, policy_version 696276 (0.0014) [2024-06-15 20:03:35,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 47513.5, 300 sec: 46208.4). Total num frames: 1426063360. Throughput: 0: 11480.1. Samples: 356566016. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:03:35,956][1648985] Avg episode reward: [(0, '125.960')] [2024-06-15 20:03:40,942][1652491] Updated weights for policy 0, policy_version 696336 (0.0012) [2024-06-15 20:03:40,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44236.9, 300 sec: 45764.1). Total num frames: 1426096128. Throughput: 0: 11411.9. Samples: 356606464. 
Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:03:40,956][1648985] Avg episode reward: [(0, '142.960')] [2024-06-15 20:03:43,313][1652491] Updated weights for policy 0, policy_version 696432 (0.0013) [2024-06-15 20:03:43,622][1651469] Signal inference workers to stop experience collection... (36200 times) [2024-06-15 20:03:43,701][1652491] InferenceWorker_p0-w0: stopping experience collection (36200 times) [2024-06-15 20:03:43,714][1651469] Signal inference workers to resume experience collection... (36200 times) [2024-06-15 20:03:43,728][1652491] InferenceWorker_p0-w0: resuming experience collection (36200 times) [2024-06-15 20:03:44,271][1652491] Updated weights for policy 0, policy_version 696466 (0.0013) [2024-06-15 20:03:45,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 1426489344. Throughput: 0: 11411.9. Samples: 356664320. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:03:45,956][1648985] Avg episode reward: [(0, '149.950')] [2024-06-15 20:03:45,967][1652491] Updated weights for policy 0, policy_version 696532 (0.0014) [2024-06-15 20:03:50,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 43690.5, 300 sec: 45764.1). Total num frames: 1426587648. Throughput: 0: 11480.1. Samples: 356744192. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:03:50,956][1648985] Avg episode reward: [(0, '166.630')] [2024-06-15 20:03:53,260][1652491] Updated weights for policy 0, policy_version 696608 (0.0108) [2024-06-15 20:03:55,677][1652491] Updated weights for policy 0, policy_version 696704 (0.0013) [2024-06-15 20:03:55,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 46421.3, 300 sec: 45764.1). Total num frames: 1426849792. Throughput: 0: 11468.8. Samples: 356774400. 
Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:03:55,956][1648985] Avg episode reward: [(0, '176.190')] [2024-06-15 20:03:56,902][1652491] Updated weights for policy 0, policy_version 696766 (0.0012) [2024-06-15 20:03:58,634][1652491] Updated weights for policy 0, policy_version 696822 (0.0021) [2024-06-15 20:04:00,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 1427111936. Throughput: 0: 11332.3. Samples: 356833280. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:04:00,956][1648985] Avg episode reward: [(0, '162.270')] [2024-06-15 20:04:04,468][1652491] Updated weights for policy 0, policy_version 696850 (0.0011) [2024-06-15 20:04:05,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46421.4, 300 sec: 45764.1). Total num frames: 1427275776. Throughput: 0: 11514.4. Samples: 356906496. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:04:05,955][1648985] Avg episode reward: [(0, '161.680')] [2024-06-15 20:04:06,429][1652491] Updated weights for policy 0, policy_version 696929 (0.0013) [2024-06-15 20:04:08,244][1652491] Updated weights for policy 0, policy_version 697018 (0.0119) [2024-06-15 20:04:10,327][1652491] Updated weights for policy 0, policy_version 697056 (0.0012) [2024-06-15 20:04:10,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 1427603456. Throughput: 0: 11332.3. Samples: 356933120. Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:04:10,956][1648985] Avg episode reward: [(0, '150.990')] [2024-06-15 20:04:11,134][1652491] Updated weights for policy 0, policy_version 697086 (0.0015) [2024-06-15 20:04:15,955][1648985] Fps is (10 sec: 36044.1, 60 sec: 43690.6, 300 sec: 45653.0). Total num frames: 1427636224. Throughput: 0: 11298.1. Samples: 357011968. 
Policy #0 lag: (min: 15.0, avg: 82.6, max: 271.0) [2024-06-15 20:04:15,956][1648985] Avg episode reward: [(0, '161.780')] [2024-06-15 20:04:17,418][1652491] Updated weights for policy 0, policy_version 697152 (0.0100) [2024-06-15 20:04:19,404][1652491] Updated weights for policy 0, policy_version 697232 (0.0016) [2024-06-15 20:04:20,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.5, 300 sec: 45878.5). Total num frames: 1428029440. Throughput: 0: 11127.5. Samples: 357066752. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:04:20,956][1648985] Avg episode reward: [(0, '163.250')] [2024-06-15 20:04:22,059][1652491] Updated weights for policy 0, policy_version 697300 (0.0013) [2024-06-15 20:04:25,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 1428160512. Throughput: 0: 11116.1. Samples: 357106688. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:04:25,956][1648985] Avg episode reward: [(0, '182.630')] [2024-06-15 20:04:27,040][1652491] Updated weights for policy 0, policy_version 697347 (0.0010) [2024-06-15 20:04:27,343][1651469] Signal inference workers to stop experience collection... (36250 times) [2024-06-15 20:04:27,403][1652491] InferenceWorker_p0-w0: stopping experience collection (36250 times) [2024-06-15 20:04:27,583][1651469] Signal inference workers to resume experience collection... (36250 times) [2024-06-15 20:04:27,584][1652491] InferenceWorker_p0-w0: resuming experience collection (36250 times) [2024-06-15 20:04:28,371][1652491] Updated weights for policy 0, policy_version 697410 (0.0030) [2024-06-15 20:04:29,735][1652491] Updated weights for policy 0, policy_version 697467 (0.0015) [2024-06-15 20:04:30,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 46967.3, 300 sec: 45986.2). Total num frames: 1428488192. Throughput: 0: 11366.4. Samples: 357175808. 
Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:04:30,956][1648985] Avg episode reward: [(0, '178.710')] [2024-06-15 20:04:31,100][1652491] Updated weights for policy 0, policy_version 697507 (0.0069) [2024-06-15 20:04:33,367][1652491] Updated weights for policy 0, policy_version 697568 (0.0011) [2024-06-15 20:04:35,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 1428684800. Throughput: 0: 11116.1. Samples: 357244416. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:04:35,955][1648985] Avg episode reward: [(0, '172.230')] [2024-06-15 20:04:38,762][1652491] Updated weights for policy 0, policy_version 697619 (0.0015) [2024-06-15 20:04:39,909][1652491] Updated weights for policy 0, policy_version 697667 (0.0023) [2024-06-15 20:04:40,955][1648985] Fps is (10 sec: 39322.5, 60 sec: 46421.3, 300 sec: 45764.1). Total num frames: 1428881408. Throughput: 0: 11252.6. Samples: 357280768. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:04:40,956][1648985] Avg episode reward: [(0, '147.700')] [2024-06-15 20:04:41,796][1652491] Updated weights for policy 0, policy_version 697733 (0.0013) [2024-06-15 20:04:44,352][1652491] Updated weights for policy 0, policy_version 697801 (0.0013) [2024-06-15 20:04:45,489][1652491] Updated weights for policy 0, policy_version 697845 (0.0012) [2024-06-15 20:04:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 1429209088. Throughput: 0: 11457.4. Samples: 357348864. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:04:45,956][1648985] Avg episode reward: [(0, '135.800')] [2024-06-15 20:04:49,989][1652491] Updated weights for policy 0, policy_version 697875 (0.0016) [2024-06-15 20:04:50,850][1652491] Updated weights for policy 0, policy_version 697917 (0.0011) [2024-06-15 20:04:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.4, 300 sec: 45875.2). Total num frames: 1429340160. 
Throughput: 0: 11400.5. Samples: 357419520. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:04:50,956][1648985] Avg episode reward: [(0, '141.510')] [2024-06-15 20:04:52,344][1652491] Updated weights for policy 0, policy_version 697968 (0.0013) [2024-06-15 20:04:53,634][1652491] Updated weights for policy 0, policy_version 698016 (0.0012) [2024-06-15 20:04:55,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 1429635072. Throughput: 0: 11502.9. Samples: 357450752. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:04:55,955][1648985] Avg episode reward: [(0, '177.300')] [2024-06-15 20:04:56,590][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000698096_1429700608.pth... [2024-06-15 20:04:56,636][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000692720_1418690560.pth [2024-06-15 20:04:56,759][1652491] Updated weights for policy 0, policy_version 698100 (0.0012) [2024-06-15 20:05:00,712][1652491] Updated weights for policy 0, policy_version 698142 (0.0125) [2024-06-15 20:05:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 1429798912. Throughput: 0: 11411.9. Samples: 357525504. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:05:00,956][1648985] Avg episode reward: [(0, '185.150')] [2024-06-15 20:05:03,363][1652491] Updated weights for policy 0, policy_version 698200 (0.0076) [2024-06-15 20:05:04,376][1652491] Updated weights for policy 0, policy_version 698242 (0.0012) [2024-06-15 20:05:05,394][1652491] Updated weights for policy 0, policy_version 698293 (0.0013) [2024-06-15 20:05:05,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 47513.4, 300 sec: 45875.2). Total num frames: 1430126592. Throughput: 0: 11730.4. Samples: 357594624. 
Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:05:05,956][1648985] Avg episode reward: [(0, '176.940')] [2024-06-15 20:05:07,401][1652491] Updated weights for policy 0, policy_version 698344 (0.0015) [2024-06-15 20:05:10,187][1652491] Updated weights for policy 0, policy_version 698369 (0.0013) [2024-06-15 20:05:10,680][1651469] Signal inference workers to stop experience collection... (36300 times) [2024-06-15 20:05:10,727][1652491] InferenceWorker_p0-w0: stopping experience collection (36300 times) [2024-06-15 20:05:10,910][1651469] Signal inference workers to resume experience collection... (36300 times) [2024-06-15 20:05:10,911][1652491] InferenceWorker_p0-w0: resuming experience collection (36300 times) [2024-06-15 20:05:10,971][1648985] Fps is (10 sec: 52344.6, 60 sec: 45316.9, 300 sec: 46317.0). Total num frames: 1430323200. Throughput: 0: 11680.8. Samples: 357632512. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:05:10,972][1648985] Avg episode reward: [(0, '139.820')] [2024-06-15 20:05:14,952][1652491] Updated weights for policy 0, policy_version 698464 (0.0012) [2024-06-15 20:05:15,955][1648985] Fps is (10 sec: 39322.8, 60 sec: 48060.0, 300 sec: 45875.3). Total num frames: 1430519808. Throughput: 0: 11730.6. Samples: 357703680. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:05:15,955][1648985] Avg episode reward: [(0, '135.970')] [2024-06-15 20:05:16,642][1652491] Updated weights for policy 0, policy_version 698533 (0.0089) [2024-06-15 20:05:18,310][1652491] Updated weights for policy 0, policy_version 698577 (0.0014) [2024-06-15 20:05:20,955][1648985] Fps is (10 sec: 45949.5, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 1430781952. Throughput: 0: 11685.0. Samples: 357770240. 
Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:05:20,955][1648985] Avg episode reward: [(0, '132.090')] [2024-06-15 20:05:21,979][1652491] Updated weights for policy 0, policy_version 698643 (0.0015) [2024-06-15 20:05:25,876][1652491] Updated weights for policy 0, policy_version 698690 (0.0013) [2024-06-15 20:05:25,955][1648985] Fps is (10 sec: 39320.3, 60 sec: 45875.1, 300 sec: 45875.2). Total num frames: 1430913024. Throughput: 0: 11673.6. Samples: 357806080. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:05:25,956][1648985] Avg episode reward: [(0, '133.670')] [2024-06-15 20:05:27,819][1652491] Updated weights for policy 0, policy_version 698756 (0.0017) [2024-06-15 20:05:28,839][1652491] Updated weights for policy 0, policy_version 698815 (0.0012) [2024-06-15 20:05:30,307][1652491] Updated weights for policy 0, policy_version 698874 (0.0012) [2024-06-15 20:05:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.7, 300 sec: 46208.4). Total num frames: 1431306240. Throughput: 0: 11707.7. Samples: 357875712. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:05:30,955][1648985] Avg episode reward: [(0, '145.280')] [2024-06-15 20:05:33,691][1652491] Updated weights for policy 0, policy_version 698933 (0.0012) [2024-06-15 20:05:35,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 1431437312. Throughput: 0: 11878.4. Samples: 357954048. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:05:35,955][1648985] Avg episode reward: [(0, '142.510')] [2024-06-15 20:05:37,384][1652491] Updated weights for policy 0, policy_version 698978 (0.0013) [2024-06-15 20:05:38,815][1652491] Updated weights for policy 0, policy_version 699041 (0.0013) [2024-06-15 20:05:40,557][1652491] Updated weights for policy 0, policy_version 699089 (0.0015) [2024-06-15 20:05:40,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 48059.7, 300 sec: 45986.3). Total num frames: 1431764992. 
Throughput: 0: 11844.2. Samples: 357983744. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:05:40,956][1648985] Avg episode reward: [(0, '136.610')] [2024-06-15 20:05:41,343][1652491] Updated weights for policy 0, policy_version 699133 (0.0013) [2024-06-15 20:05:44,551][1652491] Updated weights for policy 0, policy_version 699193 (0.0026) [2024-06-15 20:05:45,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.1, 300 sec: 46431.9). Total num frames: 1431961600. Throughput: 0: 11798.7. Samples: 358056448. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:05:45,956][1648985] Avg episode reward: [(0, '133.010')] [2024-06-15 20:05:48,174][1652491] Updated weights for policy 0, policy_version 699256 (0.0013) [2024-06-15 20:05:49,271][1652491] Updated weights for policy 0, policy_version 699296 (0.0020) [2024-06-15 20:05:49,818][1652491] Updated weights for policy 0, policy_version 699328 (0.0016) [2024-06-15 20:05:50,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 48605.9, 300 sec: 46208.5). Total num frames: 1432256512. Throughput: 0: 11980.8. Samples: 358133760. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:05:50,956][1648985] Avg episode reward: [(0, '135.780')] [2024-06-15 20:05:51,165][1651469] Signal inference workers to stop experience collection... (36350 times) [2024-06-15 20:05:51,199][1652491] InferenceWorker_p0-w0: stopping experience collection (36350 times) [2024-06-15 20:05:51,323][1651469] Signal inference workers to resume experience collection... (36350 times) [2024-06-15 20:05:51,324][1652491] InferenceWorker_p0-w0: resuming experience collection (36350 times) [2024-06-15 20:05:51,457][1652491] Updated weights for policy 0, policy_version 699384 (0.0014) [2024-06-15 20:05:54,075][1652491] Updated weights for policy 0, policy_version 699431 (0.0105) [2024-06-15 20:05:55,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 47513.5, 300 sec: 46652.8). Total num frames: 1432485888. Throughput: 0: 11985.1. 
Samples: 358171648. Policy #0 lag: (min: 15.0, avg: 75.6, max: 271.0) [2024-06-15 20:05:55,956][1648985] Avg episode reward: [(0, '142.630')] [2024-06-15 20:05:58,188][1652491] Updated weights for policy 0, policy_version 699488 (0.0017) [2024-06-15 20:05:59,684][1652491] Updated weights for policy 0, policy_version 699537 (0.0012) [2024-06-15 20:06:00,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 46319.5). Total num frames: 1432748032. Throughput: 0: 12083.2. Samples: 358247424. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:06:00,956][1648985] Avg episode reward: [(0, '157.560')] [2024-06-15 20:06:01,029][1652491] Updated weights for policy 0, policy_version 699600 (0.0013) [2024-06-15 20:06:04,322][1652491] Updated weights for policy 0, policy_version 699650 (0.0014) [2024-06-15 20:06:05,331][1652491] Updated weights for policy 0, policy_version 699698 (0.0032) [2024-06-15 20:06:05,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 1433010176. Throughput: 0: 12151.4. Samples: 358317056. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:06:05,956][1648985] Avg episode reward: [(0, '158.340')] [2024-06-15 20:06:09,227][1652491] Updated weights for policy 0, policy_version 699748 (0.0012) [2024-06-15 20:06:10,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46980.1, 300 sec: 46319.5). Total num frames: 1433141248. Throughput: 0: 12265.3. Samples: 358358016. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:06:10,956][1648985] Avg episode reward: [(0, '152.830')] [2024-06-15 20:06:11,738][1652491] Updated weights for policy 0, policy_version 699824 (0.0012) [2024-06-15 20:06:12,890][1652491] Updated weights for policy 0, policy_version 699879 (0.0014) [2024-06-15 20:06:15,627][1652491] Updated weights for policy 0, policy_version 699936 (0.0014) [2024-06-15 20:06:15,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 49151.7, 300 sec: 46430.6). 
Total num frames: 1433468928. Throughput: 0: 12276.6. Samples: 358428160. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:06:15,956][1648985] Avg episode reward: [(0, '146.960')] [2024-06-15 20:06:16,382][1652491] Updated weights for policy 0, policy_version 699964 (0.0017) [2024-06-15 20:06:20,577][1652491] Updated weights for policy 0, policy_version 700028 (0.0129) [2024-06-15 20:06:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 1433665536. Throughput: 0: 12151.5. Samples: 358500864. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:06:20,955][1648985] Avg episode reward: [(0, '147.430')] [2024-06-15 20:06:22,426][1652491] Updated weights for policy 0, policy_version 700067 (0.0044) [2024-06-15 20:06:25,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 50244.2, 300 sec: 46208.4). Total num frames: 1433927680. Throughput: 0: 12151.4. Samples: 358530560. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:06:25,956][1648985] Avg episode reward: [(0, '157.350')] [2024-06-15 20:06:26,496][1652491] Updated weights for policy 0, policy_version 700161 (0.0020) [2024-06-15 20:06:30,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1434058752. Throughput: 0: 12026.3. Samples: 358597632. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:06:30,956][1648985] Avg episode reward: [(0, '171.750')] [2024-06-15 20:06:31,560][1652491] Updated weights for policy 0, policy_version 700240 (0.0015) [2024-06-15 20:06:33,770][1652491] Updated weights for policy 0, policy_version 700304 (0.0016) [2024-06-15 20:06:34,891][1651469] Signal inference workers to stop experience collection... (36400 times) [2024-06-15 20:06:34,948][1652491] InferenceWorker_p0-w0: stopping experience collection (36400 times) [2024-06-15 20:06:35,183][1651469] Signal inference workers to resume experience collection... 
(36400 times) [2024-06-15 20:06:35,183][1652491] InferenceWorker_p0-w0: resuming experience collection (36400 times) [2024-06-15 20:06:35,763][1652491] Updated weights for policy 0, policy_version 700385 (0.0013) [2024-06-15 20:06:35,956][1648985] Fps is (10 sec: 45876.4, 60 sec: 49152.0, 300 sec: 46541.7). Total num frames: 1434386432. Throughput: 0: 11650.9. Samples: 358658048. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:06:35,956][1648985] Avg episode reward: [(0, '169.480')] [2024-06-15 20:06:38,624][1652491] Updated weights for policy 0, policy_version 700432 (0.0013) [2024-06-15 20:06:39,524][1652491] Updated weights for policy 0, policy_version 700476 (0.0022) [2024-06-15 20:06:40,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 46967.4, 300 sec: 46652.8). Total num frames: 1434583040. Throughput: 0: 11616.7. Samples: 358694400. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:06:40,956][1648985] Avg episode reward: [(0, '184.290')] [2024-06-15 20:06:43,571][1652491] Updated weights for policy 0, policy_version 700513 (0.0081) [2024-06-15 20:06:44,495][1652491] Updated weights for policy 0, policy_version 700547 (0.0033) [2024-06-15 20:06:45,942][1652491] Updated weights for policy 0, policy_version 700610 (0.0014) [2024-06-15 20:06:45,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 1434845184. Throughput: 0: 11650.8. Samples: 358771712. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:06:45,956][1648985] Avg episode reward: [(0, '172.420')] [2024-06-15 20:06:46,943][1652491] Updated weights for policy 0, policy_version 700656 (0.0024) [2024-06-15 20:06:47,367][1652491] Updated weights for policy 0, policy_version 700669 (0.0031) [2024-06-15 20:06:50,016][1652491] Updated weights for policy 0, policy_version 700724 (0.0020) [2024-06-15 20:06:50,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 47513.4, 300 sec: 46652.7). Total num frames: 1435107328. 
Throughput: 0: 11616.7. Samples: 358839808. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:06:50,956][1648985] Avg episode reward: [(0, '142.010')] [2024-06-15 20:06:54,474][1652491] Updated weights for policy 0, policy_version 700769 (0.0013) [2024-06-15 20:06:55,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 45875.1, 300 sec: 46541.6). Total num frames: 1435238400. Throughput: 0: 11639.4. Samples: 358881792. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:06:55,956][1648985] Avg episode reward: [(0, '133.250')] [2024-06-15 20:06:56,269][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000700816_1435271168.pth... [2024-06-15 20:06:56,463][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000695376_1424130048.pth [2024-06-15 20:06:57,380][1652491] Updated weights for policy 0, policy_version 700850 (0.0014) [2024-06-15 20:06:58,891][1652491] Updated weights for policy 0, policy_version 700919 (0.0017) [2024-06-15 20:07:00,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 1435533312. Throughput: 0: 11366.4. Samples: 358939648. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:07:00,956][1648985] Avg episode reward: [(0, '141.940')] [2024-06-15 20:07:01,538][1652491] Updated weights for policy 0, policy_version 700964 (0.0012) [2024-06-15 20:07:05,357][1652491] Updated weights for policy 0, policy_version 700999 (0.0013) [2024-06-15 20:07:05,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 44782.9, 300 sec: 46541.7). Total num frames: 1435697152. Throughput: 0: 11491.5. Samples: 359017984. 
Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:07:05,956][1648985] Avg episode reward: [(0, '164.890')] [2024-06-15 20:07:06,241][1652491] Updated weights for policy 0, policy_version 701052 (0.0012) [2024-06-15 20:07:08,189][1652491] Updated weights for policy 0, policy_version 701107 (0.0022) [2024-06-15 20:07:09,663][1652491] Updated weights for policy 0, policy_version 701168 (0.0011) [2024-06-15 20:07:10,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 1436024832. Throughput: 0: 11480.3. Samples: 359047168. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:07:10,955][1648985] Avg episode reward: [(0, '177.220')] [2024-06-15 20:07:11,925][1652491] Updated weights for policy 0, policy_version 701202 (0.0011) [2024-06-15 20:07:15,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 44783.1, 300 sec: 46652.8). Total num frames: 1436155904. Throughput: 0: 11582.6. Samples: 359118848. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:07:15,955][1648985] Avg episode reward: [(0, '165.930')] [2024-06-15 20:07:16,792][1652491] Updated weights for policy 0, policy_version 701264 (0.0014) [2024-06-15 20:07:18,821][1652491] Updated weights for policy 0, policy_version 701328 (0.0097) [2024-06-15 20:07:19,178][1651469] Signal inference workers to stop experience collection... (36450 times) [2024-06-15 20:07:19,256][1652491] InferenceWorker_p0-w0: stopping experience collection (36450 times) [2024-06-15 20:07:19,372][1651469] Signal inference workers to resume experience collection... (36450 times) [2024-06-15 20:07:19,373][1652491] InferenceWorker_p0-w0: resuming experience collection (36450 times) [2024-06-15 20:07:20,654][1652491] Updated weights for policy 0, policy_version 701410 (0.0012) [2024-06-15 20:07:20,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 1436516352. Throughput: 0: 11696.4. Samples: 359184384. 
Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:07:20,955][1648985] Avg episode reward: [(0, '162.960')] [2024-06-15 20:07:23,319][1652491] Updated weights for policy 0, policy_version 701456 (0.0014) [2024-06-15 20:07:24,454][1652491] Updated weights for policy 0, policy_version 701497 (0.0011) [2024-06-15 20:07:25,955][1648985] Fps is (10 sec: 52427.0, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1436680192. Throughput: 0: 11696.3. Samples: 359220736. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:07:25,956][1648985] Avg episode reward: [(0, '151.410')] [2024-06-15 20:07:29,059][1652491] Updated weights for policy 0, policy_version 701560 (0.0013) [2024-06-15 20:07:30,431][1652491] Updated weights for policy 0, policy_version 701606 (0.0013) [2024-06-15 20:07:30,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 1436942336. Throughput: 0: 11628.1. Samples: 359294976. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:07:30,956][1648985] Avg episode reward: [(0, '144.320')] [2024-06-15 20:07:32,161][1652491] Updated weights for policy 0, policy_version 701689 (0.0038) [2024-06-15 20:07:35,810][1652491] Updated weights for policy 0, policy_version 701744 (0.0012) [2024-06-15 20:07:35,955][1648985] Fps is (10 sec: 49154.0, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 1437171712. Throughput: 0: 11514.4. Samples: 359357952. Policy #0 lag: (min: 15.0, avg: 95.7, max: 271.0) [2024-06-15 20:07:35,955][1648985] Avg episode reward: [(0, '155.790')] [2024-06-15 20:07:39,640][1652491] Updated weights for policy 0, policy_version 701776 (0.0022) [2024-06-15 20:07:40,966][1648985] Fps is (10 sec: 39277.8, 60 sec: 45866.8, 300 sec: 46206.7). Total num frames: 1437335552. Throughput: 0: 11511.5. Samples: 359399936. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:07:40,967][1648985] Avg episode reward: [(0, '150.840')] [2024-06-15 20:07:41,136][1652491] Updated weights for policy 0, policy_version 701827 (0.0013) [2024-06-15 20:07:43,400][1652491] Updated weights for policy 0, policy_version 701936 (0.0015) [2024-06-15 20:07:45,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 1437597696. Throughput: 0: 11696.4. Samples: 359465984. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:07:45,956][1648985] Avg episode reward: [(0, '155.960')] [2024-06-15 20:07:46,965][1652491] Updated weights for policy 0, policy_version 701990 (0.0106) [2024-06-15 20:07:50,488][1652491] Updated weights for policy 0, policy_version 702032 (0.0023) [2024-06-15 20:07:50,955][1648985] Fps is (10 sec: 45926.6, 60 sec: 44783.1, 300 sec: 46541.7). Total num frames: 1437794304. Throughput: 0: 11730.5. Samples: 359545856. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:07:50,956][1648985] Avg episode reward: [(0, '159.890')] [2024-06-15 20:07:52,037][1652491] Updated weights for policy 0, policy_version 702100 (0.0034) [2024-06-15 20:07:53,104][1652491] Updated weights for policy 0, policy_version 702146 (0.0015) [2024-06-15 20:07:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.9, 300 sec: 46430.6). Total num frames: 1438121984. Throughput: 0: 11696.3. Samples: 359573504. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:07:55,956][1648985] Avg episode reward: [(0, '154.590')] [2024-06-15 20:07:57,309][1652491] Updated weights for policy 0, policy_version 702209 (0.0014) [2024-06-15 20:07:58,689][1652491] Updated weights for policy 0, policy_version 702269 (0.0012) [2024-06-15 20:08:00,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 45875.1, 300 sec: 46763.8). Total num frames: 1438285824. Throughput: 0: 11832.8. Samples: 359651328. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:08:00,956][1648985] Avg episode reward: [(0, '178.470')] [2024-06-15 20:08:01,306][1651469] Signal inference workers to stop experience collection... (36500 times) [2024-06-15 20:08:01,342][1652491] InferenceWorker_p0-w0: stopping experience collection (36500 times) [2024-06-15 20:08:01,572][1651469] Signal inference workers to resume experience collection... (36500 times) [2024-06-15 20:08:01,574][1652491] InferenceWorker_p0-w0: resuming experience collection (36500 times) [2024-06-15 20:08:01,725][1652491] Updated weights for policy 0, policy_version 702330 (0.0012) [2024-06-15 20:08:03,461][1652491] Updated weights for policy 0, policy_version 702387 (0.0044) [2024-06-15 20:08:04,675][1652491] Updated weights for policy 0, policy_version 702436 (0.0104) [2024-06-15 20:08:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 49152.1, 300 sec: 46652.7). Total num frames: 1438646272. Throughput: 0: 11832.9. Samples: 359716864. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:08:05,956][1648985] Avg episode reward: [(0, '165.510')] [2024-06-15 20:08:09,123][1652491] Updated weights for policy 0, policy_version 702482 (0.0016) [2024-06-15 20:08:10,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 1438777344. Throughput: 0: 11946.7. Samples: 359758336. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:08:10,956][1648985] Avg episode reward: [(0, '177.490')] [2024-06-15 20:08:11,682][1652491] Updated weights for policy 0, policy_version 702530 (0.0015) [2024-06-15 20:08:13,886][1652491] Updated weights for policy 0, policy_version 702611 (0.0014) [2024-06-15 20:08:15,555][1652491] Updated weights for policy 0, policy_version 702688 (0.0029) [2024-06-15 20:08:15,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 49698.1, 300 sec: 47097.1). Total num frames: 1439137792. Throughput: 0: 11832.9. Samples: 359827456. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:08:15,955][1648985] Avg episode reward: [(0, '161.210')] [2024-06-15 20:08:19,993][1652491] Updated weights for policy 0, policy_version 702739 (0.0012) [2024-06-15 20:08:20,866][1652491] Updated weights for policy 0, policy_version 702781 (0.0014) [2024-06-15 20:08:20,956][1648985] Fps is (10 sec: 52426.7, 60 sec: 46420.8, 300 sec: 46652.7). Total num frames: 1439301632. Throughput: 0: 12003.4. Samples: 359898112. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:08:20,956][1648985] Avg episode reward: [(0, '153.980')] [2024-06-15 20:08:23,293][1652491] Updated weights for policy 0, policy_version 702837 (0.0013) [2024-06-15 20:08:25,652][1652491] Updated weights for policy 0, policy_version 702899 (0.0014) [2024-06-15 20:08:25,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48059.9, 300 sec: 47097.0). Total num frames: 1439563776. Throughput: 0: 11972.4. Samples: 359938560. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:08:25,956][1648985] Avg episode reward: [(0, '183.590')] [2024-06-15 20:08:27,089][1652491] Updated weights for policy 0, policy_version 702974 (0.0013) [2024-06-15 20:08:30,958][1648985] Fps is (10 sec: 39310.5, 60 sec: 45872.6, 300 sec: 46207.9). Total num frames: 1439694848. Throughput: 0: 11957.2. Samples: 360004096. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:08:30,959][1648985] Avg episode reward: [(0, '204.920')] [2024-06-15 20:08:30,960][1651469] Saving new best policy, reward=204.920! [2024-06-15 20:08:32,721][1652491] Updated weights for policy 0, policy_version 703027 (0.0153) [2024-06-15 20:08:35,167][1652491] Updated weights for policy 0, policy_version 703099 (0.0014) [2024-06-15 20:08:35,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46421.2, 300 sec: 46986.0). Total num frames: 1439956992. Throughput: 0: 11696.3. Samples: 360072192. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:08:35,956][1648985] Avg episode reward: [(0, '186.860')] [2024-06-15 20:08:37,619][1652491] Updated weights for policy 0, policy_version 703184 (0.0015) [2024-06-15 20:08:40,955][1648985] Fps is (10 sec: 52445.7, 60 sec: 48068.5, 300 sec: 46541.6). Total num frames: 1440219136. Throughput: 0: 11582.5. Samples: 360094720. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:08:40,956][1648985] Avg episode reward: [(0, '202.370')] [2024-06-15 20:08:44,203][1652491] Updated weights for policy 0, policy_version 703251 (0.0024) [2024-06-15 20:08:45,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 45875.1, 300 sec: 46652.8). Total num frames: 1440350208. Throughput: 0: 11502.9. Samples: 360168960. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:08:45,956][1648985] Avg episode reward: [(0, '211.470')] [2024-06-15 20:08:45,992][1651469] Signal inference workers to stop experience collection... (36550 times) [2024-06-15 20:08:46,021][1652491] InferenceWorker_p0-w0: stopping experience collection (36550 times) [2024-06-15 20:08:46,216][1651469] Signal inference workers to resume experience collection... (36550 times) [2024-06-15 20:08:46,217][1651469] Saving new best policy, reward=211.470! [2024-06-15 20:08:46,217][1652491] InferenceWorker_p0-w0: resuming experience collection (36550 times) [2024-06-15 20:08:46,220][1652491] Updated weights for policy 0, policy_version 703312 (0.0014) [2024-06-15 20:08:47,377][1652491] Updated weights for policy 0, policy_version 703359 (0.0011) [2024-06-15 20:08:49,596][1652491] Updated weights for policy 0, policy_version 703440 (0.0013) [2024-06-15 20:08:50,556][1652491] Updated weights for policy 0, policy_version 703487 (0.0011) [2024-06-15 20:08:50,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 49152.0, 300 sec: 47097.1). Total num frames: 1440743424. Throughput: 0: 11377.8. Samples: 360228864. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:08:50,955][1648985] Avg episode reward: [(0, '210.440')] [2024-06-15 20:08:55,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 44782.7, 300 sec: 46430.5). Total num frames: 1440808960. Throughput: 0: 11423.2. Samples: 360272384. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:08:55,956][1648985] Avg episode reward: [(0, '187.430')] [2024-06-15 20:08:56,388][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000703536_1440841728.pth... [2024-06-15 20:08:56,469][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000698096_1429700608.pth [2024-06-15 20:08:56,641][1652491] Updated weights for policy 0, policy_version 703545 (0.0013) [2024-06-15 20:08:58,907][1652491] Updated weights for policy 0, policy_version 703586 (0.0011) [2024-06-15 20:09:00,837][1652491] Updated weights for policy 0, policy_version 703664 (0.0016) [2024-06-15 20:09:00,955][1648985] Fps is (10 sec: 36044.3, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1441103872. Throughput: 0: 11332.2. Samples: 360337408. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:09:00,956][1648985] Avg episode reward: [(0, '176.120')] [2024-06-15 20:09:02,492][1652491] Updated weights for policy 0, policy_version 703716 (0.0013) [2024-06-15 20:09:05,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 43690.6, 300 sec: 46319.5). Total num frames: 1441267712. Throughput: 0: 11321.0. Samples: 360407552. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:09:05,956][1648985] Avg episode reward: [(0, '160.340')] [2024-06-15 20:09:06,998][1652491] Updated weights for policy 0, policy_version 703745 (0.0011) [2024-06-15 20:09:08,297][1652491] Updated weights for policy 0, policy_version 703806 (0.0084) [2024-06-15 20:09:10,655][1652491] Updated weights for policy 0, policy_version 703866 (0.0013) [2024-06-15 20:09:10,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 45875.4, 300 sec: 47097.1). Total num frames: 1441529856. Throughput: 0: 11138.9. Samples: 360439808. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:09:10,955][1648985] Avg episode reward: [(0, '138.940')] [2024-06-15 20:09:12,629][1652491] Updated weights for policy 0, policy_version 703936 (0.0012) [2024-06-15 20:09:14,036][1652491] Updated weights for policy 0, policy_version 703998 (0.0089) [2024-06-15 20:09:15,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 44236.7, 300 sec: 46652.7). Total num frames: 1441792000. Throughput: 0: 10991.7. Samples: 360498688. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:09:15,956][1648985] Avg episode reward: [(0, '117.740')] [2024-06-15 20:09:19,346][1652491] Updated weights for policy 0, policy_version 704062 (0.0013) [2024-06-15 20:09:20,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 43691.0, 300 sec: 46652.7). Total num frames: 1441923072. Throughput: 0: 11286.7. Samples: 360580096. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 20:09:20,956][1648985] Avg episode reward: [(0, '133.120')] [2024-06-15 20:09:22,025][1652491] Updated weights for policy 0, policy_version 704112 (0.0012) [2024-06-15 20:09:22,785][1652491] Updated weights for policy 0, policy_version 704144 (0.0011) [2024-06-15 20:09:24,337][1652491] Updated weights for policy 0, policy_version 704195 (0.0012) [2024-06-15 20:09:24,583][1651469] Signal inference workers to stop experience collection... 
(36600 times) [2024-06-15 20:09:24,626][1652491] InferenceWorker_p0-w0: stopping experience collection (36600 times) [2024-06-15 20:09:24,866][1651469] Signal inference workers to resume experience collection... (36600 times) [2024-06-15 20:09:24,868][1652491] InferenceWorker_p0-w0: resuming experience collection (36600 times) [2024-06-15 20:09:25,625][1652491] Updated weights for policy 0, policy_version 704255 (0.0011) [2024-06-15 20:09:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 1442316288. Throughput: 0: 11389.2. Samples: 360607232. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:09:25,956][1648985] Avg episode reward: [(0, '161.050')] [2024-06-15 20:09:30,189][1652491] Updated weights for policy 0, policy_version 704306 (0.0024) [2024-06-15 20:09:30,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45877.8, 300 sec: 46652.7). Total num frames: 1442447360. Throughput: 0: 11411.9. Samples: 360682496. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:09:30,956][1648985] Avg episode reward: [(0, '172.620')] [2024-06-15 20:09:33,603][1652491] Updated weights for policy 0, policy_version 704370 (0.0018) [2024-06-15 20:09:35,414][1652491] Updated weights for policy 0, policy_version 704438 (0.0013) [2024-06-15 20:09:35,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 1442709504. Throughput: 0: 11434.6. Samples: 360743424. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:09:35,956][1648985] Avg episode reward: [(0, '164.240')] [2024-06-15 20:09:36,840][1652491] Updated weights for policy 0, policy_version 704496 (0.0013) [2024-06-15 20:09:40,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 44236.9, 300 sec: 46319.5). Total num frames: 1442873344. Throughput: 0: 11343.8. Samples: 360782848. 
Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:09:40,956][1648985] Avg episode reward: [(0, '142.560')] [2024-06-15 20:09:41,168][1652491] Updated weights for policy 0, policy_version 704529 (0.0012) [2024-06-15 20:09:44,006][1652491] Updated weights for policy 0, policy_version 704592 (0.0013) [2024-06-15 20:09:45,278][1652491] Updated weights for policy 0, policy_version 704644 (0.0013) [2024-06-15 20:09:45,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 1443135488. Throughput: 0: 11525.7. Samples: 360856064. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:09:45,956][1648985] Avg episode reward: [(0, '130.180')] [2024-06-15 20:09:47,906][1652491] Updated weights for policy 0, policy_version 704752 (0.0121) [2024-06-15 20:09:50,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 43690.6, 300 sec: 46541.7). Total num frames: 1443364864. Throughput: 0: 11434.7. Samples: 360922112. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:09:50,956][1648985] Avg episode reward: [(0, '149.140')] [2024-06-15 20:09:52,286][1652491] Updated weights for policy 0, policy_version 704789 (0.0013) [2024-06-15 20:09:52,937][1652491] Updated weights for policy 0, policy_version 704830 (0.0018) [2024-06-15 20:09:55,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.5, 300 sec: 46652.7). Total num frames: 1443561472. Throughput: 0: 11616.7. Samples: 360962560. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:09:55,955][1648985] Avg episode reward: [(0, '162.790')] [2024-06-15 20:09:56,400][1652491] Updated weights for policy 0, policy_version 704896 (0.0015) [2024-06-15 20:09:59,040][1652491] Updated weights for policy 0, policy_version 704996 (0.0013) [2024-06-15 20:10:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46421.4, 300 sec: 46652.8). Total num frames: 1443889152. Throughput: 0: 11491.6. Samples: 361015808. 
Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:10:00,956][1648985] Avg episode reward: [(0, '162.920')] [2024-06-15 20:10:03,704][1652491] Updated weights for policy 0, policy_version 705056 (0.0019) [2024-06-15 20:10:05,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45875.3, 300 sec: 46433.1). Total num frames: 1444020224. Throughput: 0: 11468.8. Samples: 361096192. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:10:05,956][1648985] Avg episode reward: [(0, '164.690')] [2024-06-15 20:10:06,743][1652491] Updated weights for policy 0, policy_version 705089 (0.0013) [2024-06-15 20:10:07,884][1652491] Updated weights for policy 0, policy_version 705146 (0.0013) [2024-06-15 20:10:08,319][1651469] Signal inference workers to stop experience collection... (36650 times) [2024-06-15 20:10:08,405][1652491] InferenceWorker_p0-w0: stopping experience collection (36650 times) [2024-06-15 20:10:08,681][1651469] Signal inference workers to resume experience collection... (36650 times) [2024-06-15 20:10:08,682][1652491] InferenceWorker_p0-w0: resuming experience collection (36650 times) [2024-06-15 20:10:10,296][1652491] Updated weights for policy 0, policy_version 705217 (0.0028) [2024-06-15 20:10:10,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.3, 300 sec: 46874.9). Total num frames: 1444347904. Throughput: 0: 11616.7. Samples: 361129984. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:10:10,956][1648985] Avg episode reward: [(0, '144.390')] [2024-06-15 20:10:11,556][1652491] Updated weights for policy 0, policy_version 705279 (0.0081) [2024-06-15 20:10:15,559][1652491] Updated weights for policy 0, policy_version 705335 (0.0015) [2024-06-15 20:10:15,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1444544512. Throughput: 0: 11434.7. Samples: 361197056. 
Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:10:15,956][1648985] Avg episode reward: [(0, '175.140')] [2024-06-15 20:10:18,507][1652491] Updated weights for policy 0, policy_version 705392 (0.0016) [2024-06-15 20:10:20,011][1652491] Updated weights for policy 0, policy_version 705426 (0.0012) [2024-06-15 20:10:20,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1444806656. Throughput: 0: 11719.1. Samples: 361270784. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:10:20,956][1648985] Avg episode reward: [(0, '178.810')] [2024-06-15 20:10:21,196][1652491] Updated weights for policy 0, policy_version 705488 (0.0013) [2024-06-15 20:10:22,212][1652491] Updated weights for policy 0, policy_version 705535 (0.0013) [2024-06-15 20:10:25,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 44236.9, 300 sec: 46319.5). Total num frames: 1444970496. Throughput: 0: 11707.8. Samples: 361309696. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:10:25,955][1648985] Avg episode reward: [(0, '161.850')] [2024-06-15 20:10:26,133][1652491] Updated weights for policy 0, policy_version 705570 (0.0013) [2024-06-15 20:10:28,163][1652491] Updated weights for policy 0, policy_version 705617 (0.0049) [2024-06-15 20:10:30,007][1652491] Updated weights for policy 0, policy_version 705670 (0.0015) [2024-06-15 20:10:30,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1445265408. Throughput: 0: 11787.4. Samples: 361386496. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:10:30,956][1648985] Avg episode reward: [(0, '154.050')] [2024-06-15 20:10:31,453][1652491] Updated weights for policy 0, policy_version 705728 (0.0016) [2024-06-15 20:10:32,767][1652491] Updated weights for policy 0, policy_version 705790 (0.0012) [2024-06-15 20:10:35,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 1445462016. 
Throughput: 0: 11867.0. Samples: 361456128. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:10:35,956][1648985] Avg episode reward: [(0, '163.050')] [2024-06-15 20:10:37,581][1652491] Updated weights for policy 0, policy_version 705840 (0.0015) [2024-06-15 20:10:39,448][1652491] Updated weights for policy 0, policy_version 705913 (0.0014) [2024-06-15 20:10:40,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 1445724160. Throughput: 0: 11753.2. Samples: 361491456. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:10:40,956][1648985] Avg episode reward: [(0, '180.510')] [2024-06-15 20:10:42,358][1652491] Updated weights for policy 0, policy_version 705969 (0.0198) [2024-06-15 20:10:43,977][1652491] Updated weights for policy 0, policy_version 706040 (0.0014) [2024-06-15 20:10:45,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 47513.5, 300 sec: 46541.6). Total num frames: 1445986304. Throughput: 0: 12037.6. Samples: 361557504. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:10:45,956][1648985] Avg episode reward: [(0, '178.340')] [2024-06-15 20:10:48,767][1652491] Updated weights for policy 0, policy_version 706087 (0.0012) [2024-06-15 20:10:49,813][1651469] Signal inference workers to stop experience collection... (36700 times) [2024-06-15 20:10:49,856][1652491] InferenceWorker_p0-w0: stopping experience collection (36700 times) [2024-06-15 20:10:49,871][1652491] Updated weights for policy 0, policy_version 706133 (0.0013) [2024-06-15 20:10:50,009][1651469] Signal inference workers to resume experience collection... (36700 times) [2024-06-15 20:10:50,010][1652491] InferenceWorker_p0-w0: resuming experience collection (36700 times) [2024-06-15 20:10:50,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 48059.5, 300 sec: 46652.7). Total num frames: 1446248448. Throughput: 0: 11901.1. Samples: 361631744. 
Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:10:50,956][1648985] Avg episode reward: [(0, '175.600')] [2024-06-15 20:10:52,250][1652491] Updated weights for policy 0, policy_version 706178 (0.0012) [2024-06-15 20:10:54,288][1652491] Updated weights for policy 0, policy_version 706258 (0.0015) [2024-06-15 20:10:55,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 49152.0, 300 sec: 46652.7). Total num frames: 1446510592. Throughput: 0: 11844.3. Samples: 361662976. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:10:55,956][1648985] Avg episode reward: [(0, '173.580')] [2024-06-15 20:10:55,974][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000706304_1446510592.pth... [2024-06-15 20:10:56,061][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000700816_1435271168.pth [2024-06-15 20:10:56,075][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000706304_1446510592.pth [2024-06-15 20:11:00,070][1652491] Updated weights for policy 0, policy_version 706324 (0.0013) [2024-06-15 20:11:00,955][1648985] Fps is (10 sec: 36045.9, 60 sec: 45329.0, 300 sec: 46097.4). Total num frames: 1446608896. Throughput: 0: 12083.2. Samples: 361740800. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 20:11:00,956][1648985] Avg episode reward: [(0, '166.960')] [2024-06-15 20:11:01,964][1652491] Updated weights for policy 0, policy_version 706403 (0.0079) [2024-06-15 20:11:03,071][1652491] Updated weights for policy 0, policy_version 706434 (0.0014) [2024-06-15 20:11:04,422][1652491] Updated weights for policy 0, policy_version 706482 (0.0011) [2024-06-15 20:11:05,879][1652491] Updated weights for policy 0, policy_version 706552 (0.0045) [2024-06-15 20:11:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 50244.4, 300 sec: 47097.1). Total num frames: 1447034880. Throughput: 0: 11764.7. Samples: 361800192. 
Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:11:05,956][1648985] Avg episode reward: [(0, '153.270')] [2024-06-15 20:11:10,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 44782.9, 300 sec: 45986.3). Total num frames: 1447034880. Throughput: 0: 11753.1. Samples: 361838592. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:11:10,956][1648985] Avg episode reward: [(0, '145.210')] [2024-06-15 20:11:12,206][1652491] Updated weights for policy 0, policy_version 706603 (0.0013) [2024-06-15 20:11:13,674][1652491] Updated weights for policy 0, policy_version 706659 (0.0014) [2024-06-15 20:11:15,334][1652491] Updated weights for policy 0, policy_version 706723 (0.0086) [2024-06-15 20:11:15,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 47513.6, 300 sec: 46541.6). Total num frames: 1447395328. Throughput: 0: 11571.2. Samples: 361907200. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:11:15,956][1648985] Avg episode reward: [(0, '148.890')] [2024-06-15 20:11:16,271][1652491] Updated weights for policy 0, policy_version 706755 (0.0011) [2024-06-15 20:11:17,328][1652491] Updated weights for policy 0, policy_version 706811 (0.0094) [2024-06-15 20:11:20,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 1447559168. Throughput: 0: 11662.2. Samples: 361980928. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:11:20,956][1648985] Avg episode reward: [(0, '162.830')] [2024-06-15 20:11:22,988][1652491] Updated weights for policy 0, policy_version 706864 (0.0097) [2024-06-15 20:11:24,670][1652491] Updated weights for policy 0, policy_version 706934 (0.0012) [2024-06-15 20:11:25,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 47513.3, 300 sec: 46652.7). Total num frames: 1447821312. Throughput: 0: 11582.5. Samples: 362012672. 
Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:11:25,956][1648985] Avg episode reward: [(0, '172.440')] [2024-06-15 20:11:26,565][1652491] Updated weights for policy 0, policy_version 706963 (0.0012) [2024-06-15 20:11:27,960][1651469] Signal inference workers to stop experience collection... (36750 times) [2024-06-15 20:11:28,013][1652491] Updated weights for policy 0, policy_version 707027 (0.0011) [2024-06-15 20:11:28,027][1652491] InferenceWorker_p0-w0: stopping experience collection (36750 times) [2024-06-15 20:11:28,203][1651469] Signal inference workers to resume experience collection... (36750 times) [2024-06-15 20:11:28,204][1652491] InferenceWorker_p0-w0: resuming experience collection (36750 times) [2024-06-15 20:11:29,044][1652491] Updated weights for policy 0, policy_version 707072 (0.0013) [2024-06-15 20:11:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 1448083456. Throughput: 0: 11685.0. Samples: 362083328. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:11:30,956][1648985] Avg episode reward: [(0, '154.590')] [2024-06-15 20:11:34,335][1652491] Updated weights for policy 0, policy_version 707137 (0.0015) [2024-06-15 20:11:35,494][1652491] Updated weights for policy 0, policy_version 707189 (0.0012) [2024-06-15 20:11:35,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 1448345600. Throughput: 0: 11662.3. Samples: 362156544. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:11:35,956][1648985] Avg episode reward: [(0, '148.000')] [2024-06-15 20:11:37,704][1652491] Updated weights for policy 0, policy_version 707232 (0.0033) [2024-06-15 20:11:39,039][1652491] Updated weights for policy 0, policy_version 707296 (0.0013) [2024-06-15 20:11:40,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.9, 300 sec: 46652.8). Total num frames: 1448607744. Throughput: 0: 11650.9. Samples: 362187264. 
Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:11:40,955][1648985] Avg episode reward: [(0, '138.730')] [2024-06-15 20:11:43,727][1652491] Updated weights for policy 0, policy_version 707331 (0.0013) [2024-06-15 20:11:45,545][1652491] Updated weights for policy 0, policy_version 707397 (0.0013) [2024-06-15 20:11:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.6, 300 sec: 46430.6). Total num frames: 1448804352. Throughput: 0: 11514.3. Samples: 362258944. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:11:45,956][1648985] Avg episode reward: [(0, '126.260')] [2024-06-15 20:11:46,466][1652491] Updated weights for policy 0, policy_version 707452 (0.0013) [2024-06-15 20:11:50,033][1652491] Updated weights for policy 0, policy_version 707513 (0.0191) [2024-06-15 20:11:50,955][1648985] Fps is (10 sec: 45873.8, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1449066496. Throughput: 0: 11753.2. Samples: 362329088. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:11:50,956][1648985] Avg episode reward: [(0, '139.760')] [2024-06-15 20:11:51,499][1652491] Updated weights for policy 0, policy_version 707574 (0.0013) [2024-06-15 20:11:55,504][1652491] Updated weights for policy 0, policy_version 707616 (0.0060) [2024-06-15 20:11:55,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 1449230336. Throughput: 0: 11776.0. Samples: 362368512. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:11:55,956][1648985] Avg episode reward: [(0, '127.800')] [2024-06-15 20:11:57,097][1652491] Updated weights for policy 0, policy_version 707680 (0.0114) [2024-06-15 20:12:00,875][1652491] Updated weights for policy 0, policy_version 707760 (0.0014) [2024-06-15 20:12:00,955][1648985] Fps is (10 sec: 42599.7, 60 sec: 48059.8, 300 sec: 46763.9). Total num frames: 1449492480. Throughput: 0: 11832.9. Samples: 362439680. 
Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:12:00,955][1648985] Avg episode reward: [(0, '134.360')] [2024-06-15 20:12:02,344][1652491] Updated weights for policy 0, policy_version 707835 (0.0013) [2024-06-15 20:12:05,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1449656320. Throughput: 0: 11821.5. Samples: 362512896. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:12:05,956][1648985] Avg episode reward: [(0, '147.320')] [2024-06-15 20:12:07,001][1652491] Updated weights for policy 0, policy_version 707902 (0.0014) [2024-06-15 20:12:08,425][1652491] Updated weights for policy 0, policy_version 707939 (0.0013) [2024-06-15 20:12:10,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 1449918464. Throughput: 0: 11776.1. Samples: 362542592. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:12:10,956][1648985] Avg episode reward: [(0, '164.240')] [2024-06-15 20:12:11,427][1651469] Signal inference workers to stop experience collection... (36800 times) [2024-06-15 20:12:11,458][1652491] InferenceWorker_p0-w0: stopping experience collection (36800 times) [2024-06-15 20:12:11,524][1652491] Updated weights for policy 0, policy_version 707990 (0.0012) [2024-06-15 20:12:11,647][1651469] Signal inference workers to resume experience collection... (36800 times) [2024-06-15 20:12:11,647][1652491] InferenceWorker_p0-w0: resuming experience collection (36800 times) [2024-06-15 20:12:13,606][1652491] Updated weights for policy 0, policy_version 708080 (0.0164) [2024-06-15 20:12:15,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 1450180608. Throughput: 0: 11741.8. Samples: 362611712. 
Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:12:15,956][1648985] Avg episode reward: [(0, '152.830')] [2024-06-15 20:12:17,572][1652491] Updated weights for policy 0, policy_version 708128 (0.0058) [2024-06-15 20:12:19,148][1652491] Updated weights for policy 0, policy_version 708192 (0.0013) [2024-06-15 20:12:19,992][1652491] Updated weights for policy 0, policy_version 708224 (0.0012) [2024-06-15 20:12:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 1450442752. Throughput: 0: 11832.9. Samples: 362689024. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:12:20,956][1648985] Avg episode reward: [(0, '132.020')] [2024-06-15 20:12:23,860][1652491] Updated weights for policy 0, policy_version 708304 (0.0012) [2024-06-15 20:12:24,751][1652491] Updated weights for policy 0, policy_version 708350 (0.0012) [2024-06-15 20:12:25,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.9, 300 sec: 46652.7). Total num frames: 1450704896. Throughput: 0: 11832.9. Samples: 362719744. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:12:25,956][1648985] Avg episode reward: [(0, '122.190')] [2024-06-15 20:12:29,621][1652491] Updated weights for policy 0, policy_version 708416 (0.0012) [2024-06-15 20:12:30,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 1450934272. Throughput: 0: 11832.9. Samples: 362791424. 
Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:12:30,956][1648985] Avg episode reward: [(0, '136.910')] [2024-06-15 20:12:31,030][1652491] Updated weights for policy 0, policy_version 708471 (0.0053) [2024-06-15 20:12:33,676][1652491] Updated weights for policy 0, policy_version 708512 (0.0043) [2024-06-15 20:12:34,996][1652491] Updated weights for policy 0, policy_version 708565 (0.0012) [2024-06-15 20:12:35,933][1652491] Updated weights for policy 0, policy_version 708602 (0.0015) [2024-06-15 20:12:35,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 46987.8). Total num frames: 1451196416. Throughput: 0: 11764.7. Samples: 362858496. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:12:35,956][1648985] Avg episode reward: [(0, '138.220')] [2024-06-15 20:12:40,493][1652491] Updated weights for policy 0, policy_version 708660 (0.0017) [2024-06-15 20:12:40,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 1451393024. Throughput: 0: 11901.1. Samples: 362904064. Policy #0 lag: (min: 95.0, avg: 181.7, max: 367.0) [2024-06-15 20:12:40,956][1648985] Avg episode reward: [(0, '144.150')] [2024-06-15 20:12:41,871][1652491] Updated weights for policy 0, policy_version 708729 (0.0014) [2024-06-15 20:12:44,727][1652491] Updated weights for policy 0, policy_version 708772 (0.0013) [2024-06-15 20:12:45,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1451687936. Throughput: 0: 11844.2. Samples: 362972672. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:12:45,956][1648985] Avg episode reward: [(0, '141.480')] [2024-06-15 20:12:46,329][1652491] Updated weights for policy 0, policy_version 708853 (0.0016) [2024-06-15 20:12:50,700][1652491] Updated weights for policy 0, policy_version 708884 (0.0017) [2024-06-15 20:12:50,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45875.4, 300 sec: 46430.6). Total num frames: 1451819008. 
Throughput: 0: 11980.8. Samples: 363052032. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:12:50,956][1648985] Avg episode reward: [(0, '147.540')] [2024-06-15 20:12:51,719][1651469] Signal inference workers to stop experience collection... (36850 times) [2024-06-15 20:12:51,789][1652491] InferenceWorker_p0-w0: stopping experience collection (36850 times) [2024-06-15 20:12:51,797][1652491] Updated weights for policy 0, policy_version 708932 (0.0013) [2024-06-15 20:12:51,983][1651469] Signal inference workers to resume experience collection... (36850 times) [2024-06-15 20:12:51,984][1652491] InferenceWorker_p0-w0: resuming experience collection (36850 times) [2024-06-15 20:12:53,219][1652491] Updated weights for policy 0, policy_version 708992 (0.0015) [2024-06-15 20:12:55,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 48605.6, 300 sec: 46986.0). Total num frames: 1452146688. Throughput: 0: 11992.1. Samples: 363082240. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:12:55,956][1648985] Avg episode reward: [(0, '156.050')] [2024-06-15 20:12:56,098][1652491] Updated weights for policy 0, policy_version 709072 (0.0090) [2024-06-15 20:12:56,435][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000709088_1452212224.pth... [2024-06-15 20:12:56,612][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000703536_1440841728.pth [2024-06-15 20:12:57,269][1652491] Updated weights for policy 0, policy_version 709119 (0.0013) [2024-06-15 20:13:00,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46421.3, 300 sec: 46208.5). Total num frames: 1452277760. Throughput: 0: 11980.8. Samples: 363150848. 
Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:13:00,955][1648985] Avg episode reward: [(0, '156.000')] [2024-06-15 20:13:03,038][1652491] Updated weights for policy 0, policy_version 709185 (0.0014) [2024-06-15 20:13:04,410][1652491] Updated weights for policy 0, policy_version 709248 (0.0014) [2024-06-15 20:13:05,955][1648985] Fps is (10 sec: 39322.7, 60 sec: 48059.6, 300 sec: 46652.8). Total num frames: 1452539904. Throughput: 0: 11832.9. Samples: 363221504. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:13:05,956][1648985] Avg episode reward: [(0, '147.870')] [2024-06-15 20:13:07,364][1652491] Updated weights for policy 0, policy_version 709317 (0.0013) [2024-06-15 20:13:08,448][1652491] Updated weights for policy 0, policy_version 709368 (0.0020) [2024-06-15 20:13:10,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 46319.5). Total num frames: 1452802048. Throughput: 0: 11832.9. Samples: 363252224. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:13:10,956][1648985] Avg episode reward: [(0, '144.870')] [2024-06-15 20:13:13,436][1652491] Updated weights for policy 0, policy_version 709408 (0.0014) [2024-06-15 20:13:15,288][1652491] Updated weights for policy 0, policy_version 709473 (0.0011) [2024-06-15 20:13:15,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.7, 300 sec: 46541.8). Total num frames: 1453031424. Throughput: 0: 11867.0. Samples: 363325440. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:13:15,956][1648985] Avg episode reward: [(0, '148.680')] [2024-06-15 20:13:17,560][1652491] Updated weights for policy 0, policy_version 709526 (0.0020) [2024-06-15 20:13:19,587][1652491] Updated weights for policy 0, policy_version 709616 (0.0015) [2024-06-15 20:13:20,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 1453326336. Throughput: 0: 11855.7. Samples: 363392000. 
Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:13:20,956][1648985] Avg episode reward: [(0, '155.990')] [2024-06-15 20:13:25,564][1652491] Updated weights for policy 0, policy_version 709680 (0.0013) [2024-06-15 20:13:25,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45875.2, 300 sec: 46653.3). Total num frames: 1453457408. Throughput: 0: 11776.0. Samples: 363433984. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:13:25,956][1648985] Avg episode reward: [(0, '161.640')] [2024-06-15 20:13:26,910][1652491] Updated weights for policy 0, policy_version 709733 (0.0016) [2024-06-15 20:13:27,884][1652491] Updated weights for policy 0, policy_version 709764 (0.0012) [2024-06-15 20:13:29,719][1652491] Updated weights for policy 0, policy_version 709840 (0.0019) [2024-06-15 20:13:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48605.9, 300 sec: 47097.1). Total num frames: 1453850624. Throughput: 0: 11685.0. Samples: 363498496. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:13:30,956][1648985] Avg episode reward: [(0, '164.710')] [2024-06-15 20:13:35,639][1651469] Signal inference workers to stop experience collection... (36900 times) [2024-06-15 20:13:35,687][1652491] InferenceWorker_p0-w0: stopping experience collection (36900 times) [2024-06-15 20:13:35,879][1651469] Signal inference workers to resume experience collection... (36900 times) [2024-06-15 20:13:35,879][1652491] InferenceWorker_p0-w0: resuming experience collection (36900 times) [2024-06-15 20:13:35,882][1652491] Updated weights for policy 0, policy_version 709904 (0.0025) [2024-06-15 20:13:35,966][1648985] Fps is (10 sec: 42551.5, 60 sec: 44774.7, 300 sec: 46317.8). Total num frames: 1453883392. Throughput: 0: 11704.9. Samples: 363578880. 
Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:13:35,967][1648985] Avg episode reward: [(0, '151.320')] [2024-06-15 20:13:37,563][1652491] Updated weights for policy 0, policy_version 709968 (0.0012) [2024-06-15 20:13:38,934][1652491] Updated weights for policy 0, policy_version 710014 (0.0018) [2024-06-15 20:13:40,395][1652491] Updated weights for policy 0, policy_version 710080 (0.0013) [2024-06-15 20:13:40,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 1454243840. Throughput: 0: 11571.3. Samples: 363602944. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:13:40,956][1648985] Avg episode reward: [(0, '143.560')] [2024-06-15 20:13:42,532][1652491] Updated weights for policy 0, policy_version 710143 (0.0012) [2024-06-15 20:13:45,955][1648985] Fps is (10 sec: 49206.4, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 1454374912. Throughput: 0: 11593.9. Samples: 363672576. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:13:45,956][1648985] Avg episode reward: [(0, '148.300')] [2024-06-15 20:13:48,274][1652491] Updated weights for policy 0, policy_version 710192 (0.0014) [2024-06-15 20:13:49,984][1652491] Updated weights for policy 0, policy_version 710266 (0.0028) [2024-06-15 20:13:50,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 1454669824. Throughput: 0: 11582.6. Samples: 363742720. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:13:50,956][1648985] Avg episode reward: [(0, '164.360')] [2024-06-15 20:13:51,563][1652491] Updated weights for policy 0, policy_version 710321 (0.0014) [2024-06-15 20:13:53,866][1652491] Updated weights for policy 0, policy_version 710392 (0.0016) [2024-06-15 20:13:55,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.4, 300 sec: 46763.8). Total num frames: 1454899200. Throughput: 0: 11457.4. Samples: 363767808. 
Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:13:55,956][1648985] Avg episode reward: [(0, '156.070')] [2024-06-15 20:13:59,619][1652491] Updated weights for policy 0, policy_version 710442 (0.0012) [2024-06-15 20:14:00,700][1652491] Updated weights for policy 0, policy_version 710483 (0.0013) [2024-06-15 20:14:00,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1455095808. Throughput: 0: 11594.0. Samples: 363847168. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:14:00,955][1648985] Avg episode reward: [(0, '169.630')] [2024-06-15 20:14:02,796][1652491] Updated weights for policy 0, policy_version 710546 (0.0013) [2024-06-15 20:14:04,793][1652491] Updated weights for policy 0, policy_version 710625 (0.0169) [2024-06-15 20:14:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.8, 300 sec: 47097.0). Total num frames: 1455423488. Throughput: 0: 11423.3. Samples: 363906048. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:14:05,956][1648985] Avg episode reward: [(0, '160.810')] [2024-06-15 20:14:10,324][1652491] Updated weights for policy 0, policy_version 710672 (0.0014) [2024-06-15 20:14:10,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 44782.8, 300 sec: 46430.6). Total num frames: 1455489024. Throughput: 0: 11446.0. Samples: 363949056. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:14:10,956][1648985] Avg episode reward: [(0, '136.260')] [2024-06-15 20:14:12,397][1652491] Updated weights for policy 0, policy_version 710758 (0.0014) [2024-06-15 20:14:14,707][1651469] Signal inference workers to stop experience collection... (36950 times) [2024-06-15 20:14:14,760][1652491] InferenceWorker_p0-w0: stopping experience collection (36950 times) [2024-06-15 20:14:14,923][1651469] Signal inference workers to resume experience collection... 
(36950 times) [2024-06-15 20:14:14,924][1652491] InferenceWorker_p0-w0: resuming experience collection (36950 times) [2024-06-15 20:14:15,064][1652491] Updated weights for policy 0, policy_version 710837 (0.0015) [2024-06-15 20:14:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46967.5, 300 sec: 47208.2). Total num frames: 1455849472. Throughput: 0: 11377.8. Samples: 364010496. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:14:15,956][1648985] Avg episode reward: [(0, '123.850')] [2024-06-15 20:14:16,668][1652491] Updated weights for policy 0, policy_version 710896 (0.0015) [2024-06-15 20:14:20,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1455947776. Throughput: 0: 11392.0. Samples: 364091392. Policy #0 lag: (min: 5.0, avg: 114.1, max: 261.0) [2024-06-15 20:14:20,956][1648985] Avg episode reward: [(0, '139.900')] [2024-06-15 20:14:22,784][1652491] Updated weights for policy 0, policy_version 710960 (0.0114) [2024-06-15 20:14:24,040][1652491] Updated weights for policy 0, policy_version 711025 (0.0015) [2024-06-15 20:14:25,596][1652491] Updated weights for policy 0, policy_version 711072 (0.0031) [2024-06-15 20:14:25,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1456275456. Throughput: 0: 11423.3. Samples: 364116992. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:14:25,956][1648985] Avg episode reward: [(0, '156.970')] [2024-06-15 20:14:27,523][1652491] Updated weights for policy 0, policy_version 711125 (0.0023) [2024-06-15 20:14:28,542][1652491] Updated weights for policy 0, policy_version 711166 (0.0014) [2024-06-15 20:14:30,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 43690.6, 300 sec: 46652.7). Total num frames: 1456472064. Throughput: 0: 11343.6. Samples: 364183040. 
Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:14:30,956][1648985] Avg episode reward: [(0, '152.290')] [2024-06-15 20:14:34,262][1652491] Updated weights for policy 0, policy_version 711218 (0.0012) [2024-06-15 20:14:35,375][1652491] Updated weights for policy 0, policy_version 711268 (0.0012) [2024-06-15 20:14:35,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 47522.4, 300 sec: 46986.0). Total num frames: 1456734208. Throughput: 0: 11423.3. Samples: 364256768. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:14:35,955][1648985] Avg episode reward: [(0, '140.140')] [2024-06-15 20:14:36,827][1652491] Updated weights for policy 0, policy_version 711329 (0.0015) [2024-06-15 20:14:38,718][1652491] Updated weights for policy 0, policy_version 711381 (0.0013) [2024-06-15 20:14:40,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.3, 300 sec: 46986.0). Total num frames: 1456996352. Throughput: 0: 11571.2. Samples: 364288512. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:14:40,955][1648985] Avg episode reward: [(0, '148.940')] [2024-06-15 20:14:45,056][1652491] Updated weights for policy 0, policy_version 711456 (0.0127) [2024-06-15 20:14:45,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1457127424. Throughput: 0: 11559.8. Samples: 364367360. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:14:45,956][1648985] Avg episode reward: [(0, '142.480')] [2024-06-15 20:14:46,629][1652491] Updated weights for policy 0, policy_version 711523 (0.0074) [2024-06-15 20:14:48,083][1652491] Updated weights for policy 0, policy_version 711588 (0.0013) [2024-06-15 20:14:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.5, 300 sec: 47097.1). Total num frames: 1457455104. Throughput: 0: 11537.1. Samples: 364425216. 
Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:14:50,956][1648985] Avg episode reward: [(0, '141.310')] [2024-06-15 20:14:51,206][1652491] Updated weights for policy 0, policy_version 711671 (0.0014) [2024-06-15 20:14:55,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 1457586176. Throughput: 0: 11491.6. Samples: 364466176. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:14:55,955][1648985] Avg episode reward: [(0, '131.010')] [2024-06-15 20:14:56,290][1652491] Updated weights for policy 0, policy_version 711736 (0.0013) [2024-06-15 20:14:56,397][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000711744_1457651712.pth... [2024-06-15 20:14:56,448][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000706304_1446510592.pth [2024-06-15 20:14:57,727][1652491] Updated weights for policy 0, policy_version 711793 (0.0051) [2024-06-15 20:14:58,377][1651469] Signal inference workers to stop experience collection... (37000 times) [2024-06-15 20:14:58,396][1652491] InferenceWorker_p0-w0: stopping experience collection (37000 times) [2024-06-15 20:14:58,569][1651469] Signal inference workers to resume experience collection... (37000 times) [2024-06-15 20:14:58,586][1652491] InferenceWorker_p0-w0: resuming experience collection (37000 times) [2024-06-15 20:14:59,124][1652491] Updated weights for policy 0, policy_version 711842 (0.0032) [2024-06-15 20:15:00,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 1457913856. Throughput: 0: 11673.6. Samples: 364535808. 
Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:15:00,956][1648985] Avg episode reward: [(0, '134.720')] [2024-06-15 20:15:01,093][1652491] Updated weights for policy 0, policy_version 711877 (0.0012) [2024-06-15 20:15:05,358][1652491] Updated weights for policy 0, policy_version 711937 (0.0015) [2024-06-15 20:15:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 44782.9, 300 sec: 46652.8). Total num frames: 1458110464. Throughput: 0: 11582.6. Samples: 364612608. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:15:05,956][1648985] Avg episode reward: [(0, '138.520')] [2024-06-15 20:15:08,305][1652491] Updated weights for policy 0, policy_version 712022 (0.0015) [2024-06-15 20:15:09,810][1652491] Updated weights for policy 0, policy_version 712066 (0.0016) [2024-06-15 20:15:10,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48606.0, 300 sec: 46986.0). Total num frames: 1458405376. Throughput: 0: 11730.5. Samples: 364644864. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:15:10,956][1648985] Avg episode reward: [(0, '132.010')] [2024-06-15 20:15:12,642][1652491] Updated weights for policy 0, policy_version 712130 (0.0017) [2024-06-15 20:15:13,915][1652491] Updated weights for policy 0, policy_version 712186 (0.0012) [2024-06-15 20:15:15,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45329.1, 300 sec: 46652.8). Total num frames: 1458569216. Throughput: 0: 11798.8. Samples: 364713984. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:15:15,955][1648985] Avg episode reward: [(0, '143.510')] [2024-06-15 20:15:16,757][1652491] Updated weights for policy 0, policy_version 712228 (0.0011) [2024-06-15 20:15:19,741][1652491] Updated weights for policy 0, policy_version 712278 (0.0021) [2024-06-15 20:15:20,591][1652491] Updated weights for policy 0, policy_version 712324 (0.0015) [2024-06-15 20:15:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48605.8, 300 sec: 47097.0). Total num frames: 1458864128. 
Throughput: 0: 11821.5. Samples: 364788736. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:15:20,956][1648985] Avg episode reward: [(0, '145.130')] [2024-06-15 20:15:21,666][1652491] Updated weights for policy 0, policy_version 712380 (0.0015) [2024-06-15 20:15:23,651][1652491] Updated weights for policy 0, policy_version 712422 (0.0087) [2024-06-15 20:15:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 1459093504. Throughput: 0: 11844.3. Samples: 364821504. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:15:25,956][1648985] Avg episode reward: [(0, '142.380')] [2024-06-15 20:15:26,389][1652491] Updated weights for policy 0, policy_version 712464 (0.0013) [2024-06-15 20:15:27,400][1652491] Updated weights for policy 0, policy_version 712505 (0.0010) [2024-06-15 20:15:30,813][1652491] Updated weights for policy 0, policy_version 712560 (0.0014) [2024-06-15 20:15:30,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 47513.7, 300 sec: 46986.0). Total num frames: 1459322880. Throughput: 0: 11946.7. Samples: 364904960. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:15:30,956][1648985] Avg episode reward: [(0, '134.620')] [2024-06-15 20:15:31,875][1652491] Updated weights for policy 0, policy_version 712608 (0.0013) [2024-06-15 20:15:34,061][1652491] Updated weights for policy 0, policy_version 712658 (0.0013) [2024-06-15 20:15:34,974][1652491] Updated weights for policy 0, policy_version 712704 (0.0123) [2024-06-15 20:15:35,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1459617792. Throughput: 0: 12140.1. Samples: 364971520. 
Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:15:35,955][1648985] Avg episode reward: [(0, '131.170')] [2024-06-15 20:15:38,192][1652491] Updated weights for policy 0, policy_version 712761 (0.0014) [2024-06-15 20:15:40,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1459748864. Throughput: 0: 12037.7. Samples: 365007872. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:15:40,956][1648985] Avg episode reward: [(0, '127.480')] [2024-06-15 20:15:41,933][1652491] Updated weights for policy 0, policy_version 712802 (0.0013) [2024-06-15 20:15:42,299][1651469] Signal inference workers to stop experience collection... (37050 times) [2024-06-15 20:15:42,393][1652491] InferenceWorker_p0-w0: stopping experience collection (37050 times) [2024-06-15 20:15:42,590][1651469] Signal inference workers to resume experience collection... (37050 times) [2024-06-15 20:15:42,591][1652491] InferenceWorker_p0-w0: resuming experience collection (37050 times) [2024-06-15 20:15:43,122][1652491] Updated weights for policy 0, policy_version 712851 (0.0032) [2024-06-15 20:15:43,961][1652491] Updated weights for policy 0, policy_version 712896 (0.0013) [2024-06-15 20:15:45,888][1652491] Updated weights for policy 0, policy_version 712960 (0.0110) [2024-06-15 20:15:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 1460142080. Throughput: 0: 12094.6. Samples: 365080064. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:15:45,956][1648985] Avg episode reward: [(0, '133.400')] [2024-06-15 20:15:49,095][1652491] Updated weights for policy 0, policy_version 713024 (0.0013) [2024-06-15 20:15:50,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 46967.3, 300 sec: 46652.7). Total num frames: 1460273152. Throughput: 0: 12083.2. Samples: 365156352. 
Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:15:50,956][1648985] Avg episode reward: [(0, '148.590')] [2024-06-15 20:15:53,442][1652491] Updated weights for policy 0, policy_version 713078 (0.0016) [2024-06-15 20:15:54,880][1652491] Updated weights for policy 0, policy_version 713142 (0.0014) [2024-06-15 20:15:55,793][1652491] Updated weights for policy 0, policy_version 713168 (0.0012) [2024-06-15 20:15:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 49698.1, 300 sec: 47319.2). Total num frames: 1460568064. Throughput: 0: 12162.8. Samples: 365192192. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:15:55,956][1648985] Avg episode reward: [(0, '154.890')] [2024-06-15 20:15:56,769][1652491] Updated weights for policy 0, policy_version 713211 (0.0014) [2024-06-15 20:15:59,286][1652491] Updated weights for policy 0, policy_version 713264 (0.0013) [2024-06-15 20:16:00,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 1460797440. Throughput: 0: 12162.8. Samples: 365261312. Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:16:00,956][1648985] Avg episode reward: [(0, '147.620')] [2024-06-15 20:16:03,103][1652491] Updated weights for policy 0, policy_version 713312 (0.0013) [2024-06-15 20:16:04,699][1652491] Updated weights for policy 0, policy_version 713380 (0.0031) [2024-06-15 20:16:05,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 49151.8, 300 sec: 47541.4). Total num frames: 1461059584. Throughput: 0: 12049.0. Samples: 365330944. 
Policy #0 lag: (min: 15.0, avg: 144.4, max: 271.0) [2024-06-15 20:16:05,956][1648985] Avg episode reward: [(0, '142.830')] [2024-06-15 20:16:07,777][1652491] Updated weights for policy 0, policy_version 713440 (0.0016) [2024-06-15 20:16:08,448][1652491] Updated weights for policy 0, policy_version 713469 (0.0012) [2024-06-15 20:16:10,405][1652491] Updated weights for policy 0, policy_version 713530 (0.0012) [2024-06-15 20:16:10,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48605.8, 300 sec: 47208.1). Total num frames: 1461321728. Throughput: 0: 12174.2. Samples: 365369344. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:16:10,956][1648985] Avg episode reward: [(0, '139.620')] [2024-06-15 20:16:14,150][1652491] Updated weights for policy 0, policy_version 713571 (0.0012) [2024-06-15 20:16:15,527][1652491] Updated weights for policy 0, policy_version 713632 (0.0103) [2024-06-15 20:16:15,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 49698.0, 300 sec: 47430.3). Total num frames: 1461551104. Throughput: 0: 11969.4. Samples: 365443584. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:16:15,956][1648985] Avg episode reward: [(0, '140.850')] [2024-06-15 20:16:16,149][1652491] Updated weights for policy 0, policy_version 713663 (0.0014) [2024-06-15 20:16:19,131][1652491] Updated weights for policy 0, policy_version 713718 (0.0011) [2024-06-15 20:16:20,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 48059.7, 300 sec: 47208.2). Total num frames: 1461747712. Throughput: 0: 12049.0. Samples: 365513728. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:16:20,956][1648985] Avg episode reward: [(0, '155.960')] [2024-06-15 20:16:21,180][1652491] Updated weights for policy 0, policy_version 713764 (0.0039) [2024-06-15 20:16:24,969][1651469] Signal inference workers to stop experience collection... 
(37100 times) [2024-06-15 20:16:25,014][1652491] InferenceWorker_p0-w0: stopping experience collection (37100 times) [2024-06-15 20:16:25,209][1651469] Signal inference workers to resume experience collection... (37100 times) [2024-06-15 20:16:25,210][1652491] InferenceWorker_p0-w0: resuming experience collection (37100 times) [2024-06-15 20:16:25,903][1652491] Updated weights for policy 0, policy_version 713852 (0.0013) [2024-06-15 20:16:25,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 1461944320. Throughput: 0: 12026.3. Samples: 365549056. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:16:25,956][1648985] Avg episode reward: [(0, '144.850')] [2024-06-15 20:16:27,297][1652491] Updated weights for policy 0, policy_version 713904 (0.0013) [2024-06-15 20:16:30,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 1462206464. Throughput: 0: 11912.5. Samples: 365616128. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:16:30,955][1648985] Avg episode reward: [(0, '156.280')] [2024-06-15 20:16:30,987][1652491] Updated weights for policy 0, policy_version 713981 (0.0014) [2024-06-15 20:16:33,536][1652491] Updated weights for policy 0, policy_version 714048 (0.0206) [2024-06-15 20:16:35,955][1648985] Fps is (10 sec: 45876.5, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 1462403072. Throughput: 0: 11707.8. Samples: 365683200. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:16:35,955][1648985] Avg episode reward: [(0, '158.360')] [2024-06-15 20:16:36,935][1652491] Updated weights for policy 0, policy_version 714109 (0.0086) [2024-06-15 20:16:38,963][1652491] Updated weights for policy 0, policy_version 714166 (0.0014) [2024-06-15 20:16:40,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 1462632448. Throughput: 0: 11537.1. Samples: 365711360. 
Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:16:40,956][1648985] Avg episode reward: [(0, '163.120')] [2024-06-15 20:16:41,980][1652491] Updated weights for policy 0, policy_version 714224 (0.0014) [2024-06-15 20:16:44,373][1652491] Updated weights for policy 0, policy_version 714276 (0.0014) [2024-06-15 20:16:45,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 1462894592. Throughput: 0: 11639.5. Samples: 365785088. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:16:45,956][1648985] Avg episode reward: [(0, '146.280')] [2024-06-15 20:16:46,650][1652491] Updated weights for policy 0, policy_version 714305 (0.0040) [2024-06-15 20:16:47,655][1652491] Updated weights for policy 0, policy_version 714356 (0.0164) [2024-06-15 20:16:49,642][1652491] Updated weights for policy 0, policy_version 714420 (0.0027) [2024-06-15 20:16:50,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 47208.1). Total num frames: 1463156736. Throughput: 0: 11798.8. Samples: 365861888. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:16:50,956][1648985] Avg episode reward: [(0, '150.020')] [2024-06-15 20:16:52,999][1652491] Updated weights for policy 0, policy_version 714480 (0.0173) [2024-06-15 20:16:55,296][1652491] Updated weights for policy 0, policy_version 714529 (0.0019) [2024-06-15 20:16:55,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 47513.6, 300 sec: 47208.1). Total num frames: 1463418880. Throughput: 0: 11741.9. Samples: 365897728. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:16:55,956][1648985] Avg episode reward: [(0, '157.220')] [2024-06-15 20:16:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000714560_1463418880.pth... 
[2024-06-15 20:16:56,022][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000709088_1452212224.pth [2024-06-15 20:16:57,078][1652491] Updated weights for policy 0, policy_version 714564 (0.0012) [2024-06-15 20:17:00,038][1652491] Updated weights for policy 0, policy_version 714640 (0.0013) [2024-06-15 20:17:00,856][1652491] Updated weights for policy 0, policy_version 714682 (0.0014) [2024-06-15 20:17:00,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1463681024. Throughput: 0: 11707.8. Samples: 365970432. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:17:00,955][1648985] Avg episode reward: [(0, '160.610')] [2024-06-15 20:17:04,090][1652491] Updated weights for policy 0, policy_version 714736 (0.0027) [2024-06-15 20:17:05,768][1652491] Updated weights for policy 0, policy_version 714800 (0.0013) [2024-06-15 20:17:05,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 47513.9, 300 sec: 47430.3). Total num frames: 1463910400. Throughput: 0: 11673.6. Samples: 366039040. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:17:05,955][1648985] Avg episode reward: [(0, '150.430')] [2024-06-15 20:17:08,437][1651469] Signal inference workers to stop experience collection... (37150 times) [2024-06-15 20:17:08,511][1652491] InferenceWorker_p0-w0: stopping experience collection (37150 times) [2024-06-15 20:17:08,514][1652491] Updated weights for policy 0, policy_version 714852 (0.0013) [2024-06-15 20:17:08,647][1651469] Signal inference workers to resume experience collection... (37150 times) [2024-06-15 20:17:08,648][1652491] InferenceWorker_p0-w0: resuming experience collection (37150 times) [2024-06-15 20:17:10,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1464074240. Throughput: 0: 11719.1. Samples: 366076416. 
Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:17:10,956][1648985] Avg episode reward: [(0, '146.070')] [2024-06-15 20:17:11,868][1652491] Updated weights for policy 0, policy_version 714914 (0.0014) [2024-06-15 20:17:14,884][1652491] Updated weights for policy 0, policy_version 714965 (0.0013) [2024-06-15 20:17:15,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1464336384. Throughput: 0: 11969.4. Samples: 366154752. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:17:15,956][1648985] Avg episode reward: [(0, '148.850')] [2024-06-15 20:17:16,542][1652491] Updated weights for policy 0, policy_version 715027 (0.0016) [2024-06-15 20:17:19,233][1652491] Updated weights for policy 0, policy_version 715088 (0.0013) [2024-06-15 20:17:20,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 1464598528. Throughput: 0: 11821.5. Samples: 366215168. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:17:20,956][1648985] Avg episode reward: [(0, '139.560')] [2024-06-15 20:17:23,459][1652491] Updated weights for policy 0, policy_version 715168 (0.0013) [2024-06-15 20:17:25,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 1464729600. Throughput: 0: 11958.0. Samples: 366249472. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:17:25,956][1648985] Avg episode reward: [(0, '138.550')] [2024-06-15 20:17:26,232][1652491] Updated weights for policy 0, policy_version 715216 (0.0013) [2024-06-15 20:17:27,930][1652491] Updated weights for policy 0, policy_version 715281 (0.0014) [2024-06-15 20:17:30,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1465024512. Throughput: 0: 11878.4. Samples: 366319616. 
Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:17:30,955][1648985] Avg episode reward: [(0, '137.220')] [2024-06-15 20:17:30,970][1652491] Updated weights for policy 0, policy_version 715344 (0.0013) [2024-06-15 20:17:35,093][1652491] Updated weights for policy 0, policy_version 715409 (0.0013) [2024-06-15 20:17:35,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 46967.3, 300 sec: 46874.9). Total num frames: 1465221120. Throughput: 0: 11753.3. Samples: 366390784. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:17:35,956][1648985] Avg episode reward: [(0, '167.660')] [2024-06-15 20:17:37,418][1652491] Updated weights for policy 0, policy_version 715472 (0.0033) [2024-06-15 20:17:39,514][1652491] Updated weights for policy 0, policy_version 715552 (0.0013) [2024-06-15 20:17:40,956][1648985] Fps is (10 sec: 49150.0, 60 sec: 48059.5, 300 sec: 46874.9). Total num frames: 1465516032. Throughput: 0: 11662.1. Samples: 366422528. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:17:40,956][1648985] Avg episode reward: [(0, '158.150')] [2024-06-15 20:17:42,687][1652491] Updated weights for policy 0, policy_version 715602 (0.0024) [2024-06-15 20:17:43,536][1652491] Updated weights for policy 0, policy_version 715643 (0.0013) [2024-06-15 20:17:45,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 1465647104. Throughput: 0: 11650.8. Samples: 366494720. Policy #0 lag: (min: 15.0, avg: 137.1, max: 271.0) [2024-06-15 20:17:45,956][1648985] Avg episode reward: [(0, '147.610')] [2024-06-15 20:17:47,637][1652491] Updated weights for policy 0, policy_version 715704 (0.0013) [2024-06-15 20:17:49,437][1652491] Updated weights for policy 0, policy_version 715776 (0.0014) [2024-06-15 20:17:50,902][1652491] Updated weights for policy 0, policy_version 715838 (0.0016) [2024-06-15 20:17:50,955][1648985] Fps is (10 sec: 52430.7, 60 sec: 48059.9, 300 sec: 47097.1). Total num frames: 1466040320. 
Throughput: 0: 11514.3. Samples: 366557184. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:17:50,955][1648985] Avg episode reward: [(0, '146.580')] [2024-06-15 20:17:53,372][1651469] Signal inference workers to stop experience collection... (37200 times) [2024-06-15 20:17:53,441][1652491] InferenceWorker_p0-w0: stopping experience collection (37200 times) [2024-06-15 20:17:53,646][1651469] Signal inference workers to resume experience collection... (37200 times) [2024-06-15 20:17:53,646][1652491] InferenceWorker_p0-w0: resuming experience collection (37200 times) [2024-06-15 20:17:54,624][1652491] Updated weights for policy 0, policy_version 715901 (0.0014) [2024-06-15 20:17:55,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.2, 300 sec: 47097.0). Total num frames: 1466171392. Throughput: 0: 11582.6. Samples: 366597632. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:17:55,956][1648985] Avg episode reward: [(0, '151.550')] [2024-06-15 20:17:58,556][1652491] Updated weights for policy 0, policy_version 715938 (0.0015) [2024-06-15 20:17:59,645][1652491] Updated weights for policy 0, policy_version 715989 (0.0011) [2024-06-15 20:18:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1466433536. Throughput: 0: 11502.9. Samples: 366672384. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:18:00,956][1648985] Avg episode reward: [(0, '152.690')] [2024-06-15 20:18:01,329][1652491] Updated weights for policy 0, policy_version 716064 (0.0012) [2024-06-15 20:18:04,390][1652491] Updated weights for policy 0, policy_version 716112 (0.0013) [2024-06-15 20:18:05,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 1466695680. Throughput: 0: 11673.6. Samples: 366740480. 
Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:18:05,955][1648985] Avg episode reward: [(0, '166.030')] [2024-06-15 20:18:08,992][1652491] Updated weights for policy 0, policy_version 716176 (0.0013) [2024-06-15 20:18:10,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1466859520. Throughput: 0: 12026.4. Samples: 366790656. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:18:10,956][1648985] Avg episode reward: [(0, '158.910')] [2024-06-15 20:18:11,084][1652491] Updated weights for policy 0, policy_version 716245 (0.0014) [2024-06-15 20:18:12,380][1652491] Updated weights for policy 0, policy_version 716309 (0.0017) [2024-06-15 20:18:14,392][1652491] Updated weights for policy 0, policy_version 716353 (0.0014) [2024-06-15 20:18:15,702][1652491] Updated weights for policy 0, policy_version 716415 (0.0015) [2024-06-15 20:18:15,963][1648985] Fps is (10 sec: 52384.4, 60 sec: 48053.0, 300 sec: 47095.7). Total num frames: 1467219968. Throughput: 0: 11796.5. Samples: 366850560. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:18:15,964][1648985] Avg episode reward: [(0, '156.750')] [2024-06-15 20:18:20,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 44782.8, 300 sec: 46874.9). Total num frames: 1467285504. Throughput: 0: 11889.7. Samples: 366925824. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:18:20,956][1648985] Avg episode reward: [(0, '134.150')] [2024-06-15 20:18:21,699][1652491] Updated weights for policy 0, policy_version 716480 (0.0012) [2024-06-15 20:18:23,296][1652491] Updated weights for policy 0, policy_version 716544 (0.0013) [2024-06-15 20:18:24,371][1652491] Updated weights for policy 0, policy_version 716593 (0.0025) [2024-06-15 20:18:25,955][1648985] Fps is (10 sec: 39355.0, 60 sec: 48059.9, 300 sec: 46652.7). Total num frames: 1467613184. Throughput: 0: 11798.8. Samples: 366953472. 
Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:18:25,955][1648985] Avg episode reward: [(0, '135.940')] [2024-06-15 20:18:26,755][1652491] Updated weights for policy 0, policy_version 716641 (0.0032) [2024-06-15 20:18:30,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45328.9, 300 sec: 46987.7). Total num frames: 1467744256. Throughput: 0: 11980.8. Samples: 367033856. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:18:30,956][1648985] Avg episode reward: [(0, '134.950')] [2024-06-15 20:18:31,598][1652491] Updated weights for policy 0, policy_version 716674 (0.0017) [2024-06-15 20:18:33,190][1652491] Updated weights for policy 0, policy_version 716736 (0.0107) [2024-06-15 20:18:33,679][1651469] Signal inference workers to stop experience collection... (37250 times) [2024-06-15 20:18:33,727][1652491] InferenceWorker_p0-w0: stopping experience collection (37250 times) [2024-06-15 20:18:33,994][1651469] Signal inference workers to resume experience collection... (37250 times) [2024-06-15 20:18:34,014][1652491] InferenceWorker_p0-w0: resuming experience collection (37250 times) [2024-06-15 20:18:35,533][1652491] Updated weights for policy 0, policy_version 716832 (0.0017) [2024-06-15 20:18:35,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 1468104704. Throughput: 0: 11730.5. Samples: 367085056. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:18:35,955][1648985] Avg episode reward: [(0, '136.440')] [2024-06-15 20:18:37,709][1652491] Updated weights for policy 0, policy_version 716882 (0.0042) [2024-06-15 20:18:40,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.4, 300 sec: 47097.1). Total num frames: 1468268544. Throughput: 0: 11582.6. Samples: 367118848. 
Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:18:40,956][1648985] Avg episode reward: [(0, '133.910')] [2024-06-15 20:18:43,977][1652491] Updated weights for policy 0, policy_version 716945 (0.0137) [2024-06-15 20:18:45,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 46967.6, 300 sec: 46763.8). Total num frames: 1468465152. Throughput: 0: 11673.6. Samples: 367197696. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:18:45,956][1648985] Avg episode reward: [(0, '140.060')] [2024-06-15 20:18:46,055][1652491] Updated weights for policy 0, policy_version 717026 (0.0075) [2024-06-15 20:18:47,867][1652491] Updated weights for policy 0, policy_version 717105 (0.0012) [2024-06-15 20:18:49,933][1652491] Updated weights for policy 0, policy_version 717179 (0.0012) [2024-06-15 20:18:50,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.0, 300 sec: 47097.0). Total num frames: 1468792832. Throughput: 0: 11320.8. Samples: 367249920. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:18:50,956][1648985] Avg episode reward: [(0, '130.090')] [2024-06-15 20:18:55,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 44236.8, 300 sec: 46541.6). Total num frames: 1468825600. Throughput: 0: 11116.1. Samples: 367290880. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:18:55,956][1648985] Avg episode reward: [(0, '147.070')] [2024-06-15 20:18:56,417][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000717232_1468891136.pth... 
[2024-06-15 20:18:56,614][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000711744_1457651712.pth [2024-06-15 20:18:57,383][1652491] Updated weights for policy 0, policy_version 717264 (0.0129) [2024-06-15 20:18:58,905][1652491] Updated weights for policy 0, policy_version 717313 (0.0012) [2024-06-15 20:19:00,028][1652491] Updated weights for policy 0, policy_version 717369 (0.0095) [2024-06-15 20:19:00,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1469186048. Throughput: 0: 10981.6. Samples: 367344640. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:19:00,956][1648985] Avg episode reward: [(0, '133.320')] [2024-06-15 20:19:02,346][1652491] Updated weights for policy 0, policy_version 717430 (0.0125) [2024-06-15 20:19:05,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 43690.4, 300 sec: 46874.9). Total num frames: 1469317120. Throughput: 0: 11138.8. Samples: 367427072. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:19:05,956][1648985] Avg episode reward: [(0, '154.610')] [2024-06-15 20:19:08,051][1652491] Updated weights for policy 0, policy_version 717488 (0.0014) [2024-06-15 20:19:09,742][1652491] Updated weights for policy 0, policy_version 717561 (0.0011) [2024-06-15 20:19:10,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1469612032. Throughput: 0: 11161.6. Samples: 367455744. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:19:10,956][1648985] Avg episode reward: [(0, '154.410')] [2024-06-15 20:19:11,707][1652491] Updated weights for policy 0, policy_version 717631 (0.0012) [2024-06-15 20:19:13,499][1652491] Updated weights for policy 0, policy_version 717687 (0.0015) [2024-06-15 20:19:15,955][1648985] Fps is (10 sec: 52430.4, 60 sec: 43696.8, 300 sec: 47097.0). Total num frames: 1469841408. Throughput: 0: 10717.9. Samples: 367516160. 
Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:19:15,956][1648985] Avg episode reward: [(0, '158.550')] [2024-06-15 20:19:18,931][1651469] Signal inference workers to stop experience collection... (37300 times) [2024-06-15 20:19:18,993][1652491] InferenceWorker_p0-w0: stopping experience collection (37300 times) [2024-06-15 20:19:19,211][1651469] Signal inference workers to resume experience collection... (37300 times) [2024-06-15 20:19:19,212][1652491] InferenceWorker_p0-w0: resuming experience collection (37300 times) [2024-06-15 20:19:19,633][1652491] Updated weights for policy 0, policy_version 717728 (0.0014) [2024-06-15 20:19:20,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 1470005248. Throughput: 0: 11241.2. Samples: 367590912. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:19:20,956][1648985] Avg episode reward: [(0, '156.780')] [2024-06-15 20:19:21,134][1652491] Updated weights for policy 0, policy_version 717780 (0.0013) [2024-06-15 20:19:23,263][1652491] Updated weights for policy 0, policy_version 717872 (0.0013) [2024-06-15 20:19:24,807][1652491] Updated weights for policy 0, policy_version 717904 (0.0015) [2024-06-15 20:19:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.1, 300 sec: 47097.1). Total num frames: 1470365696. Throughput: 0: 11025.1. Samples: 367614976. Policy #0 lag: (min: 49.0, avg: 157.4, max: 302.0) [2024-06-15 20:19:25,956][1648985] Avg episode reward: [(0, '146.860')] [2024-06-15 20:19:29,796][1652491] Updated weights for policy 0, policy_version 717955 (0.0012) [2024-06-15 20:19:30,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45329.2, 300 sec: 46541.7). Total num frames: 1470464000. Throughput: 0: 11104.7. Samples: 367697408. 
Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:19:30,956][1648985] Avg episode reward: [(0, '148.800')] [2024-06-15 20:19:31,175][1652491] Updated weights for policy 0, policy_version 718016 (0.0093) [2024-06-15 20:19:33,582][1652491] Updated weights for policy 0, policy_version 718080 (0.0012) [2024-06-15 20:19:35,058][1652491] Updated weights for policy 0, policy_version 718140 (0.0015) [2024-06-15 20:19:35,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 44783.0, 300 sec: 46763.8). Total num frames: 1470791680. Throughput: 0: 11320.9. Samples: 367759360. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:19:35,955][1648985] Avg episode reward: [(0, '150.570')] [2024-06-15 20:19:36,341][1652491] Updated weights for policy 0, policy_version 718187 (0.0013) [2024-06-15 20:19:40,522][1652491] Updated weights for policy 0, policy_version 718224 (0.0014) [2024-06-15 20:19:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 44783.0, 300 sec: 46874.9). Total num frames: 1470955520. Throughput: 0: 11343.7. Samples: 367801344. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:19:40,956][1648985] Avg episode reward: [(0, '144.890')] [2024-06-15 20:19:41,578][1652491] Updated weights for policy 0, policy_version 718272 (0.0019) [2024-06-15 20:19:44,810][1652491] Updated weights for policy 0, policy_version 718337 (0.0012) [2024-06-15 20:19:45,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1471250432. Throughput: 0: 11707.7. Samples: 367871488. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:19:45,956][1648985] Avg episode reward: [(0, '145.000')] [2024-06-15 20:19:45,959][1652491] Updated weights for policy 0, policy_version 718392 (0.0013) [2024-06-15 20:19:47,306][1652491] Updated weights for policy 0, policy_version 718448 (0.0012) [2024-06-15 20:19:50,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 44236.9, 300 sec: 46986.0). Total num frames: 1471447040. 
Throughput: 0: 11514.4. Samples: 367945216. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:19:50,956][1648985] Avg episode reward: [(0, '150.830')] [2024-06-15 20:19:51,068][1652491] Updated weights for policy 0, policy_version 718496 (0.0012) [2024-06-15 20:19:54,917][1652491] Updated weights for policy 0, policy_version 718544 (0.0010) [2024-06-15 20:19:55,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46967.4, 300 sec: 46541.7). Total num frames: 1471643648. Throughput: 0: 11719.1. Samples: 367983104. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:19:55,956][1648985] Avg episode reward: [(0, '146.110')] [2024-06-15 20:19:56,630][1652491] Updated weights for policy 0, policy_version 718610 (0.0013) [2024-06-15 20:19:57,337][1651469] Signal inference workers to stop experience collection... (37350 times) [2024-06-15 20:19:57,400][1652491] InferenceWorker_p0-w0: stopping experience collection (37350 times) [2024-06-15 20:19:57,637][1651469] Signal inference workers to resume experience collection... (37350 times) [2024-06-15 20:19:57,637][1652491] InferenceWorker_p0-w0: resuming experience collection (37350 times) [2024-06-15 20:19:58,346][1652491] Updated weights for policy 0, policy_version 718689 (0.0012) [2024-06-15 20:20:00,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 45875.1, 300 sec: 46874.9). Total num frames: 1471938560. Throughput: 0: 11798.7. Samples: 368047104. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:20:00,956][1648985] Avg episode reward: [(0, '145.190')] [2024-06-15 20:20:01,225][1652491] Updated weights for policy 0, policy_version 718736 (0.0012) [2024-06-15 20:20:05,623][1652491] Updated weights for policy 0, policy_version 718788 (0.0022) [2024-06-15 20:20:05,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 46421.6, 300 sec: 46430.6). Total num frames: 1472102400. Throughput: 0: 12049.1. Samples: 368133120. 
Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:20:05,955][1648985] Avg episode reward: [(0, '175.790')] [2024-06-15 20:20:07,634][1652491] Updated weights for policy 0, policy_version 718865 (0.0013) [2024-06-15 20:20:09,001][1652491] Updated weights for policy 0, policy_version 718928 (0.0014) [2024-06-15 20:20:10,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 47097.0). Total num frames: 1472462848. Throughput: 0: 12140.1. Samples: 368161280. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:20:10,956][1648985] Avg episode reward: [(0, '170.850')] [2024-06-15 20:20:12,018][1652491] Updated weights for policy 0, policy_version 718995 (0.0012) [2024-06-15 20:20:15,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 1472593920. Throughput: 0: 12083.2. Samples: 368241152. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:20:15,956][1648985] Avg episode reward: [(0, '145.650')] [2024-06-15 20:20:16,637][1652491] Updated weights for policy 0, policy_version 719057 (0.0016) [2024-06-15 20:20:17,975][1652491] Updated weights for policy 0, policy_version 719108 (0.0011) [2024-06-15 20:20:19,868][1652491] Updated weights for policy 0, policy_version 719184 (0.0013) [2024-06-15 20:20:20,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 49151.9, 300 sec: 46986.0). Total num frames: 1472954368. Throughput: 0: 11935.2. Samples: 368296448. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:20:20,956][1648985] Avg episode reward: [(0, '137.400')] [2024-06-15 20:20:24,111][1652491] Updated weights for policy 0, policy_version 719248 (0.0013) [2024-06-15 20:20:24,890][1652491] Updated weights for policy 0, policy_version 719285 (0.0021) [2024-06-15 20:20:25,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.3, 300 sec: 46763.8). Total num frames: 1473118208. Throughput: 0: 11889.8. Samples: 368336384. 
Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:20:25,956][1648985] Avg episode reward: [(0, '130.910')] [2024-06-15 20:20:28,986][1652491] Updated weights for policy 0, policy_version 719328 (0.0012) [2024-06-15 20:20:30,955][1648985] Fps is (10 sec: 36045.4, 60 sec: 47513.6, 300 sec: 46430.6). Total num frames: 1473314816. Throughput: 0: 11844.3. Samples: 368404480. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:20:30,955][1648985] Avg episode reward: [(0, '156.090')] [2024-06-15 20:20:31,590][1652491] Updated weights for policy 0, policy_version 719426 (0.0011) [2024-06-15 20:20:32,988][1652491] Updated weights for policy 0, policy_version 719488 (0.0013) [2024-06-15 20:20:35,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.1, 300 sec: 46763.8). Total num frames: 1473544192. Throughput: 0: 11628.1. Samples: 368468480. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:20:35,956][1648985] Avg episode reward: [(0, '159.030')] [2024-06-15 20:20:36,730][1652491] Updated weights for policy 0, policy_version 719552 (0.0123) [2024-06-15 20:20:40,614][1651469] Signal inference workers to stop experience collection... (37400 times) [2024-06-15 20:20:40,656][1652491] InferenceWorker_p0-w0: stopping experience collection (37400 times) [2024-06-15 20:20:40,846][1651469] Signal inference workers to resume experience collection... (37400 times) [2024-06-15 20:20:40,847][1652491] InferenceWorker_p0-w0: resuming experience collection (37400 times) [2024-06-15 20:20:40,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.1, 300 sec: 45986.3). Total num frames: 1473708032. Throughput: 0: 11628.1. Samples: 368506368. 
Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:20:40,956][1648985] Avg episode reward: [(0, '176.370')] [2024-06-15 20:20:41,446][1652491] Updated weights for policy 0, policy_version 719602 (0.0013) [2024-06-15 20:20:43,308][1652491] Updated weights for policy 0, policy_version 719680 (0.0103) [2024-06-15 20:20:44,617][1652491] Updated weights for policy 0, policy_version 719735 (0.0014) [2024-06-15 20:20:45,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 46421.5, 300 sec: 46652.8). Total num frames: 1474035712. Throughput: 0: 11548.5. Samples: 368566784. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:20:45,955][1648985] Avg episode reward: [(0, '152.640')] [2024-06-15 20:20:47,137][1652491] Updated weights for policy 0, policy_version 719777 (0.0017) [2024-06-15 20:20:50,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45329.1, 300 sec: 46097.4). Total num frames: 1474166784. Throughput: 0: 11434.6. Samples: 368647680. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:20:50,956][1648985] Avg episode reward: [(0, '158.300')] [2024-06-15 20:20:51,732][1652491] Updated weights for policy 0, policy_version 719827 (0.0012) [2024-06-15 20:20:53,336][1652491] Updated weights for policy 0, policy_version 719888 (0.0012) [2024-06-15 20:20:54,674][1652491] Updated weights for policy 0, policy_version 719939 (0.0013) [2024-06-15 20:20:55,865][1652491] Updated weights for policy 0, policy_version 719995 (0.0013) [2024-06-15 20:20:55,955][1648985] Fps is (10 sec: 52425.9, 60 sec: 48605.6, 300 sec: 46652.7). Total num frames: 1474560000. Throughput: 0: 11457.4. Samples: 368676864. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:20:55,956][1648985] Avg episode reward: [(0, '148.100')] [2024-06-15 20:20:55,991][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000720000_1474560000.pth... 
[2024-06-15 20:20:56,051][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000714560_1463418880.pth [2024-06-15 20:20:58,929][1652491] Updated weights for policy 0, policy_version 720057 (0.0039) [2024-06-15 20:21:00,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 1474691072. Throughput: 0: 11093.3. Samples: 368740352. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:21:00,956][1648985] Avg episode reward: [(0, '156.960')] [2024-06-15 20:21:03,863][1652491] Updated weights for policy 0, policy_version 720112 (0.0023) [2024-06-15 20:21:05,251][1652491] Updated weights for policy 0, policy_version 720161 (0.0012) [2024-06-15 20:21:05,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 47513.4, 300 sec: 46208.4). Total num frames: 1474953216. Throughput: 0: 11434.7. Samples: 368811008. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:21:05,956][1648985] Avg episode reward: [(0, '152.800')] [2024-06-15 20:21:07,134][1652491] Updated weights for policy 0, policy_version 720244 (0.0018) [2024-06-15 20:21:09,077][1652491] Updated weights for policy 0, policy_version 720288 (0.0016) [2024-06-15 20:21:10,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 1475215360. Throughput: 0: 11355.0. Samples: 368847360. Policy #0 lag: (min: 47.0, avg: 117.8, max: 303.0) [2024-06-15 20:21:10,956][1648985] Avg episode reward: [(0, '155.840')] [2024-06-15 20:21:14,905][1652491] Updated weights for policy 0, policy_version 720354 (0.0015) [2024-06-15 20:21:15,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 46097.3). Total num frames: 1475346432. Throughput: 0: 11571.2. Samples: 368925184. 
Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:21:15,956][1648985] Avg episode reward: [(0, '152.760')] [2024-06-15 20:21:16,408][1652491] Updated weights for policy 0, policy_version 720416 (0.0013) [2024-06-15 20:21:17,518][1651469] Signal inference workers to stop experience collection... (37450 times) [2024-06-15 20:21:17,570][1652491] InferenceWorker_p0-w0: stopping experience collection (37450 times) [2024-06-15 20:21:17,709][1651469] Signal inference workers to resume experience collection... (37450 times) [2024-06-15 20:21:17,722][1652491] InferenceWorker_p0-w0: resuming experience collection (37450 times) [2024-06-15 20:21:18,234][1652491] Updated weights for policy 0, policy_version 720504 (0.0013) [2024-06-15 20:21:20,422][1652491] Updated weights for policy 0, policy_version 720547 (0.0013) [2024-06-15 20:21:20,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 46421.5, 300 sec: 46763.8). Total num frames: 1475739648. Throughput: 0: 11491.6. Samples: 368985600. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:21:20,956][1648985] Avg episode reward: [(0, '141.150')] [2024-06-15 20:21:25,525][1652491] Updated weights for policy 0, policy_version 720594 (0.0014) [2024-06-15 20:21:25,962][1648985] Fps is (10 sec: 45845.4, 60 sec: 44778.0, 300 sec: 46096.3). Total num frames: 1475805184. Throughput: 0: 11637.8. Samples: 369030144. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:21:25,963][1648985] Avg episode reward: [(0, '133.190')] [2024-06-15 20:21:26,269][1652491] Updated weights for policy 0, policy_version 720637 (0.0015) [2024-06-15 20:21:27,631][1652491] Updated weights for policy 0, policy_version 720688 (0.0012) [2024-06-15 20:21:29,256][1652491] Updated weights for policy 0, policy_version 720766 (0.0015) [2024-06-15 20:21:30,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 46967.3, 300 sec: 46541.6). Total num frames: 1476132864. Throughput: 0: 11662.1. Samples: 369091584. 
Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:21:30,956][1648985] Avg episode reward: [(0, '142.060')] [2024-06-15 20:21:32,359][1652491] Updated weights for policy 0, policy_version 720830 (0.0012) [2024-06-15 20:21:35,955][1648985] Fps is (10 sec: 45905.5, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 1476263936. Throughput: 0: 11548.4. Samples: 369167360. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:21:35,956][1648985] Avg episode reward: [(0, '148.330')] [2024-06-15 20:21:37,599][1652491] Updated weights for policy 0, policy_version 720893 (0.0031) [2024-06-15 20:21:39,187][1652491] Updated weights for policy 0, policy_version 720947 (0.0012) [2024-06-15 20:21:40,798][1652491] Updated weights for policy 0, policy_version 721023 (0.0011) [2024-06-15 20:21:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 49151.9, 300 sec: 46652.7). Total num frames: 1476657152. Throughput: 0: 11582.6. Samples: 369198080. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:21:40,956][1648985] Avg episode reward: [(0, '154.600')] [2024-06-15 20:21:43,489][1652491] Updated weights for policy 0, policy_version 721077 (0.0014) [2024-06-15 20:21:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.1, 300 sec: 46208.5). Total num frames: 1476788224. Throughput: 0: 11696.4. Samples: 369266688. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:21:45,956][1648985] Avg episode reward: [(0, '171.170')] [2024-06-15 20:21:47,815][1652491] Updated weights for policy 0, policy_version 721120 (0.0014) [2024-06-15 20:21:49,247][1652491] Updated weights for policy 0, policy_version 721153 (0.0018) [2024-06-15 20:21:50,788][1652491] Updated weights for policy 0, policy_version 721220 (0.0013) [2024-06-15 20:21:50,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 48605.8, 300 sec: 46319.5). Total num frames: 1477083136. Throughput: 0: 11764.6. Samples: 369340416. 
Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:21:50,956][1648985] Avg episode reward: [(0, '173.150')] [2024-06-15 20:21:52,809][1652491] Updated weights for policy 0, policy_version 721281 (0.0015) [2024-06-15 20:21:54,163][1652491] Updated weights for policy 0, policy_version 721343 (0.0016) [2024-06-15 20:21:55,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.5, 300 sec: 46208.4). Total num frames: 1477312512. Throughput: 0: 11650.9. Samples: 369371648. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:21:55,955][1648985] Avg episode reward: [(0, '152.300')] [2024-06-15 20:22:00,106][1652491] Updated weights for policy 0, policy_version 721398 (0.0011) [2024-06-15 20:22:00,955][1648985] Fps is (10 sec: 36044.3, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 1477443584. Throughput: 0: 11628.1. Samples: 369448448. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:22:00,956][1648985] Avg episode reward: [(0, '144.080')] [2024-06-15 20:22:01,863][1652491] Updated weights for policy 0, policy_version 721456 (0.0014) [2024-06-15 20:22:01,962][1651469] Signal inference workers to stop experience collection... (37500 times) [2024-06-15 20:22:02,030][1652491] InferenceWorker_p0-w0: stopping experience collection (37500 times) [2024-06-15 20:22:02,240][1651469] Signal inference workers to resume experience collection... (37500 times) [2024-06-15 20:22:02,241][1652491] InferenceWorker_p0-w0: resuming experience collection (37500 times) [2024-06-15 20:22:03,519][1652491] Updated weights for policy 0, policy_version 721530 (0.0033) [2024-06-15 20:22:05,413][1652491] Updated weights for policy 0, policy_version 721592 (0.0012) [2024-06-15 20:22:05,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 1477836800. Throughput: 0: 11480.2. Samples: 369502208. 
Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:22:05,956][1648985] Avg episode reward: [(0, '128.170')] [2024-06-15 20:22:10,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 44236.9, 300 sec: 45875.2). Total num frames: 1477869568. Throughput: 0: 11504.6. Samples: 369547776. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:22:10,956][1648985] Avg episode reward: [(0, '141.550')] [2024-06-15 20:22:11,048][1652491] Updated weights for policy 0, policy_version 721620 (0.0013) [2024-06-15 20:22:12,577][1652491] Updated weights for policy 0, policy_version 721680 (0.0040) [2024-06-15 20:22:14,107][1652491] Updated weights for policy 0, policy_version 721746 (0.0013) [2024-06-15 20:22:15,773][1652491] Updated weights for policy 0, policy_version 721810 (0.0117) [2024-06-15 20:22:15,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48606.0, 300 sec: 46319.5). Total num frames: 1478262784. Throughput: 0: 11582.6. Samples: 369612800. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:22:15,956][1648985] Avg episode reward: [(0, '150.750')] [2024-06-15 20:22:16,893][1652491] Updated weights for policy 0, policy_version 721852 (0.0011) [2024-06-15 20:22:20,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 1478361088. Throughput: 0: 11707.7. Samples: 369694208. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:22:20,956][1648985] Avg episode reward: [(0, '154.690')] [2024-06-15 20:22:22,527][1652491] Updated weights for policy 0, policy_version 721912 (0.0013) [2024-06-15 20:22:24,197][1652491] Updated weights for policy 0, policy_version 721968 (0.0013) [2024-06-15 20:22:25,722][1652491] Updated weights for policy 0, policy_version 722038 (0.0020) [2024-06-15 20:22:25,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 49157.3, 300 sec: 46541.6). Total num frames: 1478754304. Throughput: 0: 11764.6. Samples: 369727488. 
Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:22:25,956][1648985] Avg episode reward: [(0, '151.740')] [2024-06-15 20:22:27,259][1652491] Updated weights for policy 0, policy_version 722106 (0.0241) [2024-06-15 20:22:30,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 1478885376. Throughput: 0: 11878.4. Samples: 369801216. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:22:30,956][1648985] Avg episode reward: [(0, '152.260')] [2024-06-15 20:22:32,823][1652491] Updated weights for policy 0, policy_version 722147 (0.0017) [2024-06-15 20:22:35,153][1652491] Updated weights for policy 0, policy_version 722208 (0.0013) [2024-06-15 20:22:35,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 48059.7, 300 sec: 46208.5). Total num frames: 1479147520. Throughput: 0: 11832.9. Samples: 369872896. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:22:35,956][1648985] Avg episode reward: [(0, '157.490')] [2024-06-15 20:22:36,739][1652491] Updated weights for policy 0, policy_version 722288 (0.0013) [2024-06-15 20:22:38,502][1652491] Updated weights for policy 0, policy_version 722360 (0.0037) [2024-06-15 20:22:40,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 1479409664. Throughput: 0: 11776.0. Samples: 369901568. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:22:40,956][1648985] Avg episode reward: [(0, '152.700')] [2024-06-15 20:22:42,751][1651469] Signal inference workers to stop experience collection... (37550 times) [2024-06-15 20:22:42,823][1652491] InferenceWorker_p0-w0: stopping experience collection (37550 times) [2024-06-15 20:22:43,014][1651469] Signal inference workers to resume experience collection... 
(37550 times) [2024-06-15 20:22:43,015][1652491] InferenceWorker_p0-w0: resuming experience collection (37550 times) [2024-06-15 20:22:43,018][1652491] Updated weights for policy 0, policy_version 722400 (0.0014) [2024-06-15 20:22:45,956][1648985] Fps is (10 sec: 39317.0, 60 sec: 45874.3, 300 sec: 45763.9). Total num frames: 1479540736. Throughput: 0: 11855.4. Samples: 369981952. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:22:45,957][1648985] Avg episode reward: [(0, '151.810')] [2024-06-15 20:22:46,500][1652491] Updated weights for policy 0, policy_version 722464 (0.0033) [2024-06-15 20:22:47,660][1652491] Updated weights for policy 0, policy_version 722519 (0.0012) [2024-06-15 20:22:49,033][1652491] Updated weights for policy 0, policy_version 722592 (0.0014) [2024-06-15 20:22:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 1479933952. Throughput: 0: 12242.5. Samples: 370053120. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:22:50,956][1648985] Avg episode reward: [(0, '158.520')] [2024-06-15 20:22:53,260][1652491] Updated weights for policy 0, policy_version 722662 (0.0012) [2024-06-15 20:22:55,955][1648985] Fps is (10 sec: 52433.2, 60 sec: 45874.9, 300 sec: 46208.4). Total num frames: 1480065024. Throughput: 0: 12140.0. Samples: 370094080. Policy #0 lag: (min: 15.0, avg: 83.8, max: 271.0) [2024-06-15 20:22:55,956][1648985] Avg episode reward: [(0, '176.730')] [2024-06-15 20:22:55,983][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000722688_1480065024.pth... 
[2024-06-15 20:22:56,109][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000717232_1468891136.pth [2024-06-15 20:22:56,404][1652491] Updated weights for policy 0, policy_version 722704 (0.0015) [2024-06-15 20:22:57,956][1652491] Updated weights for policy 0, policy_version 722784 (0.0015) [2024-06-15 20:22:59,817][1652491] Updated weights for policy 0, policy_version 722868 (0.0013) [2024-06-15 20:23:00,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 50244.3, 300 sec: 46652.7). Total num frames: 1480458240. Throughput: 0: 12253.8. Samples: 370164224. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:23:00,957][1648985] Avg episode reward: [(0, '162.910')] [2024-06-15 20:23:03,664][1652491] Updated weights for policy 0, policy_version 722935 (0.0014) [2024-06-15 20:23:05,955][1648985] Fps is (10 sec: 52430.8, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 1480589312. Throughput: 0: 12185.6. Samples: 370242560. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:23:05,956][1648985] Avg episode reward: [(0, '135.490')] [2024-06-15 20:23:08,301][1652491] Updated weights for policy 0, policy_version 722993 (0.0012) [2024-06-15 20:23:09,604][1652491] Updated weights for policy 0, policy_version 723056 (0.0095) [2024-06-15 20:23:10,747][1652491] Updated weights for policy 0, policy_version 723115 (0.0088) [2024-06-15 20:23:10,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 51336.5, 300 sec: 46543.0). Total num frames: 1480949760. Throughput: 0: 12231.1. Samples: 370277888. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:23:10,956][1648985] Avg episode reward: [(0, '127.880')] [2024-06-15 20:23:13,871][1652491] Updated weights for policy 0, policy_version 723141 (0.0016) [2024-06-15 20:23:15,024][1652491] Updated weights for policy 0, policy_version 723194 (0.0013) [2024-06-15 20:23:15,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 47513.4, 300 sec: 46874.9). Total num frames: 1481113600. 
Throughput: 0: 12105.9. Samples: 370345984. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:23:15,956][1648985] Avg episode reward: [(0, '145.770')] [2024-06-15 20:23:18,690][1652491] Updated weights for policy 0, policy_version 723237 (0.0012) [2024-06-15 20:23:19,805][1652491] Updated weights for policy 0, policy_version 723290 (0.0015) [2024-06-15 20:23:19,968][1651469] Signal inference workers to stop experience collection... (37600 times) [2024-06-15 20:23:19,995][1652491] InferenceWorker_p0-w0: stopping experience collection (37600 times) [2024-06-15 20:23:20,219][1651469] Signal inference workers to resume experience collection... (37600 times) [2024-06-15 20:23:20,219][1652491] InferenceWorker_p0-w0: resuming experience collection (37600 times) [2024-06-15 20:23:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 50790.4, 300 sec: 46763.8). Total num frames: 1481408512. Throughput: 0: 12117.3. Samples: 370418176. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:23:20,956][1648985] Avg episode reward: [(0, '166.690')] [2024-06-15 20:23:21,631][1652491] Updated weights for policy 0, policy_version 723389 (0.0016) [2024-06-15 20:23:25,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1481572352. Throughput: 0: 12276.6. Samples: 370454016. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:23:25,956][1648985] Avg episode reward: [(0, '158.740')] [2024-06-15 20:23:26,194][1652491] Updated weights for policy 0, policy_version 723448 (0.0011) [2024-06-15 20:23:30,012][1652491] Updated weights for policy 0, policy_version 723491 (0.0014) [2024-06-15 20:23:30,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 48605.8, 300 sec: 46430.5). Total num frames: 1481801728. Throughput: 0: 12026.6. Samples: 370523136. 
Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:23:30,956][1648985] Avg episode reward: [(0, '146.180')] [2024-06-15 20:23:31,355][1652491] Updated weights for policy 0, policy_version 723554 (0.0013) [2024-06-15 20:23:32,871][1652491] Updated weights for policy 0, policy_version 723632 (0.0015) [2024-06-15 20:23:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 1482031104. Throughput: 0: 11992.2. Samples: 370592768. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:23:35,956][1648985] Avg episode reward: [(0, '142.340')] [2024-06-15 20:23:37,061][1652491] Updated weights for policy 0, policy_version 723682 (0.0026) [2024-06-15 20:23:40,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 1482194944. Throughput: 0: 11833.0. Samples: 370626560. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:23:40,956][1648985] Avg episode reward: [(0, '147.910')] [2024-06-15 20:23:41,372][1652491] Updated weights for policy 0, policy_version 723744 (0.0023) [2024-06-15 20:23:43,311][1652491] Updated weights for policy 0, policy_version 723824 (0.0013) [2024-06-15 20:23:44,714][1652491] Updated weights for policy 0, policy_version 723888 (0.0015) [2024-06-15 20:23:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 50245.2, 300 sec: 46652.8). Total num frames: 1482555392. Throughput: 0: 11662.2. Samples: 370689024. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:23:45,956][1648985] Avg episode reward: [(0, '167.350')] [2024-06-15 20:23:48,587][1652491] Updated weights for policy 0, policy_version 723952 (0.0013) [2024-06-15 20:23:50,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1482686464. Throughput: 0: 11696.3. Samples: 370768896. 
Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:23:50,956][1648985] Avg episode reward: [(0, '175.480')] [2024-06-15 20:23:51,884][1652491] Updated weights for policy 0, policy_version 723969 (0.0013) [2024-06-15 20:23:53,579][1652491] Updated weights for policy 0, policy_version 724037 (0.0012) [2024-06-15 20:23:55,519][1652491] Updated weights for policy 0, policy_version 724116 (0.0022) [2024-06-15 20:23:55,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 49152.3, 300 sec: 46874.9). Total num frames: 1483014144. Throughput: 0: 11639.5. Samples: 370801664. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:23:55,956][1648985] Avg episode reward: [(0, '174.190')] [2024-06-15 20:23:58,948][1652491] Updated weights for policy 0, policy_version 724164 (0.0016) [2024-06-15 20:24:00,257][1652491] Updated weights for policy 0, policy_version 724222 (0.0020) [2024-06-15 20:24:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 1483210752. Throughput: 0: 11650.9. Samples: 370870272. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:24:00,956][1648985] Avg episode reward: [(0, '172.240')] [2024-06-15 20:24:02,576][1651469] Signal inference workers to stop experience collection... (37650 times) [2024-06-15 20:24:02,678][1652491] InferenceWorker_p0-w0: stopping experience collection (37650 times) [2024-06-15 20:24:02,816][1651469] Signal inference workers to resume experience collection... (37650 times) [2024-06-15 20:24:02,817][1652491] InferenceWorker_p0-w0: resuming experience collection (37650 times) [2024-06-15 20:24:03,520][1652491] Updated weights for policy 0, policy_version 724272 (0.0016) [2024-06-15 20:24:04,537][1652491] Updated weights for policy 0, policy_version 724320 (0.0014) [2024-06-15 20:24:05,839][1652491] Updated weights for policy 0, policy_version 724374 (0.0013) [2024-06-15 20:24:05,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 48605.8, 300 sec: 47097.1). 
Total num frames: 1483505664. Throughput: 0: 11594.0. Samples: 370939904. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:24:05,956][1648985] Avg episode reward: [(0, '179.820')] [2024-06-15 20:24:09,553][1652491] Updated weights for policy 0, policy_version 724437 (0.0014) [2024-06-15 20:24:10,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46421.3, 300 sec: 47097.0). Total num frames: 1483735040. Throughput: 0: 11673.6. Samples: 370979328. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:24:10,956][1648985] Avg episode reward: [(0, '168.600')] [2024-06-15 20:24:14,368][1652491] Updated weights for policy 0, policy_version 724496 (0.0014) [2024-06-15 20:24:15,842][1652491] Updated weights for policy 0, policy_version 724560 (0.0013) [2024-06-15 20:24:15,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46421.5, 300 sec: 47097.1). Total num frames: 1483898880. Throughput: 0: 11662.3. Samples: 371047936. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:24:15,955][1648985] Avg episode reward: [(0, '169.040')] [2024-06-15 20:24:17,092][1652491] Updated weights for policy 0, policy_version 724613 (0.0103) [2024-06-15 20:24:18,321][1652491] Updated weights for policy 0, policy_version 724665 (0.0015) [2024-06-15 20:24:20,910][1652491] Updated weights for policy 0, policy_version 724691 (0.0013) [2024-06-15 20:24:20,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 1484161024. Throughput: 0: 11639.5. Samples: 371116544. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:24:20,956][1648985] Avg episode reward: [(0, '167.620')] [2024-06-15 20:24:25,957][1648985] Fps is (10 sec: 39315.9, 60 sec: 45328.0, 300 sec: 46874.7). Total num frames: 1484292096. Throughput: 0: 11787.0. Samples: 371156992. 
Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:24:25,958][1648985] Avg episode reward: [(0, '162.510')] [2024-06-15 20:24:26,136][1652491] Updated weights for policy 0, policy_version 724768 (0.0014) [2024-06-15 20:24:27,918][1652491] Updated weights for policy 0, policy_version 724848 (0.0013) [2024-06-15 20:24:29,304][1652491] Updated weights for policy 0, policy_version 724898 (0.0020) [2024-06-15 20:24:30,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 47513.8, 300 sec: 46986.0). Total num frames: 1484652544. Throughput: 0: 11662.3. Samples: 371213824. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:24:30,956][1648985] Avg episode reward: [(0, '165.140')] [2024-06-15 20:24:32,669][1652491] Updated weights for policy 0, policy_version 724945 (0.0030) [2024-06-15 20:24:35,955][1648985] Fps is (10 sec: 49158.6, 60 sec: 45875.1, 300 sec: 46874.9). Total num frames: 1484783616. Throughput: 0: 11605.3. Samples: 371291136. Policy #0 lag: (min: 15.0, avg: 92.2, max: 271.0) [2024-06-15 20:24:35,956][1648985] Avg episode reward: [(0, '160.670')] [2024-06-15 20:24:37,384][1652491] Updated weights for policy 0, policy_version 725008 (0.0013) [2024-06-15 20:24:38,785][1652491] Updated weights for policy 0, policy_version 725072 (0.0013) [2024-06-15 20:24:40,355][1651469] Signal inference workers to stop experience collection... (37700 times) [2024-06-15 20:24:40,467][1652491] InferenceWorker_p0-w0: stopping experience collection (37700 times) [2024-06-15 20:24:40,631][1651469] Signal inference workers to resume experience collection... (37700 times) [2024-06-15 20:24:40,632][1652491] InferenceWorker_p0-w0: resuming experience collection (37700 times) [2024-06-15 20:24:40,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 1485078528. Throughput: 0: 11605.3. Samples: 371323904. 
Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:24:40,956][1648985] Avg episode reward: [(0, '160.600')] [2024-06-15 20:24:41,121][1652491] Updated weights for policy 0, policy_version 725155 (0.0109) [2024-06-15 20:24:44,302][1652491] Updated weights for policy 0, policy_version 725200 (0.0014) [2024-06-15 20:24:45,482][1652491] Updated weights for policy 0, policy_version 725241 (0.0012) [2024-06-15 20:24:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1485307904. Throughput: 0: 11434.6. Samples: 371384832. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:24:45,956][1648985] Avg episode reward: [(0, '148.980')] [2024-06-15 20:24:49,435][1652491] Updated weights for policy 0, policy_version 725281 (0.0014) [2024-06-15 20:24:50,955][1648985] Fps is (10 sec: 42597.2, 60 sec: 46967.2, 300 sec: 46985.9). Total num frames: 1485504512. Throughput: 0: 11502.9. Samples: 371457536. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:24:50,956][1648985] Avg episode reward: [(0, '145.560')] [2024-06-15 20:24:51,051][1652491] Updated weights for policy 0, policy_version 725360 (0.0014) [2024-06-15 20:24:52,666][1652491] Updated weights for policy 0, policy_version 725424 (0.0015) [2024-06-15 20:24:55,732][1652491] Updated weights for policy 0, policy_version 725456 (0.0014) [2024-06-15 20:24:55,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 45328.8, 300 sec: 46763.8). Total num frames: 1485733888. Throughput: 0: 11332.2. Samples: 371489280. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:24:55,956][1648985] Avg episode reward: [(0, '153.670')] [2024-06-15 20:24:56,504][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000725488_1485799424.pth... 
[2024-06-15 20:24:56,558][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000720000_1474560000.pth [2024-06-15 20:25:00,623][1652491] Updated weights for policy 0, policy_version 725536 (0.0012) [2024-06-15 20:25:00,956][1648985] Fps is (10 sec: 39321.9, 60 sec: 44782.8, 300 sec: 46763.8). Total num frames: 1485897728. Throughput: 0: 11480.1. Samples: 371564544. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:25:00,957][1648985] Avg episode reward: [(0, '151.690')] [2024-06-15 20:25:01,631][1652491] Updated weights for policy 0, policy_version 725570 (0.0012) [2024-06-15 20:25:03,030][1652491] Updated weights for policy 0, policy_version 725634 (0.0014) [2024-06-15 20:25:05,955][1648985] Fps is (10 sec: 49153.3, 60 sec: 45329.0, 300 sec: 46652.8). Total num frames: 1486225408. Throughput: 0: 11275.4. Samples: 371623936. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:25:05,956][1648985] Avg episode reward: [(0, '142.600')] [2024-06-15 20:25:06,579][1652491] Updated weights for policy 0, policy_version 725698 (0.0012) [2024-06-15 20:25:10,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 43690.7, 300 sec: 46652.8). Total num frames: 1486356480. Throughput: 0: 11173.3. Samples: 371659776. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:25:10,956][1648985] Avg episode reward: [(0, '144.640')] [2024-06-15 20:25:11,265][1652491] Updated weights for policy 0, policy_version 725761 (0.0014) [2024-06-15 20:25:12,729][1652491] Updated weights for policy 0, policy_version 725821 (0.0099) [2024-06-15 20:25:14,797][1652491] Updated weights for policy 0, policy_version 725888 (0.0137) [2024-06-15 20:25:15,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46967.4, 300 sec: 46652.8). Total num frames: 1486716928. Throughput: 0: 11491.5. Samples: 371730944. 
Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:25:15,956][1648985] Avg episode reward: [(0, '164.850')] [2024-06-15 20:25:16,251][1652491] Updated weights for policy 0, policy_version 725952 (0.0032) [2024-06-15 20:25:18,655][1652491] Updated weights for policy 0, policy_version 726016 (0.0040) [2024-06-15 20:25:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45329.0, 300 sec: 46652.7). Total num frames: 1486880768. Throughput: 0: 11298.1. Samples: 371799552. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:25:20,956][1648985] Avg episode reward: [(0, '167.350')] [2024-06-15 20:25:24,244][1652491] Updated weights for policy 0, policy_version 726080 (0.0078) [2024-06-15 20:25:24,371][1651469] Signal inference workers to stop experience collection... (37750 times) [2024-06-15 20:25:24,412][1652491] InferenceWorker_p0-w0: stopping experience collection (37750 times) [2024-06-15 20:25:24,704][1651469] Signal inference workers to resume experience collection... (37750 times) [2024-06-15 20:25:24,705][1652491] InferenceWorker_p0-w0: resuming experience collection (37750 times) [2024-06-15 20:25:25,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 47514.7, 300 sec: 46874.9). Total num frames: 1487142912. Throughput: 0: 11468.8. Samples: 371840000. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:25:25,956][1648985] Avg episode reward: [(0, '193.130')] [2024-06-15 20:25:26,985][1652491] Updated weights for policy 0, policy_version 726178 (0.0074) [2024-06-15 20:25:28,991][1652491] Updated weights for policy 0, policy_version 726210 (0.0011) [2024-06-15 20:25:30,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1487405056. Throughput: 0: 11377.8. Samples: 371896832. 
Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:25:30,956][1648985] Avg episode reward: [(0, '176.400')] [2024-06-15 20:25:34,900][1652491] Updated weights for policy 0, policy_version 726273 (0.0017) [2024-06-15 20:25:35,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 45329.0, 300 sec: 46763.8). Total num frames: 1487503360. Throughput: 0: 11559.9. Samples: 371977728. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:25:35,956][1648985] Avg episode reward: [(0, '161.780')] [2024-06-15 20:25:36,593][1652491] Updated weights for policy 0, policy_version 726352 (0.0014) [2024-06-15 20:25:38,296][1652491] Updated weights for policy 0, policy_version 726416 (0.0012) [2024-06-15 20:25:39,597][1652491] Updated weights for policy 0, policy_version 726462 (0.0011) [2024-06-15 20:25:40,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 1487863808. Throughput: 0: 11332.3. Samples: 371999232. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:25:40,956][1648985] Avg episode reward: [(0, '137.550')] [2024-06-15 20:25:41,290][1652491] Updated weights for policy 0, policy_version 726528 (0.0012) [2024-06-15 20:25:45,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 43690.8, 300 sec: 46652.8). Total num frames: 1487929344. Throughput: 0: 11332.3. Samples: 372074496. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:25:45,955][1648985] Avg episode reward: [(0, '166.320')] [2024-06-15 20:25:47,322][1652491] Updated weights for policy 0, policy_version 726581 (0.0017) [2024-06-15 20:25:49,309][1652491] Updated weights for policy 0, policy_version 726656 (0.0013) [2024-06-15 20:25:50,704][1652491] Updated weights for policy 0, policy_version 726719 (0.0020) [2024-06-15 20:25:50,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.6, 300 sec: 46652.8). Total num frames: 1488322560. Throughput: 0: 11457.4. Samples: 372139520. 
Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:25:50,956][1648985] Avg episode reward: [(0, '181.780')] [2024-06-15 20:25:52,308][1652491] Updated weights for policy 0, policy_version 726783 (0.0013) [2024-06-15 20:25:55,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45329.3, 300 sec: 46652.8). Total num frames: 1488453632. Throughput: 0: 11400.6. Samples: 372172800. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:25:55,955][1648985] Avg episode reward: [(0, '166.300')] [2024-06-15 20:25:59,095][1652491] Updated weights for policy 0, policy_version 726842 (0.0157) [2024-06-15 20:26:00,489][1652491] Updated weights for policy 0, policy_version 726899 (0.0013) [2024-06-15 20:26:00,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 46967.6, 300 sec: 46652.8). Total num frames: 1488715776. Throughput: 0: 11446.0. Samples: 372246016. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:26:00,956][1648985] Avg episode reward: [(0, '158.370')] [2024-06-15 20:26:02,410][1652491] Updated weights for policy 0, policy_version 726975 (0.0015) [2024-06-15 20:26:03,525][1651469] Signal inference workers to stop experience collection... (37800 times) [2024-06-15 20:26:03,575][1652491] InferenceWorker_p0-w0: stopping experience collection (37800 times) [2024-06-15 20:26:03,721][1651469] Signal inference workers to resume experience collection... (37800 times) [2024-06-15 20:26:03,723][1652491] InferenceWorker_p0-w0: resuming experience collection (37800 times) [2024-06-15 20:26:04,208][1652491] Updated weights for policy 0, policy_version 727034 (0.0014) [2024-06-15 20:26:05,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1488977920. Throughput: 0: 11355.0. Samples: 372310528. 
Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:26:05,956][1648985] Avg episode reward: [(0, '159.090')] [2024-06-15 20:26:09,955][1652491] Updated weights for policy 0, policy_version 727078 (0.0016) [2024-06-15 20:26:10,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 1489141760. Throughput: 0: 11457.4. Samples: 372355584. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:26:10,956][1648985] Avg episode reward: [(0, '171.410')] [2024-06-15 20:26:11,698][1652491] Updated weights for policy 0, policy_version 727154 (0.0013) [2024-06-15 20:26:13,434][1652491] Updated weights for policy 0, policy_version 727225 (0.0013) [2024-06-15 20:26:15,386][1652491] Updated weights for policy 0, policy_version 727296 (0.0016) [2024-06-15 20:26:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.4, 300 sec: 46652.7). Total num frames: 1489502208. Throughput: 0: 11446.0. Samples: 372411904. Policy #0 lag: (min: 63.0, avg: 133.8, max: 319.0) [2024-06-15 20:26:15,955][1648985] Avg episode reward: [(0, '172.680')] [2024-06-15 20:26:20,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 44783.0, 300 sec: 46653.8). Total num frames: 1489567744. Throughput: 0: 11559.9. Samples: 372497920. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:26:20,956][1648985] Avg episode reward: [(0, '170.360')] [2024-06-15 20:26:21,161][1652491] Updated weights for policy 0, policy_version 727347 (0.0012) [2024-06-15 20:26:22,530][1652491] Updated weights for policy 0, policy_version 727418 (0.0013) [2024-06-15 20:26:23,884][1652491] Updated weights for policy 0, policy_version 727486 (0.0017) [2024-06-15 20:26:25,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 1489993728. Throughput: 0: 11685.0. Samples: 372525056. 
Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:26:25,956][1648985] Avg episode reward: [(0, '183.910')] [2024-06-15 20:26:26,114][1652491] Updated weights for policy 0, policy_version 727547 (0.0013) [2024-06-15 20:26:30,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 43690.6, 300 sec: 46652.7). Total num frames: 1490026496. Throughput: 0: 11832.9. Samples: 372606976. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:26:30,956][1648985] Avg episode reward: [(0, '153.130')] [2024-06-15 20:26:32,326][1652491] Updated weights for policy 0, policy_version 727603 (0.0015) [2024-06-15 20:26:33,620][1652491] Updated weights for policy 0, policy_version 727664 (0.0012) [2024-06-15 20:26:35,402][1652491] Updated weights for policy 0, policy_version 727740 (0.0106) [2024-06-15 20:26:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48605.9, 300 sec: 46652.8). Total num frames: 1490419712. Throughput: 0: 11685.0. Samples: 372665344. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:26:35,956][1648985] Avg episode reward: [(0, '137.350')] [2024-06-15 20:26:37,773][1652491] Updated weights for policy 0, policy_version 727801 (0.0011) [2024-06-15 20:26:40,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 44782.7, 300 sec: 46652.7). Total num frames: 1490550784. Throughput: 0: 11775.9. Samples: 372702720. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:26:40,956][1648985] Avg episode reward: [(0, '139.580')] [2024-06-15 20:26:43,382][1652491] Updated weights for policy 0, policy_version 727862 (0.0014) [2024-06-15 20:26:44,406][1651469] Signal inference workers to stop experience collection... (37850 times) [2024-06-15 20:26:44,443][1652491] InferenceWorker_p0-w0: stopping experience collection (37850 times) [2024-06-15 20:26:44,554][1651469] Signal inference workers to resume experience collection... 
(37850 times) [2024-06-15 20:26:44,555][1652491] InferenceWorker_p0-w0: resuming experience collection (37850 times) [2024-06-15 20:26:44,558][1652491] Updated weights for policy 0, policy_version 727920 (0.0012) [2024-06-15 20:26:45,773][1652491] Updated weights for policy 0, policy_version 727970 (0.0012) [2024-06-15 20:26:45,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 49152.0, 300 sec: 46763.8). Total num frames: 1490878464. Throughput: 0: 11923.9. Samples: 372782592. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:26:45,956][1648985] Avg episode reward: [(0, '155.250')] [2024-06-15 20:26:48,317][1652491] Updated weights for policy 0, policy_version 728033 (0.0024) [2024-06-15 20:26:50,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 1491075072. Throughput: 0: 12071.8. Samples: 372853760. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:26:50,956][1648985] Avg episode reward: [(0, '162.660')] [2024-06-15 20:26:53,028][1652491] Updated weights for policy 0, policy_version 728081 (0.0082) [2024-06-15 20:26:54,642][1652491] Updated weights for policy 0, policy_version 728144 (0.0013) [2024-06-15 20:26:55,850][1652491] Updated weights for policy 0, policy_version 728187 (0.0012) [2024-06-15 20:26:55,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1491337216. Throughput: 0: 11912.6. Samples: 372891648. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:26:55,956][1648985] Avg episode reward: [(0, '154.730')] [2024-06-15 20:26:55,995][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000728192_1491337216.pth... 
[2024-06-15 20:26:56,199][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000722688_1480065024.pth [2024-06-15 20:26:57,499][1652491] Updated weights for policy 0, policy_version 728251 (0.0013) [2024-06-15 20:26:59,299][1652491] Updated weights for policy 0, policy_version 728304 (0.0012) [2024-06-15 20:27:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1491599360. Throughput: 0: 11980.8. Samples: 372951040. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:27:00,956][1648985] Avg episode reward: [(0, '143.430')] [2024-06-15 20:27:04,167][1652491] Updated weights for policy 0, policy_version 728336 (0.0011) [2024-06-15 20:27:05,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1491763200. Throughput: 0: 11764.6. Samples: 373027328. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:27:05,955][1648985] Avg episode reward: [(0, '139.740')] [2024-06-15 20:27:06,036][1652491] Updated weights for policy 0, policy_version 728401 (0.0013) [2024-06-15 20:27:07,654][1652491] Updated weights for policy 0, policy_version 728451 (0.0012) [2024-06-15 20:27:08,709][1652491] Updated weights for policy 0, policy_version 728505 (0.0048) [2024-06-15 20:27:10,523][1652491] Updated weights for policy 0, policy_version 728564 (0.0015) [2024-06-15 20:27:10,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 49698.3, 300 sec: 46986.0). Total num frames: 1492123648. Throughput: 0: 11821.5. Samples: 373057024. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:27:10,956][1648985] Avg episode reward: [(0, '143.400')] [2024-06-15 20:27:15,275][1652491] Updated weights for policy 0, policy_version 728580 (0.0085) [2024-06-15 20:27:15,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 44782.8, 300 sec: 46874.9). Total num frames: 1492189184. Throughput: 0: 11798.7. Samples: 373137920. 
Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:27:15,956][1648985] Avg episode reward: [(0, '168.320')] [2024-06-15 20:27:17,305][1652491] Updated weights for policy 0, policy_version 728673 (0.0230) [2024-06-15 20:27:17,879][1652491] Updated weights for policy 0, policy_version 728704 (0.0013) [2024-06-15 20:27:20,256][1652491] Updated weights for policy 0, policy_version 728771 (0.0014) [2024-06-15 20:27:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 50244.3, 300 sec: 46874.9). Total num frames: 1492582400. Throughput: 0: 11855.7. Samples: 373198848. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:27:20,955][1648985] Avg episode reward: [(0, '160.810')] [2024-06-15 20:27:21,415][1652491] Updated weights for policy 0, policy_version 728826 (0.0126) [2024-06-15 20:27:25,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 44236.8, 300 sec: 46652.8). Total num frames: 1492647936. Throughput: 0: 11935.4. Samples: 373239808. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:27:25,956][1648985] Avg episode reward: [(0, '168.920')] [2024-06-15 20:27:26,469][1651469] Signal inference workers to stop experience collection... (37900 times) [2024-06-15 20:27:26,526][1652491] InferenceWorker_p0-w0: stopping experience collection (37900 times) [2024-06-15 20:27:26,636][1651469] Signal inference workers to resume experience collection... (37900 times) [2024-06-15 20:27:26,654][1652491] InferenceWorker_p0-w0: resuming experience collection (37900 times) [2024-06-15 20:27:26,828][1652491] Updated weights for policy 0, policy_version 728865 (0.0012) [2024-06-15 20:27:27,784][1652491] Updated weights for policy 0, policy_version 728917 (0.0012) [2024-06-15 20:27:29,215][1652491] Updated weights for policy 0, policy_version 728963 (0.0012) [2024-06-15 20:27:30,272][1652491] Updated weights for policy 0, policy_version 729012 (0.0014) [2024-06-15 20:27:30,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 50790.4, 300 sec: 47208.1). 
Total num frames: 1493073920. Throughput: 0: 11958.0. Samples: 373320704. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:27:30,956][1648985] Avg episode reward: [(0, '155.790')] [2024-06-15 20:27:31,749][1652491] Updated weights for policy 0, policy_version 729077 (0.0013) [2024-06-15 20:27:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1493172224. Throughput: 0: 12151.5. Samples: 373400576. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:27:35,956][1648985] Avg episode reward: [(0, '154.310')] [2024-06-15 20:27:37,630][1652491] Updated weights for policy 0, policy_version 729139 (0.0013) [2024-06-15 20:27:38,537][1652491] Updated weights for policy 0, policy_version 729184 (0.0031) [2024-06-15 20:27:39,874][1652491] Updated weights for policy 0, policy_version 729232 (0.0012) [2024-06-15 20:27:40,961][1648985] Fps is (10 sec: 49121.6, 60 sec: 50239.3, 300 sec: 47540.6). Total num frames: 1493565440. Throughput: 0: 11979.1. Samples: 373430784. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:27:40,962][1648985] Avg episode reward: [(0, '153.550')] [2024-06-15 20:27:42,000][1652491] Updated weights for policy 0, policy_version 729316 (0.0122) [2024-06-15 20:27:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 1493696512. Throughput: 0: 12197.0. Samples: 373499904. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:27:45,956][1648985] Avg episode reward: [(0, '146.840')] [2024-06-15 20:27:47,913][1652491] Updated weights for policy 0, policy_version 729360 (0.0013) [2024-06-15 20:27:49,124][1652491] Updated weights for policy 0, policy_version 729408 (0.0012) [2024-06-15 20:27:50,955][1648985] Fps is (10 sec: 42624.7, 60 sec: 48605.9, 300 sec: 47208.2). Total num frames: 1493991424. Throughput: 0: 12037.7. Samples: 373569024. 
Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:27:50,956][1648985] Avg episode reward: [(0, '162.300')] [2024-06-15 20:27:51,023][1652491] Updated weights for policy 0, policy_version 729494 (0.0102) [2024-06-15 20:27:52,095][1652491] Updated weights for policy 0, policy_version 729537 (0.0012) [2024-06-15 20:27:53,263][1652491] Updated weights for policy 0, policy_version 729595 (0.0015) [2024-06-15 20:27:55,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48059.5, 300 sec: 46652.7). Total num frames: 1494220800. Throughput: 0: 12049.0. Samples: 373599232. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:27:55,956][1648985] Avg episode reward: [(0, '163.920')] [2024-06-15 20:27:59,154][1652491] Updated weights for policy 0, policy_version 729660 (0.0012) [2024-06-15 20:28:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 1494417408. Throughput: 0: 12174.2. Samples: 373685760. Policy #0 lag: (min: 31.0, avg: 95.5, max: 287.0) [2024-06-15 20:28:00,956][1648985] Avg episode reward: [(0, '159.500')] [2024-06-15 20:28:01,261][1652491] Updated weights for policy 0, policy_version 729712 (0.0012) [2024-06-15 20:28:01,856][1651469] Signal inference workers to stop experience collection... (37950 times) [2024-06-15 20:28:01,902][1652491] InferenceWorker_p0-w0: stopping experience collection (37950 times) [2024-06-15 20:28:02,131][1651469] Signal inference workers to resume experience collection... (37950 times) [2024-06-15 20:28:02,132][1652491] InferenceWorker_p0-w0: resuming experience collection (37950 times) [2024-06-15 20:28:02,785][1652491] Updated weights for policy 0, policy_version 729768 (0.0012) [2024-06-15 20:28:04,379][1652491] Updated weights for policy 0, policy_version 729840 (0.0012) [2024-06-15 20:28:05,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 49698.1, 300 sec: 46763.8). Total num frames: 1494745088. Throughput: 0: 12151.4. Samples: 373745664. 
Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:28:05,956][1648985] Avg episode reward: [(0, '148.940')] [2024-06-15 20:28:09,168][1652491] Updated weights for policy 0, policy_version 729872 (0.0031) [2024-06-15 20:28:10,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1494876160. Throughput: 0: 12208.4. Samples: 373789184. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:28:10,956][1648985] Avg episode reward: [(0, '155.170')] [2024-06-15 20:28:11,888][1652491] Updated weights for policy 0, policy_version 729936 (0.0081) [2024-06-15 20:28:14,065][1652491] Updated weights for policy 0, policy_version 730032 (0.0012) [2024-06-15 20:28:15,678][1652491] Updated weights for policy 0, policy_version 730098 (0.0022) [2024-06-15 20:28:15,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 51336.7, 300 sec: 46986.0). Total num frames: 1495269376. Throughput: 0: 11696.4. Samples: 373847040. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:28:15,956][1648985] Avg episode reward: [(0, '156.770')] [2024-06-15 20:28:20,853][1652491] Updated weights for policy 0, policy_version 730160 (0.0012) [2024-06-15 20:28:20,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1495367680. Throughput: 0: 11616.7. Samples: 373923328. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:28:20,956][1648985] Avg episode reward: [(0, '154.340')] [2024-06-15 20:28:24,745][1652491] Updated weights for policy 0, policy_version 730225 (0.0015) [2024-06-15 20:28:25,955][1648985] Fps is (10 sec: 32768.0, 60 sec: 49152.0, 300 sec: 46763.9). Total num frames: 1495597056. Throughput: 0: 11823.1. Samples: 373962752. 
Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:28:25,956][1648985] Avg episode reward: [(0, '151.090')] [2024-06-15 20:28:26,060][1652491] Updated weights for policy 0, policy_version 730274 (0.0042) [2024-06-15 20:28:27,185][1652491] Updated weights for policy 0, policy_version 730336 (0.0016) [2024-06-15 20:28:30,532][1652491] Updated weights for policy 0, policy_version 730374 (0.0011) [2024-06-15 20:28:30,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 45875.0, 300 sec: 46763.8). Total num frames: 1495826432. Throughput: 0: 11867.0. Samples: 374033920. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:28:30,956][1648985] Avg episode reward: [(0, '143.950')] [2024-06-15 20:28:32,067][1652491] Updated weights for policy 0, policy_version 730432 (0.0012) [2024-06-15 20:28:35,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 1495990272. Throughput: 0: 11821.5. Samples: 374100992. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:28:35,956][1648985] Avg episode reward: [(0, '144.330')] [2024-06-15 20:28:36,683][1652491] Updated weights for policy 0, policy_version 730504 (0.0013) [2024-06-15 20:28:37,883][1652491] Updated weights for policy 0, policy_version 730557 (0.0013) [2024-06-15 20:28:39,123][1652491] Updated weights for policy 0, policy_version 730622 (0.0013) [2024-06-15 20:28:40,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 45879.9, 300 sec: 46652.8). Total num frames: 1496317952. Throughput: 0: 11730.5. Samples: 374127104. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:28:40,956][1648985] Avg episode reward: [(0, '164.180')] [2024-06-15 20:28:41,821][1651469] Signal inference workers to stop experience collection... (38000 times) [2024-06-15 20:28:41,872][1652491] InferenceWorker_p0-w0: stopping experience collection (38000 times) [2024-06-15 20:28:42,080][1651469] Signal inference workers to resume experience collection... 
(38000 times) [2024-06-15 20:28:42,081][1652491] InferenceWorker_p0-w0: resuming experience collection (38000 times) [2024-06-15 20:28:42,724][1652491] Updated weights for policy 0, policy_version 730680 (0.0014) [2024-06-15 20:28:45,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 1496449024. Throughput: 0: 11605.4. Samples: 374208000. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:28:45,956][1648985] Avg episode reward: [(0, '192.560')] [2024-06-15 20:28:46,579][1652491] Updated weights for policy 0, policy_version 730722 (0.0013) [2024-06-15 20:28:48,146][1652491] Updated weights for policy 0, policy_version 730787 (0.0014) [2024-06-15 20:28:49,997][1652491] Updated weights for policy 0, policy_version 730871 (0.0106) [2024-06-15 20:28:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 1496842240. Throughput: 0: 11639.5. Samples: 374269440. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:28:50,956][1648985] Avg episode reward: [(0, '180.120')] [2024-06-15 20:28:52,789][1652491] Updated weights for policy 0, policy_version 730898 (0.0013) [2024-06-15 20:28:53,874][1652491] Updated weights for policy 0, policy_version 730942 (0.0027) [2024-06-15 20:28:55,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 1496973312. Throughput: 0: 11457.4. Samples: 374304768. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:28:55,956][1648985] Avg episode reward: [(0, '166.180')] [2024-06-15 20:28:56,003][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000730944_1496973312.pth... 
[2024-06-15 20:28:56,100][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000725488_1485799424.pth [2024-06-15 20:28:58,186][1652491] Updated weights for policy 0, policy_version 730992 (0.0012) [2024-06-15 20:28:59,419][1652491] Updated weights for policy 0, policy_version 731040 (0.0120) [2024-06-15 20:29:00,612][1652491] Updated weights for policy 0, policy_version 731090 (0.0014) [2024-06-15 20:29:00,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 48059.9, 300 sec: 46763.8). Total num frames: 1497300992. Throughput: 0: 11844.3. Samples: 374380032. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:29:00,955][1648985] Avg episode reward: [(0, '167.570')] [2024-06-15 20:29:03,344][1652491] Updated weights for policy 0, policy_version 731155 (0.0012) [2024-06-15 20:29:04,337][1652491] Updated weights for policy 0, policy_version 731197 (0.0013) [2024-06-15 20:29:05,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1497497600. Throughput: 0: 11776.0. Samples: 374453248. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:29:05,956][1648985] Avg episode reward: [(0, '169.710')] [2024-06-15 20:29:09,300][1652491] Updated weights for policy 0, policy_version 731260 (0.0014) [2024-06-15 20:29:10,771][1652491] Updated weights for policy 0, policy_version 731312 (0.0210) [2024-06-15 20:29:10,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 47513.5, 300 sec: 46874.9). Total num frames: 1497726976. Throughput: 0: 11673.6. Samples: 374488064. 
Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:29:10,956][1648985] Avg episode reward: [(0, '158.520')] [2024-06-15 20:29:11,916][1652491] Updated weights for policy 0, policy_version 731364 (0.0013) [2024-06-15 20:29:15,036][1652491] Updated weights for policy 0, policy_version 731410 (0.0044) [2024-06-15 20:29:15,925][1652491] Updated weights for policy 0, policy_version 731451 (0.0078) [2024-06-15 20:29:15,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 45329.0, 300 sec: 46874.9). Total num frames: 1497989120. Throughput: 0: 11605.4. Samples: 374556160. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:29:15,956][1648985] Avg episode reward: [(0, '161.530')] [2024-06-15 20:29:20,074][1652491] Updated weights for policy 0, policy_version 731513 (0.0037) [2024-06-15 20:29:20,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46421.2, 300 sec: 46986.2). Total num frames: 1498152960. Throughput: 0: 11764.6. Samples: 374630400. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:29:20,956][1648985] Avg episode reward: [(0, '164.940')] [2024-06-15 20:29:21,824][1652491] Updated weights for policy 0, policy_version 731557 (0.0012) [2024-06-15 20:29:22,116][1651469] Signal inference workers to stop experience collection... (38050 times) [2024-06-15 20:29:22,184][1652491] InferenceWorker_p0-w0: stopping experience collection (38050 times) [2024-06-15 20:29:22,361][1651469] Signal inference workers to resume experience collection... (38050 times) [2024-06-15 20:29:22,363][1652491] InferenceWorker_p0-w0: resuming experience collection (38050 times) [2024-06-15 20:29:22,998][1652491] Updated weights for policy 0, policy_version 731607 (0.0013) [2024-06-15 20:29:25,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 47513.7, 300 sec: 46763.8). Total num frames: 1498447872. Throughput: 0: 11844.3. Samples: 374660096. 
Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:29:25,955][1648985] Avg episode reward: [(0, '161.460')] [2024-06-15 20:29:26,129][1652491] Updated weights for policy 0, policy_version 731668 (0.0012) [2024-06-15 20:29:27,057][1652491] Updated weights for policy 0, policy_version 731709 (0.0082) [2024-06-15 20:29:30,582][1652491] Updated weights for policy 0, policy_version 731766 (0.0013) [2024-06-15 20:29:30,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 47513.8, 300 sec: 47097.1). Total num frames: 1498677248. Throughput: 0: 11764.6. Samples: 374737408. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:29:30,955][1648985] Avg episode reward: [(0, '172.280')] [2024-06-15 20:29:32,923][1652491] Updated weights for policy 0, policy_version 731815 (0.0016) [2024-06-15 20:29:34,404][1652491] Updated weights for policy 0, policy_version 731873 (0.0010) [2024-06-15 20:29:35,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 49151.9, 300 sec: 46986.0). Total num frames: 1498939392. Throughput: 0: 11901.1. Samples: 374804992. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:29:35,956][1648985] Avg episode reward: [(0, '182.610')] [2024-06-15 20:29:36,995][1652491] Updated weights for policy 0, policy_version 731924 (0.0013) [2024-06-15 20:29:37,966][1652491] Updated weights for policy 0, policy_version 731965 (0.0159) [2024-06-15 20:29:40,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1499136000. Throughput: 0: 11958.0. Samples: 374842880. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:29:40,956][1648985] Avg episode reward: [(0, '180.360')] [2024-06-15 20:29:41,072][1652491] Updated weights for policy 0, policy_version 732016 (0.0012) [2024-06-15 20:29:44,082][1652491] Updated weights for policy 0, policy_version 732096 (0.0015) [2024-06-15 20:29:45,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 50244.2, 300 sec: 47319.3). Total num frames: 1499463680. 
Throughput: 0: 11855.6. Samples: 374913536. Policy #0 lag: (min: 97.0, avg: 186.7, max: 334.0) [2024-06-15 20:29:45,956][1648985] Avg episode reward: [(0, '172.910')] [2024-06-15 20:29:47,268][1652491] Updated weights for policy 0, policy_version 732165 (0.0013) [2024-06-15 20:29:50,936][1652491] Updated weights for policy 0, policy_version 732227 (0.0013) [2024-06-15 20:29:50,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1499594752. Throughput: 0: 11912.5. Samples: 374989312. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:29:50,955][1648985] Avg episode reward: [(0, '183.900')] [2024-06-15 20:29:52,251][1652491] Updated weights for policy 0, policy_version 732283 (0.0030) [2024-06-15 20:29:55,625][1652491] Updated weights for policy 0, policy_version 732351 (0.0013) [2024-06-15 20:29:55,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48605.9, 300 sec: 47430.3). Total num frames: 1499889664. Throughput: 0: 11878.4. Samples: 375022592. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:29:55,956][1648985] Avg episode reward: [(0, '181.160')] [2024-06-15 20:29:58,274][1652491] Updated weights for policy 0, policy_version 732417 (0.0014) [2024-06-15 20:29:59,674][1652491] Updated weights for policy 0, policy_version 732479 (0.0145) [2024-06-15 20:30:00,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 1500119040. Throughput: 0: 11707.7. Samples: 375083008. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:30:00,956][1648985] Avg episode reward: [(0, '158.100')] [2024-06-15 20:30:04,136][1652491] Updated weights for policy 0, policy_version 732541 (0.0014) [2024-06-15 20:30:05,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 45875.1, 300 sec: 47097.0). Total num frames: 1500250112. Throughput: 0: 11719.1. Samples: 375157760. 
Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:30:05,956][1648985] Avg episode reward: [(0, '139.270')] [2024-06-15 20:30:06,683][1651469] Signal inference workers to stop experience collection... (38100 times) [2024-06-15 20:30:06,733][1652491] InferenceWorker_p0-w0: stopping experience collection (38100 times) [2024-06-15 20:30:06,921][1651469] Signal inference workers to resume experience collection... (38100 times) [2024-06-15 20:30:06,922][1652491] InferenceWorker_p0-w0: resuming experience collection (38100 times) [2024-06-15 20:30:06,925][1652491] Updated weights for policy 0, policy_version 732592 (0.0014) [2024-06-15 20:30:08,225][1652491] Updated weights for policy 0, policy_version 732656 (0.0039) [2024-06-15 20:30:10,567][1652491] Updated weights for policy 0, policy_version 732708 (0.0026) [2024-06-15 20:30:10,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 1500610560. Throughput: 0: 11798.7. Samples: 375191040. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:30:10,956][1648985] Avg episode reward: [(0, '150.810')] [2024-06-15 20:30:11,087][1652491] Updated weights for policy 0, policy_version 732736 (0.0013) [2024-06-15 20:30:14,394][1652491] Updated weights for policy 0, policy_version 732795 (0.0013) [2024-06-15 20:30:15,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1500774400. Throughput: 0: 11787.4. Samples: 375267840. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:30:15,956][1648985] Avg episode reward: [(0, '161.990')] [2024-06-15 20:30:18,362][1652491] Updated weights for policy 0, policy_version 732864 (0.0125) [2024-06-15 20:30:20,934][1652491] Updated weights for policy 0, policy_version 732932 (0.0014) [2024-06-15 20:30:20,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48059.8, 300 sec: 47097.0). Total num frames: 1501036544. Throughput: 0: 11650.9. Samples: 375329280. 
Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:30:20,956][1648985] Avg episode reward: [(0, '164.170')] [2024-06-15 20:30:25,106][1652491] Updated weights for policy 0, policy_version 733009 (0.0013) [2024-06-15 20:30:25,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.5, 300 sec: 47097.0). Total num frames: 1501298688. Throughput: 0: 11605.3. Samples: 375365120. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:30:25,956][1648985] Avg episode reward: [(0, '167.410')] [2024-06-15 20:30:28,853][1652491] Updated weights for policy 0, policy_version 733062 (0.0014) [2024-06-15 20:30:30,615][1652491] Updated weights for policy 0, policy_version 733136 (0.0016) [2024-06-15 20:30:30,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.3, 300 sec: 47319.2). Total num frames: 1501462528. Throughput: 0: 11696.3. Samples: 375439872. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:30:30,956][1648985] Avg episode reward: [(0, '154.710')] [2024-06-15 20:30:32,967][1652491] Updated weights for policy 0, policy_version 733217 (0.0013) [2024-06-15 20:30:33,606][1652491] Updated weights for policy 0, policy_version 733248 (0.0011) [2024-06-15 20:30:35,956][1648985] Fps is (10 sec: 42597.3, 60 sec: 46421.2, 300 sec: 46985.9). Total num frames: 1501724672. Throughput: 0: 11377.7. Samples: 375501312. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:30:35,958][1648985] Avg episode reward: [(0, '146.890')] [2024-06-15 20:30:36,810][1652491] Updated weights for policy 0, policy_version 733298 (0.0013) [2024-06-15 20:30:40,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 44782.9, 300 sec: 47097.0). Total num frames: 1501822976. Throughput: 0: 11525.7. Samples: 375541248. 
Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:30:40,956][1648985] Avg episode reward: [(0, '143.890')] [2024-06-15 20:30:41,491][1652491] Updated weights for policy 0, policy_version 733330 (0.0014) [2024-06-15 20:30:43,170][1652491] Updated weights for policy 0, policy_version 733395 (0.0015) [2024-06-15 20:30:45,103][1652491] Updated weights for policy 0, policy_version 733476 (0.0111) [2024-06-15 20:30:45,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 45875.1, 300 sec: 47097.1). Total num frames: 1502216192. Throughput: 0: 11411.9. Samples: 375596544. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:30:45,956][1648985] Avg episode reward: [(0, '152.580')] [2024-06-15 20:30:47,687][1651469] Signal inference workers to stop experience collection... (38150 times) [2024-06-15 20:30:47,750][1652491] InferenceWorker_p0-w0: stopping experience collection (38150 times) [2024-06-15 20:30:48,064][1651469] Signal inference workers to resume experience collection... (38150 times) [2024-06-15 20:30:48,065][1652491] InferenceWorker_p0-w0: resuming experience collection (38150 times) [2024-06-15 20:30:48,067][1652491] Updated weights for policy 0, policy_version 733536 (0.0012) [2024-06-15 20:30:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 47097.0). Total num frames: 1502347264. Throughput: 0: 11446.1. Samples: 375672832. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:30:50,956][1648985] Avg episode reward: [(0, '157.310')] [2024-06-15 20:30:53,896][1652491] Updated weights for policy 0, policy_version 733600 (0.0013) [2024-06-15 20:30:55,880][1652491] Updated weights for policy 0, policy_version 733669 (0.0011) [2024-06-15 20:30:55,967][1648985] Fps is (10 sec: 32728.2, 60 sec: 44227.8, 300 sec: 46873.0). Total num frames: 1502543872. Throughput: 0: 11499.8. Samples: 375708672. 
Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:30:55,968][1648985] Avg episode reward: [(0, '170.970')] [2024-06-15 20:30:56,335][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000733696_1502609408.pth... [2024-06-15 20:30:56,528][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000728192_1491337216.pth [2024-06-15 20:30:56,533][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000733696_1502609408.pth [2024-06-15 20:30:57,751][1652491] Updated weights for policy 0, policy_version 733750 (0.0011) [2024-06-15 20:31:00,434][1652491] Updated weights for policy 0, policy_version 733808 (0.0068) [2024-06-15 20:31:00,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1502871552. Throughput: 0: 11070.6. Samples: 375766016. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:31:00,955][1648985] Avg episode reward: [(0, '170.720')] [2024-06-15 20:31:05,050][1652491] Updated weights for policy 0, policy_version 733827 (0.0020) [2024-06-15 20:31:05,955][1648985] Fps is (10 sec: 39368.7, 60 sec: 44782.8, 300 sec: 46763.8). Total num frames: 1502937088. Throughput: 0: 11366.3. Samples: 375840768. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:31:05,956][1648985] Avg episode reward: [(0, '169.660')] [2024-06-15 20:31:06,884][1652491] Updated weights for policy 0, policy_version 733904 (0.0012) [2024-06-15 20:31:09,187][1652491] Updated weights for policy 0, policy_version 734006 (0.0036) [2024-06-15 20:31:10,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44236.8, 300 sec: 46652.7). Total num frames: 1503264768. Throughput: 0: 11093.3. Samples: 375864320. 
Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:31:10,956][1648985] Avg episode reward: [(0, '159.550')] [2024-06-15 20:31:12,214][1652491] Updated weights for policy 0, policy_version 734071 (0.0013) [2024-06-15 20:31:15,972][1648985] Fps is (10 sec: 45798.3, 60 sec: 43678.3, 300 sec: 46872.2). Total num frames: 1503395840. Throughput: 0: 11100.5. Samples: 375939584. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:31:15,974][1648985] Avg episode reward: [(0, '153.950')] [2024-06-15 20:31:17,201][1652491] Updated weights for policy 0, policy_version 734113 (0.0014) [2024-06-15 20:31:19,119][1652491] Updated weights for policy 0, policy_version 734202 (0.0103) [2024-06-15 20:31:20,456][1652491] Updated weights for policy 0, policy_version 734267 (0.0020) [2024-06-15 20:31:20,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 1503789056. Throughput: 0: 11104.8. Samples: 376001024. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:31:20,956][1648985] Avg episode reward: [(0, '156.390')] [2024-06-15 20:31:23,437][1652491] Updated weights for policy 0, policy_version 734311 (0.0086) [2024-06-15 20:31:25,955][1648985] Fps is (10 sec: 52518.4, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 1503920128. Throughput: 0: 11161.6. Samples: 376043520. Policy #0 lag: (min: 15.0, avg: 143.9, max: 271.0) [2024-06-15 20:31:25,956][1648985] Avg episode reward: [(0, '158.760')] [2024-06-15 20:31:27,331][1652491] Updated weights for policy 0, policy_version 734355 (0.0010) [2024-06-15 20:31:28,798][1652491] Updated weights for policy 0, policy_version 734419 (0.0014) [2024-06-15 20:31:29,067][1651469] Signal inference workers to stop experience collection... (38200 times) [2024-06-15 20:31:29,112][1652491] InferenceWorker_p0-w0: stopping experience collection (38200 times) [2024-06-15 20:31:29,281][1651469] Signal inference workers to resume experience collection... 
(38200 times) [2024-06-15 20:31:29,282][1652491] InferenceWorker_p0-w0: resuming experience collection (38200 times) [2024-06-15 20:31:30,632][1652491] Updated weights for policy 0, policy_version 734503 (0.0013) [2024-06-15 20:31:30,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46967.5, 300 sec: 46986.0). Total num frames: 1504280576. Throughput: 0: 11457.4. Samples: 376112128. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:31:30,956][1648985] Avg episode reward: [(0, '146.460')] [2024-06-15 20:31:34,284][1652491] Updated weights for policy 0, policy_version 734544 (0.0015) [2024-06-15 20:31:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45329.3, 300 sec: 47097.1). Total num frames: 1504444416. Throughput: 0: 11355.0. Samples: 376183808. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:31:35,956][1648985] Avg episode reward: [(0, '160.730')] [2024-06-15 20:31:38,459][1652491] Updated weights for policy 0, policy_version 734608 (0.0015) [2024-06-15 20:31:40,466][1652491] Updated weights for policy 0, policy_version 734695 (0.0097) [2024-06-15 20:31:40,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 1504706560. Throughput: 0: 11415.0. Samples: 376222208. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:31:40,956][1648985] Avg episode reward: [(0, '145.910')] [2024-06-15 20:31:41,865][1652491] Updated weights for policy 0, policy_version 734736 (0.0015) [2024-06-15 20:31:45,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 46763.8). Total num frames: 1504870400. Throughput: 0: 11593.9. Samples: 376287744. 
Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:31:45,956][1648985] Avg episode reward: [(0, '162.730')] [2024-06-15 20:31:46,004][1652491] Updated weights for policy 0, policy_version 734816 (0.0013) [2024-06-15 20:31:49,693][1652491] Updated weights for policy 0, policy_version 734866 (0.0022) [2024-06-15 20:31:50,758][1652491] Updated weights for policy 0, policy_version 734914 (0.0012) [2024-06-15 20:31:50,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1505099776. Throughput: 0: 11594.0. Samples: 376362496. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:31:50,956][1648985] Avg episode reward: [(0, '181.250')] [2024-06-15 20:31:53,803][1652491] Updated weights for policy 0, policy_version 735008 (0.0014) [2024-06-15 20:31:54,565][1652491] Updated weights for policy 0, policy_version 735040 (0.0013) [2024-06-15 20:31:55,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 46977.0, 300 sec: 46652.7). Total num frames: 1505361920. Throughput: 0: 11673.6. Samples: 376389632. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:31:55,956][1648985] Avg episode reward: [(0, '186.080')] [2024-06-15 20:31:58,079][1652491] Updated weights for policy 0, policy_version 735099 (0.0012) [2024-06-15 20:32:00,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 44236.7, 300 sec: 46652.7). Total num frames: 1505525760. Throughput: 0: 11621.1. Samples: 376462336. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:32:00,956][1648985] Avg episode reward: [(0, '169.020')] [2024-06-15 20:32:01,211][1652491] Updated weights for policy 0, policy_version 735145 (0.0097) [2024-06-15 20:32:02,681][1652491] Updated weights for policy 0, policy_version 735216 (0.0017) [2024-06-15 20:32:05,474][1652491] Updated weights for policy 0, policy_version 735270 (0.0018) [2024-06-15 20:32:05,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 49152.4, 300 sec: 46652.8). Total num frames: 1505886208. 
Throughput: 0: 11844.3. Samples: 376534016. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:32:05,955][1648985] Avg episode reward: [(0, '159.400')] [2024-06-15 20:32:07,253][1652491] Updated weights for policy 0, policy_version 735312 (0.0016) [2024-06-15 20:32:08,163][1652491] Updated weights for policy 0, policy_version 735353 (0.0015) [2024-06-15 20:32:10,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 1506017280. Throughput: 0: 11798.7. Samples: 376574464. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:32:10,956][1648985] Avg episode reward: [(0, '155.900')] [2024-06-15 20:32:11,980][1651469] Signal inference workers to stop experience collection... (38250 times) [2024-06-15 20:32:12,006][1652491] InferenceWorker_p0-w0: stopping experience collection (38250 times) [2024-06-15 20:32:12,037][1652491] Updated weights for policy 0, policy_version 735411 (0.0271) [2024-06-15 20:32:12,167][1651469] Signal inference workers to resume experience collection... (38250 times) [2024-06-15 20:32:12,199][1652491] InferenceWorker_p0-w0: resuming experience collection (38250 times) [2024-06-15 20:32:13,397][1652491] Updated weights for policy 0, policy_version 735475 (0.0013) [2024-06-15 20:32:15,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 48619.7, 300 sec: 46541.7). Total num frames: 1506312192. Throughput: 0: 11855.7. Samples: 376645632. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:32:15,956][1648985] Avg episode reward: [(0, '164.700')] [2024-06-15 20:32:15,991][1652491] Updated weights for policy 0, policy_version 735508 (0.0023) [2024-06-15 20:32:17,762][1652491] Updated weights for policy 0, policy_version 735554 (0.0013) [2024-06-15 20:32:19,173][1652491] Updated weights for policy 0, policy_version 735616 (0.0013) [2024-06-15 20:32:20,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 1506541568. Throughput: 0: 11912.6. 
Samples: 376719872. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:32:20,956][1648985] Avg episode reward: [(0, '167.670')] [2024-06-15 20:32:23,807][1652491] Updated weights for policy 0, policy_version 735696 (0.0029) [2024-06-15 20:32:25,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 1506803712. Throughput: 0: 11832.9. Samples: 376754688. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:32:25,955][1648985] Avg episode reward: [(0, '168.540')] [2024-06-15 20:32:26,886][1652491] Updated weights for policy 0, policy_version 735760 (0.0017) [2024-06-15 20:32:28,005][1652491] Updated weights for policy 0, policy_version 735808 (0.0013) [2024-06-15 20:32:29,547][1652491] Updated weights for policy 0, policy_version 735869 (0.0014) [2024-06-15 20:32:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1507065856. Throughput: 0: 11867.1. Samples: 376821760. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:32:30,956][1648985] Avg episode reward: [(0, '169.060')] [2024-06-15 20:32:33,455][1652491] Updated weights for policy 0, policy_version 735934 (0.0097) [2024-06-15 20:32:35,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46967.5, 300 sec: 46431.6). Total num frames: 1507262464. Throughput: 0: 11867.0. Samples: 376896512. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:32:35,956][1648985] Avg episode reward: [(0, '167.080')] [2024-06-15 20:32:36,153][1652491] Updated weights for policy 0, policy_version 735990 (0.0019) [2024-06-15 20:32:38,789][1652491] Updated weights for policy 0, policy_version 736035 (0.0013) [2024-06-15 20:32:40,568][1652491] Updated weights for policy 0, policy_version 736102 (0.0014) [2024-06-15 20:32:40,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 1507557376. Throughput: 0: 12128.7. Samples: 376935424. 
Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:32:40,956][1648985] Avg episode reward: [(0, '168.400')] [2024-06-15 20:32:43,900][1652491] Updated weights for policy 0, policy_version 736160 (0.0013) [2024-06-15 20:32:45,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 1507721216. Throughput: 0: 12026.4. Samples: 377003520. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:32:45,955][1648985] Avg episode reward: [(0, '156.570')] [2024-06-15 20:32:46,423][1652491] Updated weights for policy 0, policy_version 736210 (0.0012) [2024-06-15 20:32:47,395][1652491] Updated weights for policy 0, policy_version 736254 (0.0041) [2024-06-15 20:32:50,397][1652491] Updated weights for policy 0, policy_version 736336 (0.0012) [2024-06-15 20:32:50,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 49152.0, 300 sec: 46874.9). Total num frames: 1508048896. Throughput: 0: 11912.5. Samples: 377070080. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:32:50,956][1648985] Avg episode reward: [(0, '150.890')] [2024-06-15 20:32:54,568][1652491] Updated weights for policy 0, policy_version 736400 (0.0012) [2024-06-15 20:32:54,971][1651469] Signal inference workers to stop experience collection... (38300 times) [2024-06-15 20:32:55,015][1652491] InferenceWorker_p0-w0: stopping experience collection (38300 times) [2024-06-15 20:32:55,196][1651469] Signal inference workers to resume experience collection... (38300 times) [2024-06-15 20:32:55,197][1652491] InferenceWorker_p0-w0: resuming experience collection (38300 times) [2024-06-15 20:32:55,450][1652491] Updated weights for policy 0, policy_version 736447 (0.0013) [2024-06-15 20:32:55,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 1508245504. Throughput: 0: 11844.2. Samples: 377107456. 
Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:32:55,957][1648985] Avg episode reward: [(0, '150.730')] [2024-06-15 20:32:55,967][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000736448_1508245504.pth... [2024-06-15 20:32:56,052][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000730944_1496973312.pth [2024-06-15 20:32:58,793][1652491] Updated weights for policy 0, policy_version 736501 (0.0014) [2024-06-15 20:33:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 48605.9, 300 sec: 46430.6). Total num frames: 1508442112. Throughput: 0: 12003.5. Samples: 377185792. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:33:00,956][1648985] Avg episode reward: [(0, '155.930')] [2024-06-15 20:33:00,983][1652491] Updated weights for policy 0, policy_version 736560 (0.0131) [2024-06-15 20:33:02,887][1652491] Updated weights for policy 0, policy_version 736635 (0.0046) [2024-06-15 20:33:05,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 1508671488. Throughput: 0: 11776.0. Samples: 377249792. Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:33:05,956][1648985] Avg episode reward: [(0, '156.850')] [2024-06-15 20:33:06,517][1652491] Updated weights for policy 0, policy_version 736692 (0.0014) [2024-06-15 20:33:09,333][1652491] Updated weights for policy 0, policy_version 736736 (0.0014) [2024-06-15 20:33:10,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 1508900864. Throughput: 0: 11844.2. Samples: 377287680. 
Policy #0 lag: (min: 47.0, avg: 180.5, max: 335.0) [2024-06-15 20:33:10,956][1648985] Avg episode reward: [(0, '176.170')] [2024-06-15 20:33:12,557][1652491] Updated weights for policy 0, policy_version 736816 (0.0014) [2024-06-15 20:33:14,627][1652491] Updated weights for policy 0, policy_version 736888 (0.0017) [2024-06-15 20:33:15,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 47513.6, 300 sec: 46763.8). Total num frames: 1509163008. Throughput: 0: 11696.4. Samples: 377348096. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:33:15,955][1648985] Avg episode reward: [(0, '171.130')] [2024-06-15 20:33:17,220][1652491] Updated weights for policy 0, policy_version 736928 (0.0025) [2024-06-15 20:33:20,370][1652491] Updated weights for policy 0, policy_version 736982 (0.0014) [2024-06-15 20:33:20,955][1648985] Fps is (10 sec: 45876.5, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 1509359616. Throughput: 0: 11719.1. Samples: 377423872. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:33:20,955][1648985] Avg episode reward: [(0, '167.960')] [2024-06-15 20:33:23,164][1652491] Updated weights for policy 0, policy_version 737027 (0.0013) [2024-06-15 20:33:24,697][1652491] Updated weights for policy 0, policy_version 737088 (0.0014) [2024-06-15 20:33:25,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 46967.2, 300 sec: 46763.8). Total num frames: 1509621760. Throughput: 0: 11639.4. Samples: 377459200. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:33:25,956][1648985] Avg episode reward: [(0, '168.880')] [2024-06-15 20:33:26,134][1652491] Updated weights for policy 0, policy_version 737142 (0.0012) [2024-06-15 20:33:28,092][1652491] Updated weights for policy 0, policy_version 737171 (0.0012) [2024-06-15 20:33:30,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 1509818368. Throughput: 0: 11605.3. Samples: 377525760. 
Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:33:30,955][1648985] Avg episode reward: [(0, '171.160')] [2024-06-15 20:33:31,848][1652491] Updated weights for policy 0, policy_version 737248 (0.0014) [2024-06-15 20:33:32,685][1652491] Updated weights for policy 0, policy_version 737279 (0.0011) [2024-06-15 20:33:35,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 1510047744. Throughput: 0: 11673.6. Samples: 377595392. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:33:35,956][1648985] Avg episode reward: [(0, '192.520')] [2024-06-15 20:33:36,176][1652491] Updated weights for policy 0, policy_version 737344 (0.0013) [2024-06-15 20:33:37,741][1652491] Updated weights for policy 0, policy_version 737407 (0.0013) [2024-06-15 20:33:39,484][1651469] Signal inference workers to stop experience collection... (38350 times) [2024-06-15 20:33:39,612][1652491] InferenceWorker_p0-w0: stopping experience collection (38350 times) [2024-06-15 20:33:39,746][1651469] Signal inference workers to resume experience collection... (38350 times) [2024-06-15 20:33:39,748][1652491] InferenceWorker_p0-w0: resuming experience collection (38350 times) [2024-06-15 20:33:40,955][1648985] Fps is (10 sec: 52426.9, 60 sec: 46421.1, 300 sec: 47097.0). Total num frames: 1510342656. Throughput: 0: 11514.3. Samples: 377625600. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:33:40,956][1648985] Avg episode reward: [(0, '175.210')] [2024-06-15 20:33:42,828][1652491] Updated weights for policy 0, policy_version 737475 (0.0014) [2024-06-15 20:33:45,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 1510473728. Throughput: 0: 11389.2. Samples: 377698304. 
Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:33:45,956][1648985] Avg episode reward: [(0, '151.530')] [2024-06-15 20:33:46,227][1652491] Updated weights for policy 0, policy_version 737537 (0.0013) [2024-06-15 20:33:47,940][1652491] Updated weights for policy 0, policy_version 737616 (0.0013) [2024-06-15 20:33:49,212][1652491] Updated weights for policy 0, policy_version 737662 (0.0012) [2024-06-15 20:33:50,955][1648985] Fps is (10 sec: 39322.7, 60 sec: 44782.9, 300 sec: 46652.8). Total num frames: 1510735872. Throughput: 0: 11457.4. Samples: 377765376. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:33:50,956][1648985] Avg episode reward: [(0, '143.770')] [2024-06-15 20:33:52,002][1652491] Updated weights for policy 0, policy_version 737726 (0.0032) [2024-06-15 20:33:55,683][1652491] Updated weights for policy 0, policy_version 737790 (0.0014) [2024-06-15 20:33:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 46430.6). Total num frames: 1510998016. Throughput: 0: 11423.3. Samples: 377801728. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:33:55,956][1648985] Avg episode reward: [(0, '135.920')] [2024-06-15 20:33:59,485][1652491] Updated weights for policy 0, policy_version 737856 (0.0015) [2024-06-15 20:34:00,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 1511260160. Throughput: 0: 11423.3. Samples: 377862144. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:34:00,956][1648985] Avg episode reward: [(0, '149.370')] [2024-06-15 20:34:03,190][1652491] Updated weights for policy 0, policy_version 737926 (0.0014) [2024-06-15 20:34:05,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45329.0, 300 sec: 46319.5). Total num frames: 1511391232. Throughput: 0: 11320.9. Samples: 377933312. 
Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:34:05,956][1648985] Avg episode reward: [(0, '155.800')] [2024-06-15 20:34:06,133][1652491] Updated weights for policy 0, policy_version 737987 (0.0073) [2024-06-15 20:34:07,533][1652491] Updated weights for policy 0, policy_version 738041 (0.0012) [2024-06-15 20:34:10,150][1652491] Updated weights for policy 0, policy_version 738068 (0.0013) [2024-06-15 20:34:10,955][1648985] Fps is (10 sec: 36045.3, 60 sec: 45329.2, 300 sec: 46208.4). Total num frames: 1511620608. Throughput: 0: 11241.3. Samples: 377965056. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:34:10,956][1648985] Avg episode reward: [(0, '145.700')] [2024-06-15 20:34:12,132][1652491] Updated weights for policy 0, policy_version 738144 (0.0014) [2024-06-15 20:34:15,689][1652491] Updated weights for policy 0, policy_version 738194 (0.0014) [2024-06-15 20:34:15,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 1511849984. Throughput: 0: 11150.2. Samples: 378027520. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:34:15,956][1648985] Avg episode reward: [(0, '161.040')] [2024-06-15 20:34:18,065][1652491] Updated weights for policy 0, policy_version 738246 (0.0016) [2024-06-15 20:34:19,308][1652491] Updated weights for policy 0, policy_version 738300 (0.0013) [2024-06-15 20:34:20,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 44782.8, 300 sec: 46097.3). Total num frames: 1512046592. Throughput: 0: 11241.2. Samples: 378101248. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:34:20,956][1648985] Avg episode reward: [(0, '157.290')] [2024-06-15 20:34:22,007][1652491] Updated weights for policy 0, policy_version 738361 (0.0014) [2024-06-15 20:34:23,280][1651469] Signal inference workers to stop experience collection... 
(38400 times) [2024-06-15 20:34:23,317][1652491] InferenceWorker_p0-w0: stopping experience collection (38400 times) [2024-06-15 20:34:23,552][1651469] Signal inference workers to resume experience collection... (38400 times) [2024-06-15 20:34:23,552][1652491] InferenceWorker_p0-w0: resuming experience collection (38400 times) [2024-06-15 20:34:23,784][1652491] Updated weights for policy 0, policy_version 738423 (0.0014) [2024-06-15 20:34:25,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 1512308736. Throughput: 0: 11127.5. Samples: 378126336. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:34:25,956][1648985] Avg episode reward: [(0, '162.340')] [2024-06-15 20:34:27,520][1652491] Updated weights for policy 0, policy_version 738480 (0.0012) [2024-06-15 20:34:29,299][1652491] Updated weights for policy 0, policy_version 738515 (0.0013) [2024-06-15 20:34:30,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 1512570880. Throughput: 0: 11275.4. Samples: 378205696. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:34:30,956][1648985] Avg episode reward: [(0, '177.580')] [2024-06-15 20:34:32,004][1652491] Updated weights for policy 0, policy_version 738576 (0.0013) [2024-06-15 20:34:33,479][1652491] Updated weights for policy 0, policy_version 738625 (0.0020) [2024-06-15 20:34:34,793][1652491] Updated weights for policy 0, policy_version 738682 (0.0012) [2024-06-15 20:34:35,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 46421.0, 300 sec: 46430.5). Total num frames: 1512833024. Throughput: 0: 11343.6. Samples: 378275840. 
Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:34:35,956][1648985] Avg episode reward: [(0, '188.650')] [2024-06-15 20:34:37,948][1652491] Updated weights for policy 0, policy_version 738724 (0.0012) [2024-06-15 20:34:40,451][1652491] Updated weights for policy 0, policy_version 738769 (0.0014) [2024-06-15 20:34:40,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 44783.2, 300 sec: 45986.3). Total num frames: 1513029632. Throughput: 0: 11434.7. Samples: 378316288. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:34:40,956][1648985] Avg episode reward: [(0, '175.300')] [2024-06-15 20:34:42,970][1652491] Updated weights for policy 0, policy_version 738823 (0.0013) [2024-06-15 20:34:44,263][1652491] Updated weights for policy 0, policy_version 738882 (0.0012) [2024-06-15 20:34:45,611][1652491] Updated weights for policy 0, policy_version 738942 (0.0013) [2024-06-15 20:34:45,955][1648985] Fps is (10 sec: 52430.8, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1513357312. Throughput: 0: 11628.1. Samples: 378385408. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:34:45,956][1648985] Avg episode reward: [(0, '180.780')] [2024-06-15 20:34:49,709][1652491] Updated weights for policy 0, policy_version 739008 (0.0025) [2024-06-15 20:34:50,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 1513488384. Throughput: 0: 11753.3. Samples: 378462208. Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:34:50,956][1648985] Avg episode reward: [(0, '170.800')] [2024-06-15 20:34:51,707][1652491] Updated weights for policy 0, policy_version 739060 (0.0012) [2024-06-15 20:34:54,293][1652491] Updated weights for policy 0, policy_version 739092 (0.0013) [2024-06-15 20:34:55,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 1513816064. Throughput: 0: 11878.4. Samples: 378499584. 
Policy #0 lag: (min: 61.0, avg: 154.8, max: 311.0) [2024-06-15 20:34:55,956][1648985] Avg episode reward: [(0, '152.410')] [2024-06-15 20:34:56,156][1652491] Updated weights for policy 0, policy_version 739184 (0.0013) [2024-06-15 20:34:56,487][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000739200_1513881600.pth... [2024-06-15 20:34:56,537][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000733696_1502609408.pth [2024-06-15 20:34:59,215][1652491] Updated weights for policy 0, policy_version 739218 (0.0013) [2024-06-15 20:35:00,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1514012672. Throughput: 0: 12026.3. Samples: 378568704. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:35:00,956][1648985] Avg episode reward: [(0, '148.750')] [2024-06-15 20:35:01,609][1652491] Updated weights for policy 0, policy_version 739272 (0.0014) [2024-06-15 20:35:02,541][1652491] Updated weights for policy 0, policy_version 739320 (0.0043) [2024-06-15 20:35:05,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 47513.6, 300 sec: 46208.5). Total num frames: 1514242048. Throughput: 0: 11935.3. Samples: 378638336. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:35:05,956][1648985] Avg episode reward: [(0, '157.930')] [2024-06-15 20:35:06,400][1651469] Signal inference workers to stop experience collection... (38450 times) [2024-06-15 20:35:06,436][1652491] InferenceWorker_p0-w0: stopping experience collection (38450 times) [2024-06-15 20:35:06,622][1651469] Signal inference workers to resume experience collection... 
(38450 times) [2024-06-15 20:35:06,623][1652491] InferenceWorker_p0-w0: resuming experience collection (38450 times) [2024-06-15 20:35:06,770][1652491] Updated weights for policy 0, policy_version 739409 (0.0014) [2024-06-15 20:35:07,565][1652491] Updated weights for policy 0, policy_version 739451 (0.0014) [2024-06-15 20:35:10,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 47513.5, 300 sec: 46430.6). Total num frames: 1514471424. Throughput: 0: 12083.2. Samples: 378670080. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:35:10,956][1648985] Avg episode reward: [(0, '155.980')] [2024-06-15 20:35:11,542][1652491] Updated weights for policy 0, policy_version 739515 (0.0013) [2024-06-15 20:35:13,875][1652491] Updated weights for policy 0, policy_version 739577 (0.0013) [2024-06-15 20:35:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46967.5, 300 sec: 46208.5). Total num frames: 1514668032. Throughput: 0: 11946.7. Samples: 378743296. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:35:15,955][1648985] Avg episode reward: [(0, '152.660')] [2024-06-15 20:35:17,789][1652491] Updated weights for policy 0, policy_version 739648 (0.0014) [2024-06-15 20:35:19,111][1652491] Updated weights for policy 0, policy_version 739708 (0.0014) [2024-06-15 20:35:20,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 1514930176. Throughput: 0: 11867.1. Samples: 378809856. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:35:20,956][1648985] Avg episode reward: [(0, '154.180')] [2024-06-15 20:35:22,225][1652491] Updated weights for policy 0, policy_version 739760 (0.0014) [2024-06-15 20:35:25,252][1652491] Updated weights for policy 0, policy_version 739824 (0.0013) [2024-06-15 20:35:25,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 1515192320. Throughput: 0: 11787.4. Samples: 378846720. 
Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:35:25,956][1648985] Avg episode reward: [(0, '168.830')] [2024-06-15 20:35:29,075][1652491] Updated weights for policy 0, policy_version 739904 (0.0013) [2024-06-15 20:35:30,956][1648985] Fps is (10 sec: 52422.6, 60 sec: 48058.8, 300 sec: 46541.5). Total num frames: 1515454464. Throughput: 0: 11707.4. Samples: 378912256. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:35:30,957][1648985] Avg episode reward: [(0, '159.500')] [2024-06-15 20:35:31,995][1652491] Updated weights for policy 0, policy_version 739970 (0.0013) [2024-06-15 20:35:35,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.4, 300 sec: 46652.7). Total num frames: 1515585536. Throughput: 0: 11730.5. Samples: 378990080. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:35:35,956][1648985] Avg episode reward: [(0, '169.790')] [2024-06-15 20:35:36,181][1652491] Updated weights for policy 0, policy_version 740051 (0.0014) [2024-06-15 20:35:39,049][1652491] Updated weights for policy 0, policy_version 740098 (0.0013) [2024-06-15 20:35:40,606][1652491] Updated weights for policy 0, policy_version 740165 (0.0029) [2024-06-15 20:35:40,955][1648985] Fps is (10 sec: 42603.5, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 1515880448. Throughput: 0: 11673.6. Samples: 379024896. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:35:40,956][1648985] Avg episode reward: [(0, '174.090')] [2024-06-15 20:35:41,654][1652491] Updated weights for policy 0, policy_version 740224 (0.0013) [2024-06-15 20:35:43,885][1652491] Updated weights for policy 0, policy_version 740284 (0.0013) [2024-06-15 20:35:45,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 1516109824. Throughput: 0: 11639.5. Samples: 379092480. 
Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:35:45,955][1648985] Avg episode reward: [(0, '162.530')] [2024-06-15 20:35:47,521][1652491] Updated weights for policy 0, policy_version 740336 (0.0012) [2024-06-15 20:35:50,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 46421.2, 300 sec: 46543.6). Total num frames: 1516273664. Throughput: 0: 11776.0. Samples: 379168256. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:35:50,956][1648985] Avg episode reward: [(0, '141.570')] [2024-06-15 20:35:51,267][1651469] Signal inference workers to stop experience collection... (38500 times) [2024-06-15 20:35:51,297][1652491] InferenceWorker_p0-w0: stopping experience collection (38500 times) [2024-06-15 20:35:51,583][1651469] Signal inference workers to resume experience collection... (38500 times) [2024-06-15 20:35:51,584][1652491] InferenceWorker_p0-w0: resuming experience collection (38500 times) [2024-06-15 20:35:52,247][1652491] Updated weights for policy 0, policy_version 740432 (0.0014) [2024-06-15 20:35:53,476][1652491] Updated weights for policy 0, policy_version 740480 (0.0013) [2024-06-15 20:35:55,174][1652491] Updated weights for policy 0, policy_version 740544 (0.0014) [2024-06-15 20:35:55,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 1516634112. Throughput: 0: 11650.9. Samples: 379194368. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:35:55,956][1648985] Avg episode reward: [(0, '147.280')] [2024-06-15 20:35:59,610][1652491] Updated weights for policy 0, policy_version 740602 (0.0018) [2024-06-15 20:36:00,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 1516765184. Throughput: 0: 11480.2. Samples: 379259904. 
Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:36:00,956][1648985] Avg episode reward: [(0, '143.030')] [2024-06-15 20:36:03,385][1652491] Updated weights for policy 0, policy_version 740656 (0.0126) [2024-06-15 20:36:05,157][1652491] Updated weights for policy 0, policy_version 740734 (0.0106) [2024-06-15 20:36:05,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.3, 300 sec: 46652.7). Total num frames: 1517027328. Throughput: 0: 11491.6. Samples: 379326976. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:36:05,956][1648985] Avg episode reward: [(0, '168.060')] [2024-06-15 20:36:06,816][1652491] Updated weights for policy 0, policy_version 740792 (0.0013) [2024-06-15 20:36:10,532][1652491] Updated weights for policy 0, policy_version 740855 (0.0015) [2024-06-15 20:36:10,955][1648985] Fps is (10 sec: 52427.0, 60 sec: 46967.3, 300 sec: 47099.7). Total num frames: 1517289472. Throughput: 0: 11468.7. Samples: 379362816. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:36:10,956][1648985] Avg episode reward: [(0, '170.740')] [2024-06-15 20:36:15,182][1652491] Updated weights for policy 0, policy_version 740912 (0.0012) [2024-06-15 20:36:15,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 1517453312. Throughput: 0: 11673.9. Samples: 379437568. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:36:15,955][1648985] Avg episode reward: [(0, '166.040')] [2024-06-15 20:36:16,669][1652491] Updated weights for policy 0, policy_version 740976 (0.0161) [2024-06-15 20:36:20,955][1648985] Fps is (10 sec: 39322.8, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1517682688. Throughput: 0: 11252.6. Samples: 379496448. 
Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:36:20,956][1648985] Avg episode reward: [(0, '156.690')] [2024-06-15 20:36:22,089][1652491] Updated weights for policy 0, policy_version 741075 (0.0015) [2024-06-15 20:36:25,955][1648985] Fps is (10 sec: 36044.0, 60 sec: 43690.5, 300 sec: 45875.2). Total num frames: 1517813760. Throughput: 0: 11150.2. Samples: 379526656. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:36:25,956][1648985] Avg episode reward: [(0, '145.530')] [2024-06-15 20:36:27,228][1652491] Updated weights for policy 0, policy_version 741153 (0.0215) [2024-06-15 20:36:28,607][1652491] Updated weights for policy 0, policy_version 741219 (0.0013) [2024-06-15 20:36:30,709][1652491] Updated weights for policy 0, policy_version 741310 (0.0016) [2024-06-15 20:36:30,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45876.0, 300 sec: 46652.7). Total num frames: 1518206976. Throughput: 0: 11138.8. Samples: 379593728. Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:36:30,956][1648985] Avg episode reward: [(0, '149.040')] [2024-06-15 20:36:34,024][1651469] Signal inference workers to stop experience collection... (38550 times) [2024-06-15 20:36:34,140][1652491] InferenceWorker_p0-w0: stopping experience collection (38550 times) [2024-06-15 20:36:34,471][1651469] Signal inference workers to resume experience collection... (38550 times) [2024-06-15 20:36:34,472][1652491] InferenceWorker_p0-w0: resuming experience collection (38550 times) [2024-06-15 20:36:34,918][1652491] Updated weights for policy 0, policy_version 741360 (0.0014) [2024-06-15 20:36:35,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 1518338048. Throughput: 0: 10865.8. Samples: 379657216. 
Policy #0 lag: (min: 111.0, avg: 209.3, max: 351.0) [2024-06-15 20:36:35,956][1648985] Avg episode reward: [(0, '146.030')] [2024-06-15 20:36:38,709][1652491] Updated weights for policy 0, policy_version 741410 (0.0016) [2024-06-15 20:36:40,800][1652491] Updated weights for policy 0, policy_version 741482 (0.0015) [2024-06-15 20:36:40,955][1648985] Fps is (10 sec: 36045.5, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 1518567424. Throughput: 0: 11025.1. Samples: 379690496. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:36:40,955][1648985] Avg episode reward: [(0, '171.460')] [2024-06-15 20:36:42,221][1652491] Updated weights for policy 0, policy_version 741538 (0.0010) [2024-06-15 20:36:45,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1518731264. Throughput: 0: 11116.1. Samples: 379760128. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:36:45,956][1648985] Avg episode reward: [(0, '157.600')] [2024-06-15 20:36:46,724][1652491] Updated weights for policy 0, policy_version 741603 (0.0012) [2024-06-15 20:36:47,262][1652491] Updated weights for policy 0, policy_version 741631 (0.0012) [2024-06-15 20:36:50,085][1652491] Updated weights for policy 0, policy_version 741685 (0.0026) [2024-06-15 20:36:50,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45329.2, 300 sec: 46208.4). Total num frames: 1518993408. Throughput: 0: 11195.7. Samples: 379830784. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:36:50,956][1648985] Avg episode reward: [(0, '162.750')] [2024-06-15 20:36:51,960][1652491] Updated weights for policy 0, policy_version 741736 (0.0014) [2024-06-15 20:36:52,925][1652491] Updated weights for policy 0, policy_version 741792 (0.0021) [2024-06-15 20:36:55,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 46541.7). Total num frames: 1519255552. Throughput: 0: 11138.9. Samples: 379864064. 
Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:36:55,956][1648985] Avg episode reward: [(0, '157.300')] [2024-06-15 20:36:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000741824_1519255552.pth... [2024-06-15 20:36:56,053][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000736448_1508245504.pth [2024-06-15 20:36:57,650][1652491] Updated weights for policy 0, policy_version 741860 (0.0013) [2024-06-15 20:36:59,163][1652491] Updated weights for policy 0, policy_version 741904 (0.0011) [2024-06-15 20:37:00,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 1519517696. Throughput: 0: 11172.9. Samples: 379940352. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:37:00,956][1648985] Avg episode reward: [(0, '164.290')] [2024-06-15 20:37:02,499][1652491] Updated weights for policy 0, policy_version 741968 (0.0023) [2024-06-15 20:37:04,141][1652491] Updated weights for policy 0, policy_version 742035 (0.0011) [2024-06-15 20:37:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1519779840. Throughput: 0: 11468.8. Samples: 380012544. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:37:05,956][1648985] Avg episode reward: [(0, '149.510')] [2024-06-15 20:37:08,175][1652491] Updated weights for policy 0, policy_version 742082 (0.0012) [2024-06-15 20:37:09,901][1652491] Updated weights for policy 0, policy_version 742160 (0.0103) [2024-06-15 20:37:10,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 45329.3, 300 sec: 46430.6). Total num frames: 1520009216. Throughput: 0: 11673.7. Samples: 380051968. 
Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:37:10,956][1648985] Avg episode reward: [(0, '152.740')] [2024-06-15 20:37:11,026][1652491] Updated weights for policy 0, policy_version 742208 (0.0016) [2024-06-15 20:37:14,537][1652491] Updated weights for policy 0, policy_version 742273 (0.0012) [2024-06-15 20:37:14,823][1651469] Signal inference workers to stop experience collection... (38600 times) [2024-06-15 20:37:14,858][1652491] InferenceWorker_p0-w0: stopping experience collection (38600 times) [2024-06-15 20:37:15,176][1651469] Signal inference workers to resume experience collection... (38600 times) [2024-06-15 20:37:15,177][1652491] InferenceWorker_p0-w0: resuming experience collection (38600 times) [2024-06-15 20:37:15,767][1652491] Updated weights for policy 0, policy_version 742331 (0.0012) [2024-06-15 20:37:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 1520304128. Throughput: 0: 11719.1. Samples: 380121088. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:37:15,956][1648985] Avg episode reward: [(0, '166.080')] [2024-06-15 20:37:20,609][1652491] Updated weights for policy 0, policy_version 742388 (0.0013) [2024-06-15 20:37:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 1520435200. Throughput: 0: 11980.8. Samples: 380196352. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:37:20,956][1648985] Avg episode reward: [(0, '148.700')] [2024-06-15 20:37:22,051][1652491] Updated weights for policy 0, policy_version 742462 (0.0011) [2024-06-15 20:37:25,100][1652491] Updated weights for policy 0, policy_version 742516 (0.0014) [2024-06-15 20:37:25,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 48605.9, 300 sec: 46319.5). Total num frames: 1520730112. Throughput: 0: 12049.0. Samples: 380232704. 
Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:37:25,956][1648985] Avg episode reward: [(0, '142.780')] [2024-06-15 20:37:26,653][1652491] Updated weights for policy 0, policy_version 742592 (0.0014) [2024-06-15 20:37:30,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 43690.8, 300 sec: 45986.3). Total num frames: 1520828416. Throughput: 0: 11980.8. Samples: 380299264. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:37:30,956][1648985] Avg episode reward: [(0, '122.700')] [2024-06-15 20:37:32,918][1652491] Updated weights for policy 0, policy_version 742656 (0.0012) [2024-06-15 20:37:35,663][1652491] Updated weights for policy 0, policy_version 742736 (0.0022) [2024-06-15 20:37:35,955][1648985] Fps is (10 sec: 39323.0, 60 sec: 46421.5, 300 sec: 45986.3). Total num frames: 1521123328. Throughput: 0: 11798.8. Samples: 380361728. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:37:35,955][1648985] Avg episode reward: [(0, '129.930')] [2024-06-15 20:37:38,292][1652491] Updated weights for policy 0, policy_version 742832 (0.0013) [2024-06-15 20:37:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 1521352704. Throughput: 0: 11514.3. Samples: 380382208. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:37:40,956][1648985] Avg episode reward: [(0, '162.540')] [2024-06-15 20:37:44,037][1652491] Updated weights for policy 0, policy_version 742871 (0.0042) [2024-06-15 20:37:45,397][1652491] Updated weights for policy 0, policy_version 742929 (0.0054) [2024-06-15 20:37:45,956][1648985] Fps is (10 sec: 45869.1, 60 sec: 47512.7, 300 sec: 45875.0). Total num frames: 1521582080. Throughput: 0: 11627.8. Samples: 380463616. 
Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:37:45,957][1648985] Avg episode reward: [(0, '167.520')] [2024-06-15 20:37:46,435][1652491] Updated weights for policy 0, policy_version 742977 (0.0012) [2024-06-15 20:37:47,330][1652491] Updated weights for policy 0, policy_version 743025 (0.0012) [2024-06-15 20:37:49,055][1652491] Updated weights for policy 0, policy_version 743058 (0.0013) [2024-06-15 20:37:50,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 46208.5). Total num frames: 1521876992. Throughput: 0: 11559.8. Samples: 380532736. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:37:50,955][1648985] Avg episode reward: [(0, '150.440')] [2024-06-15 20:37:55,210][1652491] Updated weights for policy 0, policy_version 743152 (0.0056) [2024-06-15 20:37:55,955][1648985] Fps is (10 sec: 42603.4, 60 sec: 45875.3, 300 sec: 45986.3). Total num frames: 1522008064. Throughput: 0: 11707.7. Samples: 380578816. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:37:55,956][1648985] Avg episode reward: [(0, '143.170')] [2024-06-15 20:37:56,619][1651469] Signal inference workers to stop experience collection... (38650 times) [2024-06-15 20:37:56,662][1652491] InferenceWorker_p0-w0: stopping experience collection (38650 times) [2024-06-15 20:37:56,802][1651469] Signal inference workers to resume experience collection... (38650 times) [2024-06-15 20:37:56,802][1652491] InferenceWorker_p0-w0: resuming experience collection (38650 times) [2024-06-15 20:37:58,388][1652491] Updated weights for policy 0, policy_version 743268 (0.0136) [2024-06-15 20:38:00,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.2, 300 sec: 46097.3). Total num frames: 1522270208. Throughput: 0: 11218.5. Samples: 380625920. 
Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:38:00,956][1648985] Avg episode reward: [(0, '148.300')] [2024-06-15 20:38:02,108][1652491] Updated weights for policy 0, policy_version 743359 (0.0013) [2024-06-15 20:38:05,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 1522401280. Throughput: 0: 11002.3. Samples: 380691456. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:38:05,956][1648985] Avg episode reward: [(0, '163.100')] [2024-06-15 20:38:09,806][1652491] Updated weights for policy 0, policy_version 743440 (0.0121) [2024-06-15 20:38:10,955][1648985] Fps is (10 sec: 36045.2, 60 sec: 43690.7, 300 sec: 45653.0). Total num frames: 1522630656. Throughput: 0: 10865.8. Samples: 380721664. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:38:10,955][1648985] Avg episode reward: [(0, '162.410')] [2024-06-15 20:38:12,135][1652491] Updated weights for policy 0, policy_version 743536 (0.0016) [2024-06-15 20:38:15,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 43144.5, 300 sec: 45875.2). Total num frames: 1522892800. Throughput: 0: 10581.3. Samples: 380775424. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:38:15,956][1648985] Avg episode reward: [(0, '160.990')] [2024-06-15 20:38:16,001][1652491] Updated weights for policy 0, policy_version 743606 (0.0013) [2024-06-15 20:38:20,955][1648985] Fps is (10 sec: 29491.0, 60 sec: 41506.1, 300 sec: 45097.7). Total num frames: 1522925568. Throughput: 0: 10604.0. Samples: 380838912. Policy #0 lag: (min: 57.0, avg: 141.9, max: 313.0) [2024-06-15 20:38:20,956][1648985] Avg episode reward: [(0, '142.750')] [2024-06-15 20:38:22,706][1652491] Updated weights for policy 0, policy_version 743680 (0.0013) [2024-06-15 20:38:24,625][1652491] Updated weights for policy 0, policy_version 743745 (0.0014) [2024-06-15 20:38:25,982][1648985] Fps is (10 sec: 39215.4, 60 sec: 42579.2, 300 sec: 45648.8). Total num frames: 1523286016. 
Throughput: 0: 10700.0. Samples: 380864000. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:38:25,983][1648985] Avg episode reward: [(0, '141.630')] [2024-06-15 20:38:26,118][1652491] Updated weights for policy 0, policy_version 743808 (0.0013) [2024-06-15 20:38:29,820][1652491] Updated weights for policy 0, policy_version 743861 (0.0167) [2024-06-15 20:38:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 45430.9). Total num frames: 1523449856. Throughput: 0: 10137.9. Samples: 380919808. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:38:30,956][1648985] Avg episode reward: [(0, '145.580')] [2024-06-15 20:38:34,754][1652491] Updated weights for policy 0, policy_version 743898 (0.0014) [2024-06-15 20:38:35,955][1648985] Fps is (10 sec: 32857.2, 60 sec: 41506.0, 300 sec: 44986.6). Total num frames: 1523613696. Throughput: 0: 10194.5. Samples: 380991488. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:38:35,956][1648985] Avg episode reward: [(0, '155.920')] [2024-06-15 20:38:36,561][1652491] Updated weights for policy 0, policy_version 743984 (0.0137) [2024-06-15 20:38:38,524][1652491] Updated weights for policy 0, policy_version 744062 (0.0014) [2024-06-15 20:38:40,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 42052.4, 300 sec: 45430.9). Total num frames: 1523875840. Throughput: 0: 9659.8. Samples: 381013504. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:38:40,956][1648985] Avg episode reward: [(0, '153.200')] [2024-06-15 20:38:41,960][1652491] Updated weights for policy 0, policy_version 744114 (0.0036) [2024-06-15 20:38:45,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 39868.5, 300 sec: 44875.5). Total num frames: 1523974144. Throughput: 0: 10228.6. Samples: 381086208. 
Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:38:45,956][1648985] Avg episode reward: [(0, '137.060')] [2024-06-15 20:38:47,466][1651469] Signal inference workers to stop experience collection... (38700 times) [2024-06-15 20:38:47,526][1652491] InferenceWorker_p0-w0: stopping experience collection (38700 times) [2024-06-15 20:38:47,642][1651469] Signal inference workers to resume experience collection... (38700 times) [2024-06-15 20:38:47,643][1652491] InferenceWorker_p0-w0: resuming experience collection (38700 times) [2024-06-15 20:38:47,645][1652491] Updated weights for policy 0, policy_version 744160 (0.0012) [2024-06-15 20:38:49,912][1652491] Updated weights for policy 0, policy_version 744257 (0.0012) [2024-06-15 20:38:50,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 40960.0, 300 sec: 45208.7). Total num frames: 1524334592. Throughput: 0: 10001.1. Samples: 381141504. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:38:50,956][1648985] Avg episode reward: [(0, '124.960')] [2024-06-15 20:38:51,108][1652491] Updated weights for policy 0, policy_version 744320 (0.0012) [2024-06-15 20:38:54,713][1652491] Updated weights for policy 0, policy_version 744382 (0.0014) [2024-06-15 20:38:55,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 41506.0, 300 sec: 44875.5). Total num frames: 1524498432. Throughput: 0: 10080.7. Samples: 381175296. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:38:55,956][1648985] Avg episode reward: [(0, '143.880')] [2024-06-15 20:38:55,966][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000744384_1524498432.pth... [2024-06-15 20:38:56,011][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000739200_1513881600.pth [2024-06-15 20:39:00,955][1648985] Fps is (10 sec: 29491.0, 60 sec: 39321.6, 300 sec: 44875.5). Total num frames: 1524629504. Throughput: 0: 10501.7. Samples: 381248000. 
Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:39:00,956][1648985] Avg episode reward: [(0, '148.740')] [2024-06-15 20:39:01,472][1652491] Updated weights for policy 0, policy_version 744468 (0.0014) [2024-06-15 20:39:03,344][1652491] Updated weights for policy 0, policy_version 744552 (0.0014) [2024-06-15 20:39:05,617][1652491] Updated weights for policy 0, policy_version 744594 (0.0014) [2024-06-15 20:39:05,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 42598.4, 300 sec: 45208.7). Total num frames: 1524957184. Throughput: 0: 10262.8. Samples: 381300736. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:39:05,956][1648985] Avg episode reward: [(0, '163.750')] [2024-06-15 20:39:10,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 39867.8, 300 sec: 44653.4). Total num frames: 1525022720. Throughput: 0: 10530.8. Samples: 381337600. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:39:10,955][1648985] Avg episode reward: [(0, '161.700')] [2024-06-15 20:39:12,106][1652491] Updated weights for policy 0, policy_version 744673 (0.0033) [2024-06-15 20:39:13,807][1652491] Updated weights for policy 0, policy_version 744736 (0.0015) [2024-06-15 20:39:15,513][1652491] Updated weights for policy 0, policy_version 744816 (0.0013) [2024-06-15 20:39:15,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 42052.3, 300 sec: 45319.8). Total num frames: 1525415936. Throughput: 0: 10524.4. Samples: 381393408. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:39:15,956][1648985] Avg episode reward: [(0, '165.490')] [2024-06-15 20:39:17,813][1652491] Updated weights for policy 0, policy_version 744851 (0.0013) [2024-06-15 20:39:18,697][1652491] Updated weights for policy 0, policy_version 744896 (0.0012) [2024-06-15 20:39:20,956][1648985] Fps is (10 sec: 52428.3, 60 sec: 43690.7, 300 sec: 44875.5). Total num frames: 1525547008. Throughput: 0: 10786.1. Samples: 381476864. 
Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:39:20,957][1648985] Avg episode reward: [(0, '142.300')] [2024-06-15 20:39:23,383][1652491] Updated weights for policy 0, policy_version 744960 (0.0014) [2024-06-15 20:39:25,400][1652491] Updated weights for policy 0, policy_version 745025 (0.0037) [2024-06-15 20:39:25,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 43164.1, 300 sec: 45097.7). Total num frames: 1525874688. Throughput: 0: 11070.6. Samples: 381511680. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:39:25,955][1648985] Avg episode reward: [(0, '148.030')] [2024-06-15 20:39:26,024][1651469] Signal inference workers to stop experience collection... (38750 times) [2024-06-15 20:39:26,090][1652491] InferenceWorker_p0-w0: stopping experience collection (38750 times) [2024-06-15 20:39:26,254][1651469] Signal inference workers to resume experience collection... (38750 times) [2024-06-15 20:39:26,254][1652491] InferenceWorker_p0-w0: resuming experience collection (38750 times) [2024-06-15 20:39:26,559][1652491] Updated weights for policy 0, policy_version 745088 (0.0126) [2024-06-15 20:39:29,881][1652491] Updated weights for policy 0, policy_version 745151 (0.0012) [2024-06-15 20:39:30,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 44875.6). Total num frames: 1526071296. Throughput: 0: 10831.6. Samples: 381573632. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:39:30,956][1648985] Avg episode reward: [(0, '152.590')] [2024-06-15 20:39:34,490][1652491] Updated weights for policy 0, policy_version 745202 (0.0011) [2024-06-15 20:39:35,955][1648985] Fps is (10 sec: 42596.5, 60 sec: 44782.7, 300 sec: 44986.5). Total num frames: 1526300672. Throughput: 0: 11172.9. Samples: 381644288. 
Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:39:35,956][1648985] Avg episode reward: [(0, '145.030')] [2024-06-15 20:39:36,476][1652491] Updated weights for policy 0, policy_version 745282 (0.0046) [2024-06-15 20:39:37,700][1652491] Updated weights for policy 0, policy_version 745339 (0.0046) [2024-06-15 20:39:40,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 44236.7, 300 sec: 44653.3). Total num frames: 1526530048. Throughput: 0: 11275.4. Samples: 381682688. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:39:40,956][1648985] Avg episode reward: [(0, '138.610')] [2024-06-15 20:39:41,257][1652491] Updated weights for policy 0, policy_version 745401 (0.0014) [2024-06-15 20:39:44,309][1652491] Updated weights for policy 0, policy_version 745429 (0.0012) [2024-06-15 20:39:45,958][1648985] Fps is (10 sec: 45861.9, 60 sec: 46418.8, 300 sec: 44986.1). Total num frames: 1526759424. Throughput: 0: 11365.6. Samples: 381759488. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:39:45,959][1648985] Avg episode reward: [(0, '156.270')] [2024-06-15 20:39:46,158][1652491] Updated weights for policy 0, policy_version 745491 (0.0013) [2024-06-15 20:39:48,215][1652491] Updated weights for policy 0, policy_version 745589 (0.0012) [2024-06-15 20:39:50,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 44783.0, 300 sec: 44764.4). Total num frames: 1527021568. Throughput: 0: 11787.4. Samples: 381831168. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:39:50,955][1648985] Avg episode reward: [(0, '171.120')] [2024-06-15 20:39:51,875][1652491] Updated weights for policy 0, policy_version 745657 (0.0023) [2024-06-15 20:39:55,816][1652491] Updated weights for policy 0, policy_version 745701 (0.0143) [2024-06-15 20:39:55,955][1648985] Fps is (10 sec: 42612.0, 60 sec: 44782.9, 300 sec: 44653.3). Total num frames: 1527185408. Throughput: 0: 11764.6. Samples: 381867008. 
Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:39:55,956][1648985] Avg episode reward: [(0, '179.610')] [2024-06-15 20:39:57,536][1652491] Updated weights for policy 0, policy_version 745760 (0.0014) [2024-06-15 20:39:58,999][1652491] Updated weights for policy 0, policy_version 745824 (0.0019) [2024-06-15 20:40:00,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48059.8, 300 sec: 44986.6). Total num frames: 1527513088. Throughput: 0: 11912.5. Samples: 381929472. Policy #0 lag: (min: 41.0, avg: 105.5, max: 297.0) [2024-06-15 20:40:00,956][1648985] Avg episode reward: [(0, '173.780')] [2024-06-15 20:40:01,981][1652491] Updated weights for policy 0, policy_version 745860 (0.0023) [2024-06-15 20:40:03,277][1652491] Updated weights for policy 0, policy_version 745920 (0.0021) [2024-06-15 20:40:05,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 45329.1, 300 sec: 44764.4). Total num frames: 1527676928. Throughput: 0: 11878.4. Samples: 382011392. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:40:05,956][1648985] Avg episode reward: [(0, '164.800')] [2024-06-15 20:40:06,887][1652491] Updated weights for policy 0, policy_version 745984 (0.0013) [2024-06-15 20:40:09,320][1652491] Updated weights for policy 0, policy_version 746045 (0.0135) [2024-06-15 20:40:10,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 49698.1, 300 sec: 45208.7). Total num frames: 1528004608. Throughput: 0: 11730.5. Samples: 382039552. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:40:10,955][1648985] Avg episode reward: [(0, '152.930')] [2024-06-15 20:40:11,029][1652491] Updated weights for policy 0, policy_version 746110 (0.0014) [2024-06-15 20:40:13,096][1651469] Signal inference workers to stop experience collection... (38800 times) [2024-06-15 20:40:13,145][1652491] InferenceWorker_p0-w0: stopping experience collection (38800 times) [2024-06-15 20:40:13,376][1651469] Signal inference workers to resume experience collection... 
(38800 times) [2024-06-15 20:40:13,377][1652491] InferenceWorker_p0-w0: resuming experience collection (38800 times) [2024-06-15 20:40:14,193][1652491] Updated weights for policy 0, policy_version 746164 (0.0128) [2024-06-15 20:40:15,971][1648985] Fps is (10 sec: 49072.5, 60 sec: 45862.8, 300 sec: 44873.0). Total num frames: 1528168448. Throughput: 0: 12022.0. Samples: 382114816. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:40:15,972][1648985] Avg episode reward: [(0, '180.420')] [2024-06-15 20:40:16,907][1652491] Updated weights for policy 0, policy_version 746194 (0.0013) [2024-06-15 20:40:18,424][1652491] Updated weights for policy 0, policy_version 746247 (0.0015) [2024-06-15 20:40:19,517][1652491] Updated weights for policy 0, policy_version 746302 (0.0016) [2024-06-15 20:40:20,958][1648985] Fps is (10 sec: 49136.5, 60 sec: 49149.5, 300 sec: 45097.2). Total num frames: 1528496128. Throughput: 0: 12048.3. Samples: 382186496. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:40:20,959][1648985] Avg episode reward: [(0, '184.150')] [2024-06-15 20:40:21,100][1652491] Updated weights for policy 0, policy_version 746356 (0.0014) [2024-06-15 20:40:24,224][1652491] Updated weights for policy 0, policy_version 746388 (0.0014) [2024-06-15 20:40:25,044][1652491] Updated weights for policy 0, policy_version 746432 (0.0034) [2024-06-15 20:40:25,955][1648985] Fps is (10 sec: 52514.1, 60 sec: 46967.5, 300 sec: 44875.7). Total num frames: 1528692736. Throughput: 0: 12208.4. Samples: 382232064. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:40:25,955][1648985] Avg episode reward: [(0, '170.570')] [2024-06-15 20:40:28,362][1652491] Updated weights for policy 0, policy_version 746496 (0.0016) [2024-06-15 20:40:30,955][1648985] Fps is (10 sec: 45889.4, 60 sec: 48059.8, 300 sec: 45319.8). Total num frames: 1528954880. Throughput: 0: 11867.9. Samples: 382293504. 
Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:40:30,956][1648985] Avg episode reward: [(0, '184.740')] [2024-06-15 20:40:31,406][1652491] Updated weights for policy 0, policy_version 746561 (0.0014) [2024-06-15 20:40:32,458][1652491] Updated weights for policy 0, policy_version 746616 (0.0014) [2024-06-15 20:40:35,440][1652491] Updated weights for policy 0, policy_version 746656 (0.0013) [2024-06-15 20:40:35,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 48059.9, 300 sec: 45097.6). Total num frames: 1529184256. Throughput: 0: 12117.3. Samples: 382376448. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:40:35,956][1648985] Avg episode reward: [(0, '167.220')] [2024-06-15 20:40:38,106][1652491] Updated weights for policy 0, policy_version 746708 (0.0019) [2024-06-15 20:40:39,616][1652491] Updated weights for policy 0, policy_version 746784 (0.0013) [2024-06-15 20:40:40,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 49152.0, 300 sec: 45319.8). Total num frames: 1529479168. Throughput: 0: 12151.5. Samples: 382413824. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:40:40,956][1648985] Avg episode reward: [(0, '158.540')] [2024-06-15 20:40:42,194][1652491] Updated weights for policy 0, policy_version 746848 (0.0016) [2024-06-15 20:40:45,757][1652491] Updated weights for policy 0, policy_version 746896 (0.0017) [2024-06-15 20:40:45,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 48062.3, 300 sec: 45319.8). Total num frames: 1529643008. Throughput: 0: 12549.7. Samples: 382494208. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:40:45,956][1648985] Avg episode reward: [(0, '136.430')] [2024-06-15 20:40:46,654][1652491] Updated weights for policy 0, policy_version 746936 (0.0011) [2024-06-15 20:40:48,588][1652491] Updated weights for policy 0, policy_version 747001 (0.0012) [2024-06-15 20:40:50,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 49698.1, 300 sec: 45319.8). Total num frames: 1530003456. 
Throughput: 0: 12208.4. Samples: 382560768. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:40:50,956][1648985] Avg episode reward: [(0, '138.640')] [2024-06-15 20:40:52,471][1652491] Updated weights for policy 0, policy_version 747073 (0.0014) [2024-06-15 20:40:53,331][1651469] Signal inference workers to stop experience collection... (38850 times) [2024-06-15 20:40:53,400][1652491] InferenceWorker_p0-w0: stopping experience collection (38850 times) [2024-06-15 20:40:53,664][1651469] Signal inference workers to resume experience collection... (38850 times) [2024-06-15 20:40:53,664][1652491] InferenceWorker_p0-w0: resuming experience collection (38850 times) [2024-06-15 20:40:53,856][1652491] Updated weights for policy 0, policy_version 747128 (0.0013) [2024-06-15 20:40:55,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 49152.0, 300 sec: 45319.8). Total num frames: 1530134528. Throughput: 0: 12299.3. Samples: 382593024. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:40:55,956][1648985] Avg episode reward: [(0, '151.840')] [2024-06-15 20:40:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000747136_1530134528.pth... [2024-06-15 20:40:56,032][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000741824_1519255552.pth [2024-06-15 20:40:57,033][1652491] Updated weights for policy 0, policy_version 747159 (0.0012) [2024-06-15 20:40:58,999][1652491] Updated weights for policy 0, policy_version 747216 (0.0013) [2024-06-15 20:41:00,719][1652491] Updated weights for policy 0, policy_version 747285 (0.0013) [2024-06-15 20:41:00,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 49152.0, 300 sec: 45542.0). Total num frames: 1530462208. Throughput: 0: 12372.1. Samples: 382671360. 
Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:41:00,955][1648985] Avg episode reward: [(0, '141.940')] [2024-06-15 20:41:03,761][1652491] Updated weights for policy 0, policy_version 747344 (0.0012) [2024-06-15 20:41:04,759][1652491] Updated weights for policy 0, policy_version 747385 (0.0013) [2024-06-15 20:41:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 49698.1, 300 sec: 45319.9). Total num frames: 1530658816. Throughput: 0: 12402.6. Samples: 382744576. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:41:05,956][1648985] Avg episode reward: [(0, '142.920')] [2024-06-15 20:41:07,903][1652491] Updated weights for policy 0, policy_version 747440 (0.0013) [2024-06-15 20:41:09,196][1652491] Updated weights for policy 0, policy_version 747472 (0.0011) [2024-06-15 20:41:10,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 48605.6, 300 sec: 45653.0). Total num frames: 1530920960. Throughput: 0: 12333.4. Samples: 382787072. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:41:10,956][1648985] Avg episode reward: [(0, '160.810')] [2024-06-15 20:41:11,401][1652491] Updated weights for policy 0, policy_version 747552 (0.0166) [2024-06-15 20:41:14,892][1652491] Updated weights for policy 0, policy_version 747616 (0.0014) [2024-06-15 20:41:15,658][1652491] Updated weights for policy 0, policy_version 747647 (0.0012) [2024-06-15 20:41:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 50257.8, 300 sec: 45764.1). Total num frames: 1531183104. Throughput: 0: 12379.0. Samples: 382850560. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:41:15,956][1648985] Avg episode reward: [(0, '163.830')] [2024-06-15 20:41:18,884][1652491] Updated weights for policy 0, policy_version 747685 (0.0011) [2024-06-15 20:41:20,811][1652491] Updated weights for policy 0, policy_version 747760 (0.0013) [2024-06-15 20:41:20,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 48608.4, 300 sec: 46097.4). Total num frames: 1531412480. 
Throughput: 0: 12185.6. Samples: 382924800. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:41:20,956][1648985] Avg episode reward: [(0, '161.330')] [2024-06-15 20:41:22,295][1652491] Updated weights for policy 0, policy_version 747827 (0.0099) [2024-06-15 20:41:25,756][1652491] Updated weights for policy 0, policy_version 747872 (0.0016) [2024-06-15 20:41:25,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 45542.0). Total num frames: 1531641856. Throughput: 0: 12037.7. Samples: 382955520. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:41:25,956][1648985] Avg episode reward: [(0, '149.750')] [2024-06-15 20:41:29,480][1652491] Updated weights for policy 0, policy_version 747905 (0.0017) [2024-06-15 20:41:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48059.8, 300 sec: 45764.1). Total num frames: 1531838464. Throughput: 0: 11969.4. Samples: 383032832. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:41:30,956][1648985] Avg episode reward: [(0, '143.340')] [2024-06-15 20:41:32,326][1652491] Updated weights for policy 0, policy_version 748016 (0.0013) [2024-06-15 20:41:33,801][1652491] Updated weights for policy 0, policy_version 748080 (0.0118) [2024-06-15 20:41:35,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48605.9, 300 sec: 45875.2). Total num frames: 1532100608. Throughput: 0: 11969.4. Samples: 383099392. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:41:35,956][1648985] Avg episode reward: [(0, '142.090')] [2024-06-15 20:41:36,311][1651469] Signal inference workers to stop experience collection... (38900 times) [2024-06-15 20:41:36,346][1652491] InferenceWorker_p0-w0: stopping experience collection (38900 times) [2024-06-15 20:41:36,580][1651469] Signal inference workers to resume experience collection... 
(38900 times) [2024-06-15 20:41:36,581][1652491] InferenceWorker_p0-w0: resuming experience collection (38900 times) [2024-06-15 20:41:37,099][1652491] Updated weights for policy 0, policy_version 748129 (0.0013) [2024-06-15 20:41:40,591][1652491] Updated weights for policy 0, policy_version 748162 (0.0013) [2024-06-15 20:41:40,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 1532264448. Throughput: 0: 12117.4. Samples: 383138304. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:41:40,956][1648985] Avg episode reward: [(0, '136.390')] [2024-06-15 20:41:43,924][1652491] Updated weights for policy 0, policy_version 748274 (0.0012) [2024-06-15 20:41:45,256][1652491] Updated weights for policy 0, policy_version 748336 (0.0013) [2024-06-15 20:41:45,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 49698.1, 300 sec: 46208.4). Total num frames: 1532624896. Throughput: 0: 11741.8. Samples: 383199744. Policy #0 lag: (min: 11.0, avg: 91.6, max: 267.0) [2024-06-15 20:41:45,956][1648985] Avg episode reward: [(0, '161.270')] [2024-06-15 20:41:49,020][1652491] Updated weights for policy 0, policy_version 748400 (0.0013) [2024-06-15 20:41:50,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 1532755968. Throughput: 0: 11832.8. Samples: 383277056. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:41:50,956][1648985] Avg episode reward: [(0, '152.990')] [2024-06-15 20:41:52,380][1652491] Updated weights for policy 0, policy_version 748432 (0.0118) [2024-06-15 20:41:53,772][1652491] Updated weights for policy 0, policy_version 748496 (0.0014) [2024-06-15 20:41:54,619][1652491] Updated weights for policy 0, policy_version 748540 (0.0012) [2024-06-15 20:41:55,734][1652491] Updated weights for policy 0, policy_version 748577 (0.0013) [2024-06-15 20:41:55,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 49152.0, 300 sec: 45986.3). Total num frames: 1533083648. 
Throughput: 0: 11582.6. Samples: 383308288. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:41:55,956][1648985] Avg episode reward: [(0, '159.020')] [2024-06-15 20:41:58,000][1652491] Updated weights for policy 0, policy_version 748610 (0.0016) [2024-06-15 20:42:00,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 1533280256. Throughput: 0: 11787.4. Samples: 383380992. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:42:00,956][1648985] Avg episode reward: [(0, '174.960')] [2024-06-15 20:42:03,896][1652491] Updated weights for policy 0, policy_version 748720 (0.0088) [2024-06-15 20:42:05,211][1652491] Updated weights for policy 0, policy_version 748787 (0.0129) [2024-06-15 20:42:05,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48059.6, 300 sec: 45875.2). Total num frames: 1533542400. Throughput: 0: 11730.5. Samples: 383452672. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:42:05,956][1648985] Avg episode reward: [(0, '177.040')] [2024-06-15 20:42:07,105][1652491] Updated weights for policy 0, policy_version 748839 (0.0012) [2024-06-15 20:42:10,154][1652491] Updated weights for policy 0, policy_version 748896 (0.0014) [2024-06-15 20:42:10,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 48059.8, 300 sec: 45764.1). Total num frames: 1533804544. Throughput: 0: 11844.2. Samples: 383488512. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:42:10,956][1648985] Avg episode reward: [(0, '180.900')] [2024-06-15 20:42:14,882][1652491] Updated weights for policy 0, policy_version 748976 (0.0013) [2024-06-15 20:42:15,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 1533935616. Throughput: 0: 11684.9. Samples: 383558656. 
Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:42:15,956][1648985] Avg episode reward: [(0, '157.230')] [2024-06-15 20:42:16,674][1652491] Updated weights for policy 0, policy_version 749040 (0.0032) [2024-06-15 20:42:18,521][1652491] Updated weights for policy 0, policy_version 749088 (0.0013) [2024-06-15 20:42:18,615][1651469] Signal inference workers to stop experience collection... (38950 times) [2024-06-15 20:42:18,662][1652491] InferenceWorker_p0-w0: stopping experience collection (38950 times) [2024-06-15 20:42:18,821][1651469] Signal inference workers to resume experience collection... (38950 times) [2024-06-15 20:42:18,822][1652491] InferenceWorker_p0-w0: resuming experience collection (38950 times) [2024-06-15 20:42:20,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46967.5, 300 sec: 45764.1). Total num frames: 1534230528. Throughput: 0: 11810.1. Samples: 383630848. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:42:20,956][1648985] Avg episode reward: [(0, '168.590')] [2024-06-15 20:42:21,562][1652491] Updated weights for policy 0, policy_version 749168 (0.0095) [2024-06-15 20:42:25,420][1652491] Updated weights for policy 0, policy_version 749219 (0.0012) [2024-06-15 20:42:25,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 1534459904. Throughput: 0: 11832.9. Samples: 383670784. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:42:25,956][1648985] Avg episode reward: [(0, '166.870')] [2024-06-15 20:42:27,894][1652491] Updated weights for policy 0, policy_version 749303 (0.0014) [2024-06-15 20:42:29,711][1652491] Updated weights for policy 0, policy_version 749345 (0.0013) [2024-06-15 20:42:30,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48059.7, 300 sec: 46097.3). Total num frames: 1534722048. Throughput: 0: 11844.3. Samples: 383732736. 
Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:42:30,955][1648985] Avg episode reward: [(0, '174.010')] [2024-06-15 20:42:32,204][1652491] Updated weights for policy 0, policy_version 749393 (0.0013) [2024-06-15 20:42:33,175][1652491] Updated weights for policy 0, policy_version 749436 (0.0012) [2024-06-15 20:42:35,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 1534853120. Throughput: 0: 11776.0. Samples: 383806976. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:42:35,956][1648985] Avg episode reward: [(0, '155.420')] [2024-06-15 20:42:37,139][1652491] Updated weights for policy 0, policy_version 749492 (0.0087) [2024-06-15 20:42:39,372][1652491] Updated weights for policy 0, policy_version 749536 (0.0086) [2024-06-15 20:42:40,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48059.7, 300 sec: 45986.5). Total num frames: 1535148032. Throughput: 0: 11832.9. Samples: 383840768. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:42:40,955][1648985] Avg episode reward: [(0, '151.540')] [2024-06-15 20:42:41,426][1652491] Updated weights for policy 0, policy_version 749605 (0.0013) [2024-06-15 20:42:42,869][1652491] Updated weights for policy 0, policy_version 749636 (0.0012) [2024-06-15 20:42:44,430][1652491] Updated weights for policy 0, policy_version 749696 (0.0011) [2024-06-15 20:42:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 1535377408. Throughput: 0: 11605.3. Samples: 383903232. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:42:45,956][1648985] Avg episode reward: [(0, '144.280')] [2024-06-15 20:42:50,260][1652491] Updated weights for policy 0, policy_version 749776 (0.0013) [2024-06-15 20:42:50,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46967.6, 300 sec: 45986.3). Total num frames: 1535574016. Throughput: 0: 11662.3. Samples: 383977472. 
Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:42:50,956][1648985] Avg episode reward: [(0, '139.910')] [2024-06-15 20:42:51,432][1652491] Updated weights for policy 0, policy_version 749820 (0.0012) [2024-06-15 20:42:54,380][1652491] Updated weights for policy 0, policy_version 749893 (0.0015) [2024-06-15 20:42:55,657][1652491] Updated weights for policy 0, policy_version 749946 (0.0013) [2024-06-15 20:42:55,955][1648985] Fps is (10 sec: 52427.0, 60 sec: 46967.2, 300 sec: 46208.4). Total num frames: 1535901696. Throughput: 0: 11582.5. Samples: 384009728. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:42:55,956][1648985] Avg episode reward: [(0, '132.550')] [2024-06-15 20:42:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000749952_1535901696.pth... [2024-06-15 20:42:56,016][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000744384_1524498432.pth [2024-06-15 20:42:59,459][1652491] Updated weights for policy 0, policy_version 750007 (0.0014) [2024-06-15 20:43:00,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 1536098304. Throughput: 0: 11844.3. Samples: 384091648. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:43:00,956][1648985] Avg episode reward: [(0, '156.160')] [2024-06-15 20:43:01,044][1652491] Updated weights for policy 0, policy_version 750064 (0.0015) [2024-06-15 20:43:02,746][1652491] Updated weights for policy 0, policy_version 750116 (0.0014) [2024-06-15 20:43:04,906][1651469] Signal inference workers to stop experience collection... (39000 times) [2024-06-15 20:43:04,961][1652491] InferenceWorker_p0-w0: stopping experience collection (39000 times) [2024-06-15 20:43:05,124][1651469] Signal inference workers to resume experience collection... 
(39000 times) [2024-06-15 20:43:05,125][1652491] InferenceWorker_p0-w0: resuming experience collection (39000 times) [2024-06-15 20:43:05,478][1652491] Updated weights for policy 0, policy_version 750176 (0.0013) [2024-06-15 20:43:05,955][1648985] Fps is (10 sec: 49153.3, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 1536393216. Throughput: 0: 11719.1. Samples: 384158208. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:43:05,956][1648985] Avg episode reward: [(0, '168.550')] [2024-06-15 20:43:10,655][1652491] Updated weights for policy 0, policy_version 750265 (0.0012) [2024-06-15 20:43:10,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 45875.2, 300 sec: 46319.5). Total num frames: 1536557056. Throughput: 0: 11730.5. Samples: 384198656. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:43:10,957][1648985] Avg episode reward: [(0, '163.140')] [2024-06-15 20:43:12,255][1652491] Updated weights for policy 0, policy_version 750313 (0.0014) [2024-06-15 20:43:13,208][1652491] Updated weights for policy 0, policy_version 750352 (0.0028) [2024-06-15 20:43:15,760][1652491] Updated weights for policy 0, policy_version 750404 (0.0030) [2024-06-15 20:43:15,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 1536851968. Throughput: 0: 11867.0. Samples: 384266752. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:43:15,956][1648985] Avg episode reward: [(0, '163.140')] [2024-06-15 20:43:16,948][1652491] Updated weights for policy 0, policy_version 750463 (0.0012) [2024-06-15 20:43:20,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 46421.4, 300 sec: 46546.0). Total num frames: 1537015808. Throughput: 0: 11855.7. Samples: 384340480. 
Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:43:20,956][1648985] Avg episode reward: [(0, '161.820')] [2024-06-15 20:43:21,520][1652491] Updated weights for policy 0, policy_version 750518 (0.0013) [2024-06-15 20:43:23,505][1652491] Updated weights for policy 0, policy_version 750576 (0.0013) [2024-06-15 20:43:25,164][1652491] Updated weights for policy 0, policy_version 750612 (0.0019) [2024-06-15 20:43:25,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1537343488. Throughput: 0: 11844.2. Samples: 384373760. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:43:25,956][1648985] Avg episode reward: [(0, '164.510')] [2024-06-15 20:43:27,148][1652491] Updated weights for policy 0, policy_version 750688 (0.0013) [2024-06-15 20:43:30,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1537474560. Throughput: 0: 12003.6. Samples: 384443392. Policy #0 lag: (min: 59.0, avg: 184.8, max: 315.0) [2024-06-15 20:43:30,956][1648985] Avg episode reward: [(0, '162.530')] [2024-06-15 20:43:31,770][1652491] Updated weights for policy 0, policy_version 750721 (0.0019) [2024-06-15 20:43:32,953][1652491] Updated weights for policy 0, policy_version 750784 (0.0080) [2024-06-15 20:43:35,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 1537736704. Throughput: 0: 11946.7. Samples: 384515072. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:43:35,956][1648985] Avg episode reward: [(0, '160.590')] [2024-06-15 20:43:36,130][1652491] Updated weights for policy 0, policy_version 750851 (0.0017) [2024-06-15 20:43:37,539][1652491] Updated weights for policy 0, policy_version 750903 (0.0032) [2024-06-15 20:43:38,558][1652491] Updated weights for policy 0, policy_version 750948 (0.0012) [2024-06-15 20:43:40,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 47513.5, 300 sec: 47541.4). Total num frames: 1537998848. 
Throughput: 0: 11844.3. Samples: 384542720. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:43:40,956][1648985] Avg episode reward: [(0, '150.560')] [2024-06-15 20:43:43,696][1652491] Updated weights for policy 0, policy_version 750998 (0.0016) [2024-06-15 20:43:44,470][1652491] Updated weights for policy 0, policy_version 751034 (0.0012) [2024-06-15 20:43:45,812][1652491] Updated weights for policy 0, policy_version 751078 (0.0013) [2024-06-15 20:43:45,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 1538228224. Throughput: 0: 11764.6. Samples: 384621056. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:43:45,956][1648985] Avg episode reward: [(0, '150.840')] [2024-06-15 20:43:47,458][1652491] Updated weights for policy 0, policy_version 751127 (0.0015) [2024-06-15 20:43:48,447][1651469] Signal inference workers to stop experience collection... (39050 times) [2024-06-15 20:43:48,449][1652491] Updated weights for policy 0, policy_version 751169 (0.0024) [2024-06-15 20:43:48,520][1652491] InferenceWorker_p0-w0: stopping experience collection (39050 times) [2024-06-15 20:43:48,667][1651469] Signal inference workers to resume experience collection... (39050 times) [2024-06-15 20:43:48,668][1652491] InferenceWorker_p0-w0: resuming experience collection (39050 times) [2024-06-15 20:43:49,489][1652491] Updated weights for policy 0, policy_version 751224 (0.0014) [2024-06-15 20:43:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 49151.9, 300 sec: 47541.4). Total num frames: 1538523136. Throughput: 0: 11935.3. Samples: 384695296. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:43:50,956][1648985] Avg episode reward: [(0, '155.990')] [2024-06-15 20:43:54,887][1652491] Updated weights for policy 0, policy_version 751286 (0.0015) [2024-06-15 20:43:55,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46421.7, 300 sec: 47652.5). Total num frames: 1538686976. Throughput: 0: 11924.0. 
Samples: 384735232. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:43:55,955][1648985] Avg episode reward: [(0, '168.780')] [2024-06-15 20:43:56,124][1652491] Updated weights for policy 0, policy_version 751328 (0.0011) [2024-06-15 20:43:58,053][1652491] Updated weights for policy 0, policy_version 751361 (0.0012) [2024-06-15 20:43:59,344][1652491] Updated weights for policy 0, policy_version 751415 (0.0030) [2024-06-15 20:44:00,845][1652491] Updated weights for policy 0, policy_version 751459 (0.0013) [2024-06-15 20:44:00,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1538981888. Throughput: 0: 11832.9. Samples: 384799232. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:44:00,956][1648985] Avg episode reward: [(0, '167.870')] [2024-06-15 20:44:05,054][1652491] Updated weights for policy 0, policy_version 751504 (0.0015) [2024-06-15 20:44:05,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.4, 300 sec: 47874.6). Total num frames: 1539145728. Throughput: 0: 11776.0. Samples: 384870400. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:44:05,955][1648985] Avg episode reward: [(0, '157.350')] [2024-06-15 20:44:06,750][1652491] Updated weights for policy 0, policy_version 751554 (0.0012) [2024-06-15 20:44:07,927][1652491] Updated weights for policy 0, policy_version 751610 (0.0012) [2024-06-15 20:44:09,537][1652491] Updated weights for policy 0, policy_version 751664 (0.0013) [2024-06-15 20:44:10,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1539440640. Throughput: 0: 11776.0. Samples: 384903680. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:44:10,955][1648985] Avg episode reward: [(0, '146.210')] [2024-06-15 20:44:11,918][1652491] Updated weights for policy 0, policy_version 751712 (0.0014) [2024-06-15 20:44:15,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 45875.2, 300 sec: 47652.4). 
Total num frames: 1539604480. Throughput: 0: 11935.3. Samples: 384980480. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:44:15,956][1648985] Avg episode reward: [(0, '142.820')] [2024-06-15 20:44:16,031][1652491] Updated weights for policy 0, policy_version 751762 (0.0013) [2024-06-15 20:44:17,684][1652491] Updated weights for policy 0, policy_version 751809 (0.0015) [2024-06-15 20:44:18,907][1652491] Updated weights for policy 0, policy_version 751872 (0.0028) [2024-06-15 20:44:20,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 1539932160. Throughput: 0: 11832.9. Samples: 385047552. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:44:20,955][1648985] Avg episode reward: [(0, '142.040')] [2024-06-15 20:44:20,993][1652491] Updated weights for policy 0, policy_version 751929 (0.0014) [2024-06-15 20:44:22,806][1652491] Updated weights for policy 0, policy_version 751968 (0.0014) [2024-06-15 20:44:23,515][1652491] Updated weights for policy 0, policy_version 752000 (0.0012) [2024-06-15 20:44:25,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 46421.2, 300 sec: 47652.4). Total num frames: 1540128768. Throughput: 0: 12208.3. Samples: 385092096. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:44:25,956][1648985] Avg episode reward: [(0, '165.930')] [2024-06-15 20:44:26,863][1652491] Updated weights for policy 0, policy_version 752060 (0.0133) [2024-06-15 20:44:28,930][1652491] Updated weights for policy 0, policy_version 752112 (0.0016) [2024-06-15 20:44:30,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 1540358144. Throughput: 0: 11969.4. Samples: 385159680. 
Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:44:30,956][1648985] Avg episode reward: [(0, '184.550')] [2024-06-15 20:44:32,284][1652491] Updated weights for policy 0, policy_version 752186 (0.0123) [2024-06-15 20:44:33,664][1651469] Signal inference workers to stop experience collection... (39100 times) [2024-06-15 20:44:33,696][1652491] Updated weights for policy 0, policy_version 752225 (0.0012) [2024-06-15 20:44:33,715][1652491] InferenceWorker_p0-w0: stopping experience collection (39100 times) [2024-06-15 20:44:33,927][1651469] Signal inference workers to resume experience collection... (39100 times) [2024-06-15 20:44:33,927][1652491] InferenceWorker_p0-w0: resuming experience collection (39100 times) [2024-06-15 20:44:35,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 1540620288. Throughput: 0: 11946.7. Samples: 385232896. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:44:35,956][1648985] Avg episode reward: [(0, '181.140')] [2024-06-15 20:44:36,866][1652491] Updated weights for policy 0, policy_version 752260 (0.0012) [2024-06-15 20:44:38,050][1652491] Updated weights for policy 0, policy_version 752320 (0.0016) [2024-06-15 20:44:40,440][1652491] Updated weights for policy 0, policy_version 752382 (0.0012) [2024-06-15 20:44:40,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 48059.6, 300 sec: 47875.1). Total num frames: 1540882432. Throughput: 0: 11855.5. Samples: 385268736. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:44:40,956][1648985] Avg episode reward: [(0, '177.080')] [2024-06-15 20:44:43,184][1652491] Updated weights for policy 0, policy_version 752440 (0.0012) [2024-06-15 20:44:44,211][1652491] Updated weights for policy 0, policy_version 752480 (0.0012) [2024-06-15 20:44:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48605.9, 300 sec: 47874.6). Total num frames: 1541144576. Throughput: 0: 12026.4. Samples: 385340416. 
Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:44:45,955][1648985] Avg episode reward: [(0, '163.920')] [2024-06-15 20:44:47,489][1652491] Updated weights for policy 0, policy_version 752528 (0.0134) [2024-06-15 20:44:50,258][1652491] Updated weights for policy 0, policy_version 752582 (0.0019) [2024-06-15 20:44:50,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 1541341184. Throughput: 0: 12162.8. Samples: 385417728. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:44:50,956][1648985] Avg episode reward: [(0, '173.090')] [2024-06-15 20:44:51,500][1652491] Updated weights for policy 0, policy_version 752638 (0.0014) [2024-06-15 20:44:54,274][1652491] Updated weights for policy 0, policy_version 752698 (0.0015) [2024-06-15 20:44:55,624][1652491] Updated weights for policy 0, policy_version 752761 (0.0013) [2024-06-15 20:44:55,957][1648985] Fps is (10 sec: 52416.0, 60 sec: 49696.1, 300 sec: 47985.3). Total num frames: 1541668864. Throughput: 0: 12173.6. Samples: 385451520. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:44:55,958][1648985] Avg episode reward: [(0, '158.190')] [2024-06-15 20:44:55,967][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000752768_1541668864.pth... [2024-06-15 20:44:56,030][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000747136_1530134528.pth [2024-06-15 20:44:59,481][1652491] Updated weights for policy 0, policy_version 752816 (0.0012) [2024-06-15 20:45:00,956][1648985] Fps is (10 sec: 45873.6, 60 sec: 46967.3, 300 sec: 47874.5). Total num frames: 1541799936. Throughput: 0: 11946.6. Samples: 385518080. 
Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:45:00,957][1648985] Avg episode reward: [(0, '164.820')] [2024-06-15 20:45:01,800][1652491] Updated weights for policy 0, policy_version 752880 (0.0047) [2024-06-15 20:45:05,649][1652491] Updated weights for policy 0, policy_version 752947 (0.0139) [2024-06-15 20:45:05,955][1648985] Fps is (10 sec: 39331.0, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 1542062080. Throughput: 0: 11912.5. Samples: 385583616. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:45:05,956][1648985] Avg episode reward: [(0, '147.610')] [2024-06-15 20:45:07,416][1652491] Updated weights for policy 0, policy_version 752999 (0.0013) [2024-06-15 20:45:10,186][1652491] Updated weights for policy 0, policy_version 753040 (0.0014) [2024-06-15 20:45:10,955][1648985] Fps is (10 sec: 45876.5, 60 sec: 46967.4, 300 sec: 47766.1). Total num frames: 1542258688. Throughput: 0: 11821.6. Samples: 385624064. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:45:10,956][1648985] Avg episode reward: [(0, '151.350')] [2024-06-15 20:45:12,835][1652491] Updated weights for policy 0, policy_version 753090 (0.0012) [2024-06-15 20:45:13,937][1652491] Updated weights for policy 0, policy_version 753144 (0.0022) [2024-06-15 20:45:15,960][1648985] Fps is (10 sec: 42579.1, 60 sec: 48056.2, 300 sec: 47430.1). Total num frames: 1542488064. Throughput: 0: 11797.6. Samples: 385690624. Policy #0 lag: (min: 63.0, avg: 147.9, max: 319.0) [2024-06-15 20:45:15,960][1648985] Avg episode reward: [(0, '140.770')] [2024-06-15 20:45:16,596][1652491] Updated weights for policy 0, policy_version 753190 (0.0013) [2024-06-15 20:45:18,004][1652491] Updated weights for policy 0, policy_version 753236 (0.0013) [2024-06-15 20:45:20,909][1651469] Signal inference workers to stop experience collection... (39150 times) [2024-06-15 20:45:20,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46421.3, 300 sec: 47541.4). 
Total num frames: 1542717440. Throughput: 0: 11867.0. Samples: 385766912. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:45:20,956][1648985] Avg episode reward: [(0, '166.310')] [2024-06-15 20:45:20,950][1652491] InferenceWorker_p0-w0: stopping experience collection (39150 times) [2024-06-15 20:45:21,135][1651469] Signal inference workers to resume experience collection... (39150 times) [2024-06-15 20:45:21,137][1652491] InferenceWorker_p0-w0: resuming experience collection (39150 times) [2024-06-15 20:45:21,139][1652491] Updated weights for policy 0, policy_version 753296 (0.0025) [2024-06-15 20:45:23,654][1652491] Updated weights for policy 0, policy_version 753366 (0.0014) [2024-06-15 20:45:25,580][1652491] Updated weights for policy 0, policy_version 753412 (0.0035) [2024-06-15 20:45:25,955][1648985] Fps is (10 sec: 52452.6, 60 sec: 48059.9, 300 sec: 47652.4). Total num frames: 1543012352. Throughput: 0: 11924.0. Samples: 385805312. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:45:25,956][1648985] Avg episode reward: [(0, '167.450')] [2024-06-15 20:45:28,758][1652491] Updated weights for policy 0, policy_version 753492 (0.0022) [2024-06-15 20:45:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 1543241728. Throughput: 0: 11832.9. Samples: 385872896. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:45:30,956][1648985] Avg episode reward: [(0, '159.140')] [2024-06-15 20:45:31,657][1652491] Updated weights for policy 0, policy_version 753538 (0.0014) [2024-06-15 20:45:34,627][1652491] Updated weights for policy 0, policy_version 753605 (0.0028) [2024-06-15 20:45:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1543503872. Throughput: 0: 11707.7. Samples: 385944576. 
Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:45:35,956][1648985] Avg episode reward: [(0, '155.910')] [2024-06-15 20:45:37,278][1652491] Updated weights for policy 0, policy_version 753680 (0.0014) [2024-06-15 20:45:38,283][1652491] Updated weights for policy 0, policy_version 753726 (0.0019) [2024-06-15 20:45:40,187][1652491] Updated weights for policy 0, policy_version 753788 (0.0126) [2024-06-15 20:45:40,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.9, 300 sec: 47874.6). Total num frames: 1543766016. Throughput: 0: 11731.1. Samples: 385979392. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:45:40,956][1648985] Avg episode reward: [(0, '169.240')] [2024-06-15 20:45:44,195][1652491] Updated weights for policy 0, policy_version 753848 (0.0015) [2024-06-15 20:45:45,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.4, 300 sec: 47208.1). Total num frames: 1543929856. Throughput: 0: 11924.0. Samples: 386054656. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:45:45,955][1648985] Avg episode reward: [(0, '194.200')] [2024-06-15 20:45:46,704][1652491] Updated weights for policy 0, policy_version 753913 (0.0013) [2024-06-15 20:45:49,482][1652491] Updated weights for policy 0, policy_version 753968 (0.0012) [2024-06-15 20:45:50,952][1652491] Updated weights for policy 0, policy_version 754033 (0.0014) [2024-06-15 20:45:50,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 48605.9, 300 sec: 47874.6). Total num frames: 1544257536. Throughput: 0: 11935.3. Samples: 386120704. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:45:50,956][1648985] Avg episode reward: [(0, '176.500')] [2024-06-15 20:45:55,583][1652491] Updated weights for policy 0, policy_version 754101 (0.0012) [2024-06-15 20:45:55,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 45877.0, 300 sec: 47319.2). Total num frames: 1544421376. Throughput: 0: 11980.8. Samples: 386163200. 
Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:45:55,956][1648985] Avg episode reward: [(0, '147.110')] [2024-06-15 20:45:56,482][1652491] Updated weights for policy 0, policy_version 754128 (0.0013) [2024-06-15 20:45:57,513][1652491] Updated weights for policy 0, policy_version 754172 (0.0013) [2024-06-15 20:46:00,077][1652491] Updated weights for policy 0, policy_version 754237 (0.0013) [2024-06-15 20:46:00,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 48606.0, 300 sec: 47652.4). Total num frames: 1544716288. Throughput: 0: 12061.6. Samples: 386233344. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:46:00,956][1648985] Avg episode reward: [(0, '128.140')] [2024-06-15 20:46:01,418][1652491] Updated weights for policy 0, policy_version 754277 (0.0013) [2024-06-15 20:46:05,350][1651469] Signal inference workers to stop experience collection... (39200 times) [2024-06-15 20:46:05,378][1652491] InferenceWorker_p0-w0: stopping experience collection (39200 times) [2024-06-15 20:46:05,519][1651469] Signal inference workers to resume experience collection... (39200 times) [2024-06-15 20:46:05,519][1652491] InferenceWorker_p0-w0: resuming experience collection (39200 times) [2024-06-15 20:46:05,746][1652491] Updated weights for policy 0, policy_version 754342 (0.0012) [2024-06-15 20:46:05,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 1544912896. Throughput: 0: 12026.3. Samples: 386308096. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:46:05,956][1648985] Avg episode reward: [(0, '140.500')] [2024-06-15 20:46:07,663][1652491] Updated weights for policy 0, policy_version 754416 (0.0017) [2024-06-15 20:46:10,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 48605.9, 300 sec: 47430.3). Total num frames: 1545175040. Throughput: 0: 11935.3. Samples: 386342400. 
Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:46:10,956][1648985] Avg episode reward: [(0, '153.360')] [2024-06-15 20:46:10,959][1652491] Updated weights for policy 0, policy_version 754486 (0.0013) [2024-06-15 20:46:12,515][1652491] Updated weights for policy 0, policy_version 754557 (0.0014) [2024-06-15 20:46:15,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 47517.2, 300 sec: 47208.1). Total num frames: 1545338880. Throughput: 0: 12003.6. Samples: 386413056. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:46:15,956][1648985] Avg episode reward: [(0, '160.590')] [2024-06-15 20:46:16,714][1652491] Updated weights for policy 0, policy_version 754608 (0.0012) [2024-06-15 20:46:17,891][1652491] Updated weights for policy 0, policy_version 754626 (0.0012) [2024-06-15 20:46:18,808][1652491] Updated weights for policy 0, policy_version 754684 (0.0022) [2024-06-15 20:46:20,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1545601024. Throughput: 0: 12208.3. Samples: 386493952. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:46:20,956][1648985] Avg episode reward: [(0, '157.530')] [2024-06-15 20:46:21,959][1652491] Updated weights for policy 0, policy_version 754738 (0.0013) [2024-06-15 20:46:23,499][1652491] Updated weights for policy 0, policy_version 754804 (0.0015) [2024-06-15 20:46:25,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 1545863168. Throughput: 0: 12060.5. Samples: 386522112. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:46:25,955][1648985] Avg episode reward: [(0, '164.960')] [2024-06-15 20:46:27,554][1652491] Updated weights for policy 0, policy_version 754848 (0.0121) [2024-06-15 20:46:28,923][1652491] Updated weights for policy 0, policy_version 754897 (0.0013) [2024-06-15 20:46:30,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1546125312. 
Throughput: 0: 11889.8. Samples: 386589696. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:46:30,955][1648985] Avg episode reward: [(0, '178.210')] [2024-06-15 20:46:32,145][1652491] Updated weights for policy 0, policy_version 754963 (0.0013) [2024-06-15 20:46:34,274][1652491] Updated weights for policy 0, policy_version 755040 (0.0017) [2024-06-15 20:46:35,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 1546387456. Throughput: 0: 12242.5. Samples: 386671616. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:46:35,956][1648985] Avg episode reward: [(0, '179.760')] [2024-06-15 20:46:37,324][1652491] Updated weights for policy 0, policy_version 755073 (0.0010) [2024-06-15 20:46:38,780][1652491] Updated weights for policy 0, policy_version 755138 (0.0012) [2024-06-15 20:46:39,977][1652491] Updated weights for policy 0, policy_version 755193 (0.0139) [2024-06-15 20:46:40,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1546649600. Throughput: 0: 12003.6. Samples: 386703360. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:46:40,956][1648985] Avg episode reward: [(0, '163.410')] [2024-06-15 20:46:43,355][1652491] Updated weights for policy 0, policy_version 755248 (0.0012) [2024-06-15 20:46:45,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48605.8, 300 sec: 47763.5). Total num frames: 1546846208. Throughput: 0: 12049.1. Samples: 386775552. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:46:45,956][1648985] Avg episode reward: [(0, '171.040')] [2024-06-15 20:46:46,298][1652491] Updated weights for policy 0, policy_version 755320 (0.0083) [2024-06-15 20:46:48,532][1651469] Signal inference workers to stop experience collection... 
(39250 times) [2024-06-15 20:46:48,567][1652491] InferenceWorker_p0-w0: stopping experience collection (39250 times) [2024-06-15 20:46:48,763][1651469] Signal inference workers to resume experience collection... (39250 times) [2024-06-15 20:46:48,764][1652491] InferenceWorker_p0-w0: resuming experience collection (39250 times) [2024-06-15 20:46:49,763][1652491] Updated weights for policy 0, policy_version 755379 (0.0016) [2024-06-15 20:46:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1547108352. Throughput: 0: 11867.0. Samples: 386842112. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:46:50,956][1648985] Avg episode reward: [(0, '163.140')] [2024-06-15 20:46:51,382][1652491] Updated weights for policy 0, policy_version 755447 (0.0014) [2024-06-15 20:46:54,680][1652491] Updated weights for policy 0, policy_version 755489 (0.0019) [2024-06-15 20:46:55,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48059.7, 300 sec: 47541.3). Total num frames: 1547304960. Throughput: 0: 11992.1. Samples: 386882048. Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:46:55,956][1648985] Avg episode reward: [(0, '169.640')] [2024-06-15 20:46:55,967][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000755520_1547304960.pth... [2024-06-15 20:46:56,019][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000749952_1535901696.pth [2024-06-15 20:46:56,695][1652491] Updated weights for policy 0, policy_version 755524 (0.0019) [2024-06-15 20:47:00,379][1652491] Updated weights for policy 0, policy_version 755600 (0.0014) [2024-06-15 20:47:00,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.4, 300 sec: 47319.2). Total num frames: 1547501568. Throughput: 0: 12003.6. Samples: 386953216. 
Policy #0 lag: (min: 51.0, avg: 164.4, max: 307.0) [2024-06-15 20:47:00,956][1648985] Avg episode reward: [(0, '154.730')] [2024-06-15 20:47:03,349][1652491] Updated weights for policy 0, policy_version 755706 (0.0019) [2024-06-15 20:47:05,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46421.2, 300 sec: 47097.0). Total num frames: 1547698176. Throughput: 0: 11502.9. Samples: 387011584. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:47:05,956][1648985] Avg episode reward: [(0, '158.810')] [2024-06-15 20:47:06,845][1652491] Updated weights for policy 0, policy_version 755760 (0.0015) [2024-06-15 20:47:10,069][1652491] Updated weights for policy 0, policy_version 755813 (0.0012) [2024-06-15 20:47:10,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46421.4, 300 sec: 47541.4). Total num frames: 1547960320. Throughput: 0: 11639.5. Samples: 387045888. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:47:10,955][1648985] Avg episode reward: [(0, '151.030')] [2024-06-15 20:47:12,520][1652491] Updated weights for policy 0, policy_version 755860 (0.0042) [2024-06-15 20:47:14,112][1652491] Updated weights for policy 0, policy_version 755921 (0.0013) [2024-06-15 20:47:15,129][1652491] Updated weights for policy 0, policy_version 755966 (0.0014) [2024-06-15 20:47:15,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1548222464. Throughput: 0: 11434.6. Samples: 387104256. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:47:15,956][1648985] Avg episode reward: [(0, '158.530')] [2024-06-15 20:47:18,085][1652491] Updated weights for policy 0, policy_version 756019 (0.0015) [2024-06-15 20:47:20,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45875.1, 300 sec: 47097.0). Total num frames: 1548353536. Throughput: 0: 11309.5. Samples: 387180544. 
Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:47:20,956][1648985] Avg episode reward: [(0, '162.420')] [2024-06-15 20:47:21,881][1652491] Updated weights for policy 0, policy_version 756064 (0.0013) [2024-06-15 20:47:24,730][1652491] Updated weights for policy 0, policy_version 756116 (0.0014) [2024-06-15 20:47:25,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.1, 300 sec: 47097.0). Total num frames: 1548615680. Throughput: 0: 11411.9. Samples: 387216896. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:47:25,956][1648985] Avg episode reward: [(0, '163.460')] [2024-06-15 20:47:26,185][1652491] Updated weights for policy 0, policy_version 756177 (0.0011) [2024-06-15 20:47:27,023][1652491] Updated weights for policy 0, policy_version 756222 (0.0013) [2024-06-15 20:47:29,197][1652491] Updated weights for policy 0, policy_version 756281 (0.0100) [2024-06-15 20:47:30,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1548877824. Throughput: 0: 11161.6. Samples: 387277824. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:47:30,956][1648985] Avg episode reward: [(0, '176.730')] [2024-06-15 20:47:33,299][1651469] Signal inference workers to stop experience collection... (39300 times) [2024-06-15 20:47:33,341][1652491] InferenceWorker_p0-w0: stopping experience collection (39300 times) [2024-06-15 20:47:33,508][1651469] Signal inference workers to resume experience collection... (39300 times) [2024-06-15 20:47:33,509][1652491] InferenceWorker_p0-w0: resuming experience collection (39300 times) [2024-06-15 20:47:33,511][1652491] Updated weights for policy 0, policy_version 756320 (0.0013) [2024-06-15 20:47:35,533][1652491] Updated weights for policy 0, policy_version 756368 (0.0013) [2024-06-15 20:47:35,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 44236.9, 300 sec: 47097.1). Total num frames: 1549041664. Throughput: 0: 11400.6. Samples: 387355136. 
Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:47:35,955][1648985] Avg episode reward: [(0, '175.270')] [2024-06-15 20:47:37,772][1652491] Updated weights for policy 0, policy_version 756449 (0.0013) [2024-06-15 20:47:38,812][1652491] Updated weights for policy 0, policy_version 756484 (0.0015) [2024-06-15 20:47:40,115][1652491] Updated weights for policy 0, policy_version 756543 (0.0016) [2024-06-15 20:47:40,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 45875.0, 300 sec: 47541.3). Total num frames: 1549402112. Throughput: 0: 11070.5. Samples: 387380224. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:47:40,956][1648985] Avg episode reward: [(0, '163.550')] [2024-06-15 20:47:45,507][1652491] Updated weights for policy 0, policy_version 756584 (0.0017) [2024-06-15 20:47:45,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 44782.9, 300 sec: 47319.2). Total num frames: 1549533184. Throughput: 0: 11207.1. Samples: 387457536. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:47:45,956][1648985] Avg episode reward: [(0, '144.960')] [2024-06-15 20:47:46,963][1652491] Updated weights for policy 0, policy_version 756610 (0.0014) [2024-06-15 20:47:49,206][1652491] Updated weights for policy 0, policy_version 756693 (0.0011) [2024-06-15 20:47:50,493][1652491] Updated weights for policy 0, policy_version 756754 (0.0012) [2024-06-15 20:47:50,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 45875.1, 300 sec: 47319.3). Total num frames: 1549860864. Throughput: 0: 11309.5. Samples: 387520512. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:47:50,956][1648985] Avg episode reward: [(0, '135.880')] [2024-06-15 20:47:55,956][1648985] Fps is (10 sec: 39316.8, 60 sec: 43689.8, 300 sec: 46874.7). Total num frames: 1549926400. Throughput: 0: 11525.3. Samples: 387564544. 
Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:47:55,957][1648985] Avg episode reward: [(0, '165.420')] [2024-06-15 20:47:56,425][1652491] Updated weights for policy 0, policy_version 756818 (0.0109) [2024-06-15 20:47:57,588][1652491] Updated weights for policy 0, policy_version 756880 (0.0013) [2024-06-15 20:47:58,521][1652491] Updated weights for policy 0, policy_version 756918 (0.0015) [2024-06-15 20:47:59,506][1652491] Updated weights for policy 0, policy_version 756945 (0.0044) [2024-06-15 20:48:00,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46967.5, 300 sec: 47208.2). Total num frames: 1550319616. Throughput: 0: 11753.3. Samples: 387633152. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:48:00,956][1648985] Avg episode reward: [(0, '169.530')] [2024-06-15 20:48:01,385][1652491] Updated weights for policy 0, policy_version 757008 (0.0011) [2024-06-15 20:48:02,204][1652491] Updated weights for policy 0, policy_version 757056 (0.0013) [2024-06-15 20:48:05,955][1648985] Fps is (10 sec: 52435.7, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 1550450688. Throughput: 0: 11616.7. Samples: 387703296. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:48:05,956][1648985] Avg episode reward: [(0, '182.160')] [2024-06-15 20:48:07,943][1652491] Updated weights for policy 0, policy_version 757115 (0.0012) [2024-06-15 20:48:09,765][1652491] Updated weights for policy 0, policy_version 757176 (0.0011) [2024-06-15 20:48:10,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 1550712832. Throughput: 0: 11571.2. Samples: 387737600. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:48:10,956][1648985] Avg episode reward: [(0, '176.940')] [2024-06-15 20:48:12,633][1652491] Updated weights for policy 0, policy_version 757248 (0.0013) [2024-06-15 20:48:12,737][1651469] Signal inference workers to stop experience collection... 
(39350 times) [2024-06-15 20:48:12,769][1652491] InferenceWorker_p0-w0: stopping experience collection (39350 times) [2024-06-15 20:48:12,883][1651469] Signal inference workers to resume experience collection... (39350 times) [2024-06-15 20:48:12,883][1652491] InferenceWorker_p0-w0: resuming experience collection (39350 times) [2024-06-15 20:48:13,793][1652491] Updated weights for policy 0, policy_version 757309 (0.0013) [2024-06-15 20:48:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.3, 300 sec: 47319.2). Total num frames: 1550974976. Throughput: 0: 11685.0. Samples: 387803648. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:48:15,956][1648985] Avg episode reward: [(0, '175.890')] [2024-06-15 20:48:18,795][1652491] Updated weights for policy 0, policy_version 757350 (0.0012) [2024-06-15 20:48:20,296][1652491] Updated weights for policy 0, policy_version 757429 (0.0014) [2024-06-15 20:48:20,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1551237120. Throughput: 0: 11707.6. Samples: 387881984. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:48:20,956][1648985] Avg episode reward: [(0, '166.660')] [2024-06-15 20:48:23,170][1652491] Updated weights for policy 0, policy_version 757475 (0.0015) [2024-06-15 20:48:24,297][1652491] Updated weights for policy 0, policy_version 757522 (0.0012) [2024-06-15 20:48:25,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 48059.6, 300 sec: 47541.3). Total num frames: 1551499264. Throughput: 0: 11946.7. Samples: 387917824. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:48:25,956][1648985] Avg episode reward: [(0, '169.690')] [2024-06-15 20:48:29,705][1652491] Updated weights for policy 0, policy_version 757585 (0.0013) [2024-06-15 20:48:30,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 45875.1, 300 sec: 47097.0). Total num frames: 1551630336. Throughput: 0: 11753.2. Samples: 387986432. 
Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:48:30,956][1648985] Avg episode reward: [(0, '163.850')] [2024-06-15 20:48:32,024][1652491] Updated weights for policy 0, policy_version 757680 (0.0013) [2024-06-15 20:48:34,920][1652491] Updated weights for policy 0, policy_version 757728 (0.0012) [2024-06-15 20:48:35,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 47513.3, 300 sec: 47097.1). Total num frames: 1551892480. Throughput: 0: 11548.4. Samples: 388040192. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:48:35,956][1648985] Avg episode reward: [(0, '169.940')] [2024-06-15 20:48:37,127][1652491] Updated weights for policy 0, policy_version 757819 (0.0015) [2024-06-15 20:48:40,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 43690.8, 300 sec: 46763.8). Total num frames: 1552023552. Throughput: 0: 11321.2. Samples: 388073984. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:48:40,956][1648985] Avg episode reward: [(0, '158.700')] [2024-06-15 20:48:43,341][1652491] Updated weights for policy 0, policy_version 757876 (0.0013) [2024-06-15 20:48:45,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1552285696. Throughput: 0: 11286.7. Samples: 388141056. Policy #0 lag: (min: 47.0, avg: 146.5, max: 303.0) [2024-06-15 20:48:45,956][1648985] Avg episode reward: [(0, '167.020')] [2024-06-15 20:48:46,251][1652491] Updated weights for policy 0, policy_version 757968 (0.0013) [2024-06-15 20:48:47,649][1652491] Updated weights for policy 0, policy_version 758033 (0.0119) [2024-06-15 20:48:50,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 44782.8, 300 sec: 46985.9). Total num frames: 1552547840. Throughput: 0: 11389.1. Samples: 388215808. 
Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:48:50,956][1648985] Avg episode reward: [(0, '167.210')] [2024-06-15 20:48:53,165][1652491] Updated weights for policy 0, policy_version 758097 (0.0097) [2024-06-15 20:48:54,842][1652491] Updated weights for policy 0, policy_version 758160 (0.0015) [2024-06-15 20:48:54,947][1651469] Signal inference workers to stop experience collection... (39400 times) [2024-06-15 20:48:55,007][1652491] InferenceWorker_p0-w0: stopping experience collection (39400 times) [2024-06-15 20:48:55,119][1651469] Signal inference workers to resume experience collection... (39400 times) [2024-06-15 20:48:55,120][1652491] InferenceWorker_p0-w0: resuming experience collection (39400 times) [2024-06-15 20:48:55,573][1652491] Updated weights for policy 0, policy_version 758203 (0.0023) [2024-06-15 20:48:55,955][1648985] Fps is (10 sec: 52426.5, 60 sec: 48060.4, 300 sec: 46874.8). Total num frames: 1552809984. Throughput: 0: 11468.7. Samples: 388253696. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:48:55,956][1648985] Avg episode reward: [(0, '150.950')] [2024-06-15 20:48:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000758208_1552809984.pth... [2024-06-15 20:48:56,019][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000752768_1541668864.pth [2024-06-15 20:48:58,080][1652491] Updated weights for policy 0, policy_version 758272 (0.0017) [2024-06-15 20:48:59,214][1652491] Updated weights for policy 0, policy_version 758328 (0.0013) [2024-06-15 20:49:00,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 1553072128. Throughput: 0: 11525.7. Samples: 388322304. 
Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:49:00,956][1648985] Avg episode reward: [(0, '160.990')] [2024-06-15 20:49:04,932][1652491] Updated weights for policy 0, policy_version 758386 (0.0143) [2024-06-15 20:49:05,955][1648985] Fps is (10 sec: 42600.9, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 1553235968. Throughput: 0: 11264.1. Samples: 388388864. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:49:05,955][1648985] Avg episode reward: [(0, '165.940')] [2024-06-15 20:49:06,668][1652491] Updated weights for policy 0, policy_version 758457 (0.0011) [2024-06-15 20:49:09,438][1652491] Updated weights for policy 0, policy_version 758498 (0.0011) [2024-06-15 20:49:10,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.5, 300 sec: 47208.1). Total num frames: 1553530880. Throughput: 0: 11298.2. Samples: 388426240. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:49:10,956][1648985] Avg episode reward: [(0, '158.350')] [2024-06-15 20:49:11,371][1652491] Updated weights for policy 0, policy_version 758584 (0.0139) [2024-06-15 20:49:15,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44236.8, 300 sec: 46430.6). Total num frames: 1553629184. Throughput: 0: 11218.5. Samples: 388491264. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:49:15,955][1648985] Avg episode reward: [(0, '157.560')] [2024-06-15 20:49:16,070][1652491] Updated weights for policy 0, policy_version 758613 (0.0034) [2024-06-15 20:49:17,641][1652491] Updated weights for policy 0, policy_version 758672 (0.0012) [2024-06-15 20:49:18,771][1652491] Updated weights for policy 0, policy_version 758720 (0.0013) [2024-06-15 20:49:20,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45329.2, 300 sec: 46874.9). Total num frames: 1553956864. Throughput: 0: 11514.4. Samples: 388558336. 
Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:49:20,956][1648985] Avg episode reward: [(0, '153.450')] [2024-06-15 20:49:21,912][1652491] Updated weights for policy 0, policy_version 758805 (0.0015) [2024-06-15 20:49:22,852][1652491] Updated weights for policy 0, policy_version 758847 (0.0012) [2024-06-15 20:49:25,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 43690.8, 300 sec: 46652.7). Total num frames: 1554120704. Throughput: 0: 11525.7. Samples: 388592640. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:49:25,956][1648985] Avg episode reward: [(0, '169.410')] [2024-06-15 20:49:27,486][1652491] Updated weights for policy 0, policy_version 758883 (0.0013) [2024-06-15 20:49:28,409][1652491] Updated weights for policy 0, policy_version 758917 (0.0027) [2024-06-15 20:49:30,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 1554382848. Throughput: 0: 11548.5. Samples: 388660736. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:49:30,956][1648985] Avg episode reward: [(0, '177.050')] [2024-06-15 20:49:30,988][1652491] Updated weights for policy 0, policy_version 758979 (0.0013) [2024-06-15 20:49:31,928][1652491] Updated weights for policy 0, policy_version 759028 (0.0012) [2024-06-15 20:49:33,203][1652491] Updated weights for policy 0, policy_version 759057 (0.0019) [2024-06-15 20:49:35,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 1554644992. Throughput: 0: 11559.8. Samples: 388736000. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:49:35,956][1648985] Avg episode reward: [(0, '170.810')] [2024-06-15 20:49:38,096][1651469] Signal inference workers to stop experience collection... 
(39450 times) [2024-06-15 20:49:38,145][1652491] Updated weights for policy 0, policy_version 759139 (0.0023) [2024-06-15 20:49:38,159][1652491] InferenceWorker_p0-w0: stopping experience collection (39450 times) [2024-06-15 20:49:38,354][1651469] Signal inference workers to resume experience collection... (39450 times) [2024-06-15 20:49:38,364][1652491] InferenceWorker_p0-w0: resuming experience collection (39450 times) [2024-06-15 20:49:40,276][1652491] Updated weights for policy 0, policy_version 759200 (0.0013) [2024-06-15 20:49:40,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 1554907136. Throughput: 0: 11525.8. Samples: 388772352. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:49:40,956][1648985] Avg episode reward: [(0, '153.290')] [2024-06-15 20:49:40,957][1652491] Updated weights for policy 0, policy_version 759232 (0.0028) [2024-06-15 20:49:42,527][1652491] Updated weights for policy 0, policy_version 759286 (0.0034) [2024-06-15 20:49:44,302][1652491] Updated weights for policy 0, policy_version 759316 (0.0024) [2024-06-15 20:49:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 1555169280. Throughput: 0: 11559.8. Samples: 388842496. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:49:45,956][1648985] Avg episode reward: [(0, '149.600')] [2024-06-15 20:49:48,404][1652491] Updated weights for policy 0, policy_version 759364 (0.0043) [2024-06-15 20:49:49,685][1652491] Updated weights for policy 0, policy_version 759420 (0.0117) [2024-06-15 20:49:50,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.4, 300 sec: 46208.8). Total num frames: 1555300352. Throughput: 0: 11662.2. Samples: 388913664. 
Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:49:50,956][1648985] Avg episode reward: [(0, '146.730')] [2024-06-15 20:49:52,389][1652491] Updated weights for policy 0, policy_version 759477 (0.0012) [2024-06-15 20:49:53,653][1652491] Updated weights for policy 0, policy_version 759523 (0.0011) [2024-06-15 20:49:55,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 46421.8, 300 sec: 46763.9). Total num frames: 1555595264. Throughput: 0: 11503.0. Samples: 388943872. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:49:55,955][1648985] Avg episode reward: [(0, '156.720')] [2024-06-15 20:49:56,303][1652491] Updated weights for policy 0, policy_version 759584 (0.0013) [2024-06-15 20:49:59,799][1652491] Updated weights for policy 0, policy_version 759618 (0.0013) [2024-06-15 20:50:00,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1555791872. Throughput: 0: 11707.7. Samples: 389018112. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:50:00,956][1648985] Avg episode reward: [(0, '158.710')] [2024-06-15 20:50:01,131][1652491] Updated weights for policy 0, policy_version 759672 (0.0092) [2024-06-15 20:50:03,539][1652491] Updated weights for policy 0, policy_version 759712 (0.0017) [2024-06-15 20:50:04,923][1652491] Updated weights for policy 0, policy_version 759767 (0.0011) [2024-06-15 20:50:05,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 1556086784. Throughput: 0: 11616.7. Samples: 389081088. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:50:05,955][1648985] Avg episode reward: [(0, '140.220')] [2024-06-15 20:50:07,315][1652491] Updated weights for policy 0, policy_version 759824 (0.0013) [2024-06-15 20:50:10,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45329.1, 300 sec: 46653.5). Total num frames: 1556250624. Throughput: 0: 11662.3. Samples: 389117440. 
Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:50:10,956][1648985] Avg episode reward: [(0, '132.680')] [2024-06-15 20:50:10,976][1652491] Updated weights for policy 0, policy_version 759889 (0.0014) [2024-06-15 20:50:11,853][1652491] Updated weights for policy 0, policy_version 759936 (0.0012) [2024-06-15 20:50:15,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 46967.4, 300 sec: 46541.7). Total num frames: 1556447232. Throughput: 0: 11878.4. Samples: 389195264. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:50:15,956][1648985] Avg episode reward: [(0, '129.380')] [2024-06-15 20:50:16,339][1652491] Updated weights for policy 0, policy_version 760016 (0.0157) [2024-06-15 20:50:18,830][1652491] Updated weights for policy 0, policy_version 760080 (0.0024) [2024-06-15 20:50:19,865][1652491] Updated weights for policy 0, policy_version 760126 (0.0013) [2024-06-15 20:50:20,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 1556742144. Throughput: 0: 11605.4. Samples: 389258240. Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:50:20,955][1648985] Avg episode reward: [(0, '150.090')] [2024-06-15 20:50:21,778][1651469] Signal inference workers to stop experience collection... (39500 times) [2024-06-15 20:50:21,829][1652491] InferenceWorker_p0-w0: stopping experience collection (39500 times) [2024-06-15 20:50:22,073][1651469] Signal inference workers to resume experience collection... (39500 times) [2024-06-15 20:50:22,075][1652491] InferenceWorker_p0-w0: resuming experience collection (39500 times) [2024-06-15 20:50:22,494][1652491] Updated weights for policy 0, policy_version 760182 (0.0115) [2024-06-15 20:50:25,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 1556938752. Throughput: 0: 11673.6. Samples: 389297664. 
Policy #0 lag: (min: 15.0, avg: 126.1, max: 271.0) [2024-06-15 20:50:25,956][1648985] Avg episode reward: [(0, '174.160')] [2024-06-15 20:50:26,241][1652491] Updated weights for policy 0, policy_version 760240 (0.0014) [2024-06-15 20:50:27,521][1652491] Updated weights for policy 0, policy_version 760301 (0.0015) [2024-06-15 20:50:29,593][1652491] Updated weights for policy 0, policy_version 760337 (0.0013) [2024-06-15 20:50:30,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 1557266432. Throughput: 0: 11787.4. Samples: 389372928. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:50:30,955][1648985] Avg episode reward: [(0, '180.750')] [2024-06-15 20:50:31,909][1652491] Updated weights for policy 0, policy_version 760385 (0.0014) [2024-06-15 20:50:35,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.3, 300 sec: 46208.5). Total num frames: 1557397504. Throughput: 0: 11855.6. Samples: 389447168. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:50:35,956][1648985] Avg episode reward: [(0, '164.920')] [2024-06-15 20:50:36,527][1652491] Updated weights for policy 0, policy_version 760464 (0.0012) [2024-06-15 20:50:37,832][1652491] Updated weights for policy 0, policy_version 760528 (0.0012) [2024-06-15 20:50:38,778][1652491] Updated weights for policy 0, policy_version 760571 (0.0014) [2024-06-15 20:50:40,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 46967.4, 300 sec: 46763.8). Total num frames: 1557725184. Throughput: 0: 11878.4. Samples: 389478400. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:50:40,956][1648985] Avg episode reward: [(0, '157.270')] [2024-06-15 20:50:41,589][1652491] Updated weights for policy 0, policy_version 760633 (0.0013) [2024-06-15 20:50:44,004][1652491] Updated weights for policy 0, policy_version 760692 (0.0013) [2024-06-15 20:50:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.3, 300 sec: 46319.5). Total num frames: 1557921792. 
Throughput: 0: 11832.9. Samples: 389550592. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:50:45,956][1648985] Avg episode reward: [(0, '146.260')] [2024-06-15 20:50:47,528][1652491] Updated weights for policy 0, policy_version 760727 (0.0032) [2024-06-15 20:50:48,911][1652491] Updated weights for policy 0, policy_version 760800 (0.0097) [2024-06-15 20:50:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1558183936. Throughput: 0: 12060.4. Samples: 389623808. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:50:50,956][1648985] Avg episode reward: [(0, '143.980')] [2024-06-15 20:50:52,387][1652491] Updated weights for policy 0, policy_version 760885 (0.0017) [2024-06-15 20:50:54,297][1652491] Updated weights for policy 0, policy_version 760928 (0.0017) [2024-06-15 20:50:55,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 47513.3, 300 sec: 46541.7). Total num frames: 1558446080. Throughput: 0: 12049.0. Samples: 389659648. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:50:55,956][1648985] Avg episode reward: [(0, '141.750')] [2024-06-15 20:50:55,964][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000760960_1558446080.pth... [2024-06-15 20:50:56,023][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000755520_1547304960.pth [2024-06-15 20:50:56,030][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000760960_1558446080.pth [2024-06-15 20:50:57,016][1652491] Updated weights for policy 0, policy_version 760962 (0.0063) [2024-06-15 20:50:58,286][1652491] Updated weights for policy 0, policy_version 761021 (0.0013) [2024-06-15 20:50:59,652][1652491] Updated weights for policy 0, policy_version 761085 (0.0014) [2024-06-15 20:51:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48605.8, 300 sec: 46763.8). Total num frames: 1558708224. Throughput: 0: 12049.0. 
Samples: 389737472. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:51:00,956][1648985] Avg episode reward: [(0, '174.700')] [2024-06-15 20:51:02,435][1651469] Signal inference workers to stop experience collection... (39550 times) [2024-06-15 20:51:02,475][1652491] InferenceWorker_p0-w0: stopping experience collection (39550 times) [2024-06-15 20:51:02,666][1651469] Signal inference workers to resume experience collection... (39550 times) [2024-06-15 20:51:02,667][1652491] InferenceWorker_p0-w0: resuming experience collection (39550 times) [2024-06-15 20:51:02,670][1652491] Updated weights for policy 0, policy_version 761136 (0.0013) [2024-06-15 20:51:04,184][1652491] Updated weights for policy 0, policy_version 761189 (0.0021) [2024-06-15 20:51:05,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 48059.6, 300 sec: 46763.8). Total num frames: 1558970368. Throughput: 0: 12242.5. Samples: 389809152. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:51:05,956][1648985] Avg episode reward: [(0, '171.420')] [2024-06-15 20:51:07,640][1652491] Updated weights for policy 0, policy_version 761219 (0.0013) [2024-06-15 20:51:08,977][1652491] Updated weights for policy 0, policy_version 761276 (0.0011) [2024-06-15 20:51:10,519][1652491] Updated weights for policy 0, policy_version 761317 (0.0012) [2024-06-15 20:51:10,956][1648985] Fps is (10 sec: 52425.5, 60 sec: 49697.5, 300 sec: 47097.0). Total num frames: 1559232512. Throughput: 0: 12276.4. Samples: 389850112. 
Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:51:10,957][1648985] Avg episode reward: [(0, '167.880')] [2024-06-15 20:51:12,633][1652491] Updated weights for policy 0, policy_version 761376 (0.0012) [2024-06-15 20:51:14,312][1652491] Updated weights for policy 0, policy_version 761425 (0.0041) [2024-06-15 20:51:15,274][1652491] Updated weights for policy 0, policy_version 761472 (0.0014) [2024-06-15 20:51:15,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 50790.4, 300 sec: 47097.1). Total num frames: 1559494656. Throughput: 0: 12117.3. Samples: 389918208. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:51:15,956][1648985] Avg episode reward: [(0, '148.440')] [2024-06-15 20:51:20,955][1648985] Fps is (10 sec: 39324.4, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1559625728. Throughput: 0: 12151.5. Samples: 389993984. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:51:20,956][1648985] Avg episode reward: [(0, '150.530')] [2024-06-15 20:51:21,277][1652491] Updated weights for policy 0, policy_version 761552 (0.0012) [2024-06-15 20:51:23,111][1652491] Updated weights for policy 0, policy_version 761603 (0.0013) [2024-06-15 20:51:24,375][1652491] Updated weights for policy 0, policy_version 761663 (0.0023) [2024-06-15 20:51:25,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 50244.1, 300 sec: 46874.9). Total num frames: 1559953408. Throughput: 0: 12196.9. Samples: 390027264. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:51:25,956][1648985] Avg episode reward: [(0, '137.830')] [2024-06-15 20:51:30,632][1652491] Updated weights for policy 0, policy_version 761760 (0.0015) [2024-06-15 20:51:30,961][1648985] Fps is (10 sec: 49123.4, 60 sec: 47508.9, 300 sec: 46540.8). Total num frames: 1560117248. Throughput: 0: 12195.4. Samples: 390099456. 
Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:51:30,961][1648985] Avg episode reward: [(0, '159.720')] [2024-06-15 20:51:33,952][1652491] Updated weights for policy 0, policy_version 761840 (0.0072) [2024-06-15 20:51:35,852][1652491] Updated weights for policy 0, policy_version 761920 (0.0014) [2024-06-15 20:51:35,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 50244.2, 300 sec: 46652.7). Total num frames: 1560412160. Throughput: 0: 11867.0. Samples: 390157824. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:51:35,956][1648985] Avg episode reward: [(0, '174.590')] [2024-06-15 20:51:40,955][1648985] Fps is (10 sec: 42622.9, 60 sec: 46967.5, 300 sec: 46430.6). Total num frames: 1560543232. Throughput: 0: 11764.7. Samples: 390189056. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:51:40,956][1648985] Avg episode reward: [(0, '197.540')] [2024-06-15 20:51:41,658][1652491] Updated weights for policy 0, policy_version 762000 (0.0014) [2024-06-15 20:51:42,572][1652491] Updated weights for policy 0, policy_version 762046 (0.0013) [2024-06-15 20:51:45,562][1652491] Updated weights for policy 0, policy_version 762096 (0.0029) [2024-06-15 20:51:45,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 1560772608. Throughput: 0: 11855.7. Samples: 390270976. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:51:45,956][1648985] Avg episode reward: [(0, '185.490')] [2024-06-15 20:51:46,820][1651469] Signal inference workers to stop experience collection... (39600 times) [2024-06-15 20:51:46,867][1652491] InferenceWorker_p0-w0: stopping experience collection (39600 times) [2024-06-15 20:51:47,063][1651469] Signal inference workers to resume experience collection... 
(39600 times) [2024-06-15 20:51:47,064][1652491] InferenceWorker_p0-w0: resuming experience collection (39600 times) [2024-06-15 20:51:47,066][1652491] Updated weights for policy 0, policy_version 762160 (0.0013) [2024-06-15 20:51:49,224][1652491] Updated weights for policy 0, policy_version 762224 (0.0016) [2024-06-15 20:51:50,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1561067520. Throughput: 0: 11753.2. Samples: 390338048. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:51:50,956][1648985] Avg episode reward: [(0, '195.960')] [2024-06-15 20:51:52,611][1652491] Updated weights for policy 0, policy_version 762288 (0.0022) [2024-06-15 20:51:55,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46421.5, 300 sec: 46541.7). Total num frames: 1561231360. Throughput: 0: 11662.4. Samples: 390374912. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:51:55,956][1648985] Avg episode reward: [(0, '186.740')] [2024-06-15 20:51:56,068][1652491] Updated weights for policy 0, policy_version 762336 (0.0011) [2024-06-15 20:51:57,322][1652491] Updated weights for policy 0, policy_version 762384 (0.0012) [2024-06-15 20:51:59,609][1652491] Updated weights for policy 0, policy_version 762464 (0.0016) [2024-06-15 20:52:00,360][1652491] Updated weights for policy 0, policy_version 762496 (0.0012) [2024-06-15 20:52:00,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1561591808. Throughput: 0: 11650.9. Samples: 390442496. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:52:00,956][1648985] Avg episode reward: [(0, '174.060')] [2024-06-15 20:52:03,762][1652491] Updated weights for policy 0, policy_version 762544 (0.0011) [2024-06-15 20:52:05,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1561722880. Throughput: 0: 11810.1. Samples: 390525440. 
Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:52:05,956][1648985] Avg episode reward: [(0, '187.060')] [2024-06-15 20:52:07,464][1652491] Updated weights for policy 0, policy_version 762595 (0.0016) [2024-06-15 20:52:08,956][1652491] Updated weights for policy 0, policy_version 762656 (0.0012) [2024-06-15 20:52:10,855][1652491] Updated weights for policy 0, policy_version 762720 (0.0012) [2024-06-15 20:52:10,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.9, 300 sec: 46874.9). Total num frames: 1562050560. Throughput: 0: 11787.4. Samples: 390557696. Policy #0 lag: (min: 53.0, avg: 179.9, max: 309.0) [2024-06-15 20:52:10,956][1648985] Avg episode reward: [(0, '190.070')] [2024-06-15 20:52:14,713][1652491] Updated weights for policy 0, policy_version 762788 (0.0013) [2024-06-15 20:52:15,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1562247168. Throughput: 0: 11504.4. Samples: 390617088. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:52:15,956][1648985] Avg episode reward: [(0, '191.040')] [2024-06-15 20:52:18,405][1652491] Updated weights for policy 0, policy_version 762832 (0.0016) [2024-06-15 20:52:19,625][1652491] Updated weights for policy 0, policy_version 762881 (0.0011) [2024-06-15 20:52:20,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 1562476544. Throughput: 0: 11889.8. Samples: 390692864. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:52:20,956][1648985] Avg episode reward: [(0, '157.050')] [2024-06-15 20:52:21,108][1652491] Updated weights for policy 0, policy_version 762938 (0.0011) [2024-06-15 20:52:22,438][1652491] Updated weights for policy 0, policy_version 762978 (0.0012) [2024-06-15 20:52:25,201][1652491] Updated weights for policy 0, policy_version 763040 (0.0015) [2024-06-15 20:52:25,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46967.5, 300 sec: 47097.0). Total num frames: 1562771456. 
Throughput: 0: 11935.3. Samples: 390726144. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:52:25,956][1648985] Avg episode reward: [(0, '156.350')] [2024-06-15 20:52:30,046][1652491] Updated weights for policy 0, policy_version 763089 (0.0021) [2024-06-15 20:52:30,732][1651469] Signal inference workers to stop experience collection... (39650 times) [2024-06-15 20:52:30,776][1652491] InferenceWorker_p0-w0: stopping experience collection (39650 times) [2024-06-15 20:52:30,950][1651469] Signal inference workers to resume experience collection... (39650 times) [2024-06-15 20:52:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46425.8, 300 sec: 46986.0). Total num frames: 1562902528. Throughput: 0: 11935.3. Samples: 390808064. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:52:30,956][1648985] Avg episode reward: [(0, '154.250')] [2024-06-15 20:52:30,960][1652491] InferenceWorker_p0-w0: resuming experience collection (39650 times) [2024-06-15 20:52:32,007][1652491] Updated weights for policy 0, policy_version 763184 (0.0014) [2024-06-15 20:52:33,998][1652491] Updated weights for policy 0, policy_version 763248 (0.0016) [2024-06-15 20:52:35,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1563230208. Throughput: 0: 11650.9. Samples: 390862336. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:52:35,956][1648985] Avg episode reward: [(0, '151.700')] [2024-06-15 20:52:36,247][1652491] Updated weights for policy 0, policy_version 763312 (0.0020) [2024-06-15 20:52:40,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 1563328512. Throughput: 0: 11753.2. Samples: 390903808. 
Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:52:40,956][1648985] Avg episode reward: [(0, '163.160')] [2024-06-15 20:52:41,416][1652491] Updated weights for policy 0, policy_version 763361 (0.0014) [2024-06-15 20:52:43,004][1652491] Updated weights for policy 0, policy_version 763425 (0.0012) [2024-06-15 20:52:43,485][1652491] Updated weights for policy 0, policy_version 763454 (0.0012) [2024-06-15 20:52:45,181][1652491] Updated weights for policy 0, policy_version 763520 (0.0015) [2024-06-15 20:52:45,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48605.8, 300 sec: 46874.9). Total num frames: 1563688960. Throughput: 0: 11787.3. Samples: 390972928. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:52:45,956][1648985] Avg episode reward: [(0, '169.430')] [2024-06-15 20:52:50,955][1648985] Fps is (10 sec: 49153.2, 60 sec: 45875.3, 300 sec: 47097.3). Total num frames: 1563820032. Throughput: 0: 11685.0. Samples: 391051264. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:52:50,956][1648985] Avg episode reward: [(0, '172.330')] [2024-06-15 20:52:51,451][1652491] Updated weights for policy 0, policy_version 763592 (0.0015) [2024-06-15 20:52:52,880][1652491] Updated weights for policy 0, policy_version 763648 (0.0019) [2024-06-15 20:52:55,085][1652491] Updated weights for policy 0, policy_version 763713 (0.0012) [2024-06-15 20:52:55,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 48605.7, 300 sec: 46874.9). Total num frames: 1564147712. Throughput: 0: 11639.4. Samples: 391081472. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:52:55,956][1648985] Avg episode reward: [(0, '163.680')] [2024-06-15 20:52:56,452][1652491] Updated weights for policy 0, policy_version 763771 (0.0013) [2024-06-15 20:52:56,555][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000763776_1564213248.pth... 
[2024-06-15 20:52:56,606][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000758208_1552809984.pth [2024-06-15 20:52:58,829][1652491] Updated weights for policy 0, policy_version 763834 (0.0013) [2024-06-15 20:53:00,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 45875.1, 300 sec: 47097.0). Total num frames: 1564344320. Throughput: 0: 11741.9. Samples: 391145472. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:53:00,956][1648985] Avg episode reward: [(0, '173.280')] [2024-06-15 20:53:03,986][1652491] Updated weights for policy 0, policy_version 763895 (0.0012) [2024-06-15 20:53:05,448][1652491] Updated weights for policy 0, policy_version 763958 (0.0033) [2024-06-15 20:53:05,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1564606464. Throughput: 0: 11730.5. Samples: 391220736. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:53:05,956][1648985] Avg episode reward: [(0, '164.160')] [2024-06-15 20:53:07,726][1652491] Updated weights for policy 0, policy_version 764031 (0.0016) [2024-06-15 20:53:09,908][1652491] Updated weights for policy 0, policy_version 764092 (0.0012) [2024-06-15 20:53:10,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 47097.0). Total num frames: 1564868608. Throughput: 0: 11707.8. Samples: 391252992. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:53:10,956][1648985] Avg episode reward: [(0, '154.240')] [2024-06-15 20:53:14,407][1651469] Signal inference workers to stop experience collection... (39700 times) [2024-06-15 20:53:14,513][1652491] InferenceWorker_p0-w0: stopping experience collection (39700 times) [2024-06-15 20:53:14,678][1651469] Signal inference workers to resume experience collection... 
(39700 times) [2024-06-15 20:53:14,679][1652491] InferenceWorker_p0-w0: resuming experience collection (39700 times) [2024-06-15 20:53:14,875][1652491] Updated weights for policy 0, policy_version 764134 (0.0025) [2024-06-15 20:53:15,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 1565032448. Throughput: 0: 11662.2. Samples: 391332864. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:53:15,956][1648985] Avg episode reward: [(0, '160.700')] [2024-06-15 20:53:16,296][1652491] Updated weights for policy 0, policy_version 764193 (0.0013) [2024-06-15 20:53:17,710][1652491] Updated weights for policy 0, policy_version 764241 (0.0015) [2024-06-15 20:53:20,500][1652491] Updated weights for policy 0, policy_version 764336 (0.0014) [2024-06-15 20:53:20,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 48606.0, 300 sec: 47097.1). Total num frames: 1565392896. Throughput: 0: 11810.2. Samples: 391393792. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:53:20,955][1648985] Avg episode reward: [(0, '170.550')] [2024-06-15 20:53:25,700][1652491] Updated weights for policy 0, policy_version 764385 (0.0013) [2024-06-15 20:53:25,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 45329.2, 300 sec: 46986.0). Total num frames: 1565491200. Throughput: 0: 11924.0. Samples: 391440384. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:53:25,956][1648985] Avg episode reward: [(0, '178.550')] [2024-06-15 20:53:26,759][1652491] Updated weights for policy 0, policy_version 764432 (0.0013) [2024-06-15 20:53:28,362][1652491] Updated weights for policy 0, policy_version 764498 (0.0082) [2024-06-15 20:53:30,338][1652491] Updated weights for policy 0, policy_version 764545 (0.0013) [2024-06-15 20:53:30,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48605.9, 300 sec: 47208.2). Total num frames: 1565818880. Throughput: 0: 11878.4. Samples: 391507456. 
Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:53:30,955][1648985] Avg episode reward: [(0, '184.790')] [2024-06-15 20:53:31,664][1652491] Updated weights for policy 0, policy_version 764601 (0.0015) [2024-06-15 20:53:35,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 44782.8, 300 sec: 47097.0). Total num frames: 1565917184. Throughput: 0: 11878.4. Samples: 391585792. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:53:35,956][1648985] Avg episode reward: [(0, '187.840')] [2024-06-15 20:53:37,407][1652491] Updated weights for policy 0, policy_version 764666 (0.0161) [2024-06-15 20:53:39,312][1652491] Updated weights for policy 0, policy_version 764720 (0.0028) [2024-06-15 20:53:40,855][1652491] Updated weights for policy 0, policy_version 764795 (0.0034) [2024-06-15 20:53:40,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 49698.3, 300 sec: 47541.4). Total num frames: 1566310400. Throughput: 0: 11832.9. Samples: 391613952. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:53:40,956][1648985] Avg episode reward: [(0, '175.290')] [2024-06-15 20:53:43,390][1652491] Updated weights for policy 0, policy_version 764859 (0.0034) [2024-06-15 20:53:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1566441472. Throughput: 0: 11821.5. Samples: 391677440. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:53:45,956][1648985] Avg episode reward: [(0, '168.790')] [2024-06-15 20:53:48,309][1652491] Updated weights for policy 0, policy_version 764899 (0.0014) [2024-06-15 20:53:49,804][1652491] Updated weights for policy 0, policy_version 764934 (0.0041) [2024-06-15 20:53:50,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1566703616. Throughput: 0: 11730.5. Samples: 391748608. 
Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:53:50,956][1648985] Avg episode reward: [(0, '159.700')] [2024-06-15 20:53:51,568][1652491] Updated weights for policy 0, policy_version 765024 (0.0090) [2024-06-15 20:53:54,548][1651469] Signal inference workers to stop experience collection... (39750 times) [2024-06-15 20:53:54,590][1652491] InferenceWorker_p0-w0: stopping experience collection (39750 times) [2024-06-15 20:53:54,592][1652491] Updated weights for policy 0, policy_version 765090 (0.0013) [2024-06-15 20:53:54,874][1651469] Signal inference workers to resume experience collection... (39750 times) [2024-06-15 20:53:54,874][1652491] InferenceWorker_p0-w0: resuming experience collection (39750 times) [2024-06-15 20:53:55,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 46967.6, 300 sec: 47097.1). Total num frames: 1566965760. Throughput: 0: 11810.1. Samples: 391784448. Policy #0 lag: (min: 35.0, avg: 164.7, max: 291.0) [2024-06-15 20:53:55,956][1648985] Avg episode reward: [(0, '138.870')] [2024-06-15 20:53:58,817][1652491] Updated weights for policy 0, policy_version 765136 (0.0044) [2024-06-15 20:54:00,028][1652491] Updated weights for policy 0, policy_version 765181 (0.0012) [2024-06-15 20:54:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1567096832. Throughput: 0: 11571.2. Samples: 391853568. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:54:00,956][1648985] Avg episode reward: [(0, '137.370')] [2024-06-15 20:54:02,665][1652491] Updated weights for policy 0, policy_version 765237 (0.0012) [2024-06-15 20:54:03,832][1652491] Updated weights for policy 0, policy_version 765296 (0.0012) [2024-06-15 20:54:05,303][1652491] Updated weights for policy 0, policy_version 765346 (0.0013) [2024-06-15 20:54:05,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 1567490048. Throughput: 0: 11776.0. Samples: 391923712. 
Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:54:05,955][1648985] Avg episode reward: [(0, '155.190')] [2024-06-15 20:54:09,230][1652491] Updated weights for policy 0, policy_version 765377 (0.0011) [2024-06-15 20:54:10,680][1652491] Updated weights for policy 0, policy_version 765433 (0.0016) [2024-06-15 20:54:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 47430.3). Total num frames: 1567621120. Throughput: 0: 11730.5. Samples: 391968256. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:54:10,956][1648985] Avg episode reward: [(0, '197.700')] [2024-06-15 20:54:12,389][1652491] Updated weights for policy 0, policy_version 765462 (0.0012) [2024-06-15 20:54:14,122][1652491] Updated weights for policy 0, policy_version 765537 (0.0094) [2024-06-15 20:54:15,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 49152.0, 300 sec: 47541.3). Total num frames: 1567981568. Throughput: 0: 11707.7. Samples: 392034304. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:54:15,956][1648985] Avg episode reward: [(0, '202.990')] [2024-06-15 20:54:16,082][1652491] Updated weights for policy 0, policy_version 765630 (0.0025) [2024-06-15 20:54:20,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44236.7, 300 sec: 47208.1). Total num frames: 1568047104. Throughput: 0: 11673.6. Samples: 392111104. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:54:20,956][1648985] Avg episode reward: [(0, '189.660')] [2024-06-15 20:54:21,741][1652491] Updated weights for policy 0, policy_version 765689 (0.0013) [2024-06-15 20:54:23,578][1652491] Updated weights for policy 0, policy_version 765728 (0.0014) [2024-06-15 20:54:25,032][1652491] Updated weights for policy 0, policy_version 765792 (0.0038) [2024-06-15 20:54:25,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 1568407552. Throughput: 0: 11832.9. Samples: 392146432. 
Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:54:25,956][1648985] Avg episode reward: [(0, '204.980')] [2024-06-15 20:54:26,433][1652491] Updated weights for policy 0, policy_version 765858 (0.0013) [2024-06-15 20:54:30,571][1652491] Updated weights for policy 0, policy_version 765893 (0.0013) [2024-06-15 20:54:30,958][1648985] Fps is (10 sec: 52412.5, 60 sec: 45872.7, 300 sec: 47207.6). Total num frames: 1568571392. Throughput: 0: 12207.5. Samples: 392226816. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:54:30,959][1648985] Avg episode reward: [(0, '186.450')] [2024-06-15 20:54:31,910][1652491] Updated weights for policy 0, policy_version 765950 (0.0012) [2024-06-15 20:54:35,121][1652491] Updated weights for policy 0, policy_version 766016 (0.0113) [2024-06-15 20:54:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48606.0, 300 sec: 47208.1). Total num frames: 1568833536. Throughput: 0: 11946.7. Samples: 392286208. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:54:35,956][1648985] Avg episode reward: [(0, '166.940')] [2024-06-15 20:54:36,086][1651469] Signal inference workers to stop experience collection... (39800 times) [2024-06-15 20:54:36,117][1652491] InferenceWorker_p0-w0: stopping experience collection (39800 times) [2024-06-15 20:54:36,313][1651469] Signal inference workers to resume experience collection... (39800 times) [2024-06-15 20:54:36,313][1652491] InferenceWorker_p0-w0: resuming experience collection (39800 times) [2024-06-15 20:54:36,830][1652491] Updated weights for policy 0, policy_version 766085 (0.0031) [2024-06-15 20:54:40,955][1648985] Fps is (10 sec: 49167.3, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1569062912. Throughput: 0: 11855.6. Samples: 392317952. 
Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:54:40,956][1648985] Avg episode reward: [(0, '162.300')] [2024-06-15 20:54:42,319][1652491] Updated weights for policy 0, policy_version 766146 (0.0012) [2024-06-15 20:54:43,471][1652491] Updated weights for policy 0, policy_version 766201 (0.0013) [2024-06-15 20:54:45,955][1648985] Fps is (10 sec: 39320.7, 60 sec: 46421.2, 300 sec: 47208.1). Total num frames: 1569226752. Throughput: 0: 12049.0. Samples: 392395776. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:54:45,956][1648985] Avg episode reward: [(0, '172.790')] [2024-06-15 20:54:46,490][1652491] Updated weights for policy 0, policy_version 766256 (0.0169) [2024-06-15 20:54:48,831][1652491] Updated weights for policy 0, policy_version 766342 (0.0016) [2024-06-15 20:54:49,937][1652491] Updated weights for policy 0, policy_version 766395 (0.0012) [2024-06-15 20:54:50,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48059.8, 300 sec: 47430.3). Total num frames: 1569587200. Throughput: 0: 11639.5. Samples: 392447488. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:54:50,955][1648985] Avg episode reward: [(0, '171.830')] [2024-06-15 20:54:55,856][1652491] Updated weights for policy 0, policy_version 766458 (0.0014) [2024-06-15 20:54:55,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 1569718272. Throughput: 0: 11593.9. Samples: 392489984. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:54:55,956][1648985] Avg episode reward: [(0, '173.610')] [2024-06-15 20:54:55,976][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000766464_1569718272.pth... 
[2024-06-15 20:54:56,021][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000760960_1558446080.pth [2024-06-15 20:54:58,073][1652491] Updated weights for policy 0, policy_version 766526 (0.0013) [2024-06-15 20:55:00,119][1652491] Updated weights for policy 0, policy_version 766595 (0.0011) [2024-06-15 20:55:00,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 49152.0, 300 sec: 47319.2). Total num frames: 1570045952. Throughput: 0: 11559.9. Samples: 392554496. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:55:00,956][1648985] Avg episode reward: [(0, '181.260')] [2024-06-15 20:55:01,425][1652491] Updated weights for policy 0, policy_version 766649 (0.0015) [2024-06-15 20:55:05,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 43690.6, 300 sec: 46986.0). Total num frames: 1570111488. Throughput: 0: 11480.2. Samples: 392627712. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:55:05,956][1648985] Avg episode reward: [(0, '163.720')] [2024-06-15 20:55:07,312][1652491] Updated weights for policy 0, policy_version 766704 (0.0012) [2024-06-15 20:55:08,995][1652491] Updated weights for policy 0, policy_version 766737 (0.0012) [2024-06-15 20:55:10,005][1652491] Updated weights for policy 0, policy_version 766781 (0.0012) [2024-06-15 20:55:10,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 46967.3, 300 sec: 47430.3). Total num frames: 1570439168. Throughput: 0: 11457.4. Samples: 392662016. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:55:10,956][1648985] Avg episode reward: [(0, '173.350')] [2024-06-15 20:55:11,452][1652491] Updated weights for policy 0, policy_version 766836 (0.0011) [2024-06-15 20:55:13,034][1652491] Updated weights for policy 0, policy_version 766904 (0.0014) [2024-06-15 20:55:15,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 44236.8, 300 sec: 47097.0). Total num frames: 1570635776. Throughput: 0: 11025.8. Samples: 392722944. 
Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:55:15,956][1648985] Avg episode reward: [(0, '146.420')] [2024-06-15 20:55:18,842][1652491] Updated weights for policy 0, policy_version 766975 (0.0014) [2024-06-15 20:55:20,955][1648985] Fps is (10 sec: 36045.4, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1570799616. Throughput: 0: 11366.4. Samples: 392797696. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:55:20,956][1648985] Avg episode reward: [(0, '160.700')] [2024-06-15 20:55:21,055][1651469] Signal inference workers to stop experience collection... (39850 times) [2024-06-15 20:55:21,114][1652491] InferenceWorker_p0-w0: stopping experience collection (39850 times) [2024-06-15 20:55:21,228][1651469] Signal inference workers to resume experience collection... (39850 times) [2024-06-15 20:55:21,229][1652491] InferenceWorker_p0-w0: resuming experience collection (39850 times) [2024-06-15 20:55:21,620][1652491] Updated weights for policy 0, policy_version 767024 (0.0013) [2024-06-15 20:55:23,276][1652491] Updated weights for policy 0, policy_version 767088 (0.0012) [2024-06-15 20:55:24,911][1652491] Updated weights for policy 0, policy_version 767168 (0.0029) [2024-06-15 20:55:25,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45875.2, 300 sec: 47097.0). Total num frames: 1571160064. Throughput: 0: 11195.7. Samples: 392821760. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:55:25,956][1648985] Avg episode reward: [(0, '159.940')] [2024-06-15 20:55:30,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 45331.5, 300 sec: 47097.1). Total num frames: 1571291136. Throughput: 0: 11173.0. Samples: 392898560. 
Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:55:30,955][1648985] Avg episode reward: [(0, '169.470')] [2024-06-15 20:55:32,104][1652491] Updated weights for policy 0, policy_version 767248 (0.0038) [2024-06-15 20:55:34,239][1652491] Updated weights for policy 0, policy_version 767328 (0.0013) [2024-06-15 20:55:35,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 1571618816. Throughput: 0: 11252.6. Samples: 392953856. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:55:35,956][1648985] Avg episode reward: [(0, '172.290')] [2024-06-15 20:55:36,046][1652491] Updated weights for policy 0, policy_version 767408 (0.0013) [2024-06-15 20:55:40,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 46763.8). Total num frames: 1571717120. Throughput: 0: 11184.4. Samples: 392993280. Policy #0 lag: (min: 47.0, avg: 131.0, max: 303.0) [2024-06-15 20:55:40,956][1648985] Avg episode reward: [(0, '138.310')] [2024-06-15 20:55:41,438][1652491] Updated weights for policy 0, policy_version 767461 (0.0021) [2024-06-15 20:55:44,379][1652491] Updated weights for policy 0, policy_version 767521 (0.0014) [2024-06-15 20:55:45,690][1652491] Updated weights for policy 0, policy_version 767584 (0.0013) [2024-06-15 20:55:45,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1572012032. Throughput: 0: 11434.6. Samples: 393069056. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:55:45,956][1648985] Avg episode reward: [(0, '133.800')] [2024-06-15 20:55:47,082][1652491] Updated weights for policy 0, policy_version 767648 (0.0013) [2024-06-15 20:55:50,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 43690.6, 300 sec: 46652.8). Total num frames: 1572208640. Throughput: 0: 11446.0. Samples: 393142784. 
Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:55:50,956][1648985] Avg episode reward: [(0, '142.380')] [2024-06-15 20:55:52,197][1652491] Updated weights for policy 0, policy_version 767715 (0.0015) [2024-06-15 20:55:55,111][1652491] Updated weights for policy 0, policy_version 767777 (0.0015) [2024-06-15 20:55:55,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1572503552. Throughput: 0: 11503.0. Samples: 393179648. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:55:55,956][1648985] Avg episode reward: [(0, '145.110')] [2024-06-15 20:55:56,204][1652491] Updated weights for policy 0, policy_version 767838 (0.0012) [2024-06-15 20:55:57,483][1651469] Signal inference workers to stop experience collection... (39900 times) [2024-06-15 20:55:57,520][1652491] InferenceWorker_p0-w0: stopping experience collection (39900 times) [2024-06-15 20:55:57,524][1652491] Updated weights for policy 0, policy_version 767891 (0.0013) [2024-06-15 20:55:57,795][1651469] Signal inference workers to resume experience collection... (39900 times) [2024-06-15 20:55:57,796][1652491] InferenceWorker_p0-w0: resuming experience collection (39900 times) [2024-06-15 20:56:00,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 44782.9, 300 sec: 46652.7). Total num frames: 1572732928. Throughput: 0: 11537.1. Samples: 393242112. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:56:00,956][1648985] Avg episode reward: [(0, '155.910')] [2024-06-15 20:56:03,092][1652491] Updated weights for policy 0, policy_version 767968 (0.0015) [2024-06-15 20:56:04,741][1652491] Updated weights for policy 0, policy_version 768004 (0.0011) [2024-06-15 20:56:05,919][1652491] Updated weights for policy 0, policy_version 768057 (0.0045) [2024-06-15 20:56:05,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 47513.6, 300 sec: 46541.8). Total num frames: 1572962304. Throughput: 0: 11662.2. Samples: 393322496. 
Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:56:05,955][1648985] Avg episode reward: [(0, '156.810')] [2024-06-15 20:56:07,248][1652491] Updated weights for policy 0, policy_version 768118 (0.0014) [2024-06-15 20:56:09,105][1652491] Updated weights for policy 0, policy_version 768184 (0.0013) [2024-06-15 20:56:10,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 46967.6, 300 sec: 46652.7). Total num frames: 1573257216. Throughput: 0: 11776.0. Samples: 393351680. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:56:10,956][1648985] Avg episode reward: [(0, '163.400')] [2024-06-15 20:56:14,295][1652491] Updated weights for policy 0, policy_version 768227 (0.0012) [2024-06-15 20:56:15,696][1652491] Updated weights for policy 0, policy_version 768275 (0.0013) [2024-06-15 20:56:15,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 46967.7, 300 sec: 46874.9). Total num frames: 1573453824. Throughput: 0: 11844.3. Samples: 393431552. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:56:15,956][1648985] Avg episode reward: [(0, '169.820')] [2024-06-15 20:56:17,404][1652491] Updated weights for policy 0, policy_version 768324 (0.0013) [2024-06-15 20:56:18,239][1652491] Updated weights for policy 0, policy_version 768376 (0.0127) [2024-06-15 20:56:19,227][1652491] Updated weights for policy 0, policy_version 768416 (0.0012) [2024-06-15 20:56:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 49698.1, 300 sec: 46874.9). Total num frames: 1573781504. Throughput: 0: 12105.9. Samples: 393498624. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:56:20,956][1648985] Avg episode reward: [(0, '173.130')] [2024-06-15 20:56:24,722][1652491] Updated weights for policy 0, policy_version 768471 (0.0013) [2024-06-15 20:56:25,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45875.2, 300 sec: 46764.7). Total num frames: 1573912576. Throughput: 0: 12299.4. Samples: 393546752. 
Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:56:25,956][1648985] Avg episode reward: [(0, '152.660')] [2024-06-15 20:56:26,608][1652491] Updated weights for policy 0, policy_version 768546 (0.0112) [2024-06-15 20:56:28,747][1652491] Updated weights for policy 0, policy_version 768608 (0.0013) [2024-06-15 20:56:29,291][1652491] Updated weights for policy 0, policy_version 768640 (0.0012) [2024-06-15 20:56:30,739][1652491] Updated weights for policy 0, policy_version 768704 (0.0014) [2024-06-15 20:56:30,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 50244.2, 300 sec: 47097.1). Total num frames: 1574305792. Throughput: 0: 12026.4. Samples: 393610240. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:56:30,956][1648985] Avg episode reward: [(0, '133.990')] [2024-06-15 20:56:35,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 1574371328. Throughput: 0: 12162.9. Samples: 393690112. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:56:35,955][1648985] Avg episode reward: [(0, '150.390')] [2024-06-15 20:56:36,564][1652491] Updated weights for policy 0, policy_version 768768 (0.0022) [2024-06-15 20:56:37,985][1652491] Updated weights for policy 0, policy_version 768824 (0.0013) [2024-06-15 20:56:40,223][1652491] Updated weights for policy 0, policy_version 768867 (0.0013) [2024-06-15 20:56:40,495][1651469] Signal inference workers to stop experience collection... (39950 times) [2024-06-15 20:56:40,530][1652491] InferenceWorker_p0-w0: stopping experience collection (39950 times) [2024-06-15 20:56:40,750][1651469] Signal inference workers to resume experience collection... (39950 times) [2024-06-15 20:56:40,751][1652491] InferenceWorker_p0-w0: resuming experience collection (39950 times) [2024-06-15 20:56:40,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 49698.2, 300 sec: 47208.1). Total num frames: 1574699008. Throughput: 0: 12037.7. Samples: 393721344. 
Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:56:40,956][1648985] Avg episode reward: [(0, '168.930')] [2024-06-15 20:56:41,840][1652491] Updated weights for policy 0, policy_version 768953 (0.0014) [2024-06-15 20:56:45,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 46967.6, 300 sec: 46652.8). Total num frames: 1574830080. Throughput: 0: 12231.1. Samples: 393792512. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:56:45,956][1648985] Avg episode reward: [(0, '172.360')] [2024-06-15 20:56:47,080][1652491] Updated weights for policy 0, policy_version 768992 (0.0035) [2024-06-15 20:56:48,305][1652491] Updated weights for policy 0, policy_version 769046 (0.0012) [2024-06-15 20:56:50,702][1652491] Updated weights for policy 0, policy_version 769104 (0.0012) [2024-06-15 20:56:50,961][1648985] Fps is (10 sec: 42575.3, 60 sec: 48601.5, 300 sec: 47096.2). Total num frames: 1575124992. Throughput: 0: 12093.1. Samples: 393866752. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:56:50,962][1648985] Avg episode reward: [(0, '166.780')] [2024-06-15 20:56:51,932][1652491] Updated weights for policy 0, policy_version 769155 (0.0120) [2024-06-15 20:56:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.7, 300 sec: 46652.7). Total num frames: 1575354368. Throughput: 0: 12117.3. Samples: 393896960. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:56:55,955][1648985] Avg episode reward: [(0, '163.810')] [2024-06-15 20:56:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000769216_1575354368.pth... 
[2024-06-15 20:56:56,062][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000763776_1564213248.pth [2024-06-15 20:56:57,774][1652491] Updated weights for policy 0, policy_version 769219 (0.0083) [2024-06-15 20:56:59,599][1652491] Updated weights for policy 0, policy_version 769297 (0.0011) [2024-06-15 20:57:00,955][1648985] Fps is (10 sec: 49178.6, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1575616512. Throughput: 0: 11889.8. Samples: 393966592. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:57:00,956][1648985] Avg episode reward: [(0, '159.100')] [2024-06-15 20:57:02,096][1652491] Updated weights for policy 0, policy_version 769360 (0.0013) [2024-06-15 20:57:03,595][1652491] Updated weights for policy 0, policy_version 769424 (0.0015) [2024-06-15 20:57:04,489][1652491] Updated weights for policy 0, policy_version 769468 (0.0012) [2024-06-15 20:57:05,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48605.9, 300 sec: 46874.9). Total num frames: 1575878656. Throughput: 0: 11924.0. Samples: 394035200. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:57:05,956][1648985] Avg episode reward: [(0, '152.280')] [2024-06-15 20:57:10,714][1652491] Updated weights for policy 0, policy_version 769521 (0.0012) [2024-06-15 20:57:10,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1575976960. Throughput: 0: 11696.3. Samples: 394073088. 
Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:57:10,956][1648985] Avg episode reward: [(0, '154.820')] [2024-06-15 20:57:11,990][1652491] Updated weights for policy 0, policy_version 769570 (0.0024) [2024-06-15 20:57:12,804][1652491] Updated weights for policy 0, policy_version 769603 (0.0023) [2024-06-15 20:57:14,054][1652491] Updated weights for policy 0, policy_version 769654 (0.0013) [2024-06-15 20:57:15,432][1652491] Updated weights for policy 0, policy_version 769726 (0.0097) [2024-06-15 20:57:15,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 49151.9, 300 sec: 47208.1). Total num frames: 1576402944. Throughput: 0: 11628.1. Samples: 394133504. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:57:15,956][1648985] Avg episode reward: [(0, '145.060')] [2024-06-15 20:57:20,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1576402944. Throughput: 0: 11639.4. Samples: 394213888. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:57:20,956][1648985] Avg episode reward: [(0, '144.990')] [2024-06-15 20:57:22,108][1651469] Signal inference workers to stop experience collection... (40000 times) [2024-06-15 20:57:22,171][1652491] InferenceWorker_p0-w0: stopping experience collection (40000 times) [2024-06-15 20:57:22,434][1651469] Signal inference workers to resume experience collection... (40000 times) [2024-06-15 20:57:22,435][1652491] InferenceWorker_p0-w0: resuming experience collection (40000 times) [2024-06-15 20:57:22,641][1652491] Updated weights for policy 0, policy_version 769795 (0.0013) [2024-06-15 20:57:24,034][1652491] Updated weights for policy 0, policy_version 769846 (0.0013) [2024-06-15 20:57:25,512][1652491] Updated weights for policy 0, policy_version 769904 (0.0102) [2024-06-15 20:57:25,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1576796160. Throughput: 0: 11491.5. Samples: 394238464. 
Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:57:25,956][1648985] Avg episode reward: [(0, '147.680')] [2024-06-15 20:57:27,000][1652491] Updated weights for policy 0, policy_version 769976 (0.0014) [2024-06-15 20:57:30,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 43690.6, 300 sec: 46430.6). Total num frames: 1576927232. Throughput: 0: 11446.0. Samples: 394307584. Policy #0 lag: (min: 41.0, avg: 115.6, max: 297.0) [2024-06-15 20:57:30,956][1648985] Avg episode reward: [(0, '153.500')] [2024-06-15 20:57:33,295][1652491] Updated weights for policy 0, policy_version 770041 (0.0015) [2024-06-15 20:57:35,872][1652491] Updated weights for policy 0, policy_version 770115 (0.0014) [2024-06-15 20:57:35,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 1577189376. Throughput: 0: 11333.6. Samples: 394376704. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:57:35,956][1648985] Avg episode reward: [(0, '149.370')] [2024-06-15 20:57:37,664][1652491] Updated weights for policy 0, policy_version 770193 (0.0012) [2024-06-15 20:57:38,537][1652491] Updated weights for policy 0, policy_version 770240 (0.0013) [2024-06-15 20:57:40,958][1648985] Fps is (10 sec: 52411.8, 60 sec: 45872.7, 300 sec: 46652.2). Total num frames: 1577451520. Throughput: 0: 11297.3. Samples: 394405376. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:57:40,960][1648985] Avg episode reward: [(0, '137.140')] [2024-06-15 20:57:44,591][1652491] Updated weights for policy 0, policy_version 770301 (0.0012) [2024-06-15 20:57:45,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1577648128. Throughput: 0: 11468.8. Samples: 394482688. 
Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:57:45,956][1648985] Avg episode reward: [(0, '168.390')] [2024-06-15 20:57:46,323][1652491] Updated weights for policy 0, policy_version 770363 (0.0014) [2024-06-15 20:57:48,481][1652491] Updated weights for policy 0, policy_version 770448 (0.0014) [2024-06-15 20:57:50,955][1648985] Fps is (10 sec: 52447.1, 60 sec: 47518.1, 300 sec: 46875.0). Total num frames: 1577975808. Throughput: 0: 11446.1. Samples: 394550272. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:57:50,955][1648985] Avg episode reward: [(0, '182.790')] [2024-06-15 20:57:54,236][1652491] Updated weights for policy 0, policy_version 770497 (0.0012) [2024-06-15 20:57:55,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1578106880. Throughput: 0: 11673.6. Samples: 394598400. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:57:55,956][1648985] Avg episode reward: [(0, '186.080')] [2024-06-15 20:57:56,484][1652491] Updated weights for policy 0, policy_version 770576 (0.0013) [2024-06-15 20:57:58,565][1652491] Updated weights for policy 0, policy_version 770656 (0.0014) [2024-06-15 20:58:00,127][1651469] Signal inference workers to stop experience collection... (40050 times) [2024-06-15 20:58:00,181][1652491] InferenceWorker_p0-w0: stopping experience collection (40050 times) [2024-06-15 20:58:00,183][1652491] Updated weights for policy 0, policy_version 770723 (0.0097) [2024-06-15 20:58:00,450][1651469] Signal inference workers to resume experience collection... (40050 times) [2024-06-15 20:58:00,453][1652491] InferenceWorker_p0-w0: resuming experience collection (40050 times) [2024-06-15 20:58:00,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1578500096. Throughput: 0: 11650.9. Samples: 394657792. 
Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:58:00,956][1648985] Avg episode reward: [(0, '161.550')] [2024-06-15 20:58:05,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 1578565632. Throughput: 0: 11605.4. Samples: 394736128. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:58:05,956][1648985] Avg episode reward: [(0, '149.400')] [2024-06-15 20:58:06,228][1652491] Updated weights for policy 0, policy_version 770809 (0.0016) [2024-06-15 20:58:07,901][1652491] Updated weights for policy 0, policy_version 770852 (0.0013) [2024-06-15 20:58:09,172][1652491] Updated weights for policy 0, policy_version 770898 (0.0011) [2024-06-15 20:58:10,902][1652491] Updated weights for policy 0, policy_version 770978 (0.0011) [2024-06-15 20:58:10,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 49698.2, 300 sec: 47208.2). Total num frames: 1578958848. Throughput: 0: 11889.8. Samples: 394773504. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:58:10,956][1648985] Avg episode reward: [(0, '161.640')] [2024-06-15 20:58:15,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1579024384. Throughput: 0: 11935.3. Samples: 394844672. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:58:15,956][1648985] Avg episode reward: [(0, '168.360')] [2024-06-15 20:58:16,589][1652491] Updated weights for policy 0, policy_version 771040 (0.0012) [2024-06-15 20:58:17,358][1652491] Updated weights for policy 0, policy_version 771069 (0.0023) [2024-06-15 20:58:19,225][1652491] Updated weights for policy 0, policy_version 771136 (0.0027) [2024-06-15 20:58:20,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 49698.2, 300 sec: 47097.0). Total num frames: 1579384832. Throughput: 0: 11935.2. Samples: 394913792. 
Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:58:20,956][1648985] Avg episode reward: [(0, '173.670')] [2024-06-15 20:58:21,268][1652491] Updated weights for policy 0, policy_version 771200 (0.0013) [2024-06-15 20:58:22,647][1652491] Updated weights for policy 0, policy_version 771264 (0.0030) [2024-06-15 20:58:25,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 45875.3, 300 sec: 46541.7). Total num frames: 1579548672. Throughput: 0: 12038.6. Samples: 394947072. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:58:25,955][1648985] Avg episode reward: [(0, '183.640')] [2024-06-15 20:58:28,754][1652491] Updated weights for policy 0, policy_version 771329 (0.0070) [2024-06-15 20:58:30,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1579810816. Throughput: 0: 11958.0. Samples: 395020800. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:58:30,956][1648985] Avg episode reward: [(0, '174.280')] [2024-06-15 20:58:31,401][1652491] Updated weights for policy 0, policy_version 771410 (0.0013) [2024-06-15 20:58:32,900][1652491] Updated weights for policy 0, policy_version 771480 (0.0013) [2024-06-15 20:58:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 1580072960. Throughput: 0: 12049.0. Samples: 395092480. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:58:35,956][1648985] Avg episode reward: [(0, '167.240')] [2024-06-15 20:58:38,925][1652491] Updated weights for policy 0, policy_version 771536 (0.0013) [2024-06-15 20:58:40,323][1652491] Updated weights for policy 0, policy_version 771588 (0.0025) [2024-06-15 20:58:40,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46970.1, 300 sec: 46874.9). Total num frames: 1580269568. Throughput: 0: 11901.2. Samples: 395133952. 
Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:58:40,955][1648985] Avg episode reward: [(0, '141.010')] [2024-06-15 20:58:41,541][1652491] Updated weights for policy 0, policy_version 771646 (0.0015) [2024-06-15 20:58:42,857][1651469] Signal inference workers to stop experience collection... (40100 times) [2024-06-15 20:58:42,902][1652491] InferenceWorker_p0-w0: stopping experience collection (40100 times) [2024-06-15 20:58:43,091][1651469] Signal inference workers to resume experience collection... (40100 times) [2024-06-15 20:58:43,092][1652491] InferenceWorker_p0-w0: resuming experience collection (40100 times) [2024-06-15 20:58:44,152][1652491] Updated weights for policy 0, policy_version 771744 (0.0016) [2024-06-15 20:58:45,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 49152.1, 300 sec: 47097.1). Total num frames: 1580597248. Throughput: 0: 11730.5. Samples: 395185664. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:58:45,955][1648985] Avg episode reward: [(0, '152.440')] [2024-06-15 20:58:50,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44782.8, 300 sec: 46430.6). Total num frames: 1580662784. Throughput: 0: 11855.7. Samples: 395269632. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:58:50,955][1648985] Avg episode reward: [(0, '157.130')] [2024-06-15 20:58:51,062][1652491] Updated weights for policy 0, policy_version 771810 (0.0015) [2024-06-15 20:58:52,736][1652491] Updated weights for policy 0, policy_version 771877 (0.0107) [2024-06-15 20:58:53,577][1652491] Updated weights for policy 0, policy_version 771920 (0.0013) [2024-06-15 20:58:55,269][1652491] Updated weights for policy 0, policy_version 771986 (0.0013) [2024-06-15 20:58:55,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 49698.1, 300 sec: 47430.3). Total num frames: 1581088768. Throughput: 0: 11707.7. Samples: 395300352. 
Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:58:55,956][1648985] Avg episode reward: [(0, '171.670')] [2024-06-15 20:58:56,043][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000772032_1581121536.pth... [2024-06-15 20:58:56,099][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000766464_1569718272.pth [2024-06-15 20:59:00,966][1648985] Fps is (10 sec: 45824.1, 60 sec: 43682.5, 300 sec: 46206.7). Total num frames: 1581121536. Throughput: 0: 11704.9. Samples: 395371520. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:59:00,967][1648985] Avg episode reward: [(0, '161.510')] [2024-06-15 20:59:02,407][1652491] Updated weights for policy 0, policy_version 772064 (0.0013) [2024-06-15 20:59:04,340][1652491] Updated weights for policy 0, policy_version 772144 (0.0013) [2024-06-15 20:59:05,799][1652491] Updated weights for policy 0, policy_version 772214 (0.0019) [2024-06-15 20:59:05,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 48605.8, 300 sec: 46986.0). Total num frames: 1581481984. Throughput: 0: 11605.3. Samples: 395436032. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:59:05,956][1648985] Avg episode reward: [(0, '170.190')] [2024-06-15 20:59:06,457][1652491] Updated weights for policy 0, policy_version 772240 (0.0013) [2024-06-15 20:59:07,344][1652491] Updated weights for policy 0, policy_version 772286 (0.0020) [2024-06-15 20:59:10,955][1648985] Fps is (10 sec: 52487.9, 60 sec: 44783.0, 300 sec: 46319.6). Total num frames: 1581645824. Throughput: 0: 11730.5. Samples: 395474944. 
Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:59:10,955][1648985] Avg episode reward: [(0, '151.710')] [2024-06-15 20:59:13,533][1652491] Updated weights for policy 0, policy_version 772342 (0.0012) [2024-06-15 20:59:14,858][1652491] Updated weights for policy 0, policy_version 772400 (0.0012) [2024-06-15 20:59:15,993][1648985] Fps is (10 sec: 45876.5, 60 sec: 48606.1, 300 sec: 47097.1). Total num frames: 1581940736. Throughput: 0: 11741.9. Samples: 395549184. Policy #0 lag: (min: 15.0, avg: 77.4, max: 271.0) [2024-06-15 20:59:15,993][1648985] Avg episode reward: [(0, '161.610')] [2024-06-15 20:59:16,236][1652491] Updated weights for policy 0, policy_version 772456 (0.0013) [2024-06-15 20:59:17,539][1652491] Updated weights for policy 0, policy_version 772512 (0.0056) [2024-06-15 20:59:20,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46421.4, 300 sec: 46652.7). Total num frames: 1582170112. Throughput: 0: 11685.0. Samples: 395618304. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 20:59:20,956][1648985] Avg episode reward: [(0, '157.160')] [2024-06-15 20:59:24,289][1651469] Signal inference workers to stop experience collection... (40150 times) [2024-06-15 20:59:24,377][1652491] InferenceWorker_p0-w0: stopping experience collection (40150 times) [2024-06-15 20:59:24,558][1651469] Signal inference workers to resume experience collection... (40150 times) [2024-06-15 20:59:24,559][1652491] InferenceWorker_p0-w0: resuming experience collection (40150 times) [2024-06-15 20:59:24,734][1652491] Updated weights for policy 0, policy_version 772577 (0.0102) [2024-06-15 20:59:25,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 46421.3, 300 sec: 46653.2). Total num frames: 1582333952. Throughput: 0: 11753.2. Samples: 395662848. 
Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 20:59:25,956][1648985] Avg episode reward: [(0, '154.570')] [2024-06-15 20:59:26,833][1652491] Updated weights for policy 0, policy_version 772659 (0.0277) [2024-06-15 20:59:28,484][1652491] Updated weights for policy 0, policy_version 772729 (0.0013) [2024-06-15 20:59:29,469][1652491] Updated weights for policy 0, policy_version 772774 (0.0012) [2024-06-15 20:59:29,889][1652491] Updated weights for policy 0, policy_version 772800 (0.0013) [2024-06-15 20:59:30,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 1582694400. Throughput: 0: 11741.9. Samples: 395714048. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 20:59:30,955][1648985] Avg episode reward: [(0, '156.510')] [2024-06-15 20:59:35,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 1582727168. Throughput: 0: 11662.2. Samples: 395794432. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 20:59:35,956][1648985] Avg episode reward: [(0, '153.500')] [2024-06-15 20:59:37,654][1652491] Updated weights for policy 0, policy_version 772896 (0.0011) [2024-06-15 20:59:39,059][1652491] Updated weights for policy 0, policy_version 772948 (0.0020) [2024-06-15 20:59:40,587][1652491] Updated weights for policy 0, policy_version 773013 (0.0082) [2024-06-15 20:59:40,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 48059.6, 300 sec: 47208.2). Total num frames: 1583153152. Throughput: 0: 11502.9. Samples: 395817984. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 20:59:40,956][1648985] Avg episode reward: [(0, '161.360')] [2024-06-15 20:59:45,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1583218688. Throughput: 0: 11596.8. Samples: 395893248. 
Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 20:59:45,956][1648985] Avg episode reward: [(0, '170.300')] [2024-06-15 20:59:47,578][1652491] Updated weights for policy 0, policy_version 773060 (0.0052) [2024-06-15 20:59:49,584][1652491] Updated weights for policy 0, policy_version 773136 (0.0109) [2024-06-15 20:59:50,955][1648985] Fps is (10 sec: 36045.7, 60 sec: 47513.7, 300 sec: 46763.9). Total num frames: 1583513600. Throughput: 0: 11616.8. Samples: 395958784. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 20:59:50,955][1648985] Avg episode reward: [(0, '152.260')] [2024-06-15 20:59:51,021][1652491] Updated weights for policy 0, policy_version 773200 (0.0081) [2024-06-15 20:59:52,591][1652491] Updated weights for policy 0, policy_version 773267 (0.0133) [2024-06-15 20:59:55,955][1648985] Fps is (10 sec: 52426.8, 60 sec: 44236.6, 300 sec: 46430.5). Total num frames: 1583742976. Throughput: 0: 11389.0. Samples: 395987456. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 20:59:55,958][1648985] Avg episode reward: [(0, '153.290')] [2024-06-15 20:59:58,351][1652491] Updated weights for policy 0, policy_version 773314 (0.0015) [2024-06-15 21:00:00,385][1652491] Updated weights for policy 0, policy_version 773393 (0.0038) [2024-06-15 21:00:00,738][1651469] Signal inference workers to stop experience collection... (40200 times) [2024-06-15 21:00:00,817][1652491] InferenceWorker_p0-w0: stopping experience collection (40200 times) [2024-06-15 21:00:00,920][1651469] Signal inference workers to resume experience collection... (40200 times) [2024-06-15 21:00:00,921][1652491] InferenceWorker_p0-w0: resuming experience collection (40200 times) [2024-06-15 21:00:00,962][1648985] Fps is (10 sec: 45842.1, 60 sec: 47516.8, 300 sec: 46984.8). Total num frames: 1583972352. Throughput: 0: 11444.2. Samples: 396064256. 
Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:00:00,963][1648985] Avg episode reward: [(0, '147.460')] [2024-06-15 21:00:01,255][1652491] Updated weights for policy 0, policy_version 773440 (0.0011) [2024-06-15 21:00:02,432][1652491] Updated weights for policy 0, policy_version 773501 (0.0013) [2024-06-15 21:00:03,516][1652491] Updated weights for policy 0, policy_version 773539 (0.0044) [2024-06-15 21:00:05,955][1648985] Fps is (10 sec: 52431.1, 60 sec: 46421.5, 300 sec: 46874.9). Total num frames: 1584267264. Throughput: 0: 11503.0. Samples: 396135936. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:00:05,956][1648985] Avg episode reward: [(0, '151.250')] [2024-06-15 21:00:10,549][1652491] Updated weights for policy 0, policy_version 773616 (0.0013) [2024-06-15 21:00:10,955][1648985] Fps is (10 sec: 39349.3, 60 sec: 45328.9, 300 sec: 46541.7). Total num frames: 1584365568. Throughput: 0: 11434.7. Samples: 396177408. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:00:10,956][1648985] Avg episode reward: [(0, '162.590')] [2024-06-15 21:00:12,643][1652491] Updated weights for policy 0, policy_version 773696 (0.0022) [2024-06-15 21:00:15,005][1652491] Updated weights for policy 0, policy_version 773777 (0.0013) [2024-06-15 21:00:15,955][1648985] Fps is (10 sec: 52426.5, 60 sec: 47513.2, 300 sec: 47430.2). Total num frames: 1584791552. Throughput: 0: 11514.2. Samples: 396232192. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:00:15,956][1648985] Avg episode reward: [(0, '159.840')] [2024-06-15 21:00:20,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1584791552. Throughput: 0: 11400.5. Samples: 396307456. 
Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:00:20,956][1648985] Avg episode reward: [(0, '165.940')] [2024-06-15 21:00:21,640][1652491] Updated weights for policy 0, policy_version 773841 (0.0016) [2024-06-15 21:00:23,923][1652491] Updated weights for policy 0, policy_version 773924 (0.0012) [2024-06-15 21:00:24,881][1652491] Updated weights for policy 0, policy_version 773969 (0.0015) [2024-06-15 21:00:25,955][1648985] Fps is (10 sec: 39322.8, 60 sec: 47513.6, 300 sec: 47097.0). Total num frames: 1585184768. Throughput: 0: 11389.2. Samples: 396330496. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:00:25,956][1648985] Avg episode reward: [(0, '160.490')] [2024-06-15 21:00:26,639][1652491] Updated weights for policy 0, policy_version 774020 (0.0013) [2024-06-15 21:00:27,619][1652491] Updated weights for policy 0, policy_version 774078 (0.0016) [2024-06-15 21:00:30,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 43690.5, 300 sec: 46430.6). Total num frames: 1585315840. Throughput: 0: 11582.5. Samples: 396414464. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:00:30,956][1648985] Avg episode reward: [(0, '177.630')] [2024-06-15 21:00:32,362][1652491] Updated weights for policy 0, policy_version 774136 (0.0016) [2024-06-15 21:00:33,454][1652491] Updated weights for policy 0, policy_version 774166 (0.0012) [2024-06-15 21:00:35,341][1652491] Updated weights for policy 0, policy_version 774209 (0.0125) [2024-06-15 21:00:35,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48605.8, 300 sec: 47208.1). Total num frames: 1585643520. Throughput: 0: 11593.9. Samples: 396480512. 
Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:00:35,956][1648985] Avg episode reward: [(0, '168.840')] [2024-06-15 21:00:37,223][1652491] Updated weights for policy 0, policy_version 774278 (0.0012) [2024-06-15 21:00:38,335][1652491] Updated weights for policy 0, policy_version 774336 (0.0017) [2024-06-15 21:00:40,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 44783.1, 300 sec: 46874.9). Total num frames: 1585840128. Throughput: 0: 11685.1. Samples: 396513280. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:00:40,956][1648985] Avg episode reward: [(0, '175.440')] [2024-06-15 21:00:42,763][1651469] Signal inference workers to stop experience collection... (40250 times) [2024-06-15 21:00:42,810][1652491] InferenceWorker_p0-w0: stopping experience collection (40250 times) [2024-06-15 21:00:43,034][1651469] Signal inference workers to resume experience collection... (40250 times) [2024-06-15 21:00:43,035][1652491] InferenceWorker_p0-w0: resuming experience collection (40250 times) [2024-06-15 21:00:43,335][1652491] Updated weights for policy 0, policy_version 774397 (0.0012) [2024-06-15 21:00:45,219][1652491] Updated weights for policy 0, policy_version 774453 (0.0011) [2024-06-15 21:00:45,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1586102272. Throughput: 0: 11652.6. Samples: 396588544. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:00:45,956][1648985] Avg episode reward: [(0, '154.310')] [2024-06-15 21:00:47,083][1652491] Updated weights for policy 0, policy_version 774497 (0.0032) [2024-06-15 21:00:48,742][1652491] Updated weights for policy 0, policy_version 774560 (0.0012) [2024-06-15 21:00:50,957][1648985] Fps is (10 sec: 52419.2, 60 sec: 47512.1, 300 sec: 46985.7). Total num frames: 1586364416. Throughput: 0: 11616.2. Samples: 396658688. 
Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:00:50,957][1648985] Avg episode reward: [(0, '162.130')] [2024-06-15 21:00:53,606][1652491] Updated weights for policy 0, policy_version 774609 (0.0013) [2024-06-15 21:00:54,396][1652491] Updated weights for policy 0, policy_version 774656 (0.0013) [2024-06-15 21:00:55,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 47513.7, 300 sec: 46986.0). Total num frames: 1586593792. Throughput: 0: 11639.4. Samples: 396701184. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:00:55,956][1648985] Avg episode reward: [(0, '155.790')] [2024-06-15 21:00:56,064][1652491] Updated weights for policy 0, policy_version 774712 (0.0013) [2024-06-15 21:00:56,212][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000774720_1586626560.pth... [2024-06-15 21:00:56,256][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000769216_1575354368.pth [2024-06-15 21:00:57,470][1652491] Updated weights for policy 0, policy_version 774754 (0.0012) [2024-06-15 21:00:59,297][1652491] Updated weights for policy 0, policy_version 774816 (0.0021) [2024-06-15 21:01:00,955][1648985] Fps is (10 sec: 52438.2, 60 sec: 48611.7, 300 sec: 47208.1). Total num frames: 1586888704. Throughput: 0: 11935.4. Samples: 396769280. Policy #0 lag: (min: 55.0, avg: 212.3, max: 305.0) [2024-06-15 21:01:00,955][1648985] Avg episode reward: [(0, '174.950')] [2024-06-15 21:01:03,593][1652491] Updated weights for policy 0, policy_version 774849 (0.0012) [2024-06-15 21:01:05,116][1652491] Updated weights for policy 0, policy_version 774912 (0.0021) [2024-06-15 21:01:05,957][1648985] Fps is (10 sec: 42590.6, 60 sec: 45873.5, 300 sec: 46652.4). Total num frames: 1587019776. Throughput: 0: 11946.1. Samples: 396845056. 
Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:01:05,958][1648985] Avg episode reward: [(0, '169.860')] [2024-06-15 21:01:07,116][1652491] Updated weights for policy 0, policy_version 774970 (0.0102) [2024-06-15 21:01:08,714][1652491] Updated weights for policy 0, policy_version 775015 (0.0014) [2024-06-15 21:01:10,766][1652491] Updated weights for policy 0, policy_version 775072 (0.0013) [2024-06-15 21:01:10,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 49698.2, 300 sec: 47097.0). Total num frames: 1587347456. Throughput: 0: 12117.3. Samples: 396875776. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:01:10,956][1648985] Avg episode reward: [(0, '175.860')] [2024-06-15 21:01:15,442][1652491] Updated weights for policy 0, policy_version 775120 (0.0013) [2024-06-15 21:01:15,955][1648985] Fps is (10 sec: 45885.0, 60 sec: 44783.2, 300 sec: 46430.6). Total num frames: 1587478528. Throughput: 0: 11935.3. Samples: 396951552. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:01:15,956][1648985] Avg episode reward: [(0, '160.870')] [2024-06-15 21:01:18,548][1652491] Updated weights for policy 0, policy_version 775187 (0.0024) [2024-06-15 21:01:19,973][1652491] Updated weights for policy 0, policy_version 775252 (0.0012) [2024-06-15 21:01:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 1587806208. Throughput: 0: 11776.0. Samples: 397010432. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:01:20,956][1648985] Avg episode reward: [(0, '162.710')] [2024-06-15 21:01:21,719][1652491] Updated weights for policy 0, policy_version 775312 (0.0015) [2024-06-15 21:01:22,698][1652491] Updated weights for policy 0, policy_version 775360 (0.0012) [2024-06-15 21:01:25,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 1587937280. Throughput: 0: 11844.2. Samples: 397046272. 
Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:01:25,956][1648985] Avg episode reward: [(0, '156.090')] [2024-06-15 21:01:28,429][1652491] Updated weights for policy 0, policy_version 775418 (0.0014) [2024-06-15 21:01:28,959][1651469] Signal inference workers to stop experience collection... (40300 times) [2024-06-15 21:01:29,001][1652491] InferenceWorker_p0-w0: stopping experience collection (40300 times) [2024-06-15 21:01:29,176][1651469] Signal inference workers to resume experience collection... (40300 times) [2024-06-15 21:01:29,178][1652491] InferenceWorker_p0-w0: resuming experience collection (40300 times) [2024-06-15 21:01:29,931][1652491] Updated weights for policy 0, policy_version 775472 (0.0012) [2024-06-15 21:01:30,958][1648985] Fps is (10 sec: 42584.7, 60 sec: 48603.4, 300 sec: 46985.4). Total num frames: 1588232192. Throughput: 0: 11877.6. Samples: 397123072. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:01:30,959][1648985] Avg episode reward: [(0, '157.800')] [2024-06-15 21:01:31,489][1652491] Updated weights for policy 0, policy_version 775541 (0.0071) [2024-06-15 21:01:33,067][1652491] Updated weights for policy 0, policy_version 775570 (0.0059) [2024-06-15 21:01:33,935][1652491] Updated weights for policy 0, policy_version 775616 (0.0013) [2024-06-15 21:01:35,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 1588461568. Throughput: 0: 11776.4. Samples: 397188608. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:01:35,956][1648985] Avg episode reward: [(0, '140.240')] [2024-06-15 21:01:40,293][1652491] Updated weights for policy 0, policy_version 775680 (0.0016) [2024-06-15 21:01:40,956][1648985] Fps is (10 sec: 39329.5, 60 sec: 46420.3, 300 sec: 46763.6). Total num frames: 1588625408. Throughput: 0: 11764.4. Samples: 397230592. 
Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:01:40,957][1648985] Avg episode reward: [(0, '155.830')] [2024-06-15 21:01:41,588][1652491] Updated weights for policy 0, policy_version 775732 (0.0013) [2024-06-15 21:01:42,857][1652491] Updated weights for policy 0, policy_version 775798 (0.0015) [2024-06-15 21:01:45,091][1652491] Updated weights for policy 0, policy_version 775856 (0.0102) [2024-06-15 21:01:45,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48059.9, 300 sec: 46986.9). Total num frames: 1588985856. Throughput: 0: 11559.8. Samples: 397289472. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:01:45,955][1648985] Avg episode reward: [(0, '168.460')] [2024-06-15 21:01:50,378][1652491] Updated weights for policy 0, policy_version 775890 (0.0015) [2024-06-15 21:01:50,955][1648985] Fps is (10 sec: 42603.6, 60 sec: 44784.3, 300 sec: 46430.6). Total num frames: 1589051392. Throughput: 0: 11583.1. Samples: 397366272. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:01:50,955][1648985] Avg episode reward: [(0, '167.900')] [2024-06-15 21:01:51,498][1652491] Updated weights for policy 0, policy_version 775936 (0.0015) [2024-06-15 21:01:53,325][1652491] Updated weights for policy 0, policy_version 776016 (0.0186) [2024-06-15 21:01:55,661][1652491] Updated weights for policy 0, policy_version 776096 (0.0013) [2024-06-15 21:01:55,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 47513.9, 300 sec: 46874.9). Total num frames: 1589444608. Throughput: 0: 11434.7. Samples: 397390336. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:01:55,955][1648985] Avg episode reward: [(0, '160.100')] [2024-06-15 21:01:56,401][1652491] Updated weights for policy 0, policy_version 776128 (0.0012) [2024-06-15 21:02:00,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 1589510144. Throughput: 0: 11548.4. Samples: 397471232. 
Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:02:00,956][1648985] Avg episode reward: [(0, '151.770')] [2024-06-15 21:02:02,498][1652491] Updated weights for policy 0, policy_version 776185 (0.0114) [2024-06-15 21:02:04,168][1652491] Updated weights for policy 0, policy_version 776256 (0.0012) [2024-06-15 21:02:05,690][1651469] Signal inference workers to stop experience collection... (40350 times) [2024-06-15 21:02:05,724][1652491] InferenceWorker_p0-w0: stopping experience collection (40350 times) [2024-06-15 21:02:05,882][1651469] Signal inference workers to resume experience collection... (40350 times) [2024-06-15 21:02:05,890][1652491] InferenceWorker_p0-w0: resuming experience collection (40350 times) [2024-06-15 21:02:05,893][1652491] Updated weights for policy 0, policy_version 776336 (0.0014) [2024-06-15 21:02:05,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 48607.5, 300 sec: 47319.2). Total num frames: 1589936128. Throughput: 0: 11719.1. Samples: 397537792. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:02:05,956][1648985] Avg episode reward: [(0, '157.510')] [2024-06-15 21:02:10,968][1648985] Fps is (10 sec: 52361.2, 60 sec: 44773.2, 300 sec: 46206.4). Total num frames: 1590034432. Throughput: 0: 11863.6. Samples: 397580288. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:02:10,969][1648985] Avg episode reward: [(0, '170.410')] [2024-06-15 21:02:12,252][1652491] Updated weights for policy 0, policy_version 776387 (0.0084) [2024-06-15 21:02:14,660][1652491] Updated weights for policy 0, policy_version 776496 (0.0013) [2024-06-15 21:02:15,837][1652491] Updated weights for policy 0, policy_version 776560 (0.0014) [2024-06-15 21:02:15,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48605.8, 300 sec: 47430.3). Total num frames: 1590394880. Throughput: 0: 11606.1. Samples: 397645312. 
Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:02:15,956][1648985] Avg episode reward: [(0, '167.190')] [2024-06-15 21:02:16,841][1652491] Updated weights for policy 0, policy_version 776608 (0.0013) [2024-06-15 21:02:17,458][1652491] Updated weights for policy 0, policy_version 776639 (0.0014) [2024-06-15 21:02:20,955][1648985] Fps is (10 sec: 52497.3, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1590558720. Throughput: 0: 11969.5. Samples: 397727232. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:02:20,955][1648985] Avg episode reward: [(0, '168.580')] [2024-06-15 21:02:24,094][1652491] Updated weights for policy 0, policy_version 776697 (0.0030) [2024-06-15 21:02:25,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 1590788096. Throughput: 0: 11821.8. Samples: 397762560. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:02:25,956][1648985] Avg episode reward: [(0, '173.340')] [2024-06-15 21:02:26,361][1652491] Updated weights for policy 0, policy_version 776784 (0.0208) [2024-06-15 21:02:27,581][1652491] Updated weights for policy 0, policy_version 776848 (0.0086) [2024-06-15 21:02:28,386][1652491] Updated weights for policy 0, policy_version 776895 (0.0017) [2024-06-15 21:02:30,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47516.2, 300 sec: 47097.1). Total num frames: 1591083008. Throughput: 0: 11810.1. Samples: 397820928. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:02:30,955][1648985] Avg episode reward: [(0, '186.130')] [2024-06-15 21:02:35,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45329.1, 300 sec: 46542.2). Total num frames: 1591181312. Throughput: 0: 11821.5. Samples: 397898240. 
Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:02:35,956][1648985] Avg episode reward: [(0, '181.950')] [2024-06-15 21:02:36,280][1652491] Updated weights for policy 0, policy_version 776960 (0.0013) [2024-06-15 21:02:37,848][1652491] Updated weights for policy 0, policy_version 777026 (0.0013) [2024-06-15 21:02:38,842][1652491] Updated weights for policy 0, policy_version 777079 (0.0016) [2024-06-15 21:02:39,966][1652491] Updated weights for policy 0, policy_version 777136 (0.0084) [2024-06-15 21:02:40,956][1648985] Fps is (10 sec: 52425.3, 60 sec: 49698.6, 300 sec: 47319.1). Total num frames: 1591607296. Throughput: 0: 11923.7. Samples: 397926912. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:02:40,956][1648985] Avg episode reward: [(0, '173.900')] [2024-06-15 21:02:45,965][1648985] Fps is (10 sec: 42555.2, 60 sec: 43683.2, 300 sec: 46206.8). Total num frames: 1591607296. Throughput: 0: 11932.6. Samples: 398008320. Policy #0 lag: (min: 11.0, avg: 94.7, max: 267.0) [2024-06-15 21:02:45,966][1648985] Avg episode reward: [(0, '193.290')] [2024-06-15 21:02:46,692][1651469] Signal inference workers to stop experience collection... (40400 times) [2024-06-15 21:02:46,730][1652491] InferenceWorker_p0-w0: stopping experience collection (40400 times) [2024-06-15 21:02:46,733][1652491] Updated weights for policy 0, policy_version 777187 (0.0011) [2024-06-15 21:02:46,948][1651469] Signal inference workers to resume experience collection... (40400 times) [2024-06-15 21:02:46,949][1652491] InferenceWorker_p0-w0: resuming experience collection (40400 times) [2024-06-15 21:02:48,496][1652491] Updated weights for policy 0, policy_version 777252 (0.0040) [2024-06-15 21:02:49,828][1652491] Updated weights for policy 0, policy_version 777317 (0.0045) [2024-06-15 21:02:50,955][1648985] Fps is (10 sec: 42600.4, 60 sec: 49698.0, 300 sec: 47208.1). Total num frames: 1592033280. Throughput: 0: 11730.4. Samples: 398065664. 
Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:02:50,956][1648985] Avg episode reward: [(0, '196.840')] [2024-06-15 21:02:51,339][1652491] Updated weights for policy 0, policy_version 777392 (0.0013) [2024-06-15 21:02:55,955][1648985] Fps is (10 sec: 52480.4, 60 sec: 44782.6, 300 sec: 46208.4). Total num frames: 1592131584. Throughput: 0: 11676.9. Samples: 398105600. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:02:55,956][1648985] Avg episode reward: [(0, '156.220')] [2024-06-15 21:02:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000777408_1592131584.pth... [2024-06-15 21:02:56,008][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000772032_1581121536.pth [2024-06-15 21:02:57,407][1652491] Updated weights for policy 0, policy_version 777426 (0.0020) [2024-06-15 21:02:59,755][1652491] Updated weights for policy 0, policy_version 777508 (0.0013) [2024-06-15 21:03:00,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 48606.0, 300 sec: 46986.0). Total num frames: 1592426496. Throughput: 0: 11787.4. Samples: 398175744. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:03:00,956][1648985] Avg episode reward: [(0, '175.300')] [2024-06-15 21:03:01,202][1652491] Updated weights for policy 0, policy_version 777571 (0.0012) [2024-06-15 21:03:02,623][1652491] Updated weights for policy 0, policy_version 777648 (0.0016) [2024-06-15 21:03:05,955][1648985] Fps is (10 sec: 52430.7, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 1592655872. Throughput: 0: 11537.1. Samples: 398246400. 
Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:03:05,956][1648985] Avg episode reward: [(0, '187.370')] [2024-06-15 21:03:08,839][1652491] Updated weights for policy 0, policy_version 777712 (0.0050) [2024-06-15 21:03:10,675][1652491] Updated weights for policy 0, policy_version 777744 (0.0012) [2024-06-15 21:03:10,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 46431.3, 300 sec: 46763.8). Total num frames: 1592819712. Throughput: 0: 11480.1. Samples: 398279168. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:03:10,956][1648985] Avg episode reward: [(0, '170.400')] [2024-06-15 21:03:12,051][1652491] Updated weights for policy 0, policy_version 777793 (0.0012) [2024-06-15 21:03:13,506][1652491] Updated weights for policy 0, policy_version 777858 (0.0013) [2024-06-15 21:03:15,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1593180160. Throughput: 0: 11525.7. Samples: 398339584. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:03:15,956][1648985] Avg episode reward: [(0, '156.840')] [2024-06-15 21:03:19,790][1652491] Updated weights for policy 0, policy_version 777925 (0.0013) [2024-06-15 21:03:20,899][1652491] Updated weights for policy 0, policy_version 777977 (0.0012) [2024-06-15 21:03:20,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45329.0, 300 sec: 46541.7). Total num frames: 1593278464. Throughput: 0: 11502.9. Samples: 398415872. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:03:20,956][1648985] Avg episode reward: [(0, '157.570')] [2024-06-15 21:03:22,426][1652491] Updated weights for policy 0, policy_version 778032 (0.0034) [2024-06-15 21:03:23,792][1651469] Signal inference workers to stop experience collection... (40450 times) [2024-06-15 21:03:23,844][1652491] InferenceWorker_p0-w0: stopping experience collection (40450 times) [2024-06-15 21:03:24,048][1651469] Signal inference workers to resume experience collection... 
(40450 times) [2024-06-15 21:03:24,049][1652491] InferenceWorker_p0-w0: resuming experience collection (40450 times) [2024-06-15 21:03:24,051][1652491] Updated weights for policy 0, policy_version 778096 (0.0012) [2024-06-15 21:03:25,253][1652491] Updated weights for policy 0, policy_version 778147 (0.0012) [2024-06-15 21:03:25,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48605.9, 300 sec: 47097.1). Total num frames: 1593704448. Throughput: 0: 11548.6. Samples: 398446592. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:03:25,956][1648985] Avg episode reward: [(0, '175.460')] [2024-06-15 21:03:30,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 1593704448. Throughput: 0: 11551.1. Samples: 398528000. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:03:30,955][1648985] Avg episode reward: [(0, '160.820')] [2024-06-15 21:03:31,785][1652491] Updated weights for policy 0, policy_version 778209 (0.0013) [2024-06-15 21:03:33,552][1652491] Updated weights for policy 0, policy_version 778302 (0.0132) [2024-06-15 21:03:35,424][1652491] Updated weights for policy 0, policy_version 778367 (0.0015) [2024-06-15 21:03:35,958][1648985] Fps is (10 sec: 42584.9, 60 sec: 49149.4, 300 sec: 46985.5). Total num frames: 1594130432. Throughput: 0: 11547.7. Samples: 398585344. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:03:35,959][1648985] Avg episode reward: [(0, '172.570')] [2024-06-15 21:03:37,116][1652491] Updated weights for policy 0, policy_version 778432 (0.0012) [2024-06-15 21:03:40,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 43691.1, 300 sec: 46208.4). Total num frames: 1594228736. Throughput: 0: 11468.9. Samples: 398621696. 
Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:03:40,956][1648985] Avg episode reward: [(0, '149.690')] [2024-06-15 21:03:44,007][1652491] Updated weights for policy 0, policy_version 778500 (0.0013) [2024-06-15 21:03:45,955][1648985] Fps is (10 sec: 36056.3, 60 sec: 48067.9, 300 sec: 46874.9). Total num frames: 1594490880. Throughput: 0: 11491.6. Samples: 398692864. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:03:45,955][1648985] Avg episode reward: [(0, '169.650')] [2024-06-15 21:03:46,157][1652491] Updated weights for policy 0, policy_version 778576 (0.0027) [2024-06-15 21:03:48,125][1652491] Updated weights for policy 0, policy_version 778665 (0.0133) [2024-06-15 21:03:48,585][1652491] Updated weights for policy 0, policy_version 778688 (0.0013) [2024-06-15 21:03:50,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45329.2, 300 sec: 46319.5). Total num frames: 1594753024. Throughput: 0: 11537.0. Samples: 398765568. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:03:50,956][1648985] Avg episode reward: [(0, '175.300')] [2024-06-15 21:03:54,499][1652491] Updated weights for policy 0, policy_version 778753 (0.0013) [2024-06-15 21:03:55,599][1652491] Updated weights for policy 0, policy_version 778810 (0.0013) [2024-06-15 21:03:55,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 48059.9, 300 sec: 47098.8). Total num frames: 1595015168. Throughput: 0: 11707.7. Samples: 398806016. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:03:55,956][1648985] Avg episode reward: [(0, '166.710')] [2024-06-15 21:03:58,402][1652491] Updated weights for policy 0, policy_version 778880 (0.0015) [2024-06-15 21:03:59,816][1652491] Updated weights for policy 0, policy_version 778942 (0.0094) [2024-06-15 21:04:00,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 46763.8). Total num frames: 1595277312. Throughput: 0: 11696.4. Samples: 398865920. 
Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:04:00,956][1648985] Avg episode reward: [(0, '171.490')] [2024-06-15 21:04:04,784][1652491] Updated weights for policy 0, policy_version 779005 (0.0046) [2024-06-15 21:04:05,415][1651469] Signal inference workers to stop experience collection... (40500 times) [2024-06-15 21:04:05,487][1652491] InferenceWorker_p0-w0: stopping experience collection (40500 times) [2024-06-15 21:04:05,680][1651469] Signal inference workers to resume experience collection... (40500 times) [2024-06-15 21:04:05,682][1652491] InferenceWorker_p0-w0: resuming experience collection (40500 times) [2024-06-15 21:04:05,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 1595441152. Throughput: 0: 11776.0. Samples: 398945792. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:04:05,956][1648985] Avg episode reward: [(0, '161.500')] [2024-06-15 21:04:06,624][1652491] Updated weights for policy 0, policy_version 779069 (0.0046) [2024-06-15 21:04:09,480][1652491] Updated weights for policy 0, policy_version 779120 (0.0012) [2024-06-15 21:04:10,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 49152.0, 300 sec: 46874.9). Total num frames: 1595768832. Throughput: 0: 11855.6. Samples: 398980096. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:04:10,956][1648985] Avg episode reward: [(0, '151.270')] [2024-06-15 21:04:11,038][1652491] Updated weights for policy 0, policy_version 779197 (0.0022) [2024-06-15 21:04:15,841][1652491] Updated weights for policy 0, policy_version 779259 (0.0152) [2024-06-15 21:04:15,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1595932672. Throughput: 0: 11730.4. Samples: 399055872. 
Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:04:15,956][1648985] Avg episode reward: [(0, '142.070')] [2024-06-15 21:04:18,591][1652491] Updated weights for policy 0, policy_version 779326 (0.0013) [2024-06-15 21:04:20,681][1652491] Updated weights for policy 0, policy_version 779376 (0.0012) [2024-06-15 21:04:20,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 1596162048. Throughput: 0: 11708.6. Samples: 399112192. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:04:20,956][1648985] Avg episode reward: [(0, '158.520')] [2024-06-15 21:04:22,284][1652491] Updated weights for policy 0, policy_version 779440 (0.0015) [2024-06-15 21:04:25,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1596325888. Throughput: 0: 11696.4. Samples: 399148032. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:04:25,956][1648985] Avg episode reward: [(0, '179.630')] [2024-06-15 21:04:26,525][1652491] Updated weights for policy 0, policy_version 779473 (0.0012) [2024-06-15 21:04:29,679][1652491] Updated weights for policy 0, policy_version 779552 (0.0015) [2024-06-15 21:04:30,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48059.6, 300 sec: 46986.0). Total num frames: 1596588032. Throughput: 0: 11741.8. Samples: 399221248. Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:04:30,956][1648985] Avg episode reward: [(0, '180.390')] [2024-06-15 21:04:31,101][1652491] Updated weights for policy 0, policy_version 779600 (0.0013) [2024-06-15 21:04:32,355][1652491] Updated weights for policy 0, policy_version 779655 (0.0011) [2024-06-15 21:04:33,320][1652491] Updated weights for policy 0, policy_version 779710 (0.0013) [2024-06-15 21:04:35,955][1648985] Fps is (10 sec: 52427.1, 60 sec: 45331.2, 300 sec: 46430.6). Total num frames: 1596850176. Throughput: 0: 11730.4. Samples: 399293440. 
Policy #0 lag: (min: 32.0, avg: 137.2, max: 262.0) [2024-06-15 21:04:35,956][1648985] Avg episode reward: [(0, '183.450')] [2024-06-15 21:04:38,626][1652491] Updated weights for policy 0, policy_version 779769 (0.0013) [2024-06-15 21:04:40,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1597046784. Throughput: 0: 11582.6. Samples: 399327232. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:04:40,956][1648985] Avg episode reward: [(0, '199.530')] [2024-06-15 21:04:41,195][1652491] Updated weights for policy 0, policy_version 779824 (0.0014) [2024-06-15 21:04:43,062][1652491] Updated weights for policy 0, policy_version 779906 (0.0013) [2024-06-15 21:04:45,955][1648985] Fps is (10 sec: 52431.5, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 1597374464. Throughput: 0: 11696.4. Samples: 399392256. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:04:45,955][1648985] Avg episode reward: [(0, '193.270')] [2024-06-15 21:04:48,439][1651469] Signal inference workers to stop experience collection... (40550 times) [2024-06-15 21:04:48,480][1652491] InferenceWorker_p0-w0: stopping experience collection (40550 times) [2024-06-15 21:04:48,490][1652491] Updated weights for policy 0, policy_version 779971 (0.0015) [2024-06-15 21:04:48,766][1651469] Signal inference workers to resume experience collection... (40550 times) [2024-06-15 21:04:48,767][1652491] InferenceWorker_p0-w0: resuming experience collection (40550 times) [2024-06-15 21:04:50,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1597505536. Throughput: 0: 11685.0. Samples: 399471616. 
Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:04:50,956][1648985] Avg episode reward: [(0, '159.360')] [2024-06-15 21:04:51,459][1652491] Updated weights for policy 0, policy_version 780035 (0.0014) [2024-06-15 21:04:52,765][1652491] Updated weights for policy 0, policy_version 780088 (0.0045) [2024-06-15 21:04:54,142][1652491] Updated weights for policy 0, policy_version 780145 (0.0028) [2024-06-15 21:04:55,955][1648985] Fps is (10 sec: 52426.4, 60 sec: 48059.6, 300 sec: 47209.2). Total num frames: 1597898752. Throughput: 0: 11593.9. Samples: 399501824. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:04:55,956][1648985] Avg episode reward: [(0, '158.230')] [2024-06-15 21:04:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000780224_1597898752.pth... [2024-06-15 21:04:56,070][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000774720_1586626560.pth [2024-06-15 21:04:59,352][1652491] Updated weights for policy 0, policy_version 780240 (0.0013) [2024-06-15 21:05:00,363][1652491] Updated weights for policy 0, policy_version 780284 (0.0011) [2024-06-15 21:05:00,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1598029824. Throughput: 0: 11650.9. Samples: 399580160. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:05:00,956][1648985] Avg episode reward: [(0, '157.550')] [2024-06-15 21:05:03,262][1652491] Updated weights for policy 0, policy_version 780340 (0.0013) [2024-06-15 21:05:04,821][1652491] Updated weights for policy 0, policy_version 780416 (0.0027) [2024-06-15 21:05:05,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 49151.9, 300 sec: 47541.3). Total num frames: 1598390272. Throughput: 0: 11867.0. Samples: 399646208. 
Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:05:05,956][1648985] Avg episode reward: [(0, '167.710')] [2024-06-15 21:05:06,023][1652491] Updated weights for policy 0, policy_version 780472 (0.0013) [2024-06-15 21:05:10,774][1652491] Updated weights for policy 0, policy_version 780532 (0.0021) [2024-06-15 21:05:10,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 45875.1, 300 sec: 46541.7). Total num frames: 1598521344. Throughput: 0: 11969.4. Samples: 399686656. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:05:10,956][1648985] Avg episode reward: [(0, '180.400')] [2024-06-15 21:05:12,930][1652491] Updated weights for policy 0, policy_version 780546 (0.0014) [2024-06-15 21:05:14,764][1652491] Updated weights for policy 0, policy_version 780630 (0.0015) [2024-06-15 21:05:15,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 1598816256. Throughput: 0: 11980.8. Samples: 399760384. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:05:15,955][1648985] Avg episode reward: [(0, '168.890')] [2024-06-15 21:05:16,741][1652491] Updated weights for policy 0, policy_version 780705 (0.0116) [2024-06-15 21:05:20,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 46421.3, 300 sec: 46652.8). Total num frames: 1598947328. Throughput: 0: 11969.5. Samples: 399832064. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:05:20,955][1648985] Avg episode reward: [(0, '173.880')] [2024-06-15 21:05:21,370][1652491] Updated weights for policy 0, policy_version 780756 (0.0014) [2024-06-15 21:05:22,318][1652491] Updated weights for policy 0, policy_version 780800 (0.0013) [2024-06-15 21:05:25,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 48059.6, 300 sec: 47097.1). Total num frames: 1599209472. Throughput: 0: 12003.5. Samples: 399867392. 
Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:05:25,956][1648985] Avg episode reward: [(0, '188.440')] [2024-06-15 21:05:26,120][1652491] Updated weights for policy 0, policy_version 780870 (0.0012) [2024-06-15 21:05:26,396][1651469] Signal inference workers to stop experience collection... (40600 times) [2024-06-15 21:05:26,458][1652491] InferenceWorker_p0-w0: stopping experience collection (40600 times) [2024-06-15 21:05:26,598][1651469] Signal inference workers to resume experience collection... (40600 times) [2024-06-15 21:05:26,599][1652491] InferenceWorker_p0-w0: resuming experience collection (40600 times) [2024-06-15 21:05:27,407][1652491] Updated weights for policy 0, policy_version 780929 (0.0014) [2024-06-15 21:05:30,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 1599471616. Throughput: 0: 12026.3. Samples: 399933440. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:05:30,956][1648985] Avg episode reward: [(0, '179.010')] [2024-06-15 21:05:32,217][1652491] Updated weights for policy 0, policy_version 781008 (0.0019) [2024-06-15 21:05:35,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 46421.6, 300 sec: 46763.8). Total num frames: 1599635456. Throughput: 0: 11992.2. Samples: 400011264. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:05:35,956][1648985] Avg episode reward: [(0, '159.050')] [2024-06-15 21:05:36,284][1652491] Updated weights for policy 0, policy_version 781088 (0.0115) [2024-06-15 21:05:38,262][1652491] Updated weights for policy 0, policy_version 781168 (0.0012) [2024-06-15 21:05:39,146][1652491] Updated weights for policy 0, policy_version 781216 (0.0018) [2024-06-15 21:05:40,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 49152.0, 300 sec: 47097.1). Total num frames: 1599995904. Throughput: 0: 11889.9. Samples: 400036864. 
Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:05:40,955][1648985] Avg episode reward: [(0, '163.120')] [2024-06-15 21:05:43,751][1652491] Updated weights for policy 0, policy_version 781281 (0.0035) [2024-06-15 21:05:45,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45875.1, 300 sec: 46653.0). Total num frames: 1600126976. Throughput: 0: 11798.8. Samples: 400111104. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:05:45,956][1648985] Avg episode reward: [(0, '171.430')] [2024-06-15 21:05:46,958][1652491] Updated weights for policy 0, policy_version 781328 (0.0012) [2024-06-15 21:05:48,445][1652491] Updated weights for policy 0, policy_version 781392 (0.0013) [2024-06-15 21:05:49,805][1652491] Updated weights for policy 0, policy_version 781440 (0.0015) [2024-06-15 21:05:50,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 49698.3, 300 sec: 47097.1). Total num frames: 1600487424. Throughput: 0: 11696.4. Samples: 400172544. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:05:50,955][1648985] Avg episode reward: [(0, '170.530')] [2024-06-15 21:05:54,706][1652491] Updated weights for policy 0, policy_version 781506 (0.0016) [2024-06-15 21:05:55,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45329.3, 300 sec: 46541.7). Total num frames: 1600618496. Throughput: 0: 11662.3. Samples: 400211456. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:05:55,955][1648985] Avg episode reward: [(0, '167.000')] [2024-06-15 21:05:55,985][1652491] Updated weights for policy 0, policy_version 781566 (0.0034) [2024-06-15 21:05:59,373][1652491] Updated weights for policy 0, policy_version 781618 (0.0032) [2024-06-15 21:06:00,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 47513.6, 300 sec: 46986.3). Total num frames: 1600880640. Throughput: 0: 11639.5. Samples: 400284160. 
Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:06:00,956][1648985] Avg episode reward: [(0, '178.710')] [2024-06-15 21:06:01,004][1652491] Updated weights for policy 0, policy_version 781681 (0.0014) [2024-06-15 21:06:02,215][1652491] Updated weights for policy 0, policy_version 781747 (0.0012) [2024-06-15 21:06:05,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 44237.0, 300 sec: 46430.6). Total num frames: 1601044480. Throughput: 0: 11514.3. Samples: 400350208. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:06:05,956][1648985] Avg episode reward: [(0, '171.070')] [2024-06-15 21:06:06,499][1651469] Signal inference workers to stop experience collection... (40650 times) [2024-06-15 21:06:06,546][1652491] InferenceWorker_p0-w0: stopping experience collection (40650 times) [2024-06-15 21:06:06,761][1651469] Signal inference workers to resume experience collection... (40650 times) [2024-06-15 21:06:06,762][1652491] InferenceWorker_p0-w0: resuming experience collection (40650 times) [2024-06-15 21:06:06,926][1652491] Updated weights for policy 0, policy_version 781817 (0.0013) [2024-06-15 21:06:10,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 45329.2, 300 sec: 46652.7). Total num frames: 1601241088. Throughput: 0: 11514.3. Samples: 400385536. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:06:10,956][1648985] Avg episode reward: [(0, '164.960')] [2024-06-15 21:06:12,052][1652491] Updated weights for policy 0, policy_version 781904 (0.0030) [2024-06-15 21:06:14,024][1652491] Updated weights for policy 0, policy_version 782000 (0.0025) [2024-06-15 21:06:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1601568768. Throughput: 0: 11286.8. Samples: 400441344. 
Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:06:15,956][1648985] Avg episode reward: [(0, '158.740')] [2024-06-15 21:06:18,000][1652491] Updated weights for policy 0, policy_version 782048 (0.0034) [2024-06-15 21:06:20,966][1648985] Fps is (10 sec: 45824.1, 60 sec: 45866.6, 300 sec: 46651.0). Total num frames: 1601699840. Throughput: 0: 11363.6. Samples: 400522752. Policy #0 lag: (min: 15.0, avg: 110.2, max: 271.0) [2024-06-15 21:06:20,967][1648985] Avg episode reward: [(0, '167.090')] [2024-06-15 21:06:22,291][1652491] Updated weights for policy 0, policy_version 782099 (0.0013) [2024-06-15 21:06:24,211][1652491] Updated weights for policy 0, policy_version 782192 (0.0013) [2024-06-15 21:06:25,689][1652491] Updated weights for policy 0, policy_version 782266 (0.0015) [2024-06-15 21:06:25,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 46986.5). Total num frames: 1602093056. Throughput: 0: 11434.6. Samples: 400551424. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:06:25,956][1648985] Avg episode reward: [(0, '168.120')] [2024-06-15 21:06:29,626][1652491] Updated weights for policy 0, policy_version 782325 (0.0012) [2024-06-15 21:06:30,955][1648985] Fps is (10 sec: 52487.6, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1602224128. Throughput: 0: 11332.3. Samples: 400621056. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:06:30,956][1648985] Avg episode reward: [(0, '174.310')] [2024-06-15 21:06:33,395][1652491] Updated weights for policy 0, policy_version 782368 (0.0013) [2024-06-15 21:06:35,154][1652491] Updated weights for policy 0, policy_version 782448 (0.0013) [2024-06-15 21:06:35,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48059.7, 300 sec: 47097.2). Total num frames: 1602519040. Throughput: 0: 11502.9. Samples: 400690176. 
Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:06:35,956][1648985] Avg episode reward: [(0, '162.310')] [2024-06-15 21:06:36,339][1652491] Updated weights for policy 0, policy_version 782499 (0.0078) [2024-06-15 21:06:39,908][1652491] Updated weights for policy 0, policy_version 782547 (0.0013) [2024-06-15 21:06:40,791][1652491] Updated weights for policy 0, policy_version 782592 (0.0014) [2024-06-15 21:06:40,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1602748416. Throughput: 0: 11525.7. Samples: 400730112. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:06:40,956][1648985] Avg episode reward: [(0, '184.120')] [2024-06-15 21:06:45,533][1652491] Updated weights for policy 0, policy_version 782656 (0.0118) [2024-06-15 21:06:45,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 1602912256. Throughput: 0: 11639.5. Samples: 400807936. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:06:45,956][1648985] Avg episode reward: [(0, '178.140')] [2024-06-15 21:06:46,622][1651469] Signal inference workers to stop experience collection... (40700 times) [2024-06-15 21:06:46,708][1652491] InferenceWorker_p0-w0: stopping experience collection (40700 times) [2024-06-15 21:06:46,896][1651469] Signal inference workers to resume experience collection... (40700 times) [2024-06-15 21:06:46,897][1652491] InferenceWorker_p0-w0: resuming experience collection (40700 times) [2024-06-15 21:06:47,456][1652491] Updated weights for policy 0, policy_version 782737 (0.0013) [2024-06-15 21:06:50,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 46430.6). Total num frames: 1603141632. Throughput: 0: 11639.5. Samples: 400873984. 
Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:06:50,956][1648985] Avg episode reward: [(0, '162.760')] [2024-06-15 21:06:51,427][1652491] Updated weights for policy 0, policy_version 782800 (0.0013) [2024-06-15 21:06:55,911][1652491] Updated weights for policy 0, policy_version 782881 (0.0014) [2024-06-15 21:06:55,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45329.1, 300 sec: 46874.9). Total num frames: 1603338240. Throughput: 0: 11707.8. Samples: 400912384. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:06:55,955][1648985] Avg episode reward: [(0, '171.990')] [2024-06-15 21:06:56,322][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000782912_1603403776.pth... [2024-06-15 21:06:56,465][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000777408_1592131584.pth [2024-06-15 21:06:57,212][1652491] Updated weights for policy 0, policy_version 782944 (0.0132) [2024-06-15 21:06:58,968][1652491] Updated weights for policy 0, policy_version 783026 (0.0014) [2024-06-15 21:07:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 1603665920. Throughput: 0: 11912.5. Samples: 400977408. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:07:00,955][1648985] Avg episode reward: [(0, '175.070')] [2024-06-15 21:07:03,076][1652491] Updated weights for policy 0, policy_version 783059 (0.0014) [2024-06-15 21:07:03,770][1652491] Updated weights for policy 0, policy_version 783104 (0.0012) [2024-06-15 21:07:05,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46967.4, 300 sec: 46877.0). Total num frames: 1603862528. Throughput: 0: 11938.2. Samples: 401059840. 
Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:07:05,956][1648985] Avg episode reward: [(0, '171.940')] [2024-06-15 21:07:06,793][1652491] Updated weights for policy 0, policy_version 783184 (0.0043) [2024-06-15 21:07:08,360][1652491] Updated weights for policy 0, policy_version 783253 (0.0086) [2024-06-15 21:07:10,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 49152.1, 300 sec: 46763.8). Total num frames: 1604190208. Throughput: 0: 11889.8. Samples: 401086464. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:07:10,955][1648985] Avg episode reward: [(0, '179.840')] [2024-06-15 21:07:13,225][1652491] Updated weights for policy 0, policy_version 783298 (0.0017) [2024-06-15 21:07:14,600][1652491] Updated weights for policy 0, policy_version 783355 (0.0022) [2024-06-15 21:07:15,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1604354048. Throughput: 0: 12151.5. Samples: 401167872. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:07:15,956][1648985] Avg episode reward: [(0, '157.370')] [2024-06-15 21:07:16,705][1652491] Updated weights for policy 0, policy_version 783408 (0.0016) [2024-06-15 21:07:17,656][1652491] Updated weights for policy 0, policy_version 783443 (0.0012) [2024-06-15 21:07:18,914][1652491] Updated weights for policy 0, policy_version 783507 (0.0020) [2024-06-15 21:07:20,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 50253.6, 300 sec: 47208.1). Total num frames: 1604714496. Throughput: 0: 12140.1. Samples: 401236480. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:07:20,956][1648985] Avg episode reward: [(0, '162.190')] [2024-06-15 21:07:24,493][1652491] Updated weights for policy 0, policy_version 783556 (0.0013) [2024-06-15 21:07:25,938][1652491] Updated weights for policy 0, policy_version 783614 (0.0012) [2024-06-15 21:07:25,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1604812800. 
Throughput: 0: 12197.0. Samples: 401278976. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:07:25,956][1648985] Avg episode reward: [(0, '162.100')] [2024-06-15 21:07:26,798][1651469] Signal inference workers to stop experience collection... (40750 times) [2024-06-15 21:07:26,851][1652491] InferenceWorker_p0-w0: stopping experience collection (40750 times) [2024-06-15 21:07:27,204][1651469] Signal inference workers to resume experience collection... (40750 times) [2024-06-15 21:07:27,206][1652491] InferenceWorker_p0-w0: resuming experience collection (40750 times) [2024-06-15 21:07:27,882][1652491] Updated weights for policy 0, policy_version 783680 (0.0012) [2024-06-15 21:07:28,914][1652491] Updated weights for policy 0, policy_version 783732 (0.0011) [2024-06-15 21:07:30,071][1652491] Updated weights for policy 0, policy_version 783780 (0.0011) [2024-06-15 21:07:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 47652.5). Total num frames: 1605238784. Throughput: 0: 11855.7. Samples: 401341440. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:07:30,956][1648985] Avg episode reward: [(0, '154.930')] [2024-06-15 21:07:35,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.3, 300 sec: 46319.6). Total num frames: 1605271552. Throughput: 0: 12162.8. Samples: 401421312. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:07:35,955][1648985] Avg episode reward: [(0, '128.120')] [2024-06-15 21:07:36,138][1652491] Updated weights for policy 0, policy_version 783826 (0.0012) [2024-06-15 21:07:38,363][1652491] Updated weights for policy 0, policy_version 783904 (0.0121) [2024-06-15 21:07:39,301][1652491] Updated weights for policy 0, policy_version 783936 (0.0016) [2024-06-15 21:07:40,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 48059.7, 300 sec: 47543.0). Total num frames: 1605632000. Throughput: 0: 11810.1. Samples: 401443840. 
Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:07:40,956][1648985] Avg episode reward: [(0, '146.230')] [2024-06-15 21:07:41,141][1652491] Updated weights for policy 0, policy_version 784017 (0.0014) [2024-06-15 21:07:41,830][1652491] Updated weights for policy 0, policy_version 784061 (0.0013) [2024-06-15 21:07:45,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 1605763072. Throughput: 0: 12060.5. Samples: 401520128. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:07:45,955][1648985] Avg episode reward: [(0, '146.000')] [2024-06-15 21:07:48,449][1652491] Updated weights for policy 0, policy_version 784118 (0.0018) [2024-06-15 21:07:49,825][1652491] Updated weights for policy 0, policy_version 784164 (0.0099) [2024-06-15 21:07:50,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48605.8, 300 sec: 47208.2). Total num frames: 1606057984. Throughput: 0: 11628.1. Samples: 401583104. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:07:50,956][1648985] Avg episode reward: [(0, '147.770')] [2024-06-15 21:07:51,720][1652491] Updated weights for policy 0, policy_version 784240 (0.0012) [2024-06-15 21:07:52,825][1652491] Updated weights for policy 0, policy_version 784288 (0.0013) [2024-06-15 21:07:55,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 49152.0, 300 sec: 46986.0). Total num frames: 1606287360. Throughput: 0: 11707.7. Samples: 401613312. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:07:55,956][1648985] Avg episode reward: [(0, '130.210')] [2024-06-15 21:07:58,631][1652491] Updated weights for policy 0, policy_version 784322 (0.0014) [2024-06-15 21:07:59,818][1652491] Updated weights for policy 0, policy_version 784373 (0.0015) [2024-06-15 21:08:00,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 46967.3, 300 sec: 46874.9). Total num frames: 1606483968. Throughput: 0: 11719.1. Samples: 401695232. 
Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:08:00,956][1648985] Avg episode reward: [(0, '154.740')] [2024-06-15 21:08:01,224][1652491] Updated weights for policy 0, policy_version 784432 (0.0013) [2024-06-15 21:08:02,646][1652491] Updated weights for policy 0, policy_version 784482 (0.0013) [2024-06-15 21:08:04,398][1652491] Updated weights for policy 0, policy_version 784575 (0.0190) [2024-06-15 21:08:05,970][1648985] Fps is (10 sec: 52349.2, 60 sec: 49139.6, 300 sec: 47427.9). Total num frames: 1606811648. Throughput: 0: 11555.9. Samples: 401756672. Policy #0 lag: (min: 79.0, avg: 142.7, max: 334.0) [2024-06-15 21:08:05,971][1648985] Avg episode reward: [(0, '167.030')] [2024-06-15 21:08:09,878][1651469] Signal inference workers to stop experience collection... (40800 times) [2024-06-15 21:08:09,924][1652491] InferenceWorker_p0-w0: stopping experience collection (40800 times) [2024-06-15 21:08:10,222][1651469] Signal inference workers to resume experience collection... (40800 times) [2024-06-15 21:08:10,224][1652491] InferenceWorker_p0-w0: resuming experience collection (40800 times) [2024-06-15 21:08:10,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 1606877184. Throughput: 0: 11559.8. Samples: 401799168. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:08:10,956][1648985] Avg episode reward: [(0, '152.020')] [2024-06-15 21:08:12,121][1652491] Updated weights for policy 0, policy_version 784646 (0.0013) [2024-06-15 21:08:13,258][1652491] Updated weights for policy 0, policy_version 784702 (0.0014) [2024-06-15 21:08:14,805][1652491] Updated weights for policy 0, policy_version 784765 (0.0031) [2024-06-15 21:08:15,955][1648985] Fps is (10 sec: 45944.9, 60 sec: 48605.8, 300 sec: 47430.3). Total num frames: 1607270400. Throughput: 0: 11423.3. Samples: 401855488. 
Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:08:15,956][1648985] Avg episode reward: [(0, '163.630')] [2024-06-15 21:08:16,423][1652491] Updated weights for policy 0, policy_version 784824 (0.0014) [2024-06-15 21:08:20,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1607335936. Throughput: 0: 11446.0. Samples: 401936384. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:08:20,956][1648985] Avg episode reward: [(0, '152.930')] [2024-06-15 21:08:22,021][1652491] Updated weights for policy 0, policy_version 784866 (0.0019) [2024-06-15 21:08:23,775][1652491] Updated weights for policy 0, policy_version 784928 (0.0016) [2024-06-15 21:08:25,263][1652491] Updated weights for policy 0, policy_version 784994 (0.0024) [2024-06-15 21:08:25,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 48059.8, 300 sec: 47430.3). Total num frames: 1607696384. Throughput: 0: 11719.1. Samples: 401971200. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:08:25,955][1648985] Avg episode reward: [(0, '144.710')] [2024-06-15 21:08:26,843][1652491] Updated weights for policy 0, policy_version 785056 (0.0014) [2024-06-15 21:08:30,956][1648985] Fps is (10 sec: 52423.3, 60 sec: 43689.9, 300 sec: 46542.0). Total num frames: 1607860224. Throughput: 0: 11570.9. Samples: 402040832. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:08:30,957][1648985] Avg episode reward: [(0, '138.670')] [2024-06-15 21:08:32,867][1652491] Updated weights for policy 0, policy_version 785123 (0.0014) [2024-06-15 21:08:34,982][1652491] Updated weights for policy 0, policy_version 785184 (0.0015) [2024-06-15 21:08:35,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 1608122368. Throughput: 0: 11741.9. Samples: 402111488. 
Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:08:35,955][1648985] Avg episode reward: [(0, '131.270')] [2024-06-15 21:08:36,677][1652491] Updated weights for policy 0, policy_version 785254 (0.0019) [2024-06-15 21:08:38,180][1652491] Updated weights for policy 0, policy_version 785315 (0.0012) [2024-06-15 21:08:40,955][1648985] Fps is (10 sec: 52434.6, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1608384512. Throughput: 0: 11707.8. Samples: 402140160. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:08:40,955][1648985] Avg episode reward: [(0, '145.460')] [2024-06-15 21:08:43,465][1652491] Updated weights for policy 0, policy_version 785377 (0.0014) [2024-06-15 21:08:45,706][1652491] Updated weights for policy 0, policy_version 785440 (0.0015) [2024-06-15 21:08:45,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 46967.3, 300 sec: 46874.9). Total num frames: 1608581120. Throughput: 0: 11787.4. Samples: 402225664. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:08:45,956][1648985] Avg episode reward: [(0, '139.330')] [2024-06-15 21:08:47,388][1651469] Signal inference workers to stop experience collection... (40850 times) [2024-06-15 21:08:47,407][1652491] Updated weights for policy 0, policy_version 785521 (0.0015) [2024-06-15 21:08:47,422][1652491] InferenceWorker_p0-w0: stopping experience collection (40850 times) [2024-06-15 21:08:47,601][1651469] Signal inference workers to resume experience collection... (40850 times) [2024-06-15 21:08:47,602][1652491] InferenceWorker_p0-w0: resuming experience collection (40850 times) [2024-06-15 21:08:48,942][1652491] Updated weights for policy 0, policy_version 785584 (0.0013) [2024-06-15 21:08:50,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 1608908800. Throughput: 0: 11893.8. Samples: 402291712. 
Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:08:50,955][1648985] Avg episode reward: [(0, '160.610')] [2024-06-15 21:08:54,820][1652491] Updated weights for policy 0, policy_version 785660 (0.0012) [2024-06-15 21:08:55,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1609039872. Throughput: 0: 11946.7. Samples: 402336768. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:08:55,956][1648985] Avg episode reward: [(0, '160.410')] [2024-06-15 21:08:56,507][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000785696_1609105408.pth... [2024-06-15 21:08:56,686][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000780224_1597898752.pth [2024-06-15 21:08:57,236][1652491] Updated weights for policy 0, policy_version 785715 (0.0015) [2024-06-15 21:08:59,345][1652491] Updated weights for policy 0, policy_version 785824 (0.0159) [2024-06-15 21:09:00,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 49152.1, 300 sec: 47430.3). Total num frames: 1609433088. Throughput: 0: 11821.5. Samples: 402387456. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:09:00,956][1648985] Avg episode reward: [(0, '156.060')] [2024-06-15 21:09:04,676][1652491] Updated weights for policy 0, policy_version 785872 (0.0015) [2024-06-15 21:09:05,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45886.8, 300 sec: 46763.8). Total num frames: 1609564160. Throughput: 0: 11992.2. Samples: 402476032. 
Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:09:05,956][1648985] Avg episode reward: [(0, '158.430')] [2024-06-15 21:09:06,733][1652491] Updated weights for policy 0, policy_version 785925 (0.0064) [2024-06-15 21:09:07,710][1652491] Updated weights for policy 0, policy_version 785971 (0.0012) [2024-06-15 21:09:09,258][1652491] Updated weights for policy 0, policy_version 786048 (0.0159) [2024-06-15 21:09:10,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 51336.5, 300 sec: 47541.4). Total num frames: 1609957376. Throughput: 0: 11980.8. Samples: 402510336. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:09:10,956][1648985] Avg episode reward: [(0, '153.460')] [2024-06-15 21:09:14,927][1652491] Updated weights for policy 0, policy_version 786115 (0.0164) [2024-06-15 21:09:15,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1610022912. Throughput: 0: 12174.5. Samples: 402588672. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:09:15,955][1648985] Avg episode reward: [(0, '140.440')] [2024-06-15 21:09:17,707][1652491] Updated weights for policy 0, policy_version 786177 (0.0012) [2024-06-15 21:09:18,685][1652491] Updated weights for policy 0, policy_version 786233 (0.0035) [2024-06-15 21:09:20,700][1652491] Updated weights for policy 0, policy_version 786320 (0.0110) [2024-06-15 21:09:20,966][1648985] Fps is (10 sec: 42550.4, 60 sec: 50780.8, 300 sec: 47650.6). Total num frames: 1610383360. Throughput: 0: 12000.5. Samples: 402651648. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:09:20,967][1648985] Avg episode reward: [(0, '139.060')] [2024-06-15 21:09:21,839][1652491] Updated weights for policy 0, policy_version 786364 (0.0015) [2024-06-15 21:09:25,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 46421.2, 300 sec: 47097.0). Total num frames: 1610481664. Throughput: 0: 12162.8. Samples: 402687488. 
Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:09:25,956][1648985] Avg episode reward: [(0, '137.790')] [2024-06-15 21:09:27,317][1652491] Updated weights for policy 0, policy_version 786421 (0.0016) [2024-06-15 21:09:28,263][1652491] Updated weights for policy 0, policy_version 786434 (0.0012) [2024-06-15 21:09:29,222][1651469] Signal inference workers to stop experience collection... (40900 times) [2024-06-15 21:09:29,297][1652491] InferenceWorker_p0-w0: stopping experience collection (40900 times) [2024-06-15 21:09:29,618][1651469] Signal inference workers to resume experience collection... (40900 times) [2024-06-15 21:09:29,619][1652491] InferenceWorker_p0-w0: resuming experience collection (40900 times) [2024-06-15 21:09:29,914][1652491] Updated weights for policy 0, policy_version 786496 (0.0017) [2024-06-15 21:09:30,955][1648985] Fps is (10 sec: 39365.2, 60 sec: 48606.5, 300 sec: 47208.2). Total num frames: 1610776576. Throughput: 0: 11855.6. Samples: 402759168. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:09:30,956][1648985] Avg episode reward: [(0, '130.920')] [2024-06-15 21:09:31,477][1652491] Updated weights for policy 0, policy_version 786554 (0.0016) [2024-06-15 21:09:33,196][1652491] Updated weights for policy 0, policy_version 786608 (0.0012) [2024-06-15 21:09:35,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 1611005952. Throughput: 0: 11946.7. Samples: 402829312. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:09:35,956][1648985] Avg episode reward: [(0, '142.380')] [2024-06-15 21:09:38,469][1652491] Updated weights for policy 0, policy_version 786672 (0.0013) [2024-06-15 21:09:40,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 1611202560. Throughput: 0: 11685.0. Samples: 402862592. 
Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:09:40,955][1648985] Avg episode reward: [(0, '152.380')] [2024-06-15 21:09:41,156][1652491] Updated weights for policy 0, policy_version 786742 (0.0013) [2024-06-15 21:09:43,249][1652491] Updated weights for policy 0, policy_version 786800 (0.0013) [2024-06-15 21:09:45,001][1652491] Updated weights for policy 0, policy_version 786850 (0.0011) [2024-06-15 21:09:45,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 49152.3, 300 sec: 47541.4). Total num frames: 1611530240. Throughput: 0: 11878.5. Samples: 402921984. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:09:45,955][1648985] Avg episode reward: [(0, '165.590')] [2024-06-15 21:09:50,122][1652491] Updated weights for policy 0, policy_version 786914 (0.0029) [2024-06-15 21:09:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1611661312. Throughput: 0: 11559.8. Samples: 402996224. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:09:50,956][1648985] Avg episode reward: [(0, '168.320')] [2024-06-15 21:09:52,469][1652491] Updated weights for policy 0, policy_version 786962 (0.0125) [2024-06-15 21:09:53,945][1652491] Updated weights for policy 0, policy_version 787009 (0.0041) [2024-06-15 21:09:55,166][1652491] Updated weights for policy 0, policy_version 787072 (0.0012) [2024-06-15 21:09:55,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 49152.0, 300 sec: 47319.2). Total num frames: 1611988992. Throughput: 0: 11559.8. Samples: 403030528. Policy #0 lag: (min: 15.0, avg: 82.3, max: 271.0) [2024-06-15 21:09:55,955][1648985] Avg episode reward: [(0, '166.230')] [2024-06-15 21:09:56,380][1652491] Updated weights for policy 0, policy_version 787129 (0.0014) [2024-06-15 21:10:00,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 44236.8, 300 sec: 46430.6). Total num frames: 1612087296. Throughput: 0: 11423.3. Samples: 403102720. 
Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:10:00,956][1648985] Avg episode reward: [(0, '170.860')] [2024-06-15 21:10:01,599][1652491] Updated weights for policy 0, policy_version 787193 (0.0013) [2024-06-15 21:10:04,049][1652491] Updated weights for policy 0, policy_version 787248 (0.0013) [2024-06-15 21:10:05,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1612349440. Throughput: 0: 11551.3. Samples: 403171328. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:10:05,956][1648985] Avg episode reward: [(0, '176.300')] [2024-06-15 21:10:06,613][1652491] Updated weights for policy 0, policy_version 787314 (0.0011) [2024-06-15 21:10:10,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 46652.7). Total num frames: 1612578816. Throughput: 0: 11309.5. Samples: 403196416. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:10:10,955][1648985] Avg episode reward: [(0, '187.260')] [2024-06-15 21:10:11,783][1652491] Updated weights for policy 0, policy_version 787393 (0.0013) [2024-06-15 21:10:13,077][1652491] Updated weights for policy 0, policy_version 787452 (0.0132) [2024-06-15 21:10:15,592][1651469] Signal inference workers to stop experience collection... (40950 times) [2024-06-15 21:10:15,649][1652491] InferenceWorker_p0-w0: stopping experience collection (40950 times) [2024-06-15 21:10:15,851][1651469] Signal inference workers to resume experience collection... (40950 times) [2024-06-15 21:10:15,852][1652491] InferenceWorker_p0-w0: resuming experience collection (40950 times) [2024-06-15 21:10:15,855][1652491] Updated weights for policy 0, policy_version 787504 (0.0013) [2024-06-15 21:10:15,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 1612808192. Throughput: 0: 11457.5. Samples: 403274752. 
Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:10:15,956][1648985] Avg episode reward: [(0, '168.760')] [2024-06-15 21:10:17,447][1652491] Updated weights for policy 0, policy_version 787568 (0.0014) [2024-06-15 21:10:18,982][1652491] Updated weights for policy 0, policy_version 787625 (0.0012) [2024-06-15 21:10:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45337.6, 300 sec: 47097.1). Total num frames: 1613103104. Throughput: 0: 11411.9. Samples: 403342848. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:10:20,956][1648985] Avg episode reward: [(0, '168.660')] [2024-06-15 21:10:23,193][1652491] Updated weights for policy 0, policy_version 787696 (0.0013) [2024-06-15 21:10:25,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1613234176. Throughput: 0: 11491.5. Samples: 403379712. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:10:25,956][1648985] Avg episode reward: [(0, '179.420')] [2024-06-15 21:10:26,866][1652491] Updated weights for policy 0, policy_version 787745 (0.0012) [2024-06-15 21:10:28,236][1652491] Updated weights for policy 0, policy_version 787808 (0.0013) [2024-06-15 21:10:29,973][1652491] Updated weights for policy 0, policy_version 787888 (0.0013) [2024-06-15 21:10:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.8, 300 sec: 47430.3). Total num frames: 1613627392. Throughput: 0: 11673.6. Samples: 403447296. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:10:30,956][1648985] Avg episode reward: [(0, '179.600')] [2024-06-15 21:10:33,969][1652491] Updated weights for policy 0, policy_version 787952 (0.0012) [2024-06-15 21:10:35,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1613758464. Throughput: 0: 11753.2. Samples: 403525120. 
Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:10:35,956][1648985] Avg episode reward: [(0, '178.040')] [2024-06-15 21:10:37,868][1652491] Updated weights for policy 0, policy_version 787987 (0.0013) [2024-06-15 21:10:39,203][1652491] Updated weights for policy 0, policy_version 788049 (0.0012) [2024-06-15 21:10:40,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 47513.5, 300 sec: 47208.1). Total num frames: 1614053376. Throughput: 0: 11889.7. Samples: 403565568. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:10:40,956][1648985] Avg episode reward: [(0, '167.940')] [2024-06-15 21:10:41,131][1652491] Updated weights for policy 0, policy_version 788128 (0.0094) [2024-06-15 21:10:44,311][1652491] Updated weights for policy 0, policy_version 788176 (0.0025) [2024-06-15 21:10:45,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 1614282752. Throughput: 0: 11616.8. Samples: 403625472. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:10:45,955][1648985] Avg episode reward: [(0, '169.470')] [2024-06-15 21:10:49,050][1652491] Updated weights for policy 0, policy_version 788228 (0.0026) [2024-06-15 21:10:50,518][1652491] Updated weights for policy 0, policy_version 788293 (0.0012) [2024-06-15 21:10:50,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 1614446592. Throughput: 0: 11639.5. Samples: 403695104. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:10:50,956][1648985] Avg episode reward: [(0, '161.220')] [2024-06-15 21:10:51,792][1652491] Updated weights for policy 0, policy_version 788346 (0.0011) [2024-06-15 21:10:53,279][1652491] Updated weights for policy 0, policy_version 788400 (0.0012) [2024-06-15 21:10:55,709][1651469] Signal inference workers to stop experience collection... 
(41000 times) [2024-06-15 21:10:55,752][1652491] InferenceWorker_p0-w0: stopping experience collection (41000 times) [2024-06-15 21:10:55,770][1652491] Updated weights for policy 0, policy_version 788435 (0.0013) [2024-06-15 21:10:55,925][1651469] Signal inference workers to resume experience collection... (41000 times) [2024-06-15 21:10:55,938][1652491] InferenceWorker_p0-w0: resuming experience collection (41000 times) [2024-06-15 21:10:55,955][1648985] Fps is (10 sec: 45873.2, 60 sec: 45875.0, 300 sec: 46985.9). Total num frames: 1614741504. Throughput: 0: 11764.5. Samples: 403725824. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:10:55,956][1648985] Avg episode reward: [(0, '153.440')] [2024-06-15 21:10:56,260][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000788464_1614774272.pth... [2024-06-15 21:10:56,342][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000782912_1603403776.pth [2024-06-15 21:10:56,345][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000788464_1614774272.pth [2024-06-15 21:10:56,750][1652491] Updated weights for policy 0, policy_version 788478 (0.0080) [2024-06-15 21:11:00,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1614872576. Throughput: 0: 11753.3. Samples: 403803648. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:11:00,956][1648985] Avg episode reward: [(0, '149.930')] [2024-06-15 21:11:01,574][1652491] Updated weights for policy 0, policy_version 788544 (0.0012) [2024-06-15 21:11:03,862][1652491] Updated weights for policy 0, policy_version 788611 (0.0014) [2024-06-15 21:11:05,138][1652491] Updated weights for policy 0, policy_version 788664 (0.0087) [2024-06-15 21:11:05,955][1648985] Fps is (10 sec: 45876.4, 60 sec: 47513.5, 300 sec: 47319.2). Total num frames: 1615200256. Throughput: 0: 11559.8. Samples: 403863040. 
Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:11:05,956][1648985] Avg episode reward: [(0, '155.630')] [2024-06-15 21:11:07,570][1652491] Updated weights for policy 0, policy_version 788704 (0.0014) [2024-06-15 21:11:10,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1615331328. Throughput: 0: 11525.7. Samples: 403898368. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:11:10,956][1648985] Avg episode reward: [(0, '135.400')] [2024-06-15 21:11:12,675][1652491] Updated weights for policy 0, policy_version 788800 (0.0012) [2024-06-15 21:11:14,009][1652491] Updated weights for policy 0, policy_version 788853 (0.0020) [2024-06-15 21:11:15,131][1652491] Updated weights for policy 0, policy_version 788882 (0.0012) [2024-06-15 21:11:15,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 48059.8, 300 sec: 47432.1). Total num frames: 1615691776. Throughput: 0: 11685.0. Samples: 403973120. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:11:15,956][1648985] Avg episode reward: [(0, '116.450')] [2024-06-15 21:11:16,234][1652491] Updated weights for policy 0, policy_version 788928 (0.0012) [2024-06-15 21:11:19,399][1652491] Updated weights for policy 0, policy_version 788992 (0.0013) [2024-06-15 21:11:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1615855616. Throughput: 0: 11537.1. Samples: 404044288. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:11:20,956][1648985] Avg episode reward: [(0, '131.690')] [2024-06-15 21:11:23,657][1652491] Updated weights for policy 0, policy_version 789057 (0.0013) [2024-06-15 21:11:25,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 48059.8, 300 sec: 47097.0). Total num frames: 1616117760. Throughput: 0: 11434.7. Samples: 404080128. 
Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:11:25,956][1648985] Avg episode reward: [(0, '136.500')] [2024-06-15 21:11:26,311][1652491] Updated weights for policy 0, policy_version 789137 (0.0020) [2024-06-15 21:11:27,337][1652491] Updated weights for policy 0, policy_version 789181 (0.0011) [2024-06-15 21:11:30,327][1652491] Updated weights for policy 0, policy_version 789232 (0.0016) [2024-06-15 21:11:30,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 1616379904. Throughput: 0: 11696.3. Samples: 404151808. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:11:30,956][1648985] Avg episode reward: [(0, '143.470')] [2024-06-15 21:11:33,183][1652491] Updated weights for policy 0, policy_version 789281 (0.0024) [2024-06-15 21:11:35,032][1652491] Updated weights for policy 0, policy_version 789333 (0.0014) [2024-06-15 21:11:35,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1616642048. Throughput: 0: 11855.7. Samples: 404228608. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:11:35,955][1648985] Avg episode reward: [(0, '166.560')] [2024-06-15 21:11:36,560][1652491] Updated weights for policy 0, policy_version 789393 (0.0017) [2024-06-15 21:11:37,511][1652491] Updated weights for policy 0, policy_version 789439 (0.0014) [2024-06-15 21:11:40,221][1651469] Signal inference workers to stop experience collection... (41050 times) [2024-06-15 21:11:40,306][1652491] InferenceWorker_p0-w0: stopping experience collection (41050 times) [2024-06-15 21:11:40,505][1651469] Signal inference workers to resume experience collection... (41050 times) [2024-06-15 21:11:40,506][1652491] InferenceWorker_p0-w0: resuming experience collection (41050 times) [2024-06-15 21:11:40,834][1652491] Updated weights for policy 0, policy_version 789491 (0.0033) [2024-06-15 21:11:40,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 46967.6, 300 sec: 47319.2). 
Total num frames: 1616871424. Throughput: 0: 12003.6. Samples: 404265984. Policy #0 lag: (min: 63.0, avg: 202.0, max: 303.0) [2024-06-15 21:11:40,956][1648985] Avg episode reward: [(0, '165.900')] [2024-06-15 21:11:42,974][1652491] Updated weights for policy 0, policy_version 789524 (0.0011) [2024-06-15 21:11:45,333][1652491] Updated weights for policy 0, policy_version 789570 (0.0020) [2024-06-15 21:11:45,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 46967.3, 300 sec: 47319.2). Total num frames: 1617100800. Throughput: 0: 11980.8. Samples: 404342784. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:11:45,956][1648985] Avg episode reward: [(0, '147.080')] [2024-06-15 21:11:46,718][1652491] Updated weights for policy 0, policy_version 789629 (0.0016) [2024-06-15 21:11:48,270][1652491] Updated weights for policy 0, policy_version 789686 (0.0012) [2024-06-15 21:11:50,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 47513.6, 300 sec: 47319.2). Total num frames: 1617297408. Throughput: 0: 12231.1. Samples: 404413440. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:11:50,955][1648985] Avg episode reward: [(0, '170.150')] [2024-06-15 21:11:51,975][1652491] Updated weights for policy 0, policy_version 789754 (0.0113) [2024-06-15 21:11:54,472][1652491] Updated weights for policy 0, policy_version 789808 (0.0033) [2024-06-15 21:11:55,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.7, 300 sec: 47097.0). Total num frames: 1617559552. Throughput: 0: 12231.1. Samples: 404448768. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:11:55,956][1648985] Avg episode reward: [(0, '162.500')] [2024-06-15 21:11:56,646][1652491] Updated weights for policy 0, policy_version 789856 (0.0013) [2024-06-15 21:11:59,058][1652491] Updated weights for policy 0, policy_version 789922 (0.0100) [2024-06-15 21:12:00,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 49151.9, 300 sec: 47319.2). Total num frames: 1617821696. Throughput: 0: 12049.0. 
Samples: 404515328. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:12:00,956][1648985] Avg episode reward: [(0, '155.140')] [2024-06-15 21:12:02,623][1652491] Updated weights for policy 0, policy_version 789985 (0.0077) [2024-06-15 21:12:04,145][1652491] Updated weights for policy 0, policy_version 790022 (0.0010) [2024-06-15 21:12:05,109][1652491] Updated weights for policy 0, policy_version 790080 (0.0123) [2024-06-15 21:12:05,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 1618083840. Throughput: 0: 12242.5. Samples: 404595200. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:12:05,956][1648985] Avg episode reward: [(0, '128.790')] [2024-06-15 21:12:09,054][1652491] Updated weights for policy 0, policy_version 790160 (0.0119) [2024-06-15 21:12:10,036][1652491] Updated weights for policy 0, policy_version 790208 (0.0016) [2024-06-15 21:12:10,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 50244.1, 300 sec: 47430.3). Total num frames: 1618345984. Throughput: 0: 12174.2. Samples: 404627968. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:12:10,956][1648985] Avg episode reward: [(0, '128.160')] [2024-06-15 21:12:14,332][1652491] Updated weights for policy 0, policy_version 790266 (0.0012) [2024-06-15 21:12:15,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 1618542592. Throughput: 0: 12174.3. Samples: 404699648. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:12:15,956][1648985] Avg episode reward: [(0, '118.960')] [2024-06-15 21:12:16,280][1652491] Updated weights for policy 0, policy_version 790327 (0.0125) [2024-06-15 21:12:19,579][1652491] Updated weights for policy 0, policy_version 790376 (0.0012) [2024-06-15 21:12:20,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 48606.0, 300 sec: 47319.2). Total num frames: 1618771968. Throughput: 0: 12014.9. Samples: 404769280. 
Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:12:20,956][1648985] Avg episode reward: [(0, '128.170')] [2024-06-15 21:12:21,324][1652491] Updated weights for policy 0, policy_version 790448 (0.0016) [2024-06-15 21:12:24,869][1651469] Signal inference workers to stop experience collection... (41100 times) [2024-06-15 21:12:24,934][1652491] InferenceWorker_p0-w0: stopping experience collection (41100 times) [2024-06-15 21:12:25,100][1651469] Signal inference workers to resume experience collection... (41100 times) [2024-06-15 21:12:25,101][1652491] InferenceWorker_p0-w0: resuming experience collection (41100 times) [2024-06-15 21:12:25,665][1652491] Updated weights for policy 0, policy_version 790522 (0.0150) [2024-06-15 21:12:25,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 1619001344. Throughput: 0: 11958.0. Samples: 404804096. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:12:25,956][1648985] Avg episode reward: [(0, '127.750')] [2024-06-15 21:12:27,052][1652491] Updated weights for policy 0, policy_version 790563 (0.0012) [2024-06-15 21:12:29,864][1652491] Updated weights for policy 0, policy_version 790608 (0.0013) [2024-06-15 21:12:30,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 48059.8, 300 sec: 47430.3). Total num frames: 1619263488. Throughput: 0: 11912.5. Samples: 404878848. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:12:30,956][1648985] Avg episode reward: [(0, '137.750')] [2024-06-15 21:12:30,972][1652491] Updated weights for policy 0, policy_version 790659 (0.0014) [2024-06-15 21:12:35,566][1652491] Updated weights for policy 0, policy_version 790736 (0.0116) [2024-06-15 21:12:35,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 1619460096. Throughput: 0: 11810.1. Samples: 404944896. 
Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:12:35,956][1648985] Avg episode reward: [(0, '147.750')] [2024-06-15 21:12:37,675][1652491] Updated weights for policy 0, policy_version 790788 (0.0012) [2024-06-15 21:12:38,907][1652491] Updated weights for policy 0, policy_version 790847 (0.0012) [2024-06-15 21:12:40,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.3, 300 sec: 47097.0). Total num frames: 1619656704. Throughput: 0: 11707.7. Samples: 404975616. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:12:40,956][1648985] Avg episode reward: [(0, '147.410')] [2024-06-15 21:12:42,827][1652491] Updated weights for policy 0, policy_version 790913 (0.0013) [2024-06-15 21:12:44,302][1652491] Updated weights for policy 0, policy_version 790968 (0.0013) [2024-06-15 21:12:45,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 46986.0). Total num frames: 1619918848. Throughput: 0: 11662.2. Samples: 405040128. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:12:45,956][1648985] Avg episode reward: [(0, '153.000')] [2024-06-15 21:12:47,227][1652491] Updated weights for policy 0, policy_version 791011 (0.0025) [2024-06-15 21:12:47,740][1652491] Updated weights for policy 0, policy_version 791040 (0.0021) [2024-06-15 21:12:50,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1620180992. Throughput: 0: 11571.3. Samples: 405115904. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:12:50,955][1648985] Avg episode reward: [(0, '158.570')] [2024-06-15 21:12:52,638][1652491] Updated weights for policy 0, policy_version 791120 (0.0014) [2024-06-15 21:12:54,210][1652491] Updated weights for policy 0, policy_version 791187 (0.0012) [2024-06-15 21:12:55,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48059.6, 300 sec: 47319.2). Total num frames: 1620443136. Throughput: 0: 11628.1. Samples: 405151232. 
Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:12:55,956][1648985] Avg episode reward: [(0, '161.790')] [2024-06-15 21:12:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000791232_1620443136.pth... [2024-06-15 21:12:56,009][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000785696_1609105408.pth [2024-06-15 21:12:57,470][1652491] Updated weights for policy 0, policy_version 791248 (0.0013) [2024-06-15 21:13:00,955][1648985] Fps is (10 sec: 39320.3, 60 sec: 45875.1, 300 sec: 46655.1). Total num frames: 1620574208. Throughput: 0: 11605.3. Samples: 405221888. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:13:00,956][1648985] Avg episode reward: [(0, '163.190')] [2024-06-15 21:13:01,590][1652491] Updated weights for policy 0, policy_version 791321 (0.0013) [2024-06-15 21:13:03,563][1652491] Updated weights for policy 0, policy_version 791376 (0.0013) [2024-06-15 21:13:04,664][1652491] Updated weights for policy 0, policy_version 791422 (0.0013) [2024-06-15 21:13:05,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 46967.6, 300 sec: 47541.4). Total num frames: 1620901888. Throughput: 0: 11582.6. Samples: 405290496. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:13:05,955][1648985] Avg episode reward: [(0, '165.530')] [2024-06-15 21:13:06,468][1652491] Updated weights for policy 0, policy_version 791488 (0.0018) [2024-06-15 21:13:08,442][1651469] Signal inference workers to stop experience collection... (41150 times) [2024-06-15 21:13:08,493][1652491] InferenceWorker_p0-w0: stopping experience collection (41150 times) [2024-06-15 21:13:08,642][1651469] Signal inference workers to resume experience collection... (41150 times) [2024-06-15 21:13:08,643][1652491] InferenceWorker_p0-w0: resuming experience collection (41150 times) [2024-06-15 21:13:10,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 1621098496. 
Throughput: 0: 11662.2. Samples: 405328896. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:13:10,956][1648985] Avg episode reward: [(0, '154.770')] [2024-06-15 21:13:12,736][1652491] Updated weights for policy 0, policy_version 791569 (0.0014) [2024-06-15 21:13:15,620][1652491] Updated weights for policy 0, policy_version 791648 (0.0013) [2024-06-15 21:13:15,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 45875.1, 300 sec: 47319.2). Total num frames: 1621295104. Throughput: 0: 11628.1. Samples: 405402112. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:13:15,956][1648985] Avg episode reward: [(0, '161.780')] [2024-06-15 21:13:18,104][1652491] Updated weights for policy 0, policy_version 791737 (0.0012) [2024-06-15 21:13:20,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 1621557248. Throughput: 0: 11548.5. Samples: 405464576. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:13:20,956][1648985] Avg episode reward: [(0, '186.440')] [2024-06-15 21:13:21,029][1652491] Updated weights for policy 0, policy_version 791792 (0.0040) [2024-06-15 21:13:25,073][1652491] Updated weights for policy 0, policy_version 791856 (0.0146) [2024-06-15 21:13:25,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45875.2, 300 sec: 47097.2). Total num frames: 1621753856. Throughput: 0: 11685.0. Samples: 405501440. Policy #0 lag: (min: 7.0, avg: 112.4, max: 263.0) [2024-06-15 21:13:25,956][1648985] Avg episode reward: [(0, '199.270')] [2024-06-15 21:13:27,325][1652491] Updated weights for policy 0, policy_version 791904 (0.0125) [2024-06-15 21:13:28,987][1652491] Updated weights for policy 0, policy_version 791970 (0.0011) [2024-06-15 21:13:30,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.2, 300 sec: 47097.0). Total num frames: 1622016000. Throughput: 0: 11673.6. Samples: 405565440. 
Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:13:30,956][1648985] Avg episode reward: [(0, '199.030')] [2024-06-15 21:13:31,783][1652491] Updated weights for policy 0, policy_version 792017 (0.0012) [2024-06-15 21:13:35,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 44783.0, 300 sec: 46652.7). Total num frames: 1622147072. Throughput: 0: 11605.3. Samples: 405638144. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:13:35,955][1648985] Avg episode reward: [(0, '175.830')] [2024-06-15 21:13:36,029][1652491] Updated weights for policy 0, policy_version 792080 (0.0013) [2024-06-15 21:13:37,194][1652491] Updated weights for policy 0, policy_version 792126 (0.0014) [2024-06-15 21:13:39,101][1652491] Updated weights for policy 0, policy_version 792178 (0.0014) [2024-06-15 21:13:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 47513.6, 300 sec: 47208.2). Total num frames: 1622507520. Throughput: 0: 11571.2. Samples: 405671936. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:13:40,956][1648985] Avg episode reward: [(0, '178.020')] [2024-06-15 21:13:41,021][1652491] Updated weights for policy 0, policy_version 792256 (0.0010) [2024-06-15 21:13:43,802][1652491] Updated weights for policy 0, policy_version 792311 (0.0013) [2024-06-15 21:13:45,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1622671360. Throughput: 0: 11400.5. Samples: 405734912. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:13:45,956][1648985] Avg episode reward: [(0, '185.240')] [2024-06-15 21:13:48,753][1652491] Updated weights for policy 0, policy_version 792368 (0.0012) [2024-06-15 21:13:49,985][1652491] Updated weights for policy 0, policy_version 792416 (0.0011) [2024-06-15 21:13:50,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45875.1, 300 sec: 47097.1). Total num frames: 1622933504. Throughput: 0: 11400.5. Samples: 405803520. 
Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:13:50,956][1648985] Avg episode reward: [(0, '189.900')] [2024-06-15 21:13:51,116][1651469] Signal inference workers to stop experience collection... (41200 times) [2024-06-15 21:13:51,153][1652491] InferenceWorker_p0-w0: stopping experience collection (41200 times) [2024-06-15 21:13:51,382][1651469] Signal inference workers to resume experience collection... (41200 times) [2024-06-15 21:13:51,383][1652491] InferenceWorker_p0-w0: resuming experience collection (41200 times) [2024-06-15 21:13:51,863][1652491] Updated weights for policy 0, policy_version 792499 (0.0141) [2024-06-15 21:13:54,604][1652491] Updated weights for policy 0, policy_version 792528 (0.0012) [2024-06-15 21:13:55,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45875.4, 300 sec: 46652.8). Total num frames: 1623195648. Throughput: 0: 11389.2. Samples: 405841408. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:13:55,955][1648985] Avg episode reward: [(0, '168.940')] [2024-06-15 21:14:00,167][1652491] Updated weights for policy 0, policy_version 792635 (0.0017) [2024-06-15 21:14:00,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.5, 300 sec: 46763.8). Total num frames: 1623359488. Throughput: 0: 11264.0. Samples: 405908992. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:14:00,956][1648985] Avg episode reward: [(0, '175.230')] [2024-06-15 21:14:01,385][1652491] Updated weights for policy 0, policy_version 792677 (0.0012) [2024-06-15 21:14:02,134][1652491] Updated weights for policy 0, policy_version 792706 (0.0015) [2024-06-15 21:14:03,408][1652491] Updated weights for policy 0, policy_version 792764 (0.0012) [2024-06-15 21:14:05,940][1652491] Updated weights for policy 0, policy_version 792800 (0.0012) [2024-06-15 21:14:05,956][1648985] Fps is (10 sec: 45874.3, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 1623654400. Throughput: 0: 11593.9. Samples: 405986304. 
Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:14:05,957][1648985] Avg episode reward: [(0, '172.580')] [2024-06-15 21:14:10,616][1652491] Updated weights for policy 0, policy_version 792868 (0.0012) [2024-06-15 21:14:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45329.1, 300 sec: 46763.8). Total num frames: 1623818240. Throughput: 0: 11605.3. Samples: 406023680. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:14:10,956][1648985] Avg episode reward: [(0, '153.200')] [2024-06-15 21:14:12,305][1652491] Updated weights for policy 0, policy_version 792949 (0.0012) [2024-06-15 21:14:13,651][1652491] Updated weights for policy 0, policy_version 792992 (0.0010) [2024-06-15 21:14:15,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 46967.5, 300 sec: 46543.4). Total num frames: 1624113152. Throughput: 0: 11628.1. Samples: 406088704. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:14:15,956][1648985] Avg episode reward: [(0, '167.200')] [2024-06-15 21:14:16,931][1652491] Updated weights for policy 0, policy_version 793060 (0.0014) [2024-06-15 21:14:20,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 44782.8, 300 sec: 46652.7). Total num frames: 1624244224. Throughput: 0: 11775.9. Samples: 406168064. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:14:20,956][1648985] Avg episode reward: [(0, '157.900')] [2024-06-15 21:14:20,999][1652491] Updated weights for policy 0, policy_version 793093 (0.0012) [2024-06-15 21:14:22,617][1652491] Updated weights for policy 0, policy_version 793168 (0.0014) [2024-06-15 21:14:24,361][1652491] Updated weights for policy 0, policy_version 793232 (0.0012) [2024-06-15 21:14:25,524][1652491] Updated weights for policy 0, policy_version 793278 (0.0011) [2024-06-15 21:14:25,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 1624637440. Throughput: 0: 11719.1. Samples: 406199296. 
Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:14:25,955][1648985] Avg episode reward: [(0, '151.870')] [2024-06-15 21:14:28,042][1652491] Updated weights for policy 0, policy_version 793317 (0.0017) [2024-06-15 21:14:30,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1624768512. Throughput: 0: 11821.6. Samples: 406266880. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:14:30,956][1648985] Avg episode reward: [(0, '137.050')] [2024-06-15 21:14:32,113][1652491] Updated weights for policy 0, policy_version 793362 (0.0011) [2024-06-15 21:14:33,453][1652491] Updated weights for policy 0, policy_version 793429 (0.0014) [2024-06-15 21:14:33,706][1651469] Signal inference workers to stop experience collection... (41250 times) [2024-06-15 21:14:33,778][1652491] InferenceWorker_p0-w0: stopping experience collection (41250 times) [2024-06-15 21:14:33,938][1651469] Signal inference workers to resume experience collection... (41250 times) [2024-06-15 21:14:33,939][1652491] InferenceWorker_p0-w0: resuming experience collection (41250 times) [2024-06-15 21:14:35,469][1652491] Updated weights for policy 0, policy_version 793504 (0.0014) [2024-06-15 21:14:35,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 49698.1, 300 sec: 47208.1). Total num frames: 1625128960. Throughput: 0: 11901.1. Samples: 406339072. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:14:35,956][1648985] Avg episode reward: [(0, '138.410')] [2024-06-15 21:14:38,100][1652491] Updated weights for policy 0, policy_version 793554 (0.0117) [2024-06-15 21:14:40,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46421.4, 300 sec: 46652.7). Total num frames: 1625292800. Throughput: 0: 11901.1. Samples: 406376960. 
Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:14:40,956][1648985] Avg episode reward: [(0, '151.950')] [2024-06-15 21:14:42,415][1652491] Updated weights for policy 0, policy_version 793601 (0.0025) [2024-06-15 21:14:44,129][1652491] Updated weights for policy 0, policy_version 793667 (0.0012) [2024-06-15 21:14:45,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48606.0, 300 sec: 47208.1). Total num frames: 1625587712. Throughput: 0: 12128.7. Samples: 406454784. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:14:45,956][1648985] Avg episode reward: [(0, '160.560')] [2024-06-15 21:14:46,198][1652491] Updated weights for policy 0, policy_version 793760 (0.0109) [2024-06-15 21:14:48,803][1652491] Updated weights for policy 0, policy_version 793808 (0.0016) [2024-06-15 21:14:49,548][1652491] Updated weights for policy 0, policy_version 793854 (0.0097) [2024-06-15 21:14:50,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 1625817088. Throughput: 0: 12003.6. Samples: 406526464. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:14:50,955][1648985] Avg episode reward: [(0, '178.820')] [2024-06-15 21:14:55,374][1652491] Updated weights for policy 0, policy_version 793936 (0.0014) [2024-06-15 21:14:55,955][1648985] Fps is (10 sec: 42596.8, 60 sec: 46967.2, 300 sec: 47208.1). Total num frames: 1626013696. Throughput: 0: 12174.1. Samples: 406571520. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:14:55,956][1648985] Avg episode reward: [(0, '180.680')] [2024-06-15 21:14:56,337][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000793984_1626079232.pth... 
[2024-06-15 21:14:56,507][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000788464_1614774272.pth [2024-06-15 21:14:57,259][1652491] Updated weights for policy 0, policy_version 794017 (0.0014) [2024-06-15 21:14:59,802][1652491] Updated weights for policy 0, policy_version 794050 (0.0013) [2024-06-15 21:15:00,515][1652491] Updated weights for policy 0, policy_version 794096 (0.0013) [2024-06-15 21:15:00,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 49698.0, 300 sec: 47430.3). Total num frames: 1626341376. Throughput: 0: 12128.7. Samples: 406634496. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:15:00,956][1648985] Avg episode reward: [(0, '163.740')] [2024-06-15 21:15:05,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 1626406912. Throughput: 0: 12037.7. Samples: 406709760. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:15:05,956][1648985] Avg episode reward: [(0, '141.000')] [2024-06-15 21:15:06,166][1652491] Updated weights for policy 0, policy_version 794161 (0.0024) [2024-06-15 21:15:08,074][1652491] Updated weights for policy 0, policy_version 794240 (0.0013) [2024-06-15 21:15:09,463][1652491] Updated weights for policy 0, policy_version 794301 (0.0014) [2024-06-15 21:15:10,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 1626734592. Throughput: 0: 11878.4. Samples: 406733824. Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:15:10,956][1648985] Avg episode reward: [(0, '151.080')] [2024-06-15 21:15:12,699][1652491] Updated weights for policy 0, policy_version 794364 (0.0015) [2024-06-15 21:15:15,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 1626865664. Throughput: 0: 12026.2. Samples: 406808064. 
Policy #0 lag: (min: 111.0, avg: 206.6, max: 367.0) [2024-06-15 21:15:15,956][1648985] Avg episode reward: [(0, '167.680')] [2024-06-15 21:15:16,854][1651469] Signal inference workers to stop experience collection... (41300 times) [2024-06-15 21:15:16,892][1652491] InferenceWorker_p0-w0: stopping experience collection (41300 times) [2024-06-15 21:15:17,155][1651469] Signal inference workers to resume experience collection... (41300 times) [2024-06-15 21:15:17,156][1652491] InferenceWorker_p0-w0: resuming experience collection (41300 times) [2024-06-15 21:15:17,340][1652491] Updated weights for policy 0, policy_version 794403 (0.0013) [2024-06-15 21:15:18,531][1652491] Updated weights for policy 0, policy_version 794453 (0.0052) [2024-06-15 21:15:20,616][1652491] Updated weights for policy 0, policy_version 794532 (0.0014) [2024-06-15 21:15:20,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 49698.3, 300 sec: 47430.3). Total num frames: 1627226112. Throughput: 0: 11719.1. Samples: 406866432. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:15:20,955][1648985] Avg episode reward: [(0, '175.900')] [2024-06-15 21:15:23,801][1652491] Updated weights for policy 0, policy_version 794592 (0.0021) [2024-06-15 21:15:25,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1627389952. Throughput: 0: 11628.1. Samples: 406900224. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:15:25,956][1648985] Avg episode reward: [(0, '150.340')] [2024-06-15 21:15:28,780][1652491] Updated weights for policy 0, policy_version 794641 (0.0032) [2024-06-15 21:15:30,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1627586560. Throughput: 0: 11491.6. Samples: 406971904. 
Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:15:30,956][1648985] Avg episode reward: [(0, '154.090')] [2024-06-15 21:15:31,087][1652491] Updated weights for policy 0, policy_version 794736 (0.0013) [2024-06-15 21:15:32,712][1652491] Updated weights for policy 0, policy_version 794801 (0.0014) [2024-06-15 21:15:35,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45329.1, 300 sec: 46763.8). Total num frames: 1627848704. Throughput: 0: 11275.4. Samples: 407033856. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:15:35,956][1648985] Avg episode reward: [(0, '160.030')] [2024-06-15 21:15:35,986][1652491] Updated weights for policy 0, policy_version 794853 (0.0017) [2024-06-15 21:15:40,911][1652491] Updated weights for policy 0, policy_version 794898 (0.0012) [2024-06-15 21:15:40,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 1627947008. Throughput: 0: 11127.6. Samples: 407072256. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:15:40,956][1648985] Avg episode reward: [(0, '163.170')] [2024-06-15 21:15:42,705][1652491] Updated weights for policy 0, policy_version 794976 (0.0013) [2024-06-15 21:15:44,598][1652491] Updated weights for policy 0, policy_version 795045 (0.0021) [2024-06-15 21:15:45,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45329.0, 300 sec: 46986.0). Total num frames: 1628307456. Throughput: 0: 10922.7. Samples: 407126016. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:15:45,956][1648985] Avg episode reward: [(0, '164.300')] [2024-06-15 21:15:47,449][1652491] Updated weights for policy 0, policy_version 795104 (0.0013) [2024-06-15 21:15:50,955][1648985] Fps is (10 sec: 49150.6, 60 sec: 43690.4, 300 sec: 46430.6). Total num frames: 1628438528. Throughput: 0: 11025.0. Samples: 407205888. 
Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:15:50,956][1648985] Avg episode reward: [(0, '156.910')] [2024-06-15 21:15:52,344][1652491] Updated weights for policy 0, policy_version 795152 (0.0014) [2024-06-15 21:15:53,759][1652491] Updated weights for policy 0, policy_version 795205 (0.0013) [2024-06-15 21:15:54,970][1652491] Updated weights for policy 0, policy_version 795252 (0.0012) [2024-06-15 21:15:55,620][1651469] Signal inference workers to stop experience collection... (41350 times) [2024-06-15 21:15:55,654][1652491] InferenceWorker_p0-w0: stopping experience collection (41350 times) [2024-06-15 21:15:55,779][1651469] Signal inference workers to resume experience collection... (41350 times) [2024-06-15 21:15:55,780][1652491] InferenceWorker_p0-w0: resuming experience collection (41350 times) [2024-06-15 21:15:55,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45875.4, 300 sec: 47097.0). Total num frames: 1628766208. Throughput: 0: 11195.7. Samples: 407237632. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:15:55,956][1648985] Avg episode reward: [(0, '140.340')] [2024-06-15 21:15:56,308][1652491] Updated weights for policy 0, policy_version 795320 (0.0012) [2024-06-15 21:15:59,005][1652491] Updated weights for policy 0, policy_version 795382 (0.0012) [2024-06-15 21:16:00,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 43690.8, 300 sec: 46652.8). Total num frames: 1628962816. Throughput: 0: 11127.5. Samples: 407308800. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:16:00,956][1648985] Avg episode reward: [(0, '135.690')] [2024-06-15 21:16:03,411][1652491] Updated weights for policy 0, policy_version 795426 (0.0010) [2024-06-15 21:16:04,658][1652491] Updated weights for policy 0, policy_version 795473 (0.0018) [2024-06-15 21:16:05,935][1652491] Updated weights for policy 0, policy_version 795522 (0.0011) [2024-06-15 21:16:05,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.6, 300 sec: 47097.1). 
Total num frames: 1629224960. Throughput: 0: 11355.0. Samples: 407377408. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:16:05,956][1648985] Avg episode reward: [(0, '141.940')] [2024-06-15 21:16:07,127][1652491] Updated weights for policy 0, policy_version 795581 (0.0012) [2024-06-15 21:16:09,535][1652491] Updated weights for policy 0, policy_version 795632 (0.0020) [2024-06-15 21:16:10,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 1629487104. Throughput: 0: 11411.9. Samples: 407413760. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:16:10,956][1648985] Avg episode reward: [(0, '130.720')] [2024-06-15 21:16:14,064][1652491] Updated weights for policy 0, policy_version 795680 (0.0053) [2024-06-15 21:16:15,188][1652491] Updated weights for policy 0, policy_version 795728 (0.0017) [2024-06-15 21:16:15,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.6, 300 sec: 46874.9). Total num frames: 1629683712. Throughput: 0: 11559.8. Samples: 407492096. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:16:15,956][1648985] Avg episode reward: [(0, '145.670')] [2024-06-15 21:16:16,218][1652491] Updated weights for policy 0, policy_version 795775 (0.0011) [2024-06-15 21:16:17,499][1652491] Updated weights for policy 0, policy_version 795828 (0.0013) [2024-06-15 21:16:19,237][1652491] Updated weights for policy 0, policy_version 795856 (0.0011) [2024-06-15 21:16:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46421.2, 300 sec: 47097.1). Total num frames: 1630011392. Throughput: 0: 11878.4. Samples: 407568384. 
Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:16:20,956][1648985] Avg episode reward: [(0, '173.870')] [2024-06-15 21:16:23,424][1652491] Updated weights for policy 0, policy_version 795907 (0.0014) [2024-06-15 21:16:24,836][1652491] Updated weights for policy 0, policy_version 795960 (0.0021) [2024-06-15 21:16:25,940][1652491] Updated weights for policy 0, policy_version 796003 (0.0013) [2024-06-15 21:16:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 1630208000. Throughput: 0: 12037.7. Samples: 407613952. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:16:25,956][1648985] Avg episode reward: [(0, '179.020')] [2024-06-15 21:16:27,220][1652491] Updated weights for policy 0, policy_version 796051 (0.0014) [2024-06-15 21:16:29,992][1652491] Updated weights for policy 0, policy_version 796112 (0.0015) [2024-06-15 21:16:31,006][1648985] Fps is (10 sec: 52162.6, 60 sec: 49110.1, 300 sec: 47088.9). Total num frames: 1630535680. Throughput: 0: 12240.0. Samples: 407677440. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:16:31,007][1648985] Avg episode reward: [(0, '173.930')] [2024-06-15 21:16:33,822][1652491] Updated weights for policy 0, policy_version 796163 (0.0014) [2024-06-15 21:16:35,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 1630666752. Throughput: 0: 12242.6. Samples: 407756800. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:16:35,955][1648985] Avg episode reward: [(0, '155.020')] [2024-06-15 21:16:36,081][1652491] Updated weights for policy 0, policy_version 796240 (0.0013) [2024-06-15 21:16:36,589][1651469] Signal inference workers to stop experience collection... (41400 times) [2024-06-15 21:16:36,640][1652491] InferenceWorker_p0-w0: stopping experience collection (41400 times) [2024-06-15 21:16:36,847][1651469] Signal inference workers to resume experience collection... 
(41400 times) [2024-06-15 21:16:36,847][1652491] InferenceWorker_p0-w0: resuming experience collection (41400 times) [2024-06-15 21:16:37,230][1652491] Updated weights for policy 0, policy_version 796288 (0.0010) [2024-06-15 21:16:38,583][1652491] Updated weights for policy 0, policy_version 796344 (0.0011) [2024-06-15 21:16:40,249][1652491] Updated weights for policy 0, policy_version 796371 (0.0014) [2024-06-15 21:16:40,955][1648985] Fps is (10 sec: 49404.5, 60 sec: 51336.5, 300 sec: 47208.1). Total num frames: 1631027200. Throughput: 0: 12197.0. Samples: 407786496. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:16:40,956][1648985] Avg episode reward: [(0, '143.410')] [2024-06-15 21:16:45,831][1652491] Updated weights for policy 0, policy_version 796421 (0.0124) [2024-06-15 21:16:45,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1631059968. Throughput: 0: 12413.1. Samples: 407867392. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:16:45,956][1648985] Avg episode reward: [(0, '163.790')] [2024-06-15 21:16:47,777][1652491] Updated weights for policy 0, policy_version 796501 (0.0012) [2024-06-15 21:16:49,512][1652491] Updated weights for policy 0, policy_version 796565 (0.0170) [2024-06-15 21:16:50,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 50244.5, 300 sec: 47097.1). Total num frames: 1631453184. Throughput: 0: 12185.6. Samples: 407925760. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:16:50,955][1648985] Avg episode reward: [(0, '162.500')] [2024-06-15 21:16:51,496][1652491] Updated weights for policy 0, policy_version 796643 (0.0046) [2024-06-15 21:16:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 1631584256. Throughput: 0: 12242.5. Samples: 407964672. 
Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:16:55,956][1648985] Avg episode reward: [(0, '189.370')] [2024-06-15 21:16:55,989][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000796672_1631584256.pth... [2024-06-15 21:16:56,057][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000791232_1620443136.pth [2024-06-15 21:16:58,395][1652491] Updated weights for policy 0, policy_version 796691 (0.0013) [2024-06-15 21:17:00,524][1652491] Updated weights for policy 0, policy_version 796768 (0.0011) [2024-06-15 21:17:00,955][1648985] Fps is (10 sec: 32767.7, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 1631780864. Throughput: 0: 12026.3. Samples: 408033280. Policy #0 lag: (min: 15.0, avg: 87.2, max: 271.0) [2024-06-15 21:17:00,956][1648985] Avg episode reward: [(0, '168.490')] [2024-06-15 21:17:02,257][1652491] Updated weights for policy 0, policy_version 796832 (0.0011) [2024-06-15 21:17:03,976][1652491] Updated weights for policy 0, policy_version 796926 (0.0014) [2024-06-15 21:17:05,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.8, 300 sec: 46652.8). Total num frames: 1632108544. Throughput: 0: 11673.6. Samples: 408093696. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:17:05,956][1648985] Avg episode reward: [(0, '149.000')] [2024-06-15 21:17:10,701][1652491] Updated weights for policy 0, policy_version 796976 (0.0011) [2024-06-15 21:17:10,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 45329.1, 300 sec: 46319.5). Total num frames: 1632206848. Throughput: 0: 11537.1. Samples: 408133120. 
Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:17:10,955][1648985] Avg episode reward: [(0, '145.850')] [2024-06-15 21:17:12,861][1652491] Updated weights for policy 0, policy_version 797056 (0.0011) [2024-06-15 21:17:14,342][1652491] Updated weights for policy 0, policy_version 797109 (0.0011) [2024-06-15 21:17:14,989][1651469] Signal inference workers to stop experience collection... (41450 times) [2024-06-15 21:17:15,073][1652491] InferenceWorker_p0-w0: stopping experience collection (41450 times) [2024-06-15 21:17:15,240][1651469] Signal inference workers to resume experience collection... (41450 times) [2024-06-15 21:17:15,241][1652491] InferenceWorker_p0-w0: resuming experience collection (41450 times) [2024-06-15 21:17:15,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 49152.0, 300 sec: 46986.0). Total num frames: 1632632832. Throughput: 0: 11379.3. Samples: 408188928. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:17:15,956][1648985] Avg episode reward: [(0, '164.200')] [2024-06-15 21:17:20,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1632632832. Throughput: 0: 11320.9. Samples: 408266240. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:17:20,956][1648985] Avg episode reward: [(0, '178.370')] [2024-06-15 21:17:21,489][1652491] Updated weights for policy 0, policy_version 797186 (0.0015) [2024-06-15 21:17:22,929][1652491] Updated weights for policy 0, policy_version 797248 (0.0012) [2024-06-15 21:17:24,929][1652491] Updated weights for policy 0, policy_version 797331 (0.0011) [2024-06-15 21:17:25,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 1633026048. Throughput: 0: 11298.1. Samples: 408294912. 
Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:17:25,956][1648985] Avg episode reward: [(0, '191.020')] [2024-06-15 21:17:26,893][1652491] Updated weights for policy 0, policy_version 797408 (0.0109) [2024-06-15 21:17:30,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 43727.9, 300 sec: 46430.6). Total num frames: 1633157120. Throughput: 0: 10956.8. Samples: 408360448. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:17:30,956][1648985] Avg episode reward: [(0, '188.620')] [2024-06-15 21:17:33,695][1652491] Updated weights for policy 0, policy_version 797456 (0.0033) [2024-06-15 21:17:35,045][1652491] Updated weights for policy 0, policy_version 797503 (0.0012) [2024-06-15 21:17:35,955][1648985] Fps is (10 sec: 32768.3, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 1633353728. Throughput: 0: 11195.7. Samples: 408429568. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:17:35,956][1648985] Avg episode reward: [(0, '189.530')] [2024-06-15 21:17:36,003][1652491] Updated weights for policy 0, policy_version 797540 (0.0011) [2024-06-15 21:17:37,611][1652491] Updated weights for policy 0, policy_version 797603 (0.0019) [2024-06-15 21:17:39,467][1652491] Updated weights for policy 0, policy_version 797696 (0.0013) [2024-06-15 21:17:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 44236.8, 300 sec: 46652.7). Total num frames: 1633681408. Throughput: 0: 10922.7. Samples: 408456192. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:17:40,956][1648985] Avg episode reward: [(0, '194.210')] [2024-06-15 21:17:45,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 44236.9, 300 sec: 45875.2). Total num frames: 1633714176. Throughput: 0: 11093.4. Samples: 408532480. 
Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:17:45,956][1648985] Avg episode reward: [(0, '171.510')] [2024-06-15 21:17:46,854][1652491] Updated weights for policy 0, policy_version 797751 (0.0110) [2024-06-15 21:17:48,487][1652491] Updated weights for policy 0, policy_version 797824 (0.0010) [2024-06-15 21:17:50,036][1652491] Updated weights for policy 0, policy_version 797903 (0.0141) [2024-06-15 21:17:50,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45329.0, 300 sec: 46541.7). Total num frames: 1634172928. Throughput: 0: 11025.0. Samples: 408589824. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:17:50,956][1648985] Avg episode reward: [(0, '179.450')] [2024-06-15 21:17:51,064][1652491] Updated weights for policy 0, policy_version 797952 (0.0019) [2024-06-15 21:17:55,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 1634205696. Throughput: 0: 11002.3. Samples: 408628224. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:17:55,955][1648985] Avg episode reward: [(0, '166.390')] [2024-06-15 21:17:58,026][1651469] Signal inference workers to stop experience collection... (41500 times) [2024-06-15 21:17:58,080][1652491] InferenceWorker_p0-w0: stopping experience collection (41500 times) [2024-06-15 21:17:58,288][1651469] Signal inference workers to resume experience collection... (41500 times) [2024-06-15 21:17:58,289][1652491] InferenceWorker_p0-w0: resuming experience collection (41500 times) [2024-06-15 21:17:58,481][1652491] Updated weights for policy 0, policy_version 798018 (0.0094) [2024-06-15 21:18:00,160][1652491] Updated weights for policy 0, policy_version 798084 (0.0010) [2024-06-15 21:18:00,956][1648985] Fps is (10 sec: 36043.4, 60 sec: 45874.9, 300 sec: 46208.4). Total num frames: 1634533376. Throughput: 0: 11263.9. Samples: 408695808. 
Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:18:00,957][1648985] Avg episode reward: [(0, '164.970')] [2024-06-15 21:18:02,101][1652491] Updated weights for policy 0, policy_version 798170 (0.0208) [2024-06-15 21:18:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1634729984. Throughput: 0: 10979.6. Samples: 408760320. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:18:05,956][1648985] Avg episode reward: [(0, '155.030')] [2024-06-15 21:18:08,992][1652491] Updated weights for policy 0, policy_version 798224 (0.0016) [2024-06-15 21:18:10,955][1648985] Fps is (10 sec: 36046.5, 60 sec: 44782.9, 300 sec: 46097.4). Total num frames: 1634893824. Throughput: 0: 11320.9. Samples: 408804352. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:18:10,956][1648985] Avg episode reward: [(0, '140.970')] [2024-06-15 21:18:11,503][1652491] Updated weights for policy 0, policy_version 798305 (0.0012) [2024-06-15 21:18:13,036][1652491] Updated weights for policy 0, policy_version 798371 (0.0013) [2024-06-15 21:18:14,444][1652491] Updated weights for policy 0, policy_version 798448 (0.0102) [2024-06-15 21:18:15,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 46430.6). Total num frames: 1635254272. Throughput: 0: 10979.6. Samples: 408854528. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:18:15,956][1648985] Avg episode reward: [(0, '148.350')] [2024-06-15 21:18:20,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 44783.0, 300 sec: 45986.3). Total num frames: 1635319808. Throughput: 0: 11275.4. Samples: 408936960. 
Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:18:20,955][1648985] Avg episode reward: [(0, '140.110')] [2024-06-15 21:18:20,957][1652491] Updated weights for policy 0, policy_version 798498 (0.0026) [2024-06-15 21:18:22,518][1652491] Updated weights for policy 0, policy_version 798560 (0.0026) [2024-06-15 21:18:24,992][1652491] Updated weights for policy 0, policy_version 798656 (0.0013) [2024-06-15 21:18:25,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 1635713024. Throughput: 0: 11320.9. Samples: 408965632. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:18:25,956][1648985] Avg episode reward: [(0, '152.790')] [2024-06-15 21:18:30,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 43690.8, 300 sec: 46208.5). Total num frames: 1635778560. Throughput: 0: 11127.5. Samples: 409033216. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:18:30,955][1648985] Avg episode reward: [(0, '151.600')] [2024-06-15 21:18:31,670][1652491] Updated weights for policy 0, policy_version 798721 (0.0014) [2024-06-15 21:18:33,562][1652491] Updated weights for policy 0, policy_version 798787 (0.0015) [2024-06-15 21:18:35,194][1651469] Signal inference workers to stop experience collection... (41550 times) [2024-06-15 21:18:35,224][1652491] InferenceWorker_p0-w0: stopping experience collection (41550 times) [2024-06-15 21:18:35,371][1651469] Signal inference workers to resume experience collection... (41550 times) [2024-06-15 21:18:35,371][1652491] InferenceWorker_p0-w0: resuming experience collection (41550 times) [2024-06-15 21:18:35,520][1652491] Updated weights for policy 0, policy_version 798867 (0.0021) [2024-06-15 21:18:35,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46421.4, 300 sec: 46208.5). Total num frames: 1636139008. Throughput: 0: 11116.1. Samples: 409090048. 
Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:18:35,956][1648985] Avg episode reward: [(0, '167.200')] [2024-06-15 21:18:38,109][1652491] Updated weights for policy 0, policy_version 798960 (0.0024) [2024-06-15 21:18:40,955][1648985] Fps is (10 sec: 52426.9, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1636302848. Throughput: 0: 11036.4. Samples: 409124864. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:18:40,956][1648985] Avg episode reward: [(0, '166.360')] [2024-06-15 21:18:44,599][1652491] Updated weights for policy 0, policy_version 799008 (0.0010) [2024-06-15 21:18:45,955][1648985] Fps is (10 sec: 32768.1, 60 sec: 45875.3, 300 sec: 45875.2). Total num frames: 1636466688. Throughput: 0: 11252.8. Samples: 409202176. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:18:45,955][1648985] Avg episode reward: [(0, '164.840')] [2024-06-15 21:18:46,636][1652491] Updated weights for policy 0, policy_version 799091 (0.0120) [2024-06-15 21:18:47,807][1652491] Updated weights for policy 0, policy_version 799141 (0.0014) [2024-06-15 21:18:49,178][1652491] Updated weights for policy 0, policy_version 799188 (0.0012) [2024-06-15 21:18:50,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 1636827136. Throughput: 0: 11150.2. Samples: 409262080. Policy #0 lag: (min: 63.0, avg: 204.6, max: 287.0) [2024-06-15 21:18:50,956][1648985] Avg episode reward: [(0, '161.080')] [2024-06-15 21:18:55,816][1652491] Updated weights for policy 0, policy_version 799248 (0.0012) [2024-06-15 21:18:55,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 44236.7, 300 sec: 45764.1). Total num frames: 1636859904. Throughput: 0: 11081.9. Samples: 409303040. 
Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:18:55,956][1648985] Avg episode reward: [(0, '157.230')] [2024-06-15 21:18:56,571][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000799280_1636925440.pth... [2024-06-15 21:18:56,715][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000793984_1626079232.pth [2024-06-15 21:18:57,602][1652491] Updated weights for policy 0, policy_version 799330 (0.0091) [2024-06-15 21:18:59,176][1652491] Updated weights for policy 0, policy_version 799416 (0.0115) [2024-06-15 21:19:00,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45329.3, 300 sec: 46097.4). Total num frames: 1637253120. Throughput: 0: 11366.4. Samples: 409366016. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:19:00,956][1648985] Avg episode reward: [(0, '155.090')] [2024-06-15 21:19:01,566][1652491] Updated weights for policy 0, policy_version 799480 (0.0013) [2024-06-15 21:19:05,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 43690.7, 300 sec: 45875.2). Total num frames: 1637351424. Throughput: 0: 11184.3. Samples: 409440256. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:19:05,956][1648985] Avg episode reward: [(0, '156.640')] [2024-06-15 21:19:07,955][1652491] Updated weights for policy 0, policy_version 799541 (0.0014) [2024-06-15 21:19:08,703][1652491] Updated weights for policy 0, policy_version 799584 (0.0013) [2024-06-15 21:19:10,048][1652491] Updated weights for policy 0, policy_version 799649 (0.0097) [2024-06-15 21:19:10,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 1637744640. Throughput: 0: 11286.8. Samples: 409473536. 
Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:19:10,955][1648985] Avg episode reward: [(0, '162.380')] [2024-06-15 21:19:12,054][1652491] Updated weights for policy 0, policy_version 799699 (0.0012) [2024-06-15 21:19:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 46208.5). Total num frames: 1637875712. Throughput: 0: 11502.9. Samples: 409550848. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:19:15,956][1648985] Avg episode reward: [(0, '172.010')] [2024-06-15 21:19:17,117][1652491] Updated weights for policy 0, policy_version 799760 (0.0012) [2024-06-15 21:19:17,776][1651469] Signal inference workers to stop experience collection... (41600 times) [2024-06-15 21:19:17,859][1652491] InferenceWorker_p0-w0: stopping experience collection (41600 times) [2024-06-15 21:19:18,113][1651469] Signal inference workers to resume experience collection... (41600 times) [2024-06-15 21:19:18,115][1652491] InferenceWorker_p0-w0: resuming experience collection (41600 times) [2024-06-15 21:19:18,704][1652491] Updated weights for policy 0, policy_version 799811 (0.0035) [2024-06-15 21:19:19,796][1652491] Updated weights for policy 0, policy_version 799872 (0.0013) [2024-06-15 21:19:20,957][1648985] Fps is (10 sec: 45867.4, 60 sec: 48058.3, 300 sec: 45986.0). Total num frames: 1638203392. Throughput: 0: 11718.7. Samples: 409617408. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:19:20,957][1648985] Avg episode reward: [(0, '166.280')] [2024-06-15 21:19:21,185][1652491] Updated weights for policy 0, policy_version 799930 (0.0016) [2024-06-15 21:19:23,311][1652491] Updated weights for policy 0, policy_version 799994 (0.0013) [2024-06-15 21:19:25,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 1638400000. Throughput: 0: 11707.8. Samples: 409651712. 
Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:19:25,956][1648985] Avg episode reward: [(0, '166.090')] [2024-06-15 21:19:28,685][1652491] Updated weights for policy 0, policy_version 800040 (0.0013) [2024-06-15 21:19:30,281][1652491] Updated weights for policy 0, policy_version 800097 (0.0015) [2024-06-15 21:19:30,955][1648985] Fps is (10 sec: 45882.0, 60 sec: 48059.5, 300 sec: 45875.2). Total num frames: 1638662144. Throughput: 0: 11684.9. Samples: 409728000. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:19:30,956][1648985] Avg episode reward: [(0, '158.580')] [2024-06-15 21:19:31,763][1652491] Updated weights for policy 0, policy_version 800162 (0.0100) [2024-06-15 21:19:33,696][1652491] Updated weights for policy 0, policy_version 800225 (0.0013) [2024-06-15 21:19:35,974][1648985] Fps is (10 sec: 52327.6, 60 sec: 46406.3, 300 sec: 46205.4). Total num frames: 1638924288. Throughput: 0: 11861.9. Samples: 409796096. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:19:35,975][1648985] Avg episode reward: [(0, '175.410')] [2024-06-15 21:19:39,797][1652491] Updated weights for policy 0, policy_version 800288 (0.0010) [2024-06-15 21:19:40,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 45875.3, 300 sec: 45653.0). Total num frames: 1639055360. Throughput: 0: 11992.2. Samples: 409842688. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:19:40,956][1648985] Avg episode reward: [(0, '168.020')] [2024-06-15 21:19:41,199][1652491] Updated weights for policy 0, policy_version 800343 (0.0012) [2024-06-15 21:19:42,523][1652491] Updated weights for policy 0, policy_version 800400 (0.0013) [2024-06-15 21:19:44,153][1652491] Updated weights for policy 0, policy_version 800464 (0.0042) [2024-06-15 21:19:45,200][1652491] Updated weights for policy 0, policy_version 800509 (0.0018) [2024-06-15 21:19:45,955][1648985] Fps is (10 sec: 52530.8, 60 sec: 49698.1, 300 sec: 46208.4). Total num frames: 1639448576. 
Throughput: 0: 11867.1. Samples: 409900032. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:19:45,956][1648985] Avg episode reward: [(0, '166.060')] [2024-06-15 21:19:50,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 44236.8, 300 sec: 45653.1). Total num frames: 1639481344. Throughput: 0: 12140.1. Samples: 409986560. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:19:50,956][1648985] Avg episode reward: [(0, '147.320')] [2024-06-15 21:19:51,473][1652491] Updated weights for policy 0, policy_version 800560 (0.0013) [2024-06-15 21:19:52,389][1652491] Updated weights for policy 0, policy_version 800600 (0.0020) [2024-06-15 21:19:54,026][1652491] Updated weights for policy 0, policy_version 800688 (0.0012) [2024-06-15 21:19:54,133][1651469] Signal inference workers to stop experience collection... (41650 times) [2024-06-15 21:19:54,248][1652491] InferenceWorker_p0-w0: stopping experience collection (41650 times) [2024-06-15 21:19:54,438][1651469] Signal inference workers to resume experience collection... (41650 times) [2024-06-15 21:19:54,450][1652491] InferenceWorker_p0-w0: resuming experience collection (41650 times) [2024-06-15 21:19:55,613][1652491] Updated weights for policy 0, policy_version 800758 (0.0024) [2024-06-15 21:19:55,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 51882.8, 300 sec: 46208.5). Total num frames: 1639972864. Throughput: 0: 12094.6. Samples: 410017792. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:19:55,956][1648985] Avg episode reward: [(0, '144.910')] [2024-06-15 21:20:00,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45329.0, 300 sec: 45986.3). Total num frames: 1639972864. Throughput: 0: 12060.4. Samples: 410093568. 
Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:20:00,956][1648985] Avg episode reward: [(0, '145.860')] [2024-06-15 21:20:02,000][1652491] Updated weights for policy 0, policy_version 800804 (0.0015) [2024-06-15 21:20:03,228][1652491] Updated weights for policy 0, policy_version 800869 (0.0015) [2024-06-15 21:20:04,513][1652491] Updated weights for policy 0, policy_version 800928 (0.0029) [2024-06-15 21:20:05,962][1648985] Fps is (10 sec: 45842.0, 60 sec: 51330.4, 300 sec: 46429.5). Total num frames: 1640431616. Throughput: 0: 12013.4. Samples: 410158080. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:20:05,963][1648985] Avg episode reward: [(0, '162.660')] [2024-06-15 21:20:06,434][1652491] Updated weights for policy 0, policy_version 801018 (0.0019) [2024-06-15 21:20:10,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 45875.2, 300 sec: 46208.5). Total num frames: 1640497152. Throughput: 0: 12219.7. Samples: 410201600. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:20:10,956][1648985] Avg episode reward: [(0, '169.460')] [2024-06-15 21:20:13,565][1652491] Updated weights for policy 0, policy_version 801088 (0.0014) [2024-06-15 21:20:14,754][1652491] Updated weights for policy 0, policy_version 801140 (0.0106) [2024-06-15 21:20:15,955][1648985] Fps is (10 sec: 42629.4, 60 sec: 49698.2, 300 sec: 46208.4). Total num frames: 1640857600. Throughput: 0: 12151.5. Samples: 410274816. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:20:15,955][1648985] Avg episode reward: [(0, '165.390')] [2024-06-15 21:20:16,296][1652491] Updated weights for policy 0, policy_version 801220 (0.0019) [2024-06-15 21:20:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46968.8, 300 sec: 46208.4). Total num frames: 1641021440. Throughput: 0: 12327.4. Samples: 410350592. 
Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:20:20,956][1648985] Avg episode reward: [(0, '166.360')] [2024-06-15 21:20:23,302][1652491] Updated weights for policy 0, policy_version 801297 (0.0012) [2024-06-15 21:20:25,580][1652491] Updated weights for policy 0, policy_version 801392 (0.0013) [2024-06-15 21:20:25,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 1641283584. Throughput: 0: 12105.9. Samples: 410387456. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:20:25,956][1648985] Avg episode reward: [(0, '176.500')] [2024-06-15 21:20:26,854][1652491] Updated weights for policy 0, policy_version 801457 (0.0014) [2024-06-15 21:20:27,940][1652491] Updated weights for policy 0, policy_version 801520 (0.0014) [2024-06-15 21:20:30,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.9, 300 sec: 46430.6). Total num frames: 1641545728. Throughput: 0: 12219.7. Samples: 410449920. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:20:30,955][1648985] Avg episode reward: [(0, '167.320')] [2024-06-15 21:20:33,953][1651469] Signal inference workers to stop experience collection... (41700 times) [2024-06-15 21:20:33,993][1652491] InferenceWorker_p0-w0: stopping experience collection (41700 times) [2024-06-15 21:20:34,246][1651469] Signal inference workers to resume experience collection... (41700 times) [2024-06-15 21:20:34,247][1652491] InferenceWorker_p0-w0: resuming experience collection (41700 times) [2024-06-15 21:20:34,457][1652491] Updated weights for policy 0, policy_version 801569 (0.0016) [2024-06-15 21:20:35,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46982.6, 300 sec: 46763.8). Total num frames: 1641742336. Throughput: 0: 12014.9. Samples: 410527232. 
Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:20:35,956][1648985] Avg episode reward: [(0, '184.660')] [2024-06-15 21:20:36,045][1652491] Updated weights for policy 0, policy_version 801648 (0.0013) [2024-06-15 21:20:37,495][1652491] Updated weights for policy 0, policy_version 801712 (0.0012) [2024-06-15 21:20:38,887][1652491] Updated weights for policy 0, policy_version 801776 (0.0013) [2024-06-15 21:20:40,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 50244.1, 300 sec: 46652.7). Total num frames: 1642070016. Throughput: 0: 11992.1. Samples: 410557440. Policy #0 lag: (min: 15.0, avg: 72.3, max: 271.0) [2024-06-15 21:20:40,956][1648985] Avg episode reward: [(0, '158.540')] [2024-06-15 21:20:45,960][1648985] Fps is (10 sec: 39303.3, 60 sec: 44779.4, 300 sec: 46429.9). Total num frames: 1642135552. Throughput: 0: 12195.7. Samples: 410642432. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:20:45,960][1648985] Avg episode reward: [(0, '147.090')] [2024-06-15 21:20:46,057][1652491] Updated weights for policy 0, policy_version 801840 (0.0013) [2024-06-15 21:20:48,210][1652491] Updated weights for policy 0, policy_version 801922 (0.0107) [2024-06-15 21:20:49,476][1652491] Updated weights for policy 0, policy_version 801984 (0.0013) [2024-06-15 21:20:50,346][1652491] Updated weights for policy 0, policy_version 802032 (0.0027) [2024-06-15 21:20:50,955][1648985] Fps is (10 sec: 52430.5, 60 sec: 51882.8, 300 sec: 46874.9). Total num frames: 1642594304. Throughput: 0: 12051.0. Samples: 410700288. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:20:50,955][1648985] Avg episode reward: [(0, '143.250')] [2024-06-15 21:20:55,962][1648985] Fps is (10 sec: 45864.1, 60 sec: 43685.5, 300 sec: 46207.3). Total num frames: 1642594304. Throughput: 0: 12001.7. Samples: 410741760. 
Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:20:55,963][1648985] Avg episode reward: [(0, '136.740')] [2024-06-15 21:20:56,448][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000802080_1642659840.pth... [2024-06-15 21:20:56,596][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000796672_1631584256.pth [2024-06-15 21:20:57,299][1652491] Updated weights for policy 0, policy_version 802112 (0.0099) [2024-06-15 21:20:59,236][1652491] Updated weights for policy 0, policy_version 802192 (0.0013) [2024-06-15 21:21:00,920][1652491] Updated weights for policy 0, policy_version 802256 (0.0011) [2024-06-15 21:21:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 50790.5, 300 sec: 46763.8). Total num frames: 1643020288. Throughput: 0: 11741.9. Samples: 410803200. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:21:00,956][1648985] Avg episode reward: [(0, '146.900')] [2024-06-15 21:21:01,799][1652491] Updated weights for policy 0, policy_version 802303 (0.0013) [2024-06-15 21:21:05,955][1648985] Fps is (10 sec: 52465.9, 60 sec: 44788.3, 300 sec: 46208.4). Total num frames: 1643118592. Throughput: 0: 11650.8. Samples: 410874880. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:21:05,956][1648985] Avg episode reward: [(0, '142.740')] [2024-06-15 21:21:08,530][1652491] Updated weights for policy 0, policy_version 802352 (0.0017) [2024-06-15 21:21:09,544][1651469] Signal inference workers to stop experience collection... (41750 times) [2024-06-15 21:21:09,601][1652491] InferenceWorker_p0-w0: stopping experience collection (41750 times) [2024-06-15 21:21:09,746][1651469] Signal inference workers to resume experience collection... 
(41750 times) [2024-06-15 21:21:09,747][1652491] InferenceWorker_p0-w0: resuming experience collection (41750 times) [2024-06-15 21:21:09,749][1652491] Updated weights for policy 0, policy_version 802416 (0.0012) [2024-06-15 21:21:10,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 49151.9, 300 sec: 46652.7). Total num frames: 1643446272. Throughput: 0: 11776.0. Samples: 410917376. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:21:10,956][1648985] Avg episode reward: [(0, '161.770')] [2024-06-15 21:21:11,898][1652491] Updated weights for policy 0, policy_version 802496 (0.0024) [2024-06-15 21:21:15,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 46421.2, 300 sec: 46208.4). Total num frames: 1643642880. Throughput: 0: 11662.2. Samples: 410974720. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:21:15,956][1648985] Avg episode reward: [(0, '172.270')] [2024-06-15 21:21:19,144][1652491] Updated weights for policy 0, policy_version 802576 (0.0014) [2024-06-15 21:21:20,250][1652491] Updated weights for policy 0, policy_version 802627 (0.0012) [2024-06-15 21:21:20,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 1643839488. Throughput: 0: 11548.4. Samples: 411046912. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:21:20,956][1648985] Avg episode reward: [(0, '182.300')] [2024-06-15 21:21:21,712][1652491] Updated weights for policy 0, policy_version 802691 (0.0046) [2024-06-15 21:21:23,836][1652491] Updated weights for policy 0, policy_version 802784 (0.0015) [2024-06-15 21:21:25,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48059.8, 300 sec: 46216.4). Total num frames: 1644167168. Throughput: 0: 11434.7. Samples: 411072000. 
Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:21:25,956][1648985] Avg episode reward: [(0, '174.520')] [2024-06-15 21:21:30,263][1652491] Updated weights for policy 0, policy_version 802817 (0.0014) [2024-06-15 21:21:30,955][1648985] Fps is (10 sec: 36045.3, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 1644199936. Throughput: 0: 11379.0. Samples: 411154432. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:21:30,955][1648985] Avg episode reward: [(0, '179.950')] [2024-06-15 21:21:32,393][1652491] Updated weights for policy 0, policy_version 802903 (0.0013) [2024-06-15 21:21:33,911][1652491] Updated weights for policy 0, policy_version 802963 (0.0022) [2024-06-15 21:21:35,582][1652491] Updated weights for policy 0, policy_version 803027 (0.0014) [2024-06-15 21:21:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 48059.7, 300 sec: 46097.4). Total num frames: 1644625920. Throughput: 0: 11207.1. Samples: 411204608. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:21:35,956][1648985] Avg episode reward: [(0, '188.910')] [2024-06-15 21:21:40,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 1644691456. Throughput: 0: 11117.8. Samples: 411241984. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:21:40,956][1648985] Avg episode reward: [(0, '167.060')] [2024-06-15 21:21:42,513][1652491] Updated weights for policy 0, policy_version 803088 (0.0016) [2024-06-15 21:21:44,172][1652491] Updated weights for policy 0, policy_version 803173 (0.0016) [2024-06-15 21:21:45,787][1652491] Updated weights for policy 0, policy_version 803235 (0.0014) [2024-06-15 21:21:45,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 48063.4, 300 sec: 45986.3). Total num frames: 1645019136. Throughput: 0: 11389.1. Samples: 411315712. 
Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:21:45,956][1648985] Avg episode reward: [(0, '174.020')] [2024-06-15 21:21:47,100][1651469] Signal inference workers to stop experience collection... (41800 times) [2024-06-15 21:21:47,152][1652491] InferenceWorker_p0-w0: stopping experience collection (41800 times) [2024-06-15 21:21:47,304][1651469] Signal inference workers to resume experience collection... (41800 times) [2024-06-15 21:21:47,307][1652491] InferenceWorker_p0-w0: resuming experience collection (41800 times) [2024-06-15 21:21:47,648][1652491] Updated weights for policy 0, policy_version 803312 (0.0109) [2024-06-15 21:21:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 46208.5). Total num frames: 1645215744. Throughput: 0: 11264.0. Samples: 411381760. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:21:50,956][1648985] Avg episode reward: [(0, '149.110')] [2024-06-15 21:21:54,254][1652491] Updated weights for policy 0, policy_version 803346 (0.0012) [2024-06-15 21:21:55,732][1652491] Updated weights for policy 0, policy_version 803408 (0.0014) [2024-06-15 21:21:55,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 46426.8, 300 sec: 46097.4). Total num frames: 1645379584. Throughput: 0: 11252.6. Samples: 411423744. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:21:55,956][1648985] Avg episode reward: [(0, '160.040')] [2024-06-15 21:21:57,292][1652491] Updated weights for policy 0, policy_version 803472 (0.0109) [2024-06-15 21:21:59,424][1652491] Updated weights for policy 0, policy_version 803568 (0.0106) [2024-06-15 21:22:00,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45328.9, 300 sec: 46208.4). Total num frames: 1645740032. Throughput: 0: 11070.6. Samples: 411472896. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:22:00,956][1648985] Avg episode reward: [(0, '178.330')] [2024-06-15 21:22:05,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 43690.7, 300 sec: 45875.2). 
Total num frames: 1645740032. Throughput: 0: 11264.0. Samples: 411553792. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:22:05,956][1648985] Avg episode reward: [(0, '190.110')] [2024-06-15 21:22:07,510][1652491] Updated weights for policy 0, policy_version 803634 (0.0013) [2024-06-15 21:22:09,184][1652491] Updated weights for policy 0, policy_version 803702 (0.0071) [2024-06-15 21:22:10,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 44236.8, 300 sec: 45653.0). Total num frames: 1646100480. Throughput: 0: 11286.7. Samples: 411579904. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:22:10,956][1648985] Avg episode reward: [(0, '177.130')] [2024-06-15 21:22:11,073][1652491] Updated weights for policy 0, policy_version 803776 (0.0070) [2024-06-15 21:22:12,667][1652491] Updated weights for policy 0, policy_version 803838 (0.0013) [2024-06-15 21:22:15,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 1646264320. Throughput: 0: 10956.8. Samples: 411647488. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:22:15,956][1648985] Avg episode reward: [(0, '169.610')] [2024-06-15 21:22:19,225][1652491] Updated weights for policy 0, policy_version 803889 (0.0017) [2024-06-15 21:22:20,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 45653.1). Total num frames: 1646493696. Throughput: 0: 11400.5. Samples: 411717632. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:22:20,956][1648985] Avg episode reward: [(0, '153.600')] [2024-06-15 21:22:20,995][1652491] Updated weights for policy 0, policy_version 803968 (0.0016) [2024-06-15 21:22:23,960][1652491] Updated weights for policy 0, policy_version 804068 (0.0088) [2024-06-15 21:22:25,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 1646788608. Throughput: 0: 11002.3. Samples: 411737088. 
Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:22:25,955][1648985] Avg episode reward: [(0, '153.710')] [2024-06-15 21:22:30,887][1651469] Signal inference workers to stop experience collection... (41850 times) [2024-06-15 21:22:30,932][1652491] InferenceWorker_p0-w0: stopping experience collection (41850 times) [2024-06-15 21:22:30,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44782.9, 300 sec: 45875.2). Total num frames: 1646886912. Throughput: 0: 11116.1. Samples: 411815936. Policy #0 lag: (min: 7.0, avg: 71.3, max: 263.0) [2024-06-15 21:22:30,956][1648985] Avg episode reward: [(0, '146.700')] [2024-06-15 21:22:31,082][1651469] Signal inference workers to resume experience collection... (41850 times) [2024-06-15 21:22:31,083][1652491] InferenceWorker_p0-w0: resuming experience collection (41850 times) [2024-06-15 21:22:31,085][1652491] Updated weights for policy 0, policy_version 804160 (0.0029) [2024-06-15 21:22:33,345][1652491] Updated weights for policy 0, policy_version 804256 (0.0013) [2024-06-15 21:22:35,342][1652491] Updated weights for policy 0, policy_version 804320 (0.0168) [2024-06-15 21:22:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 44236.9, 300 sec: 46097.4). Total num frames: 1647280128. Throughput: 0: 10592.7. Samples: 411858432. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:22:35,955][1648985] Avg episode reward: [(0, '154.510')] [2024-06-15 21:22:40,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 43690.6, 300 sec: 46097.3). Total num frames: 1647312896. Throughput: 0: 10638.2. Samples: 411902464. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:22:40,956][1648985] Avg episode reward: [(0, '169.500')] [2024-06-15 21:22:42,734][1652491] Updated weights for policy 0, policy_version 804371 (0.0013) [2024-06-15 21:22:44,209][1652491] Updated weights for policy 0, policy_version 804435 (0.0012) [2024-06-15 21:22:45,955][1648985] Fps is (10 sec: 32767.5, 60 sec: 43144.6, 300 sec: 45542.0). 
Total num frames: 1647607808. Throughput: 0: 11127.5. Samples: 411973632. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:22:45,956][1648985] Avg episode reward: [(0, '182.600')] [2024-06-15 21:22:46,200][1652491] Updated weights for policy 0, policy_version 804512 (0.0012) [2024-06-15 21:22:47,800][1652491] Updated weights for policy 0, policy_version 804563 (0.0012) [2024-06-15 21:22:50,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1647837184. Throughput: 0: 10672.4. Samples: 412034048. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:22:50,955][1648985] Avg episode reward: [(0, '172.370')] [2024-06-15 21:22:55,349][1652491] Updated weights for policy 0, policy_version 804658 (0.0014) [2024-06-15 21:22:55,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 45653.1). Total num frames: 1648001024. Throughput: 0: 11025.0. Samples: 412076032. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:22:55,956][1648985] Avg episode reward: [(0, '168.860')] [2024-06-15 21:22:56,265][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000804704_1648033792.pth... [2024-06-15 21:22:56,462][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000799280_1636925440.pth [2024-06-15 21:22:57,367][1652491] Updated weights for policy 0, policy_version 804739 (0.0013) [2024-06-15 21:22:59,158][1652491] Updated weights for policy 0, policy_version 804816 (0.0016) [2024-06-15 21:23:00,196][1652491] Updated weights for policy 0, policy_version 804864 (0.0029) [2024-06-15 21:23:00,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1648361472. Throughput: 0: 10615.4. Samples: 412125184. 
Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:23:00,956][1648985] Avg episode reward: [(0, '166.390')] [2024-06-15 21:23:06,002][1648985] Fps is (10 sec: 42399.2, 60 sec: 44747.8, 300 sec: 45867.9). Total num frames: 1648427008. Throughput: 0: 10990.8. Samples: 412212736. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:23:06,003][1648985] Avg episode reward: [(0, '190.250')] [2024-06-15 21:23:07,658][1652491] Updated weights for policy 0, policy_version 804948 (0.0115) [2024-06-15 21:23:08,621][1651469] Signal inference workers to stop experience collection... (41900 times) [2024-06-15 21:23:08,656][1652491] InferenceWorker_p0-w0: stopping experience collection (41900 times) [2024-06-15 21:23:08,920][1651469] Signal inference workers to resume experience collection... (41900 times) [2024-06-15 21:23:08,921][1652491] InferenceWorker_p0-w0: resuming experience collection (41900 times) [2024-06-15 21:23:09,355][1652491] Updated weights for policy 0, policy_version 805027 (0.0075) [2024-06-15 21:23:10,541][1652491] Updated weights for policy 0, policy_version 805076 (0.0030) [2024-06-15 21:23:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45329.0, 300 sec: 45986.3). Total num frames: 1648820224. Throughput: 0: 11036.4. Samples: 412233728. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:23:10,956][1648985] Avg episode reward: [(0, '184.290')] [2024-06-15 21:23:11,255][1652491] Updated weights for policy 0, policy_version 805115 (0.0011) [2024-06-15 21:23:15,955][1648985] Fps is (10 sec: 49384.9, 60 sec: 44236.8, 300 sec: 46097.3). Total num frames: 1648918528. Throughput: 0: 11252.6. Samples: 412322304. 
Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:23:15,955][1648985] Avg episode reward: [(0, '199.790')] [2024-06-15 21:23:16,148][1652491] Updated weights for policy 0, policy_version 805152 (0.0011) [2024-06-15 21:23:17,784][1652491] Updated weights for policy 0, policy_version 805216 (0.0015) [2024-06-15 21:23:19,296][1652491] Updated weights for policy 0, policy_version 805280 (0.0020) [2024-06-15 21:23:20,750][1652491] Updated weights for policy 0, policy_version 805344 (0.0013) [2024-06-15 21:23:20,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 1649344512. Throughput: 0: 11571.2. Samples: 412379136. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:23:20,956][1648985] Avg episode reward: [(0, '194.960')] [2024-06-15 21:23:25,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1649410048. Throughput: 0: 11616.7. Samples: 412425216. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:23:25,955][1648985] Avg episode reward: [(0, '201.640')] [2024-06-15 21:23:26,980][1652491] Updated weights for policy 0, policy_version 805392 (0.0029) [2024-06-15 21:23:28,543][1652491] Updated weights for policy 0, policy_version 805451 (0.0013) [2024-06-15 21:23:30,385][1652491] Updated weights for policy 0, policy_version 805520 (0.0014) [2024-06-15 21:23:30,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 47513.5, 300 sec: 46097.3). Total num frames: 1649737728. Throughput: 0: 11593.9. Samples: 412495360. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:23:30,956][1648985] Avg episode reward: [(0, '187.080')] [2024-06-15 21:23:32,305][1652491] Updated weights for policy 0, policy_version 805616 (0.0016) [2024-06-15 21:23:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 44236.7, 300 sec: 46208.5). Total num frames: 1649934336. Throughput: 0: 11844.3. Samples: 412567040. 
Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:23:35,956][1648985] Avg episode reward: [(0, '188.780')] [2024-06-15 21:23:38,884][1652491] Updated weights for policy 0, policy_version 805696 (0.0012) [2024-06-15 21:23:40,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 48059.8, 300 sec: 46541.6). Total num frames: 1650196480. Throughput: 0: 11832.9. Samples: 412608512. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:23:40,956][1648985] Avg episode reward: [(0, '186.490')] [2024-06-15 21:23:41,476][1652491] Updated weights for policy 0, policy_version 805765 (0.0013) [2024-06-15 21:23:42,389][1652491] Updated weights for policy 0, policy_version 805809 (0.0100) [2024-06-15 21:23:43,628][1652491] Updated weights for policy 0, policy_version 805884 (0.0012) [2024-06-15 21:23:45,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 47513.7, 300 sec: 46208.4). Total num frames: 1650458624. Throughput: 0: 12231.1. Samples: 412675584. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:23:45,955][1648985] Avg episode reward: [(0, '164.940')] [2024-06-15 21:23:49,831][1651469] Signal inference workers to stop experience collection... (41950 times) [2024-06-15 21:23:49,882][1652491] InferenceWorker_p0-w0: stopping experience collection (41950 times) [2024-06-15 21:23:50,156][1651469] Signal inference workers to resume experience collection... (41950 times) [2024-06-15 21:23:50,157][1652491] InferenceWorker_p0-w0: resuming experience collection (41950 times) [2024-06-15 21:23:50,160][1652491] Updated weights for policy 0, policy_version 805952 (0.0013) [2024-06-15 21:23:50,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46421.3, 300 sec: 46652.8). Total num frames: 1650622464. Throughput: 0: 11890.8. Samples: 412747264. 
Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:23:50,956][1648985] Avg episode reward: [(0, '154.980')] [2024-06-15 21:23:51,536][1652491] Updated weights for policy 0, policy_version 806008 (0.0013) [2024-06-15 21:23:53,080][1652491] Updated weights for policy 0, policy_version 806049 (0.0011) [2024-06-15 21:23:54,755][1652491] Updated weights for policy 0, policy_version 806135 (0.0120) [2024-06-15 21:23:55,991][1648985] Fps is (10 sec: 52242.7, 60 sec: 49668.8, 300 sec: 46536.1). Total num frames: 1650982912. Throughput: 0: 11994.1. Samples: 412773888. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:23:55,991][1648985] Avg episode reward: [(0, '178.800')] [2024-06-15 21:24:00,926][1652491] Updated weights for policy 0, policy_version 806192 (0.0013) [2024-06-15 21:24:00,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1651081216. Throughput: 0: 11946.7. Samples: 412859904. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:24:00,955][1648985] Avg episode reward: [(0, '168.620')] [2024-06-15 21:24:02,647][1652491] Updated weights for policy 0, policy_version 806265 (0.0111) [2024-06-15 21:24:05,004][1652491] Updated weights for policy 0, policy_version 806336 (0.0015) [2024-06-15 21:24:05,955][1648985] Fps is (10 sec: 46038.8, 60 sec: 50283.7, 300 sec: 46430.6). Total num frames: 1651441664. Throughput: 0: 11753.2. Samples: 412908032. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:24:05,956][1648985] Avg episode reward: [(0, '164.590')] [2024-06-15 21:24:06,129][1652491] Updated weights for policy 0, policy_version 806384 (0.0012) [2024-06-15 21:24:10,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 1651507200. Throughput: 0: 11673.6. Samples: 412950528. 
Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:24:10,956][1648985] Avg episode reward: [(0, '169.320')] [2024-06-15 21:24:12,761][1652491] Updated weights for policy 0, policy_version 806458 (0.0138) [2024-06-15 21:24:14,874][1652491] Updated weights for policy 0, policy_version 806512 (0.0103) [2024-06-15 21:24:15,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 48605.8, 300 sec: 46208.7). Total num frames: 1651834880. Throughput: 0: 11628.1. Samples: 413018624. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:24:15,956][1648985] Avg episode reward: [(0, '177.330')] [2024-06-15 21:24:16,611][1652491] Updated weights for policy 0, policy_version 806585 (0.0102) [2024-06-15 21:24:18,069][1652491] Updated weights for policy 0, policy_version 806656 (0.0012) [2024-06-15 21:24:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 1652031488. Throughput: 0: 11468.8. Samples: 413083136. Policy #0 lag: (min: 79.0, avg: 160.7, max: 367.0) [2024-06-15 21:24:20,956][1648985] Avg episode reward: [(0, '196.340')] [2024-06-15 21:24:24,862][1652491] Updated weights for policy 0, policy_version 806712 (0.0011) [2024-06-15 21:24:25,939][1652491] Updated weights for policy 0, policy_version 806752 (0.0012) [2024-06-15 21:24:25,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46967.4, 300 sec: 45986.3). Total num frames: 1652228096. Throughput: 0: 11491.5. Samples: 413125632. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:24:25,956][1648985] Avg episode reward: [(0, '188.810')] [2024-06-15 21:24:27,610][1652491] Updated weights for policy 0, policy_version 806818 (0.0015) [2024-06-15 21:24:28,676][1651469] Signal inference workers to stop experience collection... 
(42000 times) [2024-06-15 21:24:28,719][1652491] InferenceWorker_p0-w0: stopping experience collection (42000 times) [2024-06-15 21:24:28,749][1652491] Updated weights for policy 0, policy_version 806869 (0.0012) [2024-06-15 21:24:28,929][1651469] Signal inference workers to resume experience collection... (42000 times) [2024-06-15 21:24:28,934][1652491] InferenceWorker_p0-w0: resuming experience collection (42000 times) [2024-06-15 21:24:30,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.5, 300 sec: 46211.5). Total num frames: 1652555776. Throughput: 0: 11229.8. Samples: 413180928. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:24:30,956][1648985] Avg episode reward: [(0, '174.570')] [2024-06-15 21:24:35,223][1652491] Updated weights for policy 0, policy_version 806913 (0.0015) [2024-06-15 21:24:35,955][1648985] Fps is (10 sec: 36044.8, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 1652588544. Throughput: 0: 11468.8. Samples: 413263360. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:24:35,956][1648985] Avg episode reward: [(0, '204.850')] [2024-06-15 21:24:36,989][1652491] Updated weights for policy 0, policy_version 806977 (0.0012) [2024-06-15 21:24:38,523][1652491] Updated weights for policy 0, policy_version 807042 (0.0099) [2024-06-15 21:24:40,228][1652491] Updated weights for policy 0, policy_version 807120 (0.0012) [2024-06-15 21:24:40,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.4, 300 sec: 45986.3). Total num frames: 1653014528. Throughput: 0: 11500.6. Samples: 413291008. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:24:40,956][1648985] Avg episode reward: [(0, '189.590')] [2024-06-15 21:24:41,372][1652491] Updated weights for policy 0, policy_version 807168 (0.0013) [2024-06-15 21:24:45,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 43690.6, 300 sec: 46097.4). Total num frames: 1653080064. Throughput: 0: 11116.1. Samples: 413360128. 
Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:24:45,956][1648985] Avg episode reward: [(0, '185.160')] [2024-06-15 21:24:48,179][1652491] Updated weights for policy 0, policy_version 807218 (0.0021) [2024-06-15 21:24:49,892][1652491] Updated weights for policy 0, policy_version 807296 (0.0012) [2024-06-15 21:24:50,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46967.4, 300 sec: 45653.0). Total num frames: 1653440512. Throughput: 0: 11434.7. Samples: 413422592. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:24:50,956][1648985] Avg episode reward: [(0, '182.360')] [2024-06-15 21:24:51,005][1652491] Updated weights for policy 0, policy_version 807356 (0.0022) [2024-06-15 21:24:52,271][1652491] Updated weights for policy 0, policy_version 807408 (0.0012) [2024-06-15 21:24:55,955][1648985] Fps is (10 sec: 52426.6, 60 sec: 43716.3, 300 sec: 46208.4). Total num frames: 1653604352. Throughput: 0: 11354.9. Samples: 413461504. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:24:55,956][1648985] Avg episode reward: [(0, '176.340')] [2024-06-15 21:24:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000807424_1653604352.pth... [2024-06-15 21:24:56,052][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000802080_1642659840.pth [2024-06-15 21:24:58,728][1652491] Updated weights for policy 0, policy_version 807456 (0.0012) [2024-06-15 21:25:00,113][1652491] Updated weights for policy 0, policy_version 807507 (0.0012) [2024-06-15 21:25:00,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 45875.2, 300 sec: 45432.0). Total num frames: 1653833728. Throughput: 0: 11537.1. Samples: 413537792. 
Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:25:00,956][1648985] Avg episode reward: [(0, '165.010')] [2024-06-15 21:25:01,554][1652491] Updated weights for policy 0, policy_version 807571 (0.0011) [2024-06-15 21:25:03,503][1652491] Updated weights for policy 0, policy_version 807679 (0.0075) [2024-06-15 21:25:05,955][1648985] Fps is (10 sec: 52430.9, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 1654128640. Throughput: 0: 11685.0. Samples: 413608960. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:25:05,956][1648985] Avg episode reward: [(0, '173.130')] [2024-06-15 21:25:10,811][1652491] Updated weights for policy 0, policy_version 807744 (0.0037) [2024-06-15 21:25:10,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.2, 300 sec: 45430.9). Total num frames: 1654259712. Throughput: 0: 11616.7. Samples: 413648384. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:25:10,956][1648985] Avg episode reward: [(0, '182.380')] [2024-06-15 21:25:10,980][1651469] Signal inference workers to stop experience collection... (42050 times) [2024-06-15 21:25:11,043][1652491] InferenceWorker_p0-w0: stopping experience collection (42050 times) [2024-06-15 21:25:11,222][1651469] Signal inference workers to resume experience collection... (42050 times) [2024-06-15 21:25:11,223][1652491] InferenceWorker_p0-w0: resuming experience collection (42050 times) [2024-06-15 21:25:12,624][1652491] Updated weights for policy 0, policy_version 807809 (0.0014) [2024-06-15 21:25:13,736][1652491] Updated weights for policy 0, policy_version 807870 (0.0013) [2024-06-15 21:25:15,496][1652491] Updated weights for policy 0, policy_version 807928 (0.0013) [2024-06-15 21:25:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 1654652928. Throughput: 0: 11639.5. Samples: 413704704. 
Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:25:15,956][1648985] Avg episode reward: [(0, '157.370')] [2024-06-15 21:25:20,349][1652491] Updated weights for policy 0, policy_version 807968 (0.0023) [2024-06-15 21:25:20,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 45329.1, 300 sec: 45653.1). Total num frames: 1654751232. Throughput: 0: 11650.9. Samples: 413787648. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:25:20,955][1648985] Avg episode reward: [(0, '157.240')] [2024-06-15 21:25:21,811][1652491] Updated weights for policy 0, policy_version 808034 (0.0015) [2024-06-15 21:25:23,885][1652491] Updated weights for policy 0, policy_version 808118 (0.0012) [2024-06-15 21:25:25,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 48059.7, 300 sec: 45986.3). Total num frames: 1655111680. Throughput: 0: 11582.6. Samples: 413812224. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:25:25,956][1648985] Avg episode reward: [(0, '151.890')] [2024-06-15 21:25:26,187][1652491] Updated weights for policy 0, policy_version 808165 (0.0013) [2024-06-15 21:25:30,519][1652491] Updated weights for policy 0, policy_version 808211 (0.0013) [2024-06-15 21:25:30,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 44782.9, 300 sec: 45764.1). Total num frames: 1655242752. Throughput: 0: 12003.5. Samples: 413900288. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:25:30,956][1648985] Avg episode reward: [(0, '163.520')] [2024-06-15 21:25:33,058][1652491] Updated weights for policy 0, policy_version 808318 (0.0260) [2024-06-15 21:25:34,656][1652491] Updated weights for policy 0, policy_version 808380 (0.0014) [2024-06-15 21:25:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 49698.2, 300 sec: 45764.2). Total num frames: 1655570432. Throughput: 0: 11958.1. Samples: 413960704. 
Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:25:35,956][1648985] Avg episode reward: [(0, '169.650')] [2024-06-15 21:25:37,600][1652491] Updated weights for policy 0, policy_version 808446 (0.0014) [2024-06-15 21:25:40,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 44782.8, 300 sec: 45987.0). Total num frames: 1655701504. Throughput: 0: 11924.0. Samples: 413998080. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:25:40,957][1648985] Avg episode reward: [(0, '196.470')] [2024-06-15 21:25:41,948][1652491] Updated weights for policy 0, policy_version 808496 (0.0014) [2024-06-15 21:25:42,945][1652491] Updated weights for policy 0, policy_version 808531 (0.0011) [2024-06-15 21:25:45,343][1652491] Updated weights for policy 0, policy_version 808611 (0.0232) [2024-06-15 21:25:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 45764.1). Total num frames: 1656094720. Throughput: 0: 11787.4. Samples: 414068224. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:25:45,956][1648985] Avg episode reward: [(0, '198.610')] [2024-06-15 21:25:47,828][1652491] Updated weights for policy 0, policy_version 808656 (0.0158) [2024-06-15 21:25:50,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.2, 300 sec: 46209.5). Total num frames: 1656225792. Throughput: 0: 11776.0. Samples: 414138880. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:25:50,956][1648985] Avg episode reward: [(0, '198.540')] [2024-06-15 21:25:52,545][1651469] Signal inference workers to stop experience collection... (42100 times) [2024-06-15 21:25:52,589][1652491] InferenceWorker_p0-w0: stopping experience collection (42100 times) [2024-06-15 21:25:52,780][1651469] Signal inference workers to resume experience collection... 
(42100 times) [2024-06-15 21:25:52,780][1652491] InferenceWorker_p0-w0: resuming experience collection (42100 times) [2024-06-15 21:25:52,782][1652491] Updated weights for policy 0, policy_version 808736 (0.0014) [2024-06-15 21:25:53,945][1652491] Updated weights for policy 0, policy_version 808771 (0.0014) [2024-06-15 21:25:55,541][1652491] Updated weights for policy 0, policy_version 808832 (0.0013) [2024-06-15 21:25:55,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 48606.1, 300 sec: 45764.1). Total num frames: 1656520704. Throughput: 0: 11741.8. Samples: 414176768. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:25:55,956][1648985] Avg episode reward: [(0, '166.680')] [2024-06-15 21:25:56,770][1652491] Updated weights for policy 0, policy_version 808889 (0.0014) [2024-06-15 21:26:00,547][1652491] Updated weights for policy 0, policy_version 808951 (0.0121) [2024-06-15 21:26:00,993][1648985] Fps is (10 sec: 52231.6, 60 sec: 48575.1, 300 sec: 46202.5). Total num frames: 1656750080. Throughput: 0: 11936.6. Samples: 414242304. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:26:00,993][1648985] Avg episode reward: [(0, '177.550')] [2024-06-15 21:26:04,368][1652491] Updated weights for policy 0, policy_version 809020 (0.0017) [2024-06-15 21:26:05,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 47513.6, 300 sec: 45875.2). Total num frames: 1656979456. Throughput: 0: 11662.2. Samples: 414312448. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:26:05,956][1648985] Avg episode reward: [(0, '152.190')] [2024-06-15 21:26:06,114][1652491] Updated weights for policy 0, policy_version 809084 (0.0016) [2024-06-15 21:26:07,698][1652491] Updated weights for policy 0, policy_version 809136 (0.0013) [2024-06-15 21:26:10,726][1652491] Updated weights for policy 0, policy_version 809184 (0.0013) [2024-06-15 21:26:10,955][1648985] Fps is (10 sec: 46050.0, 60 sec: 49152.0, 300 sec: 45986.3). Total num frames: 1657208832. 
Throughput: 0: 11878.4. Samples: 414346752. Policy #0 lag: (min: 11.0, avg: 80.1, max: 267.0) [2024-06-15 21:26:10,955][1648985] Avg episode reward: [(0, '159.420')] [2024-06-15 21:26:14,519][1652491] Updated weights for policy 0, policy_version 809232 (0.0012) [2024-06-15 21:26:15,820][1652491] Updated weights for policy 0, policy_version 809285 (0.0067) [2024-06-15 21:26:15,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45875.3, 300 sec: 45986.3). Total num frames: 1657405440. Throughput: 0: 11730.5. Samples: 414428160. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:26:15,955][1648985] Avg episode reward: [(0, '180.020')] [2024-06-15 21:26:18,233][1652491] Updated weights for policy 0, policy_version 809363 (0.0013) [2024-06-15 21:26:18,991][1652491] Updated weights for policy 0, policy_version 809405 (0.0012) [2024-06-15 21:26:20,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 49151.9, 300 sec: 45875.2). Total num frames: 1657700352. Throughput: 0: 11844.3. Samples: 414493696. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:26:20,956][1648985] Avg episode reward: [(0, '171.450')] [2024-06-15 21:26:21,732][1652491] Updated weights for policy 0, policy_version 809459 (0.0014) [2024-06-15 21:26:25,804][1652491] Updated weights for policy 0, policy_version 809490 (0.0016) [2024-06-15 21:26:25,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 1657831424. Throughput: 0: 11912.6. Samples: 414534144. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:26:25,956][1648985] Avg episode reward: [(0, '151.560')] [2024-06-15 21:26:27,158][1652491] Updated weights for policy 0, policy_version 809552 (0.0013) [2024-06-15 21:26:29,096][1652491] Updated weights for policy 0, policy_version 809616 (0.0015) [2024-06-15 21:26:30,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 49152.0, 300 sec: 45986.3). Total num frames: 1658191872. Throughput: 0: 11832.9. Samples: 414600704. 
Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:26:30,956][1648985] Avg episode reward: [(0, '156.590')] [2024-06-15 21:26:31,748][1652491] Updated weights for policy 0, policy_version 809680 (0.0041) [2024-06-15 21:26:32,543][1652491] Updated weights for policy 0, policy_version 809721 (0.0012) [2024-06-15 21:26:35,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 1658322944. Throughput: 0: 12117.3. Samples: 414684160. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:26:35,956][1648985] Avg episode reward: [(0, '177.230')] [2024-06-15 21:26:36,143][1651469] Signal inference workers to stop experience collection... (42150 times) [2024-06-15 21:26:36,169][1652491] InferenceWorker_p0-w0: stopping experience collection (42150 times) [2024-06-15 21:26:36,404][1651469] Signal inference workers to resume experience collection... (42150 times) [2024-06-15 21:26:36,405][1652491] InferenceWorker_p0-w0: resuming experience collection (42150 times) [2024-06-15 21:26:37,313][1652491] Updated weights for policy 0, policy_version 809776 (0.0015) [2024-06-15 21:26:38,763][1652491] Updated weights for policy 0, policy_version 809840 (0.0013) [2024-06-15 21:26:40,825][1652491] Updated weights for policy 0, policy_version 809904 (0.0013) [2024-06-15 21:26:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 49698.3, 300 sec: 46319.5). Total num frames: 1658683392. Throughput: 0: 11980.8. Samples: 414715904. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:26:40,956][1648985] Avg episode reward: [(0, '199.590')] [2024-06-15 21:26:43,111][1652491] Updated weights for policy 0, policy_version 809983 (0.0011) [2024-06-15 21:26:45,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 1658847232. Throughput: 0: 12093.4. Samples: 414786048. 
Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:26:45,956][1648985] Avg episode reward: [(0, '202.740')] [2024-06-15 21:26:48,387][1652491] Updated weights for policy 0, policy_version 810048 (0.0031) [2024-06-15 21:26:49,591][1652491] Updated weights for policy 0, policy_version 810097 (0.0012) [2024-06-15 21:26:50,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48606.0, 300 sec: 46652.8). Total num frames: 1659142144. Throughput: 0: 12174.2. Samples: 414860288. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:26:50,955][1648985] Avg episode reward: [(0, '174.260')] [2024-06-15 21:26:51,408][1652491] Updated weights for policy 0, policy_version 810149 (0.0014) [2024-06-15 21:26:52,874][1652491] Updated weights for policy 0, policy_version 810208 (0.0013) [2024-06-15 21:26:55,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 47513.7, 300 sec: 46208.5). Total num frames: 1659371520. Throughput: 0: 12140.1. Samples: 414893056. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:26:55,956][1648985] Avg episode reward: [(0, '171.920')] [2024-06-15 21:26:56,000][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000810240_1659371520.pth... [2024-06-15 21:26:56,056][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000804704_1648033792.pth [2024-06-15 21:26:58,973][1652491] Updated weights for policy 0, policy_version 810262 (0.0014) [2024-06-15 21:27:00,195][1652491] Updated weights for policy 0, policy_version 810324 (0.0014) [2024-06-15 21:27:00,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 47543.7, 300 sec: 46986.0). Total num frames: 1659600896. Throughput: 0: 12083.2. Samples: 414971904. 
Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:27:00,956][1648985] Avg episode reward: [(0, '165.470')] [2024-06-15 21:27:01,210][1652491] Updated weights for policy 0, policy_version 810368 (0.0012) [2024-06-15 21:27:02,311][1652491] Updated weights for policy 0, policy_version 810426 (0.0087) [2024-06-15 21:27:04,202][1652491] Updated weights for policy 0, policy_version 810480 (0.0013) [2024-06-15 21:27:05,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48605.7, 300 sec: 46763.8). Total num frames: 1659895808. Throughput: 0: 12037.6. Samples: 415035392. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:27:05,956][1648985] Avg episode reward: [(0, '162.760')] [2024-06-15 21:27:09,375][1652491] Updated weights for policy 0, policy_version 810499 (0.0012) [2024-06-15 21:27:10,791][1652491] Updated weights for policy 0, policy_version 810560 (0.0012) [2024-06-15 21:27:10,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 1660026880. Throughput: 0: 12083.2. Samples: 415077888. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:27:10,955][1648985] Avg episode reward: [(0, '167.640')] [2024-06-15 21:27:11,943][1652491] Updated weights for policy 0, policy_version 810617 (0.0031) [2024-06-15 21:27:13,556][1652491] Updated weights for policy 0, policy_version 810659 (0.0014) [2024-06-15 21:27:14,274][1651469] Signal inference workers to stop experience collection... (42200 times) [2024-06-15 21:27:14,386][1652491] InferenceWorker_p0-w0: stopping experience collection (42200 times) [2024-06-15 21:27:14,504][1651469] Signal inference workers to resume experience collection... (42200 times) [2024-06-15 21:27:14,504][1652491] InferenceWorker_p0-w0: resuming experience collection (42200 times) [2024-06-15 21:27:15,225][1652491] Updated weights for policy 0, policy_version 810736 (0.0173) [2024-06-15 21:27:15,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 50244.2, 300 sec: 47208.1). 
Total num frames: 1660420096. Throughput: 0: 11969.4. Samples: 415139328. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:27:15,956][1648985] Avg episode reward: [(0, '141.180')] [2024-06-15 21:27:20,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 1660485632. Throughput: 0: 11867.1. Samples: 415218176. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:27:20,955][1648985] Avg episode reward: [(0, '143.380')] [2024-06-15 21:27:21,286][1652491] Updated weights for policy 0, policy_version 810800 (0.0016) [2024-06-15 21:27:22,471][1652491] Updated weights for policy 0, policy_version 810848 (0.0012) [2024-06-15 21:27:24,574][1652491] Updated weights for policy 0, policy_version 810901 (0.0013) [2024-06-15 21:27:25,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 50244.3, 300 sec: 47319.2). Total num frames: 1660846080. Throughput: 0: 11912.5. Samples: 415251968. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:27:25,956][1648985] Avg episode reward: [(0, '157.730')] [2024-06-15 21:27:26,312][1652491] Updated weights for policy 0, policy_version 810979 (0.0013) [2024-06-15 21:27:30,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 1660944384. Throughput: 0: 12060.5. Samples: 415328768. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:27:30,956][1648985] Avg episode reward: [(0, '169.230')] [2024-06-15 21:27:31,004][1652491] Updated weights for policy 0, policy_version 811009 (0.0031) [2024-06-15 21:27:32,195][1652491] Updated weights for policy 0, policy_version 811065 (0.0016) [2024-06-15 21:27:33,799][1652491] Updated weights for policy 0, policy_version 811136 (0.0023) [2024-06-15 21:27:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 49152.2, 300 sec: 47319.2). Total num frames: 1661272064. Throughput: 0: 11821.5. Samples: 415392256. 
Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:27:35,956][1648985] Avg episode reward: [(0, '181.830')] [2024-06-15 21:27:36,829][1652491] Updated weights for policy 0, policy_version 811216 (0.0013) [2024-06-15 21:27:40,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 46421.4, 300 sec: 46986.0). Total num frames: 1661468672. Throughput: 0: 11798.8. Samples: 415424000. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:27:40,956][1648985] Avg episode reward: [(0, '185.940')] [2024-06-15 21:27:41,866][1652491] Updated weights for policy 0, policy_version 811268 (0.0013) [2024-06-15 21:27:43,194][1652491] Updated weights for policy 0, policy_version 811328 (0.0018) [2024-06-15 21:27:45,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.8, 300 sec: 47097.0). Total num frames: 1661730816. Throughput: 0: 11685.0. Samples: 415497728. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:27:45,956][1648985] Avg episode reward: [(0, '173.150')] [2024-06-15 21:27:46,867][1652491] Updated weights for policy 0, policy_version 811394 (0.0014) [2024-06-15 21:27:48,178][1652491] Updated weights for policy 0, policy_version 811458 (0.0014) [2024-06-15 21:27:49,197][1652491] Updated weights for policy 0, policy_version 811515 (0.0013) [2024-06-15 21:27:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 1661992960. Throughput: 0: 11935.4. Samples: 415572480. Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:27:50,955][1648985] Avg episode reward: [(0, '143.310')] [2024-06-15 21:27:53,515][1652491] Updated weights for policy 0, policy_version 811568 (0.0025) [2024-06-15 21:27:55,437][1652491] Updated weights for policy 0, policy_version 811641 (0.0143) [2024-06-15 21:27:55,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1662255104. Throughput: 0: 11866.9. Samples: 415611904. 
Policy #0 lag: (min: 63.0, avg: 187.3, max: 319.0) [2024-06-15 21:27:55,956][1648985] Avg episode reward: [(0, '167.120')] [2024-06-15 21:27:57,404][1651469] Signal inference workers to stop experience collection... (42250 times) [2024-06-15 21:27:57,442][1652491] InferenceWorker_p0-w0: stopping experience collection (42250 times) [2024-06-15 21:27:57,646][1651469] Signal inference workers to resume experience collection... (42250 times) [2024-06-15 21:27:57,647][1652491] InferenceWorker_p0-w0: resuming experience collection (42250 times) [2024-06-15 21:27:58,470][1652491] Updated weights for policy 0, policy_version 811697 (0.0012) [2024-06-15 21:27:59,966][1652491] Updated weights for policy 0, policy_version 811776 (0.0019) [2024-06-15 21:28:00,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48605.9, 300 sec: 47771.2). Total num frames: 1662517248. Throughput: 0: 11855.6. Samples: 415672832. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:28:00,956][1648985] Avg episode reward: [(0, '171.420')] [2024-06-15 21:28:05,492][1652491] Updated weights for policy 0, policy_version 811838 (0.0013) [2024-06-15 21:28:05,955][1648985] Fps is (10 sec: 42599.4, 60 sec: 46421.5, 300 sec: 46986.0). Total num frames: 1662681088. Throughput: 0: 11889.8. Samples: 415753216. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:28:05,955][1648985] Avg episode reward: [(0, '169.500')] [2024-06-15 21:28:06,949][1652491] Updated weights for policy 0, policy_version 811904 (0.0019) [2024-06-15 21:28:10,480][1652491] Updated weights for policy 0, policy_version 812000 (0.0105) [2024-06-15 21:28:10,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 49698.1, 300 sec: 47763.5). Total num frames: 1663008768. Throughput: 0: 11912.6. Samples: 415788032. 
Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:28:10,955][1648985] Avg episode reward: [(0, '163.690')] [2024-06-15 21:28:15,239][1652491] Updated weights for policy 0, policy_version 812048 (0.0013) [2024-06-15 21:28:15,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 45329.0, 300 sec: 46763.8). Total num frames: 1663139840. Throughput: 0: 11946.7. Samples: 415866368. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:28:15,957][1648985] Avg episode reward: [(0, '174.150')] [2024-06-15 21:28:16,235][1652491] Updated weights for policy 0, policy_version 812091 (0.0118) [2024-06-15 21:28:17,990][1652491] Updated weights for policy 0, policy_version 812155 (0.0013) [2024-06-15 21:28:19,761][1652491] Updated weights for policy 0, policy_version 812208 (0.0013) [2024-06-15 21:28:20,955][1648985] Fps is (10 sec: 49151.0, 60 sec: 50244.1, 300 sec: 47763.5). Total num frames: 1663500288. Throughput: 0: 11878.4. Samples: 415926784. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:28:20,956][1648985] Avg episode reward: [(0, '164.380')] [2024-06-15 21:28:21,547][1652491] Updated weights for policy 0, policy_version 812284 (0.0023) [2024-06-15 21:28:25,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45329.0, 300 sec: 46874.9). Total num frames: 1663565824. Throughput: 0: 12071.8. Samples: 415967232. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:28:25,956][1648985] Avg episode reward: [(0, '172.360')] [2024-06-15 21:28:27,331][1652491] Updated weights for policy 0, policy_version 812336 (0.0014) [2024-06-15 21:28:29,001][1652491] Updated weights for policy 0, policy_version 812400 (0.0016) [2024-06-15 21:28:30,461][1652491] Updated weights for policy 0, policy_version 812448 (0.0013) [2024-06-15 21:28:30,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 49698.1, 300 sec: 47430.3). Total num frames: 1663926272. Throughput: 0: 12014.9. Samples: 416038400. 
Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:28:30,956][1648985] Avg episode reward: [(0, '167.510')] [2024-06-15 21:28:31,859][1652491] Updated weights for policy 0, policy_version 812512 (0.0162) [2024-06-15 21:28:35,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 1664090112. Throughput: 0: 11980.8. Samples: 416111616. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:28:35,956][1648985] Avg episode reward: [(0, '171.490')] [2024-06-15 21:28:37,594][1651469] Signal inference workers to stop experience collection... (42300 times) [2024-06-15 21:28:37,672][1652491] InferenceWorker_p0-w0: stopping experience collection (42300 times) [2024-06-15 21:28:37,870][1651469] Signal inference workers to resume experience collection... (42300 times) [2024-06-15 21:28:37,872][1652491] InferenceWorker_p0-w0: resuming experience collection (42300 times) [2024-06-15 21:28:37,874][1652491] Updated weights for policy 0, policy_version 812560 (0.0013) [2024-06-15 21:28:39,375][1652491] Updated weights for policy 0, policy_version 812611 (0.0013) [2024-06-15 21:28:40,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 1664385024. Throughput: 0: 11844.3. Samples: 416144896. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:28:40,956][1648985] Avg episode reward: [(0, '175.450')] [2024-06-15 21:28:41,103][1652491] Updated weights for policy 0, policy_version 812690 (0.0014) [2024-06-15 21:28:42,176][1652491] Updated weights for policy 0, policy_version 812736 (0.0025) [2024-06-15 21:28:43,428][1652491] Updated weights for policy 0, policy_version 812792 (0.0012) [2024-06-15 21:28:45,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1664614400. Throughput: 0: 11923.9. Samples: 416209408. 
Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:28:45,957][1648985] Avg episode reward: [(0, '176.540')] [2024-06-15 21:28:50,217][1652491] Updated weights for policy 0, policy_version 812854 (0.0084) [2024-06-15 21:28:50,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46421.3, 300 sec: 46769.5). Total num frames: 1664778240. Throughput: 0: 11741.9. Samples: 416281600. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:28:50,955][1648985] Avg episode reward: [(0, '186.380')] [2024-06-15 21:28:51,690][1652491] Updated weights for policy 0, policy_version 812912 (0.0014) [2024-06-15 21:28:53,277][1652491] Updated weights for policy 0, policy_version 812960 (0.0016) [2024-06-15 21:28:55,330][1652491] Updated weights for policy 0, policy_version 813050 (0.0066) [2024-06-15 21:28:55,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 47652.4). Total num frames: 1665138688. Throughput: 0: 11514.3. Samples: 416306176. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:28:55,956][1648985] Avg episode reward: [(0, '167.610')] [2024-06-15 21:28:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000813056_1665138688.pth... [2024-06-15 21:28:56,029][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000807424_1653604352.pth [2024-06-15 21:29:00,955][1648985] Fps is (10 sec: 36044.4, 60 sec: 43690.6, 300 sec: 46430.6). Total num frames: 1665138688. Throughput: 0: 11457.4. Samples: 416381952. 
Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:29:00,956][1648985] Avg episode reward: [(0, '171.550')] [2024-06-15 21:29:02,022][1652491] Updated weights for policy 0, policy_version 813104 (0.0118) [2024-06-15 21:29:03,669][1652491] Updated weights for policy 0, policy_version 813155 (0.0130) [2024-06-15 21:29:04,583][1652491] Updated weights for policy 0, policy_version 813187 (0.0012) [2024-06-15 21:29:05,961][1648985] Fps is (10 sec: 39299.2, 60 sec: 47508.9, 300 sec: 47540.4). Total num frames: 1665531904. Throughput: 0: 11467.4. Samples: 416442880. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:29:05,962][1648985] Avg episode reward: [(0, '157.020')] [2024-06-15 21:29:06,886][1652491] Updated weights for policy 0, policy_version 813283 (0.0018) [2024-06-15 21:29:10,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 44236.7, 300 sec: 46874.9). Total num frames: 1665662976. Throughput: 0: 11355.0. Samples: 416478208. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:29:10,956][1648985] Avg episode reward: [(0, '160.970')] [2024-06-15 21:29:13,313][1652491] Updated weights for policy 0, policy_version 813360 (0.0016) [2024-06-15 21:29:15,474][1652491] Updated weights for policy 0, policy_version 813424 (0.0122) [2024-06-15 21:29:15,576][1651469] Signal inference workers to stop experience collection... (42350 times) [2024-06-15 21:29:15,667][1652491] InferenceWorker_p0-w0: stopping experience collection (42350 times) [2024-06-15 21:29:15,799][1651469] Signal inference workers to resume experience collection... (42350 times) [2024-06-15 21:29:15,800][1652491] InferenceWorker_p0-w0: resuming experience collection (42350 times) [2024-06-15 21:29:15,955][1648985] Fps is (10 sec: 39344.9, 60 sec: 46421.5, 300 sec: 47097.1). Total num frames: 1665925120. Throughput: 0: 11423.4. Samples: 416552448. 
Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:29:15,955][1648985] Avg episode reward: [(0, '151.300')] [2024-06-15 21:29:17,052][1652491] Updated weights for policy 0, policy_version 813494 (0.0020) [2024-06-15 21:29:18,336][1652491] Updated weights for policy 0, policy_version 813538 (0.0013) [2024-06-15 21:29:20,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 44783.1, 300 sec: 47319.2). Total num frames: 1666187264. Throughput: 0: 11377.8. Samples: 416623616. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:29:20,956][1648985] Avg episode reward: [(0, '132.810')] [2024-06-15 21:29:23,772][1652491] Updated weights for policy 0, policy_version 813584 (0.0053) [2024-06-15 21:29:24,752][1652491] Updated weights for policy 0, policy_version 813631 (0.0015) [2024-06-15 21:29:25,965][1648985] Fps is (10 sec: 42554.6, 60 sec: 46413.5, 300 sec: 46762.2). Total num frames: 1666351104. Throughput: 0: 11420.7. Samples: 416658944. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:29:25,966][1648985] Avg episode reward: [(0, '144.460')] [2024-06-15 21:29:27,363][1652491] Updated weights for policy 0, policy_version 813712 (0.0014) [2024-06-15 21:29:28,763][1652491] Updated weights for policy 0, policy_version 813767 (0.0128) [2024-06-15 21:29:29,976][1652491] Updated weights for policy 0, policy_version 813818 (0.0015) [2024-06-15 21:29:30,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46421.4, 300 sec: 47874.6). Total num frames: 1666711552. Throughput: 0: 11275.4. Samples: 416716800. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:29:30,956][1648985] Avg episode reward: [(0, '149.620')] [2024-06-15 21:29:35,610][1652491] Updated weights for policy 0, policy_version 813858 (0.0012) [2024-06-15 21:29:35,974][1648985] Fps is (10 sec: 45836.9, 60 sec: 45315.0, 300 sec: 46760.9). Total num frames: 1666809856. Throughput: 0: 11452.7. Samples: 416797184. 
Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:29:35,974][1648985] Avg episode reward: [(0, '148.810')] [2024-06-15 21:29:37,575][1652491] Updated weights for policy 0, policy_version 813908 (0.0047) [2024-06-15 21:29:38,796][1652491] Updated weights for policy 0, policy_version 813977 (0.0110) [2024-06-15 21:29:40,327][1652491] Updated weights for policy 0, policy_version 814039 (0.0013) [2024-06-15 21:29:40,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 46967.5, 300 sec: 47874.6). Total num frames: 1667203072. Throughput: 0: 11707.8. Samples: 416833024. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:29:40,956][1648985] Avg episode reward: [(0, '134.020')] [2024-06-15 21:29:45,949][1652491] Updated weights for policy 0, policy_version 814096 (0.0012) [2024-06-15 21:29:45,955][1648985] Fps is (10 sec: 45960.8, 60 sec: 44236.9, 300 sec: 46874.9). Total num frames: 1667268608. Throughput: 0: 11650.9. Samples: 416906240. Policy #0 lag: (min: 93.0, avg: 195.9, max: 333.0) [2024-06-15 21:29:45,955][1648985] Avg episode reward: [(0, '144.720')] [2024-06-15 21:29:46,992][1652491] Updated weights for policy 0, policy_version 814144 (0.0012) [2024-06-15 21:29:48,701][1652491] Updated weights for policy 0, policy_version 814202 (0.0013) [2024-06-15 21:29:49,807][1652491] Updated weights for policy 0, policy_version 814256 (0.0018) [2024-06-15 21:29:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 49152.0, 300 sec: 47874.7). Total num frames: 1667727360. Throughput: 0: 11777.5. Samples: 416972800. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:29:50,956][1648985] Avg episode reward: [(0, '147.710')] [2024-06-15 21:29:51,304][1652491] Updated weights for policy 0, policy_version 814336 (0.0027) [2024-06-15 21:29:55,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 43690.6, 300 sec: 47208.1). Total num frames: 1667760128. Throughput: 0: 11912.5. Samples: 417014272. 
Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:29:55,956][1648985] Avg episode reward: [(0, '151.320')] [2024-06-15 21:29:56,974][1651469] Signal inference workers to stop experience collection... (42400 times) [2024-06-15 21:29:57,063][1652491] InferenceWorker_p0-w0: stopping experience collection (42400 times) [2024-06-15 21:29:57,306][1651469] Signal inference workers to resume experience collection... (42400 times) [2024-06-15 21:29:57,307][1652491] InferenceWorker_p0-w0: resuming experience collection (42400 times) [2024-06-15 21:29:57,948][1652491] Updated weights for policy 0, policy_version 814400 (0.0012) [2024-06-15 21:29:59,200][1652491] Updated weights for policy 0, policy_version 814448 (0.0012) [2024-06-15 21:30:00,139][1652491] Updated weights for policy 0, policy_version 814496 (0.0012) [2024-06-15 21:30:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1668153344. Throughput: 0: 11901.1. Samples: 417088000. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:30:00,956][1648985] Avg episode reward: [(0, '157.260')] [2024-06-15 21:30:01,295][1652491] Updated weights for policy 0, policy_version 814554 (0.0012) [2024-06-15 21:30:01,931][1652491] Updated weights for policy 0, policy_version 814589 (0.0020) [2024-06-15 21:30:05,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 45879.7, 300 sec: 47541.4). Total num frames: 1668284416. Throughput: 0: 12128.7. Samples: 417169408. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:30:05,956][1648985] Avg episode reward: [(0, '171.630')] [2024-06-15 21:30:08,136][1652491] Updated weights for policy 0, policy_version 814650 (0.0014) [2024-06-15 21:30:09,400][1652491] Updated weights for policy 0, policy_version 814695 (0.0013) [2024-06-15 21:30:10,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 1668579328. Throughput: 0: 12029.0. Samples: 417200128. 
Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:30:10,956][1648985] Avg episode reward: [(0, '205.590')] [2024-06-15 21:30:10,980][1652491] Updated weights for policy 0, policy_version 814738 (0.0013) [2024-06-15 21:30:12,552][1652491] Updated weights for policy 0, policy_version 814818 (0.0012) [2024-06-15 21:30:15,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.6, 300 sec: 47652.4). Total num frames: 1668808704. Throughput: 0: 12333.5. Samples: 417271808. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:30:15,956][1648985] Avg episode reward: [(0, '195.840')] [2024-06-15 21:30:18,476][1652491] Updated weights for policy 0, policy_version 814880 (0.0015) [2024-06-15 21:30:20,086][1652491] Updated weights for policy 0, policy_version 814945 (0.0013) [2024-06-15 21:30:20,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1669070848. Throughput: 0: 12088.2. Samples: 417340928. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:30:20,956][1648985] Avg episode reward: [(0, '171.070')] [2024-06-15 21:30:21,899][1652491] Updated weights for policy 0, policy_version 814992 (0.0013) [2024-06-15 21:30:24,145][1652491] Updated weights for policy 0, policy_version 815097 (0.0015) [2024-06-15 21:30:25,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 49706.4, 300 sec: 47763.5). Total num frames: 1669332992. Throughput: 0: 12014.9. Samples: 417373696. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:30:25,956][1648985] Avg episode reward: [(0, '165.410')] [2024-06-15 21:30:29,972][1652491] Updated weights for policy 0, policy_version 815142 (0.0014) [2024-06-15 21:30:30,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46421.4, 300 sec: 47208.1). Total num frames: 1669496832. Throughput: 0: 12140.1. Samples: 417452544. 
Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:30:30,956][1648985] Avg episode reward: [(0, '154.060')] [2024-06-15 21:30:31,621][1652491] Updated weights for policy 0, policy_version 815226 (0.0015) [2024-06-15 21:30:34,055][1652491] Updated weights for policy 0, policy_version 815280 (0.0115) [2024-06-15 21:30:34,191][1651469] Signal inference workers to stop experience collection... (42450 times) [2024-06-15 21:30:34,230][1652491] InferenceWorker_p0-w0: stopping experience collection (42450 times) [2024-06-15 21:30:34,394][1651469] Signal inference workers to resume experience collection... (42450 times) [2024-06-15 21:30:34,395][1652491] InferenceWorker_p0-w0: resuming experience collection (42450 times) [2024-06-15 21:30:35,278][1652491] Updated weights for policy 0, policy_version 815328 (0.0012) [2024-06-15 21:30:35,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 50806.1, 300 sec: 47985.7). Total num frames: 1669857280. Throughput: 0: 12105.9. Samples: 417517568. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:30:35,956][1648985] Avg episode reward: [(0, '150.520')] [2024-06-15 21:30:40,128][1652491] Updated weights for policy 0, policy_version 815378 (0.0013) [2024-06-15 21:30:40,982][1648985] Fps is (10 sec: 45750.6, 60 sec: 45854.4, 300 sec: 46981.6). Total num frames: 1669955584. Throughput: 0: 12144.2. Samples: 417561088. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:30:40,983][1648985] Avg episode reward: [(0, '162.820')] [2024-06-15 21:30:41,780][1652491] Updated weights for policy 0, policy_version 815456 (0.0090) [2024-06-15 21:30:44,541][1652491] Updated weights for policy 0, policy_version 815509 (0.0013) [2024-06-15 21:30:45,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 50244.2, 300 sec: 47652.5). Total num frames: 1670283264. Throughput: 0: 11980.8. Samples: 417627136. 
Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:30:45,956][1648985] Avg episode reward: [(0, '151.230')] [2024-06-15 21:30:46,204][1652491] Updated weights for policy 0, policy_version 815579 (0.0014) [2024-06-15 21:30:50,955][1648985] Fps is (10 sec: 42714.6, 60 sec: 44236.7, 300 sec: 46986.0). Total num frames: 1670381568. Throughput: 0: 11764.6. Samples: 417698816. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:30:50,956][1648985] Avg episode reward: [(0, '161.660')] [2024-06-15 21:30:51,714][1652491] Updated weights for policy 0, policy_version 815632 (0.0028) [2024-06-15 21:30:52,959][1652491] Updated weights for policy 0, policy_version 815682 (0.0024) [2024-06-15 21:30:54,216][1652491] Updated weights for policy 0, policy_version 815735 (0.0012) [2024-06-15 21:30:55,116][1652491] Updated weights for policy 0, policy_version 815748 (0.0010) [2024-06-15 21:30:55,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 49152.0, 300 sec: 47325.3). Total num frames: 1670709248. Throughput: 0: 11798.7. Samples: 417731072. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:30:55,956][1648985] Avg episode reward: [(0, '164.810')] [2024-06-15 21:30:56,549][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000815808_1670774784.pth... [2024-06-15 21:30:56,754][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000810240_1659371520.pth [2024-06-15 21:30:56,757][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000815808_1670774784.pth [2024-06-15 21:30:57,248][1652491] Updated weights for policy 0, policy_version 815825 (0.0012) [2024-06-15 21:31:00,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.1, 300 sec: 47208.1). Total num frames: 1670905856. Throughput: 0: 11673.6. Samples: 417797120. 
Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:31:00,956][1648985] Avg episode reward: [(0, '172.090')] [2024-06-15 21:31:02,572][1652491] Updated weights for policy 0, policy_version 815878 (0.0015) [2024-06-15 21:31:04,005][1652491] Updated weights for policy 0, policy_version 815938 (0.0013) [2024-06-15 21:31:05,378][1652491] Updated weights for policy 0, policy_version 815996 (0.0012) [2024-06-15 21:31:05,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 48059.6, 300 sec: 47319.2). Total num frames: 1671168000. Throughput: 0: 11787.4. Samples: 417871360. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:31:05,956][1648985] Avg episode reward: [(0, '155.430')] [2024-06-15 21:31:07,487][1652491] Updated weights for policy 0, policy_version 816052 (0.0013) [2024-06-15 21:31:09,193][1652491] Updated weights for policy 0, policy_version 816120 (0.0014) [2024-06-15 21:31:10,960][1648985] Fps is (10 sec: 52402.9, 60 sec: 47509.6, 300 sec: 47540.5). Total num frames: 1671430144. Throughput: 0: 11661.0. Samples: 417898496. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:31:10,961][1648985] Avg episode reward: [(0, '156.400')] [2024-06-15 21:31:15,439][1651469] Signal inference workers to stop experience collection... (42500 times) [2024-06-15 21:31:15,497][1652491] InferenceWorker_p0-w0: stopping experience collection (42500 times) [2024-06-15 21:31:15,500][1652491] Updated weights for policy 0, policy_version 816194 (0.0014) [2024-06-15 21:31:15,810][1651469] Signal inference workers to resume experience collection... (42500 times) [2024-06-15 21:31:15,810][1652491] InferenceWorker_p0-w0: resuming experience collection (42500 times) [2024-06-15 21:31:15,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1671593984. Throughput: 0: 11605.3. Samples: 417974784. 
Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:31:15,956][1648985] Avg episode reward: [(0, '148.930')] [2024-06-15 21:31:16,709][1652491] Updated weights for policy 0, policy_version 816253 (0.0028) [2024-06-15 21:31:19,320][1652491] Updated weights for policy 0, policy_version 816312 (0.0013) [2024-06-15 21:31:20,955][1648985] Fps is (10 sec: 49177.0, 60 sec: 47513.6, 300 sec: 47763.5). Total num frames: 1671921664. Throughput: 0: 11423.3. Samples: 418031616. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:31:20,956][1648985] Avg episode reward: [(0, '158.210')] [2024-06-15 21:31:20,999][1652491] Updated weights for policy 0, policy_version 816375 (0.0012) [2024-06-15 21:31:25,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44783.1, 300 sec: 46874.9). Total num frames: 1672019968. Throughput: 0: 11475.7. Samples: 418077184. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:31:25,956][1648985] Avg episode reward: [(0, '175.760')] [2024-06-15 21:31:26,082][1652491] Updated weights for policy 0, policy_version 816432 (0.0011) [2024-06-15 21:31:27,203][1652491] Updated weights for policy 0, policy_version 816484 (0.0012) [2024-06-15 21:31:29,970][1652491] Updated weights for policy 0, policy_version 816528 (0.0012) [2024-06-15 21:31:30,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 1672314880. Throughput: 0: 11685.0. Samples: 418152960. Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:31:30,956][1648985] Avg episode reward: [(0, '180.870')] [2024-06-15 21:31:32,019][1652491] Updated weights for policy 0, policy_version 816608 (0.0012) [2024-06-15 21:31:35,216][1652491] Updated weights for policy 0, policy_version 816641 (0.0014) [2024-06-15 21:31:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44782.9, 300 sec: 46986.0). Total num frames: 1672544256. Throughput: 0: 11559.8. Samples: 418219008. 
Policy #0 lag: (min: 15.0, avg: 106.6, max: 271.0) [2024-06-15 21:31:35,956][1648985] Avg episode reward: [(0, '184.190')] [2024-06-15 21:31:37,294][1652491] Updated weights for policy 0, policy_version 816736 (0.0011) [2024-06-15 21:31:40,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46442.4, 300 sec: 47097.1). Total num frames: 1672740864. Throughput: 0: 11525.7. Samples: 418249728. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:31:40,956][1648985] Avg episode reward: [(0, '156.980')] [2024-06-15 21:31:42,301][1652491] Updated weights for policy 0, policy_version 816800 (0.0090) [2024-06-15 21:31:44,062][1652491] Updated weights for policy 0, policy_version 816865 (0.0021) [2024-06-15 21:31:45,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45329.1, 300 sec: 46986.0). Total num frames: 1673003008. Throughput: 0: 11537.1. Samples: 418316288. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:31:45,956][1648985] Avg episode reward: [(0, '146.950')] [2024-06-15 21:31:46,892][1652491] Updated weights for policy 0, policy_version 816912 (0.0014) [2024-06-15 21:31:48,468][1652491] Updated weights for policy 0, policy_version 816981 (0.0013) [2024-06-15 21:31:49,224][1652491] Updated weights for policy 0, policy_version 817024 (0.0048) [2024-06-15 21:31:50,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1673265152. Throughput: 0: 11571.2. Samples: 418392064. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:31:50,956][1648985] Avg episode reward: [(0, '137.940')] [2024-06-15 21:31:55,168][1652491] Updated weights for policy 0, policy_version 817104 (0.0018) [2024-06-15 21:31:55,349][1651469] Signal inference workers to stop experience collection... (42550 times) [2024-06-15 21:31:55,385][1652491] InferenceWorker_p0-w0: stopping experience collection (42550 times) [2024-06-15 21:31:55,613][1651469] Signal inference workers to resume experience collection... 
(42550 times) [2024-06-15 21:31:55,614][1652491] InferenceWorker_p0-w0: resuming experience collection (42550 times) [2024-06-15 21:31:55,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 45875.4, 300 sec: 46986.0). Total num frames: 1673461760. Throughput: 0: 11811.5. Samples: 418429952. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:31:55,955][1648985] Avg episode reward: [(0, '143.690')] [2024-06-15 21:31:56,530][1652491] Updated weights for policy 0, policy_version 817152 (0.0013) [2024-06-15 21:31:59,099][1652491] Updated weights for policy 0, policy_version 817203 (0.0011) [2024-06-15 21:32:00,528][1652491] Updated weights for policy 0, policy_version 817277 (0.0103) [2024-06-15 21:32:00,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1673789440. Throughput: 0: 11446.0. Samples: 418489856. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:32:00,956][1648985] Avg episode reward: [(0, '133.200')] [2024-06-15 21:32:05,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44783.0, 300 sec: 46874.9). Total num frames: 1673854976. Throughput: 0: 11821.5. Samples: 418563584. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:32:05,956][1648985] Avg episode reward: [(0, '139.550')] [2024-06-15 21:32:05,969][1652491] Updated weights for policy 0, policy_version 817328 (0.0013) [2024-06-15 21:32:07,900][1652491] Updated weights for policy 0, policy_version 817402 (0.0012) [2024-06-15 21:32:10,955][1648985] Fps is (10 sec: 36043.8, 60 sec: 45332.7, 300 sec: 46541.6). Total num frames: 1674149888. Throughput: 0: 11389.1. Samples: 418589696. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:32:10,956][1648985] Avg episode reward: [(0, '145.860')] [2024-06-15 21:32:11,136][1652491] Updated weights for policy 0, policy_version 817472 (0.0102) [2024-06-15 21:32:15,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45329.1, 300 sec: 46874.9). Total num frames: 1674313728. 
Throughput: 0: 11127.5. Samples: 418653696. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:32:15,956][1648985] Avg episode reward: [(0, '170.080')] [2024-06-15 21:32:18,256][1652491] Updated weights for policy 0, policy_version 817568 (0.0015) [2024-06-15 21:32:20,229][1652491] Updated weights for policy 0, policy_version 817648 (0.0013) [2024-06-15 21:32:20,955][1648985] Fps is (10 sec: 42599.7, 60 sec: 44236.8, 300 sec: 46541.7). Total num frames: 1674575872. Throughput: 0: 11047.8. Samples: 418716160. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:32:20,956][1648985] Avg episode reward: [(0, '154.420')] [2024-06-15 21:32:23,697][1652491] Updated weights for policy 0, policy_version 817729 (0.0013) [2024-06-15 21:32:24,834][1652491] Updated weights for policy 0, policy_version 817783 (0.0013) [2024-06-15 21:32:25,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 1674838016. Throughput: 0: 11138.8. Samples: 418750976. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:32:25,956][1648985] Avg episode reward: [(0, '157.090')] [2024-06-15 21:32:30,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 43690.7, 300 sec: 46319.5). Total num frames: 1674936320. Throughput: 0: 11298.2. Samples: 418824704. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:32:30,955][1648985] Avg episode reward: [(0, '168.980')] [2024-06-15 21:32:31,005][1652491] Updated weights for policy 0, policy_version 817856 (0.0110) [2024-06-15 21:32:32,338][1652491] Updated weights for policy 0, policy_version 817915 (0.0015) [2024-06-15 21:32:35,077][1652491] Updated weights for policy 0, policy_version 817984 (0.0012) [2024-06-15 21:32:35,928][1651469] Signal inference workers to stop experience collection... (42600 times) [2024-06-15 21:32:35,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 45875.3, 300 sec: 46874.9). Total num frames: 1675296768. Throughput: 0: 10843.0. Samples: 418880000. 
Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:32:35,955][1648985] Avg episode reward: [(0, '171.030')] [2024-06-15 21:32:35,984][1652491] InferenceWorker_p0-w0: stopping experience collection (42600 times) [2024-06-15 21:32:36,115][1651469] Signal inference workers to resume experience collection... (42600 times) [2024-06-15 21:32:36,116][1652491] InferenceWorker_p0-w0: resuming experience collection (42600 times) [2024-06-15 21:32:36,337][1652491] Updated weights for policy 0, policy_version 818046 (0.0015) [2024-06-15 21:32:40,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1675362304. Throughput: 0: 10854.4. Samples: 418918400. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:32:40,956][1648985] Avg episode reward: [(0, '180.570')] [2024-06-15 21:32:43,344][1652491] Updated weights for policy 0, policy_version 818114 (0.0028) [2024-06-15 21:32:44,465][1652491] Updated weights for policy 0, policy_version 818163 (0.0013) [2024-06-15 21:32:45,833][1652491] Updated weights for policy 0, policy_version 818208 (0.0011) [2024-06-15 21:32:45,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 1675689984. Throughput: 0: 11013.7. Samples: 418985472. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:32:45,956][1648985] Avg episode reward: [(0, '180.420')] [2024-06-15 21:32:47,759][1652491] Updated weights for policy 0, policy_version 818273 (0.0060) [2024-06-15 21:32:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 46208.5). Total num frames: 1675886592. Throughput: 0: 10877.1. Samples: 419053056. 
Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:32:50,956][1648985] Avg episode reward: [(0, '200.190')] [2024-06-15 21:32:53,954][1652491] Updated weights for policy 0, policy_version 818336 (0.0014) [2024-06-15 21:32:55,503][1652491] Updated weights for policy 0, policy_version 818400 (0.0011) [2024-06-15 21:32:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44236.7, 300 sec: 46097.4). Total num frames: 1676115968. Throughput: 0: 11173.0. Samples: 419092480. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:32:55,956][1648985] Avg episode reward: [(0, '184.950')] [2024-06-15 21:32:56,257][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000818432_1676148736.pth... [2024-06-15 21:32:56,468][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000813056_1665138688.pth [2024-06-15 21:32:56,982][1652491] Updated weights for policy 0, policy_version 818451 (0.0012) [2024-06-15 21:32:59,324][1652491] Updated weights for policy 0, policy_version 818544 (0.0013) [2024-06-15 21:33:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 46541.7). Total num frames: 1676410880. Throughput: 0: 10831.6. Samples: 419141120. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:33:00,956][1648985] Avg episode reward: [(0, '186.910')] [2024-06-15 21:33:05,411][1652491] Updated weights for policy 0, policy_version 818578 (0.0027) [2024-06-15 21:33:05,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 43690.7, 300 sec: 45653.0). Total num frames: 1676476416. Throughput: 0: 11241.3. Samples: 419222016. 
Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:33:05,955][1648985] Avg episode reward: [(0, '156.330')] [2024-06-15 21:33:07,081][1652491] Updated weights for policy 0, policy_version 818656 (0.0015) [2024-06-15 21:33:09,183][1652491] Updated weights for policy 0, policy_version 818705 (0.0012) [2024-06-15 21:33:10,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 44783.2, 300 sec: 46430.6). Total num frames: 1676836864. Throughput: 0: 11184.4. Samples: 419254272. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:33:10,955][1648985] Avg episode reward: [(0, '154.060')] [2024-06-15 21:33:11,236][1652491] Updated weights for policy 0, policy_version 818790 (0.0012) [2024-06-15 21:33:15,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 43690.6, 300 sec: 45542.0). Total num frames: 1676935168. Throughput: 0: 11104.7. Samples: 419324416. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:33:15,956][1648985] Avg episode reward: [(0, '163.640')] [2024-06-15 21:33:16,549][1652491] Updated weights for policy 0, policy_version 818848 (0.0013) [2024-06-15 21:33:17,339][1652491] Updated weights for policy 0, policy_version 818880 (0.0012) [2024-06-15 21:33:18,808][1652491] Updated weights for policy 0, policy_version 818943 (0.0022) [2024-06-15 21:33:19,855][1651469] Signal inference workers to stop experience collection... (42650 times) [2024-06-15 21:33:19,933][1652491] InferenceWorker_p0-w0: stopping experience collection (42650 times) [2024-06-15 21:33:20,170][1651469] Signal inference workers to resume experience collection... (42650 times) [2024-06-15 21:33:20,174][1652491] InferenceWorker_p0-w0: resuming experience collection (42650 times) [2024-06-15 21:33:20,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 44782.8, 300 sec: 46430.6). Total num frames: 1677262848. Throughput: 0: 11366.4. Samples: 419391488. 
Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:33:20,956][1648985] Avg episode reward: [(0, '155.100')] [2024-06-15 21:33:21,393][1652491] Updated weights for policy 0, policy_version 819008 (0.0015) [2024-06-15 21:33:22,765][1652491] Updated weights for policy 0, policy_version 819071 (0.0013) [2024-06-15 21:33:25,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 45875.2). Total num frames: 1677459456. Throughput: 0: 11195.7. Samples: 419422208. Policy #0 lag: (min: 47.0, avg: 146.0, max: 303.0) [2024-06-15 21:33:25,955][1648985] Avg episode reward: [(0, '163.430')] [2024-06-15 21:33:28,152][1652491] Updated weights for policy 0, policy_version 819130 (0.0053) [2024-06-15 21:33:29,611][1652491] Updated weights for policy 0, policy_version 819174 (0.0017) [2024-06-15 21:33:30,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 1677721600. Throughput: 0: 11389.1. Samples: 419497984. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:33:30,956][1648985] Avg episode reward: [(0, '150.130')] [2024-06-15 21:33:31,904][1652491] Updated weights for policy 0, policy_version 819232 (0.0153) [2024-06-15 21:33:33,350][1652491] Updated weights for policy 0, policy_version 819299 (0.0131) [2024-06-15 21:33:35,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 44782.8, 300 sec: 46097.3). Total num frames: 1677983744. Throughput: 0: 11571.2. Samples: 419573760. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:33:35,956][1648985] Avg episode reward: [(0, '147.080')] [2024-06-15 21:33:38,483][1652491] Updated weights for policy 0, policy_version 819362 (0.0054) [2024-06-15 21:33:40,083][1652491] Updated weights for policy 0, policy_version 819413 (0.0035) [2024-06-15 21:33:40,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 47513.7, 300 sec: 46097.4). Total num frames: 1678213120. Throughput: 0: 11605.4. Samples: 419614720. 
Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:33:40,955][1648985] Avg episode reward: [(0, '146.840')] [2024-06-15 21:33:41,980][1652491] Updated weights for policy 0, policy_version 819473 (0.0017) [2024-06-15 21:33:43,754][1652491] Updated weights for policy 0, policy_version 819552 (0.0140) [2024-06-15 21:33:45,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 1678508032. Throughput: 0: 11832.9. Samples: 419673600. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:33:45,955][1648985] Avg episode reward: [(0, '152.800')] [2024-06-15 21:33:49,464][1652491] Updated weights for policy 0, policy_version 819622 (0.0013) [2024-06-15 21:33:50,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 46421.2, 300 sec: 45875.2). Total num frames: 1678671872. Throughput: 0: 11958.0. Samples: 419760128. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:33:50,956][1648985] Avg episode reward: [(0, '175.900')] [2024-06-15 21:33:50,969][1652491] Updated weights for policy 0, policy_version 819680 (0.0011) [2024-06-15 21:33:52,359][1652491] Updated weights for policy 0, policy_version 819713 (0.0012) [2024-06-15 21:33:54,325][1652491] Updated weights for policy 0, policy_version 819808 (0.0109) [2024-06-15 21:33:55,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48605.8, 300 sec: 47097.1). Total num frames: 1679032320. Throughput: 0: 11855.6. Samples: 419787776. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:33:55,956][1648985] Avg episode reward: [(0, '157.790')] [2024-06-15 21:34:00,872][1652491] Updated weights for policy 0, policy_version 819888 (0.0015) [2024-06-15 21:34:00,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 45329.1, 300 sec: 46098.3). Total num frames: 1679130624. Throughput: 0: 12094.6. Samples: 419868672. 
Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:34:00,956][1648985] Avg episode reward: [(0, '131.380')] [2024-06-15 21:34:01,562][1651469] Signal inference workers to stop experience collection... (42700 times) [2024-06-15 21:34:01,619][1652491] InferenceWorker_p0-w0: stopping experience collection (42700 times) [2024-06-15 21:34:01,791][1651469] Signal inference workers to resume experience collection... (42700 times) [2024-06-15 21:34:01,792][1652491] InferenceWorker_p0-w0: resuming experience collection (42700 times) [2024-06-15 21:34:02,668][1652491] Updated weights for policy 0, policy_version 819952 (0.0014) [2024-06-15 21:34:03,618][1652491] Updated weights for policy 0, policy_version 819987 (0.0014) [2024-06-15 21:34:05,444][1652491] Updated weights for policy 0, policy_version 820066 (0.0014) [2024-06-15 21:34:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 51336.5, 300 sec: 47097.1). Total num frames: 1679556608. Throughput: 0: 11878.4. Samples: 419926016. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:34:05,956][1648985] Avg episode reward: [(0, '132.920')] [2024-06-15 21:34:10,955][1648985] Fps is (10 sec: 42597.2, 60 sec: 45328.8, 300 sec: 46208.4). Total num frames: 1679556608. Throughput: 0: 12185.5. Samples: 419970560. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:34:10,956][1648985] Avg episode reward: [(0, '140.560')] [2024-06-15 21:34:11,663][1652491] Updated weights for policy 0, policy_version 820114 (0.0011) [2024-06-15 21:34:14,127][1652491] Updated weights for policy 0, policy_version 820208 (0.0019) [2024-06-15 21:34:15,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 49698.1, 300 sec: 46541.7). Total num frames: 1679917056. Throughput: 0: 11832.9. Samples: 420030464. 
Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:34:15,956][1648985] Avg episode reward: [(0, '144.260')] [2024-06-15 21:34:16,311][1652491] Updated weights for policy 0, policy_version 820288 (0.0014) [2024-06-15 21:34:17,457][1652491] Updated weights for policy 0, policy_version 820339 (0.0013) [2024-06-15 21:34:20,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 46967.5, 300 sec: 46543.3). Total num frames: 1680080896. Throughput: 0: 11673.6. Samples: 420099072. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:34:20,956][1648985] Avg episode reward: [(0, '146.550')] [2024-06-15 21:34:24,746][1652491] Updated weights for policy 0, policy_version 820403 (0.0016) [2024-06-15 21:34:25,955][1648985] Fps is (10 sec: 32768.1, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 1680244736. Throughput: 0: 11753.2. Samples: 420143616. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:34:25,956][1648985] Avg episode reward: [(0, '152.120')] [2024-06-15 21:34:26,999][1652491] Updated weights for policy 0, policy_version 820481 (0.0016) [2024-06-15 21:34:28,881][1652491] Updated weights for policy 0, policy_version 820560 (0.0022) [2024-06-15 21:34:29,693][1652491] Updated weights for policy 0, policy_version 820608 (0.0013) [2024-06-15 21:34:30,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 46766.8). Total num frames: 1680605184. Throughput: 0: 11571.2. Samples: 420194304. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:34:30,956][1648985] Avg episode reward: [(0, '136.140')] [2024-06-15 21:34:35,957][1648985] Fps is (10 sec: 36038.4, 60 sec: 43689.4, 300 sec: 45430.6). Total num frames: 1680605184. Throughput: 0: 11593.5. Samples: 420281856. 
Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:34:35,957][1648985] Avg episode reward: [(0, '131.810')] [2024-06-15 21:34:37,702][1652491] Updated weights for policy 0, policy_version 820688 (0.0108) [2024-06-15 21:34:39,574][1651469] Signal inference workers to stop experience collection... (42750 times) [2024-06-15 21:34:39,608][1652491] InferenceWorker_p0-w0: stopping experience collection (42750 times) [2024-06-15 21:34:39,864][1651469] Signal inference workers to resume experience collection... (42750 times) [2024-06-15 21:34:39,865][1652491] InferenceWorker_p0-w0: resuming experience collection (42750 times) [2024-06-15 21:34:39,867][1652491] Updated weights for policy 0, policy_version 820768 (0.0011) [2024-06-15 21:34:40,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 1680998400. Throughput: 0: 11457.4. Samples: 420303360. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:34:40,955][1648985] Avg episode reward: [(0, '145.090')] [2024-06-15 21:34:41,397][1652491] Updated weights for policy 0, policy_version 820832 (0.0012) [2024-06-15 21:34:45,955][1648985] Fps is (10 sec: 52438.2, 60 sec: 43690.6, 300 sec: 45430.9). Total num frames: 1681129472. Throughput: 0: 11207.1. Samples: 420372992. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:34:45,956][1648985] Avg episode reward: [(0, '153.780')] [2024-06-15 21:34:48,714][1652491] Updated weights for policy 0, policy_version 820912 (0.0136) [2024-06-15 21:34:50,513][1652491] Updated weights for policy 0, policy_version 820977 (0.0011) [2024-06-15 21:34:50,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 45329.2, 300 sec: 46208.5). Total num frames: 1681391616. Throughput: 0: 11343.7. Samples: 420436480. 
Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:34:50,955][1648985] Avg episode reward: [(0, '148.000')] [2024-06-15 21:34:52,264][1652491] Updated weights for policy 0, policy_version 821043 (0.0012) [2024-06-15 21:34:53,915][1652491] Updated weights for policy 0, policy_version 821119 (0.0079) [2024-06-15 21:34:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 1681653760. Throughput: 0: 10934.1. Samples: 420462592. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:34:55,956][1648985] Avg episode reward: [(0, '155.660')] [2024-06-15 21:34:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000821120_1681653760.pth... [2024-06-15 21:34:56,086][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000815808_1670774784.pth [2024-06-15 21:35:00,375][1652491] Updated weights for policy 0, policy_version 821168 (0.0013) [2024-06-15 21:35:00,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 44236.8, 300 sec: 45764.1). Total num frames: 1681784832. Throughput: 0: 11355.0. Samples: 420541440. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:35:00,956][1648985] Avg episode reward: [(0, '152.570')] [2024-06-15 21:35:02,109][1652491] Updated weights for policy 0, policy_version 821232 (0.0014) [2024-06-15 21:35:04,168][1652491] Updated weights for policy 0, policy_version 821312 (0.0013) [2024-06-15 21:35:05,712][1652491] Updated weights for policy 0, policy_version 821376 (0.0016) [2024-06-15 21:35:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 43690.7, 300 sec: 46097.4). Total num frames: 1682178048. Throughput: 0: 10854.4. Samples: 420587520. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:35:05,955][1648985] Avg episode reward: [(0, '143.750')] [2024-06-15 21:35:10,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 43690.9, 300 sec: 45319.8). Total num frames: 1682178048. Throughput: 0: 10808.9. 
Samples: 420630016. Policy #0 lag: (min: 15.0, avg: 95.6, max: 271.0) [2024-06-15 21:35:10,956][1648985] Avg episode reward: [(0, '141.820')] [2024-06-15 21:35:13,102][1652491] Updated weights for policy 0, policy_version 821442 (0.0013) [2024-06-15 21:35:14,672][1652491] Updated weights for policy 0, policy_version 821507 (0.0013) [2024-06-15 21:35:15,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44236.9, 300 sec: 45764.1). Total num frames: 1682571264. Throughput: 0: 10991.0. Samples: 420688896. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:35:15,956][1648985] Avg episode reward: [(0, '127.730')] [2024-06-15 21:35:16,216][1652491] Updated weights for policy 0, policy_version 821572 (0.0025) [2024-06-15 21:35:20,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 45319.9). Total num frames: 1682702336. Throughput: 0: 10684.2. Samples: 420762624. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:35:20,956][1648985] Avg episode reward: [(0, '135.430')] [2024-06-15 21:35:23,137][1651469] Signal inference workers to stop experience collection... (42800 times) [2024-06-15 21:35:23,249][1652491] InferenceWorker_p0-w0: stopping experience collection (42800 times) [2024-06-15 21:35:23,417][1651469] Signal inference workers to resume experience collection... (42800 times) [2024-06-15 21:35:23,418][1652491] InferenceWorker_p0-w0: resuming experience collection (42800 times) [2024-06-15 21:35:23,420][1652491] Updated weights for policy 0, policy_version 821648 (0.0012) [2024-06-15 21:35:24,902][1652491] Updated weights for policy 0, policy_version 821712 (0.0016) [2024-06-15 21:35:25,955][1648985] Fps is (10 sec: 36044.5, 60 sec: 44782.9, 300 sec: 45542.0). Total num frames: 1682931712. Throughput: 0: 10990.9. Samples: 420797952. 
Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:35:25,956][1648985] Avg episode reward: [(0, '129.350')] [2024-06-15 21:35:26,250][1652491] Updated weights for policy 0, policy_version 821762 (0.0013) [2024-06-15 21:35:27,460][1652491] Updated weights for policy 0, policy_version 821816 (0.0011) [2024-06-15 21:35:28,715][1652491] Updated weights for policy 0, policy_version 821872 (0.0011) [2024-06-15 21:35:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 1683226624. Throughput: 0: 10956.8. Samples: 420866048. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:35:30,956][1648985] Avg episode reward: [(0, '146.600')] [2024-06-15 21:35:33,455][1652491] Updated weights for policy 0, policy_version 821891 (0.0011) [2024-06-15 21:35:34,961][1652491] Updated weights for policy 0, policy_version 821968 (0.0012) [2024-06-15 21:35:35,955][1648985] Fps is (10 sec: 55705.4, 60 sec: 48061.1, 300 sec: 45879.4). Total num frames: 1683488768. Throughput: 0: 11286.7. Samples: 420944384. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:35:35,956][1648985] Avg episode reward: [(0, '147.260')] [2024-06-15 21:35:36,493][1652491] Updated weights for policy 0, policy_version 822033 (0.0014) [2024-06-15 21:35:37,376][1652491] Updated weights for policy 0, policy_version 822078 (0.0013) [2024-06-15 21:35:39,234][1652491] Updated weights for policy 0, policy_version 822135 (0.0018) [2024-06-15 21:35:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.1, 300 sec: 45653.0). Total num frames: 1683750912. Throughput: 0: 11400.5. Samples: 420975616. 
Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:35:40,956][1648985] Avg episode reward: [(0, '156.060')] [2024-06-15 21:35:43,713][1652491] Updated weights for policy 0, policy_version 822148 (0.0012) [2024-06-15 21:35:45,726][1652491] Updated weights for policy 0, policy_version 822225 (0.0015) [2024-06-15 21:35:45,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 46421.3, 300 sec: 45875.2). Total num frames: 1683914752. Throughput: 0: 11423.3. Samples: 421055488. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:35:45,956][1648985] Avg episode reward: [(0, '153.900')] [2024-06-15 21:35:47,458][1652491] Updated weights for policy 0, policy_version 822288 (0.0017) [2024-06-15 21:35:49,399][1652491] Updated weights for policy 0, policy_version 822354 (0.0017) [2024-06-15 21:35:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 45986.3). Total num frames: 1684275200. Throughput: 0: 11832.9. Samples: 421120000. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:35:50,957][1648985] Avg episode reward: [(0, '165.530')] [2024-06-15 21:35:54,576][1652491] Updated weights for policy 0, policy_version 822403 (0.0014) [2024-06-15 21:35:55,771][1652491] Updated weights for policy 0, policy_version 822450 (0.0123) [2024-06-15 21:35:55,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45329.0, 300 sec: 45653.1). Total num frames: 1684373504. Throughput: 0: 12003.5. Samples: 421170176. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:35:55,956][1648985] Avg episode reward: [(0, '157.710')] [2024-06-15 21:35:57,313][1652491] Updated weights for policy 0, policy_version 822528 (0.0014) [2024-06-15 21:35:58,624][1651469] Signal inference workers to stop experience collection... (42850 times) [2024-06-15 21:35:58,684][1652491] InferenceWorker_p0-w0: stopping experience collection (42850 times) [2024-06-15 21:35:58,797][1651469] Signal inference workers to resume experience collection... 
(42850 times) [2024-06-15 21:35:58,797][1652491] InferenceWorker_p0-w0: resuming experience collection (42850 times) [2024-06-15 21:35:58,829][1652491] Updated weights for policy 0, policy_version 822577 (0.0013) [2024-06-15 21:35:59,935][1652491] Updated weights for policy 0, policy_version 822626 (0.0151) [2024-06-15 21:36:00,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 50244.3, 300 sec: 46208.4). Total num frames: 1684799488. Throughput: 0: 12162.8. Samples: 421236224. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:36:00,956][1648985] Avg episode reward: [(0, '159.670')] [2024-06-15 21:36:05,247][1652491] Updated weights for policy 0, policy_version 822673 (0.0013) [2024-06-15 21:36:05,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45328.9, 300 sec: 45653.8). Total num frames: 1684897792. Throughput: 0: 12356.2. Samples: 421318656. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:36:05,956][1648985] Avg episode reward: [(0, '150.540')] [2024-06-15 21:36:06,500][1652491] Updated weights for policy 0, policy_version 822722 (0.0013) [2024-06-15 21:36:07,881][1652491] Updated weights for policy 0, policy_version 822781 (0.0023) [2024-06-15 21:36:09,652][1652491] Updated weights for policy 0, policy_version 822841 (0.0011) [2024-06-15 21:36:10,727][1652491] Updated weights for policy 0, policy_version 822884 (0.0014) [2024-06-15 21:36:10,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 51882.7, 300 sec: 46430.6). Total num frames: 1685291008. Throughput: 0: 12242.5. Samples: 421348864. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:36:10,956][1648985] Avg episode reward: [(0, '136.920')] [2024-06-15 21:36:15,955][1648985] Fps is (10 sec: 42599.7, 60 sec: 45875.3, 300 sec: 45430.9). Total num frames: 1685323776. Throughput: 0: 12447.3. Samples: 421426176. 
Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:36:15,955][1648985] Avg episode reward: [(0, '164.110')] [2024-06-15 21:36:17,351][1652491] Updated weights for policy 0, policy_version 822976 (0.0012) [2024-06-15 21:36:18,789][1652491] Updated weights for policy 0, policy_version 823028 (0.0010) [2024-06-15 21:36:20,363][1652491] Updated weights for policy 0, policy_version 823061 (0.0033) [2024-06-15 21:36:20,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 49698.0, 300 sec: 46319.5). Total num frames: 1685684224. Throughput: 0: 12162.9. Samples: 421491712. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:36:20,956][1648985] Avg episode reward: [(0, '152.910')] [2024-06-15 21:36:21,852][1652491] Updated weights for policy 0, policy_version 823123 (0.0011) [2024-06-15 21:36:25,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48605.9, 300 sec: 45875.2). Total num frames: 1685848064. Throughput: 0: 12253.9. Samples: 421527040. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:36:25,956][1648985] Avg episode reward: [(0, '124.610')] [2024-06-15 21:36:27,046][1652491] Updated weights for policy 0, policy_version 823200 (0.0012) [2024-06-15 21:36:28,747][1652491] Updated weights for policy 0, policy_version 823264 (0.0016) [2024-06-15 21:36:30,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48059.7, 300 sec: 45986.3). Total num frames: 1686110208. Throughput: 0: 11992.2. Samples: 421595136. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:36:30,956][1648985] Avg episode reward: [(0, '120.150')] [2024-06-15 21:36:31,355][1652491] Updated weights for policy 0, policy_version 823297 (0.0012) [2024-06-15 21:36:32,492][1652491] Updated weights for policy 0, policy_version 823345 (0.0016) [2024-06-15 21:36:34,012][1652491] Updated weights for policy 0, policy_version 823418 (0.0012) [2024-06-15 21:36:35,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 1686372352. 
Throughput: 0: 12231.1. Samples: 421670400. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:36:35,956][1648985] Avg episode reward: [(0, '139.130')] [2024-06-15 21:36:38,503][1652491] Updated weights for policy 0, policy_version 823463 (0.0014) [2024-06-15 21:36:39,183][1651469] Signal inference workers to stop experience collection... (42900 times) [2024-06-15 21:36:39,220][1652491] InferenceWorker_p0-w0: stopping experience collection (42900 times) [2024-06-15 21:36:39,412][1651469] Signal inference workers to resume experience collection... (42900 times) [2024-06-15 21:36:39,414][1652491] InferenceWorker_p0-w0: resuming experience collection (42900 times) [2024-06-15 21:36:40,682][1652491] Updated weights for policy 0, policy_version 823552 (0.0012) [2024-06-15 21:36:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 1686634496. Throughput: 0: 11912.5. Samples: 421706240. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:36:40,956][1648985] Avg episode reward: [(0, '152.920')] [2024-06-15 21:36:44,199][1652491] Updated weights for policy 0, policy_version 823616 (0.0012) [2024-06-15 21:36:45,089][1652491] Updated weights for policy 0, policy_version 823664 (0.0014) [2024-06-15 21:36:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 49698.2, 300 sec: 46208.4). Total num frames: 1686896640. Throughput: 0: 11946.7. Samples: 421773824. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:36:45,956][1648985] Avg episode reward: [(0, '147.660')] [2024-06-15 21:36:49,419][1652491] Updated weights for policy 0, policy_version 823712 (0.0013) [2024-06-15 21:36:50,564][1652491] Updated weights for policy 0, policy_version 823763 (0.0011) [2024-06-15 21:36:50,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 1687093248. Throughput: 0: 11753.3. Samples: 421847552. 
Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:36:50,956][1648985] Avg episode reward: [(0, '126.110')] [2024-06-15 21:36:53,452][1652491] Updated weights for policy 0, policy_version 823809 (0.0014) [2024-06-15 21:36:55,160][1652491] Updated weights for policy 0, policy_version 823875 (0.0013) [2024-06-15 21:36:55,955][1648985] Fps is (10 sec: 45873.1, 60 sec: 49697.9, 300 sec: 45986.2). Total num frames: 1687355392. Throughput: 0: 11946.6. Samples: 421886464. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:36:55,956][1648985] Avg episode reward: [(0, '125.580')] [2024-06-15 21:36:56,279][1652491] Updated weights for policy 0, policy_version 823933 (0.0011) [2024-06-15 21:36:56,319][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000823936_1687420928.pth... [2024-06-15 21:36:56,390][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000818432_1676148736.pth [2024-06-15 21:37:00,960][1648985] Fps is (10 sec: 42574.8, 60 sec: 45324.9, 300 sec: 46318.6). Total num frames: 1687519232. Throughput: 0: 11865.5. Samples: 421960192. Policy #0 lag: (min: 72.0, avg: 182.4, max: 344.0) [2024-06-15 21:37:00,961][1648985] Avg episode reward: [(0, '145.870')] [2024-06-15 21:37:01,012][1652491] Updated weights for policy 0, policy_version 824000 (0.0016) [2024-06-15 21:37:04,685][1652491] Updated weights for policy 0, policy_version 824080 (0.0050) [2024-06-15 21:37:05,955][1648985] Fps is (10 sec: 45877.2, 60 sec: 48606.1, 300 sec: 46319.6). Total num frames: 1687814144. Throughput: 0: 11912.6. Samples: 422027776. 
Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:37:05,955][1648985] Avg episode reward: [(0, '156.580')] [2024-06-15 21:37:06,212][1652491] Updated weights for policy 0, policy_version 824145 (0.0084) [2024-06-15 21:37:07,206][1652491] Updated weights for policy 0, policy_version 824192 (0.0015) [2024-06-15 21:37:10,955][1648985] Fps is (10 sec: 49178.5, 60 sec: 45329.0, 300 sec: 46430.6). Total num frames: 1688010752. Throughput: 0: 11912.5. Samples: 422063104. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:37:10,956][1648985] Avg episode reward: [(0, '151.060')] [2024-06-15 21:37:11,552][1652491] Updated weights for policy 0, policy_version 824257 (0.0090) [2024-06-15 21:37:12,756][1652491] Updated weights for policy 0, policy_version 824308 (0.0011) [2024-06-15 21:37:15,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 49151.8, 300 sec: 46430.6). Total num frames: 1688272896. Throughput: 0: 12140.1. Samples: 422141440. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:37:15,956][1648985] Avg episode reward: [(0, '137.800')] [2024-06-15 21:37:16,175][1652491] Updated weights for policy 0, policy_version 824380 (0.0017) [2024-06-15 21:37:16,871][1651469] Signal inference workers to stop experience collection... (42950 times) [2024-06-15 21:37:16,925][1652491] InferenceWorker_p0-w0: stopping experience collection (42950 times) [2024-06-15 21:37:17,100][1651469] Signal inference workers to resume experience collection... (42950 times) [2024-06-15 21:37:17,100][1652491] InferenceWorker_p0-w0: resuming experience collection (42950 times) [2024-06-15 21:37:17,689][1652491] Updated weights for policy 0, policy_version 824448 (0.0092) [2024-06-15 21:37:20,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 46421.4, 300 sec: 46208.5). Total num frames: 1688469504. Throughput: 0: 12037.7. Samples: 422212096. 
Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:37:20,956][1648985] Avg episode reward: [(0, '133.710')] [2024-06-15 21:37:22,403][1652491] Updated weights for policy 0, policy_version 824513 (0.0014) [2024-06-15 21:37:25,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 1688731648. Throughput: 0: 11867.0. Samples: 422240256. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:37:25,956][1648985] Avg episode reward: [(0, '133.730')] [2024-06-15 21:37:27,027][1652491] Updated weights for policy 0, policy_version 824577 (0.0014) [2024-06-15 21:37:28,141][1652491] Updated weights for policy 0, policy_version 824630 (0.0012) [2024-06-15 21:37:29,445][1652491] Updated weights for policy 0, policy_version 824688 (0.0038) [2024-06-15 21:37:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 1688993792. Throughput: 0: 12060.4. Samples: 422316544. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:37:30,956][1648985] Avg episode reward: [(0, '150.970')] [2024-06-15 21:37:31,841][1652491] Updated weights for policy 0, policy_version 824723 (0.0012) [2024-06-15 21:37:33,558][1652491] Updated weights for policy 0, policy_version 824785 (0.0051) [2024-06-15 21:37:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1689255936. Throughput: 0: 12049.1. Samples: 422389760. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:37:35,956][1648985] Avg episode reward: [(0, '151.420')] [2024-06-15 21:37:37,753][1652491] Updated weights for policy 0, policy_version 824841 (0.0013) [2024-06-15 21:37:38,863][1652491] Updated weights for policy 0, policy_version 824896 (0.0011) [2024-06-15 21:37:40,413][1652491] Updated weights for policy 0, policy_version 824953 (0.0012) [2024-06-15 21:37:40,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 1689518080. 
Throughput: 0: 12049.2. Samples: 422428672. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:37:40,956][1648985] Avg episode reward: [(0, '174.090')] [2024-06-15 21:37:43,025][1652491] Updated weights for policy 0, policy_version 824993 (0.0013) [2024-06-15 21:37:44,236][1652491] Updated weights for policy 0, policy_version 825045 (0.0012) [2024-06-15 21:37:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1689780224. Throughput: 0: 11879.9. Samples: 422494720. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:37:45,956][1648985] Avg episode reward: [(0, '177.720')] [2024-06-15 21:37:48,788][1652491] Updated weights for policy 0, policy_version 825104 (0.0015) [2024-06-15 21:37:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 1689976832. Throughput: 0: 11901.1. Samples: 422563328. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:37:50,956][1648985] Avg episode reward: [(0, '191.310')] [2024-06-15 21:37:51,220][1652491] Updated weights for policy 0, policy_version 825200 (0.0013) [2024-06-15 21:37:54,490][1652491] Updated weights for policy 0, policy_version 825233 (0.0013) [2024-06-15 21:37:55,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46967.8, 300 sec: 46652.8). Total num frames: 1690173440. Throughput: 0: 12015.0. Samples: 422603776. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:37:55,955][1648985] Avg episode reward: [(0, '186.590')] [2024-06-15 21:37:56,898][1652491] Updated weights for policy 0, policy_version 825318 (0.0278) [2024-06-15 21:38:00,237][1652491] Updated weights for policy 0, policy_version 825365 (0.0014) [2024-06-15 21:38:00,433][1651469] Signal inference workers to stop experience collection... 
(43000 times) [2024-06-15 21:38:00,506][1652491] InferenceWorker_p0-w0: stopping experience collection (43000 times) [2024-06-15 21:38:00,679][1651469] Signal inference workers to resume experience collection... (43000 times) [2024-06-15 21:38:00,680][1652491] InferenceWorker_p0-w0: resuming experience collection (43000 times) [2024-06-15 21:38:00,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 48064.2, 300 sec: 47208.1). Total num frames: 1690402816. Throughput: 0: 11696.4. Samples: 422667776. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:38:00,956][1648985] Avg episode reward: [(0, '176.680')] [2024-06-15 21:38:01,719][1652491] Updated weights for policy 0, policy_version 825424 (0.0014) [2024-06-15 21:38:02,711][1652491] Updated weights for policy 0, policy_version 825467 (0.0014) [2024-06-15 21:38:05,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46421.3, 300 sec: 46652.7). Total num frames: 1690599424. Throughput: 0: 11798.7. Samples: 422743040. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:38:05,956][1648985] Avg episode reward: [(0, '155.540')] [2024-06-15 21:38:08,059][1652491] Updated weights for policy 0, policy_version 825568 (0.0014) [2024-06-15 21:38:10,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46967.6, 300 sec: 47097.1). Total num frames: 1690828800. Throughput: 0: 11730.5. Samples: 422768128. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:38:10,955][1648985] Avg episode reward: [(0, '166.080')] [2024-06-15 21:38:11,203][1652491] Updated weights for policy 0, policy_version 825602 (0.0015) [2024-06-15 21:38:12,106][1652491] Updated weights for policy 0, policy_version 825663 (0.0014) [2024-06-15 21:38:13,531][1652491] Updated weights for policy 0, policy_version 825721 (0.0013) [2024-06-15 21:38:15,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46967.6, 300 sec: 46874.9). Total num frames: 1691090944. Throughput: 0: 11787.4. Samples: 422846976. 
Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:38:15,956][1648985] Avg episode reward: [(0, '148.240')] [2024-06-15 21:38:17,956][1652491] Updated weights for policy 0, policy_version 825792 (0.0118) [2024-06-15 21:38:20,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 1691353088. Throughput: 0: 11571.2. Samples: 422910464. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:38:20,956][1648985] Avg episode reward: [(0, '144.090')] [2024-06-15 21:38:22,866][1652491] Updated weights for policy 0, policy_version 825879 (0.0031) [2024-06-15 21:38:23,747][1652491] Updated weights for policy 0, policy_version 825923 (0.0195) [2024-06-15 21:38:25,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1691615232. Throughput: 0: 11537.1. Samples: 422947840. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:38:25,956][1648985] Avg episode reward: [(0, '133.200')] [2024-06-15 21:38:28,100][1652491] Updated weights for policy 0, policy_version 825986 (0.0084) [2024-06-15 21:38:29,568][1652491] Updated weights for policy 0, policy_version 826049 (0.0014) [2024-06-15 21:38:30,645][1652491] Updated weights for policy 0, policy_version 826100 (0.0015) [2024-06-15 21:38:30,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 1691877376. Throughput: 0: 11605.3. Samples: 423016960. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:38:30,956][1648985] Avg episode reward: [(0, '128.260')] [2024-06-15 21:38:34,098][1652491] Updated weights for policy 0, policy_version 826167 (0.0026) [2024-06-15 21:38:35,788][1652491] Updated weights for policy 0, policy_version 826229 (0.0015) [2024-06-15 21:38:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 47097.0). Total num frames: 1692106752. Throughput: 0: 11605.3. Samples: 423085568. 
Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:38:35,956][1648985] Avg episode reward: [(0, '144.280')] [2024-06-15 21:38:39,097][1652491] Updated weights for policy 0, policy_version 826256 (0.0013) [2024-06-15 21:38:40,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1692336128. Throughput: 0: 11650.8. Samples: 423128064. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:38:40,956][1648985] Avg episode reward: [(0, '154.340')] [2024-06-15 21:38:41,019][1651469] Signal inference workers to stop experience collection... (43050 times) [2024-06-15 21:38:41,130][1652491] InferenceWorker_p0-w0: stopping experience collection (43050 times) [2024-06-15 21:38:41,239][1651469] Signal inference workers to resume experience collection... (43050 times) [2024-06-15 21:38:41,240][1652491] InferenceWorker_p0-w0: resuming experience collection (43050 times) [2024-06-15 21:38:41,323][1652491] Updated weights for policy 0, policy_version 826355 (0.0014) [2024-06-15 21:38:45,955][1648985] Fps is (10 sec: 32768.3, 60 sec: 44236.9, 300 sec: 46652.8). Total num frames: 1692434432. Throughput: 0: 11787.4. Samples: 423198208. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:38:45,955][1648985] Avg episode reward: [(0, '176.380')] [2024-06-15 21:38:46,376][1652491] Updated weights for policy 0, policy_version 826403 (0.0013) [2024-06-15 21:38:48,123][1652491] Updated weights for policy 0, policy_version 826465 (0.0012) [2024-06-15 21:38:49,810][1652491] Updated weights for policy 0, policy_version 826501 (0.0012) [2024-06-15 21:38:50,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 1692762112. Throughput: 0: 11582.6. Samples: 423264256. 
Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:38:50,956][1648985] Avg episode reward: [(0, '188.780')] [2024-06-15 21:38:51,577][1652491] Updated weights for policy 0, policy_version 826576 (0.0014) [2024-06-15 21:38:55,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 45875.1, 300 sec: 46763.8). Total num frames: 1692925952. Throughput: 0: 11719.1. Samples: 423295488. Policy #0 lag: (min: 15.0, avg: 120.7, max: 271.0) [2024-06-15 21:38:55,956][1648985] Avg episode reward: [(0, '201.110')] [2024-06-15 21:38:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000826624_1692925952.pth... [2024-06-15 21:38:56,002][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000821120_1681653760.pth [2024-06-15 21:38:56,922][1652491] Updated weights for policy 0, policy_version 826640 (0.0013) [2024-06-15 21:38:58,682][1652491] Updated weights for policy 0, policy_version 826704 (0.0014) [2024-06-15 21:39:00,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 1693188096. Throughput: 0: 11616.7. Samples: 423369728. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:39:00,955][1648985] Avg episode reward: [(0, '201.920')] [2024-06-15 21:39:01,211][1652491] Updated weights for policy 0, policy_version 826755 (0.0013) [2024-06-15 21:39:03,862][1652491] Updated weights for policy 0, policy_version 826877 (0.0014) [2024-06-15 21:39:05,957][1648985] Fps is (10 sec: 52417.4, 60 sec: 47511.9, 300 sec: 47096.8). Total num frames: 1693450240. Throughput: 0: 11684.4. Samples: 423436288. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:39:05,958][1648985] Avg episode reward: [(0, '188.410')] [2024-06-15 21:39:09,676][1652491] Updated weights for policy 0, policy_version 826944 (0.0016) [2024-06-15 21:39:10,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 47513.5, 300 sec: 46652.8). Total num frames: 1693679616. Throughput: 0: 11730.5. 
Samples: 423475712. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:39:10,956][1648985] Avg episode reward: [(0, '190.370')] [2024-06-15 21:39:11,193][1652491] Updated weights for policy 0, policy_version 827005 (0.0023) [2024-06-15 21:39:14,222][1652491] Updated weights for policy 0, policy_version 827072 (0.0090) [2024-06-15 21:39:15,955][1648985] Fps is (10 sec: 52440.0, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 1693974528. Throughput: 0: 11468.8. Samples: 423533056. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:39:15,956][1648985] Avg episode reward: [(0, '176.460')] [2024-06-15 21:39:19,947][1652491] Updated weights for policy 0, policy_version 827139 (0.0014) [2024-06-15 21:39:20,955][1648985] Fps is (10 sec: 36045.0, 60 sec: 44783.0, 300 sec: 46763.8). Total num frames: 1694040064. Throughput: 0: 11685.0. Samples: 423611392. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:39:20,956][1648985] Avg episode reward: [(0, '160.390')] [2024-06-15 21:39:21,400][1652491] Updated weights for policy 0, policy_version 827200 (0.0012) [2024-06-15 21:39:23,141][1652491] Updated weights for policy 0, policy_version 827259 (0.0010) [2024-06-15 21:39:24,230][1651469] Signal inference workers to stop experience collection... (43100 times) [2024-06-15 21:39:24,272][1652491] InferenceWorker_p0-w0: stopping experience collection (43100 times) [2024-06-15 21:39:24,474][1651469] Signal inference workers to resume experience collection... (43100 times) [2024-06-15 21:39:24,475][1652491] InferenceWorker_p0-w0: resuming experience collection (43100 times) [2024-06-15 21:39:25,358][1652491] Updated weights for policy 0, policy_version 827315 (0.0011) [2024-06-15 21:39:25,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 1694400512. Throughput: 0: 11377.7. Samples: 423640064. 
Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:39:25,956][1648985] Avg episode reward: [(0, '172.650')] [2024-06-15 21:39:30,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 43690.8, 300 sec: 47097.3). Total num frames: 1694498816. Throughput: 0: 11298.1. Samples: 423706624. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:39:30,956][1648985] Avg episode reward: [(0, '172.310')] [2024-06-15 21:39:31,449][1652491] Updated weights for policy 0, policy_version 827394 (0.0023) [2024-06-15 21:39:32,676][1652491] Updated weights for policy 0, policy_version 827456 (0.0020) [2024-06-15 21:39:34,374][1652491] Updated weights for policy 0, policy_version 827511 (0.0103) [2024-06-15 21:39:35,955][1648985] Fps is (10 sec: 39322.5, 60 sec: 44783.0, 300 sec: 46763.8). Total num frames: 1694793728. Throughput: 0: 11434.7. Samples: 423778816. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:39:35,956][1648985] Avg episode reward: [(0, '177.560')] [2024-06-15 21:39:36,040][1652491] Updated weights for policy 0, policy_version 827539 (0.0012) [2024-06-15 21:39:37,667][1652491] Updated weights for policy 0, policy_version 827602 (0.0013) [2024-06-15 21:39:40,956][1648985] Fps is (10 sec: 52426.5, 60 sec: 44782.6, 300 sec: 47097.0). Total num frames: 1695023104. Throughput: 0: 11343.5. Samples: 423805952. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:39:40,957][1648985] Avg episode reward: [(0, '164.640')] [2024-06-15 21:39:42,750][1652491] Updated weights for policy 0, policy_version 827650 (0.0012) [2024-06-15 21:39:44,041][1652491] Updated weights for policy 0, policy_version 827711 (0.0021) [2024-06-15 21:39:45,941][1652491] Updated weights for policy 0, policy_version 827768 (0.0013) [2024-06-15 21:39:45,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 1695252480. Throughput: 0: 11366.4. Samples: 423881216. 
Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:39:45,956][1648985] Avg episode reward: [(0, '153.870')] [2024-06-15 21:39:48,047][1652491] Updated weights for policy 0, policy_version 827828 (0.0027) [2024-06-15 21:39:49,716][1652491] Updated weights for policy 0, policy_version 827902 (0.0021) [2024-06-15 21:39:50,955][1648985] Fps is (10 sec: 52431.2, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1695547392. Throughput: 0: 11150.8. Samples: 423938048. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:39:50,956][1648985] Avg episode reward: [(0, '153.650')] [2024-06-15 21:39:55,631][1652491] Updated weights for policy 0, policy_version 827953 (0.0012) [2024-06-15 21:39:55,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1695678464. Throughput: 0: 11184.4. Samples: 423979008. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:39:55,956][1648985] Avg episode reward: [(0, '165.870')] [2024-06-15 21:39:57,348][1652491] Updated weights for policy 0, policy_version 828000 (0.0109) [2024-06-15 21:39:58,998][1652491] Updated weights for policy 0, policy_version 828054 (0.0012) [2024-06-15 21:40:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1695973376. Throughput: 0: 11275.4. Samples: 424040448. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:40:00,956][1648985] Avg episode reward: [(0, '164.010')] [2024-06-15 21:40:01,590][1652491] Updated weights for policy 0, policy_version 828134 (0.0112) [2024-06-15 21:40:05,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 43692.3, 300 sec: 47097.1). Total num frames: 1696071680. Throughput: 0: 11184.4. Samples: 424114688. 
Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:40:05,955][1648985] Avg episode reward: [(0, '167.510')] [2024-06-15 21:40:07,051][1652491] Updated weights for policy 0, policy_version 828193 (0.0014) [2024-06-15 21:40:09,523][1651469] Signal inference workers to stop experience collection... (43150 times) [2024-06-15 21:40:09,615][1652491] InferenceWorker_p0-w0: stopping experience collection (43150 times) [2024-06-15 21:40:09,815][1651469] Signal inference workers to resume experience collection... (43150 times) [2024-06-15 21:40:09,817][1652491] InferenceWorker_p0-w0: resuming experience collection (43150 times) [2024-06-15 21:40:10,044][1652491] Updated weights for policy 0, policy_version 828288 (0.0148) [2024-06-15 21:40:10,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 44783.0, 300 sec: 46763.8). Total num frames: 1696366592. Throughput: 0: 11355.1. Samples: 424151040. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:40:10,955][1648985] Avg episode reward: [(0, '169.040')] [2024-06-15 21:40:11,902][1652491] Updated weights for policy 0, policy_version 828341 (0.0012) [2024-06-15 21:40:13,559][1652491] Updated weights for policy 0, policy_version 828414 (0.0015) [2024-06-15 21:40:15,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 1696595968. Throughput: 0: 11195.7. Samples: 424210432. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:40:15,956][1648985] Avg episode reward: [(0, '170.380')] [2024-06-15 21:40:19,933][1652491] Updated weights for policy 0, policy_version 828483 (0.0044) [2024-06-15 21:40:20,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1696792576. Throughput: 0: 11218.5. Samples: 424283648. 
Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:40:20,956][1648985] Avg episode reward: [(0, '159.240')] [2024-06-15 21:40:21,182][1652491] Updated weights for policy 0, policy_version 828535 (0.0011) [2024-06-15 21:40:23,507][1652491] Updated weights for policy 0, policy_version 828601 (0.0012) [2024-06-15 21:40:24,774][1652491] Updated weights for policy 0, policy_version 828656 (0.0016) [2024-06-15 21:40:25,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45329.2, 300 sec: 47097.1). Total num frames: 1697120256. Throughput: 0: 11309.6. Samples: 424314880. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:40:25,956][1648985] Avg episode reward: [(0, '136.780')] [2024-06-15 21:40:29,934][1652491] Updated weights for policy 0, policy_version 828720 (0.0138) [2024-06-15 21:40:30,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 1697251328. Throughput: 0: 11263.9. Samples: 424388096. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:40:30,956][1648985] Avg episode reward: [(0, '141.530')] [2024-06-15 21:40:31,724][1652491] Updated weights for policy 0, policy_version 828769 (0.0012) [2024-06-15 21:40:32,274][1652491] Updated weights for policy 0, policy_version 828798 (0.0019) [2024-06-15 21:40:34,621][1652491] Updated weights for policy 0, policy_version 828864 (0.0014) [2024-06-15 21:40:35,799][1652491] Updated weights for policy 0, policy_version 828918 (0.0043) [2024-06-15 21:40:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 1697611776. Throughput: 0: 11548.4. Samples: 424457728. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:40:35,955][1648985] Avg episode reward: [(0, '154.540')] [2024-06-15 21:40:40,345][1652491] Updated weights for policy 0, policy_version 828946 (0.0023) [2024-06-15 21:40:40,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 45329.3, 300 sec: 46874.9). Total num frames: 1697742848. 
Throughput: 0: 11650.8. Samples: 424503296. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:40:40,956][1648985] Avg episode reward: [(0, '157.510')] [2024-06-15 21:40:41,728][1652491] Updated weights for policy 0, policy_version 828995 (0.0095) [2024-06-15 21:40:42,933][1652491] Updated weights for policy 0, policy_version 829056 (0.0108) [2024-06-15 21:40:44,638][1652491] Updated weights for policy 0, policy_version 829107 (0.0013) [2024-06-15 21:40:45,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 1698103296. Throughput: 0: 11776.0. Samples: 424570368. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 21:40:45,956][1648985] Avg episode reward: [(0, '158.670')] [2024-06-15 21:40:46,298][1652491] Updated weights for policy 0, policy_version 829179 (0.0012) [2024-06-15 21:40:50,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 43690.7, 300 sec: 46763.8). Total num frames: 1698168832. Throughput: 0: 12003.6. Samples: 424654848. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:40:50,956][1648985] Avg episode reward: [(0, '169.340')] [2024-06-15 21:40:51,582][1652491] Updated weights for policy 0, policy_version 829219 (0.0013) [2024-06-15 21:40:52,520][1652491] Updated weights for policy 0, policy_version 829265 (0.0011) [2024-06-15 21:40:52,873][1651469] Signal inference workers to stop experience collection... (43200 times) [2024-06-15 21:40:52,924][1652491] InferenceWorker_p0-w0: stopping experience collection (43200 times) [2024-06-15 21:40:53,105][1651469] Signal inference workers to resume experience collection... 
(43200 times) [2024-06-15 21:40:53,106][1652491] InferenceWorker_p0-w0: resuming experience collection (43200 times) [2024-06-15 21:40:54,465][1652491] Updated weights for policy 0, policy_version 829344 (0.0013) [2024-06-15 21:40:55,633][1652491] Updated weights for policy 0, policy_version 829397 (0.0014) [2024-06-15 21:40:55,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 49151.9, 300 sec: 46874.9). Total num frames: 1698627584. Throughput: 0: 11878.4. Samples: 424685568. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:40:55,956][1648985] Avg episode reward: [(0, '159.000')] [2024-06-15 21:40:56,319][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000829440_1698693120.pth... [2024-06-15 21:40:56,420][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000823936_1687420928.pth [2024-06-15 21:41:00,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45329.0, 300 sec: 46763.8). Total num frames: 1698693120. Throughput: 0: 12231.1. Samples: 424760832. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:41:00,956][1648985] Avg episode reward: [(0, '162.270')] [2024-06-15 21:41:01,916][1652491] Updated weights for policy 0, policy_version 829441 (0.0012) [2024-06-15 21:41:04,470][1652491] Updated weights for policy 0, policy_version 829536 (0.0013) [2024-06-15 21:41:05,843][1652491] Updated weights for policy 0, policy_version 829591 (0.0027) [2024-06-15 21:41:05,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 49151.9, 300 sec: 46541.7). Total num frames: 1699020800. Throughput: 0: 11992.2. Samples: 424823296. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:41:05,955][1648985] Avg episode reward: [(0, '161.400')] [2024-06-15 21:41:07,006][1652491] Updated weights for policy 0, policy_version 829648 (0.0012) [2024-06-15 21:41:10,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 47513.6, 300 sec: 47097.0). Total num frames: 1699217408. Throughput: 0: 12071.8. 
Samples: 424858112. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:41:10,955][1648985] Avg episode reward: [(0, '169.170')] [2024-06-15 21:41:13,725][1652491] Updated weights for policy 0, policy_version 829712 (0.0013) [2024-06-15 21:41:15,457][1652491] Updated weights for policy 0, policy_version 829776 (0.0011) [2024-06-15 21:41:15,955][1648985] Fps is (10 sec: 39320.6, 60 sec: 46967.3, 300 sec: 46541.6). Total num frames: 1699414016. Throughput: 0: 12071.8. Samples: 424931328. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:41:15,956][1648985] Avg episode reward: [(0, '166.130')] [2024-06-15 21:41:16,792][1652491] Updated weights for policy 0, policy_version 829824 (0.0014) [2024-06-15 21:41:18,639][1652491] Updated weights for policy 0, policy_version 829904 (0.0013) [2024-06-15 21:41:20,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 49152.0, 300 sec: 47097.1). Total num frames: 1699741696. Throughput: 0: 11867.0. Samples: 424991744. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:41:20,956][1648985] Avg episode reward: [(0, '164.580')] [2024-06-15 21:41:25,180][1652491] Updated weights for policy 0, policy_version 829955 (0.0014) [2024-06-15 21:41:25,958][1648985] Fps is (10 sec: 39310.4, 60 sec: 44780.6, 300 sec: 46430.1). Total num frames: 1699807232. Throughput: 0: 11775.2. Samples: 425033216. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:41:25,959][1648985] Avg episode reward: [(0, '182.310')] [2024-06-15 21:41:26,780][1652491] Updated weights for policy 0, policy_version 830016 (0.0013) [2024-06-15 21:41:28,565][1652491] Updated weights for policy 0, policy_version 830082 (0.0095) [2024-06-15 21:41:29,918][1652491] Updated weights for policy 0, policy_version 830134 (0.0023) [2024-06-15 21:41:30,227][1651469] Signal inference workers to stop experience collection... 
(43250 times) [2024-06-15 21:41:30,276][1652491] InferenceWorker_p0-w0: stopping experience collection (43250 times) [2024-06-15 21:41:30,422][1651469] Signal inference workers to resume experience collection... (43250 times) [2024-06-15 21:41:30,423][1652491] InferenceWorker_p0-w0: resuming experience collection (43250 times) [2024-06-15 21:41:30,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 49152.2, 300 sec: 46874.9). Total num frames: 1700200448. Throughput: 0: 11582.6. Samples: 425091584. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:41:30,955][1648985] Avg episode reward: [(0, '177.620')] [2024-06-15 21:41:31,077][1652491] Updated weights for policy 0, policy_version 830197 (0.0117) [2024-06-15 21:41:35,955][1648985] Fps is (10 sec: 45889.7, 60 sec: 44236.9, 300 sec: 46208.5). Total num frames: 1700265984. Throughput: 0: 11366.4. Samples: 425166336. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:41:35,955][1648985] Avg episode reward: [(0, '175.400')] [2024-06-15 21:41:38,763][1652491] Updated weights for policy 0, policy_version 830256 (0.0013) [2024-06-15 21:41:40,955][1648985] Fps is (10 sec: 32767.6, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 1700528128. Throughput: 0: 11457.4. Samples: 425201152. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:41:40,956][1648985] Avg episode reward: [(0, '179.360')] [2024-06-15 21:41:41,080][1652491] Updated weights for policy 0, policy_version 830340 (0.0047) [2024-06-15 21:41:42,174][1652491] Updated weights for policy 0, policy_version 830389 (0.0013) [2024-06-15 21:41:43,596][1652491] Updated weights for policy 0, policy_version 830453 (0.0013) [2024-06-15 21:41:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 1700790272. Throughput: 0: 11138.8. Samples: 425262080. 
Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:41:45,956][1648985] Avg episode reward: [(0, '158.040')] [2024-06-15 21:41:49,258][1652491] Updated weights for policy 0, policy_version 830496 (0.0012) [2024-06-15 21:41:50,716][1652491] Updated weights for policy 0, policy_version 830549 (0.0134) [2024-06-15 21:41:50,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 46967.4, 300 sec: 46208.5). Total num frames: 1700986880. Throughput: 0: 11480.2. Samples: 425339904. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:41:50,955][1648985] Avg episode reward: [(0, '158.270')] [2024-06-15 21:41:51,962][1652491] Updated weights for policy 0, policy_version 830608 (0.0013) [2024-06-15 21:41:53,069][1652491] Updated weights for policy 0, policy_version 830660 (0.0082) [2024-06-15 21:41:54,444][1652491] Updated weights for policy 0, policy_version 830717 (0.0056) [2024-06-15 21:41:55,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 44783.1, 300 sec: 46764.7). Total num frames: 1701314560. Throughput: 0: 11241.2. Samples: 425363968. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:41:55,955][1648985] Avg episode reward: [(0, '133.100')] [2024-06-15 21:42:00,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45329.1, 300 sec: 46097.3). Total num frames: 1701412864. Throughput: 0: 11434.7. Samples: 425445888. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:42:00,956][1648985] Avg episode reward: [(0, '171.590')] [2024-06-15 21:42:01,002][1652491] Updated weights for policy 0, policy_version 830778 (0.0014) [2024-06-15 21:42:02,434][1652491] Updated weights for policy 0, policy_version 830832 (0.0013) [2024-06-15 21:42:04,123][1652491] Updated weights for policy 0, policy_version 830897 (0.0026) [2024-06-15 21:42:05,721][1652491] Updated weights for policy 0, policy_version 830964 (0.0087) [2024-06-15 21:42:05,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 1701838848. 
Throughput: 0: 11332.3. Samples: 425501696. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:42:05,956][1648985] Avg episode reward: [(0, '162.500')] [2024-06-15 21:42:10,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 43690.6, 300 sec: 45986.3). Total num frames: 1701838848. Throughput: 0: 11298.9. Samples: 425541632. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:42:10,956][1648985] Avg episode reward: [(0, '165.260')] [2024-06-15 21:42:11,264][1652491] Updated weights for policy 0, policy_version 830996 (0.0013) [2024-06-15 21:42:12,937][1651469] Signal inference workers to stop experience collection... (43300 times) [2024-06-15 21:42:12,996][1652491] InferenceWorker_p0-w0: stopping experience collection (43300 times) [2024-06-15 21:42:13,146][1651469] Signal inference workers to resume experience collection... (43300 times) [2024-06-15 21:42:13,146][1652491] InferenceWorker_p0-w0: resuming experience collection (43300 times) [2024-06-15 21:42:13,149][1652491] Updated weights for policy 0, policy_version 831056 (0.0013) [2024-06-15 21:42:14,468][1652491] Updated weights for policy 0, policy_version 831120 (0.0013) [2024-06-15 21:42:15,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 46967.7, 300 sec: 46652.7). Total num frames: 1702232064. Throughput: 0: 11616.7. Samples: 425614336. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:42:15,956][1648985] Avg episode reward: [(0, '173.830')] [2024-06-15 21:42:16,498][1652491] Updated weights for policy 0, policy_version 831189 (0.0012) [2024-06-15 21:42:17,265][1652491] Updated weights for policy 0, policy_version 831226 (0.0012) [2024-06-15 21:42:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1702363136. Throughput: 0: 11514.3. Samples: 425684480. 
Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:42:20,956][1648985] Avg episode reward: [(0, '181.800')] [2024-06-15 21:42:23,480][1652491] Updated weights for policy 0, policy_version 831280 (0.0153) [2024-06-15 21:42:25,232][1652491] Updated weights for policy 0, policy_version 831348 (0.0127) [2024-06-15 21:42:25,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 47516.1, 300 sec: 46319.5). Total num frames: 1702658048. Throughput: 0: 11525.7. Samples: 425719808. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:42:25,955][1648985] Avg episode reward: [(0, '181.180')] [2024-06-15 21:42:27,044][1652491] Updated weights for policy 0, policy_version 831424 (0.0013) [2024-06-15 21:42:28,512][1652491] Updated weights for policy 0, policy_version 831479 (0.0023) [2024-06-15 21:42:30,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 1702887424. Throughput: 0: 11434.7. Samples: 425776640. Policy #0 lag: (min: 45.0, avg: 193.8, max: 335.0) [2024-06-15 21:42:30,956][1648985] Avg episode reward: [(0, '173.970')] [2024-06-15 21:42:35,417][1652491] Updated weights for policy 0, policy_version 831543 (0.0013) [2024-06-15 21:42:35,970][1648985] Fps is (10 sec: 35990.2, 60 sec: 45863.6, 300 sec: 45761.8). Total num frames: 1703018496. Throughput: 0: 11305.7. Samples: 425848832. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:42:35,971][1648985] Avg episode reward: [(0, '152.860')] [2024-06-15 21:42:37,356][1652491] Updated weights for policy 0, policy_version 831586 (0.0013) [2024-06-15 21:42:40,192][1652491] Updated weights for policy 0, policy_version 831684 (0.0011) [2024-06-15 21:42:40,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 46967.4, 300 sec: 45986.2). Total num frames: 1703346176. Throughput: 0: 11355.0. Samples: 425874944. 
Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:42:40,956][1648985] Avg episode reward: [(0, '152.900')] [2024-06-15 21:42:41,586][1652491] Updated weights for policy 0, policy_version 831744 (0.0015) [2024-06-15 21:42:45,955][1648985] Fps is (10 sec: 39381.0, 60 sec: 43690.7, 300 sec: 45542.0). Total num frames: 1703411712. Throughput: 0: 10991.0. Samples: 425940480. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:42:45,956][1648985] Avg episode reward: [(0, '155.670')] [2024-06-15 21:42:47,883][1652491] Updated weights for policy 0, policy_version 831808 (0.0098) [2024-06-15 21:42:49,711][1652491] Updated weights for policy 0, policy_version 831872 (0.0013) [2024-06-15 21:42:50,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 46421.3, 300 sec: 46097.3). Total num frames: 1703772160. Throughput: 0: 11241.2. Samples: 426007552. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:42:50,956][1648985] Avg episode reward: [(0, '156.690')] [2024-06-15 21:42:51,521][1651469] Signal inference workers to stop experience collection... (43350 times) [2024-06-15 21:42:51,571][1652491] InferenceWorker_p0-w0: stopping experience collection (43350 times) [2024-06-15 21:42:51,573][1652491] Updated weights for policy 0, policy_version 831939 (0.0012) [2024-06-15 21:42:51,776][1651469] Signal inference workers to resume experience collection... (43350 times) [2024-06-15 21:42:51,777][1652491] InferenceWorker_p0-w0: resuming experience collection (43350 times) [2024-06-15 21:42:52,761][1652491] Updated weights for policy 0, policy_version 831990 (0.0031) [2024-06-15 21:42:55,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 45875.2). Total num frames: 1703936000. Throughput: 0: 11093.3. Samples: 426040832. 
Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:42:55,956][1648985] Avg episode reward: [(0, '141.160')] [2024-06-15 21:42:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000832000_1703936000.pth... [2024-06-15 21:42:56,014][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000826624_1692925952.pth [2024-06-15 21:42:59,456][1652491] Updated weights for policy 0, policy_version 832050 (0.0013) [2024-06-15 21:43:00,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.3, 300 sec: 45986.3). Total num frames: 1704165376. Throughput: 0: 11059.2. Samples: 426112000. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:43:00,956][1648985] Avg episode reward: [(0, '163.210')] [2024-06-15 21:43:01,351][1652491] Updated weights for policy 0, policy_version 832132 (0.0017) [2024-06-15 21:43:03,077][1652491] Updated weights for policy 0, policy_version 832197 (0.0011) [2024-06-15 21:43:04,036][1652491] Updated weights for policy 0, policy_version 832250 (0.0014) [2024-06-15 21:43:05,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 1704460288. Throughput: 0: 10888.5. Samples: 426174464. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:43:05,957][1648985] Avg episode reward: [(0, '171.460')] [2024-06-15 21:43:10,955][1648985] Fps is (10 sec: 32768.1, 60 sec: 44236.9, 300 sec: 45430.9). Total num frames: 1704493056. Throughput: 0: 11082.0. Samples: 426218496. 
Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:43:10,955][1648985] Avg episode reward: [(0, '185.250')] [2024-06-15 21:43:11,630][1652491] Updated weights for policy 0, policy_version 832320 (0.0013) [2024-06-15 21:43:13,270][1652491] Updated weights for policy 0, policy_version 832386 (0.0159) [2024-06-15 21:43:14,959][1652491] Updated weights for policy 0, policy_version 832465 (0.0018) [2024-06-15 21:43:15,590][1652491] Updated weights for policy 0, policy_version 832508 (0.0013) [2024-06-15 21:43:15,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 1704984576. Throughput: 0: 11127.5. Samples: 426277376. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:43:15,956][1648985] Avg episode reward: [(0, '183.890')] [2024-06-15 21:43:20,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 1704984576. Throughput: 0: 11632.0. Samples: 426372096. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:43:20,955][1648985] Avg episode reward: [(0, '168.440')] [2024-06-15 21:43:22,074][1652491] Updated weights for policy 0, policy_version 832560 (0.0013) [2024-06-15 21:43:23,647][1652491] Updated weights for policy 0, policy_version 832628 (0.0011) [2024-06-15 21:43:24,670][1652491] Updated weights for policy 0, policy_version 832678 (0.0012) [2024-06-15 21:43:25,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 46421.1, 300 sec: 45986.3). Total num frames: 1705443328. Throughput: 0: 11673.6. Samples: 426400256. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:43:25,956][1648985] Avg episode reward: [(0, '168.290')] [2024-06-15 21:43:26,049][1652491] Updated weights for policy 0, policy_version 832741 (0.0094) [2024-06-15 21:43:30,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43690.6, 300 sec: 45430.9). Total num frames: 1705508864. Throughput: 0: 11912.5. Samples: 426476544. 
Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:43:30,956][1648985] Avg episode reward: [(0, '197.520')] [2024-06-15 21:43:31,941][1652491] Updated weights for policy 0, policy_version 832772 (0.0031) [2024-06-15 21:43:32,836][1651469] Signal inference workers to stop experience collection... (43400 times) [2024-06-15 21:43:32,879][1652491] InferenceWorker_p0-w0: stopping experience collection (43400 times) [2024-06-15 21:43:33,157][1651469] Signal inference workers to resume experience collection... (43400 times) [2024-06-15 21:43:33,157][1652491] InferenceWorker_p0-w0: resuming experience collection (43400 times) [2024-06-15 21:43:33,647][1652491] Updated weights for policy 0, policy_version 832835 (0.0045) [2024-06-15 21:43:34,857][1652491] Updated weights for policy 0, policy_version 832896 (0.0013) [2024-06-15 21:43:35,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 46979.3, 300 sec: 45764.1). Total num frames: 1705836544. Throughput: 0: 11923.9. Samples: 426544128. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:43:35,956][1648985] Avg episode reward: [(0, '179.830')] [2024-06-15 21:43:37,015][1652491] Updated weights for policy 0, policy_version 832992 (0.0086) [2024-06-15 21:43:40,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 44783.1, 300 sec: 46097.3). Total num frames: 1706033152. Throughput: 0: 11867.0. Samples: 426574848. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:43:40,956][1648985] Avg episode reward: [(0, '169.070')] [2024-06-15 21:43:43,238][1652491] Updated weights for policy 0, policy_version 833025 (0.0013) [2024-06-15 21:43:45,064][1652491] Updated weights for policy 0, policy_version 833090 (0.0013) [2024-06-15 21:43:45,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.5, 300 sec: 45653.1). Total num frames: 1706229760. Throughput: 0: 11980.8. Samples: 426651136. 
Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:43:45,956][1648985] Avg episode reward: [(0, '157.310')] [2024-06-15 21:43:46,697][1652491] Updated weights for policy 0, policy_version 833168 (0.0024) [2024-06-15 21:43:48,444][1652491] Updated weights for policy 0, policy_version 833250 (0.0013) [2024-06-15 21:43:50,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 1706557440. Throughput: 0: 12003.6. Samples: 426714624. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:43:50,956][1648985] Avg episode reward: [(0, '165.550')] [2024-06-15 21:43:55,263][1652491] Updated weights for policy 0, policy_version 833314 (0.0013) [2024-06-15 21:43:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 45329.1, 300 sec: 45653.0). Total num frames: 1706655744. Throughput: 0: 11912.5. Samples: 426754560. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:43:55,956][1648985] Avg episode reward: [(0, '164.720')] [2024-06-15 21:43:56,673][1652491] Updated weights for policy 0, policy_version 833362 (0.0013) [2024-06-15 21:43:58,437][1652491] Updated weights for policy 0, policy_version 833443 (0.0028) [2024-06-15 21:44:00,020][1652491] Updated weights for policy 0, policy_version 833531 (0.0014) [2024-06-15 21:44:00,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48605.9, 300 sec: 46208.8). Total num frames: 1707081728. Throughput: 0: 11730.5. Samples: 426805248. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:44:00,956][1648985] Avg episode reward: [(0, '194.310')] [2024-06-15 21:44:05,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 43690.8, 300 sec: 45430.9). Total num frames: 1707081728. Throughput: 0: 11468.8. Samples: 426888192. 
Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:44:05,956][1648985] Avg episode reward: [(0, '185.690')] [2024-06-15 21:44:07,254][1652491] Updated weights for policy 0, policy_version 833585 (0.0013) [2024-06-15 21:44:08,804][1652491] Updated weights for policy 0, policy_version 833653 (0.0033) [2024-06-15 21:44:08,943][1651469] Signal inference workers to stop experience collection... (43450 times) [2024-06-15 21:44:08,977][1652491] InferenceWorker_p0-w0: stopping experience collection (43450 times) [2024-06-15 21:44:09,087][1651469] Signal inference workers to resume experience collection... (43450 times) [2024-06-15 21:44:09,087][1652491] InferenceWorker_p0-w0: resuming experience collection (43450 times) [2024-06-15 21:44:10,212][1652491] Updated weights for policy 0, policy_version 833712 (0.0013) [2024-06-15 21:44:10,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 50244.1, 300 sec: 45875.2). Total num frames: 1707507712. Throughput: 0: 11366.4. Samples: 426911744. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:44:10,956][1648985] Avg episode reward: [(0, '173.620')] [2024-06-15 21:44:11,434][1652491] Updated weights for policy 0, policy_version 833770 (0.0037) [2024-06-15 21:44:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 1707606016. Throughput: 0: 11366.4. Samples: 426988032. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:44:15,955][1648985] Avg episode reward: [(0, '151.130')] [2024-06-15 21:44:17,816][1652491] Updated weights for policy 0, policy_version 833840 (0.0014) [2024-06-15 21:44:18,869][1652491] Updated weights for policy 0, policy_version 833888 (0.0012) [2024-06-15 21:44:20,582][1652491] Updated weights for policy 0, policy_version 833936 (0.0022) [2024-06-15 21:44:20,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 48605.9, 300 sec: 45764.1). Total num frames: 1707900928. Throughput: 0: 11389.1. Samples: 427056640. 
Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:44:20,956][1648985] Avg episode reward: [(0, '146.780')] [2024-06-15 21:44:21,964][1652491] Updated weights for policy 0, policy_version 833985 (0.0012) [2024-06-15 21:44:23,345][1652491] Updated weights for policy 0, policy_version 834041 (0.0011) [2024-06-15 21:44:25,956][1648985] Fps is (10 sec: 52426.2, 60 sec: 44782.7, 300 sec: 46208.4). Total num frames: 1708130304. Throughput: 0: 11298.0. Samples: 427083264. Policy #0 lag: (min: 47.0, avg: 111.6, max: 303.0) [2024-06-15 21:44:25,956][1648985] Avg episode reward: [(0, '143.240')] [2024-06-15 21:44:28,026][1652491] Updated weights for policy 0, policy_version 834071 (0.0011) [2024-06-15 21:44:29,371][1652491] Updated weights for policy 0, policy_version 834114 (0.0015) [2024-06-15 21:44:30,427][1652491] Updated weights for policy 0, policy_version 834176 (0.0015) [2024-06-15 21:44:30,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 48059.8, 300 sec: 46097.4). Total num frames: 1708392448. Throughput: 0: 11264.0. Samples: 427158016. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:44:30,955][1648985] Avg episode reward: [(0, '141.190')] [2024-06-15 21:44:32,999][1652491] Updated weights for policy 0, policy_version 834233 (0.0014) [2024-06-15 21:44:34,670][1652491] Updated weights for policy 0, policy_version 834291 (0.0014) [2024-06-15 21:44:35,955][1648985] Fps is (10 sec: 52431.6, 60 sec: 46967.5, 300 sec: 46208.5). Total num frames: 1708654592. Throughput: 0: 11366.4. Samples: 427226112. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:44:35,955][1648985] Avg episode reward: [(0, '126.710')] [2024-06-15 21:44:38,304][1652491] Updated weights for policy 0, policy_version 834320 (0.0012) [2024-06-15 21:44:40,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 1708818432. Throughput: 0: 11320.9. Samples: 427264000. 
Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:44:40,956][1648985] Avg episode reward: [(0, '126.080')] [2024-06-15 21:44:41,017][1652491] Updated weights for policy 0, policy_version 834391 (0.0014) [2024-06-15 21:44:42,626][1652491] Updated weights for policy 0, policy_version 834439 (0.0014) [2024-06-15 21:44:43,813][1652491] Updated weights for policy 0, policy_version 834496 (0.0109) [2024-06-15 21:44:45,865][1652491] Updated weights for policy 0, policy_version 834559 (0.0012) [2024-06-15 21:44:45,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 49152.0, 300 sec: 46208.4). Total num frames: 1709178880. Throughput: 0: 11810.1. Samples: 427336704. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:44:45,956][1648985] Avg episode reward: [(0, '139.430')] [2024-06-15 21:44:49,778][1652491] Updated weights for policy 0, policy_version 834615 (0.0093) [2024-06-15 21:44:50,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 1709309952. Throughput: 0: 11673.6. Samples: 427413504. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:44:50,956][1648985] Avg episode reward: [(0, '161.320')] [2024-06-15 21:44:51,252][1651469] Signal inference workers to stop experience collection... (43500 times) [2024-06-15 21:44:51,304][1652491] InferenceWorker_p0-w0: stopping experience collection (43500 times) [2024-06-15 21:44:51,498][1651469] Signal inference workers to resume experience collection... (43500 times) [2024-06-15 21:44:51,499][1652491] InferenceWorker_p0-w0: resuming experience collection (43500 times) [2024-06-15 21:44:52,377][1652491] Updated weights for policy 0, policy_version 834687 (0.0140) [2024-06-15 21:44:54,261][1652491] Updated weights for policy 0, policy_version 834747 (0.0014) [2024-06-15 21:44:55,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 49152.0, 300 sec: 46208.4). Total num frames: 1709604864. Throughput: 0: 11901.2. Samples: 427447296. 
Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:44:55,956][1648985] Avg episode reward: [(0, '178.450')] [2024-06-15 21:44:56,268][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000834784_1709637632.pth... [2024-06-15 21:44:56,422][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000829440_1698693120.pth [2024-06-15 21:44:56,971][1652491] Updated weights for policy 0, policy_version 834814 (0.0037) [2024-06-15 21:45:00,493][1652491] Updated weights for policy 0, policy_version 834864 (0.0014) [2024-06-15 21:45:00,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1709834240. Throughput: 0: 11867.0. Samples: 427522048. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:45:00,956][1648985] Avg episode reward: [(0, '177.940')] [2024-06-15 21:45:02,730][1652491] Updated weights for policy 0, policy_version 834928 (0.0012) [2024-06-15 21:45:05,148][1652491] Updated weights for policy 0, policy_version 834979 (0.0021) [2024-06-15 21:45:05,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 50244.3, 300 sec: 46541.7). Total num frames: 1710096384. Throughput: 0: 11821.5. Samples: 427588608. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:45:05,956][1648985] Avg episode reward: [(0, '164.070')] [2024-06-15 21:45:07,365][1652491] Updated weights for policy 0, policy_version 835024 (0.0018) [2024-06-15 21:45:10,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 45875.3, 300 sec: 46319.5). Total num frames: 1710260224. Throughput: 0: 11958.2. Samples: 427621376. 
Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:45:10,956][1648985] Avg episode reward: [(0, '145.260')] [2024-06-15 21:45:11,147][1652491] Updated weights for policy 0, policy_version 835104 (0.0012) [2024-06-15 21:45:13,866][1652491] Updated weights for policy 0, policy_version 835154 (0.0014) [2024-06-15 21:45:14,493][1652491] Updated weights for policy 0, policy_version 835193 (0.0055) [2024-06-15 21:45:15,858][1652491] Updated weights for policy 0, policy_version 835249 (0.0019) [2024-06-15 21:45:15,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 49698.1, 300 sec: 46763.8). Total num frames: 1710587904. Throughput: 0: 11992.2. Samples: 427697664. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:45:15,956][1648985] Avg episode reward: [(0, '149.440')] [2024-06-15 21:45:19,287][1652491] Updated weights for policy 0, policy_version 835325 (0.0012) [2024-06-15 21:45:20,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 47513.5, 300 sec: 46208.4). Total num frames: 1710751744. Throughput: 0: 12014.9. Samples: 427766784. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:45:20,956][1648985] Avg episode reward: [(0, '162.070')] [2024-06-15 21:45:22,993][1652491] Updated weights for policy 0, policy_version 835378 (0.0011) [2024-06-15 21:45:24,576][1652491] Updated weights for policy 0, policy_version 835409 (0.0022) [2024-06-15 21:45:25,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48060.0, 300 sec: 46652.8). Total num frames: 1711013888. Throughput: 0: 12105.9. Samples: 427808768. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:45:25,956][1648985] Avg episode reward: [(0, '152.620')] [2024-06-15 21:45:26,464][1652491] Updated weights for policy 0, policy_version 835476 (0.0087) [2024-06-15 21:45:29,057][1652491] Updated weights for policy 0, policy_version 835522 (0.0016) [2024-06-15 21:45:30,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 48059.7, 300 sec: 46319.5). Total num frames: 1711276032. 
Throughput: 0: 11912.5. Samples: 427872768. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:45:30,955][1648985] Avg episode reward: [(0, '157.730')] [2024-06-15 21:45:33,134][1652491] Updated weights for policy 0, policy_version 835601 (0.0014) [2024-06-15 21:45:34,100][1652491] Updated weights for policy 0, policy_version 835645 (0.0014) [2024-06-15 21:45:35,660][1651469] Signal inference workers to stop experience collection... (43550 times) [2024-06-15 21:45:35,717][1652491] InferenceWorker_p0-w0: stopping experience collection (43550 times) [2024-06-15 21:45:35,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 1711472640. Throughput: 0: 11741.9. Samples: 427941888. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:45:35,955][1648985] Avg episode reward: [(0, '157.760')] [2024-06-15 21:45:36,025][1651469] Signal inference workers to resume experience collection... (43550 times) [2024-06-15 21:45:36,026][1652491] InferenceWorker_p0-w0: resuming experience collection (43550 times) [2024-06-15 21:45:36,227][1652491] Updated weights for policy 0, policy_version 835706 (0.0075) [2024-06-15 21:45:39,421][1652491] Updated weights for policy 0, policy_version 835770 (0.0013) [2024-06-15 21:45:40,957][1648985] Fps is (10 sec: 42590.5, 60 sec: 48058.3, 300 sec: 46097.1). Total num frames: 1711702016. Throughput: 0: 11741.4. Samples: 427975680. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:45:40,958][1648985] Avg episode reward: [(0, '173.320')] [2024-06-15 21:45:44,564][1652491] Updated weights for policy 0, policy_version 835856 (0.0015) [2024-06-15 21:45:45,581][1652491] Updated weights for policy 0, policy_version 835899 (0.0013) [2024-06-15 21:45:45,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 1711931392. Throughput: 0: 11616.8. Samples: 428044800. 
Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:45:45,955][1648985] Avg episode reward: [(0, '172.310')] [2024-06-15 21:45:47,798][1652491] Updated weights for policy 0, policy_version 835962 (0.0013) [2024-06-15 21:45:50,621][1652491] Updated weights for policy 0, policy_version 836022 (0.0085) [2024-06-15 21:45:50,955][1648985] Fps is (10 sec: 49161.2, 60 sec: 48059.8, 300 sec: 45986.3). Total num frames: 1712193536. Throughput: 0: 11673.6. Samples: 428113920. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:45:50,956][1648985] Avg episode reward: [(0, '191.110')] [2024-06-15 21:45:52,100][1652491] Updated weights for policy 0, policy_version 836064 (0.0012) [2024-06-15 21:45:55,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 45875.1, 300 sec: 46319.5). Total num frames: 1712357376. Throughput: 0: 11753.2. Samples: 428150272. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:45:55,956][1648985] Avg episode reward: [(0, '184.960')] [2024-06-15 21:45:56,589][1652491] Updated weights for policy 0, policy_version 836148 (0.0014) [2024-06-15 21:45:58,457][1652491] Updated weights for policy 0, policy_version 836192 (0.0106) [2024-06-15 21:45:59,060][1652491] Updated weights for policy 0, policy_version 836224 (0.0010) [2024-06-15 21:46:00,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.4, 300 sec: 46097.4). Total num frames: 1712619520. Throughput: 0: 11719.1. Samples: 428225024. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:46:00,956][1648985] Avg episode reward: [(0, '172.710')] [2024-06-15 21:46:01,737][1652491] Updated weights for policy 0, policy_version 836277 (0.0013) [2024-06-15 21:46:03,339][1652491] Updated weights for policy 0, policy_version 836308 (0.0011) [2024-06-15 21:46:05,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 1712848896. Throughput: 0: 11730.6. Samples: 428294656. 
Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:46:05,955][1648985] Avg episode reward: [(0, '163.820')] [2024-06-15 21:46:06,963][1652491] Updated weights for policy 0, policy_version 836368 (0.0013) [2024-06-15 21:46:07,956][1652491] Updated weights for policy 0, policy_version 836410 (0.0027) [2024-06-15 21:46:09,871][1652491] Updated weights for policy 0, policy_version 836472 (0.0013) [2024-06-15 21:46:10,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 47513.5, 300 sec: 46430.6). Total num frames: 1713111040. Throughput: 0: 11559.8. Samples: 428328960. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:46:10,956][1648985] Avg episode reward: [(0, '155.130')] [2024-06-15 21:46:12,507][1652491] Updated weights for policy 0, policy_version 836515 (0.0023) [2024-06-15 21:46:14,759][1652491] Updated weights for policy 0, policy_version 836592 (0.0013) [2024-06-15 21:46:15,955][1648985] Fps is (10 sec: 52427.2, 60 sec: 46421.2, 300 sec: 46208.4). Total num frames: 1713373184. Throughput: 0: 11537.0. Samples: 428391936. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 21:46:15,956][1648985] Avg episode reward: [(0, '169.070')] [2024-06-15 21:46:18,227][1652491] Updated weights for policy 0, policy_version 836624 (0.0013) [2024-06-15 21:46:20,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46421.5, 300 sec: 46542.2). Total num frames: 1713537024. Throughput: 0: 11685.0. Samples: 428467712. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:46:20,956][1648985] Avg episode reward: [(0, '166.120')] [2024-06-15 21:46:21,292][1652491] Updated weights for policy 0, policy_version 836704 (0.0014) [2024-06-15 21:46:23,184][1652491] Updated weights for policy 0, policy_version 836744 (0.0012) [2024-06-15 21:46:23,504][1651469] Signal inference workers to stop experience collection... 
(43600 times) [2024-06-15 21:46:23,535][1652491] InferenceWorker_p0-w0: stopping experience collection (43600 times) [2024-06-15 21:46:23,812][1651469] Signal inference workers to resume experience collection... (43600 times) [2024-06-15 21:46:23,813][1652491] InferenceWorker_p0-w0: resuming experience collection (43600 times) [2024-06-15 21:46:24,894][1652491] Updated weights for policy 0, policy_version 836801 (0.0011) [2024-06-15 21:46:25,951][1652491] Updated weights for policy 0, policy_version 836857 (0.0011) [2024-06-15 21:46:25,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 47513.6, 300 sec: 46319.5). Total num frames: 1713864704. Throughput: 0: 11639.9. Samples: 428499456. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:46:25,956][1648985] Avg episode reward: [(0, '189.930')] [2024-06-15 21:46:29,784][1652491] Updated weights for policy 0, policy_version 836897 (0.0017) [2024-06-15 21:46:30,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1714028544. Throughput: 0: 11798.7. Samples: 428575744. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:46:30,956][1648985] Avg episode reward: [(0, '185.540')] [2024-06-15 21:46:31,621][1652491] Updated weights for policy 0, policy_version 836931 (0.0012) [2024-06-15 21:46:32,741][1652491] Updated weights for policy 0, policy_version 836992 (0.0219) [2024-06-15 21:46:35,211][1652491] Updated weights for policy 0, policy_version 837048 (0.0014) [2024-06-15 21:46:35,999][1648985] Fps is (10 sec: 45875.7, 60 sec: 47513.5, 300 sec: 46763.8). Total num frames: 1714323456. Throughput: 0: 11832.9. Samples: 428646400. 
Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:46:35,999][1648985] Avg episode reward: [(0, '195.800')] [2024-06-15 21:46:36,414][1652491] Updated weights for policy 0, policy_version 837091 (0.0026) [2024-06-15 21:46:39,867][1652491] Updated weights for policy 0, policy_version 837140 (0.0021) [2024-06-15 21:46:40,595][1652491] Updated weights for policy 0, policy_version 837183 (0.0071) [2024-06-15 21:46:40,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 47515.1, 300 sec: 46652.8). Total num frames: 1714552832. Throughput: 0: 11946.7. Samples: 428687872. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:46:40,955][1648985] Avg episode reward: [(0, '161.440')] [2024-06-15 21:46:43,249][1652491] Updated weights for policy 0, policy_version 837246 (0.0013) [2024-06-15 21:46:45,462][1652491] Updated weights for policy 0, policy_version 837299 (0.0013) [2024-06-15 21:46:45,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 1714814976. Throughput: 0: 11912.5. Samples: 428761088. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:46:45,956][1648985] Avg episode reward: [(0, '165.490')] [2024-06-15 21:46:46,542][1652491] Updated weights for policy 0, policy_version 837344 (0.0013) [2024-06-15 21:46:50,362][1652491] Updated weights for policy 0, policy_version 837412 (0.0014) [2024-06-15 21:46:50,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 47513.5, 300 sec: 46541.7). Total num frames: 1715044352. Throughput: 0: 11958.0. Samples: 428832768. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:46:50,956][1648985] Avg episode reward: [(0, '155.230')] [2024-06-15 21:46:53,032][1652491] Updated weights for policy 0, policy_version 837443 (0.0016) [2024-06-15 21:46:55,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 47513.6, 300 sec: 46763.8). Total num frames: 1715208192. Throughput: 0: 12026.3. Samples: 428870144. 
Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:46:55,956][1648985] Avg episode reward: [(0, '157.920')] [2024-06-15 21:46:56,245][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000837520_1715240960.pth... [2024-06-15 21:46:56,400][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000832000_1703936000.pth [2024-06-15 21:46:56,620][1652491] Updated weights for policy 0, policy_version 837536 (0.0015) [2024-06-15 21:46:58,352][1652491] Updated weights for policy 0, policy_version 837616 (0.0015) [2024-06-15 21:47:00,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 48059.6, 300 sec: 46319.5). Total num frames: 1715503104. Throughput: 0: 12162.9. Samples: 428939264. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:47:00,956][1648985] Avg episode reward: [(0, '163.910')] [2024-06-15 21:47:01,018][1652491] Updated weights for policy 0, policy_version 837664 (0.0035) [2024-06-15 21:47:01,628][1652491] Updated weights for policy 0, policy_version 837691 (0.0011) [2024-06-15 21:47:05,485][1652491] Updated weights for policy 0, policy_version 837752 (0.0010) [2024-06-15 21:47:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.6, 300 sec: 47097.1). Total num frames: 1715732480. Throughput: 0: 12105.9. Samples: 429012480. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:47:05,956][1648985] Avg episode reward: [(0, '181.200')] [2024-06-15 21:47:07,219][1652491] Updated weights for policy 0, policy_version 837796 (0.0014) [2024-06-15 21:47:07,532][1651469] Signal inference workers to stop experience collection... (43650 times) [2024-06-15 21:47:07,582][1652491] InferenceWorker_p0-w0: stopping experience collection (43650 times) [2024-06-15 21:47:07,671][1651469] Signal inference workers to resume experience collection... 
(43650 times) [2024-06-15 21:47:07,689][1652491] InferenceWorker_p0-w0: resuming experience collection (43650 times) [2024-06-15 21:47:08,400][1652491] Updated weights for policy 0, policy_version 837857 (0.0014) [2024-06-15 21:47:10,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1715994624. Throughput: 0: 12231.1. Samples: 429049856. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:47:10,956][1648985] Avg episode reward: [(0, '182.890')] [2024-06-15 21:47:11,472][1652491] Updated weights for policy 0, policy_version 837920 (0.0013) [2024-06-15 21:47:15,653][1652491] Updated weights for policy 0, policy_version 837970 (0.0015) [2024-06-15 21:47:15,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.6, 300 sec: 46874.9). Total num frames: 1716191232. Throughput: 0: 12242.5. Samples: 429126656. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:47:15,956][1648985] Avg episode reward: [(0, '170.730')] [2024-06-15 21:47:17,540][1652491] Updated weights for policy 0, policy_version 838048 (0.0015) [2024-06-15 21:47:19,738][1652491] Updated weights for policy 0, policy_version 838096 (0.0014) [2024-06-15 21:47:20,560][1652491] Updated weights for policy 0, policy_version 838139 (0.0013) [2024-06-15 21:47:20,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 49698.1, 300 sec: 46986.0). Total num frames: 1716518912. Throughput: 0: 12140.1. Samples: 429192704. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:47:20,956][1648985] Avg episode reward: [(0, '167.800')] [2024-06-15 21:47:23,357][1652491] Updated weights for policy 0, policy_version 838178 (0.0014) [2024-06-15 21:47:25,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 1716682752. Throughput: 0: 12026.3. Samples: 429229056. 
Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:47:25,956][1648985] Avg episode reward: [(0, '150.810')] [2024-06-15 21:47:26,073][1652491] Updated weights for policy 0, policy_version 838229 (0.0012) [2024-06-15 21:47:26,921][1652491] Updated weights for policy 0, policy_version 838270 (0.0012) [2024-06-15 21:47:28,437][1652491] Updated weights for policy 0, policy_version 838307 (0.0012) [2024-06-15 21:47:29,948][1652491] Updated weights for policy 0, policy_version 838353 (0.0013) [2024-06-15 21:47:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 47543.8). Total num frames: 1717043200. Throughput: 0: 12026.3. Samples: 429302272. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:47:30,956][1648985] Avg episode reward: [(0, '155.720')] [2024-06-15 21:47:34,127][1652491] Updated weights for policy 0, policy_version 838419 (0.0033) [2024-06-15 21:47:34,912][1652491] Updated weights for policy 0, policy_version 838464 (0.0011) [2024-06-15 21:47:35,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 1717174272. Throughput: 0: 12014.9. Samples: 429373440. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:47:35,956][1648985] Avg episode reward: [(0, '167.660')] [2024-06-15 21:47:37,938][1652491] Updated weights for policy 0, policy_version 838520 (0.0014) [2024-06-15 21:47:40,281][1652491] Updated weights for policy 0, policy_version 838585 (0.0011) [2024-06-15 21:47:40,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 1717469184. Throughput: 0: 12037.7. Samples: 429411840. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:47:40,956][1648985] Avg episode reward: [(0, '145.240')] [2024-06-15 21:47:41,916][1652491] Updated weights for policy 0, policy_version 838644 (0.0012) [2024-06-15 21:47:45,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.5, 300 sec: 46986.0). Total num frames: 1717633024. 
Throughput: 0: 11980.9. Samples: 429478400. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:47:45,955][1648985] Avg episode reward: [(0, '154.520')] [2024-06-15 21:47:46,148][1652491] Updated weights for policy 0, policy_version 838704 (0.0013) [2024-06-15 21:47:49,642][1652491] Updated weights for policy 0, policy_version 838768 (0.0026) [2024-06-15 21:47:50,915][1652491] Updated weights for policy 0, policy_version 838800 (0.0012) [2024-06-15 21:47:50,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 46967.5, 300 sec: 47208.1). Total num frames: 1717862400. Throughput: 0: 11844.3. Samples: 429545472. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:47:50,956][1648985] Avg episode reward: [(0, '145.540')] [2024-06-15 21:47:51,360][1651469] Signal inference workers to stop experience collection... (43700 times) [2024-06-15 21:47:51,398][1652491] InferenceWorker_p0-w0: stopping experience collection (43700 times) [2024-06-15 21:47:51,571][1651469] Signal inference workers to resume experience collection... (43700 times) [2024-06-15 21:47:51,582][1652491] InferenceWorker_p0-w0: resuming experience collection (43700 times) [2024-06-15 21:47:52,307][1652491] Updated weights for policy 0, policy_version 838856 (0.0012) [2024-06-15 21:47:53,432][1652491] Updated weights for policy 0, policy_version 838909 (0.0014) [2024-06-15 21:47:55,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48059.8, 300 sec: 47208.1). Total num frames: 1718091776. Throughput: 0: 11696.4. Samples: 429576192. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:47:55,956][1648985] Avg episode reward: [(0, '166.450')] [2024-06-15 21:47:57,955][1652491] Updated weights for policy 0, policy_version 838976 (0.0014) [2024-06-15 21:48:00,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46967.6, 300 sec: 46986.0). Total num frames: 1718321152. Throughput: 0: 11628.1. Samples: 429649920. 
Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:48:00,955][1648985] Avg episode reward: [(0, '162.290')] [2024-06-15 21:48:01,047][1652491] Updated weights for policy 0, policy_version 839031 (0.0021) [2024-06-15 21:48:03,022][1652491] Updated weights for policy 0, policy_version 839090 (0.0020) [2024-06-15 21:48:04,361][1652491] Updated weights for policy 0, policy_version 839160 (0.0025) [2024-06-15 21:48:05,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 1718616064. Throughput: 0: 11753.2. Samples: 429721600. Policy #0 lag: (min: 15.0, avg: 107.3, max: 271.0) [2024-06-15 21:48:05,956][1648985] Avg episode reward: [(0, '147.560')] [2024-06-15 21:48:09,336][1652491] Updated weights for policy 0, policy_version 839216 (0.0113) [2024-06-15 21:48:10,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 1718747136. Throughput: 0: 11696.3. Samples: 429755392. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:48:10,956][1648985] Avg episode reward: [(0, '147.470')] [2024-06-15 21:48:12,275][1652491] Updated weights for policy 0, policy_version 839265 (0.0013) [2024-06-15 21:48:14,059][1652491] Updated weights for policy 0, policy_version 839328 (0.0013) [2024-06-15 21:48:15,594][1652491] Updated weights for policy 0, policy_version 839392 (0.0011) [2024-06-15 21:48:15,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 1719074816. Throughput: 0: 11571.2. Samples: 429822976. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:48:15,956][1648985] Avg episode reward: [(0, '160.480')] [2024-06-15 21:48:16,445][1652491] Updated weights for policy 0, policy_version 839424 (0.0014) [2024-06-15 21:48:20,293][1652491] Updated weights for policy 0, policy_version 839487 (0.0013) [2024-06-15 21:48:20,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 1719271424. 
Throughput: 0: 11480.2. Samples: 429890048. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:48:20,956][1648985] Avg episode reward: [(0, '163.510')] [2024-06-15 21:48:23,861][1652491] Updated weights for policy 0, policy_version 839539 (0.0021) [2024-06-15 21:48:25,247][1652491] Updated weights for policy 0, policy_version 839569 (0.0012) [2024-06-15 21:48:25,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 1719500800. Throughput: 0: 11434.7. Samples: 429926400. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:48:25,955][1648985] Avg episode reward: [(0, '196.760')] [2024-06-15 21:48:26,949][1652491] Updated weights for policy 0, policy_version 839634 (0.0012) [2024-06-15 21:48:27,859][1652491] Updated weights for policy 0, policy_version 839680 (0.0013) [2024-06-15 21:48:30,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45329.0, 300 sec: 47208.1). Total num frames: 1719762944. Throughput: 0: 11616.7. Samples: 430001152. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:48:30,956][1648985] Avg episode reward: [(0, '187.020')] [2024-06-15 21:48:31,139][1652491] Updated weights for policy 0, policy_version 839738 (0.0012) [2024-06-15 21:48:34,220][1652491] Updated weights for policy 0, policy_version 839778 (0.0011) [2024-06-15 21:48:35,911][1651469] Signal inference workers to stop experience collection... (43750 times) [2024-06-15 21:48:35,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46421.3, 300 sec: 47208.1). Total num frames: 1719959552. Throughput: 0: 11741.9. Samples: 430073856. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:48:35,956][1648985] Avg episode reward: [(0, '181.290')] [2024-06-15 21:48:35,968][1652491] InferenceWorker_p0-w0: stopping experience collection (43750 times) [2024-06-15 21:48:36,219][1651469] Signal inference workers to resume experience collection... 
(43750 times) [2024-06-15 21:48:36,220][1652491] InferenceWorker_p0-w0: resuming experience collection (43750 times) [2024-06-15 21:48:36,222][1652491] Updated weights for policy 0, policy_version 839840 (0.0013) [2024-06-15 21:48:37,606][1652491] Updated weights for policy 0, policy_version 839891 (0.0011) [2024-06-15 21:48:40,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45329.1, 300 sec: 47319.2). Total num frames: 1720188928. Throughput: 0: 11696.3. Samples: 430102528. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:48:40,956][1648985] Avg episode reward: [(0, '172.530')] [2024-06-15 21:48:41,511][1652491] Updated weights for policy 0, policy_version 839952 (0.0014) [2024-06-15 21:48:45,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45329.0, 300 sec: 46763.8). Total num frames: 1720352768. Throughput: 0: 11628.1. Samples: 430173184. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:48:45,956][1648985] Avg episode reward: [(0, '175.850')] [2024-06-15 21:48:46,045][1652491] Updated weights for policy 0, policy_version 840032 (0.0024) [2024-06-15 21:48:48,013][1652491] Updated weights for policy 0, policy_version 840084 (0.0013) [2024-06-15 21:48:49,555][1652491] Updated weights for policy 0, policy_version 840147 (0.0011) [2024-06-15 21:48:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 1720713216. Throughput: 0: 11366.4. Samples: 430233088. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:48:50,956][1648985] Avg episode reward: [(0, '142.820')] [2024-06-15 21:48:53,855][1652491] Updated weights for policy 0, policy_version 840224 (0.0013) [2024-06-15 21:48:55,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 1720844288. Throughput: 0: 11457.4. Samples: 430270976. 
Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:48:55,956][1648985] Avg episode reward: [(0, '134.600')] [2024-06-15 21:48:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000840256_1720844288.pth... [2024-06-15 21:48:56,056][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000834784_1709637632.pth [2024-06-15 21:48:57,265][1652491] Updated weights for policy 0, policy_version 840263 (0.0027) [2024-06-15 21:48:58,479][1652491] Updated weights for policy 0, policy_version 840317 (0.0015) [2024-06-15 21:49:00,033][1652491] Updated weights for policy 0, policy_version 840369 (0.0017) [2024-06-15 21:49:00,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46967.4, 300 sec: 47652.4). Total num frames: 1721139200. Throughput: 0: 11514.3. Samples: 430341120. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:49:00,956][1648985] Avg episode reward: [(0, '137.270')] [2024-06-15 21:49:01,414][1652491] Updated weights for policy 0, policy_version 840432 (0.0081) [2024-06-15 21:49:04,072][1652491] Updated weights for policy 0, policy_version 840464 (0.0010) [2024-06-15 21:49:04,960][1652491] Updated weights for policy 0, policy_version 840512 (0.0013) [2024-06-15 21:49:05,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.3, 300 sec: 46986.0). Total num frames: 1721368576. Throughput: 0: 11662.2. Samples: 430414848. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:49:05,956][1648985] Avg episode reward: [(0, '153.180')] [2024-06-15 21:49:09,728][1652491] Updated weights for policy 0, policy_version 840576 (0.0013) [2024-06-15 21:49:10,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1721565184. Throughput: 0: 11730.5. Samples: 430454272. 
Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:49:10,955][1648985] Avg episode reward: [(0, '156.600')] [2024-06-15 21:49:12,340][1652491] Updated weights for policy 0, policy_version 840672 (0.0086) [2024-06-15 21:49:15,132][1652491] Updated weights for policy 0, policy_version 840713 (0.0030) [2024-06-15 21:49:15,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46421.4, 300 sec: 47319.2). Total num frames: 1721860096. Throughput: 0: 11491.6. Samples: 430518272. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:49:15,956][1648985] Avg episode reward: [(0, '163.350')] [2024-06-15 21:49:16,019][1652491] Updated weights for policy 0, policy_version 840762 (0.0020) [2024-06-15 21:49:20,275][1651469] Signal inference workers to stop experience collection... (43800 times) [2024-06-15 21:49:20,312][1652491] InferenceWorker_p0-w0: stopping experience collection (43800 times) [2024-06-15 21:49:20,529][1651469] Signal inference workers to resume experience collection... (43800 times) [2024-06-15 21:49:20,530][1652491] InferenceWorker_p0-w0: resuming experience collection (43800 times) [2024-06-15 21:49:20,717][1652491] Updated weights for policy 0, policy_version 840828 (0.0013) [2024-06-15 21:49:20,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1722023936. Throughput: 0: 11537.1. Samples: 430593024. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:49:20,956][1648985] Avg episode reward: [(0, '159.180')] [2024-06-15 21:49:22,666][1652491] Updated weights for policy 0, policy_version 840912 (0.0013) [2024-06-15 21:49:23,736][1652491] Updated weights for policy 0, policy_version 840959 (0.0030) [2024-06-15 21:49:25,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46967.5, 300 sec: 47208.1). Total num frames: 1722318848. Throughput: 0: 11548.5. Samples: 430622208. 
Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:49:25,955][1648985] Avg episode reward: [(0, '149.100')] [2024-06-15 21:49:26,865][1652491] Updated weights for policy 0, policy_version 841021 (0.0016) [2024-06-15 21:49:30,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 44236.7, 300 sec: 46652.7). Total num frames: 1722417152. Throughput: 0: 11696.3. Samples: 430699520. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:49:30,956][1648985] Avg episode reward: [(0, '146.780')] [2024-06-15 21:49:31,974][1652491] Updated weights for policy 0, policy_version 841081 (0.0025) [2024-06-15 21:49:33,590][1652491] Updated weights for policy 0, policy_version 841138 (0.0012) [2024-06-15 21:49:35,093][1652491] Updated weights for policy 0, policy_version 841207 (0.0014) [2024-06-15 21:49:35,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 1722810368. Throughput: 0: 11753.2. Samples: 430761984. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:49:35,956][1648985] Avg episode reward: [(0, '162.690')] [2024-06-15 21:49:38,358][1652491] Updated weights for policy 0, policy_version 841279 (0.0104) [2024-06-15 21:49:40,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1722941440. Throughput: 0: 11707.8. Samples: 430797824. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:49:40,956][1648985] Avg episode reward: [(0, '169.550')] [2024-06-15 21:49:42,376][1652491] Updated weights for policy 0, policy_version 841338 (0.0012) [2024-06-15 21:49:44,541][1652491] Updated weights for policy 0, policy_version 841401 (0.0014) [2024-06-15 21:49:45,726][1652491] Updated weights for policy 0, policy_version 841446 (0.0012) [2024-06-15 21:49:45,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 49151.9, 300 sec: 47430.3). Total num frames: 1723301888. Throughput: 0: 11821.5. Samples: 430873088. 
Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:49:45,956][1648985] Avg episode reward: [(0, '172.220')] [2024-06-15 21:49:47,735][1652491] Updated weights for policy 0, policy_version 841473 (0.0012) [2024-06-15 21:49:50,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 45875.3, 300 sec: 46986.0). Total num frames: 1723465728. Throughput: 0: 11878.4. Samples: 430949376. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:49:50,955][1648985] Avg episode reward: [(0, '175.780')] [2024-06-15 21:49:52,069][1652491] Updated weights for policy 0, policy_version 841553 (0.0016) [2024-06-15 21:49:54,077][1652491] Updated weights for policy 0, policy_version 841601 (0.0024) [2024-06-15 21:49:55,370][1652491] Updated weights for policy 0, policy_version 841661 (0.0015) [2024-06-15 21:49:55,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 48606.1, 300 sec: 47208.2). Total num frames: 1723760640. Throughput: 0: 11889.8. Samples: 430989312. Policy #0 lag: (min: 15.0, avg: 112.7, max: 271.0) [2024-06-15 21:49:55,956][1648985] Avg episode reward: [(0, '178.670')] [2024-06-15 21:49:56,751][1652491] Updated weights for policy 0, policy_version 841715 (0.0013) [2024-06-15 21:49:58,727][1652491] Updated weights for policy 0, policy_version 841744 (0.0013) [2024-06-15 21:49:59,886][1652491] Updated weights for policy 0, policy_version 841787 (0.0012) [2024-06-15 21:50:00,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 47513.5, 300 sec: 47097.0). Total num frames: 1723990016. Throughput: 0: 11901.1. Samples: 431053824. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:50:00,956][1648985] Avg episode reward: [(0, '178.210')] [2024-06-15 21:50:02,566][1651469] Signal inference workers to stop experience collection... (43850 times) [2024-06-15 21:50:02,595][1652491] InferenceWorker_p0-w0: stopping experience collection (43850 times) [2024-06-15 21:50:02,854][1651469] Signal inference workers to resume experience collection... 
(43850 times) [2024-06-15 21:50:02,855][1652491] InferenceWorker_p0-w0: resuming experience collection (43850 times) [2024-06-15 21:50:02,857][1652491] Updated weights for policy 0, policy_version 841824 (0.0013) [2024-06-15 21:50:03,597][1652491] Updated weights for policy 0, policy_version 841854 (0.0011) [2024-06-15 21:50:05,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46967.5, 300 sec: 47208.1). Total num frames: 1724186624. Throughput: 0: 12049.1. Samples: 431135232. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:50:05,956][1648985] Avg episode reward: [(0, '171.410')] [2024-06-15 21:50:06,681][1652491] Updated weights for policy 0, policy_version 841929 (0.0015) [2024-06-15 21:50:09,327][1652491] Updated weights for policy 0, policy_version 842000 (0.0014) [2024-06-15 21:50:10,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 49152.0, 300 sec: 47208.1). Total num frames: 1724514304. Throughput: 0: 12151.4. Samples: 431169024. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:50:10,956][1648985] Avg episode reward: [(0, '157.690')] [2024-06-15 21:50:13,065][1652491] Updated weights for policy 0, policy_version 842050 (0.0017) [2024-06-15 21:50:14,521][1652491] Updated weights for policy 0, policy_version 842102 (0.0012) [2024-06-15 21:50:15,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 1724645376. Throughput: 0: 11946.7. Samples: 431237120. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:50:15,956][1648985] Avg episode reward: [(0, '164.970')] [2024-06-15 21:50:17,324][1652491] Updated weights for policy 0, policy_version 842144 (0.0011) [2024-06-15 21:50:18,529][1652491] Updated weights for policy 0, policy_version 842208 (0.0012) [2024-06-15 21:50:20,337][1652491] Updated weights for policy 0, policy_version 842258 (0.0012) [2024-06-15 21:50:20,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 49698.0, 300 sec: 47430.3). Total num frames: 1725005824. 
Throughput: 0: 12140.1. Samples: 431308288. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:50:20,956][1648985] Avg episode reward: [(0, '159.720')] [2024-06-15 21:50:24,103][1652491] Updated weights for policy 0, policy_version 842320 (0.0012) [2024-06-15 21:50:25,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.5, 300 sec: 47097.1). Total num frames: 1725169664. Throughput: 0: 12208.4. Samples: 431347200. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:50:25,956][1648985] Avg episode reward: [(0, '157.770')] [2024-06-15 21:50:28,309][1652491] Updated weights for policy 0, policy_version 842400 (0.0013) [2024-06-15 21:50:29,746][1652491] Updated weights for policy 0, policy_version 842464 (0.0013) [2024-06-15 21:50:30,955][1648985] Fps is (10 sec: 42599.5, 60 sec: 50244.5, 300 sec: 47319.2). Total num frames: 1725431808. Throughput: 0: 12026.4. Samples: 431414272. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:50:30,955][1648985] Avg episode reward: [(0, '152.400')] [2024-06-15 21:50:31,564][1652491] Updated weights for policy 0, policy_version 842499 (0.0013) [2024-06-15 21:50:32,807][1652491] Updated weights for policy 0, policy_version 842555 (0.0012) [2024-06-15 21:50:35,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46421.3, 300 sec: 47097.3). Total num frames: 1725595648. Throughput: 0: 11912.5. Samples: 431485440. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:50:35,956][1648985] Avg episode reward: [(0, '160.440')] [2024-06-15 21:50:36,958][1652491] Updated weights for policy 0, policy_version 842624 (0.0013) [2024-06-15 21:50:40,530][1652491] Updated weights for policy 0, policy_version 842696 (0.0018) [2024-06-15 21:50:40,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 1725857792. Throughput: 0: 11832.9. Samples: 431521792. 
Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:50:40,956][1648985] Avg episode reward: [(0, '171.590')] [2024-06-15 21:50:41,723][1652491] Updated weights for policy 0, policy_version 842741 (0.0012) [2024-06-15 21:50:43,732][1651469] Signal inference workers to stop experience collection... (43900 times) [2024-06-15 21:50:43,870][1652491] InferenceWorker_p0-w0: stopping experience collection (43900 times) [2024-06-15 21:50:43,947][1651469] Signal inference workers to resume experience collection... (43900 times) [2024-06-15 21:50:43,948][1652491] InferenceWorker_p0-w0: resuming experience collection (43900 times) [2024-06-15 21:50:43,949][1652491] Updated weights for policy 0, policy_version 842803 (0.0117) [2024-06-15 21:50:45,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 46421.5, 300 sec: 47097.1). Total num frames: 1726087168. Throughput: 0: 11889.8. Samples: 431588864. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:50:45,956][1648985] Avg episode reward: [(0, '171.400')] [2024-06-15 21:50:47,599][1652491] Updated weights for policy 0, policy_version 842853 (0.0013) [2024-06-15 21:50:50,467][1652491] Updated weights for policy 0, policy_version 842912 (0.0015) [2024-06-15 21:50:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 47513.5, 300 sec: 47319.2). Total num frames: 1726316544. Throughput: 0: 11707.7. Samples: 431662080. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:50:50,956][1648985] Avg episode reward: [(0, '160.380')] [2024-06-15 21:50:52,115][1652491] Updated weights for policy 0, policy_version 842977 (0.0097) [2024-06-15 21:50:52,842][1652491] Updated weights for policy 0, policy_version 843008 (0.0013) [2024-06-15 21:50:55,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 47513.5, 300 sec: 47430.3). Total num frames: 1726611456. Throughput: 0: 11787.4. Samples: 431699456. 
Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:50:55,956][1648985] Avg episode reward: [(0, '170.620')] [2024-06-15 21:50:55,976][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000843072_1726611456.pth... [2024-06-15 21:50:56,032][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000837520_1715240960.pth [2024-06-15 21:50:56,057][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000843072_1726611456.pth [2024-06-15 21:50:57,569][1652491] Updated weights for policy 0, policy_version 843088 (0.0013) [2024-06-15 21:51:00,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.3, 300 sec: 47097.0). Total num frames: 1726742528. Throughput: 0: 11810.1. Samples: 431768576. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:51:00,956][1648985] Avg episode reward: [(0, '169.470')] [2024-06-15 21:51:02,159][1652491] Updated weights for policy 0, policy_version 843184 (0.0215) [2024-06-15 21:51:03,393][1652491] Updated weights for policy 0, policy_version 843237 (0.0011) [2024-06-15 21:51:04,873][1652491] Updated weights for policy 0, policy_version 843268 (0.0011) [2024-06-15 21:51:05,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 48059.5, 300 sec: 47319.2). Total num frames: 1727070208. Throughput: 0: 11776.0. Samples: 431838208. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:51:05,956][1648985] Avg episode reward: [(0, '166.190')] [2024-06-15 21:51:06,273][1652491] Updated weights for policy 0, policy_version 843327 (0.0030) [2024-06-15 21:51:09,107][1652491] Updated weights for policy 0, policy_version 843386 (0.0036) [2024-06-15 21:51:10,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1727266816. Throughput: 0: 11764.6. Samples: 431876608. 
Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:51:10,956][1648985] Avg episode reward: [(0, '165.050')] [2024-06-15 21:51:13,347][1652491] Updated weights for policy 0, policy_version 843446 (0.0013) [2024-06-15 21:51:14,359][1652491] Updated weights for policy 0, policy_version 843492 (0.0013) [2024-06-15 21:51:15,955][1648985] Fps is (10 sec: 52431.0, 60 sec: 49152.2, 300 sec: 47652.5). Total num frames: 1727594496. Throughput: 0: 11855.7. Samples: 431947776. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:51:15,955][1648985] Avg episode reward: [(0, '164.200')] [2024-06-15 21:51:16,397][1652491] Updated weights for policy 0, policy_version 843575 (0.0012) [2024-06-15 21:51:19,824][1652491] Updated weights for policy 0, policy_version 843618 (0.0061) [2024-06-15 21:51:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46421.5, 300 sec: 47208.1). Total num frames: 1727791104. Throughput: 0: 11901.2. Samples: 432020992. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:51:20,956][1648985] Avg episode reward: [(0, '155.810')] [2024-06-15 21:51:23,804][1652491] Updated weights for policy 0, policy_version 843672 (0.0012) [2024-06-15 21:51:25,520][1651469] Signal inference workers to stop experience collection... (43950 times) [2024-06-15 21:51:25,607][1652491] InferenceWorker_p0-w0: stopping experience collection (43950 times) [2024-06-15 21:51:25,795][1651469] Signal inference workers to resume experience collection... (43950 times) [2024-06-15 21:51:25,796][1652491] InferenceWorker_p0-w0: resuming experience collection (43950 times) [2024-06-15 21:51:25,799][1652491] Updated weights for policy 0, policy_version 843760 (0.0126) [2024-06-15 21:51:25,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 1728020480. Throughput: 0: 11958.0. Samples: 432059904. 
Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:51:25,956][1648985] Avg episode reward: [(0, '164.260')] [2024-06-15 21:51:27,750][1652491] Updated weights for policy 0, policy_version 843832 (0.0034) [2024-06-15 21:51:30,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46421.2, 300 sec: 47097.0). Total num frames: 1728217088. Throughput: 0: 11855.6. Samples: 432122368. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:51:30,956][1648985] Avg episode reward: [(0, '166.960')] [2024-06-15 21:51:31,924][1652491] Updated weights for policy 0, policy_version 843902 (0.0028) [2024-06-15 21:51:35,955][1648985] Fps is (10 sec: 32768.0, 60 sec: 45875.3, 300 sec: 46763.8). Total num frames: 1728348160. Throughput: 0: 11867.0. Samples: 432196096. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:51:35,956][1648985] Avg episode reward: [(0, '156.800')] [2024-06-15 21:51:37,145][1652491] Updated weights for policy 0, policy_version 843971 (0.0019) [2024-06-15 21:51:39,283][1652491] Updated weights for policy 0, policy_version 844052 (0.0012) [2024-06-15 21:51:40,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 47513.7, 300 sec: 47097.1). Total num frames: 1728708608. Throughput: 0: 11525.7. Samples: 432218112. Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:51:40,955][1648985] Avg episode reward: [(0, '157.860')] [2024-06-15 21:51:42,908][1652491] Updated weights for policy 0, policy_version 844114 (0.0012) [2024-06-15 21:51:43,837][1652491] Updated weights for policy 0, policy_version 844160 (0.0012) [2024-06-15 21:51:45,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 1728839680. Throughput: 0: 11537.1. Samples: 432287744. 
Policy #0 lag: (min: 15.0, avg: 148.8, max: 271.0) [2024-06-15 21:51:45,956][1648985] Avg episode reward: [(0, '136.650')] [2024-06-15 21:51:48,490][1652491] Updated weights for policy 0, policy_version 844230 (0.0015) [2024-06-15 21:51:50,328][1652491] Updated weights for policy 0, policy_version 844304 (0.0014) [2024-06-15 21:51:50,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1729200128. Throughput: 0: 11446.1. Samples: 432353280. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:51:50,956][1648985] Avg episode reward: [(0, '145.970')] [2024-06-15 21:51:51,371][1652491] Updated weights for policy 0, policy_version 844351 (0.0024) [2024-06-15 21:51:54,965][1652491] Updated weights for policy 0, policy_version 844408 (0.0012) [2024-06-15 21:51:55,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 1729363968. Throughput: 0: 11605.3. Samples: 432398848. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:51:55,956][1648985] Avg episode reward: [(0, '134.880')] [2024-06-15 21:51:58,126][1652491] Updated weights for policy 0, policy_version 844450 (0.0012) [2024-06-15 21:51:59,337][1652491] Updated weights for policy 0, policy_version 844483 (0.0013) [2024-06-15 21:52:00,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 1729658880. Throughput: 0: 11593.9. Samples: 432469504. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:52:00,956][1648985] Avg episode reward: [(0, '156.800')] [2024-06-15 21:52:01,461][1652491] Updated weights for policy 0, policy_version 844577 (0.0099) [2024-06-15 21:52:04,408][1652491] Updated weights for policy 0, policy_version 844624 (0.0014) [2024-06-15 21:52:05,603][1652491] Updated weights for policy 0, policy_version 844670 (0.0011) [2024-06-15 21:52:05,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 46967.7, 300 sec: 47097.1). Total num frames: 1729888256. 
Throughput: 0: 11525.7. Samples: 432539648. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:52:05,956][1648985] Avg episode reward: [(0, '160.990')] [2024-06-15 21:52:08,517][1651469] Signal inference workers to stop experience collection... (44000 times) [2024-06-15 21:52:08,564][1652491] InferenceWorker_p0-w0: stopping experience collection (44000 times) [2024-06-15 21:52:08,789][1651469] Signal inference workers to resume experience collection... (44000 times) [2024-06-15 21:52:08,790][1652491] InferenceWorker_p0-w0: resuming experience collection (44000 times) [2024-06-15 21:52:09,340][1652491] Updated weights for policy 0, policy_version 844730 (0.0013) [2024-06-15 21:52:10,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 1730052096. Throughput: 0: 11548.4. Samples: 432579584. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:52:10,956][1648985] Avg episode reward: [(0, '172.080')] [2024-06-15 21:52:11,456][1652491] Updated weights for policy 0, policy_version 844784 (0.0022) [2024-06-15 21:52:13,143][1652491] Updated weights for policy 0, policy_version 844860 (0.0013) [2024-06-15 21:52:15,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.1, 300 sec: 46874.9). Total num frames: 1730347008. Throughput: 0: 11639.5. Samples: 432646144. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:52:15,955][1648985] Avg episode reward: [(0, '160.860')] [2024-06-15 21:52:16,283][1652491] Updated weights for policy 0, policy_version 844912 (0.0015) [2024-06-15 21:52:20,042][1652491] Updated weights for policy 0, policy_version 844960 (0.0014) [2024-06-15 21:52:20,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 1730543616. Throughput: 0: 11616.7. Samples: 432718848. 
Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:52:20,956][1648985] Avg episode reward: [(0, '146.550')] [2024-06-15 21:52:21,900][1652491] Updated weights for policy 0, policy_version 845013 (0.0015) [2024-06-15 21:52:23,623][1652491] Updated weights for policy 0, policy_version 845088 (0.0014) [2024-06-15 21:52:25,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 46421.4, 300 sec: 46652.7). Total num frames: 1730805760. Throughput: 0: 11764.6. Samples: 432747520. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:52:25,956][1648985] Avg episode reward: [(0, '150.700')] [2024-06-15 21:52:26,676][1652491] Updated weights for policy 0, policy_version 845138 (0.0014) [2024-06-15 21:52:27,601][1652491] Updated weights for policy 0, policy_version 845184 (0.0012) [2024-06-15 21:52:30,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 46967.5, 300 sec: 46986.0). Total num frames: 1731035136. Throughput: 0: 12003.5. Samples: 432827904. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:52:30,956][1648985] Avg episode reward: [(0, '170.550')] [2024-06-15 21:52:32,500][1652491] Updated weights for policy 0, policy_version 845250 (0.0025) [2024-06-15 21:52:34,472][1652491] Updated weights for policy 0, policy_version 845331 (0.0095) [2024-06-15 21:52:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 49698.2, 300 sec: 46986.0). Total num frames: 1731330048. Throughput: 0: 11889.8. Samples: 432888320. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:52:35,956][1648985] Avg episode reward: [(0, '177.920')] [2024-06-15 21:52:37,532][1652491] Updated weights for policy 0, policy_version 845393 (0.0013) [2024-06-15 21:52:38,351][1652491] Updated weights for policy 0, policy_version 845437 (0.0010) [2024-06-15 21:52:40,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.1, 300 sec: 46874.9). Total num frames: 1731461120. Throughput: 0: 11753.3. Samples: 432927744. 
Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:52:40,956][1648985] Avg episode reward: [(0, '154.390')] [2024-06-15 21:52:42,961][1652491] Updated weights for policy 0, policy_version 845499 (0.0013) [2024-06-15 21:52:45,000][1652491] Updated weights for policy 0, policy_version 845552 (0.0032) [2024-06-15 21:52:45,955][1648985] Fps is (10 sec: 42597.1, 60 sec: 48605.6, 300 sec: 47097.0). Total num frames: 1731756032. Throughput: 0: 11867.0. Samples: 433003520. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:52:45,956][1648985] Avg episode reward: [(0, '143.740')] [2024-06-15 21:52:46,741][1652491] Updated weights for policy 0, policy_version 845616 (0.0015) [2024-06-15 21:52:48,964][1651469] Signal inference workers to stop experience collection... (44050 times) [2024-06-15 21:52:49,006][1652491] InferenceWorker_p0-w0: stopping experience collection (44050 times) [2024-06-15 21:52:49,217][1651469] Signal inference workers to resume experience collection... (44050 times) [2024-06-15 21:52:49,217][1652491] InferenceWorker_p0-w0: resuming experience collection (44050 times) [2024-06-15 21:52:49,383][1652491] Updated weights for policy 0, policy_version 845687 (0.0014) [2024-06-15 21:52:50,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 1731985408. Throughput: 0: 11673.6. Samples: 433064960. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:52:50,956][1648985] Avg episode reward: [(0, '146.390')] [2024-06-15 21:52:54,750][1652491] Updated weights for policy 0, policy_version 845752 (0.0129) [2024-06-15 21:52:55,955][1648985] Fps is (10 sec: 39322.8, 60 sec: 46421.5, 300 sec: 46874.9). Total num frames: 1732149248. Throughput: 0: 11696.4. Samples: 433105920. 
Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:52:55,956][1648985] Avg episode reward: [(0, '162.730')] [2024-06-15 21:52:56,165][1652491] Updated weights for policy 0, policy_version 845793 (0.0087) [2024-06-15 21:52:56,379][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000845808_1732214784.pth... [2024-06-15 21:52:56,573][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000840256_1720844288.pth [2024-06-15 21:52:58,133][1652491] Updated weights for policy 0, policy_version 845872 (0.0130) [2024-06-15 21:52:59,720][1652491] Updated weights for policy 0, policy_version 845889 (0.0012) [2024-06-15 21:53:00,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 47513.5, 300 sec: 47097.1). Total num frames: 1732509696. Throughput: 0: 11593.9. Samples: 433167872. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:53:00,956][1648985] Avg episode reward: [(0, '174.830')] [2024-06-15 21:53:05,374][1652491] Updated weights for policy 0, policy_version 845968 (0.0013) [2024-06-15 21:53:05,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 44782.9, 300 sec: 46874.9). Total num frames: 1732575232. Throughput: 0: 11605.4. Samples: 433241088. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:53:05,956][1648985] Avg episode reward: [(0, '184.590')] [2024-06-15 21:53:07,388][1652491] Updated weights for policy 0, policy_version 846032 (0.0013) [2024-06-15 21:53:08,901][1652491] Updated weights for policy 0, policy_version 846083 (0.0056) [2024-06-15 21:53:10,158][1652491] Updated weights for policy 0, policy_version 846140 (0.0029) [2024-06-15 21:53:10,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 1732902912. Throughput: 0: 11537.1. Samples: 433266688. 
Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:53:10,956][1648985] Avg episode reward: [(0, '175.000')] [2024-06-15 21:53:12,005][1652491] Updated weights for policy 0, policy_version 846182 (0.0013) [2024-06-15 21:53:15,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 46652.7). Total num frames: 1733033984. Throughput: 0: 11309.5. Samples: 433336832. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:53:15,956][1648985] Avg episode reward: [(0, '164.770')] [2024-06-15 21:53:18,050][1652491] Updated weights for policy 0, policy_version 846242 (0.0018) [2024-06-15 21:53:20,064][1652491] Updated weights for policy 0, policy_version 846320 (0.0013) [2024-06-15 21:53:20,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 45875.3, 300 sec: 46763.8). Total num frames: 1733296128. Throughput: 0: 11400.5. Samples: 433401344. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:53:20,955][1648985] Avg episode reward: [(0, '153.860')] [2024-06-15 21:53:21,467][1652491] Updated weights for policy 0, policy_version 846368 (0.0014) [2024-06-15 21:53:23,513][1652491] Updated weights for policy 0, policy_version 846420 (0.0013) [2024-06-15 21:53:24,497][1652491] Updated weights for policy 0, policy_version 846462 (0.0024) [2024-06-15 21:53:25,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 1733558272. Throughput: 0: 11104.7. Samples: 433427456. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:53:25,956][1648985] Avg episode reward: [(0, '156.140')] [2024-06-15 21:53:30,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44236.8, 300 sec: 46541.7). Total num frames: 1733689344. Throughput: 0: 11207.2. Samples: 433507840. 
Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:53:30,956][1648985] Avg episode reward: [(0, '151.320')] [2024-06-15 21:53:31,113][1652491] Updated weights for policy 0, policy_version 846533 (0.0013) [2024-06-15 21:53:33,591][1652491] Updated weights for policy 0, policy_version 846624 (0.0011) [2024-06-15 21:53:33,684][1651469] Signal inference workers to stop experience collection... (44100 times) [2024-06-15 21:53:33,754][1652491] InferenceWorker_p0-w0: stopping experience collection (44100 times) [2024-06-15 21:53:33,966][1651469] Signal inference workers to resume experience collection... (44100 times) [2024-06-15 21:53:33,967][1652491] InferenceWorker_p0-w0: resuming experience collection (44100 times) [2024-06-15 21:53:35,709][1652491] Updated weights for policy 0, policy_version 846690 (0.0106) [2024-06-15 21:53:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45329.0, 300 sec: 46986.0). Total num frames: 1734049792. Throughput: 0: 10956.8. Samples: 433558016. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:53:35,956][1648985] Avg episode reward: [(0, '150.160')] [2024-06-15 21:53:40,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 43690.6, 300 sec: 46541.7). Total num frames: 1734082560. Throughput: 0: 11059.2. Samples: 433603584. Policy #0 lag: (min: 32.0, avg: 146.6, max: 336.0) [2024-06-15 21:53:40,956][1648985] Avg episode reward: [(0, '164.220')] [2024-06-15 21:53:41,661][1652491] Updated weights for policy 0, policy_version 846740 (0.0016) [2024-06-15 21:53:42,976][1652491] Updated weights for policy 0, policy_version 846801 (0.0033) [2024-06-15 21:53:43,948][1652491] Updated weights for policy 0, policy_version 846842 (0.0124) [2024-06-15 21:53:45,574][1652491] Updated weights for policy 0, policy_version 846902 (0.0124) [2024-06-15 21:53:45,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45329.2, 300 sec: 46652.7). Total num frames: 1734475776. Throughput: 0: 11229.9. Samples: 433673216. 
Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:53:45,956][1648985] Avg episode reward: [(0, '182.440')] [2024-06-15 21:53:47,171][1652491] Updated weights for policy 0, policy_version 846968 (0.0020) [2024-06-15 21:53:50,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 43690.5, 300 sec: 46652.7). Total num frames: 1734606848. Throughput: 0: 11286.7. Samples: 433748992. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:53:50,956][1648985] Avg episode reward: [(0, '193.590')] [2024-06-15 21:53:53,425][1652491] Updated weights for policy 0, policy_version 847027 (0.0011) [2024-06-15 21:53:54,922][1652491] Updated weights for policy 0, policy_version 847092 (0.0020) [2024-06-15 21:53:55,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1734901760. Throughput: 0: 11457.4. Samples: 433782272. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:53:55,956][1648985] Avg episode reward: [(0, '206.750')] [2024-06-15 21:53:56,270][1652491] Updated weights for policy 0, policy_version 847136 (0.0099) [2024-06-15 21:53:57,786][1652491] Updated weights for policy 0, policy_version 847187 (0.0016) [2024-06-15 21:54:00,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 43690.8, 300 sec: 46652.8). Total num frames: 1735131136. Throughput: 0: 11275.4. Samples: 433844224. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:54:00,956][1648985] Avg episode reward: [(0, '166.550')] [2024-06-15 21:54:03,617][1652491] Updated weights for policy 0, policy_version 847248 (0.0069) [2024-06-15 21:54:04,893][1652491] Updated weights for policy 0, policy_version 847299 (0.0011) [2024-06-15 21:54:05,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 1735360512. Throughput: 0: 11423.2. Samples: 433915392. 
Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:54:05,956][1648985] Avg episode reward: [(0, '161.640')] [2024-06-15 21:54:06,024][1652491] Updated weights for policy 0, policy_version 847354 (0.0012) [2024-06-15 21:54:08,275][1652491] Updated weights for policy 0, policy_version 847408 (0.0123) [2024-06-15 21:54:09,710][1652491] Updated weights for policy 0, policy_version 847462 (0.0012) [2024-06-15 21:54:10,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 1735655424. Throughput: 0: 11537.1. Samples: 433946624. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:54:10,956][1648985] Avg episode reward: [(0, '152.720')] [2024-06-15 21:54:14,465][1652491] Updated weights for policy 0, policy_version 847505 (0.0015) [2024-06-15 21:54:14,840][1651469] Signal inference workers to stop experience collection... (44150 times) [2024-06-15 21:54:14,874][1652491] InferenceWorker_p0-w0: stopping experience collection (44150 times) [2024-06-15 21:54:15,068][1651469] Signal inference workers to resume experience collection... (44150 times) [2024-06-15 21:54:15,069][1652491] InferenceWorker_p0-w0: resuming experience collection (44150 times) [2024-06-15 21:54:15,667][1652491] Updated weights for policy 0, policy_version 847568 (0.0014) [2024-06-15 21:54:15,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 1735819264. Throughput: 0: 11628.1. Samples: 434031104. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:54:15,956][1648985] Avg episode reward: [(0, '150.230')] [2024-06-15 21:54:17,678][1652491] Updated weights for policy 0, policy_version 847618 (0.0014) [2024-06-15 21:54:19,286][1652491] Updated weights for policy 0, policy_version 847696 (0.0013) [2024-06-15 21:54:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.6, 300 sec: 46986.0). Total num frames: 1736179712. Throughput: 0: 11935.3. Samples: 434095104. 
Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:54:20,956][1648985] Avg episode reward: [(0, '161.920')] [2024-06-15 21:54:24,601][1652491] Updated weights for policy 0, policy_version 847748 (0.0015) [2024-06-15 21:54:25,955][1648985] Fps is (10 sec: 49151.4, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1736310784. Throughput: 0: 12037.7. Samples: 434145280. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:54:25,956][1648985] Avg episode reward: [(0, '161.330')] [2024-06-15 21:54:26,097][1652491] Updated weights for policy 0, policy_version 847813 (0.0051) [2024-06-15 21:54:27,214][1652491] Updated weights for policy 0, policy_version 847863 (0.0014) [2024-06-15 21:54:29,803][1652491] Updated weights for policy 0, policy_version 847936 (0.0015) [2024-06-15 21:54:30,977][1648985] Fps is (10 sec: 49045.9, 60 sec: 49680.2, 300 sec: 46982.5). Total num frames: 1736671232. Throughput: 0: 11815.9. Samples: 434205184. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:54:30,977][1648985] Avg episode reward: [(0, '156.520')] [2024-06-15 21:54:31,233][1652491] Updated weights for policy 0, policy_version 847999 (0.0012) [2024-06-15 21:54:35,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 44236.8, 300 sec: 46652.8). Total num frames: 1736704000. Throughput: 0: 12003.6. Samples: 434289152. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:54:35,956][1648985] Avg episode reward: [(0, '177.040')] [2024-06-15 21:54:37,120][1652491] Updated weights for policy 0, policy_version 848051 (0.0012) [2024-06-15 21:54:38,433][1652491] Updated weights for policy 0, policy_version 848122 (0.0013) [2024-06-15 21:54:40,955][1648985] Fps is (10 sec: 39406.9, 60 sec: 49698.2, 300 sec: 46652.8). Total num frames: 1737064448. Throughput: 0: 11946.7. Samples: 434319872. 
Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:54:40,956][1648985] Avg episode reward: [(0, '169.810')] [2024-06-15 21:54:41,129][1652491] Updated weights for policy 0, policy_version 848197 (0.0014) [2024-06-15 21:54:42,183][1652491] Updated weights for policy 0, policy_version 848250 (0.0015) [2024-06-15 21:54:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 1737228288. Throughput: 0: 12208.3. Samples: 434393600. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:54:45,956][1648985] Avg episode reward: [(0, '174.910')] [2024-06-15 21:54:47,359][1652491] Updated weights for policy 0, policy_version 848292 (0.0014) [2024-06-15 21:54:49,429][1652491] Updated weights for policy 0, policy_version 848373 (0.0015) [2024-06-15 21:54:50,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 48059.8, 300 sec: 46541.6). Total num frames: 1737490432. Throughput: 0: 12071.8. Samples: 434458624. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:54:50,956][1648985] Avg episode reward: [(0, '164.030')] [2024-06-15 21:54:51,414][1652491] Updated weights for policy 0, policy_version 848404 (0.0012) [2024-06-15 21:54:52,079][1651469] Signal inference workers to stop experience collection... (44200 times) [2024-06-15 21:54:52,108][1652491] InferenceWorker_p0-w0: stopping experience collection (44200 times) [2024-06-15 21:54:52,285][1651469] Signal inference workers to resume experience collection... (44200 times) [2024-06-15 21:54:52,294][1652491] InferenceWorker_p0-w0: resuming experience collection (44200 times) [2024-06-15 21:54:53,094][1652491] Updated weights for policy 0, policy_version 848480 (0.0077) [2024-06-15 21:54:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 1737752576. Throughput: 0: 11878.4. Samples: 434481152. 
Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:54:55,956][1648985] Avg episode reward: [(0, '164.990')] [2024-06-15 21:54:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000848512_1737752576.pth... [2024-06-15 21:54:56,007][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000843072_1726611456.pth [2024-06-15 21:54:58,320][1652491] Updated weights for policy 0, policy_version 848529 (0.0013) [2024-06-15 21:55:00,038][1652491] Updated weights for policy 0, policy_version 848594 (0.0012) [2024-06-15 21:55:00,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 47513.5, 300 sec: 46763.8). Total num frames: 1737981952. Throughput: 0: 11821.5. Samples: 434563072. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:55:00,956][1648985] Avg episode reward: [(0, '148.310')] [2024-06-15 21:55:02,635][1652491] Updated weights for policy 0, policy_version 848644 (0.0013) [2024-06-15 21:55:04,243][1652491] Updated weights for policy 0, policy_version 848722 (0.0014) [2024-06-15 21:55:05,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48606.0, 300 sec: 46652.7). Total num frames: 1738276864. Throughput: 0: 11764.6. Samples: 434624512. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:55:05,956][1648985] Avg episode reward: [(0, '159.020')] [2024-06-15 21:55:09,692][1652491] Updated weights for policy 0, policy_version 848771 (0.0012) [2024-06-15 21:55:10,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1738375168. Throughput: 0: 11605.3. Samples: 434667520. 
Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:55:10,956][1648985] Avg episode reward: [(0, '162.090')] [2024-06-15 21:55:12,399][1652491] Updated weights for policy 0, policy_version 848868 (0.0195) [2024-06-15 21:55:14,552][1652491] Updated weights for policy 0, policy_version 848912 (0.0028) [2024-06-15 21:55:15,617][1652491] Updated weights for policy 0, policy_version 848960 (0.0016) [2024-06-15 21:55:15,955][1648985] Fps is (10 sec: 42598.8, 60 sec: 48059.8, 300 sec: 46430.6). Total num frames: 1738702848. Throughput: 0: 11611.0. Samples: 434727424. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:55:15,955][1648985] Avg episode reward: [(0, '167.530')] [2024-06-15 21:55:16,893][1652491] Updated weights for policy 0, policy_version 849020 (0.0079) [2024-06-15 21:55:20,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1738801152. Throughput: 0: 11355.0. Samples: 434800128. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:55:20,956][1648985] Avg episode reward: [(0, '181.300')] [2024-06-15 21:55:22,661][1652491] Updated weights for policy 0, policy_version 849083 (0.0014) [2024-06-15 21:55:24,460][1652491] Updated weights for policy 0, policy_version 849150 (0.0079) [2024-06-15 21:55:25,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 1739096064. Throughput: 0: 11320.9. Samples: 434829312. Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:55:25,956][1648985] Avg episode reward: [(0, '183.040')] [2024-06-15 21:55:26,297][1652491] Updated weights for policy 0, policy_version 849187 (0.0013) [2024-06-15 21:55:27,945][1652491] Updated weights for policy 0, policy_version 849278 (0.0129) [2024-06-15 21:55:30,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 44252.8, 300 sec: 46541.7). Total num frames: 1739325440. Throughput: 0: 11309.5. Samples: 434902528. 
Policy #0 lag: (min: 15.0, avg: 67.6, max: 255.0) [2024-06-15 21:55:30,956][1648985] Avg episode reward: [(0, '165.050')] [2024-06-15 21:55:33,748][1652491] Updated weights for policy 0, policy_version 849330 (0.0014) [2024-06-15 21:55:34,498][1652491] Updated weights for policy 0, policy_version 849360 (0.0036) [2024-06-15 21:55:34,564][1651469] Signal inference workers to stop experience collection... (44250 times) [2024-06-15 21:55:34,631][1652491] InferenceWorker_p0-w0: stopping experience collection (44250 times) [2024-06-15 21:55:34,889][1651469] Signal inference workers to resume experience collection... (44250 times) [2024-06-15 21:55:34,890][1652491] InferenceWorker_p0-w0: resuming experience collection (44250 times) [2024-06-15 21:55:35,955][1648985] Fps is (10 sec: 49152.9, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 1739587584. Throughput: 0: 11423.4. Samples: 434972672. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:55:35,955][1648985] Avg episode reward: [(0, '156.470')] [2024-06-15 21:55:36,454][1652491] Updated weights for policy 0, policy_version 849411 (0.0103) [2024-06-15 21:55:37,856][1652491] Updated weights for policy 0, policy_version 849473 (0.0012) [2024-06-15 21:55:39,110][1652491] Updated weights for policy 0, policy_version 849536 (0.0019) [2024-06-15 21:55:40,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46421.4, 300 sec: 46652.7). Total num frames: 1739849728. Throughput: 0: 11639.5. Samples: 435004928. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:55:40,955][1648985] Avg episode reward: [(0, '181.460')] [2024-06-15 21:55:45,363][1652491] Updated weights for policy 0, policy_version 849606 (0.0015) [2024-06-15 21:55:45,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 1740046336. Throughput: 0: 11514.3. Samples: 435081216. 
Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:55:45,955][1648985] Avg episode reward: [(0, '192.670')] [2024-06-15 21:55:46,353][1652491] Updated weights for policy 0, policy_version 849663 (0.0018) [2024-06-15 21:55:48,862][1652491] Updated weights for policy 0, policy_version 849729 (0.0148) [2024-06-15 21:55:49,991][1652491] Updated weights for policy 0, policy_version 849781 (0.0012) [2024-06-15 21:55:50,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 1740374016. Throughput: 0: 11673.6. Samples: 435149824. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:55:50,956][1648985] Avg episode reward: [(0, '178.590')] [2024-06-15 21:55:54,548][1652491] Updated weights for policy 0, policy_version 849810 (0.0026) [2024-06-15 21:55:55,658][1652491] Updated weights for policy 0, policy_version 849855 (0.0033) [2024-06-15 21:55:55,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1740505088. Throughput: 0: 11628.1. Samples: 435190784. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:55:55,956][1648985] Avg episode reward: [(0, '160.000')] [2024-06-15 21:55:57,658][1652491] Updated weights for policy 0, policy_version 849913 (0.0012) [2024-06-15 21:55:59,681][1652491] Updated weights for policy 0, policy_version 849956 (0.0012) [2024-06-15 21:56:00,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 46967.6, 300 sec: 46541.7). Total num frames: 1740800000. Throughput: 0: 11787.4. Samples: 435257856. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:56:00,955][1648985] Avg episode reward: [(0, '160.270')] [2024-06-15 21:56:01,633][1652491] Updated weights for policy 0, policy_version 850042 (0.0079) [2024-06-15 21:56:05,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 1740931072. Throughput: 0: 11798.8. Samples: 435331072. 
Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:56:05,955][1648985] Avg episode reward: [(0, '163.870')] [2024-06-15 21:56:06,178][1652491] Updated weights for policy 0, policy_version 850084 (0.0024) [2024-06-15 21:56:07,532][1652491] Updated weights for policy 0, policy_version 850128 (0.0013) [2024-06-15 21:56:08,739][1652491] Updated weights for policy 0, policy_version 850176 (0.0014) [2024-06-15 21:56:10,924][1652491] Updated weights for policy 0, policy_version 850224 (0.0013) [2024-06-15 21:56:10,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 48059.7, 300 sec: 46319.5). Total num frames: 1741258752. Throughput: 0: 11855.7. Samples: 435362816. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:56:10,956][1648985] Avg episode reward: [(0, '177.250')] [2024-06-15 21:56:12,166][1652491] Updated weights for policy 0, policy_version 850274 (0.0012) [2024-06-15 21:56:15,121][1652491] Updated weights for policy 0, policy_version 850307 (0.0019) [2024-06-15 21:56:15,845][1651469] Signal inference workers to stop experience collection... (44300 times) [2024-06-15 21:56:15,889][1652491] InferenceWorker_p0-w0: stopping experience collection (44300 times) [2024-06-15 21:56:15,955][1648985] Fps is (10 sec: 55705.0, 60 sec: 46421.2, 300 sec: 46430.6). Total num frames: 1741488128. Throughput: 0: 12094.5. Samples: 435446784. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:56:15,956][1648985] Avg episode reward: [(0, '170.170')] [2024-06-15 21:56:16,163][1651469] Signal inference workers to resume experience collection... (44300 times) [2024-06-15 21:56:16,164][1652491] InferenceWorker_p0-w0: resuming experience collection (44300 times) [2024-06-15 21:56:16,373][1652491] Updated weights for policy 0, policy_version 850363 (0.0026) [2024-06-15 21:56:19,116][1652491] Updated weights for policy 0, policy_version 850427 (0.0015) [2024-06-15 21:56:20,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 48606.0, 300 sec: 46430.6). 
Total num frames: 1741717504. Throughput: 0: 12060.5. Samples: 435515392. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:56:20,956][1648985] Avg episode reward: [(0, '186.290')] [2024-06-15 21:56:21,601][1652491] Updated weights for policy 0, policy_version 850484 (0.0012) [2024-06-15 21:56:22,977][1652491] Updated weights for policy 0, policy_version 850560 (0.0116) [2024-06-15 21:56:25,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 1741946880. Throughput: 0: 12026.3. Samples: 435546112. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:56:25,956][1648985] Avg episode reward: [(0, '176.880')] [2024-06-15 21:56:27,427][1652491] Updated weights for policy 0, policy_version 850624 (0.0021) [2024-06-15 21:56:30,204][1652491] Updated weights for policy 0, policy_version 850686 (0.0100) [2024-06-15 21:56:30,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 1742209024. Throughput: 0: 12037.7. Samples: 435622912. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:56:30,956][1648985] Avg episode reward: [(0, '163.890')] [2024-06-15 21:56:32,615][1652491] Updated weights for policy 0, policy_version 850737 (0.0017) [2024-06-15 21:56:34,054][1652491] Updated weights for policy 0, policy_version 850808 (0.0012) [2024-06-15 21:56:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.6, 300 sec: 46652.7). Total num frames: 1742471168. Throughput: 0: 12083.2. Samples: 435693568. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:56:35,956][1648985] Avg episode reward: [(0, '164.840')] [2024-06-15 21:56:38,249][1652491] Updated weights for policy 0, policy_version 850872 (0.0014) [2024-06-15 21:56:40,771][1652491] Updated weights for policy 0, policy_version 850913 (0.0014) [2024-06-15 21:56:40,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 1742700544. Throughput: 0: 11923.9. 
Samples: 435727360. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:56:40,956][1648985] Avg episode reward: [(0, '165.640')] [2024-06-15 21:56:43,227][1652491] Updated weights for policy 0, policy_version 850962 (0.0017) [2024-06-15 21:56:44,564][1652491] Updated weights for policy 0, policy_version 851024 (0.0013) [2024-06-15 21:56:45,504][1652491] Updated weights for policy 0, policy_version 851068 (0.0013) [2024-06-15 21:56:45,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 49152.0, 300 sec: 46763.8). Total num frames: 1742995456. Throughput: 0: 11923.9. Samples: 435794432. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:56:45,956][1648985] Avg episode reward: [(0, '159.540')] [2024-06-15 21:56:49,569][1652491] Updated weights for policy 0, policy_version 851130 (0.0013) [2024-06-15 21:56:50,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 1743126528. Throughput: 0: 11992.2. Samples: 435870720. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:56:50,956][1648985] Avg episode reward: [(0, '159.190')] [2024-06-15 21:56:52,106][1652491] Updated weights for policy 0, policy_version 851184 (0.0052) [2024-06-15 21:56:54,892][1652491] Updated weights for policy 0, policy_version 851248 (0.0014) [2024-06-15 21:56:55,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 48605.8, 300 sec: 46652.7). Total num frames: 1743421440. Throughput: 0: 12049.0. Samples: 435905024. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:56:55,956][1648985] Avg episode reward: [(0, '166.590')] [2024-06-15 21:56:56,404][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000851312_1743486976.pth... 
[2024-06-15 21:56:56,449][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000845808_1732214784.pth [2024-06-15 21:56:56,531][1652491] Updated weights for policy 0, policy_version 851313 (0.0015) [2024-06-15 21:56:59,014][1651469] Signal inference workers to stop experience collection... (44350 times) [2024-06-15 21:56:59,028][1652491] Updated weights for policy 0, policy_version 851329 (0.0012) [2024-06-15 21:56:59,058][1652491] InferenceWorker_p0-w0: stopping experience collection (44350 times) [2024-06-15 21:56:59,313][1651469] Signal inference workers to resume experience collection... (44350 times) [2024-06-15 21:56:59,314][1652491] InferenceWorker_p0-w0: resuming experience collection (44350 times) [2024-06-15 21:57:00,353][1652491] Updated weights for policy 0, policy_version 851383 (0.0011) [2024-06-15 21:57:00,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 1743650816. Throughput: 0: 11741.9. Samples: 435975168. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:57:00,956][1648985] Avg episode reward: [(0, '157.710')] [2024-06-15 21:57:03,193][1652491] Updated weights for policy 0, policy_version 851427 (0.0025) [2024-06-15 21:57:05,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 48605.8, 300 sec: 46763.8). Total num frames: 1743847424. Throughput: 0: 11696.3. Samples: 436041728. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:57:05,956][1648985] Avg episode reward: [(0, '160.030')] [2024-06-15 21:57:06,739][1652491] Updated weights for policy 0, policy_version 851522 (0.0107) [2024-06-15 21:57:10,951][1652491] Updated weights for policy 0, policy_version 851600 (0.0013) [2024-06-15 21:57:10,956][1648985] Fps is (10 sec: 42595.9, 60 sec: 46967.0, 300 sec: 46541.6). Total num frames: 1744076800. Throughput: 0: 11605.2. Samples: 436068352. 
Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:57:10,956][1648985] Avg episode reward: [(0, '162.900')] [2024-06-15 21:57:12,235][1652491] Updated weights for policy 0, policy_version 851648 (0.0011) [2024-06-15 21:57:15,091][1652491] Updated weights for policy 0, policy_version 851707 (0.0011) [2024-06-15 21:57:15,956][1648985] Fps is (10 sec: 45869.7, 60 sec: 46966.6, 300 sec: 46652.6). Total num frames: 1744306176. Throughput: 0: 11661.9. Samples: 436147712. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:57:15,957][1648985] Avg episode reward: [(0, '173.730')] [2024-06-15 21:57:17,771][1652491] Updated weights for policy 0, policy_version 851764 (0.0014) [2024-06-15 21:57:19,180][1652491] Updated weights for policy 0, policy_version 851832 (0.0103) [2024-06-15 21:57:20,955][1648985] Fps is (10 sec: 49155.6, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 1744568320. Throughput: 0: 11673.6. Samples: 436218880. Policy #0 lag: (min: 15.0, avg: 92.6, max: 271.0) [2024-06-15 21:57:20,955][1648985] Avg episode reward: [(0, '168.150')] [2024-06-15 21:57:23,020][1652491] Updated weights for policy 0, policy_version 851897 (0.0105) [2024-06-15 21:57:25,535][1652491] Updated weights for policy 0, policy_version 851921 (0.0013) [2024-06-15 21:57:25,955][1648985] Fps is (10 sec: 45880.9, 60 sec: 46967.5, 300 sec: 46541.7). Total num frames: 1744764928. Throughput: 0: 11741.9. Samples: 436255744. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:57:25,955][1648985] Avg episode reward: [(0, '187.440')] [2024-06-15 21:57:27,533][1652491] Updated weights for policy 0, policy_version 851984 (0.0017) [2024-06-15 21:57:29,197][1652491] Updated weights for policy 0, policy_version 852052 (0.0012) [2024-06-15 21:57:30,128][1652491] Updated weights for policy 0, policy_version 852096 (0.0020) [2024-06-15 21:57:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 1745092608. 
Throughput: 0: 11594.0. Samples: 436316160. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:57:30,956][1648985] Avg episode reward: [(0, '167.810')] [2024-06-15 21:57:33,953][1652491] Updated weights for policy 0, policy_version 852158 (0.0012) [2024-06-15 21:57:35,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1745223680. Throughput: 0: 11719.1. Samples: 436398080. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:57:35,957][1648985] Avg episode reward: [(0, '151.470')] [2024-06-15 21:57:37,963][1652491] Updated weights for policy 0, policy_version 852208 (0.0012) [2024-06-15 21:57:38,516][1652491] Updated weights for policy 0, policy_version 852226 (0.0012) [2024-06-15 21:57:40,094][1651469] Signal inference workers to stop experience collection... (44400 times) [2024-06-15 21:57:40,146][1652491] InferenceWorker_p0-w0: stopping experience collection (44400 times) [2024-06-15 21:57:40,445][1651469] Signal inference workers to resume experience collection... (44400 times) [2024-06-15 21:57:40,446][1652491] InferenceWorker_p0-w0: resuming experience collection (44400 times) [2024-06-15 21:57:40,627][1652491] Updated weights for policy 0, policy_version 852306 (0.0207) [2024-06-15 21:57:40,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 47513.7, 300 sec: 46763.9). Total num frames: 1745551360. Throughput: 0: 11730.6. Samples: 436432896. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:57:40,956][1648985] Avg episode reward: [(0, '152.530')] [2024-06-15 21:57:41,714][1652491] Updated weights for policy 0, policy_version 852352 (0.0130) [2024-06-15 21:57:45,460][1652491] Updated weights for policy 0, policy_version 852411 (0.0014) [2024-06-15 21:57:45,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1745747968. Throughput: 0: 11514.4. Samples: 436493312. 
Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:57:45,955][1648985] Avg episode reward: [(0, '169.300')] [2024-06-15 21:57:49,288][1652491] Updated weights for policy 0, policy_version 852448 (0.0068) [2024-06-15 21:57:50,745][1652491] Updated weights for policy 0, policy_version 852515 (0.0014) [2024-06-15 21:57:50,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 1745977344. Throughput: 0: 11673.6. Samples: 436567040. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:57:50,956][1648985] Avg episode reward: [(0, '172.490')] [2024-06-15 21:57:52,945][1652491] Updated weights for policy 0, policy_version 852599 (0.0013) [2024-06-15 21:57:55,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 46421.4, 300 sec: 46430.6). Total num frames: 1746206720. Throughput: 0: 11662.4. Samples: 436593152. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:57:55,956][1648985] Avg episode reward: [(0, '196.660')] [2024-06-15 21:57:55,990][1652491] Updated weights for policy 0, policy_version 852642 (0.0017) [2024-06-15 21:58:00,244][1652491] Updated weights for policy 0, policy_version 852688 (0.0012) [2024-06-15 21:58:00,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45329.1, 300 sec: 46763.8). Total num frames: 1746370560. Throughput: 0: 11685.3. Samples: 436673536. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:58:00,956][1648985] Avg episode reward: [(0, '194.020')] [2024-06-15 21:58:01,821][1652491] Updated weights for policy 0, policy_version 852753 (0.0015) [2024-06-15 21:58:02,863][1652491] Updated weights for policy 0, policy_version 852806 (0.0013) [2024-06-15 21:58:04,030][1652491] Updated weights for policy 0, policy_version 852859 (0.0014) [2024-06-15 21:58:05,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 47513.5, 300 sec: 46763.8). Total num frames: 1746698240. Throughput: 0: 11673.6. Samples: 436744192. 
Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:58:05,956][1648985] Avg episode reward: [(0, '203.480')] [2024-06-15 21:58:06,717][1652491] Updated weights for policy 0, policy_version 852919 (0.0018) [2024-06-15 21:58:10,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45329.5, 300 sec: 46652.7). Total num frames: 1746796544. Throughput: 0: 11696.3. Samples: 436782080. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:58:10,956][1648985] Avg episode reward: [(0, '169.930')] [2024-06-15 21:58:11,695][1652491] Updated weights for policy 0, policy_version 852960 (0.0014) [2024-06-15 21:58:13,987][1652491] Updated weights for policy 0, policy_version 853076 (0.0092) [2024-06-15 21:58:15,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48060.7, 300 sec: 47097.0). Total num frames: 1747189760. Throughput: 0: 11753.2. Samples: 436845056. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:58:15,956][1648985] Avg episode reward: [(0, '160.860')] [2024-06-15 21:58:16,506][1652491] Updated weights for policy 0, policy_version 853121 (0.0013) [2024-06-15 21:58:17,869][1652491] Updated weights for policy 0, policy_version 853181 (0.0014) [2024-06-15 21:58:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1747320832. Throughput: 0: 11776.0. Samples: 436928000. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:58:20,956][1648985] Avg episode reward: [(0, '154.150')] [2024-06-15 21:58:22,912][1652491] Updated weights for policy 0, policy_version 853219 (0.0013) [2024-06-15 21:58:23,134][1651469] Signal inference workers to stop experience collection... (44450 times) [2024-06-15 21:58:23,196][1652491] InferenceWorker_p0-w0: stopping experience collection (44450 times) [2024-06-15 21:58:23,286][1651469] Signal inference workers to resume experience collection... 
(44450 times) [2024-06-15 21:58:23,287][1652491] InferenceWorker_p0-w0: resuming experience collection (44450 times) [2024-06-15 21:58:24,227][1652491] Updated weights for policy 0, policy_version 853283 (0.0014) [2024-06-15 21:58:25,228][1652491] Updated weights for policy 0, policy_version 853344 (0.0011) [2024-06-15 21:58:25,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 49151.8, 300 sec: 47541.3). Total num frames: 1747714048. Throughput: 0: 11775.9. Samples: 436962816. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:58:25,956][1648985] Avg episode reward: [(0, '151.260')] [2024-06-15 21:58:27,474][1652491] Updated weights for policy 0, policy_version 853394 (0.0012) [2024-06-15 21:58:30,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 1747845120. Throughput: 0: 12060.4. Samples: 437036032. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:58:30,956][1648985] Avg episode reward: [(0, '152.630')] [2024-06-15 21:58:32,519][1652491] Updated weights for policy 0, policy_version 853456 (0.0015) [2024-06-15 21:58:34,677][1652491] Updated weights for policy 0, policy_version 853552 (0.0013) [2024-06-15 21:58:35,782][1652491] Updated weights for policy 0, policy_version 853603 (0.0114) [2024-06-15 21:58:35,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 49698.3, 300 sec: 47874.6). Total num frames: 1748205568. Throughput: 0: 11923.9. Samples: 437103616. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:58:35,956][1648985] Avg episode reward: [(0, '162.640')] [2024-06-15 21:58:38,089][1652491] Updated weights for policy 0, policy_version 853648 (0.0015) [2024-06-15 21:58:39,264][1652491] Updated weights for policy 0, policy_version 853691 (0.0015) [2024-06-15 21:58:40,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 46967.3, 300 sec: 47097.1). Total num frames: 1748369408. Throughput: 0: 12208.4. Samples: 437142528. 
Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:58:40,956][1648985] Avg episode reward: [(0, '163.380')] [2024-06-15 21:58:43,789][1652491] Updated weights for policy 0, policy_version 853733 (0.0013) [2024-06-15 21:58:45,117][1652491] Updated weights for policy 0, policy_version 853792 (0.0044) [2024-06-15 21:58:45,957][1648985] Fps is (10 sec: 42593.0, 60 sec: 48058.6, 300 sec: 47541.2). Total num frames: 1748631552. Throughput: 0: 12299.0. Samples: 437227008. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:58:45,958][1648985] Avg episode reward: [(0, '155.220')] [2024-06-15 21:58:47,090][1652491] Updated weights for policy 0, policy_version 853888 (0.0013) [2024-06-15 21:58:49,138][1652491] Updated weights for policy 0, policy_version 853944 (0.0014) [2024-06-15 21:58:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48605.9, 300 sec: 47430.3). Total num frames: 1748893696. Throughput: 0: 12208.4. Samples: 437293568. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:58:50,956][1648985] Avg episode reward: [(0, '150.530')] [2024-06-15 21:58:55,625][1652491] Updated weights for policy 0, policy_version 854032 (0.0109) [2024-06-15 21:58:55,960][1648985] Fps is (10 sec: 45859.3, 60 sec: 48056.0, 300 sec: 47318.4). Total num frames: 1749090304. Throughput: 0: 12434.6. Samples: 437341696. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:58:55,960][1648985] Avg episode reward: [(0, '157.400')] [2024-06-15 21:58:56,214][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000854064_1749123072.pth... 
[2024-06-15 21:58:56,286][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000848512_1737752576.pth [2024-06-15 21:58:56,935][1652491] Updated weights for policy 0, policy_version 854082 (0.0089) [2024-06-15 21:58:57,988][1652491] Updated weights for policy 0, policy_version 854143 (0.0015) [2024-06-15 21:58:58,851][1651469] Signal inference workers to stop experience collection... (44500 times) [2024-06-15 21:58:58,938][1652491] InferenceWorker_p0-w0: stopping experience collection (44500 times) [2024-06-15 21:58:59,056][1651469] Signal inference workers to resume experience collection... (44500 times) [2024-06-15 21:58:59,056][1652491] InferenceWorker_p0-w0: resuming experience collection (44500 times) [2024-06-15 21:58:59,945][1652491] Updated weights for policy 0, policy_version 854203 (0.0028) [2024-06-15 21:59:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 50790.4, 300 sec: 47652.5). Total num frames: 1749417984. Throughput: 0: 12265.3. Samples: 437396992. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:59:00,956][1648985] Avg episode reward: [(0, '180.060')] [2024-06-15 21:59:05,955][1648985] Fps is (10 sec: 42617.9, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 1749516288. Throughput: 0: 12162.8. Samples: 437475328. Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:59:05,956][1648985] Avg episode reward: [(0, '186.240')] [2024-06-15 21:59:06,113][1652491] Updated weights for policy 0, policy_version 854272 (0.0016) [2024-06-15 21:59:07,454][1652491] Updated weights for policy 0, policy_version 854336 (0.0014) [2024-06-15 21:59:08,804][1652491] Updated weights for policy 0, policy_version 854395 (0.0011) [2024-06-15 21:59:10,826][1652491] Updated weights for policy 0, policy_version 854448 (0.0011) [2024-06-15 21:59:10,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 51882.7, 300 sec: 47763.5). Total num frames: 1749909504. Throughput: 0: 12049.1. Samples: 437505024. 
Policy #0 lag: (min: 0.0, avg: 98.6, max: 256.0) [2024-06-15 21:59:10,956][1648985] Avg episode reward: [(0, '173.540')] [2024-06-15 21:59:15,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1749975040. Throughput: 0: 12242.5. Samples: 437586944. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 21:59:15,956][1648985] Avg episode reward: [(0, '191.220')] [2024-06-15 21:59:17,343][1652491] Updated weights for policy 0, policy_version 854544 (0.0015) [2024-06-15 21:59:18,752][1652491] Updated weights for policy 0, policy_version 854608 (0.0015) [2024-06-15 21:59:19,667][1652491] Updated weights for policy 0, policy_version 854656 (0.0012) [2024-06-15 21:59:20,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 50790.4, 300 sec: 47652.5). Total num frames: 1750368256. Throughput: 0: 12185.6. Samples: 437651968. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 21:59:20,956][1648985] Avg episode reward: [(0, '189.480')] [2024-06-15 21:59:21,835][1652491] Updated weights for policy 0, policy_version 854709 (0.0014) [2024-06-15 21:59:25,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 45875.3, 300 sec: 46767.2). Total num frames: 1750466560. Throughput: 0: 12242.5. Samples: 437693440. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 21:59:25,956][1648985] Avg episode reward: [(0, '191.310')] [2024-06-15 21:59:27,011][1652491] Updated weights for policy 0, policy_version 854752 (0.0013) [2024-06-15 21:59:28,940][1652491] Updated weights for policy 0, policy_version 854848 (0.0014) [2024-06-15 21:59:30,849][1652491] Updated weights for policy 0, policy_version 854910 (0.0053) [2024-06-15 21:59:30,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 50244.3, 300 sec: 47985.7). Total num frames: 1750859776. Throughput: 0: 11856.0. Samples: 437760512. 
Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 21:59:30,956][1648985] Avg episode reward: [(0, '174.090')] [2024-06-15 21:59:32,267][1652491] Updated weights for policy 0, policy_version 854964 (0.0013) [2024-06-15 21:59:35,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 46421.3, 300 sec: 47208.1). Total num frames: 1750990848. Throughput: 0: 12276.6. Samples: 437846016. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 21:59:35,956][1648985] Avg episode reward: [(0, '176.440')] [2024-06-15 21:59:37,146][1652491] Updated weights for policy 0, policy_version 854997 (0.0023) [2024-06-15 21:59:38,608][1652491] Updated weights for policy 0, policy_version 855056 (0.0016) [2024-06-15 21:59:40,587][1651469] Signal inference workers to stop experience collection... (44550 times) [2024-06-15 21:59:40,694][1652491] InferenceWorker_p0-w0: stopping experience collection (44550 times) [2024-06-15 21:59:40,887][1651469] Signal inference workers to resume experience collection... (44550 times) [2024-06-15 21:59:40,888][1652491] InferenceWorker_p0-w0: resuming experience collection (44550 times) [2024-06-15 21:59:40,890][1652491] Updated weights for policy 0, policy_version 855136 (0.0015) [2024-06-15 21:59:40,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 47763.5). Total num frames: 1751318528. Throughput: 0: 11936.5. Samples: 437878784. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 21:59:40,956][1648985] Avg episode reward: [(0, '175.650')] [2024-06-15 21:59:42,486][1652491] Updated weights for policy 0, policy_version 855216 (0.0015) [2024-06-15 21:59:45,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48060.8, 300 sec: 47541.4). Total num frames: 1751515136. Throughput: 0: 12299.4. Samples: 437950464. 
Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 21:59:45,956][1648985] Avg episode reward: [(0, '192.960')] [2024-06-15 21:59:47,891][1652491] Updated weights for policy 0, policy_version 855280 (0.0013) [2024-06-15 21:59:50,160][1652491] Updated weights for policy 0, policy_version 855329 (0.0015) [2024-06-15 21:59:50,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1751777280. Throughput: 0: 12162.9. Samples: 438022656. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 21:59:50,956][1648985] Avg episode reward: [(0, '168.780')] [2024-06-15 21:59:51,385][1652491] Updated weights for policy 0, policy_version 855377 (0.0084) [2024-06-15 21:59:52,672][1652491] Updated weights for policy 0, policy_version 855444 (0.0015) [2024-06-15 21:59:55,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 49155.8, 300 sec: 47652.4). Total num frames: 1752039424. Throughput: 0: 12219.7. Samples: 438054912. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 21:59:55,956][1648985] Avg episode reward: [(0, '160.680')] [2024-06-15 21:59:57,470][1652491] Updated weights for policy 0, policy_version 855505 (0.0013) [2024-06-15 22:00:00,834][1652491] Updated weights for policy 0, policy_version 855588 (0.0015) [2024-06-15 22:00:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 1752236032. Throughput: 0: 12367.7. Samples: 438143488. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:00:00,956][1648985] Avg episode reward: [(0, '161.990')] [2024-06-15 22:00:02,155][1652491] Updated weights for policy 0, policy_version 855648 (0.0013) [2024-06-15 22:00:04,069][1652491] Updated weights for policy 0, policy_version 855744 (0.0014) [2024-06-15 22:00:05,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 50790.5, 300 sec: 48096.8). Total num frames: 1752563712. Throughput: 0: 12276.6. Samples: 438204416. 
Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:00:05,956][1648985] Avg episode reward: [(0, '180.540')] [2024-06-15 22:00:09,293][1652491] Updated weights for policy 0, policy_version 855802 (0.0087) [2024-06-15 22:00:10,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 46421.2, 300 sec: 47430.2). Total num frames: 1752694784. Throughput: 0: 12253.8. Samples: 438244864. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:00:10,956][1648985] Avg episode reward: [(0, '186.830')] [2024-06-15 22:00:12,175][1652491] Updated weights for policy 0, policy_version 855844 (0.0011) [2024-06-15 22:00:14,087][1652491] Updated weights for policy 0, policy_version 855922 (0.0014) [2024-06-15 22:00:15,437][1652491] Updated weights for policy 0, policy_version 855994 (0.0015) [2024-06-15 22:00:15,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 51882.6, 300 sec: 48430.0). Total num frames: 1753088000. Throughput: 0: 12310.7. Samples: 438314496. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:00:15,956][1648985] Avg episode reward: [(0, '179.970')] [2024-06-15 22:00:20,234][1652491] Updated weights for policy 0, policy_version 856055 (0.0019) [2024-06-15 22:00:20,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 1753219072. Throughput: 0: 12060.5. Samples: 438388736. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:00:20,955][1648985] Avg episode reward: [(0, '181.590')] [2024-06-15 22:00:22,269][1651469] Signal inference workers to stop experience collection... (44600 times) [2024-06-15 22:00:22,385][1652491] InferenceWorker_p0-w0: stopping experience collection (44600 times) [2024-06-15 22:00:22,388][1652491] Updated weights for policy 0, policy_version 856088 (0.0014) [2024-06-15 22:00:22,510][1651469] Signal inference workers to resume experience collection... 
(44600 times) [2024-06-15 22:00:22,511][1652491] InferenceWorker_p0-w0: resuming experience collection (44600 times) [2024-06-15 22:00:24,343][1652491] Updated weights for policy 0, policy_version 856176 (0.0014) [2024-06-15 22:00:25,459][1652491] Updated weights for policy 0, policy_version 856229 (0.0013) [2024-06-15 22:00:25,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 52428.8, 300 sec: 48430.0). Total num frames: 1753612288. Throughput: 0: 12140.1. Samples: 438425088. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:00:25,956][1648985] Avg episode reward: [(0, '197.300')] [2024-06-15 22:00:30,197][1652491] Updated weights for policy 0, policy_version 856273 (0.0013) [2024-06-15 22:00:30,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1753743360. Throughput: 0: 12231.1. Samples: 438500864. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:00:30,956][1648985] Avg episode reward: [(0, '210.660')] [2024-06-15 22:00:32,572][1652491] Updated weights for policy 0, policy_version 856328 (0.0015) [2024-06-15 22:00:33,992][1652491] Updated weights for policy 0, policy_version 856400 (0.0014) [2024-06-15 22:00:35,428][1652491] Updated weights for policy 0, policy_version 856464 (0.0019) [2024-06-15 22:00:35,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 51336.6, 300 sec: 48207.8). Total num frames: 1754071040. Throughput: 0: 12128.7. Samples: 438568448. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:00:35,955][1648985] Avg episode reward: [(0, '190.300')] [2024-06-15 22:00:40,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 1754136576. Throughput: 0: 12208.4. Samples: 438604288. 
Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:00:40,956][1648985] Avg episode reward: [(0, '159.050')] [2024-06-15 22:00:41,247][1652491] Updated weights for policy 0, policy_version 856531 (0.0013) [2024-06-15 22:00:43,887][1652491] Updated weights for policy 0, policy_version 856595 (0.0065) [2024-06-15 22:00:45,312][1652491] Updated weights for policy 0, policy_version 856659 (0.0011) [2024-06-15 22:00:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 49698.1, 300 sec: 47874.6). Total num frames: 1754497024. Throughput: 0: 11923.9. Samples: 438680064. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:00:45,956][1648985] Avg episode reward: [(0, '172.800')] [2024-06-15 22:00:46,029][1652491] Updated weights for policy 0, policy_version 856702 (0.0013) [2024-06-15 22:00:47,414][1652491] Updated weights for policy 0, policy_version 856755 (0.0013) [2024-06-15 22:00:50,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.6, 300 sec: 47985.7). Total num frames: 1754660864. Throughput: 0: 12219.7. Samples: 438754304. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:00:50,957][1648985] Avg episode reward: [(0, '173.410')] [2024-06-15 22:00:51,579][1652491] Updated weights for policy 0, policy_version 856776 (0.0012) [2024-06-15 22:00:52,726][1652491] Updated weights for policy 0, policy_version 856829 (0.0012) [2024-06-15 22:00:54,907][1652491] Updated weights for policy 0, policy_version 856880 (0.0014) [2024-06-15 22:00:55,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 48605.9, 300 sec: 47985.6). Total num frames: 1754955776. Throughput: 0: 12208.4. Samples: 438794240. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:00:55,956][1648985] Avg episode reward: [(0, '185.760')] [2024-06-15 22:00:56,434][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000856944_1755021312.pth... 
[2024-06-15 22:00:56,574][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000851312_1743486976.pth [2024-06-15 22:00:56,874][1652491] Updated weights for policy 0, policy_version 856960 (0.0013) [2024-06-15 22:00:57,686][1651469] Signal inference workers to stop experience collection... (44650 times) [2024-06-15 22:00:57,729][1652491] InferenceWorker_p0-w0: stopping experience collection (44650 times) [2024-06-15 22:00:58,033][1651469] Signal inference workers to resume experience collection... (44650 times) [2024-06-15 22:00:58,034][1652491] InferenceWorker_p0-w0: resuming experience collection (44650 times) [2024-06-15 22:00:58,342][1652491] Updated weights for policy 0, policy_version 857022 (0.0015) [2024-06-15 22:01:00,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 49152.0, 300 sec: 48318.9). Total num frames: 1755185152. Throughput: 0: 12106.0. Samples: 438859264. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:01:00,956][1648985] Avg episode reward: [(0, '165.000')] [2024-06-15 22:01:03,021][1652491] Updated weights for policy 0, policy_version 857072 (0.0014) [2024-06-15 22:01:05,251][1652491] Updated weights for policy 0, policy_version 857108 (0.0012) [2024-06-15 22:01:05,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 47513.7, 300 sec: 47985.7). Total num frames: 1755414528. Throughput: 0: 12197.0. Samples: 438937600. Policy #0 lag: (min: 15.0, avg: 75.0, max: 271.0) [2024-06-15 22:01:05,955][1648985] Avg episode reward: [(0, '162.520')] [2024-06-15 22:01:06,302][1652491] Updated weights for policy 0, policy_version 857168 (0.0013) [2024-06-15 22:01:09,211][1652491] Updated weights for policy 0, policy_version 857264 (0.0014) [2024-06-15 22:01:10,960][1648985] Fps is (10 sec: 52402.9, 60 sec: 50240.3, 300 sec: 48207.0). Total num frames: 1755709440. Throughput: 0: 12138.8. Samples: 438971392. 
Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:01:10,961][1648985] Avg episode reward: [(0, '149.600')] [2024-06-15 22:01:12,932][1652491] Updated weights for policy 0, policy_version 857312 (0.0017) [2024-06-15 22:01:15,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 46421.3, 300 sec: 47985.6). Total num frames: 1755873280. Throughput: 0: 12162.8. Samples: 439048192. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:01:15,956][1648985] Avg episode reward: [(0, '175.160')] [2024-06-15 22:01:16,303][1652491] Updated weights for policy 0, policy_version 857376 (0.0046) [2024-06-15 22:01:17,906][1652491] Updated weights for policy 0, policy_version 857445 (0.0014) [2024-06-15 22:01:19,414][1652491] Updated weights for policy 0, policy_version 857488 (0.0018) [2024-06-15 22:01:20,955][1648985] Fps is (10 sec: 52455.2, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 1756233728. Throughput: 0: 12117.3. Samples: 439113728. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:01:20,955][1648985] Avg episode reward: [(0, '184.390')] [2024-06-15 22:01:23,446][1652491] Updated weights for policy 0, policy_version 857552 (0.0013) [2024-06-15 22:01:25,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1756364800. Throughput: 0: 12185.6. Samples: 439152640. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:01:25,956][1648985] Avg episode reward: [(0, '190.390')] [2024-06-15 22:01:27,725][1652491] Updated weights for policy 0, policy_version 857618 (0.0014) [2024-06-15 22:01:29,029][1652491] Updated weights for policy 0, policy_version 857680 (0.0013) [2024-06-15 22:01:29,899][1652491] Updated weights for policy 0, policy_version 857728 (0.0015) [2024-06-15 22:01:30,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 1756659712. Throughput: 0: 11923.9. Samples: 439216640. 
Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:01:30,956][1648985] Avg episode reward: [(0, '173.510')] [2024-06-15 22:01:31,693][1652491] Updated weights for policy 0, policy_version 857784 (0.0017) [2024-06-15 22:01:35,837][1652491] Updated weights for policy 0, policy_version 857848 (0.0026) [2024-06-15 22:01:35,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 46967.5, 300 sec: 48096.8). Total num frames: 1756889088. Throughput: 0: 11935.3. Samples: 439291392. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:01:35,956][1648985] Avg episode reward: [(0, '173.390')] [2024-06-15 22:01:39,099][1652491] Updated weights for policy 0, policy_version 857888 (0.0017) [2024-06-15 22:01:40,760][1652491] Updated weights for policy 0, policy_version 857955 (0.0015) [2024-06-15 22:01:40,962][1648985] Fps is (10 sec: 45844.9, 60 sec: 49692.6, 300 sec: 47873.5). Total num frames: 1757118464. Throughput: 0: 11933.6. Samples: 439331328. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:01:40,964][1648985] Avg episode reward: [(0, '161.890')] [2024-06-15 22:01:42,437][1651469] Signal inference workers to stop experience collection... (44700 times) [2024-06-15 22:01:42,474][1652491] InferenceWorker_p0-w0: stopping experience collection (44700 times) [2024-06-15 22:01:42,630][1651469] Signal inference workers to resume experience collection... (44700 times) [2024-06-15 22:01:42,631][1652491] InferenceWorker_p0-w0: resuming experience collection (44700 times) [2024-06-15 22:01:42,633][1652491] Updated weights for policy 0, policy_version 858016 (0.0011) [2024-06-15 22:01:45,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 46967.5, 300 sec: 48096.8). Total num frames: 1757315072. Throughput: 0: 12071.8. Samples: 439402496. 
Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:01:45,956][1648985] Avg episode reward: [(0, '171.110')] [2024-06-15 22:01:45,978][1652491] Updated weights for policy 0, policy_version 858067 (0.0012) [2024-06-15 22:01:49,576][1652491] Updated weights for policy 0, policy_version 858132 (0.0014) [2024-06-15 22:01:50,955][1648985] Fps is (10 sec: 45905.8, 60 sec: 48606.0, 300 sec: 47985.7). Total num frames: 1757577216. Throughput: 0: 11855.6. Samples: 439471104. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:01:50,956][1648985] Avg episode reward: [(0, '184.080')] [2024-06-15 22:01:51,124][1652491] Updated weights for policy 0, policy_version 858208 (0.0037) [2024-06-15 22:01:52,990][1652491] Updated weights for policy 0, policy_version 858256 (0.0013) [2024-06-15 22:01:54,081][1652491] Updated weights for policy 0, policy_version 858304 (0.0013) [2024-06-15 22:01:55,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 47513.7, 300 sec: 47985.7). Total num frames: 1757806592. Throughput: 0: 11743.2. Samples: 439499776. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:01:55,956][1648985] Avg episode reward: [(0, '182.680')] [2024-06-15 22:01:57,652][1652491] Updated weights for policy 0, policy_version 858361 (0.0015) [2024-06-15 22:02:00,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 1758003200. Throughput: 0: 11867.0. Samples: 439582208. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:02:00,956][1648985] Avg episode reward: [(0, '168.650')] [2024-06-15 22:02:01,144][1652491] Updated weights for policy 0, policy_version 858417 (0.0012) [2024-06-15 22:02:02,837][1652491] Updated weights for policy 0, policy_version 858494 (0.0013) [2024-06-15 22:02:05,251][1652491] Updated weights for policy 0, policy_version 858549 (0.0035) [2024-06-15 22:02:05,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48605.8, 300 sec: 48319.0). Total num frames: 1758330880. 
Throughput: 0: 11798.7. Samples: 439644672. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:02:05,956][1648985] Avg episode reward: [(0, '192.110')] [2024-06-15 22:02:08,168][1652491] Updated weights for policy 0, policy_version 858595 (0.0011) [2024-06-15 22:02:10,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 45879.1, 300 sec: 47985.9). Total num frames: 1758461952. Throughput: 0: 11832.9. Samples: 439685120. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:02:10,955][1648985] Avg episode reward: [(0, '172.680')] [2024-06-15 22:02:11,233][1652491] Updated weights for policy 0, policy_version 858640 (0.0044) [2024-06-15 22:02:12,848][1652491] Updated weights for policy 0, policy_version 858709 (0.0013) [2024-06-15 22:02:14,702][1652491] Updated weights for policy 0, policy_version 858759 (0.0015) [2024-06-15 22:02:15,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 49152.1, 300 sec: 48318.9). Total num frames: 1758822400. Throughput: 0: 11969.4. Samples: 439755264. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:02:15,956][1648985] Avg episode reward: [(0, '201.570')] [2024-06-15 22:02:18,690][1652491] Updated weights for policy 0, policy_version 858832 (0.0013) [2024-06-15 22:02:20,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 48207.8). Total num frames: 1758986240. Throughput: 0: 11844.2. Samples: 439824384. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:02:20,956][1648985] Avg episode reward: [(0, '181.460')] [2024-06-15 22:02:22,693][1652491] Updated weights for policy 0, policy_version 858883 (0.0023) [2024-06-15 22:02:24,329][1652491] Updated weights for policy 0, policy_version 858960 (0.0030) [2024-06-15 22:02:24,781][1651469] Signal inference workers to stop experience collection... 
(44750 times) [2024-06-15 22:02:24,827][1652491] InferenceWorker_p0-w0: stopping experience collection (44750 times) [2024-06-15 22:02:25,045][1651469] Signal inference workers to resume experience collection... (44750 times) [2024-06-15 22:02:25,046][1652491] InferenceWorker_p0-w0: resuming experience collection (44750 times) [2024-06-15 22:02:25,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48059.9, 300 sec: 47985.7). Total num frames: 1759248384. Throughput: 0: 11755.0. Samples: 439860224. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:02:25,956][1648985] Avg episode reward: [(0, '188.320')] [2024-06-15 22:02:26,261][1652491] Updated weights for policy 0, policy_version 859012 (0.0023) [2024-06-15 22:02:27,742][1652491] Updated weights for policy 0, policy_version 859072 (0.0021) [2024-06-15 22:02:30,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46967.5, 300 sec: 48318.9). Total num frames: 1759477760. Throughput: 0: 11719.1. Samples: 439929856. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:02:30,956][1648985] Avg episode reward: [(0, '160.730')] [2024-06-15 22:02:34,217][1652491] Updated weights for policy 0, policy_version 859152 (0.0016) [2024-06-15 22:02:35,237][1652491] Updated weights for policy 0, policy_version 859199 (0.0036) [2024-06-15 22:02:35,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45875.0, 300 sec: 47763.5). Total num frames: 1759641600. Throughput: 0: 11696.3. Samples: 439997440. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:02:35,956][1648985] Avg episode reward: [(0, '165.320')] [2024-06-15 22:02:37,406][1652491] Updated weights for policy 0, policy_version 859265 (0.0021) [2024-06-15 22:02:38,709][1652491] Updated weights for policy 0, policy_version 859316 (0.0096) [2024-06-15 22:02:40,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46426.4, 300 sec: 47985.7). Total num frames: 1759903744. Throughput: 0: 11730.5. Samples: 440027648. 
Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:02:40,956][1648985] Avg episode reward: [(0, '151.490')] [2024-06-15 22:02:42,339][1652491] Updated weights for policy 0, policy_version 859376 (0.0022) [2024-06-15 22:02:45,448][1652491] Updated weights for policy 0, policy_version 859409 (0.0037) [2024-06-15 22:02:45,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 1760133120. Throughput: 0: 11628.1. Samples: 440105472. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:02:45,956][1648985] Avg episode reward: [(0, '144.490')] [2024-06-15 22:02:47,335][1652491] Updated weights for policy 0, policy_version 859475 (0.0014) [2024-06-15 22:02:49,760][1652491] Updated weights for policy 0, policy_version 859541 (0.0087) [2024-06-15 22:02:50,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47513.6, 300 sec: 48207.9). Total num frames: 1760428032. Throughput: 0: 11423.3. Samples: 440158720. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:02:50,956][1648985] Avg episode reward: [(0, '162.490')] [2024-06-15 22:02:54,102][1652491] Updated weights for policy 0, policy_version 859616 (0.0016) [2024-06-15 22:02:55,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 45875.1, 300 sec: 48096.7). Total num frames: 1760559104. Throughput: 0: 11389.1. Samples: 440197632. Policy #0 lag: (min: 73.0, avg: 165.2, max: 333.0) [2024-06-15 22:02:55,956][1648985] Avg episode reward: [(0, '171.190')] [2024-06-15 22:02:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000859648_1760559104.pth... 
[2024-06-15 22:02:56,005][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000854064_1749123072.pth [2024-06-15 22:02:57,272][1652491] Updated weights for policy 0, policy_version 859664 (0.0017) [2024-06-15 22:02:58,607][1652491] Updated weights for policy 0, policy_version 859710 (0.0014) [2024-06-15 22:03:00,029][1652491] Updated weights for policy 0, policy_version 859774 (0.0013) [2024-06-15 22:03:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46967.5, 300 sec: 47874.6). Total num frames: 1760821248. Throughput: 0: 11309.5. Samples: 440264192. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:03:00,956][1648985] Avg episode reward: [(0, '162.280')] [2024-06-15 22:03:01,875][1652491] Updated weights for policy 0, policy_version 859824 (0.0012) [2024-06-15 22:03:04,473][1652491] Updated weights for policy 0, policy_version 859859 (0.0025) [2024-06-15 22:03:05,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 45875.3, 300 sec: 48430.0). Total num frames: 1761083392. Throughput: 0: 11423.3. Samples: 440338432. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:03:05,955][1648985] Avg episode reward: [(0, '160.810')] [2024-06-15 22:03:08,604][1652491] Updated weights for policy 0, policy_version 859923 (0.0013) [2024-06-15 22:03:10,169][1652491] Updated weights for policy 0, policy_version 859987 (0.0013) [2024-06-15 22:03:10,558][1651469] Signal inference workers to stop experience collection... (44800 times) [2024-06-15 22:03:10,684][1652491] InferenceWorker_p0-w0: stopping experience collection (44800 times) [2024-06-15 22:03:10,828][1651469] Signal inference workers to resume experience collection... (44800 times) [2024-06-15 22:03:10,828][1652491] InferenceWorker_p0-w0: resuming experience collection (44800 times) [2024-06-15 22:03:10,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 47513.5, 300 sec: 47874.6). Total num frames: 1761312768. Throughput: 0: 11548.4. Samples: 440379904. 
Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:03:10,956][1648985] Avg episode reward: [(0, '160.280')] [2024-06-15 22:03:11,960][1652491] Updated weights for policy 0, policy_version 860048 (0.0156) [2024-06-15 22:03:15,014][1652491] Updated weights for policy 0, policy_version 860097 (0.0038) [2024-06-15 22:03:15,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 45329.0, 300 sec: 48207.8). Total num frames: 1761542144. Throughput: 0: 11480.1. Samples: 440446464. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:03:15,956][1648985] Avg episode reward: [(0, '159.270')] [2024-06-15 22:03:16,175][1652491] Updated weights for policy 0, policy_version 860153 (0.0018) [2024-06-15 22:03:19,955][1652491] Updated weights for policy 0, policy_version 860208 (0.0013) [2024-06-15 22:03:20,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 1761738752. Throughput: 0: 11650.9. Samples: 440521728. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:03:20,955][1648985] Avg episode reward: [(0, '133.950')] [2024-06-15 22:03:21,816][1652491] Updated weights for policy 0, policy_version 860257 (0.0019) [2024-06-15 22:03:23,472][1652491] Updated weights for policy 0, policy_version 860320 (0.0031) [2024-06-15 22:03:25,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1762000896. Throughput: 0: 11594.0. Samples: 440549376. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:03:25,955][1648985] Avg episode reward: [(0, '127.160')] [2024-06-15 22:03:26,224][1652491] Updated weights for policy 0, policy_version 860368 (0.0014) [2024-06-15 22:03:30,728][1652491] Updated weights for policy 0, policy_version 860432 (0.0030) [2024-06-15 22:03:30,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 44782.9, 300 sec: 47319.2). Total num frames: 1762164736. Throughput: 0: 11685.0. Samples: 440631296. 
Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:03:30,956][1648985] Avg episode reward: [(0, '173.520')] [2024-06-15 22:03:32,345][1652491] Updated weights for policy 0, policy_version 860497 (0.0015) [2024-06-15 22:03:33,658][1652491] Updated weights for policy 0, policy_version 860547 (0.0134) [2024-06-15 22:03:34,900][1652491] Updated weights for policy 0, policy_version 860602 (0.0011) [2024-06-15 22:03:35,957][1648985] Fps is (10 sec: 52420.4, 60 sec: 48058.6, 300 sec: 47985.4). Total num frames: 1762525184. Throughput: 0: 12060.0. Samples: 440701440. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:03:35,957][1648985] Avg episode reward: [(0, '183.730')] [2024-06-15 22:03:36,741][1652491] Updated weights for policy 0, policy_version 860640 (0.0011) [2024-06-15 22:03:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 45875.2, 300 sec: 47541.6). Total num frames: 1762656256. Throughput: 0: 11992.2. Samples: 440737280. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:03:40,966][1648985] Avg episode reward: [(0, '178.610')] [2024-06-15 22:03:41,075][1652491] Updated weights for policy 0, policy_version 860676 (0.0012) [2024-06-15 22:03:42,099][1652491] Updated weights for policy 0, policy_version 860731 (0.0012) [2024-06-15 22:03:43,145][1652491] Updated weights for policy 0, policy_version 860784 (0.0015) [2024-06-15 22:03:44,579][1652491] Updated weights for policy 0, policy_version 860840 (0.0022) [2024-06-15 22:03:45,955][1648985] Fps is (10 sec: 52437.4, 60 sec: 48605.9, 300 sec: 47985.7). Total num frames: 1763049472. Throughput: 0: 12219.8. Samples: 440814080. 
Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:03:45,955][1648985] Avg episode reward: [(0, '179.900')] [2024-06-15 22:03:46,510][1652491] Updated weights for policy 0, policy_version 860880 (0.0013) [2024-06-15 22:03:47,700][1652491] Updated weights for policy 0, policy_version 860927 (0.0013) [2024-06-15 22:03:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 47764.3). Total num frames: 1763180544. Throughput: 0: 12344.9. Samples: 440893952. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:03:50,956][1648985] Avg episode reward: [(0, '159.420')] [2024-06-15 22:03:52,662][1652491] Updated weights for policy 0, policy_version 860976 (0.0023) [2024-06-15 22:03:52,789][1651469] Signal inference workers to stop experience collection... (44850 times) [2024-06-15 22:03:52,854][1652491] InferenceWorker_p0-w0: stopping experience collection (44850 times) [2024-06-15 22:03:53,023][1651469] Signal inference workers to resume experience collection... (44850 times) [2024-06-15 22:03:53,024][1652491] InferenceWorker_p0-w0: resuming experience collection (44850 times) [2024-06-15 22:03:54,282][1652491] Updated weights for policy 0, policy_version 861049 (0.0056) [2024-06-15 22:03:55,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 49698.3, 300 sec: 47874.6). Total num frames: 1763540992. Throughput: 0: 12071.8. Samples: 440923136. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:03:55,955][1648985] Avg episode reward: [(0, '148.370')] [2024-06-15 22:03:56,077][1652491] Updated weights for policy 0, policy_version 861109 (0.0012) [2024-06-15 22:03:57,695][1652491] Updated weights for policy 0, policy_version 861140 (0.0012) [2024-06-15 22:04:00,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 1763704832. Throughput: 0: 12231.1. Samples: 440996864. 
Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:04:00,956][1648985] Avg episode reward: [(0, '136.220')] [2024-06-15 22:04:02,479][1652491] Updated weights for policy 0, policy_version 861186 (0.0015) [2024-06-15 22:04:04,332][1652491] Updated weights for policy 0, policy_version 861265 (0.0010) [2024-06-15 22:04:05,759][1652491] Updated weights for policy 0, policy_version 861328 (0.0013) [2024-06-15 22:04:05,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 48605.9, 300 sec: 47763.6). Total num frames: 1763999744. Throughput: 0: 12003.5. Samples: 441061888. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:04:05,955][1648985] Avg episode reward: [(0, '156.300')] [2024-06-15 22:04:06,714][1652491] Updated weights for policy 0, policy_version 861373 (0.0012) [2024-06-15 22:04:10,355][1652491] Updated weights for policy 0, policy_version 861424 (0.0013) [2024-06-15 22:04:10,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48605.9, 300 sec: 48318.9). Total num frames: 1764229120. Throughput: 0: 12242.5. Samples: 441100288. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:04:10,956][1648985] Avg episode reward: [(0, '154.160')] [2024-06-15 22:04:13,624][1652491] Updated weights for policy 0, policy_version 861457 (0.0024) [2024-06-15 22:04:15,203][1652491] Updated weights for policy 0, policy_version 861523 (0.0022) [2024-06-15 22:04:15,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 48606.0, 300 sec: 47763.5). Total num frames: 1764458496. Throughput: 0: 12094.6. Samples: 441175552. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:04:15,956][1648985] Avg episode reward: [(0, '151.710')] [2024-06-15 22:04:16,139][1652491] Updated weights for policy 0, policy_version 861568 (0.0012) [2024-06-15 22:04:17,727][1652491] Updated weights for policy 0, policy_version 861628 (0.0033) [2024-06-15 22:04:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 49151.9, 300 sec: 48207.9). Total num frames: 1764687872. 
Throughput: 0: 12004.0. Samples: 441241600. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:04:20,955][1648985] Avg episode reward: [(0, '172.870')] [2024-06-15 22:04:25,139][1652491] Updated weights for policy 0, policy_version 861699 (0.0013) [2024-06-15 22:04:25,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1764818944. Throughput: 0: 12060.5. Samples: 441280000. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:04:25,956][1648985] Avg episode reward: [(0, '165.550')] [2024-06-15 22:04:26,938][1652491] Updated weights for policy 0, policy_version 861780 (0.0012) [2024-06-15 22:04:28,539][1652491] Updated weights for policy 0, policy_version 861840 (0.0098) [2024-06-15 22:04:30,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 49698.2, 300 sec: 47985.7). Total num frames: 1765146624. Throughput: 0: 11662.2. Samples: 441338880. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:04:30,955][1648985] Avg episode reward: [(0, '177.620')] [2024-06-15 22:04:31,881][1652491] Updated weights for policy 0, policy_version 861890 (0.0013) [2024-06-15 22:04:35,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45876.4, 300 sec: 47319.2). Total num frames: 1765277696. Throughput: 0: 11639.5. Samples: 441417728. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:04:35,956][1648985] Avg episode reward: [(0, '171.170')] [2024-06-15 22:04:36,685][1652491] Updated weights for policy 0, policy_version 861953 (0.0013) [2024-06-15 22:04:37,009][1651469] Signal inference workers to stop experience collection... (44900 times) [2024-06-15 22:04:37,056][1652491] InferenceWorker_p0-w0: stopping experience collection (44900 times) [2024-06-15 22:04:37,254][1651469] Signal inference workers to resume experience collection... 
(44900 times) [2024-06-15 22:04:37,255][1652491] InferenceWorker_p0-w0: resuming experience collection (44900 times) [2024-06-15 22:04:37,820][1652491] Updated weights for policy 0, policy_version 862002 (0.0051) [2024-06-15 22:04:39,784][1652491] Updated weights for policy 0, policy_version 862080 (0.0013) [2024-06-15 22:04:40,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 49698.2, 300 sec: 47874.6). Total num frames: 1765638144. Throughput: 0: 11639.5. Samples: 441446912. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:04:40,956][1648985] Avg episode reward: [(0, '163.910')] [2024-06-15 22:04:41,121][1652491] Updated weights for policy 0, policy_version 862140 (0.0025) [2024-06-15 22:04:44,358][1652491] Updated weights for policy 0, policy_version 862201 (0.0135) [2024-06-15 22:04:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1765801984. Throughput: 0: 11628.1. Samples: 441520128. Policy #0 lag: (min: 31.0, avg: 118.0, max: 287.0) [2024-06-15 22:04:45,956][1648985] Avg episode reward: [(0, '147.640')] [2024-06-15 22:04:49,613][1652491] Updated weights for policy 0, policy_version 862306 (0.0135) [2024-06-15 22:04:50,315][1652491] Updated weights for policy 0, policy_version 862334 (0.0027) [2024-06-15 22:04:50,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48605.9, 300 sec: 47652.5). Total num frames: 1766096896. Throughput: 0: 11662.2. Samples: 441586688. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:04:50,955][1648985] Avg episode reward: [(0, '151.870')] [2024-06-15 22:04:51,901][1652491] Updated weights for policy 0, policy_version 862387 (0.0012) [2024-06-15 22:04:55,176][1652491] Updated weights for policy 0, policy_version 862432 (0.0013) [2024-06-15 22:04:55,956][1648985] Fps is (10 sec: 52423.7, 60 sec: 46420.6, 300 sec: 47763.4). Total num frames: 1766326272. Throughput: 0: 11639.2. Samples: 441624064. 
Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:04:55,957][1648985] Avg episode reward: [(0, '166.940')] [2024-06-15 22:04:55,971][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000862464_1766326272.pth... [2024-06-15 22:04:56,057][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000856944_1755021312.pth [2024-06-15 22:04:58,786][1652491] Updated weights for policy 0, policy_version 862496 (0.0013) [2024-06-15 22:05:00,072][1652491] Updated weights for policy 0, policy_version 862548 (0.0013) [2024-06-15 22:05:00,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 1766588416. Throughput: 0: 11628.1. Samples: 441698816. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:05:00,955][1648985] Avg episode reward: [(0, '159.900')] [2024-06-15 22:05:01,715][1652491] Updated weights for policy 0, policy_version 862608 (0.0023) [2024-06-15 22:05:02,880][1652491] Updated weights for policy 0, policy_version 862651 (0.0013) [2024-06-15 22:05:05,955][1648985] Fps is (10 sec: 42603.1, 60 sec: 45875.2, 300 sec: 47652.5). Total num frames: 1766752256. Throughput: 0: 11741.9. Samples: 441769984. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:05:05,955][1648985] Avg episode reward: [(0, '160.500')] [2024-06-15 22:05:06,699][1652491] Updated weights for policy 0, policy_version 862712 (0.0011) [2024-06-15 22:05:08,722][1652491] Updated weights for policy 0, policy_version 862752 (0.0031) [2024-06-15 22:05:10,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 45875.1, 300 sec: 47097.1). Total num frames: 1766981632. Throughput: 0: 11787.4. Samples: 441810432. 
Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:05:10,974][1648985] Avg episode reward: [(0, '144.400')] [2024-06-15 22:05:11,232][1652491] Updated weights for policy 0, policy_version 862800 (0.0032) [2024-06-15 22:05:13,822][1652491] Updated weights for policy 0, policy_version 862884 (0.0028) [2024-06-15 22:05:15,955][1648985] Fps is (10 sec: 49150.6, 60 sec: 46421.2, 300 sec: 47541.3). Total num frames: 1767243776. Throughput: 0: 11969.4. Samples: 441877504. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:05:15,956][1648985] Avg episode reward: [(0, '156.420')] [2024-06-15 22:05:16,191][1652491] Updated weights for policy 0, policy_version 862932 (0.0017) [2024-06-15 22:05:16,442][1651469] Signal inference workers to stop experience collection... (44950 times) [2024-06-15 22:05:16,533][1652491] InferenceWorker_p0-w0: stopping experience collection (44950 times) [2024-06-15 22:05:16,679][1651469] Signal inference workers to resume experience collection... (44950 times) [2024-06-15 22:05:16,679][1652491] InferenceWorker_p0-w0: resuming experience collection (44950 times) [2024-06-15 22:05:16,857][1652491] Updated weights for policy 0, policy_version 862968 (0.0027) [2024-06-15 22:05:18,907][1652491] Updated weights for policy 0, policy_version 863010 (0.0013) [2024-06-15 22:05:20,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 1767505920. Throughput: 0: 12071.8. Samples: 441960960. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:05:20,955][1648985] Avg episode reward: [(0, '163.690')] [2024-06-15 22:05:21,895][1652491] Updated weights for policy 0, policy_version 863056 (0.0013) [2024-06-15 22:05:22,858][1652491] Updated weights for policy 0, policy_version 863104 (0.0014) [2024-06-15 22:05:24,203][1652491] Updated weights for policy 0, policy_version 863162 (0.0017) [2024-06-15 22:05:25,955][1648985] Fps is (10 sec: 55706.5, 60 sec: 49698.1, 300 sec: 47652.5). 
Total num frames: 1767800832. Throughput: 0: 12265.3. Samples: 441998848. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:05:25,955][1648985] Avg episode reward: [(0, '144.180')] [2024-06-15 22:05:26,704][1652491] Updated weights for policy 0, policy_version 863222 (0.0013) [2024-06-15 22:05:28,971][1652491] Updated weights for policy 0, policy_version 863268 (0.0124) [2024-06-15 22:05:30,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1768030208. Throughput: 0: 12231.1. Samples: 442070528. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:05:30,956][1648985] Avg episode reward: [(0, '127.640')] [2024-06-15 22:05:33,232][1652491] Updated weights for policy 0, policy_version 863331 (0.0113) [2024-06-15 22:05:35,006][1652491] Updated weights for policy 0, policy_version 863392 (0.0015) [2024-06-15 22:05:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 50244.3, 300 sec: 47985.7). Total num frames: 1768292352. Throughput: 0: 12288.0. Samples: 442139648. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:05:35,956][1648985] Avg episode reward: [(0, '129.730')] [2024-06-15 22:05:36,585][1652491] Updated weights for policy 0, policy_version 863428 (0.0013) [2024-06-15 22:05:37,885][1652491] Updated weights for policy 0, policy_version 863480 (0.0011) [2024-06-15 22:05:40,764][1652491] Updated weights for policy 0, policy_version 863536 (0.0014) [2024-06-15 22:05:40,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1768521728. Throughput: 0: 12276.9. Samples: 442176512. 
Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:05:40,955][1648985] Avg episode reward: [(0, '143.250')] [2024-06-15 22:05:43,869][1652491] Updated weights for policy 0, policy_version 863584 (0.0028) [2024-06-15 22:05:45,254][1652491] Updated weights for policy 0, policy_version 863632 (0.0016) [2024-06-15 22:05:45,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 49698.2, 300 sec: 47874.6). Total num frames: 1768783872. Throughput: 0: 12242.5. Samples: 442249728. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:05:45,955][1648985] Avg episode reward: [(0, '186.080')] [2024-06-15 22:05:46,210][1652491] Updated weights for policy 0, policy_version 863680 (0.0013) [2024-06-15 22:05:49,244][1652491] Updated weights for policy 0, policy_version 863741 (0.0011) [2024-06-15 22:05:50,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 1768947712. Throughput: 0: 12140.0. Samples: 442316288. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:05:50,956][1648985] Avg episode reward: [(0, '182.750')] [2024-06-15 22:05:51,849][1652491] Updated weights for policy 0, policy_version 863779 (0.0013) [2024-06-15 22:05:55,478][1652491] Updated weights for policy 0, policy_version 863829 (0.0104) [2024-06-15 22:05:55,955][1648985] Fps is (10 sec: 36043.9, 60 sec: 46968.1, 300 sec: 47319.2). Total num frames: 1769144320. Throughput: 0: 12060.4. Samples: 442353152. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:05:55,956][1648985] Avg episode reward: [(0, '169.650')] [2024-06-15 22:05:57,319][1652491] Updated weights for policy 0, policy_version 863904 (0.0016) [2024-06-15 22:05:59,724][1652491] Updated weights for policy 0, policy_version 863968 (0.0012) [2024-06-15 22:06:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 1769472000. Throughput: 0: 11958.1. Samples: 442415616. 
Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:06:00,956][1648985] Avg episode reward: [(0, '144.510')] [2024-06-15 22:06:02,843][1652491] Updated weights for policy 0, policy_version 864016 (0.0011) [2024-06-15 22:06:03,003][1651469] Signal inference workers to stop experience collection... (45000 times) [2024-06-15 22:06:03,058][1652491] InferenceWorker_p0-w0: stopping experience collection (45000 times) [2024-06-15 22:06:03,317][1651469] Signal inference workers to resume experience collection... (45000 times) [2024-06-15 22:06:03,321][1652491] InferenceWorker_p0-w0: resuming experience collection (45000 times) [2024-06-15 22:06:03,984][1652491] Updated weights for policy 0, policy_version 864055 (0.0012) [2024-06-15 22:06:05,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 47513.5, 300 sec: 47097.9). Total num frames: 1769603072. Throughput: 0: 11798.8. Samples: 442491904. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:06:05,956][1648985] Avg episode reward: [(0, '157.750')] [2024-06-15 22:06:07,122][1652491] Updated weights for policy 0, policy_version 864096 (0.0020) [2024-06-15 22:06:08,251][1652491] Updated weights for policy 0, policy_version 864146 (0.0011) [2024-06-15 22:06:09,073][1652491] Updated weights for policy 0, policy_version 864191 (0.0023) [2024-06-15 22:06:10,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 49151.9, 300 sec: 47652.5). Total num frames: 1769930752. Throughput: 0: 11650.8. Samples: 442523136. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:06:10,956][1648985] Avg episode reward: [(0, '160.510')] [2024-06-15 22:06:11,534][1652491] Updated weights for policy 0, policy_version 864254 (0.0013) [2024-06-15 22:06:14,470][1652491] Updated weights for policy 0, policy_version 864292 (0.0016) [2024-06-15 22:06:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.9, 300 sec: 47097.1). Total num frames: 1770127360. Throughput: 0: 11616.7. Samples: 442593280. 
Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:06:15,955][1648985] Avg episode reward: [(0, '160.780')] [2024-06-15 22:06:18,208][1652491] Updated weights for policy 0, policy_version 864356 (0.0014) [2024-06-15 22:06:20,094][1652491] Updated weights for policy 0, policy_version 864441 (0.0013) [2024-06-15 22:06:20,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1770389504. Throughput: 0: 11650.9. Samples: 442663936. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:06:20,955][1648985] Avg episode reward: [(0, '156.960')] [2024-06-15 22:06:22,333][1652491] Updated weights for policy 0, policy_version 864480 (0.0012) [2024-06-15 22:06:22,916][1652491] Updated weights for policy 0, policy_version 864511 (0.0011) [2024-06-15 22:06:25,300][1652491] Updated weights for policy 0, policy_version 864551 (0.0012) [2024-06-15 22:06:25,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 47513.5, 300 sec: 47430.3). Total num frames: 1770651648. Throughput: 0: 11696.3. Samples: 442702848. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:06:25,956][1648985] Avg episode reward: [(0, '148.160')] [2024-06-15 22:06:28,724][1652491] Updated weights for policy 0, policy_version 864597 (0.0019) [2024-06-15 22:06:30,330][1652491] Updated weights for policy 0, policy_version 864662 (0.0122) [2024-06-15 22:06:30,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 1770848256. Throughput: 0: 11696.3. Samples: 442776064. 
Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:06:30,956][1648985] Avg episode reward: [(0, '139.290')] [2024-06-15 22:06:31,417][1652491] Updated weights for policy 0, policy_version 864704 (0.0040) [2024-06-15 22:06:33,520][1652491] Updated weights for policy 0, policy_version 864755 (0.0012) [2024-06-15 22:06:35,143][1652491] Updated weights for policy 0, policy_version 864784 (0.0013) [2024-06-15 22:06:35,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 47513.6, 300 sec: 47542.4). Total num frames: 1771143168. Throughput: 0: 11867.0. Samples: 442850304. Policy #0 lag: (min: 63.0, avg: 134.2, max: 319.0) [2024-06-15 22:06:35,956][1648985] Avg episode reward: [(0, '137.280')] [2024-06-15 22:06:39,440][1652491] Updated weights for policy 0, policy_version 864833 (0.0092) [2024-06-15 22:06:40,508][1652491] Updated weights for policy 0, policy_version 864884 (0.0016) [2024-06-15 22:06:40,960][1648985] Fps is (10 sec: 49126.8, 60 sec: 46963.4, 300 sec: 47540.5). Total num frames: 1771339776. Throughput: 0: 11933.9. Samples: 442890240. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:06:40,961][1648985] Avg episode reward: [(0, '166.350')] [2024-06-15 22:06:41,873][1652491] Updated weights for policy 0, policy_version 864953 (0.0015) [2024-06-15 22:06:44,462][1652491] Updated weights for policy 0, policy_version 865008 (0.0026) [2024-06-15 22:06:45,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.3, 300 sec: 47430.3). Total num frames: 1771569152. Throughput: 0: 11923.9. Samples: 442952192. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:06:45,956][1648985] Avg episode reward: [(0, '182.670')] [2024-06-15 22:06:46,257][1652491] Updated weights for policy 0, policy_version 865025 (0.0012) [2024-06-15 22:06:46,600][1651469] Signal inference workers to stop experience collection... 
(45050 times) [2024-06-15 22:06:46,697][1652491] InferenceWorker_p0-w0: stopping experience collection (45050 times) [2024-06-15 22:06:46,813][1651469] Signal inference workers to resume experience collection... (45050 times) [2024-06-15 22:06:46,814][1652491] InferenceWorker_p0-w0: resuming experience collection (45050 times) [2024-06-15 22:06:50,955][1648985] Fps is (10 sec: 36063.6, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1771700224. Throughput: 0: 11923.9. Samples: 443028480. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:06:50,956][1648985] Avg episode reward: [(0, '177.990')] [2024-06-15 22:06:51,212][1652491] Updated weights for policy 0, policy_version 865105 (0.0013) [2024-06-15 22:06:52,450][1652491] Updated weights for policy 0, policy_version 865168 (0.0012) [2024-06-15 22:06:53,358][1652491] Updated weights for policy 0, policy_version 865215 (0.0014) [2024-06-15 22:06:55,949][1652491] Updated weights for policy 0, policy_version 865267 (0.0013) [2024-06-15 22:06:55,956][1648985] Fps is (10 sec: 49145.3, 60 sec: 48604.9, 300 sec: 47652.2). Total num frames: 1772060672. Throughput: 0: 11980.5. Samples: 443062272. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:06:55,957][1648985] Avg episode reward: [(0, '156.000')] [2024-06-15 22:06:56,192][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000865280_1772093440.pth... [2024-06-15 22:06:56,251][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000859648_1760559104.pth [2024-06-15 22:06:57,634][1652491] Updated weights for policy 0, policy_version 865299 (0.0013) [2024-06-15 22:06:58,798][1652491] Updated weights for policy 0, policy_version 865344 (0.0013) [2024-06-15 22:07:00,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1772224512. Throughput: 0: 12049.1. Samples: 443135488. 
Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:07:00,956][1648985] Avg episode reward: [(0, '168.310')] [2024-06-15 22:07:02,749][1652491] Updated weights for policy 0, policy_version 865413 (0.0013) [2024-06-15 22:07:05,955][1648985] Fps is (10 sec: 42605.0, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 1772486656. Throughput: 0: 12151.5. Samples: 443210752. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:07:05,955][1648985] Avg episode reward: [(0, '166.590')] [2024-06-15 22:07:06,048][1652491] Updated weights for policy 0, policy_version 865473 (0.0083) [2024-06-15 22:07:07,396][1652491] Updated weights for policy 0, policy_version 865534 (0.0011) [2024-06-15 22:07:09,094][1652491] Updated weights for policy 0, policy_version 865592 (0.0021) [2024-06-15 22:07:10,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 46967.5, 300 sec: 47208.1). Total num frames: 1772748800. Throughput: 0: 12015.0. Samples: 443243520. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:07:10,956][1648985] Avg episode reward: [(0, '149.460')] [2024-06-15 22:07:12,489][1652491] Updated weights for policy 0, policy_version 865648 (0.0120) [2024-06-15 22:07:13,828][1652491] Updated weights for policy 0, policy_version 865685 (0.0011) [2024-06-15 22:07:15,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1773010944. Throughput: 0: 11855.7. Samples: 443309568. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:07:15,956][1648985] Avg episode reward: [(0, '138.540')] [2024-06-15 22:07:17,608][1652491] Updated weights for policy 0, policy_version 865744 (0.0014) [2024-06-15 22:07:19,440][1652491] Updated weights for policy 0, policy_version 865793 (0.0012) [2024-06-15 22:07:20,803][1652491] Updated weights for policy 0, policy_version 865846 (0.0011) [2024-06-15 22:07:20,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47513.5, 300 sec: 47430.3). Total num frames: 1773240320. 
Throughput: 0: 11867.0. Samples: 443384320. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:07:20,956][1648985] Avg episode reward: [(0, '134.410')] [2024-06-15 22:07:22,540][1652491] Updated weights for policy 0, policy_version 865890 (0.0013) [2024-06-15 22:07:24,632][1652491] Updated weights for policy 0, policy_version 865953 (0.0030) [2024-06-15 22:07:25,958][1648985] Fps is (10 sec: 52412.4, 60 sec: 48057.3, 300 sec: 47651.9). Total num frames: 1773535232. Throughput: 0: 11799.3. Samples: 443421184. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:07:25,959][1648985] Avg episode reward: [(0, '145.160')] [2024-06-15 22:07:29,183][1652491] Updated weights for policy 0, policy_version 866003 (0.0014) [2024-06-15 22:07:30,326][1651469] Signal inference workers to stop experience collection... (45100 times) [2024-06-15 22:07:30,446][1652491] InferenceWorker_p0-w0: stopping experience collection (45100 times) [2024-06-15 22:07:30,651][1651469] Signal inference workers to resume experience collection... (45100 times) [2024-06-15 22:07:30,652][1652491] InferenceWorker_p0-w0: resuming experience collection (45100 times) [2024-06-15 22:07:30,654][1652491] Updated weights for policy 0, policy_version 866064 (0.0012) [2024-06-15 22:07:30,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 47513.7, 300 sec: 47652.5). Total num frames: 1773699072. Throughput: 0: 12106.0. Samples: 443496960. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:07:30,956][1648985] Avg episode reward: [(0, '155.760')] [2024-06-15 22:07:33,739][1652491] Updated weights for policy 0, policy_version 866128 (0.0046) [2024-06-15 22:07:35,603][1652491] Updated weights for policy 0, policy_version 866208 (0.0013) [2024-06-15 22:07:35,955][1648985] Fps is (10 sec: 45889.5, 60 sec: 47513.6, 300 sec: 47763.5). Total num frames: 1773993984. Throughput: 0: 11730.5. Samples: 443556352. 
Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:07:35,956][1648985] Avg episode reward: [(0, '157.900')] [2024-06-15 22:07:40,686][1652491] Updated weights for policy 0, policy_version 866272 (0.0014) [2024-06-15 22:07:40,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 46425.3, 300 sec: 47430.3). Total num frames: 1774125056. Throughput: 0: 11924.2. Samples: 443598848. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:07:40,956][1648985] Avg episode reward: [(0, '149.100')] [2024-06-15 22:07:41,681][1652491] Updated weights for policy 0, policy_version 866320 (0.0014) [2024-06-15 22:07:43,048][1652491] Updated weights for policy 0, policy_version 866368 (0.0028) [2024-06-15 22:07:45,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1774387200. Throughput: 0: 11832.9. Samples: 443667968. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:07:45,956][1648985] Avg episode reward: [(0, '149.820')] [2024-06-15 22:07:46,426][1652491] Updated weights for policy 0, policy_version 866430 (0.0018) [2024-06-15 22:07:48,011][1652491] Updated weights for policy 0, policy_version 866491 (0.0142) [2024-06-15 22:07:50,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1774583808. Throughput: 0: 11616.7. Samples: 443733504. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:07:50,955][1648985] Avg episode reward: [(0, '156.140')] [2024-06-15 22:07:52,357][1652491] Updated weights for policy 0, policy_version 866533 (0.0127) [2024-06-15 22:07:53,320][1652491] Updated weights for policy 0, policy_version 866576 (0.0129) [2024-06-15 22:07:55,956][1648985] Fps is (10 sec: 45870.5, 60 sec: 46421.6, 300 sec: 47541.2). Total num frames: 1774845952. Throughput: 0: 11605.1. Samples: 443765760. 
Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:07:55,957][1648985] Avg episode reward: [(0, '152.380')] [2024-06-15 22:07:56,795][1652491] Updated weights for policy 0, policy_version 866640 (0.0014) [2024-06-15 22:07:57,762][1652491] Updated weights for policy 0, policy_version 866679 (0.0014) [2024-06-15 22:07:58,657][1652491] Updated weights for policy 0, policy_version 866720 (0.0014) [2024-06-15 22:08:00,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1775108096. Throughput: 0: 11639.5. Samples: 443833344. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:08:00,956][1648985] Avg episode reward: [(0, '163.000')] [2024-06-15 22:08:03,389][1652491] Updated weights for policy 0, policy_version 866770 (0.0014) [2024-06-15 22:08:05,297][1652491] Updated weights for policy 0, policy_version 866853 (0.0116) [2024-06-15 22:08:05,955][1648985] Fps is (10 sec: 52434.3, 60 sec: 48059.6, 300 sec: 47652.5). Total num frames: 1775370240. Throughput: 0: 11616.7. Samples: 443907072. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:08:05,956][1648985] Avg episode reward: [(0, '161.710')] [2024-06-15 22:08:07,538][1652491] Updated weights for policy 0, policy_version 866881 (0.0013) [2024-06-15 22:08:08,743][1652491] Updated weights for policy 0, policy_version 866935 (0.0085) [2024-06-15 22:08:09,448][1651469] Signal inference workers to stop experience collection... (45150 times) [2024-06-15 22:08:09,532][1652491] InferenceWorker_p0-w0: stopping experience collection (45150 times) [2024-06-15 22:08:09,668][1651469] Signal inference workers to resume experience collection... (45150 times) [2024-06-15 22:08:09,668][1652491] InferenceWorker_p0-w0: resuming experience collection (45150 times) [2024-06-15 22:08:10,292][1652491] Updated weights for policy 0, policy_version 867008 (0.0013) [2024-06-15 22:08:10,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47763.5). 
Total num frames: 1775632384. Throughput: 0: 11606.1. Samples: 443943424. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:08:10,956][1648985] Avg episode reward: [(0, '180.470')] [2024-06-15 22:08:15,000][1652491] Updated weights for policy 0, policy_version 867057 (0.0015) [2024-06-15 22:08:15,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 1775828992. Throughput: 0: 11594.0. Samples: 444018688. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:08:15,956][1648985] Avg episode reward: [(0, '167.510')] [2024-06-15 22:08:16,273][1652491] Updated weights for policy 0, policy_version 867121 (0.0015) [2024-06-15 22:08:18,966][1652491] Updated weights for policy 0, policy_version 867168 (0.0013) [2024-06-15 22:08:20,069][1652491] Updated weights for policy 0, policy_version 867201 (0.0013) [2024-06-15 22:08:20,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 47513.5, 300 sec: 47763.5). Total num frames: 1776091136. Throughput: 0: 11867.0. Samples: 444090368. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:08:20,956][1648985] Avg episode reward: [(0, '144.870')] [2024-06-15 22:08:21,339][1652491] Updated weights for policy 0, policy_version 867264 (0.0020) [2024-06-15 22:08:25,695][1652491] Updated weights for policy 0, policy_version 867329 (0.0013) [2024-06-15 22:08:25,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 46423.7, 300 sec: 47985.7). Total num frames: 1776320512. Throughput: 0: 11787.4. Samples: 444129280. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:08:25,956][1648985] Avg episode reward: [(0, '170.910')] [2024-06-15 22:08:26,760][1652491] Updated weights for policy 0, policy_version 867392 (0.0018) [2024-06-15 22:08:30,716][1652491] Updated weights for policy 0, policy_version 867442 (0.0013) [2024-06-15 22:08:30,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46967.4, 300 sec: 47430.5). Total num frames: 1776517120. Throughput: 0: 11844.2. 
Samples: 444200960. Policy #0 lag: (min: 47.0, avg: 134.2, max: 303.0) [2024-06-15 22:08:30,956][1648985] Avg episode reward: [(0, '174.090')] [2024-06-15 22:08:31,528][1652491] Updated weights for policy 0, policy_version 867473 (0.0014) [2024-06-15 22:08:32,422][1652491] Updated weights for policy 0, policy_version 867520 (0.0013) [2024-06-15 22:08:35,622][1652491] Updated weights for policy 0, policy_version 867571 (0.0013) [2024-06-15 22:08:35,956][1648985] Fps is (10 sec: 45876.0, 60 sec: 46421.5, 300 sec: 47874.6). Total num frames: 1776779264. Throughput: 0: 12083.2. Samples: 444277248. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:08:35,956][1648985] Avg episode reward: [(0, '163.300')] [2024-06-15 22:08:37,112][1652491] Updated weights for policy 0, policy_version 867621 (0.0012) [2024-06-15 22:08:40,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 46967.6, 300 sec: 47097.1). Total num frames: 1776943104. Throughput: 0: 12060.7. Samples: 444308480. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:08:40,955][1648985] Avg episode reward: [(0, '158.620')] [2024-06-15 22:08:41,451][1652491] Updated weights for policy 0, policy_version 867685 (0.0012) [2024-06-15 22:08:43,206][1652491] Updated weights for policy 0, policy_version 867772 (0.0013) [2024-06-15 22:08:45,955][1648985] Fps is (10 sec: 45874.1, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 1777238016. Throughput: 0: 12162.8. Samples: 444380672. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:08:45,956][1648985] Avg episode reward: [(0, '169.250')] [2024-06-15 22:08:46,406][1652491] Updated weights for policy 0, policy_version 867813 (0.0014) [2024-06-15 22:08:47,533][1652491] Updated weights for policy 0, policy_version 867843 (0.0016) [2024-06-15 22:08:48,598][1652491] Updated weights for policy 0, policy_version 867898 (0.0014) [2024-06-15 22:08:50,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47208.1). 
Total num frames: 1777467392. Throughput: 0: 12174.2. Samples: 444454912. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:08:50,956][1648985] Avg episode reward: [(0, '159.920')] [2024-06-15 22:08:51,632][1651469] Signal inference workers to stop experience collection... (45200 times) [2024-06-15 22:08:51,725][1652491] InferenceWorker_p0-w0: stopping experience collection (45200 times) [2024-06-15 22:08:51,799][1651469] Signal inference workers to resume experience collection... (45200 times) [2024-06-15 22:08:51,800][1652491] InferenceWorker_p0-w0: resuming experience collection (45200 times) [2024-06-15 22:08:52,235][1652491] Updated weights for policy 0, policy_version 867954 (0.0094) [2024-06-15 22:08:53,579][1652491] Updated weights for policy 0, policy_version 868031 (0.0015) [2024-06-15 22:08:55,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 48060.4, 300 sec: 47541.4). Total num frames: 1777729536. Throughput: 0: 12117.3. Samples: 444488704. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:08:55,956][1648985] Avg episode reward: [(0, '180.610')] [2024-06-15 22:08:56,254][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000868064_1777795072.pth... [2024-06-15 22:08:56,401][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000862464_1766326272.pth [2024-06-15 22:08:56,937][1652491] Updated weights for policy 0, policy_version 868095 (0.0020) [2024-06-15 22:08:59,569][1652491] Updated weights for policy 0, policy_version 868150 (0.0018) [2024-06-15 22:09:00,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1777991680. Throughput: 0: 12094.6. Samples: 444562944. 
Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:09:00,956][1648985] Avg episode reward: [(0, '175.860')] [2024-06-15 22:09:03,146][1652491] Updated weights for policy 0, policy_version 868208 (0.0014) [2024-06-15 22:09:04,597][1652491] Updated weights for policy 0, policy_version 868272 (0.0012) [2024-06-15 22:09:05,012][1652491] Updated weights for policy 0, policy_version 868287 (0.0011) [2024-06-15 22:09:05,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1778253824. Throughput: 0: 12185.6. Samples: 444638720. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:09:05,956][1648985] Avg episode reward: [(0, '199.980')] [2024-06-15 22:09:07,400][1652491] Updated weights for policy 0, policy_version 868337 (0.0104) [2024-06-15 22:09:09,199][1652491] Updated weights for policy 0, policy_version 868384 (0.0013) [2024-06-15 22:09:10,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 1778515968. Throughput: 0: 12094.6. Samples: 444673536. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:09:10,956][1648985] Avg episode reward: [(0, '183.080')] [2024-06-15 22:09:13,221][1652491] Updated weights for policy 0, policy_version 868417 (0.0013) [2024-06-15 22:09:14,537][1652491] Updated weights for policy 0, policy_version 868481 (0.0099) [2024-06-15 22:09:15,651][1652491] Updated weights for policy 0, policy_version 868537 (0.0012) [2024-06-15 22:09:15,962][1648985] Fps is (10 sec: 52390.5, 60 sec: 49146.0, 300 sec: 47762.3). Total num frames: 1778778112. Throughput: 0: 12104.0. Samples: 444745728. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:09:15,963][1648985] Avg episode reward: [(0, '174.480')] [2024-06-15 22:09:18,548][1652491] Updated weights for policy 0, policy_version 868611 (0.0013) [2024-06-15 22:09:20,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 49152.1, 300 sec: 48207.8). Total num frames: 1779040256. 
Throughput: 0: 12083.1. Samples: 444820992. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:09:20,956][1648985] Avg episode reward: [(0, '173.160')] [2024-06-15 22:09:24,170][1652491] Updated weights for policy 0, policy_version 868675 (0.0122) [2024-06-15 22:09:25,933][1652491] Updated weights for policy 0, policy_version 868768 (0.0213) [2024-06-15 22:09:25,955][1648985] Fps is (10 sec: 45908.5, 60 sec: 48605.8, 300 sec: 47763.5). Total num frames: 1779236864. Throughput: 0: 12288.0. Samples: 444861440. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:09:25,956][1648985] Avg episode reward: [(0, '169.360')] [2024-06-15 22:09:28,428][1652491] Updated weights for policy 0, policy_version 868816 (0.0017) [2024-06-15 22:09:30,311][1651469] Signal inference workers to stop experience collection... (45250 times) [2024-06-15 22:09:30,373][1652491] Updated weights for policy 0, policy_version 868899 (0.0016) [2024-06-15 22:09:30,461][1652491] InferenceWorker_p0-w0: stopping experience collection (45250 times) [2024-06-15 22:09:30,556][1651469] Signal inference workers to resume experience collection... (45250 times) [2024-06-15 22:09:30,557][1652491] InferenceWorker_p0-w0: resuming experience collection (45250 times) [2024-06-15 22:09:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 50790.4, 300 sec: 48430.0). Total num frames: 1779564544. Throughput: 0: 12162.9. Samples: 444928000. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:09:30,956][1648985] Avg episode reward: [(0, '154.330')] [2024-06-15 22:09:30,977][1652491] Updated weights for policy 0, policy_version 868928 (0.0012) [2024-06-15 22:09:35,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 47513.4, 300 sec: 47430.3). Total num frames: 1779630080. Throughput: 0: 12242.5. Samples: 445005824. 
Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:09:35,956][1648985] Avg episode reward: [(0, '152.730')] [2024-06-15 22:09:36,097][1652491] Updated weights for policy 0, policy_version 868976 (0.0025) [2024-06-15 22:09:37,623][1652491] Updated weights for policy 0, policy_version 869046 (0.0071) [2024-06-15 22:09:39,840][1652491] Updated weights for policy 0, policy_version 869104 (0.0063) [2024-06-15 22:09:40,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 51336.6, 300 sec: 48207.9). Total num frames: 1780023296. Throughput: 0: 12345.0. Samples: 445044224. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:09:40,955][1648985] Avg episode reward: [(0, '159.610')] [2024-06-15 22:09:41,112][1652491] Updated weights for policy 0, policy_version 869168 (0.0093) [2024-06-15 22:09:45,400][1652491] Updated weights for policy 0, policy_version 869185 (0.0012) [2024-06-15 22:09:45,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48605.9, 300 sec: 47652.4). Total num frames: 1780154368. Throughput: 0: 12538.3. Samples: 445127168. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:09:45,956][1648985] Avg episode reward: [(0, '161.850')] [2024-06-15 22:09:47,014][1652491] Updated weights for policy 0, policy_version 869264 (0.0156) [2024-06-15 22:09:47,735][1652491] Updated weights for policy 0, policy_version 869311 (0.0015) [2024-06-15 22:09:50,281][1652491] Updated weights for policy 0, policy_version 869360 (0.0021) [2024-06-15 22:09:50,956][1648985] Fps is (10 sec: 45871.0, 60 sec: 50243.6, 300 sec: 47985.7). Total num frames: 1780482048. Throughput: 0: 12276.4. Samples: 445191168. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:09:50,956][1648985] Avg episode reward: [(0, '142.920')] [2024-06-15 22:09:51,978][1652491] Updated weights for policy 0, policy_version 869436 (0.0015) [2024-06-15 22:09:55,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 48059.8, 300 sec: 47541.3). Total num frames: 1780613120. 
Throughput: 0: 12288.0. Samples: 445226496. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:09:55,956][1648985] Avg episode reward: [(0, '165.280')] [2024-06-15 22:09:57,809][1652491] Updated weights for policy 0, policy_version 869511 (0.0199) [2024-06-15 22:10:00,955][1648985] Fps is (10 sec: 39324.2, 60 sec: 48059.6, 300 sec: 47874.6). Total num frames: 1780875264. Throughput: 0: 12187.5. Samples: 445294080. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:10:00,956][1648985] Avg episode reward: [(0, '153.450')] [2024-06-15 22:10:01,384][1652491] Updated weights for policy 0, policy_version 869589 (0.0013) [2024-06-15 22:10:03,637][1652491] Updated weights for policy 0, policy_version 869680 (0.0011) [2024-06-15 22:10:05,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1781137408. Throughput: 0: 12014.9. Samples: 445361664. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:10:05,956][1648985] Avg episode reward: [(0, '157.560')] [2024-06-15 22:10:08,546][1652491] Updated weights for policy 0, policy_version 869714 (0.0011) [2024-06-15 22:10:09,983][1652491] Updated weights for policy 0, policy_version 869776 (0.0013) [2024-06-15 22:10:10,890][1652491] Updated weights for policy 0, policy_version 869815 (0.0122) [2024-06-15 22:10:10,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 47513.7, 300 sec: 47874.6). Total num frames: 1781366784. Throughput: 0: 11958.1. Samples: 445399552. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:10:10,955][1648985] Avg episode reward: [(0, '157.570')] [2024-06-15 22:10:12,723][1651469] Signal inference workers to stop experience collection... (45300 times) [2024-06-15 22:10:12,760][1652491] InferenceWorker_p0-w0: stopping experience collection (45300 times) [2024-06-15 22:10:12,973][1651469] Signal inference workers to resume experience collection... 
(45300 times) [2024-06-15 22:10:12,982][1652491] InferenceWorker_p0-w0: resuming experience collection (45300 times) [2024-06-15 22:10:14,138][1652491] Updated weights for policy 0, policy_version 869888 (0.0014) [2024-06-15 22:10:15,611][1652491] Updated weights for policy 0, policy_version 869952 (0.0022) [2024-06-15 22:10:15,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48065.5, 300 sec: 47985.6). Total num frames: 1781661696. Throughput: 0: 11787.3. Samples: 445458432. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:10:15,956][1648985] Avg episode reward: [(0, '172.200')] [2024-06-15 22:10:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 45875.2, 300 sec: 47430.3). Total num frames: 1781792768. Throughput: 0: 11719.1. Samples: 445533184. Policy #0 lag: (min: 61.0, avg: 178.1, max: 303.0) [2024-06-15 22:10:20,956][1648985] Avg episode reward: [(0, '174.770')] [2024-06-15 22:10:21,317][1652491] Updated weights for policy 0, policy_version 870032 (0.0015) [2024-06-15 22:10:24,464][1652491] Updated weights for policy 0, policy_version 870084 (0.0017) [2024-06-15 22:10:25,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1782054912. Throughput: 0: 11593.9. Samples: 445565952. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:10:25,955][1648985] Avg episode reward: [(0, '174.440')] [2024-06-15 22:10:26,394][1652491] Updated weights for policy 0, policy_version 870176 (0.0012) [2024-06-15 22:10:30,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 1782185984. Throughput: 0: 11150.2. Samples: 445628928. 
Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:10:30,956][1648985] Avg episode reward: [(0, '162.700')] [2024-06-15 22:10:31,487][1652491] Updated weights for policy 0, policy_version 870214 (0.0014) [2024-06-15 22:10:33,591][1652491] Updated weights for policy 0, policy_version 870291 (0.0154) [2024-06-15 22:10:35,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 46967.5, 300 sec: 47208.1). Total num frames: 1782448128. Throughput: 0: 11287.0. Samples: 445699072. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:10:35,956][1648985] Avg episode reward: [(0, '163.170')] [2024-06-15 22:10:36,381][1652491] Updated weights for policy 0, policy_version 870337 (0.0016) [2024-06-15 22:10:37,580][1652491] Updated weights for policy 0, policy_version 870400 (0.0023) [2024-06-15 22:10:39,094][1652491] Updated weights for policy 0, policy_version 870453 (0.0013) [2024-06-15 22:10:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 44782.9, 300 sec: 47208.1). Total num frames: 1782710272. Throughput: 0: 11229.9. Samples: 445731840. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:10:40,956][1648985] Avg episode reward: [(0, '182.210')] [2024-06-15 22:10:42,837][1652491] Updated weights for policy 0, policy_version 870497 (0.0019) [2024-06-15 22:10:44,933][1652491] Updated weights for policy 0, policy_version 870590 (0.0159) [2024-06-15 22:10:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 46967.4, 300 sec: 47541.4). Total num frames: 1782972416. Throughput: 0: 11195.8. Samples: 445797888. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:10:45,956][1648985] Avg episode reward: [(0, '173.450')] [2024-06-15 22:10:48,588][1652491] Updated weights for policy 0, policy_version 870640 (0.0012) [2024-06-15 22:10:50,335][1652491] Updated weights for policy 0, policy_version 870704 (0.0016) [2024-06-15 22:10:50,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.9, 300 sec: 47763.6). Total num frames: 1783234560. 
Throughput: 0: 11275.4. Samples: 445869056. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:10:50,956][1648985] Avg episode reward: [(0, '168.950')] [2024-06-15 22:10:53,451][1651469] Signal inference workers to stop experience collection... (45350 times) [2024-06-15 22:10:53,495][1652491] InferenceWorker_p0-w0: stopping experience collection (45350 times) [2024-06-15 22:10:53,498][1652491] Updated weights for policy 0, policy_version 870740 (0.0012) [2024-06-15 22:10:53,633][1651469] Signal inference workers to resume experience collection... (45350 times) [2024-06-15 22:10:53,634][1652491] InferenceWorker_p0-w0: resuming experience collection (45350 times) [2024-06-15 22:10:54,287][1652491] Updated weights for policy 0, policy_version 870784 (0.0013) [2024-06-15 22:10:55,332][1652491] Updated weights for policy 0, policy_version 870840 (0.0018) [2024-06-15 22:10:55,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 47541.3). Total num frames: 1783496704. Throughput: 0: 11411.9. Samples: 445913088. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:10:55,956][1648985] Avg episode reward: [(0, '153.100')] [2024-06-15 22:10:55,976][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000870848_1783496704.pth... [2024-06-15 22:10:56,037][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000865280_1772093440.pth [2024-06-15 22:10:56,040][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000870848_1783496704.pth [2024-06-15 22:10:58,666][1652491] Updated weights for policy 0, policy_version 870886 (0.0098) [2024-06-15 22:10:59,667][1652491] Updated weights for policy 0, policy_version 870931 (0.0029) [2024-06-15 22:11:00,527][1652491] Updated weights for policy 0, policy_version 870973 (0.0014) [2024-06-15 22:11:00,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1783758848. 
Throughput: 0: 11753.3. Samples: 445987328. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:11:00,956][1648985] Avg episode reward: [(0, '155.650')] [2024-06-15 22:11:04,257][1652491] Updated weights for policy 0, policy_version 871015 (0.0015) [2024-06-15 22:11:05,796][1652491] Updated weights for policy 0, policy_version 871102 (0.0013) [2024-06-15 22:11:05,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 1784020992. Throughput: 0: 11844.3. Samples: 446066176. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:11:05,956][1648985] Avg episode reward: [(0, '164.120')] [2024-06-15 22:11:09,765][1652491] Updated weights for policy 0, policy_version 871184 (0.0014) [2024-06-15 22:11:10,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48605.7, 300 sec: 47985.6). Total num frames: 1784283136. Throughput: 0: 12083.1. Samples: 446109696. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:11:10,956][1648985] Avg episode reward: [(0, '165.300')] [2024-06-15 22:11:14,506][1652491] Updated weights for policy 0, policy_version 871249 (0.0014) [2024-06-15 22:11:15,643][1652491] Updated weights for policy 0, policy_version 871294 (0.0010) [2024-06-15 22:11:15,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 45875.4, 300 sec: 47541.4). Total num frames: 1784414208. Throughput: 0: 12151.5. Samples: 446175744. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:11:15,955][1648985] Avg episode reward: [(0, '182.690')] [2024-06-15 22:11:16,932][1652491] Updated weights for policy 0, policy_version 871357 (0.0011) [2024-06-15 22:11:20,606][1652491] Updated weights for policy 0, policy_version 871428 (0.0015) [2024-06-15 22:11:20,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 48605.8, 300 sec: 47652.5). Total num frames: 1784709120. Throughput: 0: 12049.0. Samples: 446241280. 
Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:11:20,956][1648985] Avg episode reward: [(0, '196.670')] [2024-06-15 22:11:21,727][1652491] Updated weights for policy 0, policy_version 871488 (0.0013) [2024-06-15 22:11:25,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 45875.3, 300 sec: 47319.3). Total num frames: 1784807424. Throughput: 0: 12185.7. Samples: 446280192. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:11:25,955][1648985] Avg episode reward: [(0, '191.750')] [2024-06-15 22:11:27,456][1652491] Updated weights for policy 0, policy_version 871554 (0.0147) [2024-06-15 22:11:28,378][1652491] Updated weights for policy 0, policy_version 871609 (0.0013) [2024-06-15 22:11:30,024][1652491] Updated weights for policy 0, policy_version 871664 (0.0012) [2024-06-15 22:11:30,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 50244.3, 300 sec: 47652.5). Total num frames: 1785200640. Throughput: 0: 12288.0. Samples: 446350848. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:11:30,955][1648985] Avg episode reward: [(0, '181.440')] [2024-06-15 22:11:31,137][1651469] Signal inference workers to stop experience collection... (45400 times) [2024-06-15 22:11:31,214][1652491] InferenceWorker_p0-w0: stopping experience collection (45400 times) [2024-06-15 22:11:31,396][1651469] Signal inference workers to resume experience collection... (45400 times) [2024-06-15 22:11:31,397][1652491] InferenceWorker_p0-w0: resuming experience collection (45400 times) [2024-06-15 22:11:31,733][1652491] Updated weights for policy 0, policy_version 871712 (0.0140) [2024-06-15 22:11:35,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 48059.7, 300 sec: 47431.1). Total num frames: 1785331712. Throughput: 0: 12413.1. Samples: 446427648. 
Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:11:35,956][1648985] Avg episode reward: [(0, '181.220')] [2024-06-15 22:11:37,120][1652491] Updated weights for policy 0, policy_version 871760 (0.0014) [2024-06-15 22:11:38,678][1652491] Updated weights for policy 0, policy_version 871824 (0.0041) [2024-06-15 22:11:40,440][1652491] Updated weights for policy 0, policy_version 871889 (0.0014) [2024-06-15 22:11:40,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 47763.5). Total num frames: 1785659392. Throughput: 0: 12151.5. Samples: 446459904. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:11:40,956][1648985] Avg episode reward: [(0, '167.480')] [2024-06-15 22:11:41,397][1652491] Updated weights for policy 0, policy_version 871932 (0.0021) [2024-06-15 22:11:43,257][1652491] Updated weights for policy 0, policy_version 871984 (0.0013) [2024-06-15 22:11:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1785856000. Throughput: 0: 12071.8. Samples: 446530560. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:11:45,956][1648985] Avg episode reward: [(0, '166.600')] [2024-06-15 22:11:48,588][1652491] Updated weights for policy 0, policy_version 872037 (0.0049) [2024-06-15 22:11:49,805][1652491] Updated weights for policy 0, policy_version 872096 (0.0013) [2024-06-15 22:11:50,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 47652.7). Total num frames: 1786118144. Throughput: 0: 12015.0. Samples: 446606848. 
Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:11:50,956][1648985] Avg episode reward: [(0, '149.140')] [2024-06-15 22:11:51,693][1652491] Updated weights for policy 0, policy_version 872146 (0.0012) [2024-06-15 22:11:53,595][1652491] Updated weights for policy 0, policy_version 872195 (0.0012) [2024-06-15 22:11:54,752][1652491] Updated weights for policy 0, policy_version 872256 (0.0015) [2024-06-15 22:11:55,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1786380288. Throughput: 0: 11741.9. Samples: 446638080. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:11:55,956][1648985] Avg episode reward: [(0, '159.250')] [2024-06-15 22:12:00,677][1652491] Updated weights for policy 0, policy_version 872352 (0.0096) [2024-06-15 22:12:00,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 1786576896. Throughput: 0: 11958.0. Samples: 446713856. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:12:00,955][1648985] Avg episode reward: [(0, '161.400')] [2024-06-15 22:12:02,400][1652491] Updated weights for policy 0, policy_version 872387 (0.0044) [2024-06-15 22:12:03,380][1652491] Updated weights for policy 0, policy_version 872436 (0.0012) [2024-06-15 22:12:04,586][1652491] Updated weights for policy 0, policy_version 872480 (0.0020) [2024-06-15 22:12:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1786904576. Throughput: 0: 12151.5. Samples: 446788096. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:12:05,955][1648985] Avg episode reward: [(0, '149.670')] [2024-06-15 22:12:09,984][1652491] Updated weights for policy 0, policy_version 872531 (0.0011) [2024-06-15 22:12:10,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45329.1, 300 sec: 47430.3). Total num frames: 1787002880. Throughput: 0: 12196.9. Samples: 446829056. 
Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:12:10,956][1648985] Avg episode reward: [(0, '160.580')] [2024-06-15 22:12:11,568][1652491] Updated weights for policy 0, policy_version 872593 (0.0122) [2024-06-15 22:12:13,199][1652491] Updated weights for policy 0, policy_version 872643 (0.0013) [2024-06-15 22:12:13,561][1651469] Signal inference workers to stop experience collection... (45450 times) [2024-06-15 22:12:13,591][1652491] InferenceWorker_p0-w0: stopping experience collection (45450 times) [2024-06-15 22:12:13,725][1651469] Signal inference workers to resume experience collection... (45450 times) [2024-06-15 22:12:13,726][1652491] InferenceWorker_p0-w0: resuming experience collection (45450 times) [2024-06-15 22:12:14,313][1652491] Updated weights for policy 0, policy_version 872703 (0.0011) [2024-06-15 22:12:15,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 1787396096. Throughput: 0: 12003.5. Samples: 446891008. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:12:15,956][1648985] Avg episode reward: [(0, '159.660')] [2024-06-15 22:12:16,016][1652491] Updated weights for policy 0, policy_version 872766 (0.0013) [2024-06-15 22:12:20,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45328.9, 300 sec: 47097.5). Total num frames: 1787428864. Throughput: 0: 12060.4. Samples: 446970368. Policy #0 lag: (min: 0.0, avg: 105.0, max: 256.0) [2024-06-15 22:12:20,956][1648985] Avg episode reward: [(0, '160.960')] [2024-06-15 22:12:22,081][1652491] Updated weights for policy 0, policy_version 872822 (0.0013) [2024-06-15 22:12:23,490][1652491] Updated weights for policy 0, policy_version 872887 (0.0016) [2024-06-15 22:12:24,535][1652491] Updated weights for policy 0, policy_version 872928 (0.0013) [2024-06-15 22:12:25,776][1652491] Updated weights for policy 0, policy_version 872983 (0.0013) [2024-06-15 22:12:25,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 51336.2, 300 sec: 48096.7). 
Total num frames: 1787887616. Throughput: 0: 12014.9. Samples: 447000576. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:12:25,956][1648985] Avg episode reward: [(0, '158.050')] [2024-06-15 22:12:26,476][1652491] Updated weights for policy 0, policy_version 873024 (0.0012) [2024-06-15 22:12:30,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 1787953152. Throughput: 0: 12174.2. Samples: 447078400. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:12:30,956][1648985] Avg episode reward: [(0, '169.760')] [2024-06-15 22:12:33,292][1652491] Updated weights for policy 0, policy_version 873106 (0.0103) [2024-06-15 22:12:34,867][1652491] Updated weights for policy 0, policy_version 873155 (0.0029) [2024-06-15 22:12:35,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 49698.0, 300 sec: 48096.7). Total num frames: 1788313600. Throughput: 0: 11810.1. Samples: 447138304. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:12:35,956][1648985] Avg episode reward: [(0, '173.260')] [2024-06-15 22:12:36,148][1652491] Updated weights for policy 0, policy_version 873211 (0.0064) [2024-06-15 22:12:37,565][1652491] Updated weights for policy 0, policy_version 873268 (0.0091) [2024-06-15 22:12:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46967.4, 300 sec: 47763.5). Total num frames: 1788477440. Throughput: 0: 11923.9. Samples: 447174656. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:12:40,956][1648985] Avg episode reward: [(0, '170.130')] [2024-06-15 22:12:43,605][1652491] Updated weights for policy 0, policy_version 873315 (0.0014) [2024-06-15 22:12:45,756][1652491] Updated weights for policy 0, policy_version 873402 (0.0047) [2024-06-15 22:12:45,956][1648985] Fps is (10 sec: 42598.4, 60 sec: 48059.6, 300 sec: 47985.6). Total num frames: 1788739584. Throughput: 0: 11969.4. Samples: 447252480. 
Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:12:45,957][1648985] Avg episode reward: [(0, '184.050')] [2024-06-15 22:12:47,123][1652491] Updated weights for policy 0, policy_version 873463 (0.0089) [2024-06-15 22:12:48,398][1652491] Updated weights for policy 0, policy_version 873494 (0.0014) [2024-06-15 22:12:50,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47985.8). Total num frames: 1789001728. Throughput: 0: 11764.6. Samples: 447317504. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:12:50,956][1648985] Avg episode reward: [(0, '177.830')] [2024-06-15 22:12:54,728][1652491] Updated weights for policy 0, policy_version 873568 (0.0168) [2024-06-15 22:12:55,576][1651469] Signal inference workers to stop experience collection... (45500 times) [2024-06-15 22:12:55,628][1652491] InferenceWorker_p0-w0: stopping experience collection (45500 times) [2024-06-15 22:12:55,814][1651469] Signal inference workers to resume experience collection... (45500 times) [2024-06-15 22:12:55,815][1652491] InferenceWorker_p0-w0: resuming experience collection (45500 times) [2024-06-15 22:12:55,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 1789165568. Throughput: 0: 11844.3. Samples: 447362048. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:12:55,956][1648985] Avg episode reward: [(0, '183.800')] [2024-06-15 22:12:56,213][1652491] Updated weights for policy 0, policy_version 873632 (0.0101) [2024-06-15 22:12:56,585][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000873648_1789231104.pth... 
[2024-06-15 22:12:56,748][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000868064_1777795072.pth [2024-06-15 22:12:58,725][1652491] Updated weights for policy 0, policy_version 873728 (0.0107) [2024-06-15 22:13:00,043][1652491] Updated weights for policy 0, policy_version 873788 (0.0014) [2024-06-15 22:13:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 1789526016. Throughput: 0: 11605.3. Samples: 447413248. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:13:00,956][1648985] Avg episode reward: [(0, '168.800')] [2024-06-15 22:13:05,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 47208.1). Total num frames: 1789558784. Throughput: 0: 11764.7. Samples: 447499776. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:13:05,956][1648985] Avg episode reward: [(0, '175.370')] [2024-06-15 22:13:06,646][1652491] Updated weights for policy 0, policy_version 873840 (0.0020) [2024-06-15 22:13:07,888][1652491] Updated weights for policy 0, policy_version 873889 (0.0012) [2024-06-15 22:13:09,445][1652491] Updated weights for policy 0, policy_version 873953 (0.0014) [2024-06-15 22:13:10,938][1652491] Updated weights for policy 0, policy_version 874022 (0.0054) [2024-06-15 22:13:10,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 49698.2, 300 sec: 47985.7). Total num frames: 1789984768. Throughput: 0: 11741.9. Samples: 447528960. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:13:10,956][1648985] Avg episode reward: [(0, '160.790')] [2024-06-15 22:13:15,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 44236.8, 300 sec: 47319.2). Total num frames: 1790050304. Throughput: 0: 11707.7. Samples: 447605248. 
Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:13:15,956][1648985] Avg episode reward: [(0, '168.800')] [2024-06-15 22:13:17,978][1652491] Updated weights for policy 0, policy_version 874096 (0.0014) [2024-06-15 22:13:19,121][1652491] Updated weights for policy 0, policy_version 874146 (0.0101) [2024-06-15 22:13:20,594][1652491] Updated weights for policy 0, policy_version 874224 (0.0153) [2024-06-15 22:13:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 50244.5, 300 sec: 47874.6). Total num frames: 1790443520. Throughput: 0: 11787.4. Samples: 447668736. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:13:20,956][1648985] Avg episode reward: [(0, '168.710')] [2024-06-15 22:13:22,490][1652491] Updated weights for policy 0, policy_version 874304 (0.0125) [2024-06-15 22:13:25,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 44783.0, 300 sec: 47652.4). Total num frames: 1790574592. Throughput: 0: 11696.3. Samples: 447700992. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:13:25,956][1648985] Avg episode reward: [(0, '182.530')] [2024-06-15 22:13:29,791][1652491] Updated weights for policy 0, policy_version 874363 (0.0027) [2024-06-15 22:13:30,955][1648985] Fps is (10 sec: 36044.1, 60 sec: 47513.5, 300 sec: 47541.3). Total num frames: 1790803968. Throughput: 0: 11628.1. Samples: 447775744. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:13:30,956][1648985] Avg episode reward: [(0, '176.140')] [2024-06-15 22:13:31,058][1652491] Updated weights for policy 0, policy_version 874417 (0.0012) [2024-06-15 22:13:31,703][1651469] Signal inference workers to stop experience collection... (45550 times) [2024-06-15 22:13:31,786][1652491] InferenceWorker_p0-w0: stopping experience collection (45550 times) [2024-06-15 22:13:31,888][1651469] Signal inference workers to resume experience collection... 
(45550 times) [2024-06-15 22:13:31,889][1652491] InferenceWorker_p0-w0: resuming experience collection (45550 times) [2024-06-15 22:13:32,459][1652491] Updated weights for policy 0, policy_version 874484 (0.0112) [2024-06-15 22:13:34,267][1652491] Updated weights for policy 0, policy_version 874559 (0.0012) [2024-06-15 22:13:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46421.4, 300 sec: 47985.7). Total num frames: 1791098880. Throughput: 0: 11582.6. Samples: 447838720. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:13:35,956][1648985] Avg episode reward: [(0, '175.130')] [2024-06-15 22:13:40,955][1648985] Fps is (10 sec: 36045.4, 60 sec: 44782.9, 300 sec: 47208.1). Total num frames: 1791164416. Throughput: 0: 11491.6. Samples: 447879168. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:13:40,956][1648985] Avg episode reward: [(0, '180.800')] [2024-06-15 22:13:41,537][1652491] Updated weights for policy 0, policy_version 874619 (0.0096) [2024-06-15 22:13:42,705][1652491] Updated weights for policy 0, policy_version 874672 (0.0013) [2024-06-15 22:13:45,117][1652491] Updated weights for policy 0, policy_version 874754 (0.0104) [2024-06-15 22:13:45,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 46967.6, 300 sec: 47763.5). Total num frames: 1791557632. Throughput: 0: 11594.0. Samples: 447934976. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:13:45,955][1648985] Avg episode reward: [(0, '168.870')] [2024-06-15 22:13:46,487][1652491] Updated weights for policy 0, policy_version 874813 (0.0014) [2024-06-15 22:13:50,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 1791623168. Throughput: 0: 11264.0. Samples: 448006656. 
Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:13:50,956][1648985] Avg episode reward: [(0, '176.410')] [2024-06-15 22:13:53,137][1652491] Updated weights for policy 0, policy_version 874864 (0.0013) [2024-06-15 22:13:55,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 1791918080. Throughput: 0: 11468.8. Samples: 448045056. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:13:55,956][1648985] Avg episode reward: [(0, '174.990')] [2024-06-15 22:13:55,995][1652491] Updated weights for policy 0, policy_version 874976 (0.0018) [2024-06-15 22:13:57,912][1652491] Updated weights for policy 0, policy_version 875042 (0.0012) [2024-06-15 22:14:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 47097.1). Total num frames: 1792147456. Throughput: 0: 10968.2. Samples: 448098816. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:14:00,956][1648985] Avg episode reward: [(0, '175.590')] [2024-06-15 22:14:04,719][1652491] Updated weights for policy 0, policy_version 875089 (0.0042) [2024-06-15 22:14:05,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 45329.1, 300 sec: 46652.8). Total num frames: 1792278528. Throughput: 0: 11173.0. Samples: 448171520. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:14:05,956][1648985] Avg episode reward: [(0, '148.710')] [2024-06-15 22:14:06,469][1652491] Updated weights for policy 0, policy_version 875155 (0.0013) [2024-06-15 22:14:08,591][1652491] Updated weights for policy 0, policy_version 875236 (0.0014) [2024-06-15 22:14:10,122][1651469] Signal inference workers to stop experience collection... (45600 times) [2024-06-15 22:14:10,168][1652491] InferenceWorker_p0-w0: stopping experience collection (45600 times) [2024-06-15 22:14:10,366][1651469] Signal inference workers to resume experience collection... 
(45600 times) [2024-06-15 22:14:10,367][1652491] InferenceWorker_p0-w0: resuming experience collection (45600 times) [2024-06-15 22:14:10,492][1652491] Updated weights for policy 0, policy_version 875321 (0.0014) [2024-06-15 22:14:10,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 44782.7, 300 sec: 47098.2). Total num frames: 1792671744. Throughput: 0: 10956.8. Samples: 448194048. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:14:10,956][1648985] Avg episode reward: [(0, '143.130')] [2024-06-15 22:14:15,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1792671744. Throughput: 0: 11013.7. Samples: 448271360. Policy #0 lag: (min: 4.0, avg: 70.8, max: 260.0) [2024-06-15 22:14:15,955][1648985] Avg episode reward: [(0, '150.610')] [2024-06-15 22:14:17,579][1652491] Updated weights for policy 0, policy_version 875376 (0.0012) [2024-06-15 22:14:18,958][1652491] Updated weights for policy 0, policy_version 875425 (0.0013) [2024-06-15 22:14:20,632][1652491] Updated weights for policy 0, policy_version 875504 (0.0012) [2024-06-15 22:14:20,958][1648985] Fps is (10 sec: 39310.1, 60 sec: 43688.3, 300 sec: 46874.4). Total num frames: 1793064960. Throughput: 0: 10944.7. Samples: 448331264. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:14:20,959][1648985] Avg episode reward: [(0, '159.280')] [2024-06-15 22:14:22,140][1652491] Updated weights for policy 0, policy_version 875578 (0.0014) [2024-06-15 22:14:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1793196032. Throughput: 0: 10854.4. Samples: 448367616. 
Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:14:25,956][1648985] Avg episode reward: [(0, '165.060')] [2024-06-15 22:14:29,001][1652491] Updated weights for policy 0, policy_version 875648 (0.0013) [2024-06-15 22:14:30,785][1652491] Updated weights for policy 0, policy_version 875728 (0.0016) [2024-06-15 22:14:30,955][1648985] Fps is (10 sec: 42611.9, 60 sec: 44783.1, 300 sec: 46986.0). Total num frames: 1793490944. Throughput: 0: 11207.1. Samples: 448439296. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:14:30,956][1648985] Avg episode reward: [(0, '159.450')] [2024-06-15 22:14:32,043][1652491] Updated weights for policy 0, policy_version 875792 (0.0013) [2024-06-15 22:14:32,994][1652491] Updated weights for policy 0, policy_version 875837 (0.0013) [2024-06-15 22:14:35,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 43690.8, 300 sec: 46430.6). Total num frames: 1793720320. Throughput: 0: 11184.4. Samples: 448509952. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:14:35,955][1648985] Avg episode reward: [(0, '155.430')] [2024-06-15 22:14:39,855][1652491] Updated weights for policy 0, policy_version 875899 (0.0124) [2024-06-15 22:14:40,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1793916928. Throughput: 0: 11275.3. Samples: 448552448. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:14:40,956][1648985] Avg episode reward: [(0, '153.210')] [2024-06-15 22:14:41,460][1652491] Updated weights for policy 0, policy_version 875954 (0.0012) [2024-06-15 22:14:43,083][1652491] Updated weights for policy 0, policy_version 876032 (0.0012) [2024-06-15 22:14:44,253][1652491] Updated weights for policy 0, policy_version 876090 (0.0019) [2024-06-15 22:14:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 44782.9, 300 sec: 46652.9). Total num frames: 1794244608. Throughput: 0: 11366.4. Samples: 448610304. 
Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:14:45,956][1648985] Avg episode reward: [(0, '163.750')] [2024-06-15 22:14:50,478][1652491] Updated weights for policy 0, policy_version 876130 (0.0012) [2024-06-15 22:14:50,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 45328.9, 300 sec: 46541.6). Total num frames: 1794342912. Throughput: 0: 11468.7. Samples: 448687616. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:14:50,956][1648985] Avg episode reward: [(0, '175.000')] [2024-06-15 22:14:51,277][1651469] Signal inference workers to stop experience collection... (45650 times) [2024-06-15 22:14:51,321][1652491] InferenceWorker_p0-w0: stopping experience collection (45650 times) [2024-06-15 22:14:51,563][1651469] Signal inference workers to resume experience collection... (45650 times) [2024-06-15 22:14:51,564][1652491] InferenceWorker_p0-w0: resuming experience collection (45650 times) [2024-06-15 22:14:52,046][1652491] Updated weights for policy 0, policy_version 876192 (0.0013) [2024-06-15 22:14:53,737][1652491] Updated weights for policy 0, policy_version 876264 (0.0012) [2024-06-15 22:14:55,092][1652491] Updated weights for policy 0, policy_version 876326 (0.0039) [2024-06-15 22:14:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 1794768896. Throughput: 0: 11559.9. Samples: 448714240. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:14:55,956][1648985] Avg episode reward: [(0, '180.620')] [2024-06-15 22:14:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000876352_1794768896.pth... [2024-06-15 22:14:56,008][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000870848_1783496704.pth [2024-06-15 22:15:00,955][1648985] Fps is (10 sec: 42599.2, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1794768896. Throughput: 0: 11582.6. Samples: 448792576. 
Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:15:00,956][1648985] Avg episode reward: [(0, '170.350')] [2024-06-15 22:15:01,652][1652491] Updated weights for policy 0, policy_version 876392 (0.0013) [2024-06-15 22:15:02,623][1652491] Updated weights for policy 0, policy_version 876432 (0.0018) [2024-06-15 22:15:03,844][1652491] Updated weights for policy 0, policy_version 876475 (0.0013) [2024-06-15 22:15:05,481][1652491] Updated weights for policy 0, policy_version 876530 (0.0042) [2024-06-15 22:15:05,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 48059.7, 300 sec: 46763.8). Total num frames: 1795162112. Throughput: 0: 11549.2. Samples: 448850944. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:15:05,956][1648985] Avg episode reward: [(0, '156.000')] [2024-06-15 22:15:06,786][1652491] Updated weights for policy 0, policy_version 876603 (0.0072) [2024-06-15 22:15:10,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 43690.9, 300 sec: 46208.5). Total num frames: 1795293184. Throughput: 0: 11605.3. Samples: 448889856. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:15:10,956][1648985] Avg episode reward: [(0, '169.300')] [2024-06-15 22:15:12,944][1652491] Updated weights for policy 0, policy_version 876647 (0.0150) [2024-06-15 22:15:14,004][1652491] Updated weights for policy 0, policy_version 876688 (0.0014) [2024-06-15 22:15:15,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 49151.8, 300 sec: 46874.9). Total num frames: 1795620864. Throughput: 0: 11605.3. Samples: 448961536. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:15:15,956][1648985] Avg episode reward: [(0, '159.500')] [2024-06-15 22:15:16,290][1652491] Updated weights for policy 0, policy_version 876784 (0.0078) [2024-06-15 22:15:18,234][1652491] Updated weights for policy 0, policy_version 876856 (0.0013) [2024-06-15 22:15:20,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 45877.5, 300 sec: 46652.7). Total num frames: 1795817472. 
Throughput: 0: 11514.3. Samples: 449028096. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:15:20,956][1648985] Avg episode reward: [(0, '160.010')] [2024-06-15 22:15:24,746][1652491] Updated weights for policy 0, policy_version 876912 (0.0028) [2024-06-15 22:15:25,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 1796014080. Throughput: 0: 11514.3. Samples: 449070592. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:15:25,956][1648985] Avg episode reward: [(0, '154.130')] [2024-06-15 22:15:26,395][1652491] Updated weights for policy 0, policy_version 876977 (0.0297) [2024-06-15 22:15:28,020][1652491] Updated weights for policy 0, policy_version 877056 (0.0132) [2024-06-15 22:15:28,827][1651469] Signal inference workers to stop experience collection... (45700 times) [2024-06-15 22:15:28,871][1652491] InferenceWorker_p0-w0: stopping experience collection (45700 times) [2024-06-15 22:15:29,067][1651469] Signal inference workers to resume experience collection... (45700 times) [2024-06-15 22:15:29,067][1652491] InferenceWorker_p0-w0: resuming experience collection (45700 times) [2024-06-15 22:15:30,062][1652491] Updated weights for policy 0, policy_version 877111 (0.0012) [2024-06-15 22:15:30,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 47513.6, 300 sec: 47097.0). Total num frames: 1796341760. Throughput: 0: 11400.5. Samples: 449123328. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:15:30,956][1648985] Avg episode reward: [(0, '156.630')] [2024-06-15 22:15:35,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1796440064. Throughput: 0: 11503.0. Samples: 449205248. 
Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:15:35,956][1648985] Avg episode reward: [(0, '154.990')] [2024-06-15 22:15:36,019][1652491] Updated weights for policy 0, policy_version 877178 (0.0014) [2024-06-15 22:15:37,456][1652491] Updated weights for policy 0, policy_version 877220 (0.0037) [2024-06-15 22:15:39,176][1652491] Updated weights for policy 0, policy_version 877310 (0.0014) [2024-06-15 22:15:40,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 48606.0, 300 sec: 46986.0). Total num frames: 1796833280. Throughput: 0: 11616.7. Samples: 449236992. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:15:40,956][1648985] Avg episode reward: [(0, '139.780')] [2024-06-15 22:15:40,989][1652491] Updated weights for policy 0, policy_version 877370 (0.0026) [2024-06-15 22:15:45,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 1796898816. Throughput: 0: 11673.6. Samples: 449317888. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:15:45,955][1648985] Avg episode reward: [(0, '136.310')] [2024-06-15 22:15:47,530][1652491] Updated weights for policy 0, policy_version 877441 (0.0123) [2024-06-15 22:15:49,386][1652491] Updated weights for policy 0, policy_version 877536 (0.0125) [2024-06-15 22:15:50,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48606.0, 300 sec: 46652.8). Total num frames: 1797259264. Throughput: 0: 11741.9. Samples: 449379328. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:15:50,956][1648985] Avg episode reward: [(0, '131.260')] [2024-06-15 22:15:51,337][1652491] Updated weights for policy 0, policy_version 877600 (0.0013) [2024-06-15 22:15:55,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1797390336. Throughput: 0: 11764.6. Samples: 449419264. 
Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:15:55,956][1648985] Avg episode reward: [(0, '168.150')] [2024-06-15 22:15:57,335][1652491] Updated weights for policy 0, policy_version 877669 (0.0014) [2024-06-15 22:15:58,398][1652491] Updated weights for policy 0, policy_version 877712 (0.0012) [2024-06-15 22:16:00,452][1652491] Updated weights for policy 0, policy_version 877792 (0.0129) [2024-06-15 22:16:00,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 49698.1, 300 sec: 46541.7). Total num frames: 1797750784. Throughput: 0: 11696.4. Samples: 449487872. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:16:00,956][1648985] Avg episode reward: [(0, '177.830')] [2024-06-15 22:16:02,875][1652491] Updated weights for policy 0, policy_version 877856 (0.0016) [2024-06-15 22:16:05,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 45875.0, 300 sec: 46208.4). Total num frames: 1797914624. Throughput: 0: 11719.1. Samples: 449555456. Policy #0 lag: (min: 1.0, avg: 56.5, max: 257.0) [2024-06-15 22:16:05,956][1648985] Avg episode reward: [(0, '173.770')] [2024-06-15 22:16:08,851][1652491] Updated weights for policy 0, policy_version 877928 (0.0013) [2024-06-15 22:16:09,811][1652491] Updated weights for policy 0, policy_version 877959 (0.0014) [2024-06-15 22:16:10,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 1798144000. Throughput: 0: 11628.1. Samples: 449593856. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:16:10,956][1648985] Avg episode reward: [(0, '186.940')] [2024-06-15 22:16:11,381][1651469] Signal inference workers to stop experience collection... (45750 times) [2024-06-15 22:16:11,422][1652491] InferenceWorker_p0-w0: stopping experience collection (45750 times) [2024-06-15 22:16:11,613][1651469] Signal inference workers to resume experience collection... 
(45750 times) [2024-06-15 22:16:11,613][1652491] InferenceWorker_p0-w0: resuming experience collection (45750 times) [2024-06-15 22:16:11,616][1652491] Updated weights for policy 0, policy_version 878032 (0.0089) [2024-06-15 22:16:12,623][1652491] Updated weights for policy 0, policy_version 878077 (0.0011) [2024-06-15 22:16:14,072][1652491] Updated weights for policy 0, policy_version 878136 (0.0017) [2024-06-15 22:16:15,955][1648985] Fps is (10 sec: 52430.3, 60 sec: 46967.6, 300 sec: 46541.7). Total num frames: 1798438912. Throughput: 0: 11958.1. Samples: 449661440. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:16:15,956][1648985] Avg episode reward: [(0, '172.090')] [2024-06-15 22:16:19,132][1652491] Updated weights for policy 0, policy_version 878165 (0.0015) [2024-06-15 22:16:20,579][1652491] Updated weights for policy 0, policy_version 878212 (0.0014) [2024-06-15 22:16:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46421.5, 300 sec: 46763.8). Total num frames: 1798602752. Throughput: 0: 11912.5. Samples: 449741312. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:16:20,956][1648985] Avg episode reward: [(0, '166.400')] [2024-06-15 22:16:22,047][1652491] Updated weights for policy 0, policy_version 878266 (0.0011) [2024-06-15 22:16:23,580][1652491] Updated weights for policy 0, policy_version 878336 (0.0189) [2024-06-15 22:16:25,033][1652491] Updated weights for policy 0, policy_version 878391 (0.0011) [2024-06-15 22:16:25,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 49152.0, 300 sec: 46652.7). Total num frames: 1798963200. Throughput: 0: 11821.5. Samples: 449768960. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:16:25,955][1648985] Avg episode reward: [(0, '174.690')] [2024-06-15 22:16:30,171][1652491] Updated weights for policy 0, policy_version 878437 (0.0015) [2024-06-15 22:16:30,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1799094272. 
Throughput: 0: 11878.4. Samples: 449852416. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:16:30,956][1648985] Avg episode reward: [(0, '181.620')] [2024-06-15 22:16:32,532][1652491] Updated weights for policy 0, policy_version 878496 (0.0013) [2024-06-15 22:16:34,680][1652491] Updated weights for policy 0, policy_version 878577 (0.0013) [2024-06-15 22:16:35,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 49698.1, 300 sec: 46652.7). Total num frames: 1799421952. Throughput: 0: 11719.1. Samples: 449906688. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:16:35,956][1648985] Avg episode reward: [(0, '173.620')] [2024-06-15 22:16:36,378][1652491] Updated weights for policy 0, policy_version 878648 (0.0013) [2024-06-15 22:16:40,955][1648985] Fps is (10 sec: 39320.5, 60 sec: 44236.6, 300 sec: 46208.4). Total num frames: 1799487488. Throughput: 0: 11764.6. Samples: 449948672. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:16:40,956][1648985] Avg episode reward: [(0, '161.970')] [2024-06-15 22:16:42,109][1652491] Updated weights for policy 0, policy_version 878675 (0.0011) [2024-06-15 22:16:43,372][1652491] Updated weights for policy 0, policy_version 878723 (0.0015) [2024-06-15 22:16:45,262][1652491] Updated weights for policy 0, policy_version 878801 (0.0039) [2024-06-15 22:16:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 49151.9, 300 sec: 46541.6). Total num frames: 1799847936. Throughput: 0: 11798.7. Samples: 450018816. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:16:45,956][1648985] Avg episode reward: [(0, '158.820')] [2024-06-15 22:16:47,567][1652491] Updated weights for policy 0, policy_version 878896 (0.0011) [2024-06-15 22:16:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 1800011776. Throughput: 0: 11707.8. Samples: 450082304. 
Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:16:50,956][1648985] Avg episode reward: [(0, '164.690')] [2024-06-15 22:16:53,395][1652491] Updated weights for policy 0, policy_version 878931 (0.0013) [2024-06-15 22:16:53,775][1651469] Signal inference workers to stop experience collection... (45800 times) [2024-06-15 22:16:53,910][1652491] InferenceWorker_p0-w0: stopping experience collection (45800 times) [2024-06-15 22:16:54,068][1651469] Signal inference workers to resume experience collection... (45800 times) [2024-06-15 22:16:54,069][1652491] InferenceWorker_p0-w0: resuming experience collection (45800 times) [2024-06-15 22:16:55,947][1652491] Updated weights for policy 0, policy_version 878994 (0.0194) [2024-06-15 22:16:55,955][1648985] Fps is (10 sec: 32767.9, 60 sec: 46421.3, 300 sec: 46097.3). Total num frames: 1800175616. Throughput: 0: 11730.5. Samples: 450121728. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:16:55,956][1648985] Avg episode reward: [(0, '182.480')] [2024-06-15 22:16:56,567][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000879024_1800241152.pth... [2024-06-15 22:16:56,681][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000873648_1789231104.pth [2024-06-15 22:16:57,657][1652491] Updated weights for policy 0, policy_version 879059 (0.0026) [2024-06-15 22:16:59,019][1652491] Updated weights for policy 0, policy_version 879120 (0.0012) [2024-06-15 22:16:59,934][1652491] Updated weights for policy 0, policy_version 879163 (0.0013) [2024-06-15 22:17:00,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 46421.4, 300 sec: 46208.4). Total num frames: 1800536064. Throughput: 0: 11446.1. Samples: 450176512. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:17:00,956][1648985] Avg episode reward: [(0, '176.440')] [2024-06-15 22:17:05,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 45329.3, 300 sec: 46208.4). Total num frames: 1800634368. 
Throughput: 0: 11525.7. Samples: 450259968. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:17:05,956][1648985] Avg episode reward: [(0, '172.490')] [2024-06-15 22:17:06,494][1652491] Updated weights for policy 0, policy_version 879248 (0.0028) [2024-06-15 22:17:07,574][1652491] Updated weights for policy 0, policy_version 879296 (0.0037) [2024-06-15 22:17:09,987][1652491] Updated weights for policy 0, policy_version 879378 (0.0013) [2024-06-15 22:17:10,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 1801027584. Throughput: 0: 11537.1. Samples: 450288128. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:17:10,956][1648985] Avg episode reward: [(0, '169.470')] [2024-06-15 22:17:15,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 43690.5, 300 sec: 46208.4). Total num frames: 1801060352. Throughput: 0: 11184.3. Samples: 450355712. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:17:15,956][1648985] Avg episode reward: [(0, '162.210')] [2024-06-15 22:17:15,974][1652491] Updated weights for policy 0, policy_version 879426 (0.0011) [2024-06-15 22:17:17,500][1652491] Updated weights for policy 0, policy_version 879488 (0.0096) [2024-06-15 22:17:19,481][1652491] Updated weights for policy 0, policy_version 879543 (0.0094) [2024-06-15 22:17:20,955][1648985] Fps is (10 sec: 36045.2, 60 sec: 46421.4, 300 sec: 45764.2). Total num frames: 1801388032. Throughput: 0: 11389.2. Samples: 450419200. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:17:20,955][1648985] Avg episode reward: [(0, '164.570')] [2024-06-15 22:17:21,573][1652491] Updated weights for policy 0, policy_version 879617 (0.0093) [2024-06-15 22:17:22,785][1652491] Updated weights for policy 0, policy_version 879671 (0.0112) [2024-06-15 22:17:25,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1801584640. Throughput: 0: 11127.5. Samples: 450449408. 
Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:17:25,956][1648985] Avg episode reward: [(0, '153.880')] [2024-06-15 22:17:28,714][1652491] Updated weights for policy 0, policy_version 879728 (0.0121) [2024-06-15 22:17:30,408][1652491] Updated weights for policy 0, policy_version 879760 (0.0014) [2024-06-15 22:17:30,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 44782.9, 300 sec: 45653.1). Total num frames: 1801781248. Throughput: 0: 11275.4. Samples: 450526208. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:17:30,956][1648985] Avg episode reward: [(0, '156.720')] [2024-06-15 22:17:31,730][1652491] Updated weights for policy 0, policy_version 879812 (0.0012) [2024-06-15 22:17:32,682][1651469] Signal inference workers to stop experience collection... (45850 times) [2024-06-15 22:17:32,725][1652491] InferenceWorker_p0-w0: stopping experience collection (45850 times) [2024-06-15 22:17:32,963][1651469] Signal inference workers to resume experience collection... (45850 times) [2024-06-15 22:17:32,963][1652491] InferenceWorker_p0-w0: resuming experience collection (45850 times) [2024-06-15 22:17:33,110][1652491] Updated weights for policy 0, policy_version 879873 (0.0012) [2024-06-15 22:17:35,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 1802108928. Throughput: 0: 11320.9. Samples: 450591744. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:17:35,956][1648985] Avg episode reward: [(0, '147.680')] [2024-06-15 22:17:39,024][1652491] Updated weights for policy 0, policy_version 879952 (0.0112) [2024-06-15 22:17:40,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 45875.5, 300 sec: 45764.2). Total num frames: 1802240000. Throughput: 0: 11423.3. Samples: 450635776. 
Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:17:40,955][1648985] Avg episode reward: [(0, '170.880')] [2024-06-15 22:17:41,556][1652491] Updated weights for policy 0, policy_version 880001 (0.0011) [2024-06-15 22:17:44,051][1652491] Updated weights for policy 0, policy_version 880112 (0.0028) [2024-06-15 22:17:45,596][1652491] Updated weights for policy 0, policy_version 880176 (0.0019) [2024-06-15 22:17:45,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.5, 300 sec: 46208.5). Total num frames: 1802633216. Throughput: 0: 11400.5. Samples: 450689536. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:17:45,956][1648985] Avg episode reward: [(0, '161.990')] [2024-06-15 22:17:50,413][1652491] Updated weights for policy 0, policy_version 880215 (0.0084) [2024-06-15 22:17:50,955][1648985] Fps is (10 sec: 49150.9, 60 sec: 45329.1, 300 sec: 45986.3). Total num frames: 1802731520. Throughput: 0: 11423.3. Samples: 450774016. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:17:50,956][1648985] Avg episode reward: [(0, '148.180')] [2024-06-15 22:17:52,847][1652491] Updated weights for policy 0, policy_version 880275 (0.0098) [2024-06-15 22:17:54,575][1652491] Updated weights for policy 0, policy_version 880352 (0.0096) [2024-06-15 22:17:55,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48059.8, 300 sec: 45875.2). Total num frames: 1803059200. Throughput: 0: 11503.0. Samples: 450805760. Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:17:55,955][1648985] Avg episode reward: [(0, '139.050')] [2024-06-15 22:17:55,990][1652491] Updated weights for policy 0, policy_version 880416 (0.0014) [2024-06-15 22:18:00,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 43690.6, 300 sec: 46097.4). Total num frames: 1803157504. Throughput: 0: 11662.3. Samples: 450880512. 
Policy #0 lag: (min: 24.0, avg: 91.8, max: 280.0) [2024-06-15 22:18:00,955][1648985] Avg episode reward: [(0, '141.280')] [2024-06-15 22:18:01,831][1652491] Updated weights for policy 0, policy_version 880481 (0.0014) [2024-06-15 22:18:04,127][1652491] Updated weights for policy 0, policy_version 880528 (0.0013) [2024-06-15 22:18:05,954][1652491] Updated weights for policy 0, policy_version 880608 (0.0118) [2024-06-15 22:18:05,958][1648985] Fps is (10 sec: 39311.4, 60 sec: 46965.5, 300 sec: 45652.7). Total num frames: 1803452416. Throughput: 0: 11672.9. Samples: 450944512. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:18:05,958][1648985] Avg episode reward: [(0, '134.870')] [2024-06-15 22:18:07,290][1652491] Updated weights for policy 0, policy_version 880664 (0.0151) [2024-06-15 22:18:10,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 44236.7, 300 sec: 46208.4). Total num frames: 1803681792. Throughput: 0: 11696.3. Samples: 450975744. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:18:10,956][1648985] Avg episode reward: [(0, '134.650')] [2024-06-15 22:18:12,028][1652491] Updated weights for policy 0, policy_version 880705 (0.0013) [2024-06-15 22:18:13,098][1652491] Updated weights for policy 0, policy_version 880760 (0.0013) [2024-06-15 22:18:15,070][1651469] Signal inference workers to stop experience collection... (45900 times) [2024-06-15 22:18:15,115][1652491] InferenceWorker_p0-w0: stopping experience collection (45900 times) [2024-06-15 22:18:15,326][1651469] Signal inference workers to resume experience collection... (45900 times) [2024-06-15 22:18:15,326][1652491] InferenceWorker_p0-w0: resuming experience collection (45900 times) [2024-06-15 22:18:15,569][1652491] Updated weights for policy 0, policy_version 880803 (0.0012) [2024-06-15 22:18:15,955][1648985] Fps is (10 sec: 45886.9, 60 sec: 47513.8, 300 sec: 45653.0). Total num frames: 1803911168. Throughput: 0: 11935.3. Samples: 451063296. 
Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:18:15,956][1648985] Avg episode reward: [(0, '158.120')] [2024-06-15 22:18:17,198][1652491] Updated weights for policy 0, policy_version 880866 (0.0014) [2024-06-15 22:18:18,898][1652491] Updated weights for policy 0, policy_version 880930 (0.0047) [2024-06-15 22:18:20,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 46967.4, 300 sec: 46208.5). Total num frames: 1804206080. Throughput: 0: 11730.5. Samples: 451119616. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:18:20,956][1648985] Avg episode reward: [(0, '166.930')] [2024-06-15 22:18:23,298][1652491] Updated weights for policy 0, policy_version 880992 (0.0013) [2024-06-15 22:18:24,166][1652491] Updated weights for policy 0, policy_version 881024 (0.0011) [2024-06-15 22:18:25,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 46967.4, 300 sec: 46097.4). Total num frames: 1804402688. Throughput: 0: 11662.2. Samples: 451160576. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:18:25,956][1648985] Avg episode reward: [(0, '163.930')] [2024-06-15 22:18:26,941][1652491] Updated weights for policy 0, policy_version 881089 (0.0013) [2024-06-15 22:18:28,331][1652491] Updated weights for policy 0, policy_version 881148 (0.0015) [2024-06-15 22:18:30,092][1652491] Updated weights for policy 0, policy_version 881214 (0.0016) [2024-06-15 22:18:30,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 49152.0, 300 sec: 46208.4). Total num frames: 1804730368. Throughput: 0: 11923.9. Samples: 451226112. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:18:30,956][1648985] Avg episode reward: [(0, '145.040')] [2024-06-15 22:18:34,455][1652491] Updated weights for policy 0, policy_version 881264 (0.0016) [2024-06-15 22:18:35,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 1804861440. Throughput: 0: 11832.9. Samples: 451306496. 
Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:18:35,956][1648985] Avg episode reward: [(0, '136.800')] [2024-06-15 22:18:37,287][1652491] Updated weights for policy 0, policy_version 881298 (0.0013) [2024-06-15 22:18:39,478][1652491] Updated weights for policy 0, policy_version 881378 (0.0104) [2024-06-15 22:18:40,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 49151.9, 300 sec: 46208.4). Total num frames: 1805189120. Throughput: 0: 11753.2. Samples: 451334656. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:18:40,956][1648985] Avg episode reward: [(0, '122.680')] [2024-06-15 22:18:40,990][1652491] Updated weights for policy 0, policy_version 881442 (0.0012) [2024-06-15 22:18:44,738][1652491] Updated weights for policy 0, policy_version 881475 (0.0013) [2024-06-15 22:18:45,810][1652491] Updated weights for policy 0, policy_version 881530 (0.0014) [2024-06-15 22:18:45,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1805385728. Throughput: 0: 11810.1. Samples: 451411968. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:18:45,955][1648985] Avg episode reward: [(0, '158.170')] [2024-06-15 22:18:49,549][1652491] Updated weights for policy 0, policy_version 881600 (0.0129) [2024-06-15 22:18:50,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48059.8, 300 sec: 46430.6). Total num frames: 1805615104. Throughput: 0: 11742.5. Samples: 451472896. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:18:50,956][1648985] Avg episode reward: [(0, '163.620')] [2024-06-15 22:18:50,984][1652491] Updated weights for policy 0, policy_version 881656 (0.0122) [2024-06-15 22:18:51,533][1651469] Signal inference workers to stop experience collection... (45950 times) [2024-06-15 22:18:51,606][1652491] InferenceWorker_p0-w0: stopping experience collection (45950 times) [2024-06-15 22:18:51,806][1651469] Signal inference workers to resume experience collection... 
(45950 times) [2024-06-15 22:18:51,822][1652491] InferenceWorker_p0-w0: resuming experience collection (45950 times) [2024-06-15 22:18:52,310][1652491] Updated weights for policy 0, policy_version 881715 (0.0012) [2024-06-15 22:18:55,955][1648985] Fps is (10 sec: 42596.8, 60 sec: 45874.9, 300 sec: 46319.5). Total num frames: 1805811712. Throughput: 0: 11889.7. Samples: 451510784. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:18:55,956][1648985] Avg episode reward: [(0, '164.350')] [2024-06-15 22:18:56,018][1652491] Updated weights for policy 0, policy_version 881747 (0.0012) [2024-06-15 22:18:56,220][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000881760_1805844480.pth... [2024-06-15 22:18:56,401][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000876352_1794768896.pth [2024-06-15 22:18:59,350][1652491] Updated weights for policy 0, policy_version 881808 (0.0011) [2024-06-15 22:19:00,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1806041088. Throughput: 0: 11662.2. Samples: 451588096. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:19:00,956][1648985] Avg episode reward: [(0, '145.650')] [2024-06-15 22:19:01,354][1652491] Updated weights for policy 0, policy_version 881888 (0.0012) [2024-06-15 22:19:03,073][1652491] Updated weights for policy 0, policy_version 881958 (0.0015) [2024-06-15 22:19:05,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 47515.5, 300 sec: 46208.4). Total num frames: 1806303232. Throughput: 0: 11844.2. Samples: 451652608. 
Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:19:05,956][1648985] Avg episode reward: [(0, '141.320')] [2024-06-15 22:19:07,177][1652491] Updated weights for policy 0, policy_version 882000 (0.0031) [2024-06-15 22:19:08,259][1652491] Updated weights for policy 0, policy_version 882045 (0.0014) [2024-06-15 22:19:10,955][1648985] Fps is (10 sec: 39320.9, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1806434304. Throughput: 0: 11741.8. Samples: 451688960. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:19:10,956][1648985] Avg episode reward: [(0, '124.960')] [2024-06-15 22:19:12,483][1652491] Updated weights for policy 0, policy_version 882112 (0.0232) [2024-06-15 22:19:14,007][1652491] Updated weights for policy 0, policy_version 882179 (0.0133) [2024-06-15 22:19:15,340][1652491] Updated weights for policy 0, policy_version 882233 (0.0013) [2024-06-15 22:19:15,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 48605.9, 300 sec: 46653.3). Total num frames: 1806827520. Throughput: 0: 11571.2. Samples: 451746816. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:19:15,955][1648985] Avg episode reward: [(0, '134.370')] [2024-06-15 22:19:19,057][1652491] Updated weights for policy 0, policy_version 882273 (0.0011) [2024-06-15 22:19:20,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1806958592. Throughput: 0: 11514.3. Samples: 451824640. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:19:20,956][1648985] Avg episode reward: [(0, '130.260')] [2024-06-15 22:19:23,122][1652491] Updated weights for policy 0, policy_version 882336 (0.0011) [2024-06-15 22:19:24,790][1652491] Updated weights for policy 0, policy_version 882403 (0.0013) [2024-06-15 22:19:25,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 1807253504. Throughput: 0: 11605.3. Samples: 451856896. 
Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:19:25,956][1648985] Avg episode reward: [(0, '137.340')] [2024-06-15 22:19:26,222][1652491] Updated weights for policy 0, policy_version 882464 (0.0031) [2024-06-15 22:19:29,951][1652491] Updated weights for policy 0, policy_version 882512 (0.0145) [2024-06-15 22:19:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 1807482880. Throughput: 0: 11377.8. Samples: 451923968. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:19:30,956][1648985] Avg episode reward: [(0, '146.320')] [2024-06-15 22:19:34,542][1652491] Updated weights for policy 0, policy_version 882576 (0.0014) [2024-06-15 22:19:35,331][1651469] Signal inference workers to stop experience collection... (46000 times) [2024-06-15 22:19:35,367][1652491] InferenceWorker_p0-w0: stopping experience collection (46000 times) [2024-06-15 22:19:35,652][1651469] Signal inference workers to resume experience collection... (46000 times) [2024-06-15 22:19:35,653][1652491] InferenceWorker_p0-w0: resuming experience collection (46000 times) [2024-06-15 22:19:35,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 1807613952. Throughput: 0: 11548.5. Samples: 451992576. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:19:35,956][1648985] Avg episode reward: [(0, '152.530')] [2024-06-15 22:19:36,513][1652491] Updated weights for policy 0, policy_version 882656 (0.0100) [2024-06-15 22:19:37,860][1652491] Updated weights for policy 0, policy_version 882709 (0.0016) [2024-06-15 22:19:40,955][1648985] Fps is (10 sec: 39321.3, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 1807876096. Throughput: 0: 11309.6. Samples: 452019712. 
Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:19:40,956][1648985] Avg episode reward: [(0, '172.640')] [2024-06-15 22:19:41,117][1652491] Updated weights for policy 0, policy_version 882768 (0.0123) [2024-06-15 22:19:41,870][1652491] Updated weights for policy 0, policy_version 882815 (0.0030) [2024-06-15 22:19:45,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 46319.5). Total num frames: 1808007168. Throughput: 0: 11377.8. Samples: 452100096. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:19:45,956][1648985] Avg episode reward: [(0, '185.120')] [2024-06-15 22:19:48,001][1652491] Updated weights for policy 0, policy_version 882902 (0.0091) [2024-06-15 22:19:49,635][1652491] Updated weights for policy 0, policy_version 882976 (0.0014) [2024-06-15 22:19:50,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 1808400384. Throughput: 0: 11241.3. Samples: 452158464. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 22:19:50,956][1648985] Avg episode reward: [(0, '173.310')] [2024-06-15 22:19:53,508][1652491] Updated weights for policy 0, policy_version 883067 (0.0116) [2024-06-15 22:19:55,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 45329.2, 300 sec: 46652.7). Total num frames: 1808531456. Throughput: 0: 11355.0. Samples: 452199936. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:19:55,956][1648985] Avg episode reward: [(0, '145.030')] [2024-06-15 22:19:58,895][1652491] Updated weights for policy 0, policy_version 883137 (0.0120) [2024-06-15 22:20:00,843][1652491] Updated weights for policy 0, policy_version 883216 (0.0013) [2024-06-15 22:20:00,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 1808826368. Throughput: 0: 11446.1. Samples: 452261888. 
Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:20:00,955][1648985] Avg episode reward: [(0, '144.450')] [2024-06-15 22:20:01,954][1652491] Updated weights for policy 0, policy_version 883264 (0.0012) [2024-06-15 22:20:05,648][1652491] Updated weights for policy 0, policy_version 883321 (0.0016) [2024-06-15 22:20:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 1809055744. Throughput: 0: 11229.9. Samples: 452329984. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:20:05,956][1648985] Avg episode reward: [(0, '143.140')] [2024-06-15 22:20:10,504][1652491] Updated weights for policy 0, policy_version 883392 (0.0013) [2024-06-15 22:20:10,955][1648985] Fps is (10 sec: 36044.2, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 1809186816. Throughput: 0: 11446.0. Samples: 452371968. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:20:10,956][1648985] Avg episode reward: [(0, '164.050')] [2024-06-15 22:20:12,645][1652491] Updated weights for policy 0, policy_version 883472 (0.0012) [2024-06-15 22:20:15,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 43690.6, 300 sec: 46208.5). Total num frames: 1809448960. Throughput: 0: 11286.8. Samples: 452431872. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:20:15,955][1648985] Avg episode reward: [(0, '171.020')] [2024-06-15 22:20:16,165][1651469] Signal inference workers to stop experience collection... (46050 times) [2024-06-15 22:20:16,214][1652491] InferenceWorker_p0-w0: stopping experience collection (46050 times) [2024-06-15 22:20:16,397][1651469] Signal inference workers to resume experience collection... 
(46050 times) [2024-06-15 22:20:16,398][1652491] InferenceWorker_p0-w0: resuming experience collection (46050 times) [2024-06-15 22:20:16,399][1652491] Updated weights for policy 0, policy_version 883536 (0.0027) [2024-06-15 22:20:17,541][1652491] Updated weights for policy 0, policy_version 883580 (0.0016) [2024-06-15 22:20:20,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 44236.7, 300 sec: 46097.3). Total num frames: 1809612800. Throughput: 0: 11355.0. Samples: 452503552. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:20:20,956][1648985] Avg episode reward: [(0, '167.540')] [2024-06-15 22:20:21,901][1652491] Updated weights for policy 0, policy_version 883635 (0.0013) [2024-06-15 22:20:23,514][1652491] Updated weights for policy 0, policy_version 883707 (0.0012) [2024-06-15 22:20:24,907][1652491] Updated weights for policy 0, policy_version 883760 (0.0017) [2024-06-15 22:20:25,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45329.1, 300 sec: 46208.4). Total num frames: 1809973248. Throughput: 0: 11286.8. Samples: 452527616. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:20:25,955][1648985] Avg episode reward: [(0, '165.850')] [2024-06-15 22:20:28,000][1652491] Updated weights for policy 0, policy_version 883781 (0.0012) [2024-06-15 22:20:30,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 43690.7, 300 sec: 46319.5). Total num frames: 1810104320. Throughput: 0: 11082.0. Samples: 452598784. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:20:30,956][1648985] Avg episode reward: [(0, '144.960')] [2024-06-15 22:20:32,214][1652491] Updated weights for policy 0, policy_version 883856 (0.0018) [2024-06-15 22:20:33,168][1652491] Updated weights for policy 0, policy_version 883899 (0.0013) [2024-06-15 22:20:34,601][1652491] Updated weights for policy 0, policy_version 883938 (0.0012) [2024-06-15 22:20:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.5, 300 sec: 46097.4). Total num frames: 1810432000. 
Throughput: 0: 11275.4. Samples: 452665856. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:20:35,956][1648985] Avg episode reward: [(0, '151.830')] [2024-06-15 22:20:36,479][1652491] Updated weights for policy 0, policy_version 884027 (0.0014) [2024-06-15 22:20:40,550][1652491] Updated weights for policy 0, policy_version 884092 (0.0016) [2024-06-15 22:20:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 1810628608. Throughput: 0: 11127.5. Samples: 452700672. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:20:40,955][1648985] Avg episode reward: [(0, '142.680')] [2024-06-15 22:20:44,113][1652491] Updated weights for policy 0, policy_version 884116 (0.0012) [2024-06-15 22:20:45,192][1652491] Updated weights for policy 0, policy_version 884161 (0.0017) [2024-06-15 22:20:45,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 46967.3, 300 sec: 45986.3). Total num frames: 1810825216. Throughput: 0: 11423.2. Samples: 452775936. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:20:45,956][1648985] Avg episode reward: [(0, '167.980')] [2024-06-15 22:20:47,274][1652491] Updated weights for policy 0, policy_version 884256 (0.0012) [2024-06-15 22:20:50,610][1652491] Updated weights for policy 0, policy_version 884292 (0.0013) [2024-06-15 22:20:50,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 44236.7, 300 sec: 46319.5). Total num frames: 1811054592. Throughput: 0: 11286.7. Samples: 452837888. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:20:50,956][1648985] Avg episode reward: [(0, '157.400')] [2024-06-15 22:20:54,961][1652491] Updated weights for policy 0, policy_version 884368 (0.0015) [2024-06-15 22:20:55,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 45875.0, 300 sec: 45875.2). Total num frames: 1811283968. Throughput: 0: 11264.0. Samples: 452878848. 
Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:20:55,956][1648985] Avg episode reward: [(0, '146.620')] [2024-06-15 22:20:55,964][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000884416_1811283968.pth... [2024-06-15 22:20:56,049][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000879024_1800241152.pth [2024-06-15 22:20:56,729][1652491] Updated weights for policy 0, policy_version 884420 (0.0041) [2024-06-15 22:20:58,003][1651469] Signal inference workers to stop experience collection... (46100 times) [2024-06-15 22:20:58,043][1652491] InferenceWorker_p0-w0: stopping experience collection (46100 times) [2024-06-15 22:20:58,323][1651469] Signal inference workers to resume experience collection... (46100 times) [2024-06-15 22:20:58,325][1652491] InferenceWorker_p0-w0: resuming experience collection (46100 times) [2024-06-15 22:20:58,505][1652491] Updated weights for policy 0, policy_version 884484 (0.0013) [2024-06-15 22:21:00,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 45329.0, 300 sec: 46208.5). Total num frames: 1811546112. Throughput: 0: 11207.1. Samples: 452936192. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:21:00,956][1648985] Avg episode reward: [(0, '142.030')] [2024-06-15 22:21:03,119][1652491] Updated weights for policy 0, policy_version 884576 (0.0141) [2024-06-15 22:21:05,955][1648985] Fps is (10 sec: 39322.9, 60 sec: 43690.7, 300 sec: 45875.2). Total num frames: 1811677184. Throughput: 0: 11332.3. Samples: 453013504. 
Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:21:05,956][1648985] Avg episode reward: [(0, '185.330')] [2024-06-15 22:21:06,234][1652491] Updated weights for policy 0, policy_version 884624 (0.0031) [2024-06-15 22:21:07,993][1652491] Updated weights for policy 0, policy_version 884673 (0.0040) [2024-06-15 22:21:09,752][1652491] Updated weights for policy 0, policy_version 884752 (0.0012) [2024-06-15 22:21:10,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 47513.7, 300 sec: 46097.4). Total num frames: 1812037632. Throughput: 0: 11616.7. Samples: 453050368. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:21:10,956][1648985] Avg episode reward: [(0, '192.750')] [2024-06-15 22:21:13,267][1652491] Updated weights for policy 0, policy_version 884804 (0.0015) [2024-06-15 22:21:14,611][1652491] Updated weights for policy 0, policy_version 884864 (0.0012) [2024-06-15 22:21:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46097.4). Total num frames: 1812201472. Throughput: 0: 11491.6. Samples: 453115904. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:21:15,955][1648985] Avg episode reward: [(0, '176.770')] [2024-06-15 22:21:19,010][1652491] Updated weights for policy 0, policy_version 884917 (0.0012) [2024-06-15 22:21:20,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 47513.7, 300 sec: 45764.1). Total num frames: 1812463616. Throughput: 0: 11525.7. Samples: 453184512. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:21:20,956][1648985] Avg episode reward: [(0, '151.840')] [2024-06-15 22:21:21,082][1652491] Updated weights for policy 0, policy_version 884996 (0.0033) [2024-06-15 22:21:21,993][1652491] Updated weights for policy 0, policy_version 885050 (0.0041) [2024-06-15 22:21:25,824][1652491] Updated weights for policy 0, policy_version 885104 (0.0014) [2024-06-15 22:21:25,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45329.0, 300 sec: 46097.4). Total num frames: 1812692992. 
Throughput: 0: 11502.9. Samples: 453218304. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:21:25,956][1648985] Avg episode reward: [(0, '155.470')] [2024-06-15 22:21:29,739][1652491] Updated weights for policy 0, policy_version 885139 (0.0013) [2024-06-15 22:21:30,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 45875.1, 300 sec: 45541.9). Total num frames: 1812856832. Throughput: 0: 11639.5. Samples: 453299712. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:21:30,956][1648985] Avg episode reward: [(0, '161.910')] [2024-06-15 22:21:31,723][1652491] Updated weights for policy 0, policy_version 885216 (0.0013) [2024-06-15 22:21:33,468][1652491] Updated weights for policy 0, policy_version 885296 (0.0015) [2024-06-15 22:21:35,955][1648985] Fps is (10 sec: 42597.9, 60 sec: 44782.8, 300 sec: 46208.5). Total num frames: 1813118976. Throughput: 0: 11480.2. Samples: 453354496. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:21:35,956][1648985] Avg episode reward: [(0, '163.380')] [2024-06-15 22:21:37,378][1652491] Updated weights for policy 0, policy_version 885360 (0.0011) [2024-06-15 22:21:40,955][1648985] Fps is (10 sec: 39321.9, 60 sec: 43690.6, 300 sec: 45430.9). Total num frames: 1813250048. Throughput: 0: 11298.2. Samples: 453387264. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:21:40,956][1648985] Avg episode reward: [(0, '163.370')] [2024-06-15 22:21:41,838][1652491] Updated weights for policy 0, policy_version 885394 (0.0013) [2024-06-15 22:21:42,194][1651469] Signal inference workers to stop experience collection... (46150 times) [2024-06-15 22:21:42,272][1652491] InferenceWorker_p0-w0: stopping experience collection (46150 times) [2024-06-15 22:21:42,523][1651469] Signal inference workers to resume experience collection... 
(46150 times) [2024-06-15 22:21:42,534][1652491] InferenceWorker_p0-w0: resuming experience collection (46150 times) [2024-06-15 22:21:43,860][1652491] Updated weights for policy 0, policy_version 885478 (0.0015) [2024-06-15 22:21:45,480][1652491] Updated weights for policy 0, policy_version 885562 (0.0015) [2024-06-15 22:21:45,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 46967.6, 300 sec: 46208.5). Total num frames: 1813643264. Throughput: 0: 11491.5. Samples: 453453312. Policy #0 lag: (min: 52.0, avg: 175.9, max: 308.0) [2024-06-15 22:21:45,956][1648985] Avg episode reward: [(0, '156.270')] [2024-06-15 22:21:49,686][1652491] Updated weights for policy 0, policy_version 885616 (0.0025) [2024-06-15 22:21:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 45329.2, 300 sec: 46097.4). Total num frames: 1813774336. Throughput: 0: 11275.4. Samples: 453520896. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:21:50,956][1648985] Avg episode reward: [(0, '154.220')] [2024-06-15 22:21:54,290][1652491] Updated weights for policy 0, policy_version 885669 (0.0013) [2024-06-15 22:21:55,955][1648985] Fps is (10 sec: 32768.0, 60 sec: 44783.1, 300 sec: 45541.9). Total num frames: 1813970944. Throughput: 0: 11343.6. Samples: 453560832. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:21:55,956][1648985] Avg episode reward: [(0, '166.350')] [2024-06-15 22:21:56,157][1652491] Updated weights for policy 0, policy_version 885750 (0.0013) [2024-06-15 22:21:57,636][1652491] Updated weights for policy 0, policy_version 885813 (0.0012) [2024-06-15 22:22:00,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44236.8, 300 sec: 45986.3). Total num frames: 1814200320. Throughput: 0: 11286.8. Samples: 453623808. 
Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:22:00,956][1648985] Avg episode reward: [(0, '168.350')] [2024-06-15 22:22:01,779][1652491] Updated weights for policy 0, policy_version 885880 (0.0012) [2024-06-15 22:22:05,164][1652491] Updated weights for policy 0, policy_version 885920 (0.0024) [2024-06-15 22:22:05,956][1648985] Fps is (10 sec: 45870.0, 60 sec: 45874.3, 300 sec: 45430.7). Total num frames: 1814429696. Throughput: 0: 11354.7. Samples: 453695488. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:22:05,957][1648985] Avg episode reward: [(0, '176.920')] [2024-06-15 22:22:07,089][1652491] Updated weights for policy 0, policy_version 886000 (0.0012) [2024-06-15 22:22:08,723][1652491] Updated weights for policy 0, policy_version 886067 (0.0012) [2024-06-15 22:22:10,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 44236.8, 300 sec: 46208.5). Total num frames: 1814691840. Throughput: 0: 11150.2. Samples: 453720064. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:22:10,956][1648985] Avg episode reward: [(0, '163.560')] [2024-06-15 22:22:12,395][1652491] Updated weights for policy 0, policy_version 886115 (0.0014) [2024-06-15 22:22:15,955][1648985] Fps is (10 sec: 39326.3, 60 sec: 43690.7, 300 sec: 45542.0). Total num frames: 1814822912. Throughput: 0: 11036.5. Samples: 453796352. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:22:15,955][1648985] Avg episode reward: [(0, '150.310')] [2024-06-15 22:22:16,764][1652491] Updated weights for policy 0, policy_version 886176 (0.0025) [2024-06-15 22:22:18,263][1652491] Updated weights for policy 0, policy_version 886240 (0.0086) [2024-06-15 22:22:19,923][1651469] Signal inference workers to stop experience collection... 
(46200 times) [2024-06-15 22:22:19,968][1652491] Updated weights for policy 0, policy_version 886309 (0.0012) [2024-06-15 22:22:20,006][1652491] InferenceWorker_p0-w0: stopping experience collection (46200 times) [2024-06-15 22:22:20,206][1651469] Signal inference workers to resume experience collection... (46200 times) [2024-06-15 22:22:20,207][1652491] InferenceWorker_p0-w0: resuming experience collection (46200 times) [2024-06-15 22:22:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 1815216128. Throughput: 0: 11127.5. Samples: 453855232. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:22:20,956][1648985] Avg episode reward: [(0, '169.440')] [2024-06-15 22:22:23,830][1652491] Updated weights for policy 0, policy_version 886352 (0.0018) [2024-06-15 22:22:25,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44236.8, 300 sec: 45986.3). Total num frames: 1815347200. Throughput: 0: 11252.6. Samples: 453893632. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:22:25,956][1648985] Avg episode reward: [(0, '162.990')] [2024-06-15 22:22:28,390][1652491] Updated weights for policy 0, policy_version 886416 (0.0100) [2024-06-15 22:22:29,682][1652491] Updated weights for policy 0, policy_version 886482 (0.0062) [2024-06-15 22:22:30,956][1648985] Fps is (10 sec: 39318.8, 60 sec: 45874.7, 300 sec: 45764.0). Total num frames: 1815609344. Throughput: 0: 11275.2. Samples: 453960704. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:22:30,956][1648985] Avg episode reward: [(0, '162.190')] [2024-06-15 22:22:31,037][1652491] Updated weights for policy 0, policy_version 886544 (0.0015) [2024-06-15 22:22:32,145][1652491] Updated weights for policy 0, policy_version 886590 (0.0013) [2024-06-15 22:22:35,491][1652491] Updated weights for policy 0, policy_version 886645 (0.0014) [2024-06-15 22:22:35,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.3, 300 sec: 46208.4). 
Total num frames: 1815871488. Throughput: 0: 11332.3. Samples: 454030848. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:22:35,956][1648985] Avg episode reward: [(0, '157.130')] [2024-06-15 22:22:40,359][1652491] Updated weights for policy 0, policy_version 886704 (0.0013) [2024-06-15 22:22:40,955][1648985] Fps is (10 sec: 39324.6, 60 sec: 45875.2, 300 sec: 45319.8). Total num frames: 1816002560. Throughput: 0: 11343.7. Samples: 454071296. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:22:40,956][1648985] Avg episode reward: [(0, '155.360')] [2024-06-15 22:22:41,769][1652491] Updated weights for policy 0, policy_version 886768 (0.0091) [2024-06-15 22:22:43,097][1652491] Updated weights for policy 0, policy_version 886816 (0.0022) [2024-06-15 22:22:43,908][1652491] Updated weights for policy 0, policy_version 886846 (0.0016) [2024-06-15 22:22:45,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 44236.9, 300 sec: 45986.3). Total num frames: 1816297472. Throughput: 0: 11343.6. Samples: 454134272. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:22:45,956][1648985] Avg episode reward: [(0, '159.070')] [2024-06-15 22:22:46,349][1652491] Updated weights for policy 0, policy_version 886882 (0.0026) [2024-06-15 22:22:50,604][1652491] Updated weights for policy 0, policy_version 886928 (0.0013) [2024-06-15 22:22:50,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 44236.7, 300 sec: 45319.8). Total num frames: 1816428544. Throughput: 0: 11378.0. Samples: 454207488. 
Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:22:50,956][1648985] Avg episode reward: [(0, '158.270')] [2024-06-15 22:22:52,067][1652491] Updated weights for policy 0, policy_version 886992 (0.0011) [2024-06-15 22:22:54,630][1652491] Updated weights for policy 0, policy_version 887059 (0.0066) [2024-06-15 22:22:55,661][1652491] Updated weights for policy 0, policy_version 887104 (0.0011) [2024-06-15 22:22:55,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 1816788992. Throughput: 0: 11537.0. Samples: 454239232. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:22:55,956][1648985] Avg episode reward: [(0, '157.460')] [2024-06-15 22:22:55,989][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000887104_1816788992.pth... [2024-06-15 22:22:56,280][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000881760_1805844480.pth [2024-06-15 22:22:57,412][1652491] Updated weights for policy 0, policy_version 887159 (0.0015) [2024-06-15 22:23:00,957][1648985] Fps is (10 sec: 49140.6, 60 sec: 45327.2, 300 sec: 45653.1). Total num frames: 1816920064. Throughput: 0: 11479.5. Samples: 454312960. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:23:00,958][1648985] Avg episode reward: [(0, '162.660')] [2024-06-15 22:23:02,199][1652491] Updated weights for policy 0, policy_version 887203 (0.0013) [2024-06-15 22:23:03,591][1651469] Signal inference workers to stop experience collection... (46250 times) [2024-06-15 22:23:03,647][1652491] Updated weights for policy 0, policy_version 887265 (0.0014) [2024-06-15 22:23:03,695][1652491] InferenceWorker_p0-w0: stopping experience collection (46250 times) [2024-06-15 22:23:03,993][1651469] Signal inference workers to resume experience collection... 
(46250 times) [2024-06-15 22:23:03,994][1652491] InferenceWorker_p0-w0: resuming experience collection (46250 times) [2024-06-15 22:23:05,870][1652491] Updated weights for policy 0, policy_version 887328 (0.0011) [2024-06-15 22:23:05,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46968.3, 300 sec: 45986.3). Total num frames: 1817247744. Throughput: 0: 11730.5. Samples: 454383104. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:23:05,956][1648985] Avg episode reward: [(0, '180.060')] [2024-06-15 22:23:06,676][1652491] Updated weights for policy 0, policy_version 887360 (0.0011) [2024-06-15 22:23:08,284][1652491] Updated weights for policy 0, policy_version 887420 (0.0115) [2024-06-15 22:23:10,955][1648985] Fps is (10 sec: 52441.7, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 1817444352. Throughput: 0: 11593.9. Samples: 454415360. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:23:10,956][1648985] Avg episode reward: [(0, '187.470')] [2024-06-15 22:23:13,727][1652491] Updated weights for policy 0, policy_version 887482 (0.0012) [2024-06-15 22:23:14,790][1652491] Updated weights for policy 0, policy_version 887542 (0.0014) [2024-06-15 22:23:15,955][1648985] Fps is (10 sec: 45876.2, 60 sec: 48059.8, 300 sec: 45764.1). Total num frames: 1817706496. Throughput: 0: 11730.7. Samples: 454488576. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:23:15,955][1648985] Avg episode reward: [(0, '165.590')] [2024-06-15 22:23:17,444][1652491] Updated weights for policy 0, policy_version 887603 (0.0012) [2024-06-15 22:23:18,305][1652491] Updated weights for policy 0, policy_version 887632 (0.0014) [2024-06-15 22:23:19,281][1652491] Updated weights for policy 0, policy_version 887677 (0.0071) [2024-06-15 22:23:20,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 45986.3). Total num frames: 1817968640. Throughput: 0: 11923.9. Samples: 454567424. 
Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:23:20,956][1648985] Avg episode reward: [(0, '137.500')] [2024-06-15 22:23:24,851][1652491] Updated weights for policy 0, policy_version 887745 (0.0015) [2024-06-15 22:23:25,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 47513.5, 300 sec: 45653.0). Total num frames: 1818198016. Throughput: 0: 11764.6. Samples: 454600704. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:23:25,956][1648985] Avg episode reward: [(0, '138.030')] [2024-06-15 22:23:26,059][1652491] Updated weights for policy 0, policy_version 887808 (0.0013) [2024-06-15 22:23:27,958][1652491] Updated weights for policy 0, policy_version 887872 (0.0014) [2024-06-15 22:23:30,158][1652491] Updated weights for policy 0, policy_version 887930 (0.0014) [2024-06-15 22:23:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48060.3, 300 sec: 46208.4). Total num frames: 1818492928. Throughput: 0: 11844.2. Samples: 454667264. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:23:30,956][1648985] Avg episode reward: [(0, '150.230')] [2024-06-15 22:23:34,063][1652491] Updated weights for policy 0, policy_version 887970 (0.0011) [2024-06-15 22:23:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46421.3, 300 sec: 45653.0). Total num frames: 1818656768. Throughput: 0: 12060.5. Samples: 454750208. Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:23:35,956][1648985] Avg episode reward: [(0, '154.460')] [2024-06-15 22:23:36,257][1652491] Updated weights for policy 0, policy_version 888035 (0.0014) [2024-06-15 22:23:37,068][1652491] Updated weights for policy 0, policy_version 888069 (0.0038) [2024-06-15 22:23:39,778][1652491] Updated weights for policy 0, policy_version 888144 (0.0012) [2024-06-15 22:23:40,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 49698.2, 300 sec: 46097.4). Total num frames: 1818984448. Throughput: 0: 12037.7. Samples: 454780928. 
Policy #0 lag: (min: 15.0, avg: 142.1, max: 271.0) [2024-06-15 22:23:40,955][1648985] Avg episode reward: [(0, '170.140')] [2024-06-15 22:23:41,028][1652491] Updated weights for policy 0, policy_version 888192 (0.0014) [2024-06-15 22:23:45,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 47513.5, 300 sec: 45875.2). Total num frames: 1819148288. Throughput: 0: 12129.4. Samples: 454858752. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:23:45,956][1648985] Avg episode reward: [(0, '160.710')] [2024-06-15 22:23:46,784][1652491] Updated weights for policy 0, policy_version 888263 (0.0095) [2024-06-15 22:23:47,863][1652491] Updated weights for policy 0, policy_version 888318 (0.0017) [2024-06-15 22:23:48,493][1651469] Signal inference workers to stop experience collection... (46300 times) [2024-06-15 22:23:48,582][1652491] InferenceWorker_p0-w0: stopping experience collection (46300 times) [2024-06-15 22:23:48,716][1651469] Signal inference workers to resume experience collection... (46300 times) [2024-06-15 22:23:48,725][1652491] InferenceWorker_p0-w0: resuming experience collection (46300 times) [2024-06-15 22:23:49,387][1652491] Updated weights for policy 0, policy_version 888379 (0.0012) [2024-06-15 22:23:50,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 50244.4, 300 sec: 46208.5). Total num frames: 1819443200. Throughput: 0: 12060.5. Samples: 454925824. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:23:50,956][1648985] Avg episode reward: [(0, '169.090')] [2024-06-15 22:23:51,459][1652491] Updated weights for policy 0, policy_version 888442 (0.0013) [2024-06-15 22:23:55,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 46421.4, 300 sec: 45875.2). Total num frames: 1819574272. Throughput: 0: 12276.6. Samples: 454967808. 
Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:23:55,956][1648985] Avg episode reward: [(0, '161.590')] [2024-06-15 22:23:56,456][1652491] Updated weights for policy 0, policy_version 888496 (0.0014) [2024-06-15 22:23:58,042][1652491] Updated weights for policy 0, policy_version 888575 (0.0098) [2024-06-15 22:24:00,748][1652491] Updated weights for policy 0, policy_version 888633 (0.0013) [2024-06-15 22:24:00,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 50246.4, 300 sec: 46208.5). Total num frames: 1819934720. Throughput: 0: 12151.4. Samples: 455035392. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:24:00,956][1648985] Avg episode reward: [(0, '173.870')] [2024-06-15 22:24:01,914][1652491] Updated weights for policy 0, policy_version 888672 (0.0044) [2024-06-15 22:24:05,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46967.6, 300 sec: 46208.5). Total num frames: 1820065792. Throughput: 0: 12049.1. Samples: 455109632. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:24:05,956][1648985] Avg episode reward: [(0, '141.680')] [2024-06-15 22:24:07,493][1652491] Updated weights for policy 0, policy_version 888729 (0.0038) [2024-06-15 22:24:08,675][1652491] Updated weights for policy 0, policy_version 888789 (0.0025) [2024-06-15 22:24:10,349][1652491] Updated weights for policy 0, policy_version 888833 (0.0012) [2024-06-15 22:24:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 49152.0, 300 sec: 45986.3). Total num frames: 1820393472. Throughput: 0: 12128.7. Samples: 455146496. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:24:10,956][1648985] Avg episode reward: [(0, '150.340')] [2024-06-15 22:24:11,485][1652491] Updated weights for policy 0, policy_version 888888 (0.0019) [2024-06-15 22:24:12,787][1652491] Updated weights for policy 0, policy_version 888929 (0.0013) [2024-06-15 22:24:15,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 1820590080. 
Throughput: 0: 12310.7. Samples: 455221248. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:24:15,956][1648985] Avg episode reward: [(0, '160.630')] [2024-06-15 22:24:17,480][1652491] Updated weights for policy 0, policy_version 888976 (0.0017) [2024-06-15 22:24:19,029][1652491] Updated weights for policy 0, policy_version 889045 (0.0260) [2024-06-15 22:24:20,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 48059.8, 300 sec: 46097.4). Total num frames: 1820852224. Throughput: 0: 12049.1. Samples: 455292416. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:24:20,956][1648985] Avg episode reward: [(0, '154.810')] [2024-06-15 22:24:21,088][1652491] Updated weights for policy 0, policy_version 889094 (0.0014) [2024-06-15 22:24:22,177][1652491] Updated weights for policy 0, policy_version 889151 (0.0012) [2024-06-15 22:24:24,123][1652491] Updated weights for policy 0, policy_version 889200 (0.0013) [2024-06-15 22:24:25,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48605.9, 300 sec: 46208.4). Total num frames: 1821114368. Throughput: 0: 12083.2. Samples: 455324672. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:24:25,956][1648985] Avg episode reward: [(0, '147.730')] [2024-06-15 22:24:28,232][1652491] Updated weights for policy 0, policy_version 889232 (0.0012) [2024-06-15 22:24:29,908][1651469] Signal inference workers to stop experience collection... (46350 times) [2024-06-15 22:24:29,982][1652491] InferenceWorker_p0-w0: stopping experience collection (46350 times) [2024-06-15 22:24:30,056][1651469] Signal inference workers to resume experience collection... (46350 times) [2024-06-15 22:24:30,057][1652491] InferenceWorker_p0-w0: resuming experience collection (46350 times) [2024-06-15 22:24:30,186][1652491] Updated weights for policy 0, policy_version 889313 (0.0012) [2024-06-15 22:24:30,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48059.6, 300 sec: 46652.7). Total num frames: 1821376512. Throughput: 0: 12071.8. 
Samples: 455401984. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:24:30,956][1648985] Avg episode reward: [(0, '165.260')] [2024-06-15 22:24:31,537][1652491] Updated weights for policy 0, policy_version 889349 (0.0014) [2024-06-15 22:24:32,412][1652491] Updated weights for policy 0, policy_version 889399 (0.0013) [2024-06-15 22:24:34,251][1652491] Updated weights for policy 0, policy_version 889456 (0.0014) [2024-06-15 22:24:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 49698.2, 300 sec: 46652.8). Total num frames: 1821638656. Throughput: 0: 12424.5. Samples: 455484928. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:24:35,956][1648985] Avg episode reward: [(0, '162.360')] [2024-06-15 22:24:38,493][1652491] Updated weights for policy 0, policy_version 889489 (0.0019) [2024-06-15 22:24:40,718][1652491] Updated weights for policy 0, policy_version 889571 (0.0012) [2024-06-15 22:24:40,955][1648985] Fps is (10 sec: 49153.2, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 1821868032. Throughput: 0: 12344.9. Samples: 455523328. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:24:40,955][1648985] Avg episode reward: [(0, '160.620')] [2024-06-15 22:24:42,127][1652491] Updated weights for policy 0, policy_version 889618 (0.0025) [2024-06-15 22:24:45,275][1652491] Updated weights for policy 0, policy_version 889701 (0.0022) [2024-06-15 22:24:45,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 50244.3, 300 sec: 46652.7). Total num frames: 1822162944. Throughput: 0: 12231.1. Samples: 455585792. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:24:45,956][1648985] Avg episode reward: [(0, '166.380')] [2024-06-15 22:24:49,594][1652491] Updated weights for policy 0, policy_version 889744 (0.0013) [2024-06-15 22:24:50,955][1648985] Fps is (10 sec: 42597.4, 60 sec: 47513.5, 300 sec: 46652.7). Total num frames: 1822294016. Throughput: 0: 12162.8. Samples: 455656960. 
Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:24:50,956][1648985] Avg episode reward: [(0, '156.210')] [2024-06-15 22:24:51,699][1652491] Updated weights for policy 0, policy_version 889824 (0.0012) [2024-06-15 22:24:52,539][1652491] Updated weights for policy 0, policy_version 889855 (0.0011) [2024-06-15 22:24:54,067][1652491] Updated weights for policy 0, policy_version 889918 (0.0013) [2024-06-15 22:24:55,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 49698.1, 300 sec: 46541.6). Total num frames: 1822556160. Throughput: 0: 11992.2. Samples: 455686144. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:24:55,956][1648985] Avg episode reward: [(0, '152.940')] [2024-06-15 22:24:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000889920_1822556160.pth... [2024-06-15 22:24:56,018][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000884416_1811283968.pth [2024-06-15 22:25:00,428][1652491] Updated weights for policy 0, policy_version 890000 (0.0014) [2024-06-15 22:25:00,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 46967.4, 300 sec: 46430.6). Total num frames: 1822752768. Throughput: 0: 12060.5. Samples: 455763968. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:25:00,956][1648985] Avg episode reward: [(0, '150.650')] [2024-06-15 22:25:01,477][1652491] Updated weights for policy 0, policy_version 890043 (0.0013) [2024-06-15 22:25:03,291][1652491] Updated weights for policy 0, policy_version 890101 (0.0047) [2024-06-15 22:25:05,040][1652491] Updated weights for policy 0, policy_version 890160 (0.0011) [2024-06-15 22:25:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 1823080448. Throughput: 0: 11992.2. Samples: 455832064. 
Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:25:05,956][1648985] Avg episode reward: [(0, '156.710')] [2024-06-15 22:25:07,309][1652491] Updated weights for policy 0, policy_version 890208 (0.0051) [2024-06-15 22:25:10,886][1652491] Updated weights for policy 0, policy_version 890243 (0.0016) [2024-06-15 22:25:10,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46967.4, 300 sec: 46652.7). Total num frames: 1823211520. Throughput: 0: 12162.8. Samples: 455872000. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:25:10,956][1648985] Avg episode reward: [(0, '143.570')] [2024-06-15 22:25:13,002][1652491] Updated weights for policy 0, policy_version 890305 (0.0011) [2024-06-15 22:25:13,475][1651469] Signal inference workers to stop experience collection... (46400 times) [2024-06-15 22:25:13,550][1652491] InferenceWorker_p0-w0: stopping experience collection (46400 times) [2024-06-15 22:25:13,803][1651469] Signal inference workers to resume experience collection... (46400 times) [2024-06-15 22:25:13,804][1652491] InferenceWorker_p0-w0: resuming experience collection (46400 times) [2024-06-15 22:25:14,409][1652491] Updated weights for policy 0, policy_version 890366 (0.0013) [2024-06-15 22:25:15,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 1823473664. Throughput: 0: 11946.7. Samples: 455939584. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:25:15,956][1648985] Avg episode reward: [(0, '134.880')] [2024-06-15 22:25:16,747][1652491] Updated weights for policy 0, policy_version 890425 (0.0129) [2024-06-15 22:25:18,364][1652491] Updated weights for policy 0, policy_version 890480 (0.0013) [2024-06-15 22:25:20,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1823735808. Throughput: 0: 11764.6. Samples: 456014336. 
Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:25:20,956][1648985] Avg episode reward: [(0, '134.080')] [2024-06-15 22:25:22,232][1652491] Updated weights for policy 0, policy_version 890515 (0.0021) [2024-06-15 22:25:23,196][1652491] Updated weights for policy 0, policy_version 890560 (0.0015) [2024-06-15 22:25:25,804][1652491] Updated weights for policy 0, policy_version 890622 (0.0012) [2024-06-15 22:25:25,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1823997952. Throughput: 0: 11639.4. Samples: 456047104. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:25:25,956][1648985] Avg episode reward: [(0, '143.610')] [2024-06-15 22:25:27,933][1652491] Updated weights for policy 0, policy_version 890688 (0.0018) [2024-06-15 22:25:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.9, 300 sec: 46874.9). Total num frames: 1824260096. Throughput: 0: 11719.1. Samples: 456113152. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:25:30,956][1648985] Avg episode reward: [(0, '142.320')] [2024-06-15 22:25:33,055][1652491] Updated weights for policy 0, policy_version 890753 (0.0014) [2024-06-15 22:25:35,955][1648985] Fps is (10 sec: 39322.4, 60 sec: 45875.3, 300 sec: 46652.8). Total num frames: 1824391168. Throughput: 0: 11753.3. Samples: 456185856. Policy #0 lag: (min: 3.0, avg: 86.4, max: 259.0) [2024-06-15 22:25:35,955][1648985] Avg episode reward: [(0, '143.630')] [2024-06-15 22:25:36,157][1652491] Updated weights for policy 0, policy_version 890818 (0.0035) [2024-06-15 22:25:37,431][1652491] Updated weights for policy 0, policy_version 890875 (0.0011) [2024-06-15 22:25:39,236][1652491] Updated weights for policy 0, policy_version 890916 (0.0020) [2024-06-15 22:25:40,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 47513.5, 300 sec: 47097.1). Total num frames: 1824718848. Throughput: 0: 11912.5. Samples: 456222208. 
Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:25:40,956][1648985] Avg episode reward: [(0, '146.800')] [2024-06-15 22:25:41,109][1652491] Updated weights for policy 0, policy_version 890992 (0.0018) [2024-06-15 22:25:44,391][1652491] Updated weights for policy 0, policy_version 891028 (0.0012) [2024-06-15 22:25:45,317][1652491] Updated weights for policy 0, policy_version 891070 (0.0014) [2024-06-15 22:25:45,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 45875.3, 300 sec: 46986.0). Total num frames: 1824915456. Throughput: 0: 11673.6. Samples: 456289280. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:25:45,955][1648985] Avg episode reward: [(0, '146.240')] [2024-06-15 22:25:48,175][1652491] Updated weights for policy 0, policy_version 891120 (0.0015) [2024-06-15 22:25:50,233][1652491] Updated weights for policy 0, policy_version 891153 (0.0012) [2024-06-15 22:25:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1825177600. Throughput: 0: 11855.6. Samples: 456365568. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:25:50,956][1648985] Avg episode reward: [(0, '161.880')] [2024-06-15 22:25:51,606][1652491] Updated weights for policy 0, policy_version 891203 (0.0015) [2024-06-15 22:25:54,619][1652491] Updated weights for policy 0, policy_version 891267 (0.0013) [2024-06-15 22:25:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1825439744. Throughput: 0: 11741.9. Samples: 456400384. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:25:55,956][1648985] Avg episode reward: [(0, '163.090')] [2024-06-15 22:25:56,805][1652491] Updated weights for policy 0, policy_version 891329 (0.0031) [2024-06-15 22:26:00,051][1651469] Signal inference workers to stop experience collection... 
(46450 times) [2024-06-15 22:26:00,130][1652491] Updated weights for policy 0, policy_version 891396 (0.0012) [2024-06-15 22:26:00,175][1652491] InferenceWorker_p0-w0: stopping experience collection (46450 times) [2024-06-15 22:26:00,285][1651469] Signal inference workers to resume experience collection... (46450 times) [2024-06-15 22:26:00,290][1652491] InferenceWorker_p0-w0: resuming experience collection (46450 times) [2024-06-15 22:26:00,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 1825636352. Throughput: 0: 11923.9. Samples: 456476160. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:26:00,955][1648985] Avg episode reward: [(0, '165.640')] [2024-06-15 22:26:01,342][1652491] Updated weights for policy 0, policy_version 891445 (0.0010) [2024-06-15 22:26:03,433][1652491] Updated weights for policy 0, policy_version 891491 (0.0012) [2024-06-15 22:26:05,909][1652491] Updated weights for policy 0, policy_version 891552 (0.0017) [2024-06-15 22:26:05,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 1825898496. Throughput: 0: 11844.2. Samples: 456547328. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:26:05,956][1648985] Avg episode reward: [(0, '168.760')] [2024-06-15 22:26:08,276][1652491] Updated weights for policy 0, policy_version 891600 (0.0013) [2024-06-15 22:26:09,249][1652491] Updated weights for policy 0, policy_version 891648 (0.0015) [2024-06-15 22:26:10,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 1826095104. Throughput: 0: 11923.9. Samples: 456583680. 
Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:26:10,956][1648985] Avg episode reward: [(0, '165.760')] [2024-06-15 22:26:12,179][1652491] Updated weights for policy 0, policy_version 891701 (0.0012) [2024-06-15 22:26:13,730][1652491] Updated weights for policy 0, policy_version 891731 (0.0015) [2024-06-15 22:26:14,559][1652491] Updated weights for policy 0, policy_version 891776 (0.0013) [2024-06-15 22:26:15,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 1826390016. Throughput: 0: 12140.1. Samples: 456659456. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:26:15,955][1648985] Avg episode reward: [(0, '171.900')] [2024-06-15 22:26:16,484][1652491] Updated weights for policy 0, policy_version 891832 (0.0021) [2024-06-15 22:26:19,717][1652491] Updated weights for policy 0, policy_version 891888 (0.0013) [2024-06-15 22:26:20,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 1826619392. Throughput: 0: 12140.0. Samples: 456732160. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:26:20,956][1648985] Avg episode reward: [(0, '168.610')] [2024-06-15 22:26:21,975][1652491] Updated weights for policy 0, policy_version 891937 (0.0018) [2024-06-15 22:26:22,647][1652491] Updated weights for policy 0, policy_version 891968 (0.0011) [2024-06-15 22:26:25,224][1652491] Updated weights for policy 0, policy_version 892027 (0.0015) [2024-06-15 22:26:25,971][1648985] Fps is (10 sec: 49075.3, 60 sec: 48047.3, 300 sec: 47538.9). Total num frames: 1826881536. Throughput: 0: 12226.9. Samples: 456772608. 
Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:26:25,971][1648985] Avg episode reward: [(0, '176.950')] [2024-06-15 22:26:26,782][1652491] Updated weights for policy 0, policy_version 892086 (0.0014) [2024-06-15 22:26:29,719][1652491] Updated weights for policy 0, policy_version 892128 (0.0011) [2024-06-15 22:26:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1827143680. Throughput: 0: 12470.0. Samples: 456850432. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:26:30,956][1648985] Avg episode reward: [(0, '162.950')] [2024-06-15 22:26:32,118][1652491] Updated weights for policy 0, policy_version 892176 (0.0014) [2024-06-15 22:26:33,156][1652491] Updated weights for policy 0, policy_version 892224 (0.0013) [2024-06-15 22:26:35,955][1648985] Fps is (10 sec: 52510.9, 60 sec: 50244.1, 300 sec: 47985.7). Total num frames: 1827405824. Throughput: 0: 12162.9. Samples: 456912896. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:26:35,955][1648985] Avg episode reward: [(0, '173.070')] [2024-06-15 22:26:37,774][1652491] Updated weights for policy 0, policy_version 892320 (0.0015) [2024-06-15 22:26:40,804][1652491] Updated weights for policy 0, policy_version 892355 (0.0012) [2024-06-15 22:26:40,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 1827536896. Throughput: 0: 12208.3. Samples: 456949760. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:26:40,956][1648985] Avg episode reward: [(0, '168.690')] [2024-06-15 22:26:42,119][1652491] Updated weights for policy 0, policy_version 892416 (0.0013) [2024-06-15 22:26:43,973][1651469] Signal inference workers to stop experience collection... (46500 times) [2024-06-15 22:26:44,032][1652491] InferenceWorker_p0-w0: stopping experience collection (46500 times) [2024-06-15 22:26:44,207][1651469] Signal inference workers to resume experience collection... 
(46500 times) [2024-06-15 22:26:44,207][1652491] InferenceWorker_p0-w0: resuming experience collection (46500 times) [2024-06-15 22:26:44,506][1652491] Updated weights for policy 0, policy_version 892475 (0.0014) [2024-06-15 22:26:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 1827831808. Throughput: 0: 12117.3. Samples: 457021440. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:26:45,956][1648985] Avg episode reward: [(0, '155.340')] [2024-06-15 22:26:46,924][1652491] Updated weights for policy 0, policy_version 892532 (0.0017) [2024-06-15 22:26:48,801][1652491] Updated weights for policy 0, policy_version 892576 (0.0014) [2024-06-15 22:26:50,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 1828061184. Throughput: 0: 12094.6. Samples: 457091584. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:26:50,956][1648985] Avg episode reward: [(0, '132.320')] [2024-06-15 22:26:51,958][1652491] Updated weights for policy 0, policy_version 892612 (0.0012) [2024-06-15 22:26:53,371][1652491] Updated weights for policy 0, policy_version 892672 (0.0037) [2024-06-15 22:26:55,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 47513.5, 300 sec: 47763.5). Total num frames: 1828290560. Throughput: 0: 11992.2. Samples: 457123328. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:26:55,956][1648985] Avg episode reward: [(0, '149.530')] [2024-06-15 22:26:56,092][1652491] Updated weights for policy 0, policy_version 892727 (0.0014) [2024-06-15 22:26:56,229][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000892736_1828323328.pth... 
[2024-06-15 22:26:56,274][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000887104_1816788992.pth [2024-06-15 22:26:58,305][1652491] Updated weights for policy 0, policy_version 892791 (0.0018) [2024-06-15 22:27:00,256][1652491] Updated weights for policy 0, policy_version 892853 (0.0014) [2024-06-15 22:27:00,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 49151.9, 300 sec: 47985.9). Total num frames: 1828585472. Throughput: 0: 11821.5. Samples: 457191424. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:27:00,956][1648985] Avg episode reward: [(0, '174.170')] [2024-06-15 22:27:04,112][1652491] Updated weights for policy 0, policy_version 892898 (0.0030) [2024-06-15 22:27:05,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46967.4, 300 sec: 47541.4). Total num frames: 1828716544. Throughput: 0: 11969.4. Samples: 457270784. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:27:05,956][1648985] Avg episode reward: [(0, '179.120')] [2024-06-15 22:27:06,248][1652491] Updated weights for policy 0, policy_version 892947 (0.0012) [2024-06-15 22:27:07,612][1652491] Updated weights for policy 0, policy_version 892997 (0.0067) [2024-06-15 22:27:10,214][1652491] Updated weights for policy 0, policy_version 893057 (0.0014) [2024-06-15 22:27:10,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 1829044224. Throughput: 0: 11757.3. Samples: 457301504. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:27:10,956][1648985] Avg episode reward: [(0, '160.600')] [2024-06-15 22:27:11,400][1652491] Updated weights for policy 0, policy_version 893120 (0.0012) [2024-06-15 22:27:15,508][1652491] Updated weights for policy 0, policy_version 893172 (0.0078) [2024-06-15 22:27:15,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1829240832. Throughput: 0: 11753.2. Samples: 457379328. 
Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:27:15,956][1648985] Avg episode reward: [(0, '146.020')] [2024-06-15 22:27:16,437][1652491] Updated weights for policy 0, policy_version 893203 (0.0010) [2024-06-15 22:27:17,300][1652491] Updated weights for policy 0, policy_version 893242 (0.0012) [2024-06-15 22:27:18,772][1652491] Updated weights for policy 0, policy_version 893280 (0.0012) [2024-06-15 22:27:20,493][1652491] Updated weights for policy 0, policy_version 893329 (0.0013) [2024-06-15 22:27:20,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 1829568512. Throughput: 0: 12026.3. Samples: 457454080. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:27:20,956][1648985] Avg episode reward: [(0, '129.320')] [2024-06-15 22:27:25,539][1652491] Updated weights for policy 0, policy_version 893377 (0.0024) [2024-06-15 22:27:25,955][1648985] Fps is (10 sec: 42597.3, 60 sec: 46433.2, 300 sec: 47652.5). Total num frames: 1829666816. Throughput: 0: 12003.5. Samples: 457489920. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:27:25,956][1648985] Avg episode reward: [(0, '136.250')] [2024-06-15 22:27:26,406][1652491] Updated weights for policy 0, policy_version 893430 (0.0013) [2024-06-15 22:27:27,302][1652491] Updated weights for policy 0, policy_version 893463 (0.0017) [2024-06-15 22:27:28,916][1651469] Signal inference workers to stop experience collection... (46550 times) [2024-06-15 22:27:28,973][1652491] Updated weights for policy 0, policy_version 893506 (0.0012) [2024-06-15 22:27:29,022][1652491] InferenceWorker_p0-w0: stopping experience collection (46550 times) [2024-06-15 22:27:29,250][1651469] Signal inference workers to resume experience collection... 
(46550 times) [2024-06-15 22:27:29,250][1652491] InferenceWorker_p0-w0: resuming experience collection (46550 times) [2024-06-15 22:27:30,701][1652491] Updated weights for policy 0, policy_version 893569 (0.0123) [2024-06-15 22:27:30,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1830027264. Throughput: 0: 12140.1. Samples: 457567744. Policy #0 lag: (min: 15.0, avg: 125.1, max: 271.0) [2024-06-15 22:27:30,956][1648985] Avg episode reward: [(0, '137.450')] [2024-06-15 22:27:32,168][1652491] Updated weights for policy 0, policy_version 893632 (0.0016) [2024-06-15 22:27:35,955][1648985] Fps is (10 sec: 49153.2, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1830158336. Throughput: 0: 12071.8. Samples: 457634816. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:27:35,956][1648985] Avg episode reward: [(0, '151.450')] [2024-06-15 22:27:37,436][1652491] Updated weights for policy 0, policy_version 893681 (0.0013) [2024-06-15 22:27:38,876][1652491] Updated weights for policy 0, policy_version 893751 (0.0015) [2024-06-15 22:27:40,579][1652491] Updated weights for policy 0, policy_version 893778 (0.0012) [2024-06-15 22:27:40,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 49151.9, 300 sec: 48096.7). Total num frames: 1830486016. Throughput: 0: 12162.8. Samples: 457670656. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:27:40,956][1648985] Avg episode reward: [(0, '180.650')] [2024-06-15 22:27:42,272][1652491] Updated weights for policy 0, policy_version 893843 (0.0016) [2024-06-15 22:27:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 48318.9). Total num frames: 1830682624. Throughput: 0: 12242.5. Samples: 457742336. 
Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:27:45,955][1648985] Avg episode reward: [(0, '183.280')] [2024-06-15 22:27:47,555][1652491] Updated weights for policy 0, policy_version 893920 (0.0019) [2024-06-15 22:27:48,221][1652491] Updated weights for policy 0, policy_version 893952 (0.0013) [2024-06-15 22:27:50,017][1652491] Updated weights for policy 0, policy_version 894010 (0.0013) [2024-06-15 22:27:50,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 48605.8, 300 sec: 48096.8). Total num frames: 1830977536. Throughput: 0: 12083.2. Samples: 457814528. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:27:50,956][1648985] Avg episode reward: [(0, '156.030')] [2024-06-15 22:27:51,452][1652491] Updated weights for policy 0, policy_version 894051 (0.0028) [2024-06-15 22:27:52,852][1652491] Updated weights for policy 0, policy_version 894083 (0.0012) [2024-06-15 22:27:54,044][1652491] Updated weights for policy 0, policy_version 894139 (0.0014) [2024-06-15 22:27:55,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 48605.8, 300 sec: 48430.4). Total num frames: 1831206912. Throughput: 0: 12105.9. Samples: 457846272. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:27:55,956][1648985] Avg episode reward: [(0, '153.140')] [2024-06-15 22:27:59,887][1652491] Updated weights for policy 0, policy_version 894213 (0.0125) [2024-06-15 22:28:00,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 48059.8, 300 sec: 48207.9). Total num frames: 1831469056. Throughput: 0: 12151.5. Samples: 457926144. 
Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:28:00,956][1648985] Avg episode reward: [(0, '146.460')] [2024-06-15 22:28:01,224][1652491] Updated weights for policy 0, policy_version 894280 (0.0015) [2024-06-15 22:28:02,134][1652491] Updated weights for policy 0, policy_version 894333 (0.0123) [2024-06-15 22:28:04,978][1652491] Updated weights for policy 0, policy_version 894397 (0.0125) [2024-06-15 22:28:05,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 50244.2, 300 sec: 48430.0). Total num frames: 1831731200. Throughput: 0: 12026.3. Samples: 457995264. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:28:05,956][1648985] Avg episode reward: [(0, '156.720')] [2024-06-15 22:28:09,709][1652491] Updated weights for policy 0, policy_version 894458 (0.0174) [2024-06-15 22:28:10,955][1648985] Fps is (10 sec: 42597.2, 60 sec: 47513.4, 300 sec: 48096.7). Total num frames: 1831895040. Throughput: 0: 12151.5. Samples: 458036736. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:28:10,956][1648985] Avg episode reward: [(0, '175.480')] [2024-06-15 22:28:11,511][1652491] Updated weights for policy 0, policy_version 894516 (0.0013) [2024-06-15 22:28:12,160][1651469] Signal inference workers to stop experience collection... (46600 times) [2024-06-15 22:28:12,185][1652491] InferenceWorker_p0-w0: stopping experience collection (46600 times) [2024-06-15 22:28:12,384][1651469] Signal inference workers to resume experience collection... (46600 times) [2024-06-15 22:28:12,385][1652491] InferenceWorker_p0-w0: resuming experience collection (46600 times) [2024-06-15 22:28:12,916][1652491] Updated weights for policy 0, policy_version 894564 (0.0012) [2024-06-15 22:28:15,281][1652491] Updated weights for policy 0, policy_version 894647 (0.0043) [2024-06-15 22:28:15,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 1832255488. Throughput: 0: 11912.6. Samples: 458103808. 
Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:28:15,955][1648985] Avg episode reward: [(0, '166.560')] [2024-06-15 22:28:20,102][1652491] Updated weights for policy 0, policy_version 894720 (0.0091) [2024-06-15 22:28:20,955][1648985] Fps is (10 sec: 52430.1, 60 sec: 47513.6, 300 sec: 48207.8). Total num frames: 1832419328. Throughput: 0: 12197.0. Samples: 458183680. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:28:20,956][1648985] Avg episode reward: [(0, '161.470')] [2024-06-15 22:28:21,855][1652491] Updated weights for policy 0, policy_version 894784 (0.0013) [2024-06-15 22:28:24,579][1652491] Updated weights for policy 0, policy_version 894838 (0.0011) [2024-06-15 22:28:25,748][1652491] Updated weights for policy 0, policy_version 894880 (0.0012) [2024-06-15 22:28:25,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 50790.5, 300 sec: 48207.8). Total num frames: 1832714240. Throughput: 0: 12162.9. Samples: 458217984. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:28:25,956][1648985] Avg episode reward: [(0, '160.420')] [2024-06-15 22:28:29,404][1652491] Updated weights for policy 0, policy_version 894928 (0.0013) [2024-06-15 22:28:30,287][1652491] Updated weights for policy 0, policy_version 894968 (0.0012) [2024-06-15 22:28:30,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 1832943616. Throughput: 0: 12288.0. Samples: 458295296. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:28:30,956][1648985] Avg episode reward: [(0, '139.550')] [2024-06-15 22:28:31,518][1652491] Updated weights for policy 0, policy_version 895024 (0.0013) [2024-06-15 22:28:35,659][1652491] Updated weights for policy 0, policy_version 895097 (0.0013) [2024-06-15 22:28:35,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 50244.3, 300 sec: 48096.8). Total num frames: 1833172992. Throughput: 0: 12208.4. Samples: 458363904. 
Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:28:35,956][1648985] Avg episode reward: [(0, '145.380')] [2024-06-15 22:28:37,027][1652491] Updated weights for policy 0, policy_version 895141 (0.0013) [2024-06-15 22:28:39,911][1652491] Updated weights for policy 0, policy_version 895184 (0.0013) [2024-06-15 22:28:40,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 49152.1, 300 sec: 48430.0). Total num frames: 1833435136. Throughput: 0: 12390.4. Samples: 458403840. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:28:40,956][1648985] Avg episode reward: [(0, '142.810')] [2024-06-15 22:28:41,947][1652491] Updated weights for policy 0, policy_version 895236 (0.0014) [2024-06-15 22:28:45,386][1652491] Updated weights for policy 0, policy_version 895297 (0.0018) [2024-06-15 22:28:45,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 48605.7, 300 sec: 47985.7). Total num frames: 1833598976. Throughput: 0: 12174.2. Samples: 458473984. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:28:45,956][1648985] Avg episode reward: [(0, '156.170')] [2024-06-15 22:28:47,491][1652491] Updated weights for policy 0, policy_version 895363 (0.0011) [2024-06-15 22:28:48,725][1652491] Updated weights for policy 0, policy_version 895424 (0.0011) [2024-06-15 22:28:50,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 47513.6, 300 sec: 48318.9). Total num frames: 1833828352. Throughput: 0: 12174.3. Samples: 458543104. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:28:50,956][1648985] Avg episode reward: [(0, '144.970')] [2024-06-15 22:28:52,631][1652491] Updated weights for policy 0, policy_version 895488 (0.0022) [2024-06-15 22:28:54,510][1652491] Updated weights for policy 0, policy_version 895551 (0.0013) [2024-06-15 22:28:55,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1834090496. Throughput: 0: 11946.7. Samples: 458574336. 
Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:28:55,956][1648985] Avg episode reward: [(0, '156.700')] [2024-06-15 22:28:55,959][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000895552_1834090496.pth... [2024-06-15 22:28:56,007][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000889920_1822556160.pth [2024-06-15 22:28:57,807][1651469] Signal inference workers to stop experience collection... (46650 times) [2024-06-15 22:28:57,840][1652491] InferenceWorker_p0-w0: stopping experience collection (46650 times) [2024-06-15 22:28:58,099][1651469] Signal inference workers to resume experience collection... (46650 times) [2024-06-15 22:28:58,100][1652491] InferenceWorker_p0-w0: resuming experience collection (46650 times) [2024-06-15 22:28:58,622][1652491] Updated weights for policy 0, policy_version 895608 (0.0015) [2024-06-15 22:28:59,703][1652491] Updated weights for policy 0, policy_version 895648 (0.0011) [2024-06-15 22:29:00,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1834352640. Throughput: 0: 11901.1. Samples: 458639360. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:29:00,956][1648985] Avg episode reward: [(0, '143.340')] [2024-06-15 22:29:02,664][1652491] Updated weights for policy 0, policy_version 895684 (0.0012) [2024-06-15 22:29:03,803][1652491] Updated weights for policy 0, policy_version 895744 (0.0014) [2024-06-15 22:29:05,571][1652491] Updated weights for policy 0, policy_version 895808 (0.0012) [2024-06-15 22:29:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.9, 300 sec: 48207.8). Total num frames: 1834614784. Throughput: 0: 11719.1. Samples: 458711040. 
Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:29:05,955][1648985] Avg episode reward: [(0, '155.810')] [2024-06-15 22:29:10,351][1652491] Updated weights for policy 0, policy_version 895872 (0.0052) [2024-06-15 22:29:10,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 1834778624. Throughput: 0: 11912.5. Samples: 458754048. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:29:10,956][1648985] Avg episode reward: [(0, '151.240')] [2024-06-15 22:29:13,880][1652491] Updated weights for policy 0, policy_version 895938 (0.0013) [2024-06-15 22:29:15,314][1652491] Updated weights for policy 0, policy_version 896000 (0.0013) [2024-06-15 22:29:15,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 45875.1, 300 sec: 47985.7). Total num frames: 1835008000. Throughput: 0: 11616.7. Samples: 458818048. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:29:15,956][1648985] Avg episode reward: [(0, '148.700')] [2024-06-15 22:29:20,830][1652491] Updated weights for policy 0, policy_version 896068 (0.0014) [2024-06-15 22:29:20,955][1648985] Fps is (10 sec: 36045.1, 60 sec: 45329.0, 300 sec: 47541.4). Total num frames: 1835139072. Throughput: 0: 11662.2. Samples: 458888704. Policy #0 lag: (min: 109.0, avg: 227.0, max: 319.0) [2024-06-15 22:29:20,956][1648985] Avg episode reward: [(0, '152.250')] [2024-06-15 22:29:22,442][1652491] Updated weights for policy 0, policy_version 896131 (0.0020) [2024-06-15 22:29:23,729][1652491] Updated weights for policy 0, policy_version 896189 (0.0024) [2024-06-15 22:29:25,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.3, 300 sec: 47763.5). Total num frames: 1835466752. Throughput: 0: 11446.0. Samples: 458918912. 
Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:29:25,956][1648985] Avg episode reward: [(0, '144.080')] [2024-06-15 22:29:28,016][1652491] Updated weights for policy 0, policy_version 896272 (0.0014) [2024-06-15 22:29:29,017][1652491] Updated weights for policy 0, policy_version 896320 (0.0014) [2024-06-15 22:29:30,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45329.1, 300 sec: 47541.4). Total num frames: 1835663360. Throughput: 0: 11457.5. Samples: 458989568. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:29:30,956][1648985] Avg episode reward: [(0, '169.950')] [2024-06-15 22:29:33,565][1652491] Updated weights for policy 0, policy_version 896384 (0.0092) [2024-06-15 22:29:34,816][1652491] Updated weights for policy 0, policy_version 896435 (0.0012) [2024-06-15 22:29:35,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 47652.4). Total num frames: 1835925504. Throughput: 0: 11411.9. Samples: 459056640. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:29:35,956][1648985] Avg episode reward: [(0, '165.920')] [2024-06-15 22:29:37,556][1652491] Updated weights for policy 0, policy_version 896502 (0.0013) [2024-06-15 22:29:39,565][1651469] Signal inference workers to stop experience collection... (46700 times) [2024-06-15 22:29:39,656][1652491] InferenceWorker_p0-w0: stopping experience collection (46700 times) [2024-06-15 22:29:39,806][1651469] Signal inference workers to resume experience collection... (46700 times) [2024-06-15 22:29:39,807][1652491] InferenceWorker_p0-w0: resuming experience collection (46700 times) [2024-06-15 22:29:39,813][1652491] Updated weights for policy 0, policy_version 896560 (0.0041) [2024-06-15 22:29:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1836187648. Throughput: 0: 11514.3. Samples: 459092480. 
Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:29:40,956][1648985] Avg episode reward: [(0, '159.780')] [2024-06-15 22:29:44,142][1652491] Updated weights for policy 0, policy_version 896616 (0.0014) [2024-06-15 22:29:45,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 46967.6, 300 sec: 47874.6). Total num frames: 1836417024. Throughput: 0: 11594.0. Samples: 459161088. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:29:45,955][1648985] Avg episode reward: [(0, '148.100')] [2024-06-15 22:29:46,041][1652491] Updated weights for policy 0, policy_version 896698 (0.0012) [2024-06-15 22:29:48,709][1652491] Updated weights for policy 0, policy_version 896752 (0.0013) [2024-06-15 22:29:50,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 1836646400. Throughput: 0: 11593.9. Samples: 459232768. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:29:50,956][1648985] Avg episode reward: [(0, '152.420')] [2024-06-15 22:29:51,402][1652491] Updated weights for policy 0, policy_version 896824 (0.0021) [2024-06-15 22:29:55,297][1652491] Updated weights for policy 0, policy_version 896880 (0.0016) [2024-06-15 22:29:55,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 47763.5). Total num frames: 1836843008. Throughput: 0: 11525.7. Samples: 459272704. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:29:55,955][1648985] Avg episode reward: [(0, '163.550')] [2024-06-15 22:29:57,438][1652491] Updated weights for policy 0, policy_version 896958 (0.0013) [2024-06-15 22:30:00,542][1652491] Updated weights for policy 0, policy_version 897010 (0.0013) [2024-06-15 22:30:00,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1837105152. Throughput: 0: 11503.0. Samples: 459335680. 
Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:30:00,956][1648985] Avg episode reward: [(0, '162.100')] [2024-06-15 22:30:02,190][1652491] Updated weights for policy 0, policy_version 897057 (0.0012) [2024-06-15 22:30:02,712][1652491] Updated weights for policy 0, policy_version 897088 (0.0037) [2024-06-15 22:30:05,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 47763.5). Total num frames: 1837301760. Throughput: 0: 11685.0. Samples: 459414528. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:30:05,956][1648985] Avg episode reward: [(0, '156.840')] [2024-06-15 22:30:06,272][1652491] Updated weights for policy 0, policy_version 897140 (0.0013) [2024-06-15 22:30:08,071][1652491] Updated weights for policy 0, policy_version 897210 (0.0013) [2024-06-15 22:30:10,955][1648985] Fps is (10 sec: 42597.0, 60 sec: 45875.1, 300 sec: 47652.4). Total num frames: 1837531136. Throughput: 0: 11616.6. Samples: 459441664. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:30:10,956][1648985] Avg episode reward: [(0, '141.520')] [2024-06-15 22:30:11,609][1652491] Updated weights for policy 0, policy_version 897264 (0.0012) [2024-06-15 22:30:12,622][1652491] Updated weights for policy 0, policy_version 897300 (0.0012) [2024-06-15 22:30:15,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 45875.2, 300 sec: 47541.3). Total num frames: 1837760512. Throughput: 0: 11696.3. Samples: 459515904. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:30:15,956][1648985] Avg episode reward: [(0, '130.550')] [2024-06-15 22:30:16,601][1652491] Updated weights for policy 0, policy_version 897362 (0.0013) [2024-06-15 22:30:18,240][1652491] Updated weights for policy 0, policy_version 897426 (0.0034) [2024-06-15 22:30:19,283][1652491] Updated weights for policy 0, policy_version 897471 (0.0010) [2024-06-15 22:30:20,955][1648985] Fps is (10 sec: 49153.6, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1838022656. 
Throughput: 0: 11821.5. Samples: 459588608. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:30:20,956][1648985] Avg episode reward: [(0, '135.290')] [2024-06-15 22:30:22,532][1652491] Updated weights for policy 0, policy_version 897507 (0.0013) [2024-06-15 22:30:22,879][1651469] Signal inference workers to stop experience collection... (46750 times) [2024-06-15 22:30:22,927][1652491] InferenceWorker_p0-w0: stopping experience collection (46750 times) [2024-06-15 22:30:23,108][1651469] Signal inference workers to resume experience collection... (46750 times) [2024-06-15 22:30:23,110][1652491] InferenceWorker_p0-w0: resuming experience collection (46750 times) [2024-06-15 22:30:24,327][1652491] Updated weights for policy 0, policy_version 897584 (0.0153) [2024-06-15 22:30:25,978][1648985] Fps is (10 sec: 52307.6, 60 sec: 46949.3, 300 sec: 47537.6). Total num frames: 1838284800. Throughput: 0: 11724.4. Samples: 459620352. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:30:25,979][1648985] Avg episode reward: [(0, '139.840')] [2024-06-15 22:30:27,623][1652491] Updated weights for policy 0, policy_version 897620 (0.0012) [2024-06-15 22:30:28,900][1652491] Updated weights for policy 0, policy_version 897668 (0.0014) [2024-06-15 22:30:30,287][1652491] Updated weights for policy 0, policy_version 897723 (0.0130) [2024-06-15 22:30:30,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1838546944. Throughput: 0: 11821.5. Samples: 459693056. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:30:30,955][1648985] Avg episode reward: [(0, '174.680')] [2024-06-15 22:30:33,316][1652491] Updated weights for policy 0, policy_version 897776 (0.0037) [2024-06-15 22:30:34,783][1652491] Updated weights for policy 0, policy_version 897848 (0.0012) [2024-06-15 22:30:35,955][1648985] Fps is (10 sec: 52551.1, 60 sec: 48059.8, 300 sec: 47763.5). Total num frames: 1838809088. Throughput: 0: 11867.0. 
Samples: 459766784. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:30:35,956][1648985] Avg episode reward: [(0, '175.920')] [2024-06-15 22:30:39,233][1652491] Updated weights for policy 0, policy_version 897912 (0.0161) [2024-06-15 22:30:40,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 1839005696. Throughput: 0: 11832.9. Samples: 459805184. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:30:40,956][1648985] Avg episode reward: [(0, '177.580')] [2024-06-15 22:30:41,088][1652491] Updated weights for policy 0, policy_version 897968 (0.0033) [2024-06-15 22:30:43,116][1652491] Updated weights for policy 0, policy_version 897990 (0.0010) [2024-06-15 22:30:44,431][1652491] Updated weights for policy 0, policy_version 898049 (0.0122) [2024-06-15 22:30:45,439][1652491] Updated weights for policy 0, policy_version 898099 (0.0031) [2024-06-15 22:30:45,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48605.7, 300 sec: 47985.7). Total num frames: 1839333376. Throughput: 0: 11946.6. Samples: 459873280. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:30:45,956][1648985] Avg episode reward: [(0, '165.160')] [2024-06-15 22:30:49,917][1652491] Updated weights for policy 0, policy_version 898160 (0.0012) [2024-06-15 22:30:50,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 46967.4, 300 sec: 47541.4). Total num frames: 1839464448. Throughput: 0: 11878.4. Samples: 459949056. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:30:50,956][1648985] Avg episode reward: [(0, '165.240')] [2024-06-15 22:30:51,721][1652491] Updated weights for policy 0, policy_version 898180 (0.0024) [2024-06-15 22:30:53,028][1652491] Updated weights for policy 0, policy_version 898236 (0.0014) [2024-06-15 22:30:55,322][1652491] Updated weights for policy 0, policy_version 898306 (0.0014) [2024-06-15 22:30:55,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 49151.9, 300 sec: 47985.6). 
Total num frames: 1839792128. Throughput: 0: 12049.1. Samples: 459983872. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:30:55,956][1648985] Avg episode reward: [(0, '159.500')] [2024-06-15 22:30:56,221][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000898352_1839824896.pth... [2024-06-15 22:30:56,279][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000892736_1828323328.pth [2024-06-15 22:30:56,283][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000898352_1839824896.pth [2024-06-15 22:31:00,902][1652491] Updated weights for policy 0, policy_version 898369 (0.0016) [2024-06-15 22:31:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 1839857664. Throughput: 0: 11969.4. Samples: 460054528. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:31:00,956][1648985] Avg episode reward: [(0, '171.910')] [2024-06-15 22:31:02,112][1652491] Updated weights for policy 0, policy_version 898426 (0.0099) [2024-06-15 22:31:03,877][1652491] Updated weights for policy 0, policy_version 898486 (0.0014) [2024-06-15 22:31:05,955][1648985] Fps is (10 sec: 36045.3, 60 sec: 47513.6, 300 sec: 47652.5). Total num frames: 1840152576. Throughput: 0: 11901.2. Samples: 460124160. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:31:05,956][1648985] Avg episode reward: [(0, '177.970')] [2024-06-15 22:31:06,068][1651469] Signal inference workers to stop experience collection... (46800 times) [2024-06-15 22:31:06,118][1652491] InferenceWorker_p0-w0: stopping experience collection (46800 times) [2024-06-15 22:31:06,227][1651469] Signal inference workers to resume experience collection... 
(46800 times) [2024-06-15 22:31:06,228][1652491] InferenceWorker_p0-w0: resuming experience collection (46800 times) [2024-06-15 22:31:06,673][1652491] Updated weights for policy 0, policy_version 898563 (0.0087) [2024-06-15 22:31:07,991][1652491] Updated weights for policy 0, policy_version 898623 (0.0078) [2024-06-15 22:31:10,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 47513.9, 300 sec: 47430.3). Total num frames: 1840381952. Throughput: 0: 11816.3. Samples: 460151808. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:31:10,955][1648985] Avg episode reward: [(0, '165.270')] [2024-06-15 22:31:13,920][1652491] Updated weights for policy 0, policy_version 898688 (0.0014) [2024-06-15 22:31:15,054][1652491] Updated weights for policy 0, policy_version 898743 (0.0030) [2024-06-15 22:31:15,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1840644096. Throughput: 0: 11810.1. Samples: 460224512. Policy #0 lag: (min: 25.0, avg: 148.5, max: 281.0) [2024-06-15 22:31:15,956][1648985] Avg episode reward: [(0, '148.540')] [2024-06-15 22:31:17,037][1652491] Updated weights for policy 0, policy_version 898784 (0.0012) [2024-06-15 22:31:19,069][1652491] Updated weights for policy 0, policy_version 898866 (0.0130) [2024-06-15 22:31:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47543.9). Total num frames: 1840906240. Throughput: 0: 11650.8. Samples: 460291072. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:31:20,956][1648985] Avg episode reward: [(0, '143.700')] [2024-06-15 22:31:24,052][1652491] Updated weights for policy 0, policy_version 898901 (0.0013) [2024-06-15 22:31:25,636][1652491] Updated weights for policy 0, policy_version 898983 (0.0015) [2024-06-15 22:31:25,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 47532.0, 300 sec: 47430.3). Total num frames: 1841135616. Throughput: 0: 11764.6. Samples: 460334592. 
Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:31:25,956][1648985] Avg episode reward: [(0, '155.110')] [2024-06-15 22:31:28,294][1652491] Updated weights for policy 0, policy_version 899040 (0.0014) [2024-06-15 22:31:29,851][1652491] Updated weights for policy 0, policy_version 899090 (0.0013) [2024-06-15 22:31:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1841430528. Throughput: 0: 11685.0. Samples: 460399104. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:31:30,956][1648985] Avg episode reward: [(0, '163.450')] [2024-06-15 22:31:35,376][1652491] Updated weights for policy 0, policy_version 899157 (0.0012) [2024-06-15 22:31:35,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 45329.0, 300 sec: 47430.3). Total num frames: 1841528832. Throughput: 0: 11684.9. Samples: 460474880. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:31:35,956][1648985] Avg episode reward: [(0, '176.600')] [2024-06-15 22:31:36,681][1652491] Updated weights for policy 0, policy_version 899232 (0.0013) [2024-06-15 22:31:39,269][1652491] Updated weights for policy 0, policy_version 899268 (0.0014) [2024-06-15 22:31:40,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 1841823744. Throughput: 0: 11707.8. Samples: 460510720. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:31:40,956][1648985] Avg episode reward: [(0, '169.130')] [2024-06-15 22:31:41,879][1652491] Updated weights for policy 0, policy_version 899363 (0.0013) [2024-06-15 22:31:45,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 43690.8, 300 sec: 47097.1). Total num frames: 1841954816. Throughput: 0: 11491.6. Samples: 460571648. 
Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:31:45,956][1648985] Avg episode reward: [(0, '153.150')] [2024-06-15 22:31:46,905][1652491] Updated weights for policy 0, policy_version 899408 (0.0016) [2024-06-15 22:31:47,647][1651469] Signal inference workers to stop experience collection... (46850 times) [2024-06-15 22:31:47,703][1652491] InferenceWorker_p0-w0: stopping experience collection (46850 times) [2024-06-15 22:31:47,843][1651469] Signal inference workers to resume experience collection... (46850 times) [2024-06-15 22:31:47,844][1652491] InferenceWorker_p0-w0: resuming experience collection (46850 times) [2024-06-15 22:31:48,162][1652491] Updated weights for policy 0, policy_version 899472 (0.0014) [2024-06-15 22:31:50,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 1842216960. Throughput: 0: 11605.3. Samples: 460646400. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:31:50,956][1648985] Avg episode reward: [(0, '126.150')] [2024-06-15 22:31:51,402][1652491] Updated weights for policy 0, policy_version 899539 (0.0012) [2024-06-15 22:31:53,885][1652491] Updated weights for policy 0, policy_version 899637 (0.0014) [2024-06-15 22:31:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 44783.0, 300 sec: 47097.1). Total num frames: 1842479104. Throughput: 0: 11446.0. Samples: 460666880. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:31:55,956][1648985] Avg episode reward: [(0, '131.610')] [2024-06-15 22:31:58,852][1652491] Updated weights for policy 0, policy_version 899699 (0.0011) [2024-06-15 22:31:59,975][1652491] Updated weights for policy 0, policy_version 899760 (0.0014) [2024-06-15 22:32:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1842741248. Throughput: 0: 11639.5. Samples: 460748288. 
Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:32:00,956][1648985] Avg episode reward: [(0, '146.170')] [2024-06-15 22:32:02,527][1652491] Updated weights for policy 0, policy_version 899797 (0.0013) [2024-06-15 22:32:04,083][1652491] Updated weights for policy 0, policy_version 899864 (0.0012) [2024-06-15 22:32:04,985][1652491] Updated weights for policy 0, policy_version 899900 (0.0032) [2024-06-15 22:32:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 47319.2). Total num frames: 1843003392. Throughput: 0: 11741.9. Samples: 460819456. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:32:05,956][1648985] Avg episode reward: [(0, '169.840')] [2024-06-15 22:32:09,478][1652491] Updated weights for policy 0, policy_version 899954 (0.0152) [2024-06-15 22:32:10,917][1652491] Updated weights for policy 0, policy_version 900032 (0.0020) [2024-06-15 22:32:10,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 48059.5, 300 sec: 47541.3). Total num frames: 1843265536. Throughput: 0: 11730.4. Samples: 460862464. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:32:10,956][1648985] Avg episode reward: [(0, '179.110')] [2024-06-15 22:32:14,197][1652491] Updated weights for policy 0, policy_version 900084 (0.0014) [2024-06-15 22:32:15,861][1652491] Updated weights for policy 0, policy_version 900150 (0.0112) [2024-06-15 22:32:15,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.7, 300 sec: 47208.1). Total num frames: 1843494912. Throughput: 0: 11673.6. Samples: 460924416. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:32:15,956][1648985] Avg episode reward: [(0, '183.350')] [2024-06-15 22:32:19,977][1652491] Updated weights for policy 0, policy_version 900178 (0.0014) [2024-06-15 22:32:20,955][1648985] Fps is (10 sec: 36046.0, 60 sec: 45329.1, 300 sec: 47319.3). Total num frames: 1843625984. Throughput: 0: 11582.6. Samples: 460996096. 
Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:32:20,955][1648985] Avg episode reward: [(0, '185.920')] [2024-06-15 22:32:22,042][1652491] Updated weights for policy 0, policy_version 900276 (0.0234) [2024-06-15 22:32:25,143][1652491] Updated weights for policy 0, policy_version 900321 (0.0012) [2024-06-15 22:32:25,956][1648985] Fps is (10 sec: 42594.1, 60 sec: 46420.6, 300 sec: 47096.9). Total num frames: 1843920896. Throughput: 0: 11536.8. Samples: 461029888. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:32:25,957][1648985] Avg episode reward: [(0, '149.820')] [2024-06-15 22:32:26,043][1651469] Signal inference workers to stop experience collection... (46900 times) [2024-06-15 22:32:26,083][1652491] InferenceWorker_p0-w0: stopping experience collection (46900 times) [2024-06-15 22:32:26,345][1651469] Signal inference workers to resume experience collection... (46900 times) [2024-06-15 22:32:26,345][1652491] InferenceWorker_p0-w0: resuming experience collection (46900 times) [2024-06-15 22:32:27,304][1652491] Updated weights for policy 0, policy_version 900406 (0.0014) [2024-06-15 22:32:30,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 1844051968. Throughput: 0: 11650.9. Samples: 461095936. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:32:30,955][1648985] Avg episode reward: [(0, '147.160')] [2024-06-15 22:32:32,260][1652491] Updated weights for policy 0, policy_version 900464 (0.0011) [2024-06-15 22:32:33,543][1652491] Updated weights for policy 0, policy_version 900517 (0.0020) [2024-06-15 22:32:35,955][1648985] Fps is (10 sec: 39325.3, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1844314112. Throughput: 0: 11548.4. Samples: 461166080. 
Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:32:35,956][1648985] Avg episode reward: [(0, '158.850')] [2024-06-15 22:32:37,364][1652491] Updated weights for policy 0, policy_version 900592 (0.0038) [2024-06-15 22:32:38,789][1652491] Updated weights for policy 0, policy_version 900643 (0.0013) [2024-06-15 22:32:40,958][1648985] Fps is (10 sec: 52410.8, 60 sec: 45872.6, 300 sec: 47096.5). Total num frames: 1844576256. Throughput: 0: 11741.0. Samples: 461195264. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:32:40,959][1648985] Avg episode reward: [(0, '171.490')] [2024-06-15 22:32:42,715][1652491] Updated weights for policy 0, policy_version 900676 (0.0012) [2024-06-15 22:32:43,901][1652491] Updated weights for policy 0, policy_version 900730 (0.0034) [2024-06-15 22:32:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 1844838400. Throughput: 0: 11468.8. Samples: 461264384. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:32:45,956][1648985] Avg episode reward: [(0, '174.120')] [2024-06-15 22:32:45,964][1652491] Updated weights for policy 0, policy_version 900800 (0.0013) [2024-06-15 22:32:49,535][1652491] Updated weights for policy 0, policy_version 900865 (0.0105) [2024-06-15 22:32:50,878][1652491] Updated weights for policy 0, policy_version 900928 (0.0013) [2024-06-15 22:32:50,955][1648985] Fps is (10 sec: 52446.3, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1845100544. Throughput: 0: 11286.7. Samples: 461327360. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:32:50,956][1648985] Avg episode reward: [(0, '163.140')] [2024-06-15 22:32:55,955][1648985] Fps is (10 sec: 36044.7, 60 sec: 45329.0, 300 sec: 46541.7). Total num frames: 1845198848. Throughput: 0: 11218.5. Samples: 461367296. 
Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:32:55,956][1648985] Avg episode reward: [(0, '158.510')] [2024-06-15 22:32:56,060][1652491] Updated weights for policy 0, policy_version 900991 (0.0015) [2024-06-15 22:32:56,148][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000900992_1845231616.pth... [2024-06-15 22:32:56,326][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000895552_1834090496.pth [2024-06-15 22:32:59,191][1652491] Updated weights for policy 0, policy_version 901058 (0.0012) [2024-06-15 22:33:00,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1845493760. Throughput: 0: 11309.5. Samples: 461433344. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:33:00,956][1648985] Avg episode reward: [(0, '162.890')] [2024-06-15 22:33:01,315][1652491] Updated weights for policy 0, policy_version 901138 (0.0013) [2024-06-15 22:33:05,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 46541.7). Total num frames: 1845624832. Throughput: 0: 11252.6. Samples: 461502464. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:33:05,956][1648985] Avg episode reward: [(0, '172.000')] [2024-06-15 22:33:06,816][1652491] Updated weights for policy 0, policy_version 901210 (0.0113) [2024-06-15 22:33:07,532][1652491] Updated weights for policy 0, policy_version 901248 (0.0017) [2024-06-15 22:33:08,464][1651469] Signal inference workers to stop experience collection... (46950 times) [2024-06-15 22:33:08,508][1652491] InferenceWorker_p0-w0: stopping experience collection (46950 times) [2024-06-15 22:33:08,671][1651469] Signal inference workers to resume experience collection... 
(46950 times) [2024-06-15 22:33:08,672][1652491] InferenceWorker_p0-w0: resuming experience collection (46950 times) [2024-06-15 22:33:08,875][1652491] Updated weights for policy 0, policy_version 901302 (0.0016) [2024-06-15 22:33:10,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 44237.0, 300 sec: 46319.5). Total num frames: 1845919744. Throughput: 0: 11241.5. Samples: 461535744. Policy #0 lag: (min: 111.0, avg: 223.3, max: 367.0) [2024-06-15 22:33:10,955][1648985] Avg episode reward: [(0, '161.850')] [2024-06-15 22:33:11,253][1652491] Updated weights for policy 0, policy_version 901347 (0.0012) [2024-06-15 22:33:12,996][1652491] Updated weights for policy 0, policy_version 901430 (0.0011) [2024-06-15 22:33:15,982][1648985] Fps is (10 sec: 52287.0, 60 sec: 44216.7, 300 sec: 46537.4). Total num frames: 1846149120. Throughput: 0: 11336.8. Samples: 461606400. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:33:15,983][1648985] Avg episode reward: [(0, '147.980')] [2024-06-15 22:33:17,649][1652491] Updated weights for policy 0, policy_version 901472 (0.0012) [2024-06-15 22:33:18,918][1652491] Updated weights for policy 0, policy_version 901525 (0.0012) [2024-06-15 22:33:19,638][1652491] Updated weights for policy 0, policy_version 901565 (0.0013) [2024-06-15 22:33:20,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 1846411264. Throughput: 0: 11559.8. Samples: 461686272. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:33:20,956][1648985] Avg episode reward: [(0, '157.260')] [2024-06-15 22:33:22,217][1652491] Updated weights for policy 0, policy_version 901619 (0.0031) [2024-06-15 22:33:23,837][1652491] Updated weights for policy 0, policy_version 901687 (0.0011) [2024-06-15 22:33:25,955][1648985] Fps is (10 sec: 52571.7, 60 sec: 45875.9, 300 sec: 46541.7). Total num frames: 1846673408. Throughput: 0: 11594.8. Samples: 461716992. 
Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:33:25,956][1648985] Avg episode reward: [(0, '170.700')] [2024-06-15 22:33:28,054][1652491] Updated weights for policy 0, policy_version 901745 (0.0015) [2024-06-15 22:33:29,411][1652491] Updated weights for policy 0, policy_version 901808 (0.0012) [2024-06-15 22:33:30,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.6, 300 sec: 46652.7). Total num frames: 1846935552. Throughput: 0: 11810.1. Samples: 461795840. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:33:30,956][1648985] Avg episode reward: [(0, '177.010')] [2024-06-15 22:33:32,449][1652491] Updated weights for policy 0, policy_version 901858 (0.0011) [2024-06-15 22:33:33,693][1652491] Updated weights for policy 0, policy_version 901920 (0.0013) [2024-06-15 22:33:35,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 1847197696. Throughput: 0: 12083.2. Samples: 461871104. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:33:35,956][1648985] Avg episode reward: [(0, '181.020')] [2024-06-15 22:33:38,611][1652491] Updated weights for policy 0, policy_version 901970 (0.0035) [2024-06-15 22:33:39,911][1652491] Updated weights for policy 0, policy_version 902032 (0.0015) [2024-06-15 22:33:40,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48062.4, 300 sec: 46986.0). Total num frames: 1847459840. Throughput: 0: 12014.9. Samples: 461907968. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:33:40,956][1648985] Avg episode reward: [(0, '169.540')] [2024-06-15 22:33:42,653][1652491] Updated weights for policy 0, policy_version 902083 (0.0014) [2024-06-15 22:33:44,174][1652491] Updated weights for policy 0, policy_version 902147 (0.0013) [2024-06-15 22:33:45,357][1652491] Updated weights for policy 0, policy_version 902205 (0.0141) [2024-06-15 22:33:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1847721984. 
Throughput: 0: 12037.7. Samples: 461975040. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:33:45,956][1648985] Avg episode reward: [(0, '169.300')] [2024-06-15 22:33:49,556][1651469] Signal inference workers to stop experience collection... (47000 times) [2024-06-15 22:33:49,620][1652491] InferenceWorker_p0-w0: stopping experience collection (47000 times) [2024-06-15 22:33:49,748][1651469] Signal inference workers to resume experience collection... (47000 times) [2024-06-15 22:33:49,749][1652491] InferenceWorker_p0-w0: resuming experience collection (47000 times) [2024-06-15 22:33:50,196][1652491] Updated weights for policy 0, policy_version 902264 (0.0021) [2024-06-15 22:33:50,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1847885824. Throughput: 0: 12140.1. Samples: 462048768. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:33:50,956][1648985] Avg episode reward: [(0, '172.500')] [2024-06-15 22:33:51,603][1652491] Updated weights for policy 0, policy_version 902336 (0.0126) [2024-06-15 22:33:55,586][1652491] Updated weights for policy 0, policy_version 902407 (0.0013) [2024-06-15 22:33:55,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 49152.0, 300 sec: 46763.8). Total num frames: 1848147968. Throughput: 0: 12288.0. Samples: 462088704. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:33:55,956][1648985] Avg episode reward: [(0, '191.240')] [2024-06-15 22:33:56,727][1652491] Updated weights for policy 0, policy_version 902462 (0.0020) [2024-06-15 22:34:00,955][1648985] Fps is (10 sec: 39322.2, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 1848279040. Throughput: 0: 12227.2. Samples: 462156288. 
Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:34:00,955][1648985] Avg episode reward: [(0, '164.360')] [2024-06-15 22:34:01,549][1652491] Updated weights for policy 0, policy_version 902528 (0.0013) [2024-06-15 22:34:03,043][1652491] Updated weights for policy 0, policy_version 902591 (0.0028) [2024-06-15 22:34:05,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 49152.1, 300 sec: 46763.8). Total num frames: 1848573952. Throughput: 0: 11992.2. Samples: 462225920. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:34:05,956][1648985] Avg episode reward: [(0, '155.690')] [2024-06-15 22:34:06,543][1652491] Updated weights for policy 0, policy_version 902656 (0.0015) [2024-06-15 22:34:10,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 47513.6, 300 sec: 46652.8). Total num frames: 1848770560. Throughput: 0: 11992.2. Samples: 462256640. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:34:10,956][1648985] Avg episode reward: [(0, '143.830')] [2024-06-15 22:34:11,513][1652491] Updated weights for policy 0, policy_version 902727 (0.0016) [2024-06-15 22:34:13,554][1652491] Updated weights for policy 0, policy_version 902801 (0.0013) [2024-06-15 22:34:15,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 48081.5, 300 sec: 47097.0). Total num frames: 1849032704. Throughput: 0: 11798.7. Samples: 462326784. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:34:15,956][1648985] Avg episode reward: [(0, '154.400')] [2024-06-15 22:34:16,636][1652491] Updated weights for policy 0, policy_version 902864 (0.0015) [2024-06-15 22:34:17,788][1652491] Updated weights for policy 0, policy_version 902928 (0.0013) [2024-06-15 22:34:20,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 1849294848. Throughput: 0: 11969.5. Samples: 462409728. 
Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:34:20,955][1648985] Avg episode reward: [(0, '158.380')] [2024-06-15 22:34:21,505][1652491] Updated weights for policy 0, policy_version 902980 (0.0012) [2024-06-15 22:34:22,632][1652491] Updated weights for policy 0, policy_version 903040 (0.0024) [2024-06-15 22:34:24,095][1652491] Updated weights for policy 0, policy_version 903096 (0.0029) [2024-06-15 22:34:25,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1849556992. Throughput: 0: 11969.4. Samples: 462446592. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:34:25,955][1648985] Avg episode reward: [(0, '162.110')] [2024-06-15 22:34:27,911][1652491] Updated weights for policy 0, policy_version 903157 (0.0012) [2024-06-15 22:34:28,493][1651469] Signal inference workers to stop experience collection... (47050 times) [2024-06-15 22:34:28,562][1652491] InferenceWorker_p0-w0: stopping experience collection (47050 times) [2024-06-15 22:34:28,766][1651469] Signal inference workers to resume experience collection... (47050 times) [2024-06-15 22:34:28,767][1652491] InferenceWorker_p0-w0: resuming experience collection (47050 times) [2024-06-15 22:34:29,160][1652491] Updated weights for policy 0, policy_version 903223 (0.0017) [2024-06-15 22:34:30,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1849819136. Throughput: 0: 12151.5. Samples: 462521856. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:34:30,956][1648985] Avg episode reward: [(0, '173.790')] [2024-06-15 22:34:32,750][1652491] Updated weights for policy 0, policy_version 903264 (0.0013) [2024-06-15 22:34:33,786][1652491] Updated weights for policy 0, policy_version 903300 (0.0013) [2024-06-15 22:34:35,022][1652491] Updated weights for policy 0, policy_version 903359 (0.0048) [2024-06-15 22:34:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 47097.1). 
Total num frames: 1850081280. Throughput: 0: 12128.7. Samples: 462594560. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:34:35,955][1648985] Avg episode reward: [(0, '171.040')] [2024-06-15 22:34:38,473][1652491] Updated weights for policy 0, policy_version 903409 (0.0015) [2024-06-15 22:34:39,838][1652491] Updated weights for policy 0, policy_version 903472 (0.0011) [2024-06-15 22:34:40,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 47208.1). Total num frames: 1850343424. Throughput: 0: 12083.2. Samples: 462632448. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:34:40,956][1648985] Avg episode reward: [(0, '162.720')] [2024-06-15 22:34:43,583][1652491] Updated weights for policy 0, policy_version 903546 (0.0013) [2024-06-15 22:34:45,749][1652491] Updated weights for policy 0, policy_version 903611 (0.0012) [2024-06-15 22:34:45,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1850605568. Throughput: 0: 12174.2. Samples: 462704128. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:34:45,956][1648985] Avg episode reward: [(0, '161.930')] [2024-06-15 22:34:49,350][1652491] Updated weights for policy 0, policy_version 903664 (0.0040) [2024-06-15 22:34:50,841][1652491] Updated weights for policy 0, policy_version 903728 (0.0013) [2024-06-15 22:34:50,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 49152.0, 300 sec: 47430.3). Total num frames: 1850834944. Throughput: 0: 12094.6. Samples: 462770176. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:34:50,956][1648985] Avg episode reward: [(0, '167.570')] [2024-06-15 22:34:54,485][1652491] Updated weights for policy 0, policy_version 903782 (0.0035) [2024-06-15 22:34:55,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 47513.5, 300 sec: 47097.0). Total num frames: 1850998784. Throughput: 0: 12276.6. Samples: 462809088. 
Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:34:55,956][1648985] Avg episode reward: [(0, '181.560')] [2024-06-15 22:34:55,964][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000903808_1850998784.pth... [2024-06-15 22:34:56,023][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000898352_1839824896.pth [2024-06-15 22:34:56,574][1652491] Updated weights for policy 0, policy_version 903829 (0.0017) [2024-06-15 22:35:00,261][1652491] Updated weights for policy 0, policy_version 903909 (0.0011) [2024-06-15 22:35:00,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 49698.2, 300 sec: 47319.2). Total num frames: 1851260928. Throughput: 0: 12253.9. Samples: 462878208. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:35:00,955][1648985] Avg episode reward: [(0, '157.580')] [2024-06-15 22:35:01,312][1652491] Updated weights for policy 0, policy_version 903968 (0.0031) [2024-06-15 22:35:04,323][1652491] Updated weights for policy 0, policy_version 904004 (0.0033) [2024-06-15 22:35:05,481][1652491] Updated weights for policy 0, policy_version 904063 (0.0013) [2024-06-15 22:35:05,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 49151.9, 300 sec: 47430.3). Total num frames: 1851523072. Throughput: 0: 12049.0. Samples: 462951936. Policy #0 lag: (min: 75.0, avg: 201.8, max: 331.0) [2024-06-15 22:35:05,956][1648985] Avg episode reward: [(0, '165.420')] [2024-06-15 22:35:08,596][1652491] Updated weights for policy 0, policy_version 904123 (0.0016) [2024-06-15 22:35:10,955][1648985] Fps is (10 sec: 42596.8, 60 sec: 48605.7, 300 sec: 47208.1). Total num frames: 1851686912. Throughput: 0: 11946.6. Samples: 462984192. 
Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:35:10,956][1648985] Avg episode reward: [(0, '158.140')] [2024-06-15 22:35:11,327][1652491] Updated weights for policy 0, policy_version 904176 (0.0011) [2024-06-15 22:35:11,422][1651469] Signal inference workers to stop experience collection... (47100 times) [2024-06-15 22:35:11,470][1652491] InferenceWorker_p0-w0: stopping experience collection (47100 times) [2024-06-15 22:35:11,568][1651469] Signal inference workers to resume experience collection... (47100 times) [2024-06-15 22:35:11,568][1652491] InferenceWorker_p0-w0: resuming experience collection (47100 times) [2024-06-15 22:35:12,666][1652491] Updated weights for policy 0, policy_version 904240 (0.0013) [2024-06-15 22:35:15,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 1851949056. Throughput: 0: 11935.3. Samples: 463058944. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:35:15,956][1648985] Avg episode reward: [(0, '168.690')] [2024-06-15 22:35:16,728][1652491] Updated weights for policy 0, policy_version 904304 (0.0022) [2024-06-15 22:35:20,115][1652491] Updated weights for policy 0, policy_version 904376 (0.0013) [2024-06-15 22:35:20,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48059.5, 300 sec: 47100.8). Total num frames: 1852178432. Throughput: 0: 11810.1. Samples: 463126016. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:35:20,956][1648985] Avg episode reward: [(0, '156.400')] [2024-06-15 22:35:22,149][1652491] Updated weights for policy 0, policy_version 904432 (0.0021) [2024-06-15 22:35:23,790][1652491] Updated weights for policy 0, policy_version 904512 (0.0014) [2024-06-15 22:35:25,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1852440576. Throughput: 0: 11719.1. Samples: 463159808. 
Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:35:25,956][1648985] Avg episode reward: [(0, '164.700')] [2024-06-15 22:35:27,975][1652491] Updated weights for policy 0, policy_version 904572 (0.0037) [2024-06-15 22:35:30,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1852637184. Throughput: 0: 11901.2. Samples: 463239680. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:35:30,956][1648985] Avg episode reward: [(0, '165.960')] [2024-06-15 22:35:31,327][1652491] Updated weights for policy 0, policy_version 904628 (0.0032) [2024-06-15 22:35:32,653][1652491] Updated weights for policy 0, policy_version 904672 (0.0118) [2024-06-15 22:35:34,229][1652491] Updated weights for policy 0, policy_version 904738 (0.0012) [2024-06-15 22:35:35,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1852964864. Throughput: 0: 11855.7. Samples: 463303680. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:35:35,956][1648985] Avg episode reward: [(0, '169.170')] [2024-06-15 22:35:38,651][1652491] Updated weights for policy 0, policy_version 904771 (0.0018) [2024-06-15 22:35:40,121][1652491] Updated weights for policy 0, policy_version 904832 (0.0021) [2024-06-15 22:35:40,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 1853095936. Throughput: 0: 11855.6. Samples: 463342592. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:35:40,956][1648985] Avg episode reward: [(0, '154.350')] [2024-06-15 22:35:42,663][1652491] Updated weights for policy 0, policy_version 904888 (0.0127) [2024-06-15 22:35:43,997][1652491] Updated weights for policy 0, policy_version 904944 (0.0016) [2024-06-15 22:35:45,478][1652491] Updated weights for policy 0, policy_version 904995 (0.0012) [2024-06-15 22:35:45,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 47513.7, 300 sec: 47430.3). Total num frames: 1853456384. 
Throughput: 0: 11719.1. Samples: 463405568. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:35:45,955][1648985] Avg episode reward: [(0, '143.270')] [2024-06-15 22:35:50,804][1652491] Updated weights for policy 0, policy_version 905059 (0.0012) [2024-06-15 22:35:50,955][1648985] Fps is (10 sec: 49153.8, 60 sec: 45875.3, 300 sec: 46763.9). Total num frames: 1853587456. Throughput: 0: 11741.9. Samples: 463480320. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:35:50,956][1648985] Avg episode reward: [(0, '152.590')] [2024-06-15 22:35:51,297][1652491] Updated weights for policy 0, policy_version 905088 (0.0020) [2024-06-15 22:35:53,689][1652491] Updated weights for policy 0, policy_version 905149 (0.0013) [2024-06-15 22:35:54,638][1651469] Signal inference workers to stop experience collection... (47150 times) [2024-06-15 22:35:54,706][1652491] InferenceWorker_p0-w0: stopping experience collection (47150 times) [2024-06-15 22:35:54,913][1651469] Signal inference workers to resume experience collection... (47150 times) [2024-06-15 22:35:54,914][1652491] InferenceWorker_p0-w0: resuming experience collection (47150 times) [2024-06-15 22:35:55,522][1652491] Updated weights for policy 0, policy_version 905214 (0.0020) [2024-06-15 22:35:55,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 1853882368. Throughput: 0: 11707.8. Samples: 463511040. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:35:55,956][1648985] Avg episode reward: [(0, '166.550')] [2024-06-15 22:35:57,457][1652491] Updated weights for policy 0, policy_version 905277 (0.0014) [2024-06-15 22:36:00,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 1854013440. Throughput: 0: 11605.4. Samples: 463581184. 
Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:36:00,956][1648985] Avg episode reward: [(0, '176.150')] [2024-06-15 22:36:02,793][1652491] Updated weights for policy 0, policy_version 905343 (0.0013) [2024-06-15 22:36:04,958][1652491] Updated weights for policy 0, policy_version 905396 (0.0011) [2024-06-15 22:36:05,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 47208.1). Total num frames: 1854308352. Throughput: 0: 11685.0. Samples: 463651840. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:36:05,956][1648985] Avg episode reward: [(0, '174.660')] [2024-06-15 22:36:06,622][1652491] Updated weights for policy 0, policy_version 905456 (0.0013) [2024-06-15 22:36:07,643][1652491] Updated weights for policy 0, policy_version 905492 (0.0012) [2024-06-15 22:36:10,957][1648985] Fps is (10 sec: 52418.4, 60 sec: 47512.2, 300 sec: 47096.8). Total num frames: 1854537728. Throughput: 0: 11616.2. Samples: 463682560. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:36:10,958][1648985] Avg episode reward: [(0, '153.550')] [2024-06-15 22:36:12,823][1652491] Updated weights for policy 0, policy_version 905538 (0.0014) [2024-06-15 22:36:15,538][1652491] Updated weights for policy 0, policy_version 905623 (0.0139) [2024-06-15 22:36:15,955][1648985] Fps is (10 sec: 42598.7, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1854734336. Throughput: 0: 11525.7. Samples: 463758336. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:36:15,956][1648985] Avg episode reward: [(0, '150.800')] [2024-06-15 22:36:17,029][1652491] Updated weights for policy 0, policy_version 905680 (0.0011) [2024-06-15 22:36:18,207][1652491] Updated weights for policy 0, policy_version 905727 (0.0022) [2024-06-15 22:36:20,955][1648985] Fps is (10 sec: 49161.6, 60 sec: 47513.7, 300 sec: 47097.1). Total num frames: 1855029248. Throughput: 0: 11320.9. Samples: 463813120. 
Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:36:20,956][1648985] Avg episode reward: [(0, '151.070')] [2024-06-15 22:36:20,993][1652491] Updated weights for policy 0, policy_version 905791 (0.0014) [2024-06-15 22:36:25,859][1652491] Updated weights for policy 0, policy_version 905840 (0.0021) [2024-06-15 22:36:25,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1855160320. Throughput: 0: 11423.4. Samples: 463856640. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:36:25,955][1648985] Avg episode reward: [(0, '147.320')] [2024-06-15 22:36:27,705][1652491] Updated weights for policy 0, policy_version 905904 (0.0013) [2024-06-15 22:36:28,907][1652491] Updated weights for policy 0, policy_version 905936 (0.0127) [2024-06-15 22:36:30,039][1652491] Updated weights for policy 0, policy_version 905984 (0.0014) [2024-06-15 22:36:30,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 46967.4, 300 sec: 47208.1). Total num frames: 1855455232. Throughput: 0: 11355.0. Samples: 463916544. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:36:30,956][1648985] Avg episode reward: [(0, '152.880')] [2024-06-15 22:36:32,873][1652491] Updated weights for policy 0, policy_version 906048 (0.0012) [2024-06-15 22:36:35,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 46652.7). Total num frames: 1855586304. Throughput: 0: 11320.9. Samples: 463989760. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:36:35,955][1648985] Avg episode reward: [(0, '154.970')] [2024-06-15 22:36:37,945][1652491] Updated weights for policy 0, policy_version 906112 (0.0013) [2024-06-15 22:36:39,929][1652491] Updated weights for policy 0, policy_version 906170 (0.0019) [2024-06-15 22:36:40,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46421.6, 300 sec: 47208.1). Total num frames: 1855881216. Throughput: 0: 11366.4. Samples: 464022528. 
Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:36:40,956][1648985] Avg episode reward: [(0, '165.900')] [2024-06-15 22:36:41,264][1651469] Signal inference workers to stop experience collection... (47200 times) [2024-06-15 22:36:41,318][1652491] InferenceWorker_p0-w0: stopping experience collection (47200 times) [2024-06-15 22:36:41,498][1651469] Signal inference workers to resume experience collection... (47200 times) [2024-06-15 22:36:41,499][1652491] InferenceWorker_p0-w0: resuming experience collection (47200 times) [2024-06-15 22:36:41,694][1652491] Updated weights for policy 0, policy_version 906235 (0.0012) [2024-06-15 22:36:44,353][1652491] Updated weights for policy 0, policy_version 906276 (0.0020) [2024-06-15 22:36:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 44236.8, 300 sec: 47097.1). Total num frames: 1856110592. Throughput: 0: 11252.6. Samples: 464087552. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:36:45,956][1648985] Avg episode reward: [(0, '170.850')] [2024-06-15 22:36:47,238][1652491] Updated weights for policy 0, policy_version 906324 (0.0013) [2024-06-15 22:36:49,150][1652491] Updated weights for policy 0, policy_version 906371 (0.0012) [2024-06-15 22:36:50,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 46421.1, 300 sec: 47097.0). Total num frames: 1856372736. Throughput: 0: 11411.9. Samples: 464165376. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:36:50,956][1648985] Avg episode reward: [(0, '174.930')] [2024-06-15 22:36:51,910][1652491] Updated weights for policy 0, policy_version 906451 (0.0013) [2024-06-15 22:36:55,210][1652491] Updated weights for policy 0, policy_version 906514 (0.0042) [2024-06-15 22:36:55,920][1652491] Updated weights for policy 0, policy_version 906560 (0.0020) [2024-06-15 22:36:55,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 45875.1, 300 sec: 47097.0). Total num frames: 1856634880. Throughput: 0: 11401.0. Samples: 464195584. 
Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:36:55,956][1648985] Avg episode reward: [(0, '166.870')] [2024-06-15 22:36:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000906560_1856634880.pth... [2024-06-15 22:36:56,051][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000900992_1845231616.pth [2024-06-15 22:36:58,821][1652491] Updated weights for policy 0, policy_version 906618 (0.0014) [2024-06-15 22:37:00,744][1652491] Updated weights for policy 0, policy_version 906661 (0.0014) [2024-06-15 22:37:00,962][1648985] Fps is (10 sec: 49153.5, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 1856864256. Throughput: 0: 11480.2. Samples: 464274944. Policy #0 lag: (min: 15.0, avg: 122.8, max: 271.0) [2024-06-15 22:37:00,962][1648985] Avg episode reward: [(0, '180.100')] [2024-06-15 22:37:02,914][1652491] Updated weights for policy 0, policy_version 906710 (0.0014) [2024-06-15 22:37:03,617][1652491] Updated weights for policy 0, policy_version 906751 (0.0014) [2024-06-15 22:37:05,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 1857093632. Throughput: 0: 11912.5. Samples: 464349184. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:37:05,956][1648985] Avg episode reward: [(0, '164.520')] [2024-06-15 22:37:06,189][1652491] Updated weights for policy 0, policy_version 906813 (0.0019) [2024-06-15 22:37:08,417][1652491] Updated weights for policy 0, policy_version 906851 (0.0013) [2024-06-15 22:37:10,124][1652491] Updated weights for policy 0, policy_version 906886 (0.0013) [2024-06-15 22:37:10,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 46968.9, 300 sec: 46986.0). Total num frames: 1857355776. Throughput: 0: 11730.5. Samples: 464384512. 
Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:37:10,956][1648985] Avg episode reward: [(0, '163.490')] [2024-06-15 22:37:11,319][1652491] Updated weights for policy 0, policy_version 906934 (0.0013) [2024-06-15 22:37:13,975][1652491] Updated weights for policy 0, policy_version 906977 (0.0013) [2024-06-15 22:37:15,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 46967.4, 300 sec: 47208.1). Total num frames: 1857552384. Throughput: 0: 12049.1. Samples: 464458752. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:37:15,956][1648985] Avg episode reward: [(0, '177.660')] [2024-06-15 22:37:16,880][1652491] Updated weights for policy 0, policy_version 907040 (0.0013) [2024-06-15 22:37:17,646][1652491] Updated weights for policy 0, policy_version 907072 (0.0053) [2024-06-15 22:37:19,912][1652491] Updated weights for policy 0, policy_version 907135 (0.0071) [2024-06-15 22:37:20,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 46421.3, 300 sec: 47097.2). Total num frames: 1857814528. Throughput: 0: 11958.0. Samples: 464527872. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:37:20,956][1648985] Avg episode reward: [(0, '185.900')] [2024-06-15 22:37:22,398][1652491] Updated weights for policy 0, policy_version 907185 (0.0016) [2024-06-15 22:37:24,221][1652491] Updated weights for policy 0, policy_version 907216 (0.0031) [2024-06-15 22:37:25,361][1652491] Updated weights for policy 0, policy_version 907264 (0.0012) [2024-06-15 22:37:25,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 1858076672. Throughput: 0: 12219.8. Samples: 464572416. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:37:25,955][1648985] Avg episode reward: [(0, '188.740')] [2024-06-15 22:37:27,229][1651469] Signal inference workers to stop experience collection... 
(47250 times) [2024-06-15 22:37:27,279][1652491] InferenceWorker_p0-w0: stopping experience collection (47250 times) [2024-06-15 22:37:27,379][1651469] Signal inference workers to resume experience collection... (47250 times) [2024-06-15 22:37:27,394][1652491] InferenceWorker_p0-w0: resuming experience collection (47250 times) [2024-06-15 22:37:29,172][1652491] Updated weights for policy 0, policy_version 907331 (0.0013) [2024-06-15 22:37:30,351][1652491] Updated weights for policy 0, policy_version 907384 (0.0012) [2024-06-15 22:37:30,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 1858338816. Throughput: 0: 12299.4. Samples: 464641024. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:37:30,956][1648985] Avg episode reward: [(0, '172.580')] [2024-06-15 22:37:32,392][1652491] Updated weights for policy 0, policy_version 907446 (0.0012) [2024-06-15 22:37:35,850][1652491] Updated weights for policy 0, policy_version 907488 (0.0012) [2024-06-15 22:37:35,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 49152.0, 300 sec: 47319.8). Total num frames: 1858535424. Throughput: 0: 12242.6. Samples: 464716288. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:37:35,956][1648985] Avg episode reward: [(0, '172.200')] [2024-06-15 22:37:36,713][1652491] Updated weights for policy 0, policy_version 907519 (0.0013) [2024-06-15 22:37:38,861][1652491] Updated weights for policy 0, policy_version 907568 (0.0014) [2024-06-15 22:37:40,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 1858764800. Throughput: 0: 12356.3. Samples: 464751616. 
Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:37:40,956][1648985] Avg episode reward: [(0, '180.700')] [2024-06-15 22:37:41,277][1652491] Updated weights for policy 0, policy_version 907632 (0.0014) [2024-06-15 22:37:42,833][1652491] Updated weights for policy 0, policy_version 907680 (0.0013) [2024-06-15 22:37:45,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1858994176. Throughput: 0: 12128.7. Samples: 464820736. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:37:45,956][1648985] Avg episode reward: [(0, '159.310')] [2024-06-15 22:37:46,761][1652491] Updated weights for policy 0, policy_version 907729 (0.0013) [2024-06-15 22:37:48,621][1652491] Updated weights for policy 0, policy_version 907796 (0.0101) [2024-06-15 22:37:50,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48059.9, 300 sec: 47652.4). Total num frames: 1859256320. Throughput: 0: 12151.4. Samples: 464896000. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:37:50,956][1648985] Avg episode reward: [(0, '167.900')] [2024-06-15 22:37:51,553][1652491] Updated weights for policy 0, policy_version 907856 (0.0023) [2024-06-15 22:37:52,511][1652491] Updated weights for policy 0, policy_version 907899 (0.0013) [2024-06-15 22:37:53,728][1652491] Updated weights for policy 0, policy_version 907952 (0.0013) [2024-06-15 22:37:55,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 48060.0, 300 sec: 47541.4). Total num frames: 1859518464. Throughput: 0: 12037.7. Samples: 464926208. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:37:55,955][1648985] Avg episode reward: [(0, '190.710')] [2024-06-15 22:37:58,580][1652491] Updated weights for policy 0, policy_version 908016 (0.0012) [2024-06-15 22:38:00,277][1652491] Updated weights for policy 0, policy_version 908092 (0.0016) [2024-06-15 22:38:00,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48605.9, 300 sec: 47985.7). Total num frames: 1859780608. 
Throughput: 0: 12083.2. Samples: 465002496. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:38:00,955][1648985] Avg episode reward: [(0, '206.150')] [2024-06-15 22:38:03,603][1652491] Updated weights for policy 0, policy_version 908146 (0.0011) [2024-06-15 22:38:05,041][1652491] Updated weights for policy 0, policy_version 908209 (0.0014) [2024-06-15 22:38:05,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 49152.0, 300 sec: 47874.6). Total num frames: 1860042752. Throughput: 0: 11969.4. Samples: 465066496. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:38:05,956][1648985] Avg episode reward: [(0, '190.130')] [2024-06-15 22:38:09,949][1652491] Updated weights for policy 0, policy_version 908256 (0.0012) [2024-06-15 22:38:10,423][1651469] Signal inference workers to stop experience collection... (47300 times) [2024-06-15 22:38:10,492][1652491] InferenceWorker_p0-w0: stopping experience collection (47300 times) [2024-06-15 22:38:10,686][1651469] Signal inference workers to resume experience collection... (47300 times) [2024-06-15 22:38:10,687][1652491] InferenceWorker_p0-w0: resuming experience collection (47300 times) [2024-06-15 22:38:10,955][1648985] Fps is (10 sec: 39321.1, 60 sec: 46967.4, 300 sec: 47545.7). Total num frames: 1860173824. Throughput: 0: 11935.2. Samples: 465109504. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:38:10,956][1648985] Avg episode reward: [(0, '183.400')] [2024-06-15 22:38:11,210][1652491] Updated weights for policy 0, policy_version 908307 (0.0013) [2024-06-15 22:38:11,931][1652491] Updated weights for policy 0, policy_version 908345 (0.0015) [2024-06-15 22:38:15,115][1652491] Updated weights for policy 0, policy_version 908416 (0.0037) [2024-06-15 22:38:15,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 48606.0, 300 sec: 47652.4). Total num frames: 1860468736. Throughput: 0: 11901.2. Samples: 465176576. 
Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:38:15,955][1648985] Avg episode reward: [(0, '168.520')] [2024-06-15 22:38:20,301][1652491] Updated weights for policy 0, policy_version 908481 (0.0011) [2024-06-15 22:38:20,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 46967.6, 300 sec: 47319.2). Total num frames: 1860632576. Throughput: 0: 11798.8. Samples: 465247232. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:38:20,955][1648985] Avg episode reward: [(0, '155.070')] [2024-06-15 22:38:21,358][1652491] Updated weights for policy 0, policy_version 908536 (0.0014) [2024-06-15 22:38:22,855][1652491] Updated weights for policy 0, policy_version 908592 (0.0023) [2024-06-15 22:38:25,288][1652491] Updated weights for policy 0, policy_version 908626 (0.0014) [2024-06-15 22:38:25,955][1648985] Fps is (10 sec: 45874.4, 60 sec: 47513.4, 300 sec: 47430.3). Total num frames: 1860927488. Throughput: 0: 11776.0. Samples: 465281536. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:38:25,956][1648985] Avg episode reward: [(0, '139.950')] [2024-06-15 22:38:27,583][1652491] Updated weights for policy 0, policy_version 908729 (0.0014) [2024-06-15 22:38:30,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1861091328. Throughput: 0: 11730.5. Samples: 465348608. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:38:30,956][1648985] Avg episode reward: [(0, '145.900')] [2024-06-15 22:38:32,042][1652491] Updated weights for policy 0, policy_version 908771 (0.0014) [2024-06-15 22:38:34,468][1652491] Updated weights for policy 0, policy_version 908833 (0.0016) [2024-06-15 22:38:35,041][1652491] Updated weights for policy 0, policy_version 908863 (0.0012) [2024-06-15 22:38:35,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 1861353472. Throughput: 0: 11730.5. Samples: 465423872. 
Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:38:35,956][1648985] Avg episode reward: [(0, '147.230')] [2024-06-15 22:38:37,065][1652491] Updated weights for policy 0, policy_version 908927 (0.0078) [2024-06-15 22:38:38,544][1652491] Updated weights for policy 0, policy_version 908977 (0.0014) [2024-06-15 22:38:40,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 47513.6, 300 sec: 47097.0). Total num frames: 1861615616. Throughput: 0: 11764.6. Samples: 465455616. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:38:40,956][1648985] Avg episode reward: [(0, '159.360')] [2024-06-15 22:38:42,714][1652491] Updated weights for policy 0, policy_version 909040 (0.0013) [2024-06-15 22:38:45,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 46967.3, 300 sec: 47208.1). Total num frames: 1861812224. Throughput: 0: 11776.0. Samples: 465532416. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:38:45,956][1648985] Avg episode reward: [(0, '157.260')] [2024-06-15 22:38:46,260][1652491] Updated weights for policy 0, policy_version 909109 (0.0013) [2024-06-15 22:38:47,035][1652491] Updated weights for policy 0, policy_version 909136 (0.0011) [2024-06-15 22:38:47,973][1652491] Updated weights for policy 0, policy_version 909183 (0.0013) [2024-06-15 22:38:48,858][1651469] Signal inference workers to stop experience collection... (47350 times) [2024-06-15 22:38:48,900][1652491] InferenceWorker_p0-w0: stopping experience collection (47350 times) [2024-06-15 22:38:49,117][1651469] Signal inference workers to resume experience collection... (47350 times) [2024-06-15 22:38:49,118][1652491] InferenceWorker_p0-w0: resuming experience collection (47350 times) [2024-06-15 22:38:49,340][1652491] Updated weights for policy 0, policy_version 909245 (0.0014) [2024-06-15 22:38:50,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1862139904. Throughput: 0: 11867.0. Samples: 465600512. 
Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:38:50,956][1648985] Avg episode reward: [(0, '155.790')] [2024-06-15 22:38:53,802][1652491] Updated weights for policy 0, policy_version 909306 (0.0074) [2024-06-15 22:38:55,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 45875.0, 300 sec: 47430.3). Total num frames: 1862270976. Throughput: 0: 11696.4. Samples: 465635840. Policy #0 lag: (min: 15.0, avg: 129.7, max: 271.0) [2024-06-15 22:38:55,956][1648985] Avg episode reward: [(0, '156.650')] [2024-06-15 22:38:55,996][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000909312_1862270976.pth... [2024-06-15 22:38:56,222][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000903808_1850998784.pth [2024-06-15 22:38:57,435][1652491] Updated weights for policy 0, policy_version 909362 (0.0027) [2024-06-15 22:38:59,095][1652491] Updated weights for policy 0, policy_version 909428 (0.0012) [2024-06-15 22:39:00,504][1652491] Updated weights for policy 0, policy_version 909458 (0.0013) [2024-06-15 22:39:00,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 47513.6, 300 sec: 47652.5). Total num frames: 1862631424. Throughput: 0: 11776.0. Samples: 465706496. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:39:00,956][1648985] Avg episode reward: [(0, '165.180')] [2024-06-15 22:39:03,675][1652491] Updated weights for policy 0, policy_version 909505 (0.0015) [2024-06-15 22:39:04,776][1652491] Updated weights for policy 0, policy_version 909562 (0.0013) [2024-06-15 22:39:05,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1862795264. Throughput: 0: 11923.9. Samples: 465783808. 
Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:39:05,956][1648985] Avg episode reward: [(0, '160.340')] [2024-06-15 22:39:07,803][1652491] Updated weights for policy 0, policy_version 909621 (0.0014) [2024-06-15 22:39:08,624][1652491] Updated weights for policy 0, policy_version 909650 (0.0013) [2024-06-15 22:39:09,461][1652491] Updated weights for policy 0, policy_version 909695 (0.0014) [2024-06-15 22:39:10,955][1648985] Fps is (10 sec: 45874.6, 60 sec: 48605.9, 300 sec: 47652.5). Total num frames: 1863090176. Throughput: 0: 11855.7. Samples: 465815040. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:39:10,956][1648985] Avg episode reward: [(0, '154.090')] [2024-06-15 22:39:11,447][1652491] Updated weights for policy 0, policy_version 909744 (0.0012) [2024-06-15 22:39:14,704][1652491] Updated weights for policy 0, policy_version 909794 (0.0018) [2024-06-15 22:39:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1863319552. Throughput: 0: 12117.3. Samples: 465893888. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:39:15,956][1648985] Avg episode reward: [(0, '157.060')] [2024-06-15 22:39:17,866][1652491] Updated weights for policy 0, policy_version 909857 (0.0014) [2024-06-15 22:39:19,983][1652491] Updated weights for policy 0, policy_version 909920 (0.0012) [2024-06-15 22:39:20,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 1863581696. Throughput: 0: 11912.6. Samples: 465959936. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:39:20,955][1648985] Avg episode reward: [(0, '168.430')] [2024-06-15 22:39:21,208][1652491] Updated weights for policy 0, policy_version 909954 (0.0013) [2024-06-15 22:39:22,102][1652491] Updated weights for policy 0, policy_version 910011 (0.0033) [2024-06-15 22:39:25,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46967.6, 300 sec: 47208.1). Total num frames: 1863745536. 
Throughput: 0: 12185.6. Samples: 466003968. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:39:25,956][1648985] Avg episode reward: [(0, '159.400')] [2024-06-15 22:39:26,899][1652491] Updated weights for policy 0, policy_version 910080 (0.0108) [2024-06-15 22:39:29,088][1652491] Updated weights for policy 0, policy_version 910141 (0.0017) [2024-06-15 22:39:30,955][1648985] Fps is (10 sec: 42597.1, 60 sec: 48605.7, 300 sec: 47208.1). Total num frames: 1864007680. Throughput: 0: 11946.6. Samples: 466070016. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:39:30,956][1648985] Avg episode reward: [(0, '163.790')] [2024-06-15 22:39:31,668][1652491] Updated weights for policy 0, policy_version 910207 (0.0013) [2024-06-15 22:39:33,046][1651469] Signal inference workers to stop experience collection... (47400 times) [2024-06-15 22:39:33,095][1652491] InferenceWorker_p0-w0: stopping experience collection (47400 times) [2024-06-15 22:39:33,241][1651469] Signal inference workers to resume experience collection... (47400 times) [2024-06-15 22:39:33,242][1652491] InferenceWorker_p0-w0: resuming experience collection (47400 times) [2024-06-15 22:39:33,377][1652491] Updated weights for policy 0, policy_version 910261 (0.0053) [2024-06-15 22:39:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1864237056. Throughput: 0: 12071.8. Samples: 466143744. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:39:35,956][1648985] Avg episode reward: [(0, '181.560')] [2024-06-15 22:39:38,104][1652491] Updated weights for policy 0, policy_version 910329 (0.0014) [2024-06-15 22:39:40,953][1652491] Updated weights for policy 0, policy_version 910389 (0.0015) [2024-06-15 22:39:40,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 47513.7, 300 sec: 46986.0). Total num frames: 1864466432. Throughput: 0: 12049.1. Samples: 466178048. 
Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:39:40,956][1648985] Avg episode reward: [(0, '198.830')] [2024-06-15 22:39:44,023][1652491] Updated weights for policy 0, policy_version 910485 (0.0050) [2024-06-15 22:39:44,808][1652491] Updated weights for policy 0, policy_version 910526 (0.0011) [2024-06-15 22:39:45,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 49152.2, 300 sec: 47208.2). Total num frames: 1864761344. Throughput: 0: 11889.8. Samples: 466241536. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:39:45,955][1648985] Avg episode reward: [(0, '185.810')] [2024-06-15 22:39:49,282][1652491] Updated weights for policy 0, policy_version 910583 (0.0025) [2024-06-15 22:39:50,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1864892416. Throughput: 0: 11935.3. Samples: 466320896. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:39:50,956][1648985] Avg episode reward: [(0, '182.920')] [2024-06-15 22:39:51,721][1652491] Updated weights for policy 0, policy_version 910626 (0.0014) [2024-06-15 22:39:53,653][1652491] Updated weights for policy 0, policy_version 910716 (0.0020) [2024-06-15 22:39:55,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 49152.0, 300 sec: 47319.2). Total num frames: 1865220096. Throughput: 0: 11719.1. Samples: 466342400. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:39:55,956][1648985] Avg episode reward: [(0, '167.500')] [2024-06-15 22:39:56,163][1652491] Updated weights for policy 0, policy_version 910778 (0.0011) [2024-06-15 22:40:00,442][1652491] Updated weights for policy 0, policy_version 910816 (0.0013) [2024-06-15 22:40:00,955][1648985] Fps is (10 sec: 49152.6, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1865383936. Throughput: 0: 11719.1. Samples: 466421248. 
Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:40:00,955][1648985] Avg episode reward: [(0, '180.440')] [2024-06-15 22:40:03,316][1652491] Updated weights for policy 0, policy_version 910880 (0.0014) [2024-06-15 22:40:04,896][1652491] Updated weights for policy 0, policy_version 910948 (0.0014) [2024-06-15 22:40:05,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1865678848. Throughput: 0: 11662.2. Samples: 466484736. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:40:05,956][1648985] Avg episode reward: [(0, '163.890')] [2024-06-15 22:40:06,503][1652491] Updated weights for policy 0, policy_version 910978 (0.0036) [2024-06-15 22:40:07,622][1652491] Updated weights for policy 0, policy_version 911034 (0.0013) [2024-06-15 22:40:10,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 45329.1, 300 sec: 46986.0). Total num frames: 1865809920. Throughput: 0: 11502.9. Samples: 466521600. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:40:10,956][1648985] Avg episode reward: [(0, '162.760')] [2024-06-15 22:40:12,396][1652491] Updated weights for policy 0, policy_version 911091 (0.0022) [2024-06-15 22:40:15,304][1652491] Updated weights for policy 0, policy_version 911168 (0.0013) [2024-06-15 22:40:15,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 47208.2). Total num frames: 1866104832. Throughput: 0: 11582.6. Samples: 466591232. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:40:15,956][1648985] Avg episode reward: [(0, '171.040')] [2024-06-15 22:40:16,659][1652491] Updated weights for policy 0, policy_version 911230 (0.0028) [2024-06-15 22:40:17,922][1651469] Signal inference workers to stop experience collection... (47450 times) [2024-06-15 22:40:17,977][1652491] InferenceWorker_p0-w0: stopping experience collection (47450 times) [2024-06-15 22:40:18,180][1651469] Signal inference workers to resume experience collection... 
(47450 times) [2024-06-15 22:40:18,181][1652491] InferenceWorker_p0-w0: resuming experience collection (47450 times) [2024-06-15 22:40:18,723][1652491] Updated weights for policy 0, policy_version 911267 (0.0043) [2024-06-15 22:40:20,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 45875.0, 300 sec: 47097.0). Total num frames: 1866334208. Throughput: 0: 11434.6. Samples: 466658304. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:40:20,957][1648985] Avg episode reward: [(0, '189.420')] [2024-06-15 22:40:23,327][1652491] Updated weights for policy 0, policy_version 911315 (0.0015) [2024-06-15 22:40:24,235][1652491] Updated weights for policy 0, policy_version 911359 (0.0012) [2024-06-15 22:40:25,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1866498048. Throughput: 0: 11480.2. Samples: 466694656. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:40:25,956][1648985] Avg episode reward: [(0, '161.110')] [2024-06-15 22:40:26,773][1652491] Updated weights for policy 0, policy_version 911424 (0.0012) [2024-06-15 22:40:27,863][1652491] Updated weights for policy 0, policy_version 911472 (0.0012) [2024-06-15 22:40:28,294][1652491] Updated weights for policy 0, policy_version 911488 (0.0012) [2024-06-15 22:40:30,465][1652491] Updated weights for policy 0, policy_version 911542 (0.0144) [2024-06-15 22:40:30,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 47513.8, 300 sec: 47097.1). Total num frames: 1866858496. Throughput: 0: 11605.3. Samples: 466763776. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:40:30,956][1648985] Avg episode reward: [(0, '146.610')] [2024-06-15 22:40:34,465][1652491] Updated weights for policy 0, policy_version 911584 (0.0168) [2024-06-15 22:40:35,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1866989568. Throughput: 0: 11457.4. Samples: 466836480. 
Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:40:35,956][1648985] Avg episode reward: [(0, '156.940')] [2024-06-15 22:40:36,719][1652491] Updated weights for policy 0, policy_version 911634 (0.0013) [2024-06-15 22:40:38,239][1652491] Updated weights for policy 0, policy_version 911712 (0.0012) [2024-06-15 22:40:40,716][1652491] Updated weights for policy 0, policy_version 911760 (0.0012) [2024-06-15 22:40:40,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1867284480. Throughput: 0: 11662.2. Samples: 466867200. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:40:40,956][1648985] Avg episode reward: [(0, '157.410')] [2024-06-15 22:40:45,015][1652491] Updated weights for policy 0, policy_version 911814 (0.0015) [2024-06-15 22:40:45,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 45329.0, 300 sec: 47097.1). Total num frames: 1867481088. Throughput: 0: 11707.7. Samples: 466948096. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:40:45,955][1648985] Avg episode reward: [(0, '152.550')] [2024-06-15 22:40:45,992][1652491] Updated weights for policy 0, policy_version 911869 (0.0012) [2024-06-15 22:40:47,820][1652491] Updated weights for policy 0, policy_version 911920 (0.0012) [2024-06-15 22:40:48,998][1652491] Updated weights for policy 0, policy_version 911974 (0.0013) [2024-06-15 22:40:50,969][1648985] Fps is (10 sec: 49082.5, 60 sec: 48048.4, 300 sec: 47094.8). Total num frames: 1867776000. Throughput: 0: 11840.5. Samples: 467017728. Policy #0 lag: (min: 47.0, avg: 136.9, max: 303.0) [2024-06-15 22:40:50,970][1648985] Avg episode reward: [(0, '148.780')] [2024-06-15 22:40:52,136][1652491] Updated weights for policy 0, policy_version 912034 (0.0014) [2024-06-15 22:40:55,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 44782.9, 300 sec: 47097.0). Total num frames: 1867907072. Throughput: 0: 11696.3. Samples: 467047936. 
Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:40:55,956][1648985] Avg episode reward: [(0, '143.260')] [2024-06-15 22:40:56,292][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000912080_1867939840.pth... [2024-06-15 22:40:56,416][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000906560_1856634880.pth [2024-06-15 22:40:56,907][1652491] Updated weights for policy 0, policy_version 912112 (0.0033) [2024-06-15 22:40:59,186][1652491] Updated weights for policy 0, policy_version 912176 (0.0013) [2024-06-15 22:41:00,082][1651469] Signal inference workers to stop experience collection... (47500 times) [2024-06-15 22:41:00,137][1652491] InferenceWorker_p0-w0: stopping experience collection (47500 times) [2024-06-15 22:41:00,351][1651469] Signal inference workers to resume experience collection... (47500 times) [2024-06-15 22:41:00,356][1652491] InferenceWorker_p0-w0: resuming experience collection (47500 times) [2024-06-15 22:41:00,876][1652491] Updated weights for policy 0, policy_version 912244 (0.0013) [2024-06-15 22:41:00,955][1648985] Fps is (10 sec: 49221.8, 60 sec: 48059.6, 300 sec: 47319.2). Total num frames: 1868267520. Throughput: 0: 11662.2. Samples: 467116032. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:41:00,956][1648985] Avg episode reward: [(0, '149.310')] [2024-06-15 22:41:03,540][1652491] Updated weights for policy 0, policy_version 912291 (0.0013) [2024-06-15 22:41:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.1, 300 sec: 47097.4). Total num frames: 1868431360. Throughput: 0: 11707.8. Samples: 467185152. 
Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:41:05,956][1648985] Avg episode reward: [(0, '179.740')] [2024-06-15 22:41:08,056][1652491] Updated weights for policy 0, policy_version 912341 (0.0025) [2024-06-15 22:41:09,505][1652491] Updated weights for policy 0, policy_version 912400 (0.0013) [2024-06-15 22:41:10,767][1652491] Updated weights for policy 0, policy_version 912464 (0.0015) [2024-06-15 22:41:10,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 48605.8, 300 sec: 47430.3). Total num frames: 1868726272. Throughput: 0: 11832.9. Samples: 467227136. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:41:10,956][1648985] Avg episode reward: [(0, '188.860')] [2024-06-15 22:41:11,774][1652491] Updated weights for policy 0, policy_version 912508 (0.0012) [2024-06-15 22:41:13,975][1652491] Updated weights for policy 0, policy_version 912532 (0.0012) [2024-06-15 22:41:15,188][1652491] Updated weights for policy 0, policy_version 912576 (0.0015) [2024-06-15 22:41:15,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.5, 300 sec: 47208.1). Total num frames: 1868955648. Throughput: 0: 11923.9. Samples: 467300352. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:41:15,956][1648985] Avg episode reward: [(0, '165.960')] [2024-06-15 22:41:19,804][1652491] Updated weights for policy 0, policy_version 912642 (0.0024) [2024-06-15 22:41:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 47513.8, 300 sec: 47541.4). Total num frames: 1869185024. Throughput: 0: 11878.4. Samples: 467371008. 
Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:41:20,956][1648985] Avg episode reward: [(0, '157.820')] [2024-06-15 22:41:21,229][1652491] Updated weights for policy 0, policy_version 912711 (0.0013) [2024-06-15 22:41:22,232][1652491] Updated weights for policy 0, policy_version 912765 (0.0098) [2024-06-15 22:41:25,542][1652491] Updated weights for policy 0, policy_version 912825 (0.0013) [2024-06-15 22:41:25,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 49698.1, 300 sec: 47541.4). Total num frames: 1869479936. Throughput: 0: 12094.6. Samples: 467411456. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:41:25,956][1648985] Avg episode reward: [(0, '153.130')] [2024-06-15 22:41:29,221][1652491] Updated weights for policy 0, policy_version 912867 (0.0014) [2024-06-15 22:41:30,344][1652491] Updated weights for policy 0, policy_version 912903 (0.0036) [2024-06-15 22:41:30,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 1869676544. Throughput: 0: 11901.2. Samples: 467483648. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:41:30,956][1648985] Avg episode reward: [(0, '152.310')] [2024-06-15 22:41:31,388][1652491] Updated weights for policy 0, policy_version 912958 (0.0013) [2024-06-15 22:41:32,648][1652491] Updated weights for policy 0, policy_version 913015 (0.0045) [2024-06-15 22:41:35,650][1652491] Updated weights for policy 0, policy_version 913072 (0.0013) [2024-06-15 22:41:35,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 50244.3, 300 sec: 47874.6). Total num frames: 1870004224. Throughput: 0: 11950.4. Samples: 467555328. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:41:35,956][1648985] Avg episode reward: [(0, '152.360')] [2024-06-15 22:41:40,338][1652491] Updated weights for policy 0, policy_version 913122 (0.0080) [2024-06-15 22:41:40,955][1648985] Fps is (10 sec: 42598.0, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 1870102528. 
Throughput: 0: 12140.1. Samples: 467594240. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:41:40,956][1648985] Avg episode reward: [(0, '143.320')] [2024-06-15 22:41:41,424][1652491] Updated weights for policy 0, policy_version 913168 (0.0012) [2024-06-15 22:41:43,153][1651469] Signal inference workers to stop experience collection... (47550 times) [2024-06-15 22:41:43,195][1652491] Updated weights for policy 0, policy_version 913218 (0.0014) [2024-06-15 22:41:43,216][1652491] InferenceWorker_p0-w0: stopping experience collection (47550 times) [2024-06-15 22:41:43,389][1651469] Signal inference workers to resume experience collection... (47550 times) [2024-06-15 22:41:43,390][1652491] InferenceWorker_p0-w0: resuming experience collection (47550 times) [2024-06-15 22:41:44,197][1652491] Updated weights for policy 0, policy_version 913274 (0.0017) [2024-06-15 22:41:45,955][1648985] Fps is (10 sec: 39320.8, 60 sec: 48605.6, 300 sec: 47541.4). Total num frames: 1870397440. Throughput: 0: 12105.9. Samples: 467660800. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:41:45,956][1648985] Avg episode reward: [(0, '140.290')] [2024-06-15 22:41:46,755][1652491] Updated weights for policy 0, policy_version 913316 (0.0012) [2024-06-15 22:41:50,636][1652491] Updated weights for policy 0, policy_version 913360 (0.0014) [2024-06-15 22:41:50,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 46432.2, 300 sec: 47208.1). Total num frames: 1870561280. Throughput: 0: 12288.0. Samples: 467738112. 
Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:41:50,956][1648985] Avg episode reward: [(0, '163.160')] [2024-06-15 22:41:51,659][1652491] Updated weights for policy 0, policy_version 913404 (0.0036) [2024-06-15 22:41:52,905][1652491] Updated weights for policy 0, policy_version 913467 (0.0011) [2024-06-15 22:41:55,270][1652491] Updated weights for policy 0, policy_version 913536 (0.0012) [2024-06-15 22:41:55,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 50244.4, 300 sec: 47652.4). Total num frames: 1870921728. Throughput: 0: 12117.3. Samples: 467772416. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:41:55,956][1648985] Avg episode reward: [(0, '158.000')] [2024-06-15 22:41:57,456][1652491] Updated weights for policy 0, policy_version 913584 (0.0099) [2024-06-15 22:42:00,955][1648985] Fps is (10 sec: 49153.1, 60 sec: 46421.4, 300 sec: 47319.2). Total num frames: 1871052800. Throughput: 0: 12128.7. Samples: 467846144. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:42:00,956][1648985] Avg episode reward: [(0, '147.490')] [2024-06-15 22:42:02,682][1652491] Updated weights for policy 0, policy_version 913648 (0.0013) [2024-06-15 22:42:03,535][1652491] Updated weights for policy 0, policy_version 913683 (0.0011) [2024-06-15 22:42:05,153][1652491] Updated weights for policy 0, policy_version 913745 (0.0013) [2024-06-15 22:42:05,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 49698.2, 300 sec: 47652.5). Total num frames: 1871413248. Throughput: 0: 11980.8. Samples: 467910144. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:42:05,956][1648985] Avg episode reward: [(0, '143.550')] [2024-06-15 22:42:07,364][1652491] Updated weights for policy 0, policy_version 913793 (0.0014) [2024-06-15 22:42:08,736][1652491] Updated weights for policy 0, policy_version 913855 (0.0013) [2024-06-15 22:42:10,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1871577088. 
Throughput: 0: 11889.8. Samples: 467946496. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:42:10,956][1648985] Avg episode reward: [(0, '169.800')] [2024-06-15 22:42:13,512][1652491] Updated weights for policy 0, policy_version 913907 (0.0016) [2024-06-15 22:42:14,758][1652491] Updated weights for policy 0, policy_version 913968 (0.0020) [2024-06-15 22:42:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1871839232. Throughput: 0: 11935.3. Samples: 468020736. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:42:15,956][1648985] Avg episode reward: [(0, '174.220')] [2024-06-15 22:42:16,174][1652491] Updated weights for policy 0, policy_version 914002 (0.0013) [2024-06-15 22:42:17,028][1652491] Updated weights for policy 0, policy_version 914046 (0.0013) [2024-06-15 22:42:19,406][1652491] Updated weights for policy 0, policy_version 914107 (0.0012) [2024-06-15 22:42:20,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 48606.0, 300 sec: 47541.4). Total num frames: 1872101376. Throughput: 0: 12026.4. Samples: 468096512. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:42:20,955][1648985] Avg episode reward: [(0, '195.570')] [2024-06-15 22:42:24,486][1652491] Updated weights for policy 0, policy_version 914145 (0.0012) [2024-06-15 22:42:25,650][1651469] Signal inference workers to stop experience collection... (47600 times) [2024-06-15 22:42:25,760][1652491] InferenceWorker_p0-w0: stopping experience collection (47600 times) [2024-06-15 22:42:25,841][1651469] Signal inference workers to resume experience collection... (47600 times) [2024-06-15 22:42:25,842][1652491] InferenceWorker_p0-w0: resuming experience collection (47600 times) [2024-06-15 22:42:25,955][1648985] Fps is (10 sec: 49150.6, 60 sec: 47513.5, 300 sec: 47430.2). Total num frames: 1872330752. Throughput: 0: 12003.5. Samples: 468134400. 
Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:42:25,956][1648985] Avg episode reward: [(0, '180.640')] [2024-06-15 22:42:25,997][1652491] Updated weights for policy 0, policy_version 914231 (0.0062) [2024-06-15 22:42:28,387][1652491] Updated weights for policy 0, policy_version 914288 (0.0068) [2024-06-15 22:42:30,277][1652491] Updated weights for policy 0, policy_version 914336 (0.0012) [2024-06-15 22:42:30,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 49151.9, 300 sec: 47763.5). Total num frames: 1872625664. Throughput: 0: 12003.6. Samples: 468200960. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:42:30,956][1648985] Avg episode reward: [(0, '177.770')] [2024-06-15 22:42:35,261][1652491] Updated weights for policy 0, policy_version 914398 (0.0091) [2024-06-15 22:42:35,955][1648985] Fps is (10 sec: 42599.7, 60 sec: 45875.2, 300 sec: 47430.3). Total num frames: 1872756736. Throughput: 0: 11946.7. Samples: 468275712. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:42:35,955][1648985] Avg episode reward: [(0, '158.080')] [2024-06-15 22:42:36,550][1652491] Updated weights for policy 0, policy_version 914464 (0.0011) [2024-06-15 22:42:38,290][1652491] Updated weights for policy 0, policy_version 914512 (0.0015) [2024-06-15 22:42:39,354][1652491] Updated weights for policy 0, policy_version 914554 (0.0016) [2024-06-15 22:42:40,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 49698.0, 300 sec: 47763.5). Total num frames: 1873084416. Throughput: 0: 11946.6. Samples: 468310016. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:42:40,956][1648985] Avg episode reward: [(0, '164.150')] [2024-06-15 22:42:41,236][1652491] Updated weights for policy 0, policy_version 914608 (0.0014) [2024-06-15 22:42:45,806][1652491] Updated weights for policy 0, policy_version 914642 (0.0012) [2024-06-15 22:42:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46421.5, 300 sec: 47208.1). Total num frames: 1873182720. 
Throughput: 0: 12003.5. Samples: 468386304. Policy #0 lag: (min: 26.0, avg: 160.3, max: 282.0) [2024-06-15 22:42:45,956][1648985] Avg episode reward: [(0, '199.190')] [2024-06-15 22:42:46,942][1652491] Updated weights for policy 0, policy_version 914691 (0.0026) [2024-06-15 22:42:47,907][1652491] Updated weights for policy 0, policy_version 914746 (0.0011) [2024-06-15 22:42:49,896][1652491] Updated weights for policy 0, policy_version 914808 (0.0042) [2024-06-15 22:42:50,955][1648985] Fps is (10 sec: 45876.6, 60 sec: 49698.3, 300 sec: 47541.4). Total num frames: 1873543168. Throughput: 0: 12208.4. Samples: 468459520. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:42:50,955][1648985] Avg episode reward: [(0, '203.590')] [2024-06-15 22:42:51,636][1652491] Updated weights for policy 0, policy_version 914851 (0.0012) [2024-06-15 22:42:55,955][1648985] Fps is (10 sec: 49150.3, 60 sec: 45874.9, 300 sec: 47097.0). Total num frames: 1873674240. Throughput: 0: 12174.1. Samples: 468494336. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:42:55,956][1648985] Avg episode reward: [(0, '189.000')] [2024-06-15 22:42:55,964][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000914880_1873674240.pth... [2024-06-15 22:42:56,062][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000909312_1862270976.pth [2024-06-15 22:42:57,560][1652491] Updated weights for policy 0, policy_version 914928 (0.0317) [2024-06-15 22:42:59,107][1652491] Updated weights for policy 0, policy_version 914997 (0.0105) [2024-06-15 22:43:00,516][1652491] Updated weights for policy 0, policy_version 915062 (0.0012) [2024-06-15 22:43:00,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 50244.1, 300 sec: 47541.3). Total num frames: 1874067456. Throughput: 0: 12014.9. Samples: 468561408. 
Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:43:00,956][1648985] Avg episode reward: [(0, '175.820')] [2024-06-15 22:43:02,798][1652491] Updated weights for policy 0, policy_version 915104 (0.0017) [2024-06-15 22:43:05,955][1648985] Fps is (10 sec: 52430.7, 60 sec: 46421.3, 300 sec: 47541.4). Total num frames: 1874198528. Throughput: 0: 12014.9. Samples: 468637184. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:43:05,956][1648985] Avg episode reward: [(0, '183.390')] [2024-06-15 22:43:07,921][1652491] Updated weights for policy 0, policy_version 915153 (0.0015) [2024-06-15 22:43:08,649][1651469] Signal inference workers to stop experience collection... (47650 times) [2024-06-15 22:43:08,699][1652491] InferenceWorker_p0-w0: stopping experience collection (47650 times) [2024-06-15 22:43:08,850][1651469] Signal inference workers to resume experience collection... (47650 times) [2024-06-15 22:43:08,851][1652491] InferenceWorker_p0-w0: resuming experience collection (47650 times) [2024-06-15 22:43:09,356][1652491] Updated weights for policy 0, policy_version 915220 (0.0012) [2024-06-15 22:43:10,955][1648985] Fps is (10 sec: 39322.1, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1874460672. Throughput: 0: 11946.7. Samples: 468672000. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:43:10,956][1648985] Avg episode reward: [(0, '185.870')] [2024-06-15 22:43:11,318][1652491] Updated weights for policy 0, policy_version 915280 (0.0030) [2024-06-15 22:43:12,531][1652491] Updated weights for policy 0, policy_version 915326 (0.0020) [2024-06-15 22:43:14,077][1652491] Updated weights for policy 0, policy_version 915390 (0.0013) [2024-06-15 22:43:15,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 48059.5, 300 sec: 47763.5). Total num frames: 1874722816. Throughput: 0: 11923.8. Samples: 468737536. 
Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:43:15,956][1648985] Avg episode reward: [(0, '183.750')] [2024-06-15 22:43:20,067][1652491] Updated weights for policy 0, policy_version 915442 (0.0012) [2024-06-15 22:43:20,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 46967.3, 300 sec: 47430.3). Total num frames: 1874919424. Throughput: 0: 11855.6. Samples: 468809216. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:43:20,955][1648985] Avg episode reward: [(0, '169.900')] [2024-06-15 22:43:21,208][1652491] Updated weights for policy 0, policy_version 915504 (0.0013) [2024-06-15 22:43:22,838][1652491] Updated weights for policy 0, policy_version 915553 (0.0014) [2024-06-15 22:43:24,878][1652491] Updated weights for policy 0, policy_version 915616 (0.0013) [2024-06-15 22:43:25,955][1648985] Fps is (10 sec: 52430.7, 60 sec: 48606.1, 300 sec: 47985.7). Total num frames: 1875247104. Throughput: 0: 11787.5. Samples: 468840448. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:43:25,955][1648985] Avg episode reward: [(0, '182.940')] [2024-06-15 22:43:30,955][1648985] Fps is (10 sec: 36044.9, 60 sec: 44236.9, 300 sec: 47208.1). Total num frames: 1875279872. Throughput: 0: 11707.8. Samples: 468913152. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:43:30,955][1648985] Avg episode reward: [(0, '177.930')] [2024-06-15 22:43:31,155][1652491] Updated weights for policy 0, policy_version 915681 (0.0014) [2024-06-15 22:43:32,485][1652491] Updated weights for policy 0, policy_version 915749 (0.0014) [2024-06-15 22:43:34,940][1652491] Updated weights for policy 0, policy_version 915813 (0.0127) [2024-06-15 22:43:35,761][1652491] Updated weights for policy 0, policy_version 915844 (0.0012) [2024-06-15 22:43:35,958][1648985] Fps is (10 sec: 42584.4, 60 sec: 48603.2, 300 sec: 47652.0). Total num frames: 1875673088. Throughput: 0: 11581.7. Samples: 468980736. 
Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:43:35,959][1648985] Avg episode reward: [(0, '178.260')] [2024-06-15 22:43:36,982][1652491] Updated weights for policy 0, policy_version 915898 (0.0040) [2024-06-15 22:43:40,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 44783.1, 300 sec: 47319.2). Total num frames: 1875771392. Throughput: 0: 11616.8. Samples: 469017088. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:43:40,955][1648985] Avg episode reward: [(0, '168.930')] [2024-06-15 22:43:42,017][1652491] Updated weights for policy 0, policy_version 915952 (0.0012) [2024-06-15 22:43:43,568][1652491] Updated weights for policy 0, policy_version 916016 (0.0016) [2024-06-15 22:43:45,349][1652491] Updated weights for policy 0, policy_version 916054 (0.0013) [2024-06-15 22:43:45,955][1648985] Fps is (10 sec: 45889.9, 60 sec: 49152.0, 300 sec: 47430.3). Total num frames: 1876131840. Throughput: 0: 11741.9. Samples: 469089792. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:43:45,956][1648985] Avg episode reward: [(0, '185.210')] [2024-06-15 22:43:46,897][1652491] Updated weights for policy 0, policy_version 916116 (0.0034) [2024-06-15 22:43:47,178][1651469] Signal inference workers to stop experience collection... (47700 times) [2024-06-15 22:43:47,203][1652491] InferenceWorker_p0-w0: stopping experience collection (47700 times) [2024-06-15 22:43:47,351][1651469] Signal inference workers to resume experience collection... (47700 times) [2024-06-15 22:43:47,352][1652491] InferenceWorker_p0-w0: resuming experience collection (47700 times) [2024-06-15 22:43:47,603][1652491] Updated weights for policy 0, policy_version 916154 (0.0014) [2024-06-15 22:43:50,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1876295680. Throughput: 0: 11867.0. Samples: 469171200. 
Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:43:50,956][1648985] Avg episode reward: [(0, '197.790')] [2024-06-15 22:43:52,798][1652491] Updated weights for policy 0, policy_version 916224 (0.0098) [2024-06-15 22:43:54,193][1652491] Updated weights for policy 0, policy_version 916277 (0.0038) [2024-06-15 22:43:55,950][1652491] Updated weights for policy 0, policy_version 916304 (0.0010) [2024-06-15 22:43:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 48060.0, 300 sec: 47208.1). Total num frames: 1876557824. Throughput: 0: 11798.8. Samples: 469202944. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:43:55,956][1648985] Avg episode reward: [(0, '183.130')] [2024-06-15 22:43:57,155][1652491] Updated weights for policy 0, policy_version 916357 (0.0040) [2024-06-15 22:44:00,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 1876819968. Throughput: 0: 12003.6. Samples: 469277696. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:44:00,955][1648985] Avg episode reward: [(0, '153.930')] [2024-06-15 22:44:01,685][1652491] Updated weights for policy 0, policy_version 916419 (0.0046) [2024-06-15 22:44:03,078][1652491] Updated weights for policy 0, policy_version 916481 (0.0013) [2024-06-15 22:44:03,910][1652491] Updated weights for policy 0, policy_version 916528 (0.0040) [2024-06-15 22:44:05,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 47430.3). Total num frames: 1877082112. Throughput: 0: 12151.5. Samples: 469356032. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:44:05,956][1648985] Avg episode reward: [(0, '150.760')] [2024-06-15 22:44:06,580][1652491] Updated weights for policy 0, policy_version 916562 (0.0013) [2024-06-15 22:44:08,931][1652491] Updated weights for policy 0, policy_version 916666 (0.0014) [2024-06-15 22:44:10,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1877344256. 
Throughput: 0: 12049.0. Samples: 469382656. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:44:10,956][1648985] Avg episode reward: [(0, '159.180')] [2024-06-15 22:44:13,149][1652491] Updated weights for policy 0, policy_version 916704 (0.0012) [2024-06-15 22:44:14,785][1652491] Updated weights for policy 0, policy_version 916771 (0.0011) [2024-06-15 22:44:15,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 1877606400. Throughput: 0: 12026.3. Samples: 469454336. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:44:15,956][1648985] Avg episode reward: [(0, '176.340')] [2024-06-15 22:44:18,217][1652491] Updated weights for policy 0, policy_version 916832 (0.0012) [2024-06-15 22:44:19,570][1652491] Updated weights for policy 0, policy_version 916880 (0.0012) [2024-06-15 22:44:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 49151.9, 300 sec: 47874.6). Total num frames: 1877868544. Throughput: 0: 12163.7. Samples: 469528064. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:44:20,956][1648985] Avg episode reward: [(0, '179.990')] [2024-06-15 22:44:23,402][1652491] Updated weights for policy 0, policy_version 916944 (0.0016) [2024-06-15 22:44:24,880][1652491] Updated weights for policy 0, policy_version 917008 (0.0068) [2024-06-15 22:44:25,868][1652491] Updated weights for policy 0, policy_version 917052 (0.0013) [2024-06-15 22:44:25,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 48059.5, 300 sec: 47874.6). Total num frames: 1878130688. Throughput: 0: 12310.7. Samples: 469571072. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:44:25,956][1648985] Avg episode reward: [(0, '179.330')] [2024-06-15 22:44:29,289][1651469] Signal inference workers to stop experience collection... 
(47750 times) [2024-06-15 22:44:29,337][1652491] InferenceWorker_p0-w0: stopping experience collection (47750 times) [2024-06-15 22:44:29,527][1651469] Signal inference workers to resume experience collection... (47750 times) [2024-06-15 22:44:29,528][1652491] InferenceWorker_p0-w0: resuming experience collection (47750 times) [2024-06-15 22:44:29,759][1652491] Updated weights for policy 0, policy_version 917113 (0.0079) [2024-06-15 22:44:30,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 50790.3, 300 sec: 47763.5). Total num frames: 1878327296. Throughput: 0: 12276.6. Samples: 469642240. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:44:30,956][1648985] Avg episode reward: [(0, '170.870')] [2024-06-15 22:44:31,260][1652491] Updated weights for policy 0, policy_version 917184 (0.0014) [2024-06-15 22:44:35,955][1648985] Fps is (10 sec: 39322.3, 60 sec: 47516.1, 300 sec: 47652.4). Total num frames: 1878523904. Throughput: 0: 11889.8. Samples: 469706240. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:44:35,956][1648985] Avg episode reward: [(0, '169.160')] [2024-06-15 22:44:36,589][1652491] Updated weights for policy 0, policy_version 917266 (0.0024) [2024-06-15 22:44:40,271][1652491] Updated weights for policy 0, policy_version 917344 (0.0012) [2024-06-15 22:44:40,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1878786048. Throughput: 0: 12083.2. Samples: 469746688. Policy #0 lag: (min: 38.0, avg: 112.1, max: 294.0) [2024-06-15 22:44:40,956][1648985] Avg episode reward: [(0, '175.810')] [2024-06-15 22:44:41,681][1652491] Updated weights for policy 0, policy_version 917392 (0.0022) [2024-06-15 22:44:42,893][1652491] Updated weights for policy 0, policy_version 917436 (0.0012) [2024-06-15 22:44:45,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 46967.4, 300 sec: 47652.4). Total num frames: 1878949888. Throughput: 0: 11969.4. Samples: 469816320. 
Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:44:45,956][1648985] Avg episode reward: [(0, '186.100')] [2024-06-15 22:44:47,407][1652491] Updated weights for policy 0, policy_version 917520 (0.0013) [2024-06-15 22:44:48,529][1652491] Updated weights for policy 0, policy_version 917568 (0.0018) [2024-06-15 22:44:50,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1879179264. Throughput: 0: 11719.1. Samples: 469883392. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:44:50,956][1648985] Avg episode reward: [(0, '193.670')] [2024-06-15 22:44:52,607][1652491] Updated weights for policy 0, policy_version 917632 (0.0013) [2024-06-15 22:44:54,475][1652491] Updated weights for policy 0, policy_version 917696 (0.0014) [2024-06-15 22:44:55,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 1879441408. Throughput: 0: 11923.9. Samples: 469919232. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:44:55,956][1648985] Avg episode reward: [(0, '195.330')] [2024-06-15 22:44:55,979][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000917696_1879441408.pth... [2024-06-15 22:44:56,073][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000912080_1867939840.pth [2024-06-15 22:44:58,637][1652491] Updated weights for policy 0, policy_version 917776 (0.0016) [2024-06-15 22:45:00,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 48059.6, 300 sec: 47541.3). Total num frames: 1879703552. Throughput: 0: 11741.8. Samples: 469982720. 
Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:45:00,956][1648985] Avg episode reward: [(0, '180.050')] [2024-06-15 22:45:02,807][1652491] Updated weights for policy 0, policy_version 917856 (0.0012) [2024-06-15 22:45:04,516][1652491] Updated weights for policy 0, policy_version 917907 (0.0012) [2024-06-15 22:45:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1879965696. Throughput: 0: 11764.6. Samples: 470057472. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:45:05,955][1648985] Avg episode reward: [(0, '169.810')] [2024-06-15 22:45:07,858][1652491] Updated weights for policy 0, policy_version 917968 (0.0014) [2024-06-15 22:45:08,867][1652491] Updated weights for policy 0, policy_version 918014 (0.0021) [2024-06-15 22:45:10,156][1652491] Updated weights for policy 0, policy_version 918064 (0.0012) [2024-06-15 22:45:10,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 1880227840. Throughput: 0: 11594.0. Samples: 470092800. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:45:10,956][1648985] Avg episode reward: [(0, '159.250')] [2024-06-15 22:45:13,331][1651469] Signal inference workers to stop experience collection... (47800 times) [2024-06-15 22:45:13,415][1652491] InferenceWorker_p0-w0: stopping experience collection (47800 times) [2024-06-15 22:45:13,609][1651469] Signal inference workers to resume experience collection... (47800 times) [2024-06-15 22:45:13,611][1652491] InferenceWorker_p0-w0: resuming experience collection (47800 times) [2024-06-15 22:45:14,199][1652491] Updated weights for policy 0, policy_version 918117 (0.0028) [2024-06-15 22:45:15,102][1652491] Updated weights for policy 0, policy_version 918147 (0.0018) [2024-06-15 22:45:15,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 46967.5, 300 sec: 47763.6). Total num frames: 1880424448. Throughput: 0: 11753.3. Samples: 470171136. 
Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:45:15,955][1648985] Avg episode reward: [(0, '188.150')] [2024-06-15 22:45:18,070][1652491] Updated weights for policy 0, policy_version 918210 (0.0011) [2024-06-15 22:45:19,437][1652491] Updated weights for policy 0, policy_version 918269 (0.0016) [2024-06-15 22:45:20,650][1652491] Updated weights for policy 0, policy_version 918320 (0.0013) [2024-06-15 22:45:20,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 48318.9). Total num frames: 1880752128. Throughput: 0: 11787.4. Samples: 470236672. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:45:20,956][1648985] Avg episode reward: [(0, '197.880')] [2024-06-15 22:45:25,145][1652491] Updated weights for policy 0, policy_version 918373 (0.0017) [2024-06-15 22:45:25,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 45875.4, 300 sec: 47541.4). Total num frames: 1880883200. Throughput: 0: 11776.0. Samples: 470276608. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:45:25,956][1648985] Avg episode reward: [(0, '187.750')] [2024-06-15 22:45:26,839][1652491] Updated weights for policy 0, policy_version 918418 (0.0015) [2024-06-15 22:45:29,696][1652491] Updated weights for policy 0, policy_version 918496 (0.0101) [2024-06-15 22:45:30,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 1881145344. Throughput: 0: 11730.5. Samples: 470344192. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:45:30,956][1648985] Avg episode reward: [(0, '176.340')] [2024-06-15 22:45:32,174][1652491] Updated weights for policy 0, policy_version 918560 (0.0080) [2024-06-15 22:45:35,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 47430.3). Total num frames: 1881276416. Throughput: 0: 11741.9. Samples: 470411776. 
Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:45:35,956][1648985] Avg episode reward: [(0, '166.130')] [2024-06-15 22:45:37,412][1652491] Updated weights for policy 0, policy_version 918640 (0.0013) [2024-06-15 22:45:39,001][1652491] Updated weights for policy 0, policy_version 918708 (0.0100) [2024-06-15 22:45:40,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 47652.4). Total num frames: 1881538560. Throughput: 0: 11571.2. Samples: 470439936. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:45:40,955][1648985] Avg episode reward: [(0, '179.990')] [2024-06-15 22:45:41,015][1652491] Updated weights for policy 0, policy_version 918736 (0.0019) [2024-06-15 22:45:42,085][1652491] Updated weights for policy 0, policy_version 918781 (0.0013) [2024-06-15 22:45:43,668][1652491] Updated weights for policy 0, policy_version 918832 (0.0108) [2024-06-15 22:45:45,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 47513.7, 300 sec: 47543.7). Total num frames: 1881800704. Throughput: 0: 11878.4. Samples: 470517248. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:45:45,956][1648985] Avg episode reward: [(0, '168.990')] [2024-06-15 22:45:47,694][1652491] Updated weights for policy 0, policy_version 918864 (0.0013) [2024-06-15 22:45:49,266][1652491] Updated weights for policy 0, policy_version 918928 (0.0016) [2024-06-15 22:45:50,130][1652491] Updated weights for policy 0, policy_version 918976 (0.0012) [2024-06-15 22:45:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1882062848. Throughput: 0: 11912.5. Samples: 470593536. 
Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:45:50,956][1648985] Avg episode reward: [(0, '176.310')] [2024-06-15 22:45:52,597][1652491] Updated weights for policy 0, policy_version 919032 (0.0022) [2024-06-15 22:45:53,332][1652491] Updated weights for policy 0, policy_version 919059 (0.0009) [2024-06-15 22:45:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 47652.4). Total num frames: 1882324992. Throughput: 0: 11878.4. Samples: 470627328. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:45:55,956][1648985] Avg episode reward: [(0, '170.820')] [2024-06-15 22:45:58,252][1651469] Signal inference workers to stop experience collection... (47850 times) [2024-06-15 22:45:58,301][1652491] InferenceWorker_p0-w0: stopping experience collection (47850 times) [2024-06-15 22:45:58,513][1651469] Signal inference workers to resume experience collection... (47850 times) [2024-06-15 22:45:58,514][1652491] InferenceWorker_p0-w0: resuming experience collection (47850 times) [2024-06-15 22:45:58,853][1652491] Updated weights for policy 0, policy_version 919136 (0.0120) [2024-06-15 22:46:00,810][1652491] Updated weights for policy 0, policy_version 919203 (0.0031) [2024-06-15 22:46:00,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 46967.6, 300 sec: 47763.5). Total num frames: 1882521600. Throughput: 0: 11844.2. Samples: 470704128. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:46:00,956][1648985] Avg episode reward: [(0, '180.600')] [2024-06-15 22:46:02,823][1652491] Updated weights for policy 0, policy_version 919267 (0.0013) [2024-06-15 22:46:04,619][1652491] Updated weights for policy 0, policy_version 919344 (0.0012) [2024-06-15 22:46:05,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 1882849280. Throughput: 0: 11821.5. Samples: 470768640. 
Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:46:05,956][1648985] Avg episode reward: [(0, '179.920')] [2024-06-15 22:46:10,002][1652491] Updated weights for policy 0, policy_version 919376 (0.0012) [2024-06-15 22:46:10,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 1882980352. Throughput: 0: 11912.6. Samples: 470812672. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:46:10,955][1648985] Avg episode reward: [(0, '188.230')] [2024-06-15 22:46:11,650][1652491] Updated weights for policy 0, policy_version 919442 (0.0012) [2024-06-15 22:46:12,589][1652491] Updated weights for policy 0, policy_version 919488 (0.0013) [2024-06-15 22:46:14,461][1652491] Updated weights for policy 0, policy_version 919559 (0.0086) [2024-06-15 22:46:15,614][1652491] Updated weights for policy 0, policy_version 919614 (0.0011) [2024-06-15 22:46:15,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 49152.0, 300 sec: 48096.8). Total num frames: 1883373568. Throughput: 0: 11776.0. Samples: 470874112. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:46:15,955][1648985] Avg episode reward: [(0, '175.090')] [2024-06-15 22:46:20,955][1648985] Fps is (10 sec: 39321.2, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 1883373568. Throughput: 0: 12174.2. Samples: 470959616. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:46:20,956][1648985] Avg episode reward: [(0, '193.980')] [2024-06-15 22:46:22,158][1652491] Updated weights for policy 0, policy_version 919680 (0.0016) [2024-06-15 22:46:24,337][1652491] Updated weights for policy 0, policy_version 919750 (0.0024) [2024-06-15 22:46:25,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 48605.8, 300 sec: 47874.6). Total num frames: 1883799552. Throughput: 0: 12140.1. Samples: 470986240. 
Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:46:25,956][1648985] Avg episode reward: [(0, '180.360')] [2024-06-15 22:46:26,050][1652491] Updated weights for policy 0, policy_version 919840 (0.0098) [2024-06-15 22:46:30,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 45875.3, 300 sec: 47097.1). Total num frames: 1883897856. Throughput: 0: 12014.9. Samples: 471057920. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:46:30,955][1648985] Avg episode reward: [(0, '177.930')] [2024-06-15 22:46:33,651][1652491] Updated weights for policy 0, policy_version 919920 (0.0147) [2024-06-15 22:46:35,323][1652491] Updated weights for policy 0, policy_version 919984 (0.0013) [2024-06-15 22:46:35,955][1648985] Fps is (10 sec: 36045.3, 60 sec: 48059.8, 300 sec: 47652.5). Total num frames: 1884160000. Throughput: 0: 11673.6. Samples: 471118848. Policy #0 lag: (min: 15.0, avg: 146.0, max: 303.0) [2024-06-15 22:46:35,955][1648985] Avg episode reward: [(0, '155.510')] [2024-06-15 22:46:36,072][1651469] Signal inference workers to stop experience collection... (47900 times) [2024-06-15 22:46:36,135][1652491] InferenceWorker_p0-w0: stopping experience collection (47900 times) [2024-06-15 22:46:36,244][1651469] Signal inference workers to resume experience collection... (47900 times) [2024-06-15 22:46:36,245][1652491] InferenceWorker_p0-w0: resuming experience collection (47900 times) [2024-06-15 22:46:37,486][1652491] Updated weights for policy 0, policy_version 920082 (0.0014) [2024-06-15 22:46:38,490][1652491] Updated weights for policy 0, policy_version 920128 (0.0014) [2024-06-15 22:46:40,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 48059.6, 300 sec: 47541.4). Total num frames: 1884422144. Throughput: 0: 11502.9. Samples: 471144960. 
Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:46:40,956][1648985] Avg episode reward: [(0, '169.570')] [2024-06-15 22:46:45,955][1648985] Fps is (10 sec: 32767.8, 60 sec: 44782.9, 300 sec: 47208.2). Total num frames: 1884487680. Throughput: 0: 11730.5. Samples: 471232000. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:46:45,956][1648985] Avg episode reward: [(0, '169.830')] [2024-06-15 22:46:47,194][1652491] Updated weights for policy 0, policy_version 920211 (0.0014) [2024-06-15 22:46:48,829][1652491] Updated weights for policy 0, policy_version 920274 (0.0013) [2024-06-15 22:46:50,871][1652491] Updated weights for policy 0, policy_version 920355 (0.0013) [2024-06-15 22:46:50,955][1648985] Fps is (10 sec: 45876.3, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1884880896. Throughput: 0: 11241.3. Samples: 471274496. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:46:50,955][1648985] Avg episode reward: [(0, '160.230')] [2024-06-15 22:46:55,955][1648985] Fps is (10 sec: 45873.7, 60 sec: 43690.4, 300 sec: 47097.0). Total num frames: 1884946432. Throughput: 0: 11207.0. Samples: 471316992. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:46:55,956][1648985] Avg episode reward: [(0, '157.880')] [2024-06-15 22:46:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000920384_1884946432.pth... [2024-06-15 22:46:56,040][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000914880_1873674240.pth [2024-06-15 22:46:57,978][1652491] Updated weights for policy 0, policy_version 920416 (0.0123) [2024-06-15 22:46:59,484][1652491] Updated weights for policy 0, policy_version 920488 (0.0108) [2024-06-15 22:47:00,955][1648985] Fps is (10 sec: 39321.0, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1885274112. Throughput: 0: 11400.5. Samples: 471387136. 
Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:47:00,956][1648985] Avg episode reward: [(0, '157.770')] [2024-06-15 22:47:01,207][1652491] Updated weights for policy 0, policy_version 920561 (0.0136) [2024-06-15 22:47:02,664][1652491] Updated weights for policy 0, policy_version 920624 (0.0013) [2024-06-15 22:47:05,955][1648985] Fps is (10 sec: 52430.4, 60 sec: 43690.6, 300 sec: 47097.0). Total num frames: 1885470720. Throughput: 0: 10968.2. Samples: 471453184. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:47:05,956][1648985] Avg episode reward: [(0, '168.060')] [2024-06-15 22:47:09,331][1652491] Updated weights for policy 0, policy_version 920675 (0.0013) [2024-06-15 22:47:10,466][1652491] Updated weights for policy 0, policy_version 920736 (0.0012) [2024-06-15 22:47:10,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45329.0, 300 sec: 46986.0). Total num frames: 1885700096. Throughput: 0: 11343.7. Samples: 471496704. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:47:10,956][1648985] Avg episode reward: [(0, '186.690')] [2024-06-15 22:47:11,860][1652491] Updated weights for policy 0, policy_version 920786 (0.0015) [2024-06-15 22:47:13,587][1651469] Signal inference workers to stop experience collection... (47950 times) [2024-06-15 22:47:13,654][1652491] InferenceWorker_p0-w0: stopping experience collection (47950 times) [2024-06-15 22:47:13,765][1651469] Signal inference workers to resume experience collection... (47950 times) [2024-06-15 22:47:13,766][1652491] InferenceWorker_p0-w0: resuming experience collection (47950 times) [2024-06-15 22:47:13,769][1652491] Updated weights for policy 0, policy_version 920880 (0.0013) [2024-06-15 22:47:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 43690.5, 300 sec: 47097.0). Total num frames: 1885995008. Throughput: 0: 11070.5. Samples: 471556096. 
Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:47:15,956][1648985] Avg episode reward: [(0, '168.090')] [2024-06-15 22:47:20,500][1652491] Updated weights for policy 0, policy_version 920944 (0.0011) [2024-06-15 22:47:20,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 45875.3, 300 sec: 46763.9). Total num frames: 1886126080. Throughput: 0: 11491.6. Samples: 471635968. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:47:20,955][1648985] Avg episode reward: [(0, '172.650')] [2024-06-15 22:47:22,573][1652491] Updated weights for policy 0, policy_version 921040 (0.0074) [2024-06-15 22:47:24,074][1652491] Updated weights for policy 0, policy_version 921104 (0.0018) [2024-06-15 22:47:24,904][1652491] Updated weights for policy 0, policy_version 921152 (0.0015) [2024-06-15 22:47:25,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45329.1, 300 sec: 47097.1). Total num frames: 1886519296. Throughput: 0: 11423.3. Samples: 471659008. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:47:25,956][1648985] Avg episode reward: [(0, '165.480')] [2024-06-15 22:47:30,955][1648985] Fps is (10 sec: 45874.2, 60 sec: 44782.8, 300 sec: 46874.9). Total num frames: 1886584832. Throughput: 0: 11389.1. Samples: 471744512. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:47:30,956][1648985] Avg episode reward: [(0, '171.030')] [2024-06-15 22:47:32,083][1652491] Updated weights for policy 0, policy_version 921232 (0.0014) [2024-06-15 22:47:34,257][1652491] Updated weights for policy 0, policy_version 921312 (0.0014) [2024-06-15 22:47:35,538][1652491] Updated weights for policy 0, policy_version 921365 (0.0018) [2024-06-15 22:47:35,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 1886978048. Throughput: 0: 11571.2. Samples: 471795200. 
Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:47:35,956][1648985] Avg episode reward: [(0, '174.050')] [2024-06-15 22:47:40,955][1648985] Fps is (10 sec: 45876.0, 60 sec: 43690.8, 300 sec: 46986.0). Total num frames: 1887043584. Throughput: 0: 11503.0. Samples: 471834624. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:47:40,956][1648985] Avg episode reward: [(0, '196.420')] [2024-06-15 22:47:42,118][1652491] Updated weights for policy 0, policy_version 921424 (0.0011) [2024-06-15 22:47:44,031][1652491] Updated weights for policy 0, policy_version 921493 (0.0020) [2024-06-15 22:47:45,955][1648985] Fps is (10 sec: 39321.7, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 1887371264. Throughput: 0: 11571.2. Samples: 471907840. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:47:45,955][1648985] Avg episode reward: [(0, '185.640')] [2024-06-15 22:47:46,398][1652491] Updated weights for policy 0, policy_version 921584 (0.0081) [2024-06-15 22:47:47,801][1652491] Updated weights for policy 0, policy_version 921648 (0.0014) [2024-06-15 22:47:50,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 44782.9, 300 sec: 47097.1). Total num frames: 1887567872. Throughput: 0: 11514.3. Samples: 471971328. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:47:50,956][1648985] Avg episode reward: [(0, '172.340')] [2024-06-15 22:47:54,269][1652491] Updated weights for policy 0, policy_version 921712 (0.0011) [2024-06-15 22:47:54,775][1651469] Signal inference workers to stop experience collection... (48000 times) [2024-06-15 22:47:54,845][1652491] InferenceWorker_p0-w0: stopping experience collection (48000 times) [2024-06-15 22:47:55,013][1651469] Signal inference workers to resume experience collection... 
(48000 times) [2024-06-15 22:47:55,014][1652491] InferenceWorker_p0-w0: resuming experience collection (48000 times) [2024-06-15 22:47:55,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 47513.8, 300 sec: 46541.7). Total num frames: 1887797248. Throughput: 0: 11480.2. Samples: 472013312. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:47:55,956][1648985] Avg episode reward: [(0, '158.280')] [2024-06-15 22:47:56,223][1652491] Updated weights for policy 0, policy_version 921787 (0.0070) [2024-06-15 22:47:57,875][1652491] Updated weights for policy 0, policy_version 921843 (0.0012) [2024-06-15 22:47:59,245][1652491] Updated weights for policy 0, policy_version 921914 (0.0157) [2024-06-15 22:48:00,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46967.6, 300 sec: 47097.1). Total num frames: 1888092160. Throughput: 0: 11286.8. Samples: 472064000. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:48:00,955][1648985] Avg episode reward: [(0, '153.330')] [2024-06-15 22:48:05,955][1648985] Fps is (10 sec: 36045.4, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 1888157696. Throughput: 0: 11343.6. Samples: 472146432. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:48:05,956][1648985] Avg episode reward: [(0, '162.030')] [2024-06-15 22:48:06,424][1652491] Updated weights for policy 0, policy_version 921984 (0.0016) [2024-06-15 22:48:08,594][1652491] Updated weights for policy 0, policy_version 922065 (0.0011) [2024-06-15 22:48:10,220][1652491] Updated weights for policy 0, policy_version 922130 (0.0013) [2024-06-15 22:48:10,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 48059.8, 300 sec: 46986.0). Total num frames: 1888583680. Throughput: 0: 11355.0. Samples: 472169984. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:48:10,956][1648985] Avg episode reward: [(0, '157.520')] [2024-06-15 22:48:15,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 43690.8, 300 sec: 46430.6). Total num frames: 1888616448. 
Throughput: 0: 11025.1. Samples: 472240640. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:48:15,955][1648985] Avg episode reward: [(0, '148.160')] [2024-06-15 22:48:16,607][1652491] Updated weights for policy 0, policy_version 922193 (0.0014) [2024-06-15 22:48:17,542][1652491] Updated weights for policy 0, policy_version 922240 (0.0119) [2024-06-15 22:48:19,592][1652491] Updated weights for policy 0, policy_version 922299 (0.0128) [2024-06-15 22:48:20,651][1652491] Updated weights for policy 0, policy_version 922357 (0.0011) [2024-06-15 22:48:20,955][1648985] Fps is (10 sec: 42597.5, 60 sec: 48059.5, 300 sec: 46652.7). Total num frames: 1889009664. Throughput: 0: 11434.6. Samples: 472309760. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:48:20,956][1648985] Avg episode reward: [(0, '155.060')] [2024-06-15 22:48:22,478][1652491] Updated weights for policy 0, policy_version 922431 (0.0013) [2024-06-15 22:48:25,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 43690.6, 300 sec: 46985.9). Total num frames: 1889140736. Throughput: 0: 11241.2. Samples: 472340480. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:48:25,956][1648985] Avg episode reward: [(0, '170.060')] [2024-06-15 22:48:28,205][1652491] Updated weights for policy 0, policy_version 922489 (0.0014) [2024-06-15 22:48:30,480][1652491] Updated weights for policy 0, policy_version 922544 (0.0011) [2024-06-15 22:48:30,955][1648985] Fps is (10 sec: 39322.6, 60 sec: 46967.6, 300 sec: 46542.2). Total num frames: 1889402880. Throughput: 0: 11502.9. Samples: 472425472. Policy #0 lag: (min: 107.0, avg: 197.3, max: 365.0) [2024-06-15 22:48:30,955][1648985] Avg episode reward: [(0, '179.560')] [2024-06-15 22:48:31,850][1652491] Updated weights for policy 0, policy_version 922608 (0.0012) [2024-06-15 22:48:32,284][1651469] Signal inference workers to stop experience collection... 
(48050 times) [2024-06-15 22:48:32,315][1652491] InferenceWorker_p0-w0: stopping experience collection (48050 times) [2024-06-15 22:48:32,490][1651469] Signal inference workers to resume experience collection... (48050 times) [2024-06-15 22:48:32,491][1652491] InferenceWorker_p0-w0: resuming experience collection (48050 times) [2024-06-15 22:48:33,383][1652491] Updated weights for policy 0, policy_version 922680 (0.0015) [2024-06-15 22:48:35,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 44782.9, 300 sec: 47097.1). Total num frames: 1889665024. Throughput: 0: 11650.8. Samples: 472495616. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:48:35,956][1648985] Avg episode reward: [(0, '176.080')] [2024-06-15 22:48:38,314][1652491] Updated weights for policy 0, policy_version 922752 (0.0013) [2024-06-15 22:48:40,955][1648985] Fps is (10 sec: 42597.8, 60 sec: 46421.2, 300 sec: 46430.6). Total num frames: 1889828864. Throughput: 0: 11491.6. Samples: 472530432. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:48:40,956][1648985] Avg episode reward: [(0, '177.800')] [2024-06-15 22:48:42,131][1652491] Updated weights for policy 0, policy_version 922816 (0.0014) [2024-06-15 22:48:44,225][1652491] Updated weights for policy 0, policy_version 922896 (0.0102) [2024-06-15 22:48:45,239][1652491] Updated weights for policy 0, policy_version 922941 (0.0018) [2024-06-15 22:48:45,974][1648985] Fps is (10 sec: 52328.7, 60 sec: 46952.4, 300 sec: 47094.0). Total num frames: 1890189312. Throughput: 0: 11611.8. Samples: 472586752. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:48:45,975][1648985] Avg episode reward: [(0, '177.640')] [2024-06-15 22:48:50,468][1652491] Updated weights for policy 0, policy_version 922997 (0.0013) [2024-06-15 22:48:50,955][1648985] Fps is (10 sec: 49152.7, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1890320384. Throughput: 0: 11514.3. Samples: 472664576. 
Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:48:50,955][1648985] Avg episode reward: [(0, '172.470')] [2024-06-15 22:48:53,554][1652491] Updated weights for policy 0, policy_version 923043 (0.0061) [2024-06-15 22:48:55,060][1652491] Updated weights for policy 0, policy_version 923108 (0.0013) [2024-06-15 22:48:55,955][1648985] Fps is (10 sec: 42679.6, 60 sec: 46967.5, 300 sec: 46763.8). Total num frames: 1890615296. Throughput: 0: 11787.3. Samples: 472700416. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:48:55,956][1648985] Avg episode reward: [(0, '162.880')] [2024-06-15 22:48:56,249][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000923168_1890648064.pth... [2024-06-15 22:48:56,399][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000917696_1879441408.pth [2024-06-15 22:48:56,902][1652491] Updated weights for policy 0, policy_version 923190 (0.0013) [2024-06-15 22:49:00,952][1652491] Updated weights for policy 0, policy_version 923235 (0.0014) [2024-06-15 22:49:00,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 1890779136. Throughput: 0: 11730.5. Samples: 472768512. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:49:00,956][1648985] Avg episode reward: [(0, '159.270')] [2024-06-15 22:49:03,849][1652491] Updated weights for policy 0, policy_version 923280 (0.0012) [2024-06-15 22:49:05,131][1652491] Updated weights for policy 0, policy_version 923329 (0.0111) [2024-06-15 22:49:05,955][1648985] Fps is (10 sec: 42599.0, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 1891041280. Throughput: 0: 11844.3. Samples: 472842752. 
Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:49:05,956][1648985] Avg episode reward: [(0, '168.310')] [2024-06-15 22:49:06,609][1652491] Updated weights for policy 0, policy_version 923392 (0.0037) [2024-06-15 22:49:07,788][1652491] Updated weights for policy 0, policy_version 923452 (0.0013) [2024-06-15 22:49:10,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 44236.7, 300 sec: 46208.4). Total num frames: 1891237888. Throughput: 0: 11878.4. Samples: 472875008. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:49:10,956][1648985] Avg episode reward: [(0, '185.370')] [2024-06-15 22:49:12,106][1652491] Updated weights for policy 0, policy_version 923510 (0.0013) [2024-06-15 22:49:14,524][1651469] Signal inference workers to stop experience collection... (48100 times) [2024-06-15 22:49:14,629][1652491] InferenceWorker_p0-w0: stopping experience collection (48100 times) [2024-06-15 22:49:14,750][1651469] Signal inference workers to resume experience collection... (48100 times) [2024-06-15 22:49:14,751][1652491] InferenceWorker_p0-w0: resuming experience collection (48100 times) [2024-06-15 22:49:15,881][1652491] Updated weights for policy 0, policy_version 923617 (0.0070) [2024-06-15 22:49:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 49152.0, 300 sec: 46430.6). Total num frames: 1891565568. Throughput: 0: 11787.4. Samples: 472955904. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:49:15,955][1648985] Avg episode reward: [(0, '155.140')] [2024-06-15 22:49:17,015][1652491] Updated weights for policy 0, policy_version 923670 (0.0017) [2024-06-15 22:49:20,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45875.4, 300 sec: 46208.5). Total num frames: 1891762176. Throughput: 0: 11923.9. Samples: 473032192. 
Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:49:20,956][1648985] Avg episode reward: [(0, '165.490')] [2024-06-15 22:49:21,829][1652491] Updated weights for policy 0, policy_version 923713 (0.0013) [2024-06-15 22:49:22,918][1652491] Updated weights for policy 0, policy_version 923774 (0.0014) [2024-06-15 22:49:25,795][1652491] Updated weights for policy 0, policy_version 923843 (0.0013) [2024-06-15 22:49:25,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 48606.1, 300 sec: 46541.7). Total num frames: 1892057088. Throughput: 0: 12003.6. Samples: 473070592. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:49:25,956][1648985] Avg episode reward: [(0, '147.620')] [2024-06-15 22:49:27,600][1652491] Updated weights for policy 0, policy_version 923921 (0.0014) [2024-06-15 22:49:30,956][1648985] Fps is (10 sec: 52426.2, 60 sec: 48059.3, 300 sec: 46652.7). Total num frames: 1892286464. Throughput: 0: 12122.4. Samples: 473132032. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:49:30,957][1648985] Avg episode reward: [(0, '147.790')] [2024-06-15 22:49:33,933][1652491] Updated weights for policy 0, policy_version 924000 (0.0016) [2024-06-15 22:49:34,639][1652491] Updated weights for policy 0, policy_version 924032 (0.0013) [2024-06-15 22:49:35,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.3, 300 sec: 46319.5). Total num frames: 1892450304. Throughput: 0: 12140.1. Samples: 473210880. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:49:35,956][1648985] Avg episode reward: [(0, '142.880')] [2024-06-15 22:49:37,475][1652491] Updated weights for policy 0, policy_version 924114 (0.0125) [2024-06-15 22:49:39,202][1652491] Updated weights for policy 0, policy_version 924193 (0.0093) [2024-06-15 22:49:40,955][1648985] Fps is (10 sec: 52431.2, 60 sec: 49698.2, 300 sec: 46986.0). Total num frames: 1892810752. Throughput: 0: 11798.8. Samples: 473231360. 
Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:49:40,956][1648985] Avg episode reward: [(0, '145.730')] [2024-06-15 22:49:45,929][1652491] Updated weights for policy 0, policy_version 924277 (0.0014) [2024-06-15 22:49:45,955][1648985] Fps is (10 sec: 45875.0, 60 sec: 45343.5, 300 sec: 46541.7). Total num frames: 1892909056. Throughput: 0: 12219.7. Samples: 473318400. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:49:45,956][1648985] Avg episode reward: [(0, '154.920')] [2024-06-15 22:49:47,236][1652491] Updated weights for policy 0, policy_version 924307 (0.0012) [2024-06-15 22:49:49,340][1652491] Updated weights for policy 0, policy_version 924390 (0.0015) [2024-06-15 22:49:50,562][1651469] Signal inference workers to stop experience collection... (48150 times) [2024-06-15 22:49:50,612][1652491] InferenceWorker_p0-w0: stopping experience collection (48150 times) [2024-06-15 22:49:50,761][1651469] Signal inference workers to resume experience collection... (48150 times) [2024-06-15 22:49:50,762][1652491] InferenceWorker_p0-w0: resuming experience collection (48150 times) [2024-06-15 22:49:50,765][1652491] Updated weights for policy 0, policy_version 924464 (0.0012) [2024-06-15 22:49:50,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 49698.2, 300 sec: 46986.0). Total num frames: 1893302272. Throughput: 0: 11662.3. Samples: 473367552. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:49:50,955][1648985] Avg episode reward: [(0, '141.730')] [2024-06-15 22:49:55,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45329.2, 300 sec: 46208.5). Total num frames: 1893335040. Throughput: 0: 12049.1. Samples: 473417216. 
Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:49:55,956][1648985] Avg episode reward: [(0, '140.260')] [2024-06-15 22:49:56,913][1652491] Updated weights for policy 0, policy_version 924497 (0.0011) [2024-06-15 22:49:59,412][1652491] Updated weights for policy 0, policy_version 924593 (0.0014) [2024-06-15 22:50:00,658][1652491] Updated weights for policy 0, policy_version 924656 (0.0013) [2024-06-15 22:50:00,956][1648985] Fps is (10 sec: 39320.1, 60 sec: 48605.7, 300 sec: 46541.6). Total num frames: 1893695488. Throughput: 0: 11628.0. Samples: 473479168. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:50:00,957][1648985] Avg episode reward: [(0, '139.580')] [2024-06-15 22:50:02,228][1652491] Updated weights for policy 0, policy_version 924705 (0.0010) [2024-06-15 22:50:05,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 1893859328. Throughput: 0: 11616.7. Samples: 473554944. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:50:05,956][1648985] Avg episode reward: [(0, '137.640')] [2024-06-15 22:50:09,050][1652491] Updated weights for policy 0, policy_version 924768 (0.0039) [2024-06-15 22:50:10,948][1652491] Updated weights for policy 0, policy_version 924834 (0.0013) [2024-06-15 22:50:10,955][1648985] Fps is (10 sec: 36046.0, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 1894055936. Throughput: 0: 11650.8. Samples: 473594880. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:50:10,955][1648985] Avg episode reward: [(0, '145.940')] [2024-06-15 22:50:13,149][1652491] Updated weights for policy 0, policy_version 924930 (0.0094) [2024-06-15 22:50:14,616][1652491] Updated weights for policy 0, policy_version 924985 (0.0010) [2024-06-15 22:50:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 1894383616. Throughput: 0: 11241.4. Samples: 473637888. 
Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:50:15,955][1648985] Avg episode reward: [(0, '162.080')] [2024-06-15 22:50:20,955][1648985] Fps is (10 sec: 36044.6, 60 sec: 44236.8, 300 sec: 45875.2). Total num frames: 1894416384. Throughput: 0: 11355.0. Samples: 473721856. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:50:20,956][1648985] Avg episode reward: [(0, '161.330')] [2024-06-15 22:50:21,926][1652491] Updated weights for policy 0, policy_version 925056 (0.0014) [2024-06-15 22:50:23,384][1652491] Updated weights for policy 0, policy_version 925120 (0.0013) [2024-06-15 22:50:25,089][1652491] Updated weights for policy 0, policy_version 925185 (0.0012) [2024-06-15 22:50:25,955][1648985] Fps is (10 sec: 45875.6, 60 sec: 46421.4, 300 sec: 46430.6). Total num frames: 1894842368. Throughput: 0: 11355.0. Samples: 473742336. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:50:25,955][1648985] Avg episode reward: [(0, '176.100')] [2024-06-15 22:50:26,458][1652491] Updated weights for policy 0, policy_version 925248 (0.0017) [2024-06-15 22:50:30,956][1648985] Fps is (10 sec: 49150.8, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 1894907904. Throughput: 0: 10979.5. Samples: 473812480. Policy #0 lag: (min: 68.0, avg: 163.7, max: 308.0) [2024-06-15 22:50:30,957][1648985] Avg episode reward: [(0, '173.000')] [2024-06-15 22:50:33,808][1651469] Signal inference workers to stop experience collection... (48200 times) [2024-06-15 22:50:33,846][1652491] InferenceWorker_p0-w0: stopping experience collection (48200 times) [2024-06-15 22:50:33,848][1652491] Updated weights for policy 0, policy_version 925316 (0.0014) [2024-06-15 22:50:33,975][1651469] Signal inference workers to resume experience collection... (48200 times) [2024-06-15 22:50:33,976][1652491] InferenceWorker_p0-w0: resuming experience collection (48200 times) [2024-06-15 22:50:35,955][1648985] Fps is (10 sec: 36044.3, 60 sec: 45875.2, 300 sec: 46319.5). 
Total num frames: 1895202816. Throughput: 0: 11252.6. Samples: 473873920. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:50:35,956][1648985] Avg episode reward: [(0, '162.520')] [2024-06-15 22:50:36,091][1652491] Updated weights for policy 0, policy_version 925396 (0.0013) [2024-06-15 22:50:37,900][1652491] Updated weights for policy 0, policy_version 925472 (0.0083) [2024-06-15 22:50:40,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 1895432192. Throughput: 0: 10797.5. Samples: 473903104. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:50:40,955][1648985] Avg episode reward: [(0, '161.300')] [2024-06-15 22:50:44,750][1652491] Updated weights for policy 0, policy_version 925505 (0.0014) [2024-06-15 22:50:45,955][1648985] Fps is (10 sec: 36045.6, 60 sec: 44236.9, 300 sec: 45764.1). Total num frames: 1895563264. Throughput: 0: 11252.7. Samples: 473985536. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:50:45,955][1648985] Avg episode reward: [(0, '161.950')] [2024-06-15 22:50:46,296][1652491] Updated weights for policy 0, policy_version 925584 (0.0017) [2024-06-15 22:50:48,168][1652491] Updated weights for policy 0, policy_version 925648 (0.0028) [2024-06-15 22:50:50,518][1652491] Updated weights for policy 0, policy_version 925756 (0.0110) [2024-06-15 22:50:50,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 44236.7, 300 sec: 46208.4). Total num frames: 1895956480. Throughput: 0: 10558.6. Samples: 474030080. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:50:50,955][1648985] Avg episode reward: [(0, '148.780')] [2024-06-15 22:50:55,955][1648985] Fps is (10 sec: 39320.2, 60 sec: 43690.5, 300 sec: 45541.9). Total num frames: 1895956480. Throughput: 0: 10763.3. Samples: 474079232. 
Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:50:55,956][1648985] Avg episode reward: [(0, '146.960')] [2024-06-15 22:50:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000925760_1895956480.pth... [2024-06-15 22:50:56,145][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000920384_1884946432.pth [2024-06-15 22:50:56,150][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000925760_1895956480.pth [2024-06-15 22:50:57,669][1652491] Updated weights for policy 0, policy_version 925824 (0.0014) [2024-06-15 22:50:59,436][1652491] Updated weights for policy 0, policy_version 925892 (0.0040) [2024-06-15 22:51:00,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 44237.0, 300 sec: 45764.1). Total num frames: 1896349696. Throughput: 0: 11229.9. Samples: 474143232. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:51:00,956][1648985] Avg episode reward: [(0, '137.390')] [2024-06-15 22:51:01,232][1652491] Updated weights for policy 0, policy_version 925975 (0.0014) [2024-06-15 22:51:01,973][1652491] Updated weights for policy 0, policy_version 926013 (0.0048) [2024-06-15 22:51:05,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 43690.6, 300 sec: 45764.1). Total num frames: 1896480768. Throughput: 0: 11013.7. Samples: 474217472. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:51:05,956][1648985] Avg episode reward: [(0, '139.100')] [2024-06-15 22:51:08,644][1652491] Updated weights for policy 0, policy_version 926071 (0.0015) [2024-06-15 22:51:10,212][1652491] Updated weights for policy 0, policy_version 926136 (0.0013) [2024-06-15 22:51:10,711][1651469] Signal inference workers to stop experience collection... (48250 times) [2024-06-15 22:51:10,780][1652491] InferenceWorker_p0-w0: stopping experience collection (48250 times) [2024-06-15 22:51:10,882][1651469] Signal inference workers to resume experience collection... 
(48250 times) [2024-06-15 22:51:10,883][1652491] InferenceWorker_p0-w0: resuming experience collection (48250 times) [2024-06-15 22:51:10,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 45541.9). Total num frames: 1896808448. Throughput: 0: 11309.5. Samples: 474251264. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:51:10,956][1648985] Avg episode reward: [(0, '161.870')] [2024-06-15 22:51:11,776][1652491] Updated weights for policy 0, policy_version 926214 (0.0086) [2024-06-15 22:51:15,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 43690.8, 300 sec: 46208.5). Total num frames: 1897005056. Throughput: 0: 11173.1. Samples: 474315264. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:51:15,955][1648985] Avg episode reward: [(0, '174.080')] [2024-06-15 22:51:18,680][1652491] Updated weights for policy 0, policy_version 926274 (0.0014) [2024-06-15 22:51:20,289][1652491] Updated weights for policy 0, policy_version 926337 (0.0012) [2024-06-15 22:51:20,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 46421.3, 300 sec: 45430.9). Total num frames: 1897201664. Throughput: 0: 11446.0. Samples: 474388992. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:51:20,956][1648985] Avg episode reward: [(0, '158.850')] [2024-06-15 22:51:21,664][1652491] Updated weights for policy 0, policy_version 926401 (0.0014) [2024-06-15 22:51:23,526][1652491] Updated weights for policy 0, policy_version 926496 (0.0124) [2024-06-15 22:51:25,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 44782.8, 300 sec: 46208.4). Total num frames: 1897529344. Throughput: 0: 11355.0. Samples: 474414080. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:51:25,956][1648985] Avg episode reward: [(0, '144.280')] [2024-06-15 22:51:30,483][1652491] Updated weights for policy 0, policy_version 926562 (0.0013) [2024-06-15 22:51:30,955][1648985] Fps is (10 sec: 42599.3, 60 sec: 45329.4, 300 sec: 45653.1). Total num frames: 1897627648. 
Throughput: 0: 11468.8. Samples: 474501632. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:51:30,955][1648985] Avg episode reward: [(0, '146.550')] [2024-06-15 22:51:32,702][1652491] Updated weights for policy 0, policy_version 926659 (0.0110) [2024-06-15 22:51:33,772][1652491] Updated weights for policy 0, policy_version 926709 (0.0014) [2024-06-15 22:51:35,238][1652491] Updated weights for policy 0, policy_version 926782 (0.0015) [2024-06-15 22:51:35,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 1898053632. Throughput: 0: 11673.6. Samples: 474555392. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:51:35,956][1648985] Avg episode reward: [(0, '147.080')] [2024-06-15 22:51:40,955][1648985] Fps is (10 sec: 45874.0, 60 sec: 44236.7, 300 sec: 46097.3). Total num frames: 1898086400. Throughput: 0: 11662.3. Samples: 474604032. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:51:40,956][1648985] Avg episode reward: [(0, '166.580')] [2024-06-15 22:51:41,744][1652491] Updated weights for policy 0, policy_version 926843 (0.0023) [2024-06-15 22:51:42,919][1652491] Updated weights for policy 0, policy_version 926896 (0.0089) [2024-06-15 22:51:45,089][1652491] Updated weights for policy 0, policy_version 926978 (0.0013) [2024-06-15 22:51:45,938][1651469] Signal inference workers to stop experience collection... (48300 times) [2024-06-15 22:51:45,955][1648985] Fps is (10 sec: 45875.4, 60 sec: 49151.9, 300 sec: 46208.4). Total num frames: 1898512384. Throughput: 0: 11628.1. Samples: 474666496. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:51:45,969][1648985] Avg episode reward: [(0, '164.180')] [2024-06-15 22:51:46,032][1652491] InferenceWorker_p0-w0: stopping experience collection (48300 times) [2024-06-15 22:51:46,234][1651469] Signal inference workers to resume experience collection... 
(48300 times) [2024-06-15 22:51:46,235][1652491] InferenceWorker_p0-w0: resuming experience collection (48300 times) [2024-06-15 22:51:46,464][1652491] Updated weights for policy 0, policy_version 927035 (0.0015) [2024-06-15 22:51:50,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 43690.6, 300 sec: 46208.5). Total num frames: 1898577920. Throughput: 0: 11730.5. Samples: 474745344. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:51:50,956][1648985] Avg episode reward: [(0, '150.540')] [2024-06-15 22:51:53,255][1652491] Updated weights for policy 0, policy_version 927089 (0.0025) [2024-06-15 22:51:54,728][1652491] Updated weights for policy 0, policy_version 927168 (0.0013) [2024-06-15 22:51:55,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 49698.3, 300 sec: 46319.5). Total num frames: 1898938368. Throughput: 0: 11764.6. Samples: 474780672. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:51:55,956][1648985] Avg episode reward: [(0, '140.510')] [2024-06-15 22:51:56,389][1652491] Updated weights for policy 0, policy_version 927232 (0.0022) [2024-06-15 22:51:57,834][1652491] Updated weights for policy 0, policy_version 927284 (0.0013) [2024-06-15 22:52:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 46208.4). Total num frames: 1899102208. Throughput: 0: 11684.9. Samples: 474841088. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:52:00,956][1648985] Avg episode reward: [(0, '144.980')] [2024-06-15 22:52:04,152][1652491] Updated weights for policy 0, policy_version 927314 (0.0018) [2024-06-15 22:52:05,835][1652491] Updated weights for policy 0, policy_version 927401 (0.0015) [2024-06-15 22:52:05,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 1899331584. Throughput: 0: 11730.5. Samples: 474916864. 
Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:52:05,956][1648985] Avg episode reward: [(0, '157.980')] [2024-06-15 22:52:07,093][1652491] Updated weights for policy 0, policy_version 927456 (0.0014) [2024-06-15 22:52:09,496][1652491] Updated weights for policy 0, policy_version 927546 (0.0013) [2024-06-15 22:52:10,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 1899626496. Throughput: 0: 11673.6. Samples: 474939392. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:52:10,956][1648985] Avg episode reward: [(0, '163.160')] [2024-06-15 22:52:15,955][1648985] Fps is (10 sec: 32768.2, 60 sec: 44236.7, 300 sec: 45875.2). Total num frames: 1899659264. Throughput: 0: 11548.4. Samples: 475021312. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:52:15,955][1648985] Avg episode reward: [(0, '193.570')] [2024-06-15 22:52:17,113][1652491] Updated weights for policy 0, policy_version 927632 (0.0013) [2024-06-15 22:52:18,574][1652491] Updated weights for policy 0, policy_version 927696 (0.0098) [2024-06-15 22:52:20,014][1652491] Updated weights for policy 0, policy_version 927760 (0.0012) [2024-06-15 22:52:20,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 48605.8, 300 sec: 46097.3). Total num frames: 1900118016. Throughput: 0: 11457.4. Samples: 475070976. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:52:20,956][1648985] Avg episode reward: [(0, '174.920')] [2024-06-15 22:52:25,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 1900150784. Throughput: 0: 11332.3. Samples: 475113984. Policy #0 lag: (min: 3.0, avg: 49.2, max: 259.0) [2024-06-15 22:52:25,956][1648985] Avg episode reward: [(0, '170.940')] [2024-06-15 22:52:27,150][1652491] Updated weights for policy 0, policy_version 927824 (0.0015) [2024-06-15 22:52:28,137][1651469] Signal inference workers to stop experience collection... 
(48350 times) [2024-06-15 22:52:28,209][1652491] InferenceWorker_p0-w0: stopping experience collection (48350 times) [2024-06-15 22:52:28,392][1651469] Signal inference workers to resume experience collection... (48350 times) [2024-06-15 22:52:28,393][1652491] InferenceWorker_p0-w0: resuming experience collection (48350 times) [2024-06-15 22:52:28,848][1652491] Updated weights for policy 0, policy_version 927908 (0.0013) [2024-06-15 22:52:29,882][1652491] Updated weights for policy 0, policy_version 927952 (0.0105) [2024-06-15 22:52:30,962][1648985] Fps is (10 sec: 39293.1, 60 sec: 48053.7, 300 sec: 45874.0). Total num frames: 1900511232. Throughput: 0: 11432.8. Samples: 475181056. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:52:30,963][1648985] Avg episode reward: [(0, '156.480')] [2024-06-15 22:52:31,443][1652491] Updated weights for policy 0, policy_version 928004 (0.0012) [2024-06-15 22:52:32,835][1652491] Updated weights for policy 0, policy_version 928061 (0.0011) [2024-06-15 22:52:35,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1900675072. Throughput: 0: 11127.5. Samples: 475246080. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:52:35,956][1648985] Avg episode reward: [(0, '155.620')] [2024-06-15 22:52:39,782][1652491] Updated weights for policy 0, policy_version 928120 (0.0016) [2024-06-15 22:52:40,955][1648985] Fps is (10 sec: 36071.3, 60 sec: 46421.4, 300 sec: 45764.1). Total num frames: 1900871680. Throughput: 0: 11252.6. Samples: 475287040. 
Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:52:40,956][1648985] Avg episode reward: [(0, '167.890')] [2024-06-15 22:52:41,083][1652491] Updated weights for policy 0, policy_version 928161 (0.0012) [2024-06-15 22:52:42,916][1652491] Updated weights for policy 0, policy_version 928229 (0.0012) [2024-06-15 22:52:44,922][1652491] Updated weights for policy 0, policy_version 928304 (0.0112) [2024-06-15 22:52:45,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 1901199360. Throughput: 0: 11036.5. Samples: 475337728. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:52:45,956][1648985] Avg episode reward: [(0, '166.940')] [2024-06-15 22:52:50,955][1648985] Fps is (10 sec: 39322.0, 60 sec: 44783.0, 300 sec: 45653.1). Total num frames: 1901264896. Throughput: 0: 11138.9. Samples: 475418112. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:52:50,955][1648985] Avg episode reward: [(0, '166.670')] [2024-06-15 22:52:51,054][1652491] Updated weights for policy 0, policy_version 928358 (0.0013) [2024-06-15 22:52:53,220][1652491] Updated weights for policy 0, policy_version 928432 (0.0132) [2024-06-15 22:52:55,955][1648985] Fps is (10 sec: 42597.7, 60 sec: 44782.9, 300 sec: 45875.2). Total num frames: 1901625344. Throughput: 0: 11229.8. Samples: 475444736. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:52:55,956][1648985] Avg episode reward: [(0, '166.460')] [2024-06-15 22:52:56,286][1652491] Updated weights for policy 0, policy_version 928551 (0.0094) [2024-06-15 22:52:56,425][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000928560_1901690880.pth... [2024-06-15 22:52:56,482][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000923168_1890648064.pth [2024-06-15 22:53:00,955][1648985] Fps is (10 sec: 45874.9, 60 sec: 43690.7, 300 sec: 45986.3). Total num frames: 1901723648. Throughput: 0: 10854.4. 
Samples: 475509760. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:53:00,956][1648985] Avg episode reward: [(0, '161.210')] [2024-06-15 22:53:02,692][1652491] Updated weights for policy 0, policy_version 928593 (0.0013) [2024-06-15 22:53:04,475][1652491] Updated weights for policy 0, policy_version 928645 (0.0014) [2024-06-15 22:53:05,955][1648985] Fps is (10 sec: 32768.6, 60 sec: 43690.7, 300 sec: 45319.8). Total num frames: 1901953024. Throughput: 0: 11241.3. Samples: 475576832. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:53:05,985][1648985] Avg episode reward: [(0, '169.440')] [2024-06-15 22:53:06,360][1652491] Updated weights for policy 0, policy_version 928720 (0.0015) [2024-06-15 22:53:07,620][1651469] Signal inference workers to stop experience collection... (48400 times) [2024-06-15 22:53:07,683][1652491] InferenceWorker_p0-w0: stopping experience collection (48400 times) [2024-06-15 22:53:07,686][1652491] Updated weights for policy 0, policy_version 928771 (0.0012) [2024-06-15 22:53:07,932][1651469] Signal inference workers to resume experience collection... (48400 times) [2024-06-15 22:53:07,933][1652491] InferenceWorker_p0-w0: resuming experience collection (48400 times) [2024-06-15 22:53:10,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1902247936. Throughput: 0: 10854.4. Samples: 475602432. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:53:10,956][1648985] Avg episode reward: [(0, '154.830')] [2024-06-15 22:53:13,531][1652491] Updated weights for policy 0, policy_version 928848 (0.0014) [2024-06-15 22:53:15,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 45329.1, 300 sec: 45319.8). Total num frames: 1902379008. Throughput: 0: 11117.9. Samples: 475681280. 
Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:53:15,956][1648985] Avg episode reward: [(0, '170.670')] [2024-06-15 22:53:16,337][1652491] Updated weights for policy 0, policy_version 928898 (0.0013) [2024-06-15 22:53:18,177][1652491] Updated weights for policy 0, policy_version 928977 (0.0014) [2024-06-15 22:53:19,875][1652491] Updated weights for policy 0, policy_version 929044 (0.0012) [2024-06-15 22:53:20,682][1652491] Updated weights for policy 0, policy_version 929087 (0.0013) [2024-06-15 22:53:20,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 44236.9, 300 sec: 46208.5). Total num frames: 1902772224. Throughput: 0: 10956.8. Samples: 475739136. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:53:20,956][1648985] Avg episode reward: [(0, '172.760')] [2024-06-15 22:53:25,955][1648985] Fps is (10 sec: 49152.0, 60 sec: 45329.1, 300 sec: 45653.0). Total num frames: 1902870528. Throughput: 0: 11002.3. Samples: 475782144. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:53:25,956][1648985] Avg episode reward: [(0, '173.600')] [2024-06-15 22:53:25,956][1652491] Updated weights for policy 0, policy_version 929141 (0.0014) [2024-06-15 22:53:28,758][1652491] Updated weights for policy 0, policy_version 929200 (0.0015) [2024-06-15 22:53:30,815][1652491] Updated weights for policy 0, policy_version 929281 (0.0013) [2024-06-15 22:53:30,955][1648985] Fps is (10 sec: 39321.4, 60 sec: 44242.2, 300 sec: 45764.1). Total num frames: 1903165440. Throughput: 0: 11377.8. Samples: 475849728. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:53:30,956][1648985] Avg episode reward: [(0, '163.440')] [2024-06-15 22:53:35,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 43690.8, 300 sec: 45653.1). Total num frames: 1903296512. Throughput: 0: 11104.7. Samples: 475917824. 
Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:53:35,955][1648985] Avg episode reward: [(0, '159.090')] [2024-06-15 22:53:36,899][1652491] Updated weights for policy 0, policy_version 929363 (0.0012) [2024-06-15 22:53:37,550][1652491] Updated weights for policy 0, policy_version 929408 (0.0012) [2024-06-15 22:53:40,634][1652491] Updated weights for policy 0, policy_version 929472 (0.0013) [2024-06-15 22:53:40,955][1648985] Fps is (10 sec: 39321.5, 60 sec: 44782.9, 300 sec: 45322.7). Total num frames: 1903558656. Throughput: 0: 11423.3. Samples: 475958784. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:53:40,956][1648985] Avg episode reward: [(0, '166.450')] [2024-06-15 22:53:42,840][1652491] Updated weights for policy 0, policy_version 929554 (0.0133) [2024-06-15 22:53:43,825][1652491] Updated weights for policy 0, policy_version 929594 (0.0022) [2024-06-15 22:53:45,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 43690.5, 300 sec: 45764.1). Total num frames: 1903820800. Throughput: 0: 11252.6. Samples: 476016128. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:53:45,956][1648985] Avg episode reward: [(0, '146.960')] [2024-06-15 22:53:48,624][1652491] Updated weights for policy 0, policy_version 929643 (0.0014) [2024-06-15 22:53:50,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 45329.0, 300 sec: 45319.8). Total num frames: 1903984640. Throughput: 0: 11571.2. Samples: 476097536. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:53:50,956][1648985] Avg episode reward: [(0, '139.170')] [2024-06-15 22:53:51,308][1652491] Updated weights for policy 0, policy_version 929696 (0.0141) [2024-06-15 22:53:51,470][1651469] Signal inference workers to stop experience collection... (48450 times) [2024-06-15 22:53:51,509][1652491] InferenceWorker_p0-w0: stopping experience collection (48450 times) [2024-06-15 22:53:51,707][1651469] Signal inference workers to resume experience collection... 
(48450 times) [2024-06-15 22:53:51,708][1652491] InferenceWorker_p0-w0: resuming experience collection (48450 times) [2024-06-15 22:53:53,651][1652491] Updated weights for policy 0, policy_version 929783 (0.0013) [2024-06-15 22:53:54,953][1652491] Updated weights for policy 0, policy_version 929850 (0.0012) [2024-06-15 22:53:55,955][1648985] Fps is (10 sec: 52429.8, 60 sec: 45329.2, 300 sec: 45986.3). Total num frames: 1904345088. Throughput: 0: 11468.8. Samples: 476118528. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:53:55,956][1648985] Avg episode reward: [(0, '145.770')] [2024-06-15 22:54:00,553][1652491] Updated weights for policy 0, policy_version 929913 (0.0149) [2024-06-15 22:54:00,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 45875.1, 300 sec: 45542.0). Total num frames: 1904476160. Throughput: 0: 11400.5. Samples: 476194304. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:54:00,956][1648985] Avg episode reward: [(0, '150.780')] [2024-06-15 22:54:02,902][1652491] Updated weights for policy 0, policy_version 929956 (0.0015) [2024-06-15 22:54:04,764][1652491] Updated weights for policy 0, policy_version 930036 (0.0014) [2024-06-15 22:54:05,833][1652491] Updated weights for policy 0, policy_version 930096 (0.0028) [2024-06-15 22:54:05,955][1648985] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 46097.4). Total num frames: 1904836608. Throughput: 0: 11502.9. Samples: 476256768. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:54:05,956][1648985] Avg episode reward: [(0, '154.520')] [2024-06-15 22:54:10,433][1652491] Updated weights for policy 0, policy_version 930128 (0.0012) [2024-06-15 22:54:10,955][1648985] Fps is (10 sec: 45875.8, 60 sec: 44783.1, 300 sec: 45319.8). Total num frames: 1904934912. Throughput: 0: 11525.7. Samples: 476300800. 
Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:54:10,955][1648985] Avg episode reward: [(0, '167.610')] [2024-06-15 22:54:12,642][1652491] Updated weights for policy 0, policy_version 930178 (0.0014) [2024-06-15 22:54:13,526][1652491] Updated weights for policy 0, policy_version 930229 (0.0012) [2024-06-15 22:54:15,188][1652491] Updated weights for policy 0, policy_version 930288 (0.0021) [2024-06-15 22:54:15,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 48059.7, 300 sec: 45764.1). Total num frames: 1905262592. Throughput: 0: 11548.5. Samples: 476369408. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:54:15,956][1648985] Avg episode reward: [(0, '177.540')] [2024-06-15 22:54:20,921][1652491] Updated weights for policy 0, policy_version 930371 (0.0254) [2024-06-15 22:54:20,955][1648985] Fps is (10 sec: 45874.3, 60 sec: 43690.6, 300 sec: 45208.7). Total num frames: 1905393664. Throughput: 0: 11662.2. Samples: 476442624. Policy #0 lag: (min: 49.0, avg: 101.5, max: 305.0) [2024-06-15 22:54:20,956][1648985] Avg episode reward: [(0, '191.100')] [2024-06-15 22:54:22,283][1652491] Updated weights for policy 0, policy_version 930432 (0.0025) [2024-06-15 22:54:25,546][1652491] Updated weights for policy 0, policy_version 930485 (0.0014) [2024-06-15 22:54:25,955][1648985] Fps is (10 sec: 39321.6, 60 sec: 46421.3, 300 sec: 45319.9). Total num frames: 1905655808. Throughput: 0: 11525.7. Samples: 476477440. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:54:25,956][1648985] Avg episode reward: [(0, '176.400')] [2024-06-15 22:54:26,701][1652491] Updated weights for policy 0, policy_version 930520 (0.0014) [2024-06-15 22:54:28,090][1652491] Updated weights for policy 0, policy_version 930582 (0.0013) [2024-06-15 22:54:30,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 45875.3, 300 sec: 45653.0). Total num frames: 1905917952. Throughput: 0: 11639.5. Samples: 476539904. 
Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:54:30,956][1648985] Avg episode reward: [(0, '166.430')] [2024-06-15 22:54:32,030][1651469] Signal inference workers to stop experience collection... (48500 times) [2024-06-15 22:54:32,079][1652491] InferenceWorker_p0-w0: stopping experience collection (48500 times) [2024-06-15 22:54:32,310][1651469] Signal inference workers to resume experience collection... (48500 times) [2024-06-15 22:54:32,311][1652491] InferenceWorker_p0-w0: resuming experience collection (48500 times) [2024-06-15 22:54:32,313][1652491] Updated weights for policy 0, policy_version 930656 (0.0019) [2024-06-15 22:54:35,955][1648985] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 44875.5). Total num frames: 1906049024. Throughput: 0: 11525.7. Samples: 476616192. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:54:35,956][1648985] Avg episode reward: [(0, '170.420')] [2024-06-15 22:54:36,282][1652491] Updated weights for policy 0, policy_version 930705 (0.0012) [2024-06-15 22:54:37,758][1652491] Updated weights for policy 0, policy_version 930784 (0.0012) [2024-06-15 22:54:39,735][1652491] Updated weights for policy 0, policy_version 930874 (0.0027) [2024-06-15 22:54:40,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 48059.8, 300 sec: 45875.2). Total num frames: 1906442240. Throughput: 0: 11741.9. Samples: 476646912. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:54:40,956][1648985] Avg episode reward: [(0, '159.460')] [2024-06-15 22:54:43,660][1652491] Updated weights for policy 0, policy_version 930928 (0.0019) [2024-06-15 22:54:45,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 45875.4, 300 sec: 44986.6). Total num frames: 1906573312. Throughput: 0: 11719.1. Samples: 476721664. 
Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:54:45,955][1648985] Avg episode reward: [(0, '147.710')] [2024-06-15 22:54:47,336][1652491] Updated weights for policy 0, policy_version 930980 (0.0013) [2024-06-15 22:54:48,128][1652491] Updated weights for policy 0, policy_version 931024 (0.0016) [2024-06-15 22:54:49,380][1652491] Updated weights for policy 0, policy_version 931081 (0.0107) [2024-06-15 22:54:50,518][1652491] Updated weights for policy 0, policy_version 931130 (0.0075) [2024-06-15 22:54:50,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 49698.2, 300 sec: 46208.4). Total num frames: 1906966528. Throughput: 0: 11946.7. Samples: 476794368. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:54:50,956][1648985] Avg episode reward: [(0, '150.600')] [2024-06-15 22:54:54,245][1652491] Updated weights for policy 0, policy_version 931194 (0.0014) [2024-06-15 22:54:55,955][1648985] Fps is (10 sec: 52427.0, 60 sec: 45875.0, 300 sec: 45430.9). Total num frames: 1907097600. Throughput: 0: 11821.4. Samples: 476832768. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:54:55,956][1648985] Avg episode reward: [(0, '155.310')] [2024-06-15 22:54:55,973][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000931200_1907097600.pth... [2024-06-15 22:54:56,035][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000925760_1895956480.pth [2024-06-15 22:54:58,473][1652491] Updated weights for policy 0, policy_version 931264 (0.0013) [2024-06-15 22:54:59,837][1652491] Updated weights for policy 0, policy_version 931324 (0.0122) [2024-06-15 22:55:00,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48605.9, 300 sec: 45875.2). Total num frames: 1907392512. Throughput: 0: 11832.9. Samples: 476901888. 
Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:55:00,956][1648985] Avg episode reward: [(0, '159.020')] [2024-06-15 22:55:01,459][1652491] Updated weights for policy 0, policy_version 931376 (0.0013) [2024-06-15 22:55:05,436][1652491] Updated weights for policy 0, policy_version 931424 (0.0017) [2024-06-15 22:55:05,955][1648985] Fps is (10 sec: 49153.6, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 1907589120. Throughput: 0: 11787.4. Samples: 476973056. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:55:05,956][1648985] Avg episode reward: [(0, '161.290')] [2024-06-15 22:55:08,652][1652491] Updated weights for policy 0, policy_version 931476 (0.0013) [2024-06-15 22:55:10,413][1652491] Updated weights for policy 0, policy_version 931548 (0.0013) [2024-06-15 22:55:10,955][1648985] Fps is (10 sec: 45874.7, 60 sec: 48605.7, 300 sec: 45653.0). Total num frames: 1907851264. Throughput: 0: 11855.6. Samples: 477010944. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:55:10,956][1648985] Avg episode reward: [(0, '152.490')] [2024-06-15 22:55:11,214][1652491] Updated weights for policy 0, policy_version 931578 (0.0012) [2024-06-15 22:55:12,440][1651469] Signal inference workers to stop experience collection... (48550 times) [2024-06-15 22:55:12,544][1652491] InferenceWorker_p0-w0: stopping experience collection (48550 times) [2024-06-15 22:55:12,735][1651469] Signal inference workers to resume experience collection... (48550 times) [2024-06-15 22:55:12,736][1652491] InferenceWorker_p0-w0: resuming experience collection (48550 times) [2024-06-15 22:55:13,373][1652491] Updated weights for policy 0, policy_version 931638 (0.0013) [2024-06-15 22:55:15,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 45875.3, 300 sec: 46097.4). Total num frames: 1908015104. Throughput: 0: 12003.6. Samples: 477080064. 
Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:55:15,955][1648985] Avg episode reward: [(0, '160.800')] [2024-06-15 22:55:16,761][1652491] Updated weights for policy 0, policy_version 931681 (0.0012) [2024-06-15 22:55:19,378][1652491] Updated weights for policy 0, policy_version 931718 (0.0013) [2024-06-15 22:55:20,602][1652491] Updated weights for policy 0, policy_version 931780 (0.0014) [2024-06-15 22:55:20,955][1648985] Fps is (10 sec: 45876.1, 60 sec: 48606.0, 300 sec: 45653.0). Total num frames: 1908310016. Throughput: 0: 11855.6. Samples: 477149696. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:55:20,956][1648985] Avg episode reward: [(0, '157.630')] [2024-06-15 22:55:21,738][1652491] Updated weights for policy 0, policy_version 931829 (0.0014) [2024-06-15 22:55:24,068][1652491] Updated weights for policy 0, policy_version 931888 (0.0018) [2024-06-15 22:55:25,955][1648985] Fps is (10 sec: 52427.6, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 1908539392. Throughput: 0: 11901.1. Samples: 477182464. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:55:25,956][1648985] Avg episode reward: [(0, '171.910')] [2024-06-15 22:55:27,955][1652491] Updated weights for policy 0, policy_version 931937 (0.0015) [2024-06-15 22:55:30,395][1652491] Updated weights for policy 0, policy_version 931984 (0.0013) [2024-06-15 22:55:30,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46967.5, 300 sec: 45875.2). Total num frames: 1908736000. Throughput: 0: 12014.9. Samples: 477262336. 
Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:55:30,956][1648985] Avg episode reward: [(0, '173.200')] [2024-06-15 22:55:32,298][1652491] Updated weights for policy 0, policy_version 932072 (0.0101) [2024-06-15 22:55:34,628][1652491] Updated weights for policy 0, policy_version 932112 (0.0025) [2024-06-15 22:55:35,827][1652491] Updated weights for policy 0, policy_version 932157 (0.0022) [2024-06-15 22:55:35,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 50244.3, 300 sec: 46208.4). Total num frames: 1909063680. Throughput: 0: 11798.8. Samples: 477325312. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:55:35,956][1648985] Avg episode reward: [(0, '191.850')] [2024-06-15 22:55:40,152][1652491] Updated weights for policy 0, policy_version 932224 (0.0013) [2024-06-15 22:55:40,955][1648985] Fps is (10 sec: 45875.3, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 1909194752. Throughput: 0: 11912.6. Samples: 477368832. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:55:40,955][1648985] Avg episode reward: [(0, '167.080')] [2024-06-15 22:55:43,507][1652491] Updated weights for policy 0, policy_version 932321 (0.0011) [2024-06-15 22:55:45,268][1652491] Updated weights for policy 0, policy_version 932368 (0.0099) [2024-06-15 22:55:45,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 49151.9, 300 sec: 45986.3). Total num frames: 1909522432. Throughput: 0: 11662.2. Samples: 477426688. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:55:45,956][1648985] Avg episode reward: [(0, '168.080')] [2024-06-15 22:55:50,737][1652491] Updated weights for policy 0, policy_version 932417 (0.0093) [2024-06-15 22:55:50,955][1648985] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 1909620736. Throughput: 0: 11764.6. Samples: 477502464. 
Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:55:50,956][1648985] Avg episode reward: [(0, '166.610')] [2024-06-15 22:55:53,212][1652491] Updated weights for policy 0, policy_version 932496 (0.0015) [2024-06-15 22:55:55,388][1652491] Updated weights for policy 0, policy_version 932576 (0.0102) [2024-06-15 22:55:55,942][1651469] Signal inference workers to stop experience collection... (48600 times) [2024-06-15 22:55:55,955][1648985] Fps is (10 sec: 42598.6, 60 sec: 47513.8, 300 sec: 46097.3). Total num frames: 1909948416. Throughput: 0: 11628.1. Samples: 477534208. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:55:55,956][1648985] Avg episode reward: [(0, '175.780')] [2024-06-15 22:55:55,979][1652491] InferenceWorker_p0-w0: stopping experience collection (48600 times) [2024-06-15 22:55:56,292][1651469] Signal inference workers to resume experience collection... (48600 times) [2024-06-15 22:55:56,294][1652491] InferenceWorker_p0-w0: resuming experience collection (48600 times) [2024-06-15 22:55:57,614][1652491] Updated weights for policy 0, policy_version 932656 (0.0013) [2024-06-15 22:56:00,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 45329.2, 300 sec: 46208.5). Total num frames: 1910112256. Throughput: 0: 11491.6. Samples: 477597184. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:56:00,956][1648985] Avg episode reward: [(0, '165.420')] [2024-06-15 22:56:01,646][1652491] Updated weights for policy 0, policy_version 932673 (0.0012) [2024-06-15 22:56:02,620][1652491] Updated weights for policy 0, policy_version 932735 (0.0015) [2024-06-15 22:56:04,999][1652491] Updated weights for policy 0, policy_version 932800 (0.0014) [2024-06-15 22:56:05,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46421.3, 300 sec: 45986.3). Total num frames: 1910374400. Throughput: 0: 11753.2. Samples: 477678592. 
Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:56:05,956][1648985] Avg episode reward: [(0, '166.660')] [2024-06-15 22:56:07,143][1652491] Updated weights for policy 0, policy_version 932864 (0.0143) [2024-06-15 22:56:08,604][1652491] Updated weights for policy 0, policy_version 932923 (0.0014) [2024-06-15 22:56:10,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 46421.6, 300 sec: 46208.4). Total num frames: 1910636544. Throughput: 0: 11605.4. Samples: 477704704. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:56:10,955][1648985] Avg episode reward: [(0, '156.760')] [2024-06-15 22:56:13,436][1652491] Updated weights for policy 0, policy_version 932965 (0.0012) [2024-06-15 22:56:14,643][1652491] Updated weights for policy 0, policy_version 933016 (0.0012) [2024-06-15 22:56:15,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 1910898688. Throughput: 0: 11685.0. Samples: 477788160. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:56:15,956][1648985] Avg episode reward: [(0, '140.300')] [2024-06-15 22:56:17,255][1652491] Updated weights for policy 0, policy_version 933072 (0.0012) [2024-06-15 22:56:19,328][1652491] Updated weights for policy 0, policy_version 933152 (0.0014) [2024-06-15 22:56:20,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 47513.4, 300 sec: 46208.4). Total num frames: 1911160832. Throughput: 0: 11662.2. Samples: 477850112. Policy #0 lag: (min: 15.0, avg: 98.0, max: 271.0) [2024-06-15 22:56:20,956][1648985] Avg episode reward: [(0, '155.810')] [2024-06-15 22:56:24,607][1652491] Updated weights for policy 0, policy_version 933219 (0.0016) [2024-06-15 22:56:25,955][1648985] Fps is (10 sec: 42598.3, 60 sec: 46421.4, 300 sec: 46430.6). Total num frames: 1911324672. Throughput: 0: 11730.5. Samples: 477896704. 
Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:56:25,956][1648985] Avg episode reward: [(0, '169.670')] [2024-06-15 22:56:25,993][1652491] Updated weights for policy 0, policy_version 933280 (0.0013) [2024-06-15 22:56:28,582][1652491] Updated weights for policy 0, policy_version 933317 (0.0012) [2024-06-15 22:56:30,747][1652491] Updated weights for policy 0, policy_version 933408 (0.0013) [2024-06-15 22:56:30,955][1648985] Fps is (10 sec: 45875.9, 60 sec: 48059.7, 300 sec: 45986.3). Total num frames: 1911619584. Throughput: 0: 11821.5. Samples: 477958656. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:56:30,956][1648985] Avg episode reward: [(0, '183.660')] [2024-06-15 22:56:31,344][1652491] Updated weights for policy 0, policy_version 933439 (0.0014) [2024-06-15 22:56:35,729][1652491] Updated weights for policy 0, policy_version 933489 (0.0014) [2024-06-15 22:56:35,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 45875.0, 300 sec: 46541.7). Total num frames: 1911816192. Throughput: 0: 11946.6. Samples: 478040064. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:56:35,956][1648985] Avg episode reward: [(0, '170.020')] [2024-06-15 22:56:36,814][1651469] Signal inference workers to stop experience collection... (48650 times) [2024-06-15 22:56:36,904][1652491] InferenceWorker_p0-w0: stopping experience collection (48650 times) [2024-06-15 22:56:37,024][1651469] Signal inference workers to resume experience collection... (48650 times) [2024-06-15 22:56:37,024][1652491] InferenceWorker_p0-w0: resuming experience collection (48650 times) [2024-06-15 22:56:37,111][1652491] Updated weights for policy 0, policy_version 933555 (0.0014) [2024-06-15 22:56:39,338][1652491] Updated weights for policy 0, policy_version 933584 (0.0012) [2024-06-15 22:56:40,955][1648985] Fps is (10 sec: 49152.3, 60 sec: 48605.8, 300 sec: 46097.4). Total num frames: 1912111104. Throughput: 0: 11980.8. Samples: 478073344. 
Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:56:40,955][1648985] Avg episode reward: [(0, '185.750')] [2024-06-15 22:56:41,295][1652491] Updated weights for policy 0, policy_version 933664 (0.0108) [2024-06-15 22:56:45,955][1648985] Fps is (10 sec: 42599.1, 60 sec: 45329.1, 300 sec: 46319.5). Total num frames: 1912242176. Throughput: 0: 12117.3. Samples: 478142464. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:56:45,956][1648985] Avg episode reward: [(0, '182.120')] [2024-06-15 22:56:46,275][1652491] Updated weights for policy 0, policy_version 933728 (0.0014) [2024-06-15 22:56:47,582][1652491] Updated weights for policy 0, policy_version 933777 (0.0031) [2024-06-15 22:56:50,407][1652491] Updated weights for policy 0, policy_version 933840 (0.0015) [2024-06-15 22:56:50,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48605.8, 300 sec: 46097.4). Total num frames: 1912537088. Throughput: 0: 11787.4. Samples: 478209024. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:56:50,956][1648985] Avg episode reward: [(0, '193.880')] [2024-06-15 22:56:51,342][1652491] Updated weights for policy 0, policy_version 933885 (0.0015) [2024-06-15 22:56:52,862][1652491] Updated weights for policy 0, policy_version 933948 (0.0014) [2024-06-15 22:56:55,955][1648985] Fps is (10 sec: 49150.8, 60 sec: 46421.2, 300 sec: 46208.4). Total num frames: 1912733696. Throughput: 0: 12026.2. Samples: 478245888. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:56:55,956][1648985] Avg episode reward: [(0, '181.620')] [2024-06-15 22:56:55,989][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000933952_1912733696.pth... 
[2024-06-15 22:56:56,039][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000928560_1901690880.pth [2024-06-15 22:56:57,232][1652491] Updated weights for policy 0, policy_version 934000 (0.0011) [2024-06-15 22:56:59,107][1652491] Updated weights for policy 0, policy_version 934064 (0.0014) [2024-06-15 22:57:00,905][1652491] Updated weights for policy 0, policy_version 934112 (0.0014) [2024-06-15 22:57:00,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 49151.9, 300 sec: 46541.7). Total num frames: 1913061376. Throughput: 0: 11878.4. Samples: 478322688. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:57:00,956][1648985] Avg episode reward: [(0, '181.040')] [2024-06-15 22:57:02,880][1652491] Updated weights for policy 0, policy_version 934160 (0.0014) [2024-06-15 22:57:05,955][1648985] Fps is (10 sec: 52430.2, 60 sec: 48059.8, 300 sec: 46208.4). Total num frames: 1913257984. Throughput: 0: 12094.6. Samples: 478394368. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:57:05,956][1648985] Avg episode reward: [(0, '177.210')] [2024-06-15 22:57:07,637][1652491] Updated weights for policy 0, policy_version 934242 (0.0013) [2024-06-15 22:57:09,475][1652491] Updated weights for policy 0, policy_version 934290 (0.0012) [2024-06-15 22:57:10,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 1913520128. Throughput: 0: 11901.2. Samples: 478432256. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:57:10,956][1648985] Avg episode reward: [(0, '180.110')] [2024-06-15 22:57:11,312][1652491] Updated weights for policy 0, policy_version 934337 (0.0014) [2024-06-15 22:57:12,319][1652491] Updated weights for policy 0, policy_version 934392 (0.0025) [2024-06-15 22:57:14,078][1652491] Updated weights for policy 0, policy_version 934435 (0.0012) [2024-06-15 22:57:15,957][1648985] Fps is (10 sec: 52419.6, 60 sec: 48058.3, 300 sec: 46319.3). Total num frames: 1913782272. 
Throughput: 0: 12082.7. Samples: 478502400. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:57:15,957][1648985] Avg episode reward: [(0, '156.140')] [2024-06-15 22:57:18,001][1652491] Updated weights for policy 0, policy_version 934496 (0.0034) [2024-06-15 22:57:20,127][1652491] Updated weights for policy 0, policy_version 934546 (0.0013) [2024-06-15 22:57:20,396][1651469] Signal inference workers to stop experience collection... (48700 times) [2024-06-15 22:57:20,452][1652491] InferenceWorker_p0-w0: stopping experience collection (48700 times) [2024-06-15 22:57:20,523][1651469] Signal inference workers to resume experience collection... (48700 times) [2024-06-15 22:57:20,524][1652491] InferenceWorker_p0-w0: resuming experience collection (48700 times) [2024-06-15 22:57:20,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 48059.9, 300 sec: 47097.1). Total num frames: 1914044416. Throughput: 0: 11924.0. Samples: 478576640. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:57:20,955][1648985] Avg episode reward: [(0, '132.780')] [2024-06-15 22:57:22,484][1652491] Updated weights for policy 0, policy_version 934608 (0.0013) [2024-06-15 22:57:23,640][1652491] Updated weights for policy 0, policy_version 934655 (0.0013) [2024-06-15 22:57:25,182][1652491] Updated weights for policy 0, policy_version 934704 (0.0013) [2024-06-15 22:57:25,955][1648985] Fps is (10 sec: 52438.2, 60 sec: 49698.2, 300 sec: 46765.0). Total num frames: 1914306560. Throughput: 0: 12071.8. Samples: 478616576. 
Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:57:25,955][1648985] Avg episode reward: [(0, '163.650')] [2024-06-15 22:57:27,585][1652491] Updated weights for policy 0, policy_version 934736 (0.0011) [2024-06-15 22:57:28,718][1652491] Updated weights for policy 0, policy_version 934780 (0.0012) [2024-06-15 22:57:30,515][1652491] Updated weights for policy 0, policy_version 934840 (0.0106) [2024-06-15 22:57:30,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 49152.1, 300 sec: 47097.1). Total num frames: 1914568704. Throughput: 0: 12208.4. Samples: 478691840. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:57:30,955][1648985] Avg episode reward: [(0, '174.540')] [2024-06-15 22:57:34,299][1652491] Updated weights for policy 0, policy_version 934897 (0.0095) [2024-06-15 22:57:35,955][1648985] Fps is (10 sec: 45875.1, 60 sec: 49152.2, 300 sec: 47097.1). Total num frames: 1914765312. Throughput: 0: 12231.1. Samples: 478759424. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:57:35,956][1648985] Avg episode reward: [(0, '181.680')] [2024-06-15 22:57:36,228][1652491] Updated weights for policy 0, policy_version 934971 (0.0012) [2024-06-15 22:57:40,324][1652491] Updated weights for policy 0, policy_version 935013 (0.0049) [2024-06-15 22:57:40,955][1648985] Fps is (10 sec: 36044.3, 60 sec: 46967.4, 300 sec: 46541.7). Total num frames: 1914929152. Throughput: 0: 12424.6. Samples: 478804992. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:57:40,956][1648985] Avg episode reward: [(0, '173.870')] [2024-06-15 22:57:42,346][1652491] Updated weights for policy 0, policy_version 935095 (0.0105) [2024-06-15 22:57:44,880][1652491] Updated weights for policy 0, policy_version 935136 (0.0013) [2024-06-15 22:57:45,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 49698.2, 300 sec: 47319.2). Total num frames: 1915224064. Throughput: 0: 12219.8. Samples: 478872576. 
Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:57:45,955][1648985] Avg episode reward: [(0, '188.060')] [2024-06-15 22:57:46,449][1652491] Updated weights for policy 0, policy_version 935200 (0.0074) [2024-06-15 22:57:49,991][1652491] Updated weights for policy 0, policy_version 935234 (0.0013) [2024-06-15 22:57:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 48605.9, 300 sec: 46874.9). Total num frames: 1915453440. Throughput: 0: 12276.6. Samples: 478946816. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:57:50,956][1648985] Avg episode reward: [(0, '182.860')] [2024-06-15 22:57:52,172][1652491] Updated weights for policy 0, policy_version 935328 (0.0012) [2024-06-15 22:57:55,517][1652491] Updated weights for policy 0, policy_version 935362 (0.0013) [2024-06-15 22:57:55,955][1648985] Fps is (10 sec: 42598.1, 60 sec: 48606.1, 300 sec: 47208.1). Total num frames: 1915650048. Throughput: 0: 12071.8. Samples: 478975488. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:57:55,955][1648985] Avg episode reward: [(0, '174.000')] [2024-06-15 22:57:56,921][1652491] Updated weights for policy 0, policy_version 935424 (0.0013) [2024-06-15 22:57:58,572][1652491] Updated weights for policy 0, policy_version 935485 (0.0035) [2024-06-15 22:58:00,955][1648985] Fps is (10 sec: 42598.5, 60 sec: 46967.5, 300 sec: 47208.1). Total num frames: 1915879424. Throughput: 0: 12208.8. Samples: 479051776. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:58:00,955][1648985] Avg episode reward: [(0, '161.040')] [2024-06-15 22:58:01,762][1651469] Signal inference workers to stop experience collection... (48750 times) [2024-06-15 22:58:01,804][1652491] InferenceWorker_p0-w0: stopping experience collection (48750 times) [2024-06-15 22:58:02,021][1651469] Signal inference workers to resume experience collection... 
(48750 times) [2024-06-15 22:58:02,022][1652491] InferenceWorker_p0-w0: resuming experience collection (48750 times) [2024-06-15 22:58:02,098][1652491] Updated weights for policy 0, policy_version 935536 (0.0087) [2024-06-15 22:58:03,570][1652491] Updated weights for policy 0, policy_version 935605 (0.0014) [2024-06-15 22:58:05,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1916141568. Throughput: 0: 12344.9. Samples: 479132160. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:58:05,956][1648985] Avg episode reward: [(0, '148.020')] [2024-06-15 22:58:07,328][1652491] Updated weights for policy 0, policy_version 935668 (0.0013) [2024-06-15 22:58:09,137][1652491] Updated weights for policy 0, policy_version 935739 (0.0015) [2024-06-15 22:58:10,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1916403712. Throughput: 0: 11980.8. Samples: 479155712. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:58:10,956][1648985] Avg episode reward: [(0, '145.650')] [2024-06-15 22:58:13,276][1652491] Updated weights for policy 0, policy_version 935793 (0.0011) [2024-06-15 22:58:14,639][1652491] Updated weights for policy 0, policy_version 935871 (0.0013) [2024-06-15 22:58:15,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 48061.2, 300 sec: 47097.1). Total num frames: 1916665856. Throughput: 0: 11958.0. Samples: 479229952. Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:58:15,956][1648985] Avg episode reward: [(0, '150.020')] [2024-06-15 22:58:17,672][1652491] Updated weights for policy 0, policy_version 935906 (0.0021) [2024-06-15 22:58:19,517][1652491] Updated weights for policy 0, policy_version 935977 (0.0012) [2024-06-15 22:58:20,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.6, 300 sec: 47652.4). Total num frames: 1916928000. Throughput: 0: 12197.0. Samples: 479308288. 
Policy #0 lag: (min: 15.0, avg: 100.3, max: 271.0) [2024-06-15 22:58:20,956][1648985] Avg episode reward: [(0, '162.910')] [2024-06-15 22:58:22,811][1652491] Updated weights for policy 0, policy_version 936018 (0.0012) [2024-06-15 22:58:24,590][1652491] Updated weights for policy 0, policy_version 936112 (0.0121) [2024-06-15 22:58:25,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1917190144. Throughput: 0: 11969.4. Samples: 479343616. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:58:25,956][1648985] Avg episode reward: [(0, '178.140')] [2024-06-15 22:58:28,656][1652491] Updated weights for policy 0, policy_version 936176 (0.0011) [2024-06-15 22:58:29,986][1652491] Updated weights for policy 0, policy_version 936228 (0.0117) [2024-06-15 22:58:30,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1917452288. Throughput: 0: 12094.6. Samples: 479416832. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:58:30,956][1648985] Avg episode reward: [(0, '165.460')] [2024-06-15 22:58:33,727][1652491] Updated weights for policy 0, policy_version 936304 (0.0012) [2024-06-15 22:58:35,002][1652491] Updated weights for policy 0, policy_version 936374 (0.0084) [2024-06-15 22:58:35,955][1648985] Fps is (10 sec: 52429.1, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 1917714432. Throughput: 0: 12083.2. Samples: 479490560. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:58:35,956][1648985] Avg episode reward: [(0, '165.980')] [2024-06-15 22:58:38,864][1652491] Updated weights for policy 0, policy_version 936421 (0.0027) [2024-06-15 22:58:39,103][1651469] Signal inference workers to stop experience collection... (48800 times) [2024-06-15 22:58:39,287][1652491] InferenceWorker_p0-w0: stopping experience collection (48800 times) [2024-06-15 22:58:39,388][1651469] Signal inference workers to resume experience collection... 
(48800 times) [2024-06-15 22:58:39,389][1652491] InferenceWorker_p0-w0: resuming experience collection (48800 times) [2024-06-15 22:58:40,322][1652491] Updated weights for policy 0, policy_version 936483 (0.0104) [2024-06-15 22:58:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 50790.5, 300 sec: 47985.7). Total num frames: 1917976576. Throughput: 0: 12481.4. Samples: 479537152. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:58:40,956][1648985] Avg episode reward: [(0, '153.530')] [2024-06-15 22:58:43,172][1652491] Updated weights for policy 0, policy_version 936513 (0.0012) [2024-06-15 22:58:44,638][1652491] Updated weights for policy 0, policy_version 936581 (0.0014) [2024-06-15 22:58:45,667][1652491] Updated weights for policy 0, policy_version 936634 (0.0014) [2024-06-15 22:58:45,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 50244.1, 300 sec: 48318.9). Total num frames: 1918238720. Throughput: 0: 12379.0. Samples: 479608832. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:58:45,956][1648985] Avg episode reward: [(0, '152.220')] [2024-06-15 22:58:48,935][1652491] Updated weights for policy 0, policy_version 936677 (0.0011) [2024-06-15 22:58:50,434][1652491] Updated weights for policy 0, policy_version 936752 (0.0016) [2024-06-15 22:58:50,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 50790.5, 300 sec: 47985.7). Total num frames: 1918500864. Throughput: 0: 12265.3. Samples: 479684096. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:58:50,955][1648985] Avg episode reward: [(0, '139.730')] [2024-06-15 22:58:53,803][1652491] Updated weights for policy 0, policy_version 936804 (0.0015) [2024-06-15 22:58:55,096][1652491] Updated weights for policy 0, policy_version 936869 (0.0013) [2024-06-15 22:58:55,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 51882.5, 300 sec: 48430.0). Total num frames: 1918763008. Throughput: 0: 12697.5. Samples: 479727104. 
Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:58:55,956][1648985] Avg episode reward: [(0, '142.840')] [2024-06-15 22:58:55,977][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000936896_1918763008.pth... [2024-06-15 22:58:56,041][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000931200_1907097600.pth [2024-06-15 22:58:58,236][1652491] Updated weights for policy 0, policy_version 936902 (0.0038) [2024-06-15 22:58:59,647][1652491] Updated weights for policy 0, policy_version 936962 (0.0013) [2024-06-15 22:59:00,955][1648985] Fps is (10 sec: 49151.3, 60 sec: 51882.6, 300 sec: 47985.7). Total num frames: 1918992384. Throughput: 0: 12777.2. Samples: 479804928. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:59:00,956][1648985] Avg episode reward: [(0, '141.710')] [2024-06-15 22:59:03,508][1652491] Updated weights for policy 0, policy_version 937025 (0.0023) [2024-06-15 22:59:04,744][1652491] Updated weights for policy 0, policy_version 937079 (0.0012) [2024-06-15 22:59:05,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 51336.4, 300 sec: 48430.0). Total num frames: 1919221760. Throughput: 0: 12686.2. Samples: 479879168. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:59:05,956][1648985] Avg episode reward: [(0, '165.020')] [2024-06-15 22:59:06,063][1652491] Updated weights for policy 0, policy_version 937136 (0.0011) [2024-06-15 22:59:08,761][1652491] Updated weights for policy 0, policy_version 937156 (0.0016) [2024-06-15 22:59:10,664][1652491] Updated weights for policy 0, policy_version 937248 (0.0014) [2024-06-15 22:59:10,955][1648985] Fps is (10 sec: 49151.2, 60 sec: 51336.4, 300 sec: 48207.8). Total num frames: 1919483904. Throughput: 0: 12925.1. Samples: 479925248. 
Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:59:10,956][1648985] Avg episode reward: [(0, '161.090')] [2024-06-15 22:59:13,306][1652491] Updated weights for policy 0, policy_version 937282 (0.0014) [2024-06-15 22:59:14,679][1652491] Updated weights for policy 0, policy_version 937350 (0.0105) [2024-06-15 22:59:15,323][1651469] Signal inference workers to stop experience collection... (48850 times) [2024-06-15 22:59:15,391][1652491] InferenceWorker_p0-w0: stopping experience collection (48850 times) [2024-06-15 22:59:15,655][1651469] Signal inference workers to resume experience collection... (48850 times) [2024-06-15 22:59:15,656][1652491] InferenceWorker_p0-w0: resuming experience collection (48850 times) [2024-06-15 22:59:15,902][1652491] Updated weights for policy 0, policy_version 937402 (0.0012) [2024-06-15 22:59:15,955][1648985] Fps is (10 sec: 55706.8, 60 sec: 51882.7, 300 sec: 48763.3). Total num frames: 1919778816. Throughput: 0: 12845.5. Samples: 479994880. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:59:15,955][1648985] Avg episode reward: [(0, '165.380')] [2024-06-15 22:59:19,695][1652491] Updated weights for policy 0, policy_version 937456 (0.0013) [2024-06-15 22:59:20,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 51336.6, 300 sec: 48652.2). Total num frames: 1920008192. Throughput: 0: 12993.4. Samples: 480075264. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:59:20,956][1648985] Avg episode reward: [(0, '172.910')] [2024-06-15 22:59:20,985][1652491] Updated weights for policy 0, policy_version 937509 (0.0018) [2024-06-15 22:59:23,073][1652491] Updated weights for policy 0, policy_version 937552 (0.0029) [2024-06-15 22:59:23,889][1652491] Updated weights for policy 0, policy_version 937596 (0.0012) [2024-06-15 22:59:24,722][1652491] Updated weights for policy 0, policy_version 937633 (0.0014) [2024-06-15 22:59:25,955][1648985] Fps is (10 sec: 55704.8, 60 sec: 52428.8, 300 sec: 48874.3). 
Total num frames: 1920335872. Throughput: 0: 12902.4. Samples: 480117760. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:59:25,956][1648985] Avg episode reward: [(0, '176.200')] [2024-06-15 22:59:28,239][1652491] Updated weights for policy 0, policy_version 937680 (0.0015) [2024-06-15 22:59:29,774][1652491] Updated weights for policy 0, policy_version 937744 (0.0024) [2024-06-15 22:59:30,682][1652491] Updated weights for policy 0, policy_version 937784 (0.0014) [2024-06-15 22:59:30,955][1648985] Fps is (10 sec: 58983.0, 60 sec: 52428.9, 300 sec: 49318.6). Total num frames: 1920598016. Throughput: 0: 13130.0. Samples: 480199680. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:59:30,955][1648985] Avg episode reward: [(0, '171.680')] [2024-06-15 22:59:33,613][1652491] Updated weights for policy 0, policy_version 937840 (0.0012) [2024-06-15 22:59:34,601][1652491] Updated weights for policy 0, policy_version 937890 (0.0096) [2024-06-15 22:59:35,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 52428.8, 300 sec: 48874.3). Total num frames: 1920860160. Throughput: 0: 13266.5. Samples: 480281088. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:59:35,955][1648985] Avg episode reward: [(0, '169.740')] [2024-06-15 22:59:37,967][1652491] Updated weights for policy 0, policy_version 937936 (0.0012) [2024-06-15 22:59:39,918][1652491] Updated weights for policy 0, policy_version 938000 (0.0145) [2024-06-15 22:59:40,872][1652491] Updated weights for policy 0, policy_version 938043 (0.0011) [2024-06-15 22:59:40,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 51882.6, 300 sec: 49207.5). Total num frames: 1921089536. Throughput: 0: 13152.8. Samples: 480318976. 
Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:59:40,956][1648985] Avg episode reward: [(0, '167.220')] [2024-06-15 22:59:43,862][1652491] Updated weights for policy 0, policy_version 938085 (0.0124) [2024-06-15 22:59:45,636][1652491] Updated weights for policy 0, policy_version 938163 (0.0013) [2024-06-15 22:59:45,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 52428.9, 300 sec: 48874.3). Total num frames: 1921384448. Throughput: 0: 13061.7. Samples: 480392704. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:59:45,956][1648985] Avg episode reward: [(0, '188.920')] [2024-06-15 22:59:48,481][1652491] Updated weights for policy 0, policy_version 938195 (0.0012) [2024-06-15 22:59:50,220][1652491] Updated weights for policy 0, policy_version 938256 (0.0012) [2024-06-15 22:59:50,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 51882.5, 300 sec: 49207.6). Total num frames: 1921613824. Throughput: 0: 13198.2. Samples: 480473088. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:59:50,956][1648985] Avg episode reward: [(0, '185.280')] [2024-06-15 22:59:51,130][1652491] Updated weights for policy 0, policy_version 938296 (0.0010) [2024-06-15 22:59:53,531][1652491] Updated weights for policy 0, policy_version 938324 (0.0016) [2024-06-15 22:59:54,280][1651469] Signal inference workers to stop experience collection... (48900 times) [2024-06-15 22:59:54,394][1652491] InferenceWorker_p0-w0: stopping experience collection (48900 times) [2024-06-15 22:59:54,479][1651469] Signal inference workers to resume experience collection... (48900 times) [2024-06-15 22:59:54,479][1652491] InferenceWorker_p0-w0: resuming experience collection (48900 times) [2024-06-15 22:59:54,961][1652491] Updated weights for policy 0, policy_version 938391 (0.0091) [2024-06-15 22:59:55,697][1652491] Updated weights for policy 0, policy_version 938432 (0.0012) [2024-06-15 22:59:55,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 52428.9, 300 sec: 49207.5). 
Total num frames: 1921908736. Throughput: 0: 13095.9. Samples: 480514560. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 22:59:55,956][1648985] Avg episode reward: [(0, '166.280')] [2024-06-15 22:59:58,894][1652491] Updated weights for policy 0, policy_version 938483 (0.0013) [2024-06-15 23:00:00,473][1652491] Updated weights for policy 0, policy_version 938544 (0.0094) [2024-06-15 23:00:00,955][1648985] Fps is (10 sec: 55706.1, 60 sec: 52974.9, 300 sec: 49429.7). Total num frames: 1922170880. Throughput: 0: 13312.0. Samples: 480593920. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 23:00:00,956][1648985] Avg episode reward: [(0, '153.570')] [2024-06-15 23:00:03,594][1652491] Updated weights for policy 0, policy_version 938608 (0.0030) [2024-06-15 23:00:04,700][1652491] Updated weights for policy 0, policy_version 938658 (0.0011) [2024-06-15 23:00:05,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 53521.2, 300 sec: 49429.7). Total num frames: 1922433024. Throughput: 0: 13266.5. Samples: 480672256. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 23:00:05,956][1648985] Avg episode reward: [(0, '152.010')] [2024-06-15 23:00:08,628][1652491] Updated weights for policy 0, policy_version 938746 (0.0012) [2024-06-15 23:00:10,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 52428.9, 300 sec: 49540.8). Total num frames: 1922629632. Throughput: 0: 13107.2. Samples: 480707584. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 23:00:10,956][1648985] Avg episode reward: [(0, '151.130')] [2024-06-15 23:00:11,036][1652491] Updated weights for policy 0, policy_version 938800 (0.0012) [2024-06-15 23:00:13,290][1652491] Updated weights for policy 0, policy_version 938848 (0.0010) [2024-06-15 23:00:14,964][1652491] Updated weights for policy 0, policy_version 938883 (0.0017) [2024-06-15 23:00:15,955][1648985] Fps is (10 sec: 49151.8, 60 sec: 52428.7, 300 sec: 49540.8). Total num frames: 1922924544. Throughput: 0: 13095.8. 
Samples: 480788992. Policy #0 lag: (min: 15.0, avg: 102.1, max: 271.0) [2024-06-15 23:00:15,956][1648985] Avg episode reward: [(0, '161.480')] [2024-06-15 23:00:16,059][1652491] Updated weights for policy 0, policy_version 938943 (0.0014) [2024-06-15 23:00:18,779][1652491] Updated weights for policy 0, policy_version 938996 (0.0011) [2024-06-15 23:00:20,955][1648985] Fps is (10 sec: 49151.5, 60 sec: 51882.5, 300 sec: 49429.7). Total num frames: 1923121152. Throughput: 0: 12947.9. Samples: 480863744. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:00:20,956][1648985] Avg episode reward: [(0, '158.870')] [2024-06-15 23:00:21,134][1652491] Updated weights for policy 0, policy_version 939040 (0.0012) [2024-06-15 23:00:23,756][1652491] Updated weights for policy 0, policy_version 939104 (0.0011) [2024-06-15 23:00:25,955][1648985] Fps is (10 sec: 42598.9, 60 sec: 50244.4, 300 sec: 49540.8). Total num frames: 1923350528. Throughput: 0: 12845.5. Samples: 480897024. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:00:25,955][1648985] Avg episode reward: [(0, '171.980')] [2024-06-15 23:00:26,264][1652491] Updated weights for policy 0, policy_version 939138 (0.0010) [2024-06-15 23:00:27,747][1652491] Updated weights for policy 0, policy_version 939202 (0.0012) [2024-06-15 23:00:29,120][1652491] Updated weights for policy 0, policy_version 939257 (0.0012) [2024-06-15 23:00:30,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 50244.2, 300 sec: 49318.6). Total num frames: 1923612672. Throughput: 0: 12879.7. Samples: 480972288. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:00:30,955][1648985] Avg episode reward: [(0, '169.460')] [2024-06-15 23:00:32,511][1652491] Updated weights for policy 0, policy_version 939318 (0.0013) [2024-06-15 23:00:33,727][1652491] Updated weights for policy 0, policy_version 939360 (0.0012) [2024-06-15 23:00:33,820][1651469] Signal inference workers to stop experience collection... 
(48950 times) [2024-06-15 23:00:33,884][1652491] InferenceWorker_p0-w0: stopping experience collection (48950 times) [2024-06-15 23:00:34,078][1651469] Signal inference workers to resume experience collection... (48950 times) [2024-06-15 23:00:34,086][1652491] InferenceWorker_p0-w0: resuming experience collection (48950 times) [2024-06-15 23:00:35,955][1648985] Fps is (10 sec: 52427.7, 60 sec: 50244.1, 300 sec: 49762.9). Total num frames: 1923874816. Throughput: 0: 12947.9. Samples: 481055744. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:00:35,956][1648985] Avg episode reward: [(0, '141.060')] [2024-06-15 23:00:36,893][1652491] Updated weights for policy 0, policy_version 939409 (0.0019) [2024-06-15 23:00:38,285][1652491] Updated weights for policy 0, policy_version 939473 (0.0126) [2024-06-15 23:00:39,065][1652491] Updated weights for policy 0, policy_version 939516 (0.0013) [2024-06-15 23:00:40,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 50790.5, 300 sec: 49540.8). Total num frames: 1924136960. Throughput: 0: 12765.9. Samples: 481089024. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:00:40,955][1648985] Avg episode reward: [(0, '137.250')] [2024-06-15 23:00:42,471][1652491] Updated weights for policy 0, policy_version 939555 (0.0013) [2024-06-15 23:00:43,977][1652491] Updated weights for policy 0, policy_version 939616 (0.0012) [2024-06-15 23:00:44,685][1652491] Updated weights for policy 0, policy_version 939648 (0.0013) [2024-06-15 23:00:45,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 50244.2, 300 sec: 50096.2). Total num frames: 1924399104. Throughput: 0: 12743.1. Samples: 481167360. 
Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:00:45,956][1648985] Avg episode reward: [(0, '140.710')] [2024-06-15 23:00:48,154][1652491] Updated weights for policy 0, policy_version 939728 (0.0012) [2024-06-15 23:00:49,107][1652491] Updated weights for policy 0, policy_version 939773 (0.0013) [2024-06-15 23:00:50,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 50790.5, 300 sec: 49874.0). Total num frames: 1924661248. Throughput: 0: 12788.6. Samples: 481247744. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:00:50,956][1648985] Avg episode reward: [(0, '152.900')] [2024-06-15 23:00:53,174][1652491] Updated weights for policy 0, policy_version 939836 (0.0022) [2024-06-15 23:00:54,510][1652491] Updated weights for policy 0, policy_version 939893 (0.0014) [2024-06-15 23:00:55,955][1648985] Fps is (10 sec: 52427.1, 60 sec: 50244.0, 300 sec: 50207.2). Total num frames: 1924923392. Throughput: 0: 12777.1. Samples: 481282560. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:00:55,956][1648985] Avg episode reward: [(0, '171.200')] [2024-06-15 23:00:55,962][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000939904_1924923392.pth... [2024-06-15 23:00:56,009][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000933952_1912733696.pth [2024-06-15 23:00:57,217][1652491] Updated weights for policy 0, policy_version 939937 (0.0022) [2024-06-15 23:00:58,598][1652491] Updated weights for policy 0, policy_version 940000 (0.0012) [2024-06-15 23:01:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 50207.3). Total num frames: 1925185536. Throughput: 0: 12652.1. Samples: 481358336. 
Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:01:00,956][1648985] Avg episode reward: [(0, '150.580')] [2024-06-15 23:01:02,620][1652491] Updated weights for policy 0, policy_version 940034 (0.0017) [2024-06-15 23:01:04,793][1652491] Updated weights for policy 0, policy_version 940144 (0.0098) [2024-06-15 23:01:05,955][1648985] Fps is (10 sec: 52430.5, 60 sec: 50244.2, 300 sec: 50207.2). Total num frames: 1925447680. Throughput: 0: 12709.0. Samples: 481435648. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:01:05,956][1648985] Avg episode reward: [(0, '148.530')] [2024-06-15 23:01:07,417][1652491] Updated weights for policy 0, policy_version 940192 (0.0012) [2024-06-15 23:01:09,144][1652491] Updated weights for policy 0, policy_version 940256 (0.0012) [2024-06-15 23:01:09,778][1652491] Updated weights for policy 0, policy_version 940288 (0.0011) [2024-06-15 23:01:10,955][1648985] Fps is (10 sec: 52427.3, 60 sec: 51336.3, 300 sec: 50207.2). Total num frames: 1925709824. Throughput: 0: 12765.8. Samples: 481471488. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:01:10,956][1648985] Avg episode reward: [(0, '156.510')] [2024-06-15 23:01:14,408][1651469] Signal inference workers to stop experience collection... (49000 times) [2024-06-15 23:01:14,513][1652491] InferenceWorker_p0-w0: stopping experience collection (49000 times) [2024-06-15 23:01:14,719][1651469] Signal inference workers to resume experience collection... (49000 times) [2024-06-15 23:01:14,720][1652491] InferenceWorker_p0-w0: resuming experience collection (49000 times) [2024-06-15 23:01:14,722][1652491] Updated weights for policy 0, policy_version 940352 (0.0012) [2024-06-15 23:01:15,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 50244.3, 300 sec: 50096.2). Total num frames: 1925939200. Throughput: 0: 12879.6. Samples: 481551872. 
Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:01:15,956][1648985] Avg episode reward: [(0, '154.190')] [2024-06-15 23:01:16,046][1652491] Updated weights for policy 0, policy_version 940415 (0.0020) [2024-06-15 23:01:18,355][1652491] Updated weights for policy 0, policy_version 940480 (0.0012) [2024-06-15 23:01:19,769][1652491] Updated weights for policy 0, policy_version 940536 (0.0012) [2024-06-15 23:01:20,965][1648985] Fps is (10 sec: 52430.1, 60 sec: 51882.7, 300 sec: 50540.5). Total num frames: 1926234112. Throughput: 0: 12538.3. Samples: 481619968. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:01:20,966][1648985] Avg episode reward: [(0, '171.350')] [2024-06-15 23:01:25,273][1652491] Updated weights for policy 0, policy_version 940592 (0.0119) [2024-06-15 23:01:25,955][1648985] Fps is (10 sec: 45875.5, 60 sec: 50790.4, 300 sec: 50096.2). Total num frames: 1926397952. Throughput: 0: 12834.1. Samples: 481666560. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:01:25,956][1648985] Avg episode reward: [(0, '168.740')] [2024-06-15 23:01:26,301][1652491] Updated weights for policy 0, policy_version 940641 (0.0017) [2024-06-15 23:01:28,368][1652491] Updated weights for policy 0, policy_version 940708 (0.0034) [2024-06-15 23:01:29,741][1652491] Updated weights for policy 0, policy_version 940768 (0.0079) [2024-06-15 23:01:30,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 52428.7, 300 sec: 50651.6). Total num frames: 1926758400. Throughput: 0: 12447.3. Samples: 481727488. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:01:30,956][1648985] Avg episode reward: [(0, '173.110')] [2024-06-15 23:01:35,215][1652491] Updated weights for policy 0, policy_version 940838 (0.0014) [2024-06-15 23:01:35,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 50244.5, 300 sec: 50096.2). Total num frames: 1926889472. Throughput: 0: 12561.1. Samples: 481812992. 
Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:01:35,955][1648985] Avg episode reward: [(0, '186.470')] [2024-06-15 23:01:36,442][1652491] Updated weights for policy 0, policy_version 940896 (0.0015) [2024-06-15 23:01:37,122][1652491] Updated weights for policy 0, policy_version 940927 (0.0012) [2024-06-15 23:01:39,385][1652491] Updated weights for policy 0, policy_version 940992 (0.0021) [2024-06-15 23:01:40,722][1652491] Updated weights for policy 0, policy_version 941047 (0.0012) [2024-06-15 23:01:40,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 52428.8, 300 sec: 50984.8). Total num frames: 1927282688. Throughput: 0: 12606.7. Samples: 481849856. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:01:40,956][1648985] Avg episode reward: [(0, '185.350')] [2024-06-15 23:01:45,754][1652491] Updated weights for policy 0, policy_version 941104 (0.0013) [2024-06-15 23:01:45,955][1648985] Fps is (10 sec: 49151.7, 60 sec: 49698.2, 300 sec: 50318.3). Total num frames: 1927380992. Throughput: 0: 12652.1. Samples: 481927680. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:01:45,956][1648985] Avg episode reward: [(0, '155.340')] [2024-06-15 23:01:46,661][1652491] Updated weights for policy 0, policy_version 941138 (0.0012) [2024-06-15 23:01:47,365][1652491] Updated weights for policy 0, policy_version 941178 (0.0012) [2024-06-15 23:01:49,260][1652491] Updated weights for policy 0, policy_version 941243 (0.0073) [2024-06-15 23:01:50,365][1651469] Signal inference workers to stop experience collection... (49050 times) [2024-06-15 23:01:50,418][1652491] InferenceWorker_p0-w0: stopping experience collection (49050 times) [2024-06-15 23:01:50,583][1651469] Signal inference workers to resume experience collection... (49050 times) [2024-06-15 23:01:50,585][1652491] InferenceWorker_p0-w0: resuming experience collection (49050 times) [2024-06-15 23:01:50,955][1648985] Fps is (10 sec: 49151.9, 60 sec: 51882.7, 300 sec: 50984.8). 
Total num frames: 1927774208. Throughput: 0: 12390.4. Samples: 481993216. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:01:50,956][1648985] Avg episode reward: [(0, '142.460')] [2024-06-15 23:01:51,127][1652491] Updated weights for policy 0, policy_version 941311 (0.0030) [2024-06-15 23:01:55,955][1648985] Fps is (10 sec: 49151.1, 60 sec: 49152.2, 300 sec: 50207.2). Total num frames: 1927872512. Throughput: 0: 12618.0. Samples: 482039296. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:01:55,956][1648985] Avg episode reward: [(0, '134.260')] [2024-06-15 23:01:56,281][1652491] Updated weights for policy 0, policy_version 941364 (0.0120) [2024-06-15 23:01:57,178][1652491] Updated weights for policy 0, policy_version 941396 (0.0009) [2024-06-15 23:01:58,617][1652491] Updated weights for policy 0, policy_version 941456 (0.0013) [2024-06-15 23:02:00,625][1652491] Updated weights for policy 0, policy_version 941507 (0.0012) [2024-06-15 23:02:00,955][1648985] Fps is (10 sec: 45874.8, 60 sec: 50790.3, 300 sec: 50762.6). Total num frames: 1928232960. Throughput: 0: 12333.5. Samples: 482106880. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:02:00,956][1648985] Avg episode reward: [(0, '146.430')] [2024-06-15 23:02:01,778][1652491] Updated weights for policy 0, policy_version 941565 (0.0114) [2024-06-15 23:02:05,955][1648985] Fps is (10 sec: 49152.8, 60 sec: 48605.9, 300 sec: 50318.3). Total num frames: 1928364032. Throughput: 0: 12743.1. Samples: 482193408. 
Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:02:05,956][1648985] Avg episode reward: [(0, '182.610')] [2024-06-15 23:02:06,028][1652491] Updated weights for policy 0, policy_version 941600 (0.0028) [2024-06-15 23:02:07,963][1652491] Updated weights for policy 0, policy_version 941670 (0.0014) [2024-06-15 23:02:09,953][1652491] Updated weights for policy 0, policy_version 941730 (0.0012) [2024-06-15 23:02:10,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 50244.5, 300 sec: 50651.9). Total num frames: 1928724480. Throughput: 0: 12322.1. Samples: 482221056. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:02:10,956][1648985] Avg episode reward: [(0, '183.530')] [2024-06-15 23:02:11,942][1652491] Updated weights for policy 0, policy_version 941778 (0.0014) [2024-06-15 23:02:12,895][1652491] Updated weights for policy 0, policy_version 941824 (0.0013) [2024-06-15 23:02:15,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 48605.8, 300 sec: 50207.2). Total num frames: 1928855552. Throughput: 0: 12720.3. Samples: 482299904. Policy #0 lag: (min: 49.0, avg: 172.0, max: 305.0) [2024-06-15 23:02:15,957][1648985] Avg episode reward: [(0, '160.960')] [2024-06-15 23:02:17,113][1652491] Updated weights for policy 0, policy_version 941877 (0.0017) [2024-06-15 23:02:18,553][1652491] Updated weights for policy 0, policy_version 941944 (0.0085) [2024-06-15 23:02:19,948][1652491] Updated weights for policy 0, policy_version 941989 (0.0012) [2024-06-15 23:02:20,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 50244.2, 300 sec: 50651.5). Total num frames: 1929248768. Throughput: 0: 12470.0. Samples: 482374144. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:02:20,956][1648985] Avg episode reward: [(0, '148.330')] [2024-06-15 23:02:22,757][1652491] Updated weights for policy 0, policy_version 942048 (0.0086) [2024-06-15 23:02:25,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 49698.1, 300 sec: 50207.2). Total num frames: 1929379840. 
Throughput: 0: 12447.3. Samples: 482409984. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:02:25,956][1648985] Avg episode reward: [(0, '154.060')] [2024-06-15 23:02:26,644][1652491] Updated weights for policy 0, policy_version 942098 (0.0015) [2024-06-15 23:02:27,845][1652491] Updated weights for policy 0, policy_version 942160 (0.0012) [2024-06-15 23:02:28,853][1652491] Updated weights for policy 0, policy_version 942208 (0.0015) [2024-06-15 23:02:30,343][1651469] Signal inference workers to stop experience collection... (49100 times) [2024-06-15 23:02:30,379][1652491] InferenceWorker_p0-w0: stopping experience collection (49100 times) [2024-06-15 23:02:30,511][1651469] Signal inference workers to resume experience collection... (49100 times) [2024-06-15 23:02:30,512][1652491] InferenceWorker_p0-w0: resuming experience collection (49100 times) [2024-06-15 23:02:30,680][1652491] Updated weights for policy 0, policy_version 942267 (0.0015) [2024-06-15 23:02:30,955][1648985] Fps is (10 sec: 52429.5, 60 sec: 50244.3, 300 sec: 50873.7). Total num frames: 1929773056. Throughput: 0: 12458.7. Samples: 482488320. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:02:30,956][1648985] Avg episode reward: [(0, '186.660')] [2024-06-15 23:02:33,863][1652491] Updated weights for policy 0, policy_version 942332 (0.0013) [2024-06-15 23:02:35,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 50244.2, 300 sec: 50762.6). Total num frames: 1929904128. Throughput: 0: 12777.3. Samples: 482568192. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:02:35,955][1648985] Avg episode reward: [(0, '204.640')] [2024-06-15 23:02:38,019][1652491] Updated weights for policy 0, policy_version 942388 (0.0015) [2024-06-15 23:02:39,491][1652491] Updated weights for policy 0, policy_version 942456 (0.0013) [2024-06-15 23:02:40,955][1648985] Fps is (10 sec: 45874.5, 60 sec: 49151.9, 300 sec: 50873.7). Total num frames: 1930231808. Throughput: 0: 12458.7. 
Samples: 482599936. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:02:40,956][1648985] Avg episode reward: [(0, '183.830')] [2024-06-15 23:02:41,006][1652491] Updated weights for policy 0, policy_version 942497 (0.0013) [2024-06-15 23:02:43,217][1652491] Updated weights for policy 0, policy_version 942544 (0.0021) [2024-06-15 23:02:45,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 50790.4, 300 sec: 50762.6). Total num frames: 1930428416. Throughput: 0: 12595.2. Samples: 482673664. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:02:45,956][1648985] Avg episode reward: [(0, '161.780')] [2024-06-15 23:02:47,340][1652491] Updated weights for policy 0, policy_version 942608 (0.0093) [2024-06-15 23:02:48,579][1652491] Updated weights for policy 0, policy_version 942656 (0.0110) [2024-06-15 23:02:50,846][1652491] Updated weights for policy 0, policy_version 942721 (0.0022) [2024-06-15 23:02:50,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 48605.9, 300 sec: 50984.8). Total num frames: 1930690560. Throughput: 0: 12379.0. Samples: 482750464. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:02:50,956][1648985] Avg episode reward: [(0, '157.760')] [2024-06-15 23:02:52,002][1652491] Updated weights for policy 0, policy_version 942781 (0.0075) [2024-06-15 23:02:54,871][1652491] Updated weights for policy 0, policy_version 942833 (0.0017) [2024-06-15 23:02:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 51336.6, 300 sec: 51095.9). Total num frames: 1930952704. Throughput: 0: 12629.3. Samples: 482789376. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:02:55,956][1648985] Avg episode reward: [(0, '167.920')] [2024-06-15 23:02:55,964][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000942848_1930952704.pth... 
[2024-06-15 23:02:56,053][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000936896_1918763008.pth [2024-06-15 23:02:58,066][1652491] Updated weights for policy 0, policy_version 942867 (0.0012) [2024-06-15 23:02:59,417][1652491] Updated weights for policy 0, policy_version 942918 (0.0012) [2024-06-15 23:03:00,955][1648985] Fps is (10 sec: 52428.9, 60 sec: 49698.2, 300 sec: 51095.9). Total num frames: 1931214848. Throughput: 0: 12527.0. Samples: 482863616. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:03:00,956][1648985] Avg episode reward: [(0, '188.440')] [2024-06-15 23:03:01,633][1652491] Updated weights for policy 0, policy_version 942981 (0.0100) [2024-06-15 23:03:02,996][1652491] Updated weights for policy 0, policy_version 943040 (0.0014) [2024-06-15 23:03:05,069][1652491] Updated weights for policy 0, policy_version 943103 (0.0014) [2024-06-15 23:03:05,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 51882.7, 300 sec: 51095.9). Total num frames: 1931476992. Throughput: 0: 12492.8. Samples: 482936320. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:03:05,956][1648985] Avg episode reward: [(0, '197.080')] [2024-06-15 23:03:09,168][1652491] Updated weights for policy 0, policy_version 943160 (0.0024) [2024-06-15 23:03:10,380][1652491] Updated weights for policy 0, policy_version 943204 (0.0013) [2024-06-15 23:03:10,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 50244.2, 300 sec: 51095.9). Total num frames: 1931739136. Throughput: 0: 12561.1. Samples: 482975232. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:03:10,956][1648985] Avg episode reward: [(0, '171.030')] [2024-06-15 23:03:11,823][1651469] Signal inference workers to stop experience collection... (49150 times) [2024-06-15 23:03:11,868][1652491] InferenceWorker_p0-w0: stopping experience collection (49150 times) [2024-06-15 23:03:12,123][1651469] Signal inference workers to resume experience collection... 
(49150 times) [2024-06-15 23:03:12,124][1652491] InferenceWorker_p0-w0: resuming experience collection (49150 times) [2024-06-15 23:03:12,248][1652491] Updated weights for policy 0, policy_version 943249 (0.0013) [2024-06-15 23:03:13,320][1652491] Updated weights for policy 0, policy_version 943294 (0.0012) [2024-06-15 23:03:15,421][1652491] Updated weights for policy 0, policy_version 943331 (0.0014) [2024-06-15 23:03:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 52428.9, 300 sec: 51095.9). Total num frames: 1932001280. Throughput: 0: 12526.9. Samples: 483052032. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:03:15,956][1648985] Avg episode reward: [(0, '171.660')] [2024-06-15 23:03:19,257][1652491] Updated weights for policy 0, policy_version 943408 (0.0166) [2024-06-15 23:03:20,832][1652491] Updated weights for policy 0, policy_version 943483 (0.0014) [2024-06-15 23:03:20,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 50244.4, 300 sec: 51095.9). Total num frames: 1932263424. Throughput: 0: 12333.5. Samples: 483123200. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:03:20,955][1648985] Avg episode reward: [(0, '172.380')] [2024-06-15 23:03:23,351][1652491] Updated weights for policy 0, policy_version 943528 (0.0033) [2024-06-15 23:03:24,747][1652491] Updated weights for policy 0, policy_version 943553 (0.0013) [2024-06-15 23:03:25,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 52428.8, 300 sec: 51095.9). Total num frames: 1932525568. Throughput: 0: 12504.2. Samples: 483162624. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:03:25,956][1648985] Avg episode reward: [(0, '163.410')] [2024-06-15 23:03:28,586][1652491] Updated weights for policy 0, policy_version 943633 (0.0014) [2024-06-15 23:03:30,686][1652491] Updated weights for policy 0, policy_version 943718 (0.0013) [2024-06-15 23:03:30,955][1648985] Fps is (10 sec: 49151.6, 60 sec: 49698.1, 300 sec: 50984.8). Total num frames: 1932754944. 
Throughput: 0: 12674.8. Samples: 483244032. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:03:30,956][1648985] Avg episode reward: [(0, '154.010')] [2024-06-15 23:03:33,549][1652491] Updated weights for policy 0, policy_version 943782 (0.0022) [2024-06-15 23:03:35,696][1652491] Updated weights for policy 0, policy_version 943824 (0.0014) [2024-06-15 23:03:35,955][1648985] Fps is (10 sec: 42598.4, 60 sec: 50790.4, 300 sec: 50762.6). Total num frames: 1932951552. Throughput: 0: 12652.1. Samples: 483319808. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:03:35,956][1648985] Avg episode reward: [(0, '158.340')] [2024-06-15 23:03:38,823][1652491] Updated weights for policy 0, policy_version 943888 (0.0011) [2024-06-15 23:03:39,853][1652491] Updated weights for policy 0, policy_version 943930 (0.0017) [2024-06-15 23:03:40,955][1648985] Fps is (10 sec: 45875.7, 60 sec: 49698.3, 300 sec: 50762.7). Total num frames: 1933213696. Throughput: 0: 12686.3. Samples: 483360256. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:03:40,955][1648985] Avg episode reward: [(0, '151.350')] [2024-06-15 23:03:41,378][1652491] Updated weights for policy 0, policy_version 943984 (0.0011) [2024-06-15 23:03:43,425][1652491] Updated weights for policy 0, policy_version 944037 (0.0013) [2024-06-15 23:03:45,316][1652491] Updated weights for policy 0, policy_version 944065 (0.0013) [2024-06-15 23:03:45,955][1648985] Fps is (10 sec: 55705.7, 60 sec: 51336.5, 300 sec: 50873.7). Total num frames: 1933508608. Throughput: 0: 12709.0. Samples: 483435520. 
Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:03:45,956][1648985] Avg episode reward: [(0, '158.300')] [2024-06-15 23:03:46,363][1652491] Updated weights for policy 0, policy_version 944118 (0.0012) [2024-06-15 23:03:49,064][1652491] Updated weights for policy 0, policy_version 944146 (0.0015) [2024-06-15 23:03:50,008][1652491] Updated weights for policy 0, policy_version 944188 (0.0009) [2024-06-15 23:03:50,656][1651469] Signal inference workers to stop experience collection... (49200 times) [2024-06-15 23:03:50,698][1652491] InferenceWorker_p0-w0: stopping experience collection (49200 times) [2024-06-15 23:03:50,955][1648985] Fps is (10 sec: 52427.4, 60 sec: 50790.3, 300 sec: 50762.6). Total num frames: 1933737984. Throughput: 0: 12834.1. Samples: 483513856. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:03:50,956][1648985] Avg episode reward: [(0, '145.480')] [2024-06-15 23:03:51,022][1651469] Signal inference workers to resume experience collection... (49200 times) [2024-06-15 23:03:51,023][1652491] InferenceWorker_p0-w0: resuming experience collection (49200 times) [2024-06-15 23:03:51,461][1652491] Updated weights for policy 0, policy_version 944240 (0.0013) [2024-06-15 23:03:54,177][1652491] Updated weights for policy 0, policy_version 944304 (0.0013) [2024-06-15 23:03:55,955][1648985] Fps is (10 sec: 45875.2, 60 sec: 50244.3, 300 sec: 50762.6). Total num frames: 1933967360. Throughput: 0: 12697.6. Samples: 483546624. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:03:55,956][1648985] Avg episode reward: [(0, '159.210')] [2024-06-15 23:03:56,474][1652491] Updated weights for policy 0, policy_version 944352 (0.0015) [2024-06-15 23:03:59,780][1652491] Updated weights for policy 0, policy_version 944402 (0.0045) [2024-06-15 23:04:00,955][1648985] Fps is (10 sec: 49153.0, 60 sec: 50244.3, 300 sec: 50873.7). Total num frames: 1934229504. Throughput: 0: 12720.4. Samples: 483624448. 
Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:04:00,955][1648985] Avg episode reward: [(0, '138.230')] [2024-06-15 23:04:01,661][1652491] Updated weights for policy 0, policy_version 944464 (0.0013) [2024-06-15 23:04:02,814][1652491] Updated weights for policy 0, policy_version 944512 (0.0014) [2024-06-15 23:04:05,041][1652491] Updated weights for policy 0, policy_version 944569 (0.0013) [2024-06-15 23:04:05,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 50244.2, 300 sec: 50873.7). Total num frames: 1934491648. Throughput: 0: 12674.8. Samples: 483693568. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:04:05,956][1648985] Avg episode reward: [(0, '131.690')] [2024-06-15 23:04:07,590][1652491] Updated weights for policy 0, policy_version 944610 (0.0013) [2024-06-15 23:04:09,524][1652491] Updated weights for policy 0, policy_version 944641 (0.0024) [2024-06-15 23:04:10,947][1652491] Updated weights for policy 0, policy_version 944700 (0.0013) [2024-06-15 23:04:10,955][1648985] Fps is (10 sec: 49152.4, 60 sec: 49698.2, 300 sec: 50651.6). Total num frames: 1934721024. Throughput: 0: 12754.5. Samples: 483736576. Policy #0 lag: (min: 5.0, avg: 82.9, max: 261.0) [2024-06-15 23:04:10,955][1648985] Avg episode reward: [(0, '152.510')] [2024-06-15 23:04:11,917][1652491] Updated weights for policy 0, policy_version 944736 (0.0012) [2024-06-15 23:04:14,093][1652491] Updated weights for policy 0, policy_version 944770 (0.0028) [2024-06-15 23:04:15,237][1652491] Updated weights for policy 0, policy_version 944831 (0.0013) [2024-06-15 23:04:15,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 50244.3, 300 sec: 50873.7). Total num frames: 1935015936. Throughput: 0: 12754.5. Samples: 483817984. 
Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:04:15,956][1648985] Avg episode reward: [(0, '179.310')] [2024-06-15 23:04:17,064][1652491] Updated weights for policy 0, policy_version 944892 (0.0020) [2024-06-15 23:04:20,474][1652491] Updated weights for policy 0, policy_version 944931 (0.0011) [2024-06-15 23:04:20,955][1648985] Fps is (10 sec: 52427.9, 60 sec: 49698.0, 300 sec: 50540.5). Total num frames: 1935245312. Throughput: 0: 12777.2. Samples: 483894784. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:04:20,956][1648985] Avg episode reward: [(0, '185.740')] [2024-06-15 23:04:22,215][1652491] Updated weights for policy 0, policy_version 944995 (0.0011) [2024-06-15 23:04:24,525][1652491] Updated weights for policy 0, policy_version 945025 (0.0015) [2024-06-15 23:04:25,721][1652491] Updated weights for policy 0, policy_version 945078 (0.0012) [2024-06-15 23:04:25,955][1648985] Fps is (10 sec: 52428.3, 60 sec: 50244.2, 300 sec: 50651.5). Total num frames: 1935540224. Throughput: 0: 12743.1. Samples: 483933696. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:04:25,956][1648985] Avg episode reward: [(0, '166.650')] [2024-06-15 23:04:27,137][1652491] Updated weights for policy 0, policy_version 945120 (0.0011) [2024-06-15 23:04:30,797][1652491] Updated weights for policy 0, policy_version 945185 (0.0056) [2024-06-15 23:04:30,955][1648985] Fps is (10 sec: 49152.2, 60 sec: 49698.1, 300 sec: 50429.4). Total num frames: 1935736832. Throughput: 0: 12800.0. Samples: 484011520. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:04:30,956][1648985] Avg episode reward: [(0, '162.180')] [2024-06-15 23:04:31,466][1651469] Signal inference workers to stop experience collection... (49250 times) [2024-06-15 23:04:31,521][1652491] InferenceWorker_p0-w0: stopping experience collection (49250 times) [2024-06-15 23:04:31,810][1651469] Signal inference workers to resume experience collection... 
(49250 times) [2024-06-15 23:04:31,811][1652491] InferenceWorker_p0-w0: resuming experience collection (49250 times) [2024-06-15 23:04:32,417][1652491] Updated weights for policy 0, policy_version 945249 (0.0011) [2024-06-15 23:04:33,075][1652491] Updated weights for policy 0, policy_version 945279 (0.0011) [2024-06-15 23:04:35,955][1648985] Fps is (10 sec: 49152.5, 60 sec: 51336.6, 300 sec: 50651.6). Total num frames: 1936031744. Throughput: 0: 12652.1. Samples: 484083200. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:04:35,956][1648985] Avg episode reward: [(0, '173.270')] [2024-06-15 23:04:36,181][1652491] Updated weights for policy 0, policy_version 945344 (0.0018) [2024-06-15 23:04:38,261][1652491] Updated weights for policy 0, policy_version 945396 (0.0012) [2024-06-15 23:04:40,956][1648985] Fps is (10 sec: 49145.7, 60 sec: 50243.1, 300 sec: 50318.1). Total num frames: 1936228352. Throughput: 0: 12731.4. Samples: 484119552. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:04:40,957][1648985] Avg episode reward: [(0, '156.580')] [2024-06-15 23:04:41,201][1652491] Updated weights for policy 0, policy_version 945440 (0.0011) [2024-06-15 23:04:43,056][1652491] Updated weights for policy 0, policy_version 945510 (0.0178) [2024-06-15 23:04:45,748][1652491] Updated weights for policy 0, policy_version 945571 (0.0139) [2024-06-15 23:04:45,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 50790.3, 300 sec: 50651.6). Total num frames: 1936556032. Throughput: 0: 12811.3. Samples: 484200960. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:04:45,956][1648985] Avg episode reward: [(0, '148.690')] [2024-06-15 23:04:48,158][1652491] Updated weights for policy 0, policy_version 945634 (0.0012) [2024-06-15 23:04:50,955][1648985] Fps is (10 sec: 52435.6, 60 sec: 50244.4, 300 sec: 50318.3). Total num frames: 1936752640. Throughput: 0: 13016.2. Samples: 484279296. 
Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:04:50,955][1648985] Avg episode reward: [(0, '150.690')] [2024-06-15 23:04:51,087][1652491] Updated weights for policy 0, policy_version 945682 (0.0014) [2024-06-15 23:04:53,392][1652491] Updated weights for policy 0, policy_version 945776 (0.0114) [2024-06-15 23:04:55,955][1648985] Fps is (10 sec: 42597.6, 60 sec: 50244.0, 300 sec: 50207.2). Total num frames: 1936982016. Throughput: 0: 12572.3. Samples: 484302336. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:04:55,956][1648985] Avg episode reward: [(0, '161.750')] [2024-06-15 23:04:55,963][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000945792_1936982016.pth... [2024-06-15 23:04:56,025][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000939904_1924923392.pth [2024-06-15 23:04:56,754][1652491] Updated weights for policy 0, policy_version 945815 (0.0012) [2024-06-15 23:04:58,642][1652491] Updated weights for policy 0, policy_version 945888 (0.0012) [2024-06-15 23:05:00,955][1648985] Fps is (10 sec: 49150.7, 60 sec: 50244.0, 300 sec: 50207.2). Total num frames: 1937244160. Throughput: 0: 12561.0. Samples: 484383232. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:05:00,956][1648985] Avg episode reward: [(0, '173.170')] [2024-06-15 23:05:01,717][1652491] Updated weights for policy 0, policy_version 945936 (0.0080) [2024-06-15 23:05:03,354][1652491] Updated weights for policy 0, policy_version 946000 (0.0012) [2024-06-15 23:05:04,505][1652491] Updated weights for policy 0, policy_version 946048 (0.0014) [2024-06-15 23:05:05,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 50244.3, 300 sec: 50429.4). Total num frames: 1937506304. Throughput: 0: 12390.4. Samples: 484452352. 
Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:05:05,956][1648985] Avg episode reward: [(0, '177.890')] [2024-06-15 23:05:07,894][1652491] Updated weights for policy 0, policy_version 946110 (0.0013) [2024-06-15 23:05:10,427][1652491] Updated weights for policy 0, policy_version 946169 (0.0015) [2024-06-15 23:05:10,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 50790.3, 300 sec: 50318.3). Total num frames: 1937768448. Throughput: 0: 12379.0. Samples: 484490752. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:05:10,956][1648985] Avg episode reward: [(0, '174.230')] [2024-06-15 23:05:12,712][1652491] Updated weights for policy 0, policy_version 946208 (0.0062) [2024-06-15 23:05:12,871][1651469] Signal inference workers to stop experience collection... (49300 times) [2024-06-15 23:05:12,928][1652491] InferenceWorker_p0-w0: stopping experience collection (49300 times) [2024-06-15 23:05:13,170][1651469] Signal inference workers to resume experience collection... (49300 times) [2024-06-15 23:05:13,171][1652491] InferenceWorker_p0-w0: resuming experience collection (49300 times) [2024-06-15 23:05:14,304][1652491] Updated weights for policy 0, policy_version 946272 (0.0083) [2024-06-15 23:05:15,043][1652491] Updated weights for policy 0, policy_version 946303 (0.0010) [2024-06-15 23:05:15,955][1648985] Fps is (10 sec: 52429.4, 60 sec: 50244.3, 300 sec: 50540.5). Total num frames: 1938030592. Throughput: 0: 12208.4. Samples: 484560896. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:05:15,955][1648985] Avg episode reward: [(0, '172.270')] [2024-06-15 23:05:17,942][1652491] Updated weights for policy 0, policy_version 946342 (0.0014) [2024-06-15 23:05:19,577][1652491] Updated weights for policy 0, policy_version 946373 (0.0014) [2024-06-15 23:05:20,638][1652491] Updated weights for policy 0, policy_version 946431 (0.0013) [2024-06-15 23:05:20,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 50790.4, 300 sec: 50651.5). 
Total num frames: 1938292736. Throughput: 0: 12492.8. Samples: 484645376. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:05:20,957][1648985] Avg episode reward: [(0, '188.310')] [2024-06-15 23:05:23,791][1652491] Updated weights for policy 0, policy_version 946497 (0.0090) [2024-06-15 23:05:25,280][1652491] Updated weights for policy 0, policy_version 946557 (0.0014) [2024-06-15 23:05:25,955][1648985] Fps is (10 sec: 52427.5, 60 sec: 50244.2, 300 sec: 50651.5). Total num frames: 1938554880. Throughput: 0: 12424.8. Samples: 484678656. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:05:25,956][1648985] Avg episode reward: [(0, '190.550')] [2024-06-15 23:05:28,440][1652491] Updated weights for policy 0, policy_version 946608 (0.0011) [2024-06-15 23:05:30,333][1652491] Updated weights for policy 0, policy_version 946658 (0.0013) [2024-06-15 23:05:30,955][1648985] Fps is (10 sec: 52429.3, 60 sec: 51336.6, 300 sec: 50651.6). Total num frames: 1938817024. Throughput: 0: 12413.2. Samples: 484759552. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:05:30,955][1648985] Avg episode reward: [(0, '196.570')] [2024-06-15 23:05:33,594][1652491] Updated weights for policy 0, policy_version 946723 (0.0017) [2024-06-15 23:05:34,803][1652491] Updated weights for policy 0, policy_version 946769 (0.0011) [2024-06-15 23:05:35,959][1648985] Fps is (10 sec: 52411.2, 60 sec: 50787.4, 300 sec: 50650.9). Total num frames: 1939079168. Throughput: 0: 12321.1. Samples: 484833792. 
Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:05:35,960][1648985] Avg episode reward: [(0, '172.860')] [2024-06-15 23:05:37,425][1652491] Updated weights for policy 0, policy_version 946832 (0.0033) [2024-06-15 23:05:38,380][1652491] Updated weights for policy 0, policy_version 946880 (0.0011) [2024-06-15 23:05:40,324][1652491] Updated weights for policy 0, policy_version 946943 (0.0010) [2024-06-15 23:05:40,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 51883.8, 300 sec: 50651.6). Total num frames: 1939341312. Throughput: 0: 12879.7. Samples: 484881920. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:05:40,956][1648985] Avg episode reward: [(0, '165.530')] [2024-06-15 23:05:43,560][1652491] Updated weights for policy 0, policy_version 946992 (0.0013) [2024-06-15 23:05:45,072][1652491] Updated weights for policy 0, policy_version 947042 (0.0013) [2024-06-15 23:05:45,955][1648985] Fps is (10 sec: 52447.3, 60 sec: 50790.5, 300 sec: 50651.5). Total num frames: 1939603456. Throughput: 0: 12652.1. Samples: 484952576. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:05:45,956][1648985] Avg episode reward: [(0, '168.330')] [2024-06-15 23:05:48,694][1652491] Updated weights for policy 0, policy_version 947120 (0.0013) [2024-06-15 23:05:49,244][1652491] Updated weights for policy 0, policy_version 947138 (0.0059) [2024-06-15 23:05:49,848][1651469] Signal inference workers to stop experience collection... (49350 times) [2024-06-15 23:05:49,878][1652491] InferenceWorker_p0-w0: stopping experience collection (49350 times) [2024-06-15 23:05:50,088][1651469] Signal inference workers to resume experience collection... (49350 times) [2024-06-15 23:05:50,089][1652491] InferenceWorker_p0-w0: resuming experience collection (49350 times) [2024-06-15 23:05:50,330][1652491] Updated weights for policy 0, policy_version 947197 (0.0012) [2024-06-15 23:05:50,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 51882.5, 300 sec: 50651.6). 
Total num frames: 1939865600. Throughput: 0: 13004.8. Samples: 485037568. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:05:50,956][1648985] Avg episode reward: [(0, '182.940')] [2024-06-15 23:05:53,083][1652491] Updated weights for policy 0, policy_version 947235 (0.0013) [2024-06-15 23:05:54,674][1652491] Updated weights for policy 0, policy_version 947298 (0.0010) [2024-06-15 23:05:55,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 52428.9, 300 sec: 50651.5). Total num frames: 1940127744. Throughput: 0: 13061.6. Samples: 485078528. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:05:55,957][1648985] Avg episode reward: [(0, '160.650')] [2024-06-15 23:05:57,335][1652491] Updated weights for policy 0, policy_version 947329 (0.0010) [2024-06-15 23:05:58,318][1652491] Updated weights for policy 0, policy_version 947383 (0.0014) [2024-06-15 23:05:59,496][1652491] Updated weights for policy 0, policy_version 947426 (0.0011) [2024-06-15 23:06:00,955][1648985] Fps is (10 sec: 52429.6, 60 sec: 52429.0, 300 sec: 50651.6). Total num frames: 1940389888. Throughput: 0: 13346.1. Samples: 485161472. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:06:00,956][1648985] Avg episode reward: [(0, '150.420')] [2024-06-15 23:06:02,172][1652491] Updated weights for policy 0, policy_version 947490 (0.0012) [2024-06-15 23:06:03,671][1652491] Updated weights for policy 0, policy_version 947552 (0.0011) [2024-06-15 23:06:04,354][1652491] Updated weights for policy 0, policy_version 947584 (0.0011) [2024-06-15 23:06:05,955][1648985] Fps is (10 sec: 52430.0, 60 sec: 52428.9, 300 sec: 50651.6). Total num frames: 1940652032. Throughput: 0: 13368.9. Samples: 485246976. 
Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:06:05,955][1648985] Avg episode reward: [(0, '142.430')] [2024-06-15 23:06:07,038][1652491] Updated weights for policy 0, policy_version 947623 (0.0010) [2024-06-15 23:06:08,372][1652491] Updated weights for policy 0, policy_version 947664 (0.0020) [2024-06-15 23:06:09,261][1652491] Updated weights for policy 0, policy_version 947702 (0.0024) [2024-06-15 23:06:10,955][1648985] Fps is (10 sec: 55704.9, 60 sec: 52974.9, 300 sec: 50873.7). Total num frames: 1940946944. Throughput: 0: 13676.1. Samples: 485294080. Policy #0 lag: (min: 12.0, avg: 133.9, max: 268.0) [2024-06-15 23:06:10,956][1648985] Avg episode reward: [(0, '145.040')] [2024-06-15 23:06:11,545][1652491] Updated weights for policy 0, policy_version 947760 (0.0012) [2024-06-15 23:06:13,193][1652491] Updated weights for policy 0, policy_version 947830 (0.0013) [2024-06-15 23:06:15,955][1648985] Fps is (10 sec: 52428.2, 60 sec: 52428.7, 300 sec: 50651.5). Total num frames: 1941176320. Throughput: 0: 13653.3. Samples: 485373952. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:06:15,970][1648985] Avg episode reward: [(0, '150.350')] [2024-06-15 23:06:16,597][1652491] Updated weights for policy 0, policy_version 947872 (0.0011) [2024-06-15 23:06:17,334][1652491] Updated weights for policy 0, policy_version 947902 (0.0102) [2024-06-15 23:06:18,657][1652491] Updated weights for policy 0, policy_version 947957 (0.0010) [2024-06-15 23:06:20,121][1652491] Updated weights for policy 0, policy_version 947986 (0.0010) [2024-06-15 23:06:20,955][1648985] Fps is (10 sec: 58983.2, 60 sec: 54067.3, 300 sec: 51318.0). Total num frames: 1941536768. Throughput: 0: 13950.3. Samples: 485461504. 
Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:06:20,956][1648985] Avg episode reward: [(0, '153.600')] [2024-06-15 23:06:22,131][1652491] Updated weights for policy 0, policy_version 948087 (0.0013) [2024-06-15 23:06:25,421][1652491] Updated weights for policy 0, policy_version 948128 (0.0044) [2024-06-15 23:06:25,540][1651469] Signal inference workers to stop experience collection... (49400 times) [2024-06-15 23:06:25,578][1652491] InferenceWorker_p0-w0: stopping experience collection (49400 times) [2024-06-15 23:06:25,696][1651469] Signal inference workers to resume experience collection... (49400 times) [2024-06-15 23:06:25,696][1652491] InferenceWorker_p0-w0: resuming experience collection (49400 times) [2024-06-15 23:06:25,939][1652491] Updated weights for policy 0, policy_version 948160 (0.0011) [2024-06-15 23:06:25,955][1648985] Fps is (10 sec: 65536.8, 60 sec: 54613.6, 300 sec: 51095.9). Total num frames: 1941831680. Throughput: 0: 13835.4. Samples: 485504512. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:06:25,955][1648985] Avg episode reward: [(0, '177.600')] [2024-06-15 23:06:27,744][1652491] Updated weights for policy 0, policy_version 948213 (0.0014) [2024-06-15 23:06:28,930][1652491] Updated weights for policy 0, policy_version 948256 (0.0011) [2024-06-15 23:06:30,216][1652491] Updated weights for policy 0, policy_version 948307 (0.0012) [2024-06-15 23:06:30,955][1648985] Fps is (10 sec: 65535.8, 60 sec: 56251.7, 300 sec: 51873.4). Total num frames: 1942192128. Throughput: 0: 14256.4. Samples: 485594112. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:06:30,956][1648985] Avg episode reward: [(0, '169.400')] [2024-06-15 23:06:31,030][1652491] Updated weights for policy 0, policy_version 948352 (0.0013) [2024-06-15 23:06:34,845][1652491] Updated weights for policy 0, policy_version 948415 (0.0011) [2024-06-15 23:06:35,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 54616.6, 300 sec: 51095.9). 
Total num frames: 1942355968. Throughput: 0: 14461.2. Samples: 485688320. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:06:35,956][1648985] Avg episode reward: [(0, '181.420')] [2024-06-15 23:06:36,861][1652491] Updated weights for policy 0, policy_version 948464 (0.0012) [2024-06-15 23:06:37,330][1652491] Updated weights for policy 0, policy_version 948480 (0.0010) [2024-06-15 23:06:38,624][1652491] Updated weights for policy 0, policy_version 948530 (0.0011) [2024-06-15 23:06:39,630][1652491] Updated weights for policy 0, policy_version 948576 (0.0010) [2024-06-15 23:06:40,287][1652491] Updated weights for policy 0, policy_version 948608 (0.0011) [2024-06-15 23:06:40,955][1648985] Fps is (10 sec: 55705.5, 60 sec: 56797.9, 300 sec: 52095.6). Total num frames: 1942749184. Throughput: 0: 14324.7. Samples: 485723136. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:06:40,956][1648985] Avg episode reward: [(0, '172.150')] [2024-06-15 23:06:45,285][1652491] Updated weights for policy 0, policy_version 948676 (0.0013) [2024-06-15 23:06:45,955][1648985] Fps is (10 sec: 58982.4, 60 sec: 55705.7, 300 sec: 51429.1). Total num frames: 1942945792. Throughput: 0: 14404.3. Samples: 485809664. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:06:45,956][1648985] Avg episode reward: [(0, '170.940')] [2024-06-15 23:06:46,480][1652491] Updated weights for policy 0, policy_version 948731 (0.0011) [2024-06-15 23:06:47,790][1652491] Updated weights for policy 0, policy_version 948769 (0.0013) [2024-06-15 23:06:49,489][1652491] Updated weights for policy 0, policy_version 948834 (0.0012) [2024-06-15 23:06:50,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 56797.8, 300 sec: 52206.6). Total num frames: 1943273472. Throughput: 0: 14131.1. Samples: 485882880. 
Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:06:50,956][1648985] Avg episode reward: [(0, '152.780')] [2024-06-15 23:06:52,726][1652491] Updated weights for policy 0, policy_version 948880 (0.0012) [2024-06-15 23:06:53,655][1652491] Updated weights for policy 0, policy_version 948923 (0.0011) [2024-06-15 23:06:55,410][1652491] Updated weights for policy 0, policy_version 948984 (0.0015) [2024-06-15 23:06:55,955][1648985] Fps is (10 sec: 58981.3, 60 sec: 56797.9, 300 sec: 51873.4). Total num frames: 1943535616. Throughput: 0: 14188.1. Samples: 485932544. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:06:55,956][1648985] Avg episode reward: [(0, '135.690')] [2024-06-15 23:06:55,960][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000948992_1943535616.pth... [2024-06-15 23:06:56,000][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000942848_1930952704.pth [2024-06-15 23:06:57,226][1652491] Updated weights for policy 0, policy_version 949028 (0.0016) [2024-06-15 23:06:58,842][1651469] Signal inference workers to stop experience collection... (49450 times) [2024-06-15 23:06:58,882][1652491] InferenceWorker_p0-w0: stopping experience collection (49450 times) [2024-06-15 23:06:59,134][1651469] Signal inference workers to resume experience collection... (49450 times) [2024-06-15 23:06:59,135][1652491] InferenceWorker_p0-w0: resuming experience collection (49450 times) [2024-06-15 23:06:59,137][1652491] Updated weights for policy 0, policy_version 949104 (0.0012) [2024-06-15 23:07:00,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 56797.9, 300 sec: 52317.7). Total num frames: 1943797760. Throughput: 0: 14119.9. Samples: 486009344. 
Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:07:00,956][1648985] Avg episode reward: [(0, '152.640')] [2024-06-15 23:07:01,976][1652491] Updated weights for policy 0, policy_version 949136 (0.0025) [2024-06-15 23:07:02,954][1652491] Updated weights for policy 0, policy_version 949181 (0.0011) [2024-06-15 23:07:04,672][1652491] Updated weights for policy 0, policy_version 949232 (0.0012) [2024-06-15 23:07:05,955][1648985] Fps is (10 sec: 52429.7, 60 sec: 56797.8, 300 sec: 51984.5). Total num frames: 1944059904. Throughput: 0: 14256.3. Samples: 486103040. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:07:05,956][1648985] Avg episode reward: [(0, '151.390')] [2024-06-15 23:07:06,455][1652491] Updated weights for policy 0, policy_version 949266 (0.0033) [2024-06-15 23:07:08,042][1652491] Updated weights for policy 0, policy_version 949331 (0.0011) [2024-06-15 23:07:09,100][1652491] Updated weights for policy 0, policy_version 949376 (0.0012) [2024-06-15 23:07:10,955][1648985] Fps is (10 sec: 52428.6, 60 sec: 56251.8, 300 sec: 52428.8). Total num frames: 1944322048. Throughput: 0: 14006.0. Samples: 486134784. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:07:10,971][1648985] Avg episode reward: [(0, '157.030')] [2024-06-15 23:07:11,831][1652491] Updated weights for policy 0, policy_version 949440 (0.0012) [2024-06-15 23:07:14,119][1652491] Updated weights for policy 0, policy_version 949497 (0.0014) [2024-06-15 23:07:15,955][1648985] Fps is (10 sec: 55705.6, 60 sec: 57344.0, 300 sec: 52095.6). Total num frames: 1944616960. Throughput: 0: 14085.7. Samples: 486227968. 
Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:07:15,956][1648985] Avg episode reward: [(0, '167.510')] [2024-06-15 23:07:16,930][1652491] Updated weights for policy 0, policy_version 949569 (0.0012) [2024-06-15 23:07:18,132][1652491] Updated weights for policy 0, policy_version 949628 (0.0016) [2024-06-15 23:07:20,955][1648985] Fps is (10 sec: 58982.5, 60 sec: 56251.7, 300 sec: 52651.0). Total num frames: 1944911872. Throughput: 0: 13915.0. Samples: 486314496. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:07:20,956][1648985] Avg episode reward: [(0, '169.720')] [2024-06-15 23:07:21,033][1652491] Updated weights for policy 0, policy_version 949669 (0.0011) [2024-06-15 23:07:22,360][1652491] Updated weights for policy 0, policy_version 949714 (0.0010) [2024-06-15 23:07:23,225][1652491] Updated weights for policy 0, policy_version 949751 (0.0011) [2024-06-15 23:07:25,320][1652491] Updated weights for policy 0, policy_version 949814 (0.0014) [2024-06-15 23:07:25,955][1648985] Fps is (10 sec: 65535.7, 60 sec: 57343.9, 300 sec: 52539.9). Total num frames: 1945272320. Throughput: 0: 14085.7. Samples: 486356992. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:07:25,956][1648985] Avg episode reward: [(0, '162.660')] [2024-06-15 23:07:26,747][1652491] Updated weights for policy 0, policy_version 949883 (0.0012) [2024-06-15 23:07:30,303][1652491] Updated weights for policy 0, policy_version 949936 (0.0013) [2024-06-15 23:07:30,955][1648985] Fps is (10 sec: 58982.5, 60 sec: 55159.5, 300 sec: 52873.1). Total num frames: 1945501696. Throughput: 0: 14165.3. Samples: 486447104. 
Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:07:30,956][1648985] Avg episode reward: [(0, '153.310')] [2024-06-15 23:07:31,443][1652491] Updated weights for policy 0, policy_version 949969 (0.0012) [2024-06-15 23:07:32,369][1652491] Updated weights for policy 0, policy_version 950009 (0.0020) [2024-06-15 23:07:34,365][1652491] Updated weights for policy 0, policy_version 950070 (0.0012) [2024-06-15 23:07:34,734][1651469] Signal inference workers to stop experience collection... (49500 times) [2024-06-15 23:07:34,769][1652491] InferenceWorker_p0-w0: stopping experience collection (49500 times) [2024-06-15 23:07:34,952][1651469] Signal inference workers to resume experience collection... (49500 times) [2024-06-15 23:07:34,953][1652491] InferenceWorker_p0-w0: resuming experience collection (49500 times) [2024-06-15 23:07:35,117][1652491] Updated weights for policy 0, policy_version 950099 (0.0010) [2024-06-15 23:07:35,955][1648985] Fps is (10 sec: 58982.8, 60 sec: 58436.3, 300 sec: 52984.2). Total num frames: 1945862144. Throughput: 0: 14199.5. Samples: 486521856. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:07:35,955][1648985] Avg episode reward: [(0, '147.850')] [2024-06-15 23:07:35,991][1652491] Updated weights for policy 0, policy_version 950144 (0.0104) [2024-06-15 23:07:39,360][1652491] Updated weights for policy 0, policy_version 950206 (0.0014) [2024-06-15 23:07:40,955][1648985] Fps is (10 sec: 58982.3, 60 sec: 55705.6, 300 sec: 53095.3). Total num frames: 1946091520. Throughput: 0: 14290.5. Samples: 486575616. 
Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:07:40,956][1648985] Avg episode reward: [(0, '164.410')] [2024-06-15 23:07:41,169][1652491] Updated weights for policy 0, policy_version 950256 (0.0011) [2024-06-15 23:07:41,569][1652491] Updated weights for policy 0, policy_version 950272 (0.0024) [2024-06-15 23:07:43,149][1652491] Updated weights for policy 0, policy_version 950309 (0.0012) [2024-06-15 23:07:44,710][1652491] Updated weights for policy 0, policy_version 950370 (0.0020) [2024-06-15 23:07:45,955][1648985] Fps is (10 sec: 55704.8, 60 sec: 57890.0, 300 sec: 53317.4). Total num frames: 1946419200. Throughput: 0: 14279.1. Samples: 486651904. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:07:45,956][1648985] Avg episode reward: [(0, '158.410')] [2024-06-15 23:07:47,785][1652491] Updated weights for policy 0, policy_version 950419 (0.0028) [2024-06-15 23:07:49,155][1652491] Updated weights for policy 0, policy_version 950480 (0.0014) [2024-06-15 23:07:50,076][1652491] Updated weights for policy 0, policy_version 950518 (0.0014) [2024-06-15 23:07:50,955][1648985] Fps is (10 sec: 58982.4, 60 sec: 56798.0, 300 sec: 53317.4). Total num frames: 1946681344. Throughput: 0: 14392.9. Samples: 486750720. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:07:50,955][1648985] Avg episode reward: [(0, '186.300')] [2024-06-15 23:07:51,872][1652491] Updated weights for policy 0, policy_version 950550 (0.0028) [2024-06-15 23:07:52,681][1652491] Updated weights for policy 0, policy_version 950592 (0.0012) [2024-06-15 23:07:54,237][1652491] Updated weights for policy 0, policy_version 950649 (0.0012) [2024-06-15 23:07:55,955][1648985] Fps is (10 sec: 52429.9, 60 sec: 56798.1, 300 sec: 53317.4). Total num frames: 1946943488. Throughput: 0: 14563.6. Samples: 486790144. 
Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:07:55,955][1648985] Avg episode reward: [(0, '180.040')] [2024-06-15 23:07:57,180][1652491] Updated weights for policy 0, policy_version 950704 (0.0011) [2024-06-15 23:07:58,318][1652491] Updated weights for policy 0, policy_version 950759 (0.0010) [2024-06-15 23:08:00,707][1652491] Updated weights for policy 0, policy_version 950789 (0.0013) [2024-06-15 23:08:00,955][1648985] Fps is (10 sec: 55704.8, 60 sec: 57343.8, 300 sec: 53428.5). Total num frames: 1947238400. Throughput: 0: 14518.0. Samples: 486881280. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:08:00,956][1648985] Avg episode reward: [(0, '166.730')] [2024-06-15 23:08:02,463][1652491] Updated weights for policy 0, policy_version 950866 (0.0015) [2024-06-15 23:08:03,505][1652491] Updated weights for policy 0, policy_version 950907 (0.0011) [2024-06-15 23:08:05,955][1648985] Fps is (10 sec: 58982.6, 60 sec: 57890.2, 300 sec: 53539.6). Total num frames: 1947533312. Throughput: 0: 14449.8. Samples: 486964736. Policy #0 lag: (min: 22.0, avg: 124.5, max: 278.0) [2024-06-15 23:08:05,955][1648985] Avg episode reward: [(0, '179.160')] [2024-06-15 23:08:06,103][1652491] Updated weights for policy 0, policy_version 950945 (0.0010) [2024-06-15 23:08:06,905][1652491] Updated weights for policy 0, policy_version 950977 (0.0012) [2024-06-15 23:08:08,235][1652491] Updated weights for policy 0, policy_version 951034 (0.0080) [2024-06-15 23:08:10,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 57343.9, 300 sec: 53428.5). Total num frames: 1947762688. Throughput: 0: 14290.5. Samples: 487000064. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:08:10,956][1648985] Avg episode reward: [(0, '185.770')] [2024-06-15 23:08:11,213][1651469] Signal inference workers to stop experience collection... 
(49550 times) [2024-06-15 23:08:11,252][1652491] InferenceWorker_p0-w0: stopping experience collection (49550 times) [2024-06-15 23:08:11,254][1652491] Updated weights for policy 0, policy_version 951076 (0.0010) [2024-06-15 23:08:11,432][1651469] Signal inference workers to resume experience collection... (49550 times) [2024-06-15 23:08:11,433][1652491] InferenceWorker_p0-w0: resuming experience collection (49550 times) [2024-06-15 23:08:13,079][1652491] Updated weights for policy 0, policy_version 951159 (0.0101) [2024-06-15 23:08:15,571][1652491] Updated weights for policy 0, policy_version 951202 (0.0011) [2024-06-15 23:08:15,955][1648985] Fps is (10 sec: 55705.3, 60 sec: 57890.2, 300 sec: 53650.7). Total num frames: 1948090368. Throughput: 0: 14336.0. Samples: 487092224. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:08:15,955][1648985] Avg episode reward: [(0, '159.340')] [2024-06-15 23:08:16,210][1652491] Updated weights for policy 0, policy_version 951232 (0.0015) [2024-06-15 23:08:19,340][1652491] Updated weights for policy 0, policy_version 951312 (0.0013) [2024-06-15 23:08:20,263][1652491] Updated weights for policy 0, policy_version 951358 (0.0010) [2024-06-15 23:08:20,955][1648985] Fps is (10 sec: 65536.5, 60 sec: 58436.3, 300 sec: 53872.8). Total num frames: 1948418048. Throughput: 0: 14518.0. Samples: 487175168. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:08:20,956][1648985] Avg episode reward: [(0, '170.140')] [2024-06-15 23:08:21,506][1652491] Updated weights for policy 0, policy_version 951420 (0.0012) [2024-06-15 23:08:24,188][1652491] Updated weights for policy 0, policy_version 951458 (0.0010) [2024-06-15 23:08:25,955][1648985] Fps is (10 sec: 58981.9, 60 sec: 56797.9, 300 sec: 53983.9). Total num frames: 1948680192. Throughput: 0: 14449.8. Samples: 487225856. 
Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:08:25,956][1648985] Avg episode reward: [(0, '153.760')] [2024-06-15 23:08:26,109][1652491] Updated weights for policy 0, policy_version 951508 (0.0010) [2024-06-15 23:08:26,995][1652491] Updated weights for policy 0, policy_version 951546 (0.0011) [2024-06-15 23:08:28,705][1652491] Updated weights for policy 0, policy_version 951610 (0.0012) [2024-06-15 23:08:30,270][1652491] Updated weights for policy 0, policy_version 951649 (0.0010) [2024-06-15 23:08:30,955][1648985] Fps is (10 sec: 62259.7, 60 sec: 58982.5, 300 sec: 54539.3). Total num frames: 1949040640. Throughput: 0: 14563.6. Samples: 487307264. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:08:30,955][1648985] Avg episode reward: [(0, '160.170')] [2024-06-15 23:08:32,516][1652491] Updated weights for policy 0, policy_version 951696 (0.0011) [2024-06-15 23:08:33,580][1652491] Updated weights for policy 0, policy_version 951744 (0.0012) [2024-06-15 23:08:35,802][1652491] Updated weights for policy 0, policy_version 951808 (0.0015) [2024-06-15 23:08:35,955][1648985] Fps is (10 sec: 62258.3, 60 sec: 57343.8, 300 sec: 54539.2). Total num frames: 1949302784. Throughput: 0: 14438.3. Samples: 487400448. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:08:35,956][1648985] Avg episode reward: [(0, '174.730')] [2024-06-15 23:08:38,589][1652491] Updated weights for policy 0, policy_version 951874 (0.0013) [2024-06-15 23:08:39,842][1652491] Updated weights for policy 0, policy_version 951923 (0.0016) [2024-06-15 23:08:40,955][1648985] Fps is (10 sec: 52428.4, 60 sec: 57890.1, 300 sec: 54428.2). Total num frames: 1949564928. Throughput: 0: 14518.0. Samples: 487443456. 
Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:08:40,955][1648985] Avg episode reward: [(0, '166.300')] [2024-06-15 23:08:41,736][1652491] Updated weights for policy 0, policy_version 951952 (0.0010) [2024-06-15 23:08:42,622][1652491] Updated weights for policy 0, policy_version 951990 (0.0012) [2024-06-15 23:08:44,177][1652491] Updated weights for policy 0, policy_version 952032 (0.0014) [2024-06-15 23:08:44,546][1651469] Signal inference workers to stop experience collection... (49600 times) [2024-06-15 23:08:44,588][1652491] InferenceWorker_p0-w0: stopping experience collection (49600 times) [2024-06-15 23:08:44,751][1651469] Signal inference workers to resume experience collection... (49600 times) [2024-06-15 23:08:44,752][1652491] InferenceWorker_p0-w0: resuming experience collection (49600 times) [2024-06-15 23:08:45,478][1652491] Updated weights for policy 0, policy_version 952099 (0.0013) [2024-06-15 23:08:45,955][1648985] Fps is (10 sec: 65537.2, 60 sec: 58982.5, 300 sec: 54983.6). Total num frames: 1949958144. Throughput: 0: 14438.4. Samples: 487531008. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:08:45,956][1648985] Avg episode reward: [(0, '168.740')] [2024-06-15 23:08:47,339][1652491] Updated weights for policy 0, policy_version 952129 (0.0014) [2024-06-15 23:08:48,720][1652491] Updated weights for policy 0, policy_version 952184 (0.0012) [2024-06-15 23:08:50,715][1652491] Updated weights for policy 0, policy_version 952225 (0.0012) [2024-06-15 23:08:50,955][1648985] Fps is (10 sec: 58982.5, 60 sec: 57890.2, 300 sec: 54872.5). Total num frames: 1950154752. Throughput: 0: 14518.0. Samples: 487618048. 
Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:08:50,955][1648985] Avg episode reward: [(0, '161.050')] [2024-06-15 23:08:52,824][1652491] Updated weights for policy 0, policy_version 952257 (0.0011) [2024-06-15 23:08:54,088][1652491] Updated weights for policy 0, policy_version 952320 (0.0018) [2024-06-15 23:08:55,379][1652491] Updated weights for policy 0, policy_version 952384 (0.0013) [2024-06-15 23:08:55,955][1648985] Fps is (10 sec: 52428.5, 60 sec: 58982.3, 300 sec: 55094.7). Total num frames: 1950482432. Throughput: 0: 14722.8. Samples: 487662592. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:08:55,956][1648985] Avg episode reward: [(0, '145.600')] [2024-06-15 23:08:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000952384_1950482432.pth... [2024-06-15 23:08:56,052][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000945792_1936982016.pth [2024-06-15 23:08:58,339][1652491] Updated weights for policy 0, policy_version 952440 (0.0011) [2024-06-15 23:08:59,539][1652491] Updated weights for policy 0, policy_version 952480 (0.0010) [2024-06-15 23:09:00,297][1652491] Updated weights for policy 0, policy_version 952507 (0.0020) [2024-06-15 23:09:00,955][1648985] Fps is (10 sec: 58982.1, 60 sec: 58436.4, 300 sec: 55094.7). Total num frames: 1950744576. Throughput: 0: 14540.8. Samples: 487746560. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:09:00,956][1648985] Avg episode reward: [(0, '150.900')] [2024-06-15 23:09:02,948][1652491] Updated weights for policy 0, policy_version 952548 (0.0011) [2024-06-15 23:09:04,444][1652491] Updated weights for policy 0, policy_version 952624 (0.0010) [2024-06-15 23:09:05,955][1648985] Fps is (10 sec: 52429.2, 60 sec: 57890.0, 300 sec: 55205.7). Total num frames: 1951006720. Throughput: 0: 14506.7. Samples: 487827968. 
Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:09:05,956][1648985] Avg episode reward: [(0, '150.050')] [2024-06-15 23:09:07,166][1652491] Updated weights for policy 0, policy_version 952673 (0.0017) [2024-06-15 23:09:07,978][1652491] Updated weights for policy 0, policy_version 952705 (0.0020) [2024-06-15 23:09:09,317][1652491] Updated weights for policy 0, policy_version 952762 (0.0016) [2024-06-15 23:09:10,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 58436.3, 300 sec: 55094.7). Total num frames: 1951268864. Throughput: 0: 14301.9. Samples: 487869440. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:09:10,956][1648985] Avg episode reward: [(0, '161.820')] [2024-06-15 23:09:12,552][1652491] Updated weights for policy 0, policy_version 952816 (0.0011) [2024-06-15 23:09:13,542][1652491] Updated weights for policy 0, policy_version 952864 (0.0012) [2024-06-15 23:09:15,614][1652491] Updated weights for policy 0, policy_version 952912 (0.0011) [2024-06-15 23:09:15,955][1648985] Fps is (10 sec: 55705.8, 60 sec: 57890.1, 300 sec: 55316.8). Total num frames: 1951563776. Throughput: 0: 14483.9. Samples: 487959040. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:09:15,956][1648985] Avg episode reward: [(0, '168.330')] [2024-06-15 23:09:16,638][1652491] Updated weights for policy 0, policy_version 952957 (0.0014) [2024-06-15 23:09:18,550][1652491] Updated weights for policy 0, policy_version 953010 (0.0011) [2024-06-15 23:09:20,176][1652491] Updated weights for policy 0, policy_version 953029 (0.0010) [2024-06-15 23:09:20,469][1651469] Signal inference workers to stop experience collection... (49650 times) [2024-06-15 23:09:20,547][1652491] InferenceWorker_p0-w0: stopping experience collection (49650 times) [2024-06-15 23:09:20,735][1651469] Signal inference workers to resume experience collection... 
(49650 times) [2024-06-15 23:09:20,738][1652491] InferenceWorker_p0-w0: resuming experience collection (49650 times) [2024-06-15 23:09:20,955][1648985] Fps is (10 sec: 58982.9, 60 sec: 57344.0, 300 sec: 55316.9). Total num frames: 1951858688. Throughput: 0: 14381.6. Samples: 488047616. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:09:20,955][1648985] Avg episode reward: [(0, '168.720')] [2024-06-15 23:09:21,343][1652491] Updated weights for policy 0, policy_version 953079 (0.0011) [2024-06-15 23:09:22,380][1652491] Updated weights for policy 0, policy_version 953120 (0.0011) [2024-06-15 23:09:24,097][1652491] Updated weights for policy 0, policy_version 953168 (0.0011) [2024-06-15 23:09:24,932][1652491] Updated weights for policy 0, policy_version 953216 (0.0017) [2024-06-15 23:09:25,955][1648985] Fps is (10 sec: 62260.3, 60 sec: 58436.5, 300 sec: 55761.2). Total num frames: 1952186368. Throughput: 0: 14506.7. Samples: 488096256. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:09:25,955][1648985] Avg episode reward: [(0, '164.020')] [2024-06-15 23:09:26,833][1652491] Updated weights for policy 0, policy_version 953271 (0.0012) [2024-06-15 23:09:28,755][1652491] Updated weights for policy 0, policy_version 953312 (0.0110) [2024-06-15 23:09:30,913][1652491] Updated weights for policy 0, policy_version 953360 (0.0012) [2024-06-15 23:09:30,961][1648985] Fps is (10 sec: 62221.8, 60 sec: 57338.2, 300 sec: 55760.0). Total num frames: 1952481280. Throughput: 0: 14595.8. Samples: 488187904. 
Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:09:30,962][1648985] Avg episode reward: [(0, '147.930')] [2024-06-15 23:09:32,825][1652491] Updated weights for policy 0, policy_version 953411 (0.0014) [2024-06-15 23:09:34,012][1652491] Updated weights for policy 0, policy_version 953472 (0.0028) [2024-06-15 23:09:35,883][1652491] Updated weights for policy 0, policy_version 953532 (0.0012) [2024-06-15 23:09:35,955][1648985] Fps is (10 sec: 65534.9, 60 sec: 58982.6, 300 sec: 56316.8). Total num frames: 1952841728. Throughput: 0: 14415.7. Samples: 488266752. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:09:35,955][1648985] Avg episode reward: [(0, '144.530')] [2024-06-15 23:09:38,552][1652491] Updated weights for policy 0, policy_version 953595 (0.0092) [2024-06-15 23:09:40,553][1652491] Updated weights for policy 0, policy_version 953632 (0.0012) [2024-06-15 23:09:40,955][1648985] Fps is (10 sec: 59018.2, 60 sec: 58436.3, 300 sec: 55983.3). Total num frames: 1953071104. Throughput: 0: 14392.9. Samples: 488310272. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:09:40,955][1648985] Avg episode reward: [(0, '148.840')] [2024-06-15 23:09:42,228][1652491] Updated weights for policy 0, policy_version 953665 (0.0011) [2024-06-15 23:09:43,282][1652491] Updated weights for policy 0, policy_version 953725 (0.0056) [2024-06-15 23:09:44,553][1652491] Updated weights for policy 0, policy_version 953784 (0.0014) [2024-06-15 23:09:45,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 56798.0, 300 sec: 56316.5). Total num frames: 1953366016. Throughput: 0: 14461.2. Samples: 488397312. 
Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:09:45,955][1648985] Avg episode reward: [(0, '145.820')] [2024-06-15 23:09:47,091][1652491] Updated weights for policy 0, policy_version 953824 (0.0012) [2024-06-15 23:09:47,703][1652491] Updated weights for policy 0, policy_version 953856 (0.0012) [2024-06-15 23:09:50,038][1652491] Updated weights for policy 0, policy_version 953910 (0.0021) [2024-06-15 23:09:50,955][1648985] Fps is (10 sec: 55704.9, 60 sec: 57890.1, 300 sec: 56427.7). Total num frames: 1953628160. Throughput: 0: 14791.1. Samples: 488493568. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:09:50,956][1648985] Avg episode reward: [(0, '183.240')] [2024-06-15 23:09:51,349][1652491] Updated weights for policy 0, policy_version 953952 (0.0091) [2024-06-15 23:09:52,801][1652491] Updated weights for policy 0, policy_version 954001 (0.0013) [2024-06-15 23:09:55,568][1652491] Updated weights for policy 0, policy_version 954064 (0.0011) [2024-06-15 23:09:55,955][1648985] Fps is (10 sec: 55704.5, 60 sec: 57344.0, 300 sec: 56538.7). Total num frames: 1953923072. Throughput: 0: 14620.4. Samples: 488527360. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:09:55,956][1648985] Avg episode reward: [(0, '202.860')] [2024-06-15 23:09:56,182][1651469] Signal inference workers to stop experience collection... (49700 times) [2024-06-15 23:09:56,228][1652491] InferenceWorker_p0-w0: stopping experience collection (49700 times) [2024-06-15 23:09:56,425][1651469] Signal inference workers to resume experience collection... (49700 times) [2024-06-15 23:09:56,426][1652491] InferenceWorker_p0-w0: resuming experience collection (49700 times) [2024-06-15 23:09:59,041][1652491] Updated weights for policy 0, policy_version 954132 (0.0091) [2024-06-15 23:10:00,462][1652491] Updated weights for policy 0, policy_version 954177 (0.0011) [2024-06-15 23:10:00,955][1648985] Fps is (10 sec: 55705.9, 60 sec: 57344.0, 300 sec: 56538.7). 
Total num frames: 1954185216. Throughput: 0: 14654.6. Samples: 488618496. Policy #0 lag: (min: 72.0, avg: 192.9, max: 319.0) [2024-06-15 23:10:00,956][1648985] Avg episode reward: [(0, '185.180')] [2024-06-15 23:10:01,579][1652491] Updated weights for policy 0, policy_version 954234 (0.0042) [2024-06-15 23:10:02,811][1652491] Updated weights for policy 0, policy_version 954288 (0.0029) [2024-06-15 23:10:04,144][1652491] Updated weights for policy 0, policy_version 954308 (0.0010) [2024-06-15 23:10:05,599][1652491] Updated weights for policy 0, policy_version 954359 (0.0028) [2024-06-15 23:10:05,955][1648985] Fps is (10 sec: 62259.9, 60 sec: 58982.4, 300 sec: 56871.9). Total num frames: 1954545664. Throughput: 0: 14495.3. Samples: 488699904. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:10:05,956][1648985] Avg episode reward: [(0, '165.670')] [2024-06-15 23:10:08,383][1652491] Updated weights for policy 0, policy_version 954402 (0.0022) [2024-06-15 23:10:09,737][1652491] Updated weights for policy 0, policy_version 954448 (0.0012) [2024-06-15 23:10:10,955][1648985] Fps is (10 sec: 62259.1, 60 sec: 58982.4, 300 sec: 56871.9). Total num frames: 1954807808. Throughput: 0: 14472.5. Samples: 488747520. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:10:10,956][1648985] Avg episode reward: [(0, '146.860')] [2024-06-15 23:10:11,202][1652491] Updated weights for policy 0, policy_version 954513 (0.0009) [2024-06-15 23:10:11,972][1652491] Updated weights for policy 0, policy_version 954556 (0.0011) [2024-06-15 23:10:14,375][1652491] Updated weights for policy 0, policy_version 954616 (0.0011) [2024-06-15 23:10:15,955][1648985] Fps is (10 sec: 52428.8, 60 sec: 58436.2, 300 sec: 56871.9). Total num frames: 1955069952. Throughput: 0: 14463.1. Samples: 488838656. 
Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:10:15,956][1648985] Avg episode reward: [(0, '177.220')] [2024-06-15 23:10:17,235][1652491] Updated weights for policy 0, policy_version 954656 (0.0011) [2024-06-15 23:10:18,208][1652491] Updated weights for policy 0, policy_version 954709 (0.0013) [2024-06-15 23:10:19,293][1652491] Updated weights for policy 0, policy_version 954768 (0.0017) [2024-06-15 23:10:20,326][1652491] Updated weights for policy 0, policy_version 954812 (0.0011) [2024-06-15 23:10:20,955][1648985] Fps is (10 sec: 65536.3, 60 sec: 60074.7, 300 sec: 57316.3). Total num frames: 1955463168. Throughput: 0: 14620.4. Samples: 488924672. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:10:20,956][1648985] Avg episode reward: [(0, '175.260')] [2024-06-15 23:10:23,159][1652491] Updated weights for policy 0, policy_version 954877 (0.0012) [2024-06-15 23:10:25,955][1648985] Fps is (10 sec: 55704.7, 60 sec: 57343.6, 300 sec: 56983.0). Total num frames: 1955627008. Throughput: 0: 14586.2. Samples: 488966656. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:10:25,956][1648985] Avg episode reward: [(0, '173.500')] [2024-06-15 23:10:26,532][1652491] Updated weights for policy 0, policy_version 954928 (0.0012) [2024-06-15 23:10:27,708][1652491] Updated weights for policy 0, policy_version 954980 (0.0013) [2024-06-15 23:10:28,883][1652491] Updated weights for policy 0, policy_version 955027 (0.0032) [2024-06-15 23:10:29,860][1652491] Updated weights for policy 0, policy_version 955068 (0.0011) [2024-06-15 23:10:30,955][1648985] Fps is (10 sec: 52428.1, 60 sec: 58442.0, 300 sec: 57316.9). Total num frames: 1955987456. Throughput: 0: 14563.5. Samples: 489052672. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:10:30,956][1648985] Avg episode reward: [(0, '177.240')] [2024-06-15 23:10:31,385][1651469] Signal inference workers to stop experience collection... 
(49750 times) [2024-06-15 23:10:31,432][1652491] InferenceWorker_p0-w0: stopping experience collection (49750 times) [2024-06-15 23:10:31,695][1651469] Signal inference workers to resume experience collection... (49750 times) [2024-06-15 23:10:31,696][1652491] InferenceWorker_p0-w0: resuming experience collection (49750 times) [2024-06-15 23:10:32,423][1652491] Updated weights for policy 0, policy_version 955120 (0.0011) [2024-06-15 23:10:34,510][1652491] Updated weights for policy 0, policy_version 955154 (0.0011) [2024-06-15 23:10:35,286][1652491] Updated weights for policy 0, policy_version 955200 (0.0018) [2024-06-15 23:10:35,955][1648985] Fps is (10 sec: 62260.2, 60 sec: 56797.8, 300 sec: 57316.2). Total num frames: 1956249600. Throughput: 0: 14563.6. Samples: 489148928. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:10:35,956][1648985] Avg episode reward: [(0, '178.830')] [2024-06-15 23:10:36,651][1652491] Updated weights for policy 0, policy_version 955248 (0.0017) [2024-06-15 23:10:37,807][1652491] Updated weights for policy 0, policy_version 955281 (0.0012) [2024-06-15 23:10:38,776][1652491] Updated weights for policy 0, policy_version 955327 (0.0014) [2024-06-15 23:10:40,955][1648985] Fps is (10 sec: 55705.6, 60 sec: 57890.0, 300 sec: 57427.3). Total num frames: 1956544512. Throughput: 0: 14643.2. Samples: 489186304. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:10:40,956][1648985] Avg episode reward: [(0, '181.680')] [2024-06-15 23:10:41,638][1652491] Updated weights for policy 0, policy_version 955382 (0.0013) [2024-06-15 23:10:43,283][1652491] Updated weights for policy 0, policy_version 955413 (0.0013) [2024-06-15 23:10:44,005][1652491] Updated weights for policy 0, policy_version 955453 (0.0018) [2024-06-15 23:10:45,781][1652491] Updated weights for policy 0, policy_version 955519 (0.0022) [2024-06-15 23:10:45,955][1648985] Fps is (10 sec: 65535.9, 60 sec: 58982.3, 300 sec: 57760.6). 
Total num frames: 1956904960. Throughput: 0: 14688.7. Samples: 489279488. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:10:45,956][1648985] Avg episode reward: [(0, '164.720')] [2024-06-15 23:10:47,156][1652491] Updated weights for policy 0, policy_version 955555 (0.0012) [2024-06-15 23:10:49,904][1652491] Updated weights for policy 0, policy_version 955601 (0.0016) [2024-06-15 23:10:50,942][1652491] Updated weights for policy 0, policy_version 955646 (0.0012) [2024-06-15 23:10:50,955][1648985] Fps is (10 sec: 58983.0, 60 sec: 58436.3, 300 sec: 57649.5). Total num frames: 1957134336. Throughput: 0: 14722.8. Samples: 489362432. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:10:50,956][1648985] Avg episode reward: [(0, '152.040')] [2024-06-15 23:10:52,710][1652491] Updated weights for policy 0, policy_version 955700 (0.0013) [2024-06-15 23:10:55,023][1652491] Updated weights for policy 0, policy_version 955731 (0.0013) [2024-06-15 23:10:55,955][1648985] Fps is (10 sec: 52428.7, 60 sec: 58436.4, 300 sec: 57760.5). Total num frames: 1957429248. Throughput: 0: 14666.0. Samples: 489407488. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:10:55,956][1648985] Avg episode reward: [(0, '154.220')] [2024-06-15 23:10:56,243][1652491] Updated weights for policy 0, policy_version 955794 (0.0010) [2024-06-15 23:10:56,438][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000955808_1957494784.pth... 
[2024-06-15 23:10:56,588][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000948992_1943535616.pth [2024-06-15 23:10:56,595][1651469] Saving a milestone train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/milestones/checkpoint_000955808_1957494784.pth [2024-06-15 23:10:59,283][1652491] Updated weights for policy 0, policy_version 955856 (0.0011) [2024-06-15 23:11:00,394][1652491] Updated weights for policy 0, policy_version 955899 (0.0012) [2024-06-15 23:11:00,955][1648985] Fps is (10 sec: 55705.2, 60 sec: 58436.2, 300 sec: 57760.5). Total num frames: 1957691392. Throughput: 0: 14472.5. Samples: 489489920. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:11:00,956][1648985] Avg episode reward: [(0, '162.250')] [2024-06-15 23:11:02,015][1652491] Updated weights for policy 0, policy_version 955940 (0.0023) [2024-06-15 23:11:02,627][1652491] Updated weights for policy 0, policy_version 955968 (0.0011) [2024-06-15 23:11:04,561][1652491] Updated weights for policy 0, policy_version 956023 (0.0061) [2024-06-15 23:11:05,556][1652491] Updated weights for policy 0, policy_version 956069 (0.0012) [2024-06-15 23:11:05,955][1648985] Fps is (10 sec: 62259.4, 60 sec: 58436.3, 300 sec: 57982.7). Total num frames: 1958051840. Throughput: 0: 14472.5. Samples: 489575936. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:11:05,956][1648985] Avg episode reward: [(0, '162.240')] [2024-06-15 23:11:07,698][1651469] Signal inference workers to stop experience collection... (49800 times) [2024-06-15 23:11:07,731][1652491] InferenceWorker_p0-w0: stopping experience collection (49800 times) [2024-06-15 23:11:07,924][1651469] Signal inference workers to resume experience collection... 
(49800 times) [2024-06-15 23:11:07,924][1652491] InferenceWorker_p0-w0: resuming experience collection (49800 times) [2024-06-15 23:11:08,070][1652491] Updated weights for policy 0, policy_version 956115 (0.0012) [2024-06-15 23:11:08,897][1652491] Updated weights for policy 0, policy_version 956151 (0.0013) [2024-06-15 23:11:10,879][1652491] Updated weights for policy 0, policy_version 956193 (0.0011) [2024-06-15 23:11:10,955][1648985] Fps is (10 sec: 58982.6, 60 sec: 57890.1, 300 sec: 57982.7). Total num frames: 1958281216. Throughput: 0: 14529.5. Samples: 489620480. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:11:10,956][1648985] Avg episode reward: [(0, '160.920')] [2024-06-15 23:11:11,428][1652491] Updated weights for policy 0, policy_version 956220 (0.0009) [2024-06-15 23:11:12,933][1652491] Updated weights for policy 0, policy_version 956261 (0.0011) [2024-06-15 23:11:13,933][1652491] Updated weights for policy 0, policy_version 956306 (0.0011) [2024-06-15 23:11:15,955][1648985] Fps is (10 sec: 55705.3, 60 sec: 58982.3, 300 sec: 57871.6). Total num frames: 1958608896. Throughput: 0: 14506.7. Samples: 489705472. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:11:15,956][1648985] Avg episode reward: [(0, '168.790')] [2024-06-15 23:11:16,567][1652491] Updated weights for policy 0, policy_version 956353 (0.0012) [2024-06-15 23:11:17,563][1652491] Updated weights for policy 0, policy_version 956412 (0.0013) [2024-06-15 23:11:20,019][1652491] Updated weights for policy 0, policy_version 956464 (0.0020) [2024-06-15 23:11:20,955][1648985] Fps is (10 sec: 58982.6, 60 sec: 56797.8, 300 sec: 57760.5). Total num frames: 1958871040. Throughput: 0: 14415.6. Samples: 489797632. 
Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:11:20,956][1648985] Avg episode reward: [(0, '168.190')] [2024-06-15 23:11:22,099][1652491] Updated weights for policy 0, policy_version 956512 (0.0010) [2024-06-15 23:11:22,887][1652491] Updated weights for policy 0, policy_version 956546 (0.0012) [2024-06-15 23:11:24,047][1652491] Updated weights for policy 0, policy_version 956595 (0.0019) [2024-06-15 23:11:25,443][1652491] Updated weights for policy 0, policy_version 956610 (0.0009) [2024-06-15 23:11:25,955][1648985] Fps is (10 sec: 58982.7, 60 sec: 59528.7, 300 sec: 57649.5). Total num frames: 1959198720. Throughput: 0: 14495.3. Samples: 489838592. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:11:25,956][1648985] Avg episode reward: [(0, '163.520')] [2024-06-15 23:11:26,309][1652491] Updated weights for policy 0, policy_version 956663 (0.0012) [2024-06-15 23:11:28,886][1652491] Updated weights for policy 0, policy_version 956720 (0.0013) [2024-06-15 23:11:30,955][1648985] Fps is (10 sec: 55705.6, 60 sec: 57344.1, 300 sec: 57871.6). Total num frames: 1959428096. Throughput: 0: 14495.3. Samples: 489931776. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:11:30,956][1648985] Avg episode reward: [(0, '166.720')] [2024-06-15 23:11:31,141][1652491] Updated weights for policy 0, policy_version 956768 (0.0012) [2024-06-15 23:11:32,870][1652491] Updated weights for policy 0, policy_version 956834 (0.0078) [2024-06-15 23:11:33,562][1652491] Updated weights for policy 0, policy_version 956863 (0.0011) [2024-06-15 23:11:35,886][1652491] Updated weights for policy 0, policy_version 956917 (0.0013) [2024-06-15 23:11:35,955][1648985] Fps is (10 sec: 55705.4, 60 sec: 58436.2, 300 sec: 57649.5). Total num frames: 1959755776. Throughput: 0: 14267.7. Samples: 490004480. 
Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:11:35,956][1648985] Avg episode reward: [(0, '166.840')] [2024-06-15 23:11:38,695][1652491] Updated weights for policy 0, policy_version 956966 (0.0012) [2024-06-15 23:11:39,224][1652491] Updated weights for policy 0, policy_version 956992 (0.0010) [2024-06-15 23:11:40,955][1648985] Fps is (10 sec: 55705.6, 60 sec: 57344.1, 300 sec: 57760.5). Total num frames: 1959985152. Throughput: 0: 14313.2. Samples: 490051584. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:11:40,956][1648985] Avg episode reward: [(0, '148.480')] [2024-06-15 23:11:41,254][1652491] Updated weights for policy 0, policy_version 957046 (0.0013) [2024-06-15 23:11:41,938][1651469] Signal inference workers to stop experience collection... (49850 times) [2024-06-15 23:11:42,015][1652491] InferenceWorker_p0-w0: stopping experience collection (49850 times) [2024-06-15 23:11:42,250][1651469] Signal inference workers to resume experience collection... (49850 times) [2024-06-15 23:11:42,251][1652491] InferenceWorker_p0-w0: resuming experience collection (49850 times) [2024-06-15 23:11:42,424][1652491] Updated weights for policy 0, policy_version 957093 (0.0013) [2024-06-15 23:11:44,181][1652491] Updated weights for policy 0, policy_version 957138 (0.0012) [2024-06-15 23:11:45,955][1648985] Fps is (10 sec: 55706.0, 60 sec: 56797.9, 300 sec: 57760.6). Total num frames: 1960312832. Throughput: 0: 14279.1. Samples: 490132480. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:11:45,956][1648985] Avg episode reward: [(0, '147.890')] [2024-06-15 23:11:47,169][1652491] Updated weights for policy 0, policy_version 957202 (0.0013) [2024-06-15 23:11:47,973][1652491] Updated weights for policy 0, policy_version 957248 (0.0012) [2024-06-15 23:11:50,787][1652491] Updated weights for policy 0, policy_version 957312 (0.0013) [2024-06-15 23:11:50,955][1648985] Fps is (10 sec: 58982.4, 60 sec: 57344.0, 300 sec: 57760.6). 
Total num frames: 1960574976. Throughput: 0: 14427.0. Samples: 490225152. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:11:50,956][1648985] Avg episode reward: [(0, '160.120')] [2024-06-15 23:11:52,027][1652491] Updated weights for policy 0, policy_version 957368 (0.0012) [2024-06-15 23:11:54,093][1652491] Updated weights for policy 0, policy_version 957424 (0.0083) [2024-06-15 23:11:54,452][1652491] Updated weights for policy 0, policy_version 957439 (0.0010) [2024-06-15 23:11:55,955][1648985] Fps is (10 sec: 52428.0, 60 sec: 56797.8, 300 sec: 57760.5). Total num frames: 1960837120. Throughput: 0: 14279.1. Samples: 490263040. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:11:55,956][1648985] Avg episode reward: [(0, '163.820')] [2024-06-15 23:11:57,290][1652491] Updated weights for policy 0, policy_version 957504 (0.0012) [2024-06-15 23:11:59,520][1652491] Updated weights for policy 0, policy_version 957553 (0.0033) [2024-06-15 23:12:00,608][1652491] Updated weights for policy 0, policy_version 957600 (0.0010) [2024-06-15 23:12:00,955][1648985] Fps is (10 sec: 62259.5, 60 sec: 58436.4, 300 sec: 58093.8). Total num frames: 1961197568. Throughput: 0: 14392.9. Samples: 490353152. Policy #0 lag: (min: 56.0, avg: 183.4, max: 312.0) [2024-06-15 23:12:00,955][1648985] Avg episode reward: [(0, '175.110')] [2024-06-15 23:12:02,286][1652491] Updated weights for policy 0, policy_version 957648 (0.0012) [2024-06-15 23:12:03,303][1652491] Updated weights for policy 0, policy_version 957689 (0.0011) [2024-06-15 23:12:05,955][1648985] Fps is (10 sec: 58983.0, 60 sec: 56251.7, 300 sec: 57982.7). Total num frames: 1961426944. Throughput: 0: 14347.4. Samples: 490443264. 
Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:12:05,956][1648985] Avg episode reward: [(0, '167.300')] [2024-06-15 23:12:06,304][1652491] Updated weights for policy 0, policy_version 957744 (0.0015) [2024-06-15 23:12:08,102][1652491] Updated weights for policy 0, policy_version 957778 (0.0048) [2024-06-15 23:12:09,354][1652491] Updated weights for policy 0, policy_version 957828 (0.0021) [2024-06-15 23:12:10,437][1652491] Updated weights for policy 0, policy_version 957879 (0.0015) [2024-06-15 23:12:10,955][1648985] Fps is (10 sec: 55705.4, 60 sec: 57890.2, 300 sec: 58093.8). Total num frames: 1961754624. Throughput: 0: 14267.7. Samples: 490480640. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:12:10,956][1648985] Avg episode reward: [(0, '156.740')] [2024-06-15 23:12:11,966][1652491] Updated weights for policy 0, policy_version 957922 (0.0011) [2024-06-15 23:12:14,901][1652491] Updated weights for policy 0, policy_version 957968 (0.0013) [2024-06-15 23:12:15,798][1652491] Updated weights for policy 0, policy_version 958015 (0.0012) [2024-06-15 23:12:15,955][1648985] Fps is (10 sec: 58982.1, 60 sec: 56797.8, 300 sec: 57982.7). Total num frames: 1962016768. Throughput: 0: 14279.1. Samples: 490574336. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:12:15,956][1648985] Avg episode reward: [(0, '172.200')] [2024-06-15 23:12:17,539][1651469] Signal inference workers to stop experience collection... (49900 times) [2024-06-15 23:12:17,646][1652491] InferenceWorker_p0-w0: stopping experience collection (49900 times) [2024-06-15 23:12:17,781][1651469] Signal inference workers to resume experience collection... 
(49900 times) [2024-06-15 23:12:17,781][1652491] InferenceWorker_p0-w0: resuming experience collection (49900 times) [2024-06-15 23:12:17,924][1652491] Updated weights for policy 0, policy_version 958081 (0.0089) [2024-06-15 23:12:19,198][1652491] Updated weights for policy 0, policy_version 958133 (0.0012) [2024-06-15 23:12:20,909][1652491] Updated weights for policy 0, policy_version 958176 (0.0011) [2024-06-15 23:12:20,955][1648985] Fps is (10 sec: 58982.4, 60 sec: 57890.1, 300 sec: 57871.6). Total num frames: 1962344448. Throughput: 0: 14313.3. Samples: 490648576. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:12:20,955][1648985] Avg episode reward: [(0, '185.210')] [2024-06-15 23:12:21,628][1652491] Updated weights for policy 0, policy_version 958208 (0.0022) [2024-06-15 23:12:25,260][1652491] Updated weights for policy 0, policy_version 958266 (0.0012) [2024-06-15 23:12:25,955][1648985] Fps is (10 sec: 55706.0, 60 sec: 56251.7, 300 sec: 57871.6). Total num frames: 1962573824. Throughput: 0: 14472.5. Samples: 490702848. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:12:25,955][1648985] Avg episode reward: [(0, '158.670')] [2024-06-15 23:12:27,282][1652491] Updated weights for policy 0, policy_version 958356 (0.0079) [2024-06-15 23:12:28,039][1652491] Updated weights for policy 0, policy_version 958400 (0.0090) [2024-06-15 23:12:30,955][1648985] Fps is (10 sec: 55705.8, 60 sec: 57890.2, 300 sec: 57760.5). Total num frames: 1962901504. Throughput: 0: 14438.4. Samples: 490782208. 
Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:12:30,955][1648985] Avg episode reward: [(0, '151.850')] [2024-06-15 23:12:31,083][1652491] Updated weights for policy 0, policy_version 958457 (0.0014) [2024-06-15 23:12:34,206][1652491] Updated weights for policy 0, policy_version 958512 (0.0013) [2024-06-15 23:12:34,624][1652491] Updated weights for policy 0, policy_version 958528 (0.0012) [2024-06-15 23:12:35,682][1652491] Updated weights for policy 0, policy_version 958576 (0.0012) [2024-06-15 23:12:35,955][1648985] Fps is (10 sec: 58982.3, 60 sec: 56797.9, 300 sec: 57871.6). Total num frames: 1963163648. Throughput: 0: 14210.9. Samples: 490864640. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:12:35,956][1648985] Avg episode reward: [(0, '146.830')] [2024-06-15 23:12:36,479][1652491] Updated weights for policy 0, policy_version 958612 (0.0030) [2024-06-15 23:12:39,421][1652491] Updated weights for policy 0, policy_version 958658 (0.0019) [2024-06-15 23:12:40,655][1652491] Updated weights for policy 0, policy_version 958707 (0.0020) [2024-06-15 23:12:40,955][1648985] Fps is (10 sec: 55705.3, 60 sec: 57890.1, 300 sec: 57760.6). Total num frames: 1963458560. Throughput: 0: 14358.8. Samples: 490909184. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:12:40,956][1648985] Avg episode reward: [(0, '152.930')] [2024-06-15 23:12:42,380][1652491] Updated weights for policy 0, policy_version 958722 (0.0024) [2024-06-15 23:12:44,508][1652491] Updated weights for policy 0, policy_version 958785 (0.0014) [2024-06-15 23:12:45,924][1652491] Updated weights for policy 0, policy_version 958848 (0.0014) [2024-06-15 23:12:45,955][1648985] Fps is (10 sec: 55705.6, 60 sec: 56797.8, 300 sec: 57760.5). Total num frames: 1963720704. Throughput: 0: 14256.3. Samples: 490994688. 
Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:12:45,955][1648985] Avg episode reward: [(0, '167.140')] [2024-06-15 23:12:46,964][1652491] Updated weights for policy 0, policy_version 958896 (0.0011) [2024-06-15 23:12:49,212][1652491] Updated weights for policy 0, policy_version 958929 (0.0012) [2024-06-15 23:12:49,926][1652491] Updated weights for policy 0, policy_version 958970 (0.0013) [2024-06-15 23:12:50,955][1648985] Fps is (10 sec: 52427.8, 60 sec: 56797.7, 300 sec: 57760.5). Total num frames: 1963982848. Throughput: 0: 14119.8. Samples: 491078656. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:12:50,956][1648985] Avg episode reward: [(0, '172.840')] [2024-06-15 23:12:52,554][1652491] Updated weights for policy 0, policy_version 959024 (0.0010) [2024-06-15 23:12:53,669][1652491] Updated weights for policy 0, policy_version 959056 (0.0012) [2024-06-15 23:12:53,781][1651469] Signal inference workers to stop experience collection... (49950 times) [2024-06-15 23:12:53,814][1652491] InferenceWorker_p0-w0: stopping experience collection (49950 times) [2024-06-15 23:12:54,067][1651469] Signal inference workers to resume experience collection... (49950 times) [2024-06-15 23:12:54,068][1652491] InferenceWorker_p0-w0: resuming experience collection (49950 times) [2024-06-15 23:12:54,928][1652491] Updated weights for policy 0, policy_version 959105 (0.0011) [2024-06-15 23:12:55,955][1648985] Fps is (10 sec: 58981.5, 60 sec: 57890.1, 300 sec: 57871.6). Total num frames: 1964310528. Throughput: 0: 14244.9. Samples: 491121664. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:12:55,956][1648985] Avg episode reward: [(0, '187.510')] [2024-06-15 23:12:56,153][1652491] Updated weights for policy 0, policy_version 959158 (0.0012) [2024-06-15 23:12:56,315][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000959168_1964376064.pth... 
[2024-06-15 23:12:56,359][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000952384_1950482432.pth [2024-06-15 23:12:58,705][1652491] Updated weights for policy 0, policy_version 959200 (0.0011) [2024-06-15 23:13:00,955][1648985] Fps is (10 sec: 52430.4, 60 sec: 55159.5, 300 sec: 57538.4). Total num frames: 1964507136. Throughput: 0: 13994.7. Samples: 491204096. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:13:00,955][1648985] Avg episode reward: [(0, '173.480')] [2024-06-15 23:13:01,676][1652491] Updated weights for policy 0, policy_version 959251 (0.0012) [2024-06-15 23:13:02,721][1652491] Updated weights for policy 0, policy_version 959292 (0.0012) [2024-06-15 23:13:03,947][1652491] Updated weights for policy 0, policy_version 959348 (0.0011) [2024-06-15 23:13:04,995][1652491] Updated weights for policy 0, policy_version 959394 (0.0011) [2024-06-15 23:13:05,955][1648985] Fps is (10 sec: 58983.6, 60 sec: 57890.2, 300 sec: 58093.8). Total num frames: 1964900352. Throughput: 0: 14222.2. Samples: 491288576. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:13:05,955][1648985] Avg episode reward: [(0, '162.290')] [2024-06-15 23:13:07,366][1652491] Updated weights for policy 0, policy_version 959443 (0.0012) [2024-06-15 23:13:09,221][1652491] Updated weights for policy 0, policy_version 959489 (0.0011) [2024-06-15 23:13:10,185][1652491] Updated weights for policy 0, policy_version 959546 (0.0012) [2024-06-15 23:13:10,955][1648985] Fps is (10 sec: 68812.1, 60 sec: 57344.0, 300 sec: 57982.7). Total num frames: 1965195264. Throughput: 0: 14381.5. Samples: 491350016. 
Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:13:10,956][1648985] Avg episode reward: [(0, '162.690')] [2024-06-15 23:13:11,288][1652491] Updated weights for policy 0, policy_version 959585 (0.0011) [2024-06-15 23:13:12,466][1652491] Updated weights for policy 0, policy_version 959650 (0.0013) [2024-06-15 23:13:13,965][1652491] Updated weights for policy 0, policy_version 959681 (0.0011) [2024-06-15 23:13:14,843][1652491] Updated weights for policy 0, policy_version 959736 (0.0013) [2024-06-15 23:13:15,955][1648985] Fps is (10 sec: 65535.8, 60 sec: 58982.5, 300 sec: 58093.8). Total num frames: 1965555712. Throughput: 0: 14802.5. Samples: 491448320. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:13:15,956][1648985] Avg episode reward: [(0, '154.150')] [2024-06-15 23:13:17,284][1652491] Updated weights for policy 0, policy_version 959792 (0.0079) [2024-06-15 23:13:18,458][1652491] Updated weights for policy 0, policy_version 959829 (0.0019) [2024-06-15 23:13:19,740][1652491] Updated weights for policy 0, policy_version 959891 (0.0011) [2024-06-15 23:13:20,955][1648985] Fps is (10 sec: 75366.8, 60 sec: 60074.7, 300 sec: 58538.1). Total num frames: 1965948928. Throughput: 0: 15314.5. Samples: 491553792. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:13:20,955][1648985] Avg episode reward: [(0, '173.950')] [2024-06-15 23:13:21,212][1652491] Updated weights for policy 0, policy_version 959938 (0.0025) [2024-06-15 23:13:21,914][1652491] Updated weights for policy 0, policy_version 959989 (0.0011) [2024-06-15 23:13:24,117][1651469] Signal inference workers to stop experience collection... (50000 times) [2024-06-15 23:13:24,157][1652491] InferenceWorker_p0-w0: stopping experience collection (50000 times) [2024-06-15 23:13:24,267][1651469] Signal inference workers to resume experience collection... 
(50000 times) [2024-06-15 23:13:24,267][1652491] InferenceWorker_p0-w0: resuming experience collection (50000 times) [2024-06-15 23:13:24,499][1652491] Updated weights for policy 0, policy_version 960048 (0.0012) [2024-06-15 23:13:25,955][1648985] Fps is (10 sec: 72087.8, 60 sec: 61712.8, 300 sec: 58426.9). Total num frames: 1966276608. Throughput: 0: 15803.7. Samples: 491620352. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:13:25,956][1648985] Avg episode reward: [(0, '169.380')] [2024-06-15 23:13:25,987][1652491] Updated weights for policy 0, policy_version 960101 (0.0010) [2024-06-15 23:13:26,833][1652491] Updated weights for policy 0, policy_version 960148 (0.0012) [2024-06-15 23:13:28,712][1652491] Updated weights for policy 0, policy_version 960208 (0.0012) [2024-06-15 23:13:30,955][1648985] Fps is (10 sec: 65535.5, 60 sec: 61713.0, 300 sec: 58649.2). Total num frames: 1966604288. Throughput: 0: 16065.4. Samples: 491717632. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:13:30,956][1648985] Avg episode reward: [(0, '155.240')] [2024-06-15 23:13:31,766][1652491] Updated weights for policy 0, policy_version 960272 (0.0012) [2024-06-15 23:13:32,732][1652491] Updated weights for policy 0, policy_version 960336 (0.0017) [2024-06-15 23:13:33,718][1652491] Updated weights for policy 0, policy_version 960389 (0.0012) [2024-06-15 23:13:35,681][1652491] Updated weights for policy 0, policy_version 960450 (0.0012) [2024-06-15 23:13:35,966][1648985] Fps is (10 sec: 75283.9, 60 sec: 64431.7, 300 sec: 59202.3). Total num frames: 1967030272. Throughput: 0: 16618.9. Samples: 491826688. 
Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:13:35,967][1648985] Avg episode reward: [(0, '177.660')] [2024-06-15 23:13:36,837][1652491] Updated weights for policy 0, policy_version 960512 (0.0012) [2024-06-15 23:13:39,869][1652491] Updated weights for policy 0, policy_version 960560 (0.0012) [2024-06-15 23:13:40,948][1652491] Updated weights for policy 0, policy_version 960624 (0.0011) [2024-06-15 23:13:40,955][1648985] Fps is (10 sec: 75365.3, 60 sec: 64989.7, 300 sec: 58982.4). Total num frames: 1967357952. Throughput: 0: 16918.7. Samples: 491883008. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:13:40,956][1648985] Avg episode reward: [(0, '179.610')] [2024-06-15 23:13:42,297][1652491] Updated weights for policy 0, policy_version 960699 (0.0011) [2024-06-15 23:13:43,632][1652491] Updated weights for policy 0, policy_version 960736 (0.0011) [2024-06-15 23:13:44,219][1652491] Updated weights for policy 0, policy_version 960768 (0.0010) [2024-06-15 23:13:45,955][1648985] Fps is (10 sec: 62329.0, 60 sec: 65536.0, 300 sec: 59315.6). Total num frames: 1967652864. Throughput: 0: 17169.0. Samples: 491976704. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:13:45,956][1648985] Avg episode reward: [(0, '164.140')] [2024-06-15 23:13:48,167][1652491] Updated weights for policy 0, policy_version 960832 (0.0011) [2024-06-15 23:13:49,414][1652491] Updated weights for policy 0, policy_version 960901 (0.0022) [2024-06-15 23:13:49,875][1651469] Signal inference workers to stop experience collection... (50050 times) [2024-06-15 23:13:49,936][1652491] InferenceWorker_p0-w0: stopping experience collection (50050 times) [2024-06-15 23:13:50,008][1651469] Signal inference workers to resume experience collection... 
(50050 times) [2024-06-15 23:13:50,009][1652491] InferenceWorker_p0-w0: resuming experience collection (50050 times) [2024-06-15 23:13:50,111][1652491] Updated weights for policy 0, policy_version 960951 (0.0011) [2024-06-15 23:13:50,955][1648985] Fps is (10 sec: 72091.3, 60 sec: 68267.0, 300 sec: 59648.9). Total num frames: 1968078848. Throughput: 0: 17669.7. Samples: 492083712. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:13:50,955][1648985] Avg episode reward: [(0, '167.060')] [2024-06-15 23:13:51,504][1652491] Updated weights for policy 0, policy_version 961008 (0.0011) [2024-06-15 23:13:55,156][1652491] Updated weights for policy 0, policy_version 961044 (0.0014) [2024-06-15 23:13:55,955][1648985] Fps is (10 sec: 65535.9, 60 sec: 66628.4, 300 sec: 59537.8). Total num frames: 1968308224. Throughput: 0: 17555.9. Samples: 492140032. Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:13:55,955][1648985] Avg episode reward: [(0, '139.390')] [2024-06-15 23:13:56,301][1652491] Updated weights for policy 0, policy_version 961109 (0.0019) [2024-06-15 23:13:57,360][1652491] Updated weights for policy 0, policy_version 961170 (0.0016) [2024-06-15 23:13:57,987][1652491] Updated weights for policy 0, policy_version 961216 (0.0009) [2024-06-15 23:13:59,186][1652491] Updated weights for policy 0, policy_version 961271 (0.0013) [2024-06-15 23:14:00,955][1648985] Fps is (10 sec: 62258.7, 60 sec: 69904.9, 300 sec: 59982.1). Total num frames: 1968701440. Throughput: 0: 17658.3. Samples: 492242944. 
Policy #0 lag: (min: 63.0, avg: 197.8, max: 319.0) [2024-06-15 23:14:00,956][1648985] Avg episode reward: [(0, '149.130')] [2024-06-15 23:14:02,676][1652491] Updated weights for policy 0, policy_version 961318 (0.0080) [2024-06-15 23:14:03,563][1652491] Updated weights for policy 0, policy_version 961379 (0.0012) [2024-06-15 23:14:04,324][1652491] Updated weights for policy 0, policy_version 961426 (0.0012) [2024-06-15 23:14:05,072][1652491] Updated weights for policy 0, policy_version 961469 (0.0011) [2024-06-15 23:14:05,955][1648985] Fps is (10 sec: 81920.3, 60 sec: 70451.2, 300 sec: 60537.5). Total num frames: 1969127424. Throughput: 0: 17749.3. Samples: 492352512. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:14:05,955][1648985] Avg episode reward: [(0, '179.570')] [2024-06-15 23:14:06,416][1652491] Updated weights for policy 0, policy_version 961520 (0.0012) [2024-06-15 23:14:09,602][1652491] Updated weights for policy 0, policy_version 961553 (0.0011) [2024-06-15 23:14:10,955][1648985] Fps is (10 sec: 68812.7, 60 sec: 69905.0, 300 sec: 60426.4). Total num frames: 1969389568. Throughput: 0: 17590.1. Samples: 492411904. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:14:10,956][1648985] Avg episode reward: [(0, '172.450')] [2024-06-15 23:14:10,965][1652491] Updated weights for policy 0, policy_version 961621 (0.0012) [2024-06-15 23:14:11,991][1652491] Updated weights for policy 0, policy_version 961681 (0.0011) [2024-06-15 23:14:12,681][1652491] Updated weights for policy 0, policy_version 961724 (0.0011) [2024-06-15 23:14:14,217][1652491] Updated weights for policy 0, policy_version 961776 (0.0011) [2024-06-15 23:14:15,955][1648985] Fps is (10 sec: 62258.6, 60 sec: 69905.0, 300 sec: 60648.5). Total num frames: 1969750016. Throughput: 0: 17601.4. Samples: 492509696. 
Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:14:15,956][1648985] Avg episode reward: [(0, '165.800')] [2024-06-15 23:14:16,925][1652491] Updated weights for policy 0, policy_version 961810 (0.0012) [2024-06-15 23:14:17,781][1651469] Signal inference workers to stop experience collection... (50100 times) [2024-06-15 23:14:17,819][1652491] InferenceWorker_p0-w0: stopping experience collection (50100 times) [2024-06-15 23:14:18,014][1651469] Signal inference workers to resume experience collection... (50100 times) [2024-06-15 23:14:18,015][1652491] InferenceWorker_p0-w0: resuming experience collection (50100 times) [2024-06-15 23:14:18,169][1652491] Updated weights for policy 0, policy_version 961875 (0.0094) [2024-06-15 23:14:19,203][1652491] Updated weights for policy 0, policy_version 961936 (0.0010) [2024-06-15 23:14:19,948][1652491] Updated weights for policy 0, policy_version 961979 (0.0025) [2024-06-15 23:14:20,955][1648985] Fps is (10 sec: 75366.8, 60 sec: 69905.0, 300 sec: 60870.7). Total num frames: 1970143232. Throughput: 0: 17492.0. Samples: 492613632. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:14:20,955][1648985] Avg episode reward: [(0, '167.410')] [2024-06-15 23:14:21,825][1652491] Updated weights for policy 0, policy_version 962016 (0.0010) [2024-06-15 23:14:23,769][1652491] Updated weights for policy 0, policy_version 962052 (0.0011) [2024-06-15 23:14:24,911][1652491] Updated weights for policy 0, policy_version 962116 (0.0013) [2024-06-15 23:14:25,955][1648985] Fps is (10 sec: 75366.9, 60 sec: 70451.5, 300 sec: 61094.1). Total num frames: 1970503680. Throughput: 0: 17590.1. Samples: 492674560. 
Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:14:25,956][1648985] Avg episode reward: [(0, '152.260')] [2024-06-15 23:14:26,118][1652491] Updated weights for policy 0, policy_version 962178 (0.0012) [2024-06-15 23:14:27,077][1652491] Updated weights for policy 0, policy_version 962237 (0.0072) [2024-06-15 23:14:30,088][1652491] Updated weights for policy 0, policy_version 962288 (0.0020) [2024-06-15 23:14:30,955][1648985] Fps is (10 sec: 65534.6, 60 sec: 69904.9, 300 sec: 60870.7). Total num frames: 1970798592. Throughput: 0: 17601.3. Samples: 492768768. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:14:30,956][1648985] Avg episode reward: [(0, '175.530')] [2024-06-15 23:14:32,377][1652491] Updated weights for policy 0, policy_version 962338 (0.0013) [2024-06-15 23:14:33,412][1652491] Updated weights for policy 0, policy_version 962400 (0.0013) [2024-06-15 23:14:34,445][1652491] Updated weights for policy 0, policy_version 962449 (0.0012) [2024-06-15 23:14:35,955][1648985] Fps is (10 sec: 68813.2, 60 sec: 69371.9, 300 sec: 61426.1). Total num frames: 1971191808. Throughput: 0: 17351.1. Samples: 492864512. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:14:35,955][1648985] Avg episode reward: [(0, '201.920')] [2024-06-15 23:14:37,089][1652491] Updated weights for policy 0, policy_version 962501 (0.0012) [2024-06-15 23:14:38,046][1652491] Updated weights for policy 0, policy_version 962557 (0.0012) [2024-06-15 23:14:40,116][1652491] Updated weights for policy 0, policy_version 962608 (0.0010) [2024-06-15 23:14:40,894][1652491] Updated weights for policy 0, policy_version 962656 (0.0011) [2024-06-15 23:14:40,955][1648985] Fps is (10 sec: 72090.7, 60 sec: 69359.1, 300 sec: 61537.2). Total num frames: 1971519488. Throughput: 0: 17419.4. Samples: 492923904. 
Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:14:40,956][1648985] Avg episode reward: [(0, '166.340')] [2024-06-15 23:14:41,961][1652491] Updated weights for policy 0, policy_version 962707 (0.0011) [2024-06-15 23:14:42,698][1652491] Updated weights for policy 0, policy_version 962751 (0.0013) [2024-06-15 23:14:44,725][1651469] Signal inference workers to stop experience collection... (50150 times) [2024-06-15 23:14:44,805][1652491] InferenceWorker_p0-w0: stopping experience collection (50150 times) [2024-06-15 23:14:44,949][1651469] Signal inference workers to resume experience collection... (50150 times) [2024-06-15 23:14:44,950][1652491] InferenceWorker_p0-w0: resuming experience collection (50150 times) [2024-06-15 23:14:45,406][1652491] Updated weights for policy 0, policy_version 962808 (0.0083) [2024-06-15 23:14:45,955][1648985] Fps is (10 sec: 65534.9, 60 sec: 69904.9, 300 sec: 61759.3). Total num frames: 1971847168. Throughput: 0: 17487.6. Samples: 493029888. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:14:45,956][1648985] Avg episode reward: [(0, '143.830')] [2024-06-15 23:14:46,868][1652491] Updated weights for policy 0, policy_version 962833 (0.0012) [2024-06-15 23:14:47,641][1652491] Updated weights for policy 0, policy_version 962881 (0.0011) [2024-06-15 23:14:48,362][1652491] Updated weights for policy 0, policy_version 962929 (0.0011) [2024-06-15 23:14:49,144][1652491] Updated weights for policy 0, policy_version 962978 (0.0011) [2024-06-15 23:14:50,955][1648985] Fps is (10 sec: 72089.8, 60 sec: 69358.9, 300 sec: 62092.6). Total num frames: 1972240384. Throughput: 0: 17578.6. Samples: 493143552. 
Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:14:50,956][1648985] Avg episode reward: [(0, '162.200')] [2024-06-15 23:14:51,703][1652491] Updated weights for policy 0, policy_version 963024 (0.0068) [2024-06-15 23:14:52,357][1652491] Updated weights for policy 0, policy_version 963066 (0.0012) [2024-06-15 23:14:53,572][1652491] Updated weights for policy 0, policy_version 963090 (0.0028) [2024-06-15 23:14:54,667][1652491] Updated weights for policy 0, policy_version 963152 (0.0012) [2024-06-15 23:14:55,604][1652491] Updated weights for policy 0, policy_version 963216 (0.0011) [2024-06-15 23:14:55,955][1648985] Fps is (10 sec: 85197.5, 60 sec: 73181.8, 300 sec: 62759.0). Total num frames: 1972699136. Throughput: 0: 17521.8. Samples: 493200384. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:14:55,956][1648985] Avg episode reward: [(0, '165.450')] [2024-06-15 23:14:56,250][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000963264_1972764672.pth... [2024-06-15 23:14:56,259][1652491] Updated weights for policy 0, policy_version 963264 (0.0010) [2024-06-15 23:14:56,295][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000955808_1957494784.pth [2024-06-15 23:14:59,182][1652491] Updated weights for policy 0, policy_version 963322 (0.0017) [2024-06-15 23:15:00,845][1652491] Updated weights for policy 0, policy_version 963360 (0.0013) [2024-06-15 23:15:00,955][1648985] Fps is (10 sec: 72090.1, 60 sec: 70997.4, 300 sec: 62425.8). Total num frames: 1972961280. Throughput: 0: 17874.5. Samples: 493314048. 
Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:15:00,955][1648985] Avg episode reward: [(0, '169.290')] [2024-06-15 23:15:01,954][1652491] Updated weights for policy 0, policy_version 963424 (0.0010) [2024-06-15 23:15:02,823][1652491] Updated weights for policy 0, policy_version 963473 (0.0014) [2024-06-15 23:15:05,107][1652491] Updated weights for policy 0, policy_version 963523 (0.0011) [2024-06-15 23:15:05,871][1652491] Updated weights for policy 0, policy_version 963578 (0.0012) [2024-06-15 23:15:05,955][1648985] Fps is (10 sec: 72090.7, 60 sec: 71543.6, 300 sec: 63092.3). Total num frames: 1973420032. Throughput: 0: 17965.6. Samples: 493422080. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:15:05,955][1648985] Avg episode reward: [(0, '174.700')] [2024-06-15 23:15:07,959][1652491] Updated weights for policy 0, policy_version 963621 (0.0009) [2024-06-15 23:15:09,164][1652491] Updated weights for policy 0, policy_version 963696 (0.0011) [2024-06-15 23:15:09,242][1651469] Signal inference workers to stop experience collection... (50200 times) [2024-06-15 23:15:09,296][1652491] InferenceWorker_p0-w0: stopping experience collection (50200 times) [2024-06-15 23:15:09,471][1651469] Signal inference workers to resume experience collection... (50200 times) [2024-06-15 23:15:09,472][1652491] InferenceWorker_p0-w0: resuming experience collection (50200 times) [2024-06-15 23:15:10,425][1652491] Updated weights for policy 0, policy_version 963768 (0.0013) [2024-06-15 23:15:10,955][1648985] Fps is (10 sec: 85196.6, 60 sec: 73728.1, 300 sec: 63536.6). Total num frames: 1973813248. Throughput: 0: 17874.5. Samples: 493478912. 
Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:15:10,955][1648985] Avg episode reward: [(0, '172.680')] [2024-06-15 23:15:13,255][1652491] Updated weights for policy 0, policy_version 963824 (0.0011) [2024-06-15 23:15:15,102][1652491] Updated weights for policy 0, policy_version 963857 (0.0009) [2024-06-15 23:15:15,955][1648985] Fps is (10 sec: 65535.5, 60 sec: 72089.7, 300 sec: 63092.3). Total num frames: 1974075392. Throughput: 0: 18250.1. Samples: 493590016. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:15:15,955][1648985] Avg episode reward: [(0, '151.650')] [2024-06-15 23:15:16,068][1652491] Updated weights for policy 0, policy_version 963920 (0.0013) [2024-06-15 23:15:17,070][1652491] Updated weights for policy 0, policy_version 963974 (0.0011) [2024-06-15 23:15:17,888][1652491] Updated weights for policy 0, policy_version 964029 (0.0085) [2024-06-15 23:15:20,916][1652491] Updated weights for policy 0, policy_version 964080 (0.0031) [2024-06-15 23:15:20,955][1648985] Fps is (10 sec: 62258.3, 60 sec: 71543.3, 300 sec: 63758.8). Total num frames: 1974435840. Throughput: 0: 18329.5. Samples: 493689344. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:15:20,956][1648985] Avg episode reward: [(0, '152.510')] [2024-06-15 23:15:22,785][1652491] Updated weights for policy 0, policy_version 964128 (0.0009) [2024-06-15 23:15:23,712][1652491] Updated weights for policy 0, policy_version 964179 (0.0011) [2024-06-15 23:15:25,023][1652491] Updated weights for policy 0, policy_version 964244 (0.0013) [2024-06-15 23:15:25,955][1648985] Fps is (10 sec: 78642.2, 60 sec: 72635.6, 300 sec: 63980.9). Total num frames: 1974861824. Throughput: 0: 18193.1. Samples: 493742592. 
Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:15:25,956][1648985] Avg episode reward: [(0, '181.510')] [2024-06-15 23:15:27,737][1652491] Updated weights for policy 0, policy_version 964305 (0.0011) [2024-06-15 23:15:28,359][1652491] Updated weights for policy 0, policy_version 964352 (0.0012) [2024-06-15 23:15:30,750][1652491] Updated weights for policy 0, policy_version 964416 (0.0010) [2024-06-15 23:15:30,955][1648985] Fps is (10 sec: 68813.6, 60 sec: 72089.9, 300 sec: 63980.9). Total num frames: 1975123968. Throughput: 0: 18397.9. Samples: 493857792. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:15:30,956][1648985] Avg episode reward: [(0, '142.440')] [2024-06-15 23:15:31,914][1652491] Updated weights for policy 0, policy_version 964480 (0.0011) [2024-06-15 23:15:32,854][1652491] Updated weights for policy 0, policy_version 964536 (0.0017) [2024-06-15 23:15:35,231][1652491] Updated weights for policy 0, policy_version 964577 (0.0021) [2024-06-15 23:15:35,955][1648985] Fps is (10 sec: 65536.0, 60 sec: 72089.4, 300 sec: 64314.1). Total num frames: 1975517184. Throughput: 0: 18158.9. Samples: 493960704. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:15:35,956][1648985] Avg episode reward: [(0, '136.520')] [2024-06-15 23:15:36,960][1651469] Signal inference workers to stop experience collection... (50250 times) [2024-06-15 23:15:37,018][1652491] InferenceWorker_p0-w0: stopping experience collection (50250 times) [2024-06-15 23:15:37,020][1652491] Updated weights for policy 0, policy_version 964630 (0.0011) [2024-06-15 23:15:37,145][1651469] Signal inference workers to resume experience collection... 
(50250 times) [2024-06-15 23:15:37,146][1652491] InferenceWorker_p0-w0: resuming experience collection (50250 times) [2024-06-15 23:15:38,132][1652491] Updated weights for policy 0, policy_version 964690 (0.0011) [2024-06-15 23:15:39,138][1652491] Updated weights for policy 0, policy_version 964742 (0.0023) [2024-06-15 23:15:40,051][1652491] Updated weights for policy 0, policy_version 964798 (0.0012) [2024-06-15 23:15:40,955][1648985] Fps is (10 sec: 78642.2, 60 sec: 73181.8, 300 sec: 64425.2). Total num frames: 1975910400. Throughput: 0: 18033.7. Samples: 494011904. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:15:40,956][1648985] Avg episode reward: [(0, '156.760')] [2024-06-15 23:15:42,710][1652491] Updated weights for policy 0, policy_version 964861 (0.0012) [2024-06-15 23:15:44,732][1652491] Updated weights for policy 0, policy_version 964916 (0.0012) [2024-06-15 23:15:45,593][1652491] Updated weights for policy 0, policy_version 964961 (0.0011) [2024-06-15 23:15:45,955][1648985] Fps is (10 sec: 75367.0, 60 sec: 73728.1, 300 sec: 64869.5). Total num frames: 1976270848. Throughput: 0: 18033.8. Samples: 494125568. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:15:45,956][1648985] Avg episode reward: [(0, '158.840')] [2024-06-15 23:15:46,457][1652491] Updated weights for policy 0, policy_version 965009 (0.0012) [2024-06-15 23:15:47,105][1652491] Updated weights for policy 0, policy_version 965051 (0.0012) [2024-06-15 23:15:49,661][1652491] Updated weights for policy 0, policy_version 965104 (0.0011) [2024-06-15 23:15:50,955][1648985] Fps is (10 sec: 65537.0, 60 sec: 72089.7, 300 sec: 64869.5). Total num frames: 1976565760. Throughput: 0: 18238.5. Samples: 494242816. 
Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:15:50,955][1648985] Avg episode reward: [(0, '153.570')] [2024-06-15 23:15:51,195][1652491] Updated weights for policy 0, policy_version 965141 (0.0011) [2024-06-15 23:15:52,187][1652491] Updated weights for policy 0, policy_version 965201 (0.0012) [2024-06-15 23:15:53,266][1652491] Updated weights for policy 0, policy_version 965264 (0.0013) [2024-06-15 23:15:54,026][1652491] Updated weights for policy 0, policy_version 965309 (0.0010) [2024-06-15 23:15:55,955][1648985] Fps is (10 sec: 68812.0, 60 sec: 70997.2, 300 sec: 65313.8). Total num frames: 1976958976. Throughput: 0: 17863.0. Samples: 494282752. Policy #0 lag: (min: 15.0, avg: 79.3, max: 271.0) [2024-06-15 23:15:55,956][1648985] Avg episode reward: [(0, '139.200')] [2024-06-15 23:15:56,753][1652491] Updated weights for policy 0, policy_version 965360 (0.0013) [2024-06-15 23:15:58,495][1652491] Updated weights for policy 0, policy_version 965410 (0.0011) [2024-06-15 23:15:59,143][1652491] Updated weights for policy 0, policy_version 965443 (0.0013) [2024-06-15 23:16:00,156][1652491] Updated weights for policy 0, policy_version 965505 (0.0026) [2024-06-15 23:16:00,605][1651469] Signal inference workers to stop experience collection... (50300 times) [2024-06-15 23:16:00,629][1652491] InferenceWorker_p0-w0: stopping experience collection (50300 times) [2024-06-15 23:16:00,752][1651469] Signal inference workers to resume experience collection... (50300 times) [2024-06-15 23:16:00,758][1652491] InferenceWorker_p0-w0: resuming experience collection (50300 times) [2024-06-15 23:16:00,858][1652491] Updated weights for policy 0, policy_version 965558 (0.0011) [2024-06-15 23:16:00,955][1648985] Fps is (10 sec: 91750.6, 60 sec: 75366.4, 300 sec: 65869.2). Total num frames: 1977483264. Throughput: 0: 17988.3. Samples: 494399488. 
Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:16:00,956][1648985] Avg episode reward: [(0, '155.240')] [2024-06-15 23:16:03,649][1652491] Updated weights for policy 0, policy_version 965604 (0.0010) [2024-06-15 23:16:04,535][1652491] Updated weights for policy 0, policy_version 965635 (0.0012) [2024-06-15 23:16:05,500][1652491] Updated weights for policy 0, policy_version 965691 (0.0013) [2024-06-15 23:16:05,955][1648985] Fps is (10 sec: 78644.3, 60 sec: 72089.5, 300 sec: 65980.3). Total num frames: 1977745408. Throughput: 0: 18352.4. Samples: 494515200. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:16:05,955][1648985] Avg episode reward: [(0, '163.290')] [2024-06-15 23:16:06,792][1652491] Updated weights for policy 0, policy_version 965737 (0.0011) [2024-06-15 23:16:07,546][1652491] Updated weights for policy 0, policy_version 965792 (0.0013) [2024-06-15 23:16:10,440][1652491] Updated weights for policy 0, policy_version 965856 (0.0021) [2024-06-15 23:16:10,955][1648985] Fps is (10 sec: 62257.3, 60 sec: 71543.1, 300 sec: 66091.3). Total num frames: 1978105856. Throughput: 0: 18420.6. Samples: 494571520. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:16:10,956][1648985] Avg episode reward: [(0, '167.700')] [2024-06-15 23:16:11,945][1652491] Updated weights for policy 0, policy_version 965920 (0.0011) [2024-06-15 23:16:13,191][1652491] Updated weights for policy 0, policy_version 965954 (0.0010) [2024-06-15 23:16:14,006][1652491] Updated weights for policy 0, policy_version 966003 (0.0016) [2024-06-15 23:16:14,955][1652491] Updated weights for policy 0, policy_version 966072 (0.0013) [2024-06-15 23:16:15,955][1648985] Fps is (10 sec: 78642.9, 60 sec: 74274.1, 300 sec: 66646.8). Total num frames: 1978531840. Throughput: 0: 18181.7. Samples: 494675968. 
Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:16:15,956][1648985] Avg episode reward: [(0, '157.360')] [2024-06-15 23:16:18,007][1652491] Updated weights for policy 0, policy_version 966128 (0.0010) [2024-06-15 23:16:19,370][1652491] Updated weights for policy 0, policy_version 966176 (0.0127) [2024-06-15 23:16:20,387][1652491] Updated weights for policy 0, policy_version 966210 (0.0010) [2024-06-15 23:16:20,955][1648985] Fps is (10 sec: 75368.3, 60 sec: 73728.1, 300 sec: 66646.8). Total num frames: 1978859520. Throughput: 0: 18409.3. Samples: 494789120. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:16:20,956][1648985] Avg episode reward: [(0, '170.000')] [2024-06-15 23:16:21,277][1652491] Updated weights for policy 0, policy_version 966265 (0.0012) [2024-06-15 23:16:21,865][1652491] Updated weights for policy 0, policy_version 966304 (0.0011) [2024-06-15 23:16:24,630][1652491] Updated weights for policy 0, policy_version 966354 (0.0012) [2024-06-15 23:16:25,389][1652491] Updated weights for policy 0, policy_version 966400 (0.0012) [2024-06-15 23:16:25,955][1648985] Fps is (10 sec: 65536.2, 60 sec: 72089.7, 300 sec: 66980.0). Total num frames: 1979187200. Throughput: 0: 18636.9. Samples: 494850560. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:16:25,955][1648985] Avg episode reward: [(0, '175.480')] [2024-06-15 23:16:26,963][1652491] Updated weights for policy 0, policy_version 966459 (0.0010) [2024-06-15 23:16:28,280][1652491] Updated weights for policy 0, policy_version 966523 (0.0011) [2024-06-15 23:16:29,526][1651469] Signal inference workers to stop experience collection... (50350 times) [2024-06-15 23:16:29,551][1652491] Updated weights for policy 0, policy_version 966562 (0.0010) [2024-06-15 23:16:29,572][1652491] InferenceWorker_p0-w0: stopping experience collection (50350 times) [2024-06-15 23:16:29,678][1651469] Signal inference workers to resume experience collection... 
(50350 times) [2024-06-15 23:16:29,678][1652491] InferenceWorker_p0-w0: resuming experience collection (50350 times) [2024-06-15 23:16:30,955][1648985] Fps is (10 sec: 72089.2, 60 sec: 74274.1, 300 sec: 67202.2). Total num frames: 1979580416. Throughput: 0: 18363.7. Samples: 494951936. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:16:30,956][1648985] Avg episode reward: [(0, '179.640')] [2024-06-15 23:16:31,303][1652491] Updated weights for policy 0, policy_version 966610 (0.0010) [2024-06-15 23:16:32,828][1652491] Updated weights for policy 0, policy_version 966657 (0.0011) [2024-06-15 23:16:33,692][1652491] Updated weights for policy 0, policy_version 966708 (0.0019) [2024-06-15 23:16:34,791][1652491] Updated weights for policy 0, policy_version 966736 (0.0009) [2024-06-15 23:16:35,955][1648985] Fps is (10 sec: 78643.1, 60 sec: 74274.2, 300 sec: 67757.6). Total num frames: 1979973632. Throughput: 0: 18227.2. Samples: 495063040. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:16:35,956][1648985] Avg episode reward: [(0, '162.580')] [2024-06-15 23:16:36,233][1652491] Updated weights for policy 0, policy_version 966800 (0.0012) [2024-06-15 23:16:38,628][1652491] Updated weights for policy 0, policy_version 966854 (0.0011) [2024-06-15 23:16:39,378][1652491] Updated weights for policy 0, policy_version 966909 (0.0011) [2024-06-15 23:16:40,955][1648985] Fps is (10 sec: 72089.8, 60 sec: 73182.0, 300 sec: 67757.5). Total num frames: 1980301312. Throughput: 0: 18545.8. Samples: 495117312. 
Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:16:40,956][1648985] Avg episode reward: [(0, '142.690')] [2024-06-15 23:16:41,346][1652491] Updated weights for policy 0, policy_version 966964 (0.0011) [2024-06-15 23:16:42,459][1652491] Updated weights for policy 0, policy_version 967008 (0.0011) [2024-06-15 23:16:42,963][1652491] Updated weights for policy 0, policy_version 967038 (0.0009) [2024-06-15 23:16:44,149][1652491] Updated weights for policy 0, policy_version 967088 (0.0010) [2024-06-15 23:16:45,859][1652491] Updated weights for policy 0, policy_version 967137 (0.0011) [2024-06-15 23:16:45,955][1648985] Fps is (10 sec: 72090.0, 60 sec: 73728.1, 300 sec: 68201.9). Total num frames: 1980694528. Throughput: 0: 18454.8. Samples: 495229952. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:16:45,955][1648985] Avg episode reward: [(0, '151.950')] [2024-06-15 23:16:47,671][1652491] Updated weights for policy 0, policy_version 967186 (0.0010) [2024-06-15 23:16:48,294][1652491] Updated weights for policy 0, policy_version 967226 (0.0012) [2024-06-15 23:16:49,462][1652491] Updated weights for policy 0, policy_version 967271 (0.0012) [2024-06-15 23:16:50,492][1652491] Updated weights for policy 0, policy_version 967312 (0.0013) [2024-06-15 23:16:50,955][1648985] Fps is (10 sec: 78643.4, 60 sec: 75366.4, 300 sec: 68646.2). Total num frames: 1981087744. Throughput: 0: 18329.6. Samples: 495340032. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:16:50,956][1648985] Avg episode reward: [(0, '177.910')] [2024-06-15 23:16:52,611][1652491] Updated weights for policy 0, policy_version 967364 (0.0013) [2024-06-15 23:16:53,433][1652491] Updated weights for policy 0, policy_version 967419 (0.0010) [2024-06-15 23:16:55,171][1652491] Updated weights for policy 0, policy_version 967459 (0.0015) [2024-06-15 23:16:55,955][1648985] Fps is (10 sec: 72088.0, 60 sec: 74274.1, 300 sec: 68535.1). Total num frames: 1981415424. 
Throughput: 0: 18284.1. Samples: 495394304. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:16:55,956][1648985] Avg episode reward: [(0, '182.920')] [2024-06-15 23:16:55,959][1652491] Updated weights for policy 0, policy_version 967490 (0.0012) [2024-06-15 23:16:56,449][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000967520_1981480960.pth... [2024-06-15 23:16:56,551][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000959168_1964376064.pth [2024-06-15 23:16:57,956][1652491] Updated weights for policy 0, policy_version 967568 (0.0010) [2024-06-15 23:16:59,887][1652491] Updated weights for policy 0, policy_version 967618 (0.0011) [2024-06-15 23:17:00,111][1651469] Signal inference workers to stop experience collection... (50400 times) [2024-06-15 23:17:00,132][1652491] InferenceWorker_p0-w0: stopping experience collection (50400 times) [2024-06-15 23:17:00,284][1651469] Signal inference workers to resume experience collection... (50400 times) [2024-06-15 23:17:00,285][1652491] InferenceWorker_p0-w0: resuming experience collection (50400 times) [2024-06-15 23:17:00,828][1652491] Updated weights for policy 0, policy_version 967678 (0.0010) [2024-06-15 23:17:00,955][1648985] Fps is (10 sec: 72089.6, 60 sec: 72089.5, 300 sec: 69090.5). Total num frames: 1981808640. Throughput: 0: 18545.8. Samples: 495510528. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:17:00,956][1648985] Avg episode reward: [(0, '165.720')] [2024-06-15 23:17:02,114][1652491] Updated weights for policy 0, policy_version 967728 (0.0012) [2024-06-15 23:17:03,212][1652491] Updated weights for policy 0, policy_version 967776 (0.0014) [2024-06-15 23:17:04,586][1652491] Updated weights for policy 0, policy_version 967840 (0.0011) [2024-06-15 23:17:05,955][1648985] Fps is (10 sec: 78644.7, 60 sec: 74274.2, 300 sec: 69312.7). Total num frames: 1982201856. Throughput: 0: 18363.8. Samples: 495615488. 
Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:17:05,955][1648985] Avg episode reward: [(0, '151.320')] [2024-06-15 23:17:07,869][1652491] Updated weights for policy 0, policy_version 967905 (0.0014) [2024-06-15 23:17:08,825][1652491] Updated weights for policy 0, policy_version 967953 (0.0012) [2024-06-15 23:17:09,500][1652491] Updated weights for policy 0, policy_version 967998 (0.0015) [2024-06-15 23:17:10,586][1652491] Updated weights for policy 0, policy_version 968037 (0.0011) [2024-06-15 23:17:10,955][1648985] Fps is (10 sec: 75365.7, 60 sec: 74274.3, 300 sec: 69645.9). Total num frames: 1982562304. Throughput: 0: 18386.4. Samples: 495677952. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:17:10,956][1648985] Avg episode reward: [(0, '170.320')] [2024-06-15 23:17:11,686][1652491] Updated weights for policy 0, policy_version 968112 (0.0011) [2024-06-15 23:17:14,418][1652491] Updated weights for policy 0, policy_version 968144 (0.0012) [2024-06-15 23:17:15,191][1652491] Updated weights for policy 0, policy_version 968191 (0.0088) [2024-06-15 23:17:15,955][1648985] Fps is (10 sec: 68812.7, 60 sec: 72635.8, 300 sec: 69645.9). Total num frames: 1982889984. Throughput: 0: 18648.2. Samples: 495791104. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:17:15,955][1648985] Avg episode reward: [(0, '154.530')] [2024-06-15 23:17:16,458][1652491] Updated weights for policy 0, policy_version 968246 (0.0010) [2024-06-15 23:17:17,655][1652491] Updated weights for policy 0, policy_version 968288 (0.0011) [2024-06-15 23:17:18,304][1652491] Updated weights for policy 0, policy_version 968325 (0.0019) [2024-06-15 23:17:18,976][1652491] Updated weights for policy 0, policy_version 968376 (0.0016) [2024-06-15 23:17:20,955][1648985] Fps is (10 sec: 68812.1, 60 sec: 73181.6, 300 sec: 70090.1). Total num frames: 1983250432. Throughput: 0: 18727.7. Samples: 495905792. 
Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:17:20,956][1648985] Avg episode reward: [(0, '139.360')] [2024-06-15 23:17:22,157][1652491] Updated weights for policy 0, policy_version 968421 (0.0011) [2024-06-15 23:17:23,271][1652491] Updated weights for policy 0, policy_version 968480 (0.0092) [2024-06-15 23:17:23,867][1652491] Updated weights for policy 0, policy_version 968512 (0.0011) [2024-06-15 23:17:25,086][1652491] Updated weights for policy 0, policy_version 968561 (0.0010) [2024-06-15 23:17:25,840][1651469] Signal inference workers to stop experience collection... (50450 times) [2024-06-15 23:17:25,882][1652491] InferenceWorker_p0-w0: stopping experience collection (50450 times) [2024-06-15 23:17:25,955][1648985] Fps is (10 sec: 81918.7, 60 sec: 75366.2, 300 sec: 70534.5). Total num frames: 1983709184. Throughput: 0: 18545.7. Samples: 495951872. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:17:25,956][1648985] Avg episode reward: [(0, '141.160')] [2024-06-15 23:17:25,962][1651469] Signal inference workers to resume experience collection... (50450 times) [2024-06-15 23:17:25,962][1652491] InferenceWorker_p0-w0: resuming experience collection (50450 times) [2024-06-15 23:17:25,964][1652491] Updated weights for policy 0, policy_version 968624 (0.0030) [2024-06-15 23:17:29,172][1652491] Updated weights for policy 0, policy_version 968658 (0.0011) [2024-06-15 23:17:30,379][1652491] Updated weights for policy 0, policy_version 968724 (0.0010) [2024-06-15 23:17:30,955][1648985] Fps is (10 sec: 75367.8, 60 sec: 73728.1, 300 sec: 70645.6). Total num frames: 1984004096. Throughput: 0: 18648.1. Samples: 496069120. 
Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:17:30,956][1648985] Avg episode reward: [(0, '159.880')] [2024-06-15 23:17:31,091][1652491] Updated weights for policy 0, policy_version 968767 (0.0009) [2024-06-15 23:17:32,368][1652491] Updated weights for policy 0, policy_version 968832 (0.0091) [2024-06-15 23:17:33,226][1652491] Updated weights for policy 0, policy_version 968888 (0.0013) [2024-06-15 23:17:35,955][1648985] Fps is (10 sec: 58983.2, 60 sec: 72089.6, 300 sec: 70645.6). Total num frames: 1984299008. Throughput: 0: 18602.7. Samples: 496177152. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:17:35,956][1648985] Avg episode reward: [(0, '162.790')] [2024-06-15 23:17:37,258][1652491] Updated weights for policy 0, policy_version 968944 (0.0011) [2024-06-15 23:17:38,418][1652491] Updated weights for policy 0, policy_version 968998 (0.0011) [2024-06-15 23:17:39,740][1652491] Updated weights for policy 0, policy_version 969072 (0.0011) [2024-06-15 23:17:40,662][1652491] Updated weights for policy 0, policy_version 969120 (0.0051) [2024-06-15 23:17:40,955][1648985] Fps is (10 sec: 78643.3, 60 sec: 74820.3, 300 sec: 71423.1). Total num frames: 1984790528. Throughput: 0: 18386.6. Samples: 496221696. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:17:40,956][1648985] Avg episode reward: [(0, '155.660')] [2024-06-15 23:17:44,973][1652491] Updated weights for policy 0, policy_version 969200 (0.0013) [2024-06-15 23:17:45,900][1652491] Updated weights for policy 0, policy_version 969250 (0.0009) [2024-06-15 23:17:45,955][1648985] Fps is (10 sec: 72090.6, 60 sec: 72089.7, 300 sec: 71312.1). Total num frames: 1985019904. Throughput: 0: 18124.8. Samples: 496326144. 
Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:17:45,955][1648985] Avg episode reward: [(0, '149.780')] [2024-06-15 23:17:46,324][1652491] Updated weights for policy 0, policy_version 969280 (0.0010) [2024-06-15 23:17:47,428][1652491] Updated weights for policy 0, policy_version 969329 (0.0012) [2024-06-15 23:17:48,716][1652491] Updated weights for policy 0, policy_version 969398 (0.0075) [2024-06-15 23:17:50,955][1648985] Fps is (10 sec: 55705.6, 60 sec: 70997.3, 300 sec: 71312.1). Total num frames: 1985347584. Throughput: 0: 18193.1. Samples: 496434176. Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:17:50,956][1648985] Avg episode reward: [(0, '153.000')] [2024-06-15 23:17:52,588][1652491] Updated weights for policy 0, policy_version 969457 (0.0011) [2024-06-15 23:17:53,181][1651469] Signal inference workers to stop experience collection... (50500 times) [2024-06-15 23:17:53,216][1652491] InferenceWorker_p0-w0: stopping experience collection (50500 times) [2024-06-15 23:17:53,380][1651469] Signal inference workers to resume experience collection... (50500 times) [2024-06-15 23:17:53,381][1652491] InferenceWorker_p0-w0: resuming experience collection (50500 times) [2024-06-15 23:17:53,680][1652491] Updated weights for policy 0, policy_version 969520 (0.0013) [2024-06-15 23:17:54,664][1652491] Updated weights for policy 0, policy_version 969569 (0.0013) [2024-06-15 23:17:55,373][1652491] Updated weights for policy 0, policy_version 969616 (0.0010) [2024-06-15 23:17:55,955][1648985] Fps is (10 sec: 81918.8, 60 sec: 73728.2, 300 sec: 72311.7). Total num frames: 1985839104. Throughput: 0: 17840.4. Samples: 496480768. 
Policy #0 lag: (min: 55.0, avg: 182.3, max: 327.0) [2024-06-15 23:17:55,956][1648985] Avg episode reward: [(0, '162.760')] [2024-06-15 23:17:59,000][1652491] Updated weights for policy 0, policy_version 969680 (0.0028) [2024-06-15 23:18:00,140][1652491] Updated weights for policy 0, policy_version 969732 (0.0012) [2024-06-15 23:18:00,955][1648985] Fps is (10 sec: 75365.7, 60 sec: 71543.3, 300 sec: 71867.4). Total num frames: 1986101248. Throughput: 0: 17772.0. Samples: 496590848. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:18:00,956][1648985] Avg episode reward: [(0, '178.190')] [2024-06-15 23:18:01,126][1652491] Updated weights for policy 0, policy_version 969786 (0.0034) [2024-06-15 23:18:02,117][1652491] Updated weights for policy 0, policy_version 969824 (0.0010) [2024-06-15 23:18:03,002][1652491] Updated weights for policy 0, policy_version 969872 (0.0074) [2024-06-15 23:18:03,606][1652491] Updated weights for policy 0, policy_version 969920 (0.0011) [2024-06-15 23:18:05,955][1648985] Fps is (10 sec: 55705.9, 60 sec: 69905.1, 300 sec: 71867.5). Total num frames: 1986396160. Throughput: 0: 17590.1. Samples: 496697344. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:18:05,955][1648985] Avg episode reward: [(0, '148.180')] [2024-06-15 23:18:06,849][1652491] Updated weights for policy 0, policy_version 969977 (0.0013) [2024-06-15 23:18:07,897][1652491] Updated weights for policy 0, policy_version 970018 (0.0011) [2024-06-15 23:18:08,885][1652491] Updated weights for policy 0, policy_version 970064 (0.0010) [2024-06-15 23:18:09,667][1652491] Updated weights for policy 0, policy_version 970111 (0.0011) [2024-06-15 23:18:10,892][1652491] Updated weights for policy 0, policy_version 970170 (0.0011) [2024-06-15 23:18:10,955][1648985] Fps is (10 sec: 81919.0, 60 sec: 72635.6, 300 sec: 72422.8). Total num frames: 1986920448. Throughput: 0: 17715.2. Samples: 496749056. 
Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:18:10,956][1648985] Avg episode reward: [(0, '144.570')] [2024-06-15 23:18:13,853][1652491] Updated weights for policy 0, policy_version 970233 (0.0014) [2024-06-15 23:18:15,575][1652491] Updated weights for policy 0, policy_version 970279 (0.0011) [2024-06-15 23:18:15,955][1648985] Fps is (10 sec: 78642.0, 60 sec: 71543.3, 300 sec: 71978.5). Total num frames: 1987182592. Throughput: 0: 17555.9. Samples: 496859136. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:18:15,956][1648985] Avg episode reward: [(0, '146.970')] [2024-06-15 23:18:16,592][1652491] Updated weights for policy 0, policy_version 970338 (0.0012) [2024-06-15 23:18:17,022][1652491] Updated weights for policy 0, policy_version 970367 (0.0012) [2024-06-15 23:18:18,459][1652491] Updated weights for policy 0, policy_version 970423 (0.0012) [2024-06-15 23:18:20,771][1651469] Signal inference workers to stop experience collection... (50550 times) [2024-06-15 23:18:20,827][1652491] InferenceWorker_p0-w0: stopping experience collection (50550 times) [2024-06-15 23:18:20,829][1652491] Updated weights for policy 0, policy_version 970452 (0.0013) [2024-06-15 23:18:20,955][1648985] Fps is (10 sec: 55707.0, 60 sec: 70451.5, 300 sec: 71867.5). Total num frames: 1987477504. Throughput: 0: 17408.0. Samples: 496960512. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:18:20,955][1648985] Avg episode reward: [(0, '162.010')] [2024-06-15 23:18:20,973][1651469] Signal inference workers to resume experience collection... 
(50550 times) [2024-06-15 23:18:20,973][1652491] InferenceWorker_p0-w0: resuming experience collection (50550 times) [2024-06-15 23:18:21,568][1652491] Updated weights for policy 0, policy_version 970496 (0.0011) [2024-06-15 23:18:23,834][1652491] Updated weights for policy 0, policy_version 970563 (0.0013) [2024-06-15 23:18:24,747][1652491] Updated weights for policy 0, policy_version 970618 (0.0021) [2024-06-15 23:18:25,955][1648985] Fps is (10 sec: 72090.6, 60 sec: 69905.2, 300 sec: 72200.7). Total num frames: 1987903488. Throughput: 0: 17464.9. Samples: 497007616. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:18:25,956][1648985] Avg episode reward: [(0, '172.550')] [2024-06-15 23:18:26,062][1652491] Updated weights for policy 0, policy_version 970660 (0.0012) [2024-06-15 23:18:28,265][1652491] Updated weights for policy 0, policy_version 970692 (0.0011) [2024-06-15 23:18:29,096][1652491] Updated weights for policy 0, policy_version 970743 (0.0012) [2024-06-15 23:18:30,864][1652491] Updated weights for policy 0, policy_version 970787 (0.0011) [2024-06-15 23:18:30,955][1648985] Fps is (10 sec: 68813.1, 60 sec: 69359.0, 300 sec: 71648.0). Total num frames: 1988165632. Throughput: 0: 17544.5. Samples: 497115648. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:18:30,955][1648985] Avg episode reward: [(0, '176.380')] [2024-06-15 23:18:31,844][1652491] Updated weights for policy 0, policy_version 970848 (0.0010) [2024-06-15 23:18:32,917][1652491] Updated weights for policy 0, policy_version 970896 (0.0012) [2024-06-15 23:18:33,708][1652491] Updated weights for policy 0, policy_version 970941 (0.0011) [2024-06-15 23:18:35,831][1652491] Updated weights for policy 0, policy_version 970984 (0.0010) [2024-06-15 23:18:35,955][1648985] Fps is (10 sec: 68812.8, 60 sec: 71543.5, 300 sec: 71978.6). Total num frames: 1988591616. Throughput: 0: 17510.4. Samples: 497222144. 
Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:18:35,955][1648985] Avg episode reward: [(0, '157.670')] [2024-06-15 23:18:37,388][1652491] Updated weights for policy 0, policy_version 971010 (0.0011) [2024-06-15 23:18:38,198][1652491] Updated weights for policy 0, policy_version 971072 (0.0010) [2024-06-15 23:18:39,054][1652491] Updated weights for policy 0, policy_version 971124 (0.0010) [2024-06-15 23:18:39,800][1652491] Updated weights for policy 0, policy_version 971152 (0.0010) [2024-06-15 23:18:40,638][1652491] Updated weights for policy 0, policy_version 971200 (0.0009) [2024-06-15 23:18:40,955][1648985] Fps is (10 sec: 85195.3, 60 sec: 70451.1, 300 sec: 72422.8). Total num frames: 1989017600. Throughput: 0: 17772.1. Samples: 497280512. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:18:40,956][1648985] Avg episode reward: [(0, '163.910')] [2024-06-15 23:18:43,143][1652491] Updated weights for policy 0, policy_version 971255 (0.0066) [2024-06-15 23:18:44,651][1652491] Updated weights for policy 0, policy_version 971300 (0.0012) [2024-06-15 23:18:45,592][1652491] Updated weights for policy 0, policy_version 971361 (0.0029) [2024-06-15 23:18:45,955][1648985] Fps is (10 sec: 81919.6, 60 sec: 73181.7, 300 sec: 72311.7). Total num frames: 1989410816. Throughput: 0: 17874.5. Samples: 497395200. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:18:45,956][1648985] Avg episode reward: [(0, '158.380')] [2024-06-15 23:18:46,452][1652491] Updated weights for policy 0, policy_version 971414 (0.0011) [2024-06-15 23:18:46,655][1651469] Signal inference workers to stop experience collection... (50600 times) [2024-06-15 23:18:46,697][1652491] InferenceWorker_p0-w0: stopping experience collection (50600 times) [2024-06-15 23:18:46,862][1651469] Signal inference workers to resume experience collection... 
(50600 times) [2024-06-15 23:18:46,863][1652491] InferenceWorker_p0-w0: resuming experience collection (50600 times) [2024-06-15 23:18:47,136][1652491] Updated weights for policy 0, policy_version 971454 (0.0011) [2024-06-15 23:18:50,555][1652491] Updated weights for policy 0, policy_version 971510 (0.0077) [2024-06-15 23:18:50,955][1648985] Fps is (10 sec: 65536.9, 60 sec: 72089.7, 300 sec: 72422.8). Total num frames: 1989672960. Throughput: 0: 18033.8. Samples: 497508864. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:18:50,955][1648985] Avg episode reward: [(0, '161.150')] [2024-06-15 23:18:51,480][1652491] Updated weights for policy 0, policy_version 971552 (0.0021) [2024-06-15 23:18:52,221][1652491] Updated weights for policy 0, policy_version 971585 (0.0011) [2024-06-15 23:18:53,227][1652491] Updated weights for policy 0, policy_version 971651 (0.0010) [2024-06-15 23:18:54,027][1652491] Updated weights for policy 0, policy_version 971708 (0.0009) [2024-06-15 23:18:55,955][1648985] Fps is (10 sec: 65535.8, 60 sec: 70451.1, 300 sec: 72422.8). Total num frames: 1990066176. Throughput: 0: 17897.3. Samples: 497554432. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:18:55,956][1648985] Avg episode reward: [(0, '169.130')] [2024-06-15 23:18:55,961][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000971712_1990066176.pth... 
[2024-06-15 23:18:56,044][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000963264_1972764672.pth [2024-06-15 23:18:57,610][1652491] Updated weights for policy 0, policy_version 971760 (0.0034) [2024-06-15 23:18:58,376][1652491] Updated weights for policy 0, policy_version 971793 (0.0010) [2024-06-15 23:18:59,494][1652491] Updated weights for policy 0, policy_version 971856 (0.0010) [2024-06-15 23:19:00,492][1652491] Updated weights for policy 0, policy_version 971907 (0.0011) [2024-06-15 23:19:00,955][1648985] Fps is (10 sec: 85195.6, 60 sec: 73728.0, 300 sec: 72533.9). Total num frames: 1990524928. Throughput: 0: 17976.9. Samples: 497668096. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:19:00,956][1648985] Avg episode reward: [(0, '157.410')] [2024-06-15 23:19:01,285][1652491] Updated weights for policy 0, policy_version 971967 (0.0074) [2024-06-15 23:19:05,512][1652491] Updated weights for policy 0, policy_version 972032 (0.0011) [2024-06-15 23:19:05,955][1648985] Fps is (10 sec: 68813.5, 60 sec: 72635.7, 300 sec: 72422.9). Total num frames: 1990754304. Throughput: 0: 18284.1. Samples: 497783296. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:19:05,955][1648985] Avg episode reward: [(0, '142.900')] [2024-06-15 23:19:06,614][1652491] Updated weights for policy 0, policy_version 972082 (0.0010) [2024-06-15 23:19:08,046][1652491] Updated weights for policy 0, policy_version 972160 (0.0010) [2024-06-15 23:19:08,952][1652491] Updated weights for policy 0, policy_version 972221 (0.0011) [2024-06-15 23:19:10,955][1648985] Fps is (10 sec: 58982.9, 60 sec: 69905.3, 300 sec: 72422.8). Total num frames: 1991114752. Throughput: 0: 18056.5. Samples: 497820160. 
Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:19:10,956][1648985] Avg episode reward: [(0, '144.570')] [2024-06-15 23:19:13,281][1652491] Updated weights for policy 0, policy_version 972288 (0.0012) [2024-06-15 23:19:13,670][1651469] Signal inference workers to stop experience collection... (50650 times) [2024-06-15 23:19:13,723][1652491] InferenceWorker_p0-w0: stopping experience collection (50650 times) [2024-06-15 23:19:13,920][1651469] Signal inference workers to resume experience collection... (50650 times) [2024-06-15 23:19:13,921][1652491] InferenceWorker_p0-w0: resuming experience collection (50650 times) [2024-06-15 23:19:14,185][1652491] Updated weights for policy 0, policy_version 972336 (0.0010) [2024-06-15 23:19:15,215][1652491] Updated weights for policy 0, policy_version 972387 (0.0085) [2024-06-15 23:19:15,955][1648985] Fps is (10 sec: 78642.9, 60 sec: 72635.9, 300 sec: 72533.9). Total num frames: 1991540736. Throughput: 0: 18090.6. Samples: 497929728. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:19:15,956][1648985] Avg episode reward: [(0, '182.960')] [2024-06-15 23:19:16,235][1652491] Updated weights for policy 0, policy_version 972449 (0.0010) [2024-06-15 23:19:19,740][1652491] Updated weights for policy 0, policy_version 972484 (0.0009) [2024-06-15 23:19:20,739][1652491] Updated weights for policy 0, policy_version 972545 (0.0081) [2024-06-15 23:19:20,955][1648985] Fps is (10 sec: 68813.1, 60 sec: 72089.6, 300 sec: 72200.7). Total num frames: 1991802880. Throughput: 0: 18329.6. Samples: 498046976. 
Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:19:20,955][1648985] Avg episode reward: [(0, '181.610')] [2024-06-15 23:19:21,632][1652491] Updated weights for policy 0, policy_version 972597 (0.0012) [2024-06-15 23:19:22,579][1652491] Updated weights for policy 0, policy_version 972661 (0.0011) [2024-06-15 23:19:23,811][1652491] Updated weights for policy 0, policy_version 972728 (0.0010) [2024-06-15 23:19:25,955][1648985] Fps is (10 sec: 62258.4, 60 sec: 70997.1, 300 sec: 72422.9). Total num frames: 1992163328. Throughput: 0: 17851.7. Samples: 498083840. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:19:25,956][1648985] Avg episode reward: [(0, '180.350')] [2024-06-15 23:19:27,209][1652491] Updated weights for policy 0, policy_version 972768 (0.0010) [2024-06-15 23:19:28,205][1652491] Updated weights for policy 0, policy_version 972819 (0.0011) [2024-06-15 23:19:29,148][1652491] Updated weights for policy 0, policy_version 972880 (0.0012) [2024-06-15 23:19:29,869][1652491] Updated weights for policy 0, policy_version 972928 (0.0011) [2024-06-15 23:19:30,840][1652491] Updated weights for policy 0, policy_version 972992 (0.0010) [2024-06-15 23:19:30,955][1648985] Fps is (10 sec: 88473.3, 60 sec: 75366.3, 300 sec: 72867.1). Total num frames: 1992687616. Throughput: 0: 17885.9. Samples: 498200064. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:19:30,955][1648985] Avg episode reward: [(0, '170.360')] [2024-06-15 23:19:34,614][1652491] Updated weights for policy 0, policy_version 973042 (0.0087) [2024-06-15 23:19:35,703][1652491] Updated weights for policy 0, policy_version 973104 (0.0010) [2024-06-15 23:19:35,955][1648985] Fps is (10 sec: 75367.2, 60 sec: 72089.5, 300 sec: 72533.9). Total num frames: 1992916992. Throughput: 0: 17908.6. Samples: 498314752. 
Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:19:35,955][1648985] Avg episode reward: [(0, '168.590')] [2024-06-15 23:19:36,667][1651469] Signal inference workers to stop experience collection... (50700 times) [2024-06-15 23:19:36,718][1652491] InferenceWorker_p0-w0: stopping experience collection (50700 times) [2024-06-15 23:19:36,727][1652491] Updated weights for policy 0, policy_version 973160 (0.0012) [2024-06-15 23:19:36,800][1651469] Signal inference workers to resume experience collection... (50700 times) [2024-06-15 23:19:36,801][1652491] InferenceWorker_p0-w0: resuming experience collection (50700 times) [2024-06-15 23:19:37,499][1652491] Updated weights for policy 0, policy_version 973216 (0.0011) [2024-06-15 23:19:40,955][1648985] Fps is (10 sec: 52429.0, 60 sec: 69905.2, 300 sec: 72422.9). Total num frames: 1993211904. Throughput: 0: 17999.7. Samples: 498364416. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:19:40,955][1648985] Avg episode reward: [(0, '176.380')] [2024-06-15 23:19:41,215][1652491] Updated weights for policy 0, policy_version 973268 (0.0011) [2024-06-15 23:19:42,291][1652491] Updated weights for policy 0, policy_version 973329 (0.0012) [2024-06-15 23:19:43,529][1652491] Updated weights for policy 0, policy_version 973393 (0.0010) [2024-06-15 23:19:44,754][1652491] Updated weights for policy 0, policy_version 973472 (0.0099) [2024-06-15 23:19:45,955][1648985] Fps is (10 sec: 81919.9, 60 sec: 72089.6, 300 sec: 72867.1). Total num frames: 1993736192. Throughput: 0: 17817.6. Samples: 498469888. 
Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:19:45,956][1648985] Avg episode reward: [(0, '170.770')] [2024-06-15 23:19:48,143][1652491] Updated weights for policy 0, policy_version 973507 (0.0012) [2024-06-15 23:19:48,959][1652491] Updated weights for policy 0, policy_version 973565 (0.0010) [2024-06-15 23:19:50,257][1652491] Updated weights for policy 0, policy_version 973624 (0.0010) [2024-06-15 23:19:50,955][1648985] Fps is (10 sec: 85196.6, 60 sec: 73181.8, 300 sec: 72422.8). Total num frames: 1994063872. Throughput: 0: 17783.5. Samples: 498583552. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:19:50,956][1648985] Avg episode reward: [(0, '139.690')] [2024-06-15 23:19:51,329][1652491] Updated weights for policy 0, policy_version 973685 (0.0061) [2024-06-15 23:19:51,893][1652491] Updated weights for policy 0, policy_version 973712 (0.0089) [2024-06-15 23:19:52,499][1652491] Updated weights for policy 0, policy_version 973760 (0.0011) [2024-06-15 23:19:55,938][1652491] Updated weights for policy 0, policy_version 973818 (0.0011) [2024-06-15 23:19:55,955][1648985] Fps is (10 sec: 62258.2, 60 sec: 71543.3, 300 sec: 72533.8). Total num frames: 1994358784. Throughput: 0: 18181.6. Samples: 498638336. Policy #0 lag: (min: 6.0, avg: 76.7, max: 262.0) [2024-06-15 23:19:55,956][1648985] Avg episode reward: [(0, '161.270')] [2024-06-15 23:19:57,661][1652491] Updated weights for policy 0, policy_version 973888 (0.0095) [2024-06-15 23:19:59,257][1652491] Updated weights for policy 0, policy_version 973958 (0.0135) [2024-06-15 23:20:00,096][1652491] Updated weights for policy 0, policy_version 974013 (0.0011) [2024-06-15 23:20:00,955][1648985] Fps is (10 sec: 72089.6, 60 sec: 70997.5, 300 sec: 72422.8). Total num frames: 1994784768. Throughput: 0: 17931.4. Samples: 498736640. 
Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:20:00,956][1648985] Avg episode reward: [(0, '169.330')] [2024-06-15 23:20:03,399][1652491] Updated weights for policy 0, policy_version 974064 (0.0012) [2024-06-15 23:20:04,153][1651469] Signal inference workers to stop experience collection... (50750 times) [2024-06-15 23:20:04,206][1652491] InferenceWorker_p0-w0: stopping experience collection (50750 times) [2024-06-15 23:20:04,207][1652491] Updated weights for policy 0, policy_version 974099 (0.0008) [2024-06-15 23:20:04,375][1651469] Signal inference workers to resume experience collection... (50750 times) [2024-06-15 23:20:04,376][1652491] InferenceWorker_p0-w0: resuming experience collection (50750 times) [2024-06-15 23:20:05,955][1648985] Fps is (10 sec: 72091.3, 60 sec: 72089.6, 300 sec: 72089.6). Total num frames: 1995079680. Throughput: 0: 17908.6. Samples: 498852864. Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:20:05,955][1648985] Avg episode reward: [(0, '132.430')] [2024-06-15 23:20:06,053][1652491] Updated weights for policy 0, policy_version 974164 (0.0012) [2024-06-15 23:20:07,096][1652491] Updated weights for policy 0, policy_version 974224 (0.0012) [2024-06-15 23:20:10,212][1652491] Updated weights for policy 0, policy_version 974274 (0.0013) [2024-06-15 23:20:10,955][1648985] Fps is (10 sec: 58981.0, 60 sec: 70997.1, 300 sec: 72200.6). Total num frames: 1995374592. Throughput: 0: 18193.0. Samples: 498902528. Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:20:10,956][1648985] Avg episode reward: [(0, '146.270')] [2024-06-15 23:20:11,727][1652491] Updated weights for policy 0, policy_version 974358 (0.0017) [2024-06-15 23:20:12,376][1652491] Updated weights for policy 0, policy_version 974400 (0.0009) [2024-06-15 23:20:14,468][1652491] Updated weights for policy 0, policy_version 974467 (0.0011) [2024-06-15 23:20:15,955][1648985] Fps is (10 sec: 75366.8, 60 sec: 71543.6, 300 sec: 72534.0). 
Total num frames: 1995833344. Throughput: 0: 17760.7. Samples: 498999296. Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:20:15,955][1648985] Avg episode reward: [(0, '153.560')] [2024-06-15 23:20:17,877][1652491] Updated weights for policy 0, policy_version 974530 (0.0012) [2024-06-15 23:20:18,901][1652491] Updated weights for policy 0, policy_version 974592 (0.0013) [2024-06-15 23:20:20,027][1652491] Updated weights for policy 0, policy_version 974648 (0.0011) [2024-06-15 23:20:20,955][1648985] Fps is (10 sec: 75368.3, 60 sec: 72089.6, 300 sec: 72089.6). Total num frames: 1996128256. Throughput: 0: 17749.4. Samples: 499113472. Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:20:20,955][1648985] Avg episode reward: [(0, '184.980')] [2024-06-15 23:20:21,401][1652491] Updated weights for policy 0, policy_version 974704 (0.0177) [2024-06-15 23:20:22,356][1652491] Updated weights for policy 0, policy_version 974754 (0.0010) [2024-06-15 23:20:25,598][1652491] Updated weights for policy 0, policy_version 974803 (0.0014) [2024-06-15 23:20:25,955][1648985] Fps is (10 sec: 58982.0, 60 sec: 70997.5, 300 sec: 72200.7). Total num frames: 1996423168. Throughput: 0: 17760.7. Samples: 499163648. Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:20:25,955][1648985] Avg episode reward: [(0, '185.650')] [2024-06-15 23:20:26,619][1652491] Updated weights for policy 0, policy_version 974864 (0.0011) [2024-06-15 23:20:27,433][1652491] Updated weights for policy 0, policy_version 974907 (0.0009) [2024-06-15 23:20:28,664][1652491] Updated weights for policy 0, policy_version 974960 (0.0010) [2024-06-15 23:20:29,615][1652491] Updated weights for policy 0, policy_version 975008 (0.0010) [2024-06-15 23:20:29,714][1651469] Signal inference workers to stop experience collection... 
(50800 times) [2024-06-15 23:20:29,762][1652491] InferenceWorker_p0-w0: stopping experience collection (50800 times) [2024-06-15 23:20:29,913][1651469] Signal inference workers to resume experience collection... (50800 times) [2024-06-15 23:20:29,914][1652491] InferenceWorker_p0-w0: resuming experience collection (50800 times) [2024-06-15 23:20:30,955][1648985] Fps is (10 sec: 75366.2, 60 sec: 69905.1, 300 sec: 72422.9). Total num frames: 1996881920. Throughput: 0: 17646.9. Samples: 499264000. Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:20:30,956][1648985] Avg episode reward: [(0, '170.200')] [2024-06-15 23:20:32,803][1652491] Updated weights for policy 0, policy_version 975061 (0.0019) [2024-06-15 23:20:33,596][1652491] Updated weights for policy 0, policy_version 975105 (0.0059) [2024-06-15 23:20:34,409][1652491] Updated weights for policy 0, policy_version 975152 (0.0012) [2024-06-15 23:20:35,578][1652491] Updated weights for policy 0, policy_version 975186 (0.0011) [2024-06-15 23:20:35,955][1648985] Fps is (10 sec: 78643.3, 60 sec: 71543.5, 300 sec: 72200.7). Total num frames: 1997209600. Throughput: 0: 17612.8. Samples: 499376128. Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:20:35,956][1648985] Avg episode reward: [(0, '168.190')] [2024-06-15 23:20:36,465][1652491] Updated weights for policy 0, policy_version 975232 (0.0011) [2024-06-15 23:20:37,506][1652491] Updated weights for policy 0, policy_version 975292 (0.0164) [2024-06-15 23:20:40,861][1652491] Updated weights for policy 0, policy_version 975356 (0.0010) [2024-06-15 23:20:40,955][1648985] Fps is (10 sec: 65536.2, 60 sec: 72089.6, 300 sec: 72089.6). Total num frames: 1997537280. Throughput: 0: 17556.0. Samples: 499428352. 
Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:20:40,955][1648985] Avg episode reward: [(0, '163.700')] [2024-06-15 23:20:42,263][1652491] Updated weights for policy 0, policy_version 975417 (0.0010) [2024-06-15 23:20:43,586][1652491] Updated weights for policy 0, policy_version 975458 (0.0010) [2024-06-15 23:20:44,043][1652491] Updated weights for policy 0, policy_version 975488 (0.0011) [2024-06-15 23:20:45,450][1652491] Updated weights for policy 0, policy_version 975546 (0.0011) [2024-06-15 23:20:45,956][1648985] Fps is (10 sec: 72085.8, 60 sec: 69904.5, 300 sec: 72422.7). Total num frames: 1997930496. Throughput: 0: 17612.6. Samples: 499529216. Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:20:45,957][1648985] Avg episode reward: [(0, '171.860')] [2024-06-15 23:20:48,022][1652491] Updated weights for policy 0, policy_version 975600 (0.0011) [2024-06-15 23:20:49,242][1652491] Updated weights for policy 0, policy_version 975648 (0.0010) [2024-06-15 23:20:49,816][1652491] Updated weights for policy 0, policy_version 975678 (0.0010) [2024-06-15 23:20:50,689][1652491] Updated weights for policy 0, policy_version 975728 (0.0012) [2024-06-15 23:20:50,959][1648985] Fps is (10 sec: 78609.3, 60 sec: 70992.2, 300 sec: 72421.8). Total num frames: 1998323712. Throughput: 0: 17474.6. Samples: 499639296. Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:20:50,960][1648985] Avg episode reward: [(0, '172.880')] [2024-06-15 23:20:52,589][1652491] Updated weights for policy 0, policy_version 975792 (0.0010) [2024-06-15 23:20:54,519][1652491] Updated weights for policy 0, policy_version 975840 (0.0010) [2024-06-15 23:20:55,852][1652491] Updated weights for policy 0, policy_version 975875 (0.0013) [2024-06-15 23:20:55,955][1648985] Fps is (10 sec: 65538.0, 60 sec: 70451.2, 300 sec: 71534.1). Total num frames: 1998585856. Throughput: 0: 17783.5. Samples: 499702784. 
Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:20:55,956][1648985] Avg episode reward: [(0, '176.230')] [2024-06-15 23:20:56,219][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000975904_1998651392.pth... [2024-06-15 23:20:56,308][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000967520_1981480960.pth [2024-06-15 23:20:56,848][1652491] Updated weights for policy 0, policy_version 975940 (0.0011) [2024-06-15 23:20:57,717][1652491] Updated weights for policy 0, policy_version 975993 (0.0009) [2024-06-15 23:20:59,127][1651469] Signal inference workers to stop experience collection... (50850 times) [2024-06-15 23:20:59,173][1652491] InferenceWorker_p0-w0: stopping experience collection (50850 times) [2024-06-15 23:20:59,314][1651469] Signal inference workers to resume experience collection... (50850 times) [2024-06-15 23:20:59,315][1652491] InferenceWorker_p0-w0: resuming experience collection (50850 times) [2024-06-15 23:20:59,795][1652491] Updated weights for policy 0, policy_version 976035 (0.0010) [2024-06-15 23:21:00,955][1648985] Fps is (10 sec: 68840.3, 60 sec: 70450.8, 300 sec: 72089.5). Total num frames: 1999011840. Throughput: 0: 17999.5. Samples: 499809280. Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:21:00,956][1648985] Avg episode reward: [(0, '187.390')] [2024-06-15 23:21:01,175][1652491] Updated weights for policy 0, policy_version 976099 (0.0012) [2024-06-15 23:21:03,607][1652491] Updated weights for policy 0, policy_version 976160 (0.0012) [2024-06-15 23:21:04,696][1652491] Updated weights for policy 0, policy_version 976224 (0.0012) [2024-06-15 23:21:05,955][1648985] Fps is (10 sec: 78645.2, 60 sec: 71543.5, 300 sec: 72089.7). Total num frames: 1999372288. Throughput: 0: 17829.0. Samples: 499915776. 
Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:21:05,955][1648985] Avg episode reward: [(0, '191.080')] [2024-06-15 23:21:07,234][1652491] Updated weights for policy 0, policy_version 976272 (0.0013) [2024-06-15 23:21:08,215][1652491] Updated weights for policy 0, policy_version 976322 (0.0011) [2024-06-15 23:21:10,127][1652491] Updated weights for policy 0, policy_version 976386 (0.0011) [2024-06-15 23:21:10,955][1648985] Fps is (10 sec: 72091.6, 60 sec: 72636.0, 300 sec: 71867.4). Total num frames: 1999732736. Throughput: 0: 17840.3. Samples: 499966464. Policy #0 lag: (min: 47.0, avg: 144.8, max: 303.0) [2024-06-15 23:21:10,955][1648985] Avg episode reward: [(0, '185.780')] [2024-06-15 23:21:10,991][1652491] Updated weights for policy 0, policy_version 976444 (0.0013) [2024-06-15 23:21:12,098][1652491] Updated weights for policy 0, policy_version 976480 (0.0011) [2024-06-15 23:21:14,779][1652491] Updated weights for policy 0, policy_version 976528 (0.0010) [2024-06-15 23:21:15,919][1652493] Stopping RolloutWorker_w3... [2024-06-15 23:21:15,919][1652489] Stopping RolloutWorker_w1... [2024-06-15 23:21:15,920][1652489] Loop rollout_proc1_evt_loop terminating... [2024-06-15 23:21:15,920][1652493] Loop rollout_proc3_evt_loop terminating... [2024-06-15 23:21:15,920][1648985] Component RolloutWorker_w3 stopped! [2024-06-15 23:21:15,920][1648985] Component RolloutWorker_w1 stopped! [2024-06-15 23:21:15,920][1648985] Component RolloutWorker_w0 stopped! [2024-06-15 23:21:15,920][1652490] Stopping RolloutWorker_w0... [2024-06-15 23:21:15,922][1652490] Loop rollout_proc0_evt_loop terminating... [2024-06-15 23:21:15,936][1648985] Component RolloutWorker_w2 stopped! [2024-06-15 23:21:15,936][1652492] Stopping RolloutWorker_w2... [2024-06-15 23:21:15,936][1652492] Loop rollout_proc2_evt_loop terminating... [2024-06-15 23:21:15,954][1652491] Weights refcount: 2 0 [2024-06-15 23:21:15,956][1652491] Stopping InferenceWorker_p0-w0... 
[2024-06-15 23:21:15,956][1652491] Loop inference_proc0-0_evt_loop terminating...
[2024-06-15 23:21:15,956][1648985] Component InferenceWorker_p0-w0 stopped!
[2024-06-15 23:21:16,008][1648985] Component Batcher_0 stopped!
[2024-06-15 23:21:16,009][1651469] Stopping Batcher_0...
[2024-06-15 23:21:16,010][1651469] Loop batcher_evt_loop terminating...
[2024-06-15 23:21:16,156][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000976608_2000093184.pth...
[2024-06-15 23:21:16,196][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000971712_1990066176.pth
[2024-06-15 23:21:16,367][1651469] Saving train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000976624_2000125952.pth...
[2024-06-15 23:21:16,417][1651469] Removing train_dir/atari_2B_atari_pooyan_1111/checkpoint_p0/checkpoint_000975904_1998651392.pth
[2024-06-15 23:21:16,424][1651469] Stopping LearnerWorker_p0...
[2024-06-15 23:21:16,424][1648985] Component LearnerWorker_p0 stopped!
[2024-06-15 23:21:16,424][1651469] Loop learner_proc0_evt_loop terminating...
[2024-06-15 23:21:16,424][1648985] Waiting for process learner_proc0 to stop...
[2024-06-15 23:21:17,906][1648985] Waiting for process inference_proc0-0 to join...
[2024-06-15 23:21:17,907][1648985] Waiting for process rollout_proc0 to join...
[2024-06-15 23:21:17,907][1648985] Waiting for process rollout_proc1 to join...
[2024-06-15 23:21:17,907][1648985] Waiting for process rollout_proc2 to join...
[2024-06-15 23:21:17,908][1648985] Waiting for process rollout_proc3 to join...
[2024-06-15 23:21:17,908][1648985] Batcher 0 profile tree view:
batching: 2467.1871, releasing_batches: 5092.6731
[2024-06-15 23:21:17,908][1648985] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0032
wait_policy_total: 12341.8355
update_model: 564.4720
weight_update: 0.0011
one_step: 0.0122
handle_policy_step: 21096.1282
deserialize: 19.3942, stack: 3515.4946, obs_to_device_normalize: 12048.7196, forward: 4152.5395, prepare_outputs: 905.2516, send_messages: 168.1425
[2024-06-15 23:21:17,908][1648985] Learner 0 profile tree view:
misc: 0.5002, prepare_batch: 6125.1908
train: 16023.9667
epoch_init: 3.5512, minibatch_init: 195.8730, losses_postprocess: 2329.6010, kl_divergence: 1227.8465, update: 6079.3049, after_optimizer: 2987.3925
calculate_losses: 2976.0751
losses_init: 6.0769, forward_head: 1181.8024, bptt_initial: 19.6331, bptt: 26.8802, tail: 633.8957, advantages_returns: 180.2335, losses: 736.7507
[2024-06-15 23:21:17,909][1648985] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.5931, enqueue_policy_requests: 1725.0700, process_policy_outputs: 69.9033, env_step: 21442.9841, finalize_trajectories: 25.8208, complete_rollouts: 4.5457
post_env_step: 117.5932
process_env_step: 32.2963
[2024-06-15 23:21:17,909][1648985] RolloutWorker_w3 profile tree view:
wait_for_trajectories: 0.5285, enqueue_policy_requests: 1686.6267, process_policy_outputs: 75.1087, env_step: 21628.8459, finalize_trajectories: 24.5297, complete_rollouts: 5.1870
post_env_step: 119.5827
process_env_step: 32.6831
[2024-06-15 23:21:17,909][1648985] Loop Runner_EvtLoop terminating...
[2024-06-15 23:21:17,909][1648985] Runner profile tree view:
main_loop: 42608.6848
[2024-06-15 23:21:17,910][1648985] Collected {0: 2000125952}, FPS: 46941.7