[2024-11-11 14:13:20,505][00562] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-11-11 14:13:20,509][00562] Rollout worker 0 uses device cpu
[2024-11-11 14:13:20,510][00562] Rollout worker 1 uses device cpu
[2024-11-11 14:13:20,511][00562] Rollout worker 2 uses device cpu
[2024-11-11 14:13:20,512][00562] Rollout worker 3 uses device cpu
[2024-11-11 14:13:20,513][00562] Rollout worker 4 uses device cpu
[2024-11-11 14:13:20,514][00562] Rollout worker 5 uses device cpu
[2024-11-11 14:13:20,515][00562] Rollout worker 6 uses device cpu
[2024-11-11 14:13:20,516][00562] Rollout worker 7 uses device cpu
[2024-11-11 14:13:20,669][00562] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-11 14:13:20,670][00562] InferenceWorker_p0-w0: min num requests: 2
[2024-11-11 14:13:20,702][00562] Starting all processes...
[2024-11-11 14:13:20,703][00562] Starting process learner_proc0
[2024-11-11 14:13:20,748][00562] Starting all processes...
[2024-11-11 14:13:20,754][00562] Starting process inference_proc0-0
[2024-11-11 14:13:20,755][00562] Starting process rollout_proc0
[2024-11-11 14:13:20,757][00562] Starting process rollout_proc1
[2024-11-11 14:13:20,757][00562] Starting process rollout_proc2
[2024-11-11 14:13:20,757][00562] Starting process rollout_proc3
[2024-11-11 14:13:20,757][00562] Starting process rollout_proc4
[2024-11-11 14:13:20,757][00562] Starting process rollout_proc5
[2024-11-11 14:13:20,757][00562] Starting process rollout_proc6
[2024-11-11 14:13:20,757][00562] Starting process rollout_proc7
[2024-11-11 14:13:38,740][03143] Worker 7 uses CPU cores [1]
[2024-11-11 14:13:38,851][03135] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-11 14:13:38,858][03135] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-11 14:13:38,892][03141] Worker 5 uses CPU cores [1]
[2024-11-11 14:13:38,946][03137] Worker 1 uses CPU cores [1]
[2024-11-11 14:13:38,955][03135] Num visible devices: 1
[2024-11-11 14:13:39,090][03142] Worker 6 uses CPU cores [0]
[2024-11-11 14:13:39,192][03138] Worker 2 uses CPU cores [0]
[2024-11-11 14:13:39,308][03122] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-11 14:13:39,309][03122] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-11 14:13:39,334][03122] Num visible devices: 1
[2024-11-11 14:13:39,359][03122] Starting seed is not provided
[2024-11-11 14:13:39,360][03122] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-11 14:13:39,360][03122] Initializing actor-critic model on device cuda:0
[2024-11-11 14:13:39,361][03122] RunningMeanStd input shape: (3, 72, 128)
[2024-11-11 14:13:39,364][03122] RunningMeanStd input shape: (1,)
[2024-11-11 14:13:39,401][03139] Worker 3 uses CPU cores [1]
[2024-11-11 14:13:39,400][03122] ConvEncoder: input_channels=3
[2024-11-11 14:13:39,420][03136] Worker 0 uses CPU cores [0]
[2024-11-11 14:13:39,425][03140] Worker 4 uses CPU cores [0]
[2024-11-11 14:13:40,071][03122] Conv encoder output size: 512
[2024-11-11 14:13:40,071][03122] Policy head output size: 512
[2024-11-11 14:13:40,151][03122] Created Actor Critic model with architecture:
[2024-11-11 14:13:40,152][03122] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-11 14:13:40,593][03122] Using optimizer
[2024-11-11 14:13:40,671][00562] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-11 14:13:40,677][00562] Heartbeat connected on RolloutWorker_w0
[2024-11-11 14:13:40,687][00562] Heartbeat connected on RolloutWorker_w1
[2024-11-11 14:13:40,693][00562] Heartbeat connected on RolloutWorker_w2
[2024-11-11 14:13:40,694][00562] Heartbeat connected on RolloutWorker_w3
[2024-11-11 14:13:40,699][00562] Heartbeat connected on RolloutWorker_w4
[2024-11-11 14:13:40,700][00562] Heartbeat connected on RolloutWorker_w5
[2024-11-11 14:13:40,701][00562] Heartbeat connected on RolloutWorker_w6
[2024-11-11 14:13:40,706][00562] Heartbeat connected on RolloutWorker_w7
[2024-11-11 14:13:40,741][00562] Heartbeat connected on Batcher_0
[2024-11-11 14:13:45,455][03122] No checkpoints found
[2024-11-11 14:13:45,456][03122] Did not load from checkpoint, starting from scratch!
[2024-11-11 14:13:45,457][03122] Initialized policy 0 weights for model version 0
[2024-11-11 14:13:45,464][03122] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-11 14:13:45,477][03122] LearnerWorker_p0 finished initialization!
[2024-11-11 14:13:45,477][00562] Heartbeat connected on LearnerWorker_p0
[2024-11-11 14:13:45,558][03135] RunningMeanStd input shape: (3, 72, 128)
[2024-11-11 14:13:45,560][03135] RunningMeanStd input shape: (1,)
[2024-11-11 14:13:45,571][03135] ConvEncoder: input_channels=3
[2024-11-11 14:13:45,674][03135] Conv encoder output size: 512
[2024-11-11 14:13:45,675][03135] Policy head output size: 512
[2024-11-11 14:13:45,728][00562] Inference worker 0-0 is ready!
[2024-11-11 14:13:45,729][00562] All inference workers are ready! Signal rollout workers to start!
[2024-11-11 14:13:45,932][03143] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-11 14:13:45,933][03139] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-11 14:13:45,929][03136] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-11 14:13:45,936][03137] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-11 14:13:45,934][03138] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-11 14:13:45,936][03141] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-11 14:13:45,933][03140] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-11 14:13:45,935][03142] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-11 14:13:46,615][03143] Decorrelating experience for 0 frames...
[2024-11-11 14:13:47,230][03136] Decorrelating experience for 0 frames...
[2024-11-11 14:13:47,233][03142] Decorrelating experience for 0 frames...
[2024-11-11 14:13:47,238][03140] Decorrelating experience for 0 frames...
[2024-11-11 14:13:47,235][03138] Decorrelating experience for 0 frames...
[2024-11-11 14:13:48,364][03136] Decorrelating experience for 32 frames...
[2024-11-11 14:13:48,380][03138] Decorrelating experience for 32 frames...
[2024-11-11 14:13:48,385][03140] Decorrelating experience for 32 frames...
[2024-11-11 14:13:48,436][03139] Decorrelating experience for 0 frames...
[2024-11-11 14:13:48,441][03137] Decorrelating experience for 0 frames...
[2024-11-11 14:13:48,444][03141] Decorrelating experience for 0 frames...
[2024-11-11 14:13:49,388][03142] Decorrelating experience for 32 frames...
[2024-11-11 14:13:49,536][00562] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-11 14:13:49,634][03136] Decorrelating experience for 64 frames...
[2024-11-11 14:13:50,015][03139] Decorrelating experience for 32 frames...
[2024-11-11 14:13:50,010][03137] Decorrelating experience for 32 frames...
[2024-11-11 14:13:50,012][03141] Decorrelating experience for 32 frames...
[2024-11-11 14:13:50,018][03143] Decorrelating experience for 32 frames...
[2024-11-11 14:13:50,794][03138] Decorrelating experience for 64 frames...
[2024-11-11 14:13:50,831][03136] Decorrelating experience for 96 frames...
[2024-11-11 14:13:50,911][03139] Decorrelating experience for 64 frames...
[2024-11-11 14:13:51,535][03142] Decorrelating experience for 64 frames...
[2024-11-11 14:13:51,539][03140] Decorrelating experience for 64 frames...
[2024-11-11 14:13:52,033][03143] Decorrelating experience for 64 frames...
[2024-11-11 14:13:52,175][03139] Decorrelating experience for 96 frames...
[2024-11-11 14:13:52,407][03138] Decorrelating experience for 96 frames...
[2024-11-11 14:13:52,976][03140] Decorrelating experience for 96 frames...
[2024-11-11 14:13:53,081][03137] Decorrelating experience for 64 frames...
[2024-11-11 14:13:53,881][03141] Decorrelating experience for 64 frames...
[2024-11-11 14:13:54,126][03143] Decorrelating experience for 96 frames...
[2024-11-11 14:13:54,496][03142] Decorrelating experience for 96 frames...
[2024-11-11 14:13:54,536][00562] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 3.2. Samples: 16. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-11 14:13:54,538][00562] Avg episode reward: [(0, '1.240')]
[2024-11-11 14:13:56,816][03137] Decorrelating experience for 96 frames...
[2024-11-11 14:13:58,646][03141] Decorrelating experience for 96 frames...
[2024-11-11 14:13:59,536][00562] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 195.2. Samples: 1952. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-11 14:13:59,539][00562] Avg episode reward: [(0, '2.538')]
[2024-11-11 14:14:00,594][03122] Signal inference workers to stop experience collection...
[2024-11-11 14:14:00,626][03135] InferenceWorker_p0-w0: stopping experience collection
[2024-11-11 14:14:02,988][03122] Signal inference workers to resume experience collection...
[2024-11-11 14:14:02,990][03135] InferenceWorker_p0-w0: resuming experience collection
[2024-11-11 14:14:04,536][00562] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 224.7. Samples: 3370. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-11-11 14:14:04,541][00562] Avg episode reward: [(0, '2.654')]
[2024-11-11 14:14:09,536][00562] Fps is (10 sec: 3276.8, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 32768. Throughput: 0: 330.9. Samples: 6618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-11 14:14:09,540][00562] Avg episode reward: [(0, '3.712')]
[2024-11-11 14:14:10,740][03135] Updated weights for policy 0, policy_version 10 (0.0158)
[2024-11-11 14:14:14,539][00562] Fps is (10 sec: 3685.4, 60 sec: 1965.9, 300 sec: 1965.9). Total num frames: 49152. Throughput: 0: 479.4. Samples: 11986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-11 14:14:14,541][00562] Avg episode reward: [(0, '4.213')]
[2024-11-11 14:14:19,536][00562] Fps is (10 sec: 2867.2, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 61440. Throughput: 0: 522.7. Samples: 15682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-11 14:14:19,540][00562] Avg episode reward: [(0, '4.425')]
[2024-11-11 14:14:23,894][03135] Updated weights for policy 0, policy_version 20 (0.0033)
[2024-11-11 14:14:24,536][00562] Fps is (10 sec: 3277.7, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 536.7. Samples: 18784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-11 14:14:24,542][00562] Avg episode reward: [(0, '4.474')]
[2024-11-11 14:14:29,537][00562] Fps is (10 sec: 4095.4, 60 sec: 2559.9, 300 sec: 2559.9). Total num frames: 102400. Throughput: 0: 631.5. Samples: 25262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-11 14:14:29,543][00562] Avg episode reward: [(0, '4.385')]
[2024-11-11 14:14:29,546][03122] Saving new best policy, reward=4.385!
[2024-11-11 14:14:34,540][00562] Fps is (10 sec: 3275.4, 60 sec: 2548.4, 300 sec: 2548.4). Total num frames: 114688. Throughput: 0: 649.9. Samples: 29246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:14:34,543][00562] Avg episode reward: [(0, '4.387')]
[2024-11-11 14:14:34,555][03122] Saving new best policy, reward=4.387!
[2024-11-11 14:14:36,570][03135] Updated weights for policy 0, policy_version 30 (0.0035)
[2024-11-11 14:14:39,536][00562] Fps is (10 sec: 2867.6, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 131072. Throughput: 0: 700.9. Samples: 31556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-11 14:14:39,541][00562] Avg episode reward: [(0, '4.332')]
[2024-11-11 14:14:44,536][00562] Fps is (10 sec: 4097.7, 60 sec: 2830.0, 300 sec: 2830.0). Total num frames: 155648. Throughput: 0: 794.4. Samples: 37702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:14:44,539][00562] Avg episode reward: [(0, '4.309')]
[2024-11-11 14:14:46,762][03135] Updated weights for policy 0, policy_version 40 (0.0032)
[2024-11-11 14:14:49,536][00562] Fps is (10 sec: 3686.4, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 167936. Throughput: 0: 877.4. Samples: 42854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-11 14:14:49,538][00562] Avg episode reward: [(0, '4.368')]
[2024-11-11 14:14:54,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 2898.7). Total num frames: 188416. Throughput: 0: 850.6. Samples: 44896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-11 14:14:54,542][00562] Avg episode reward: [(0, '4.408')]
[2024-11-11 14:14:54,551][03122] Saving new best policy, reward=4.408!
[2024-11-11 14:14:58,442][03135] Updated weights for policy 0, policy_version 50 (0.0018)
[2024-11-11 14:14:59,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2984.2). Total num frames: 208896. Throughput: 0: 866.1. Samples: 50956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-11 14:14:59,541][00562] Avg episode reward: [(0, '4.312')]
[2024-11-11 14:15:04,537][00562] Fps is (10 sec: 3686.1, 60 sec: 3549.8, 300 sec: 3003.7). Total num frames: 225280. Throughput: 0: 920.3. Samples: 57098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-11 14:15:04,543][00562] Avg episode reward: [(0, '4.295')]
[2024-11-11 14:15:09,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3020.8). Total num frames: 241664. Throughput: 0: 895.1. Samples: 59062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:15:09,540][00562] Avg episode reward: [(0, '4.409')]
[2024-11-11 14:15:09,542][03122] Saving new best policy, reward=4.409!
[2024-11-11 14:15:10,575][03135] Updated weights for policy 0, policy_version 60 (0.0042)
[2024-11-11 14:15:14,536][00562] Fps is (10 sec: 3686.7, 60 sec: 3550.0, 300 sec: 3084.0). Total num frames: 262144. Throughput: 0: 866.4. Samples: 64250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:15:14,543][00562] Avg episode reward: [(0, '4.556')]
[2024-11-11 14:15:14,550][03122] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000064_262144.pth...
[2024-11-11 14:15:14,700][03122] Saving new best policy, reward=4.556!
[2024-11-11 14:15:19,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3140.3). Total num frames: 282624. Throughput: 0: 915.1. Samples: 70422. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-11 14:15:19,543][00562] Avg episode reward: [(0, '4.557')]
[2024-11-11 14:15:20,911][03135] Updated weights for policy 0, policy_version 70 (0.0019)
[2024-11-11 14:15:24,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3104.3). Total num frames: 294912. Throughput: 0: 916.0. Samples: 72776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:15:24,542][00562] Avg episode reward: [(0, '4.631')]
[2024-11-11 14:15:24,551][03122] Saving new best policy, reward=4.631!
[2024-11-11 14:15:29,536][00562] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3113.0). Total num frames: 311296. Throughput: 0: 876.0. Samples: 77120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-11 14:15:29,543][00562] Avg episode reward: [(0, '4.743')]
[2024-11-11 14:15:29,548][03122] Saving new best policy, reward=4.743!
[2024-11-11 14:15:33,030][03135] Updated weights for policy 0, policy_version 80 (0.0028)
[2024-11-11 14:15:34,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3618.4, 300 sec: 3159.8). Total num frames: 331776. Throughput: 0: 900.9. Samples: 83394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:15:34,542][00562] Avg episode reward: [(0, '4.588')]
[2024-11-11 14:15:39,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3165.1). Total num frames: 348160. Throughput: 0: 926.0. Samples: 86564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-11 14:15:39,543][00562] Avg episode reward: [(0, '4.451')]
[2024-11-11 14:15:44,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3169.9). Total num frames: 364544. Throughput: 0: 878.4. Samples: 90484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-11 14:15:44,539][00562] Avg episode reward: [(0, '4.660')]
[2024-11-11 14:15:45,430][03135] Updated weights for policy 0, policy_version 90 (0.0023)
[2024-11-11 14:15:49,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3208.5). Total num frames: 385024. Throughput: 0: 872.9. Samples: 96376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-11 14:15:49,538][00562] Avg episode reward: [(0, '4.720')]
[2024-11-11 14:15:54,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3244.0). Total num frames: 405504. Throughput: 0: 903.6. Samples: 99722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:15:54,541][00562] Avg episode reward: [(0, '4.603')]
[2024-11-11 14:15:55,158][03135] Updated weights for policy 0, policy_version 100 (0.0013)
[2024-11-11 14:15:59,541][00562] Fps is (10 sec: 3275.1, 60 sec: 3481.3, 300 sec: 3213.7). Total num frames: 417792. Throughput: 0: 896.5. Samples: 104598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:15:59,544][00562] Avg episode reward: [(0, '4.750')]
[2024-11-11 14:15:59,550][03122] Saving new best policy, reward=4.750!
[2024-11-11 14:16:04,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3246.5). Total num frames: 438272. Throughput: 0: 874.9. Samples: 109794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:16:04,539][00562] Avg episode reward: [(0, '4.947')]
[2024-11-11 14:16:04,547][03122] Saving new best policy, reward=4.947!
[2024-11-11 14:16:07,043][03135] Updated weights for policy 0, policy_version 110 (0.0031)
[2024-11-11 14:16:09,536][00562] Fps is (10 sec: 4098.1, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 458752. Throughput: 0: 891.1. Samples: 112874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-11 14:16:09,540][00562] Avg episode reward: [(0, '4.808')]
[2024-11-11 14:16:14,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 475136. Throughput: 0: 918.0. Samples: 118428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:16:14,541][00562] Avg episode reward: [(0, '4.764')]
[2024-11-11 14:16:19,536][00562] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3249.5). Total num frames: 487424. Throughput: 0: 871.7. Samples: 122620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:16:19,543][00562] Avg episode reward: [(0, '4.668')]
[2024-11-11 14:16:19,670][03135] Updated weights for policy 0, policy_version 120 (0.0034)
[2024-11-11 14:16:24,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3303.2). Total num frames: 512000. Throughput: 0: 870.3. Samples: 125726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:16:24,538][00562] Avg episode reward: [(0, '4.833')]
[2024-11-11 14:16:29,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3302.4). Total num frames: 528384. Throughput: 0: 927.5. Samples: 132222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:16:29,539][00562] Avg episode reward: [(0, '5.245')]
[2024-11-11 14:16:29,649][03135] Updated weights for policy 0, policy_version 130 (0.0021)
[2024-11-11 14:16:29,664][03122] Saving new best policy, reward=5.245!
[2024-11-11 14:16:34,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3301.6). Total num frames: 544768. Throughput: 0: 889.1. Samples: 136386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-11 14:16:34,544][00562] Avg episode reward: [(0, '5.391')]
[2024-11-11 14:16:34,557][03122] Saving new best policy, reward=5.391!
[2024-11-11 14:16:39,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3325.0). Total num frames: 565248. Throughput: 0: 870.9. Samples: 138912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-11 14:16:39,541][00562] Avg episode reward: [(0, '5.441')]
[2024-11-11 14:16:39,546][03122] Saving new best policy, reward=5.441!
[2024-11-11 14:16:41,376][03135] Updated weights for policy 0, policy_version 140 (0.0024)
[2024-11-11 14:16:44,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3347.0). Total num frames: 585728. Throughput: 0: 905.1. Samples: 145322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-11 14:16:44,540][00562] Avg episode reward: [(0, '5.502')]
[2024-11-11 14:16:44,555][03122] Saving new best policy, reward=5.502!
[2024-11-11 14:16:49,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3322.3). Total num frames: 598016. Throughput: 0: 895.1. Samples: 150072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-11 14:16:49,542][00562] Avg episode reward: [(0, '5.174')]
[2024-11-11 14:16:54,027][03135] Updated weights for policy 0, policy_version 150 (0.0026)
[2024-11-11 14:16:54,536][00562] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3321.1). Total num frames: 614400. Throughput: 0: 873.1. Samples: 152164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-11 14:16:54,539][00562] Avg episode reward: [(0, '5.378')]
[2024-11-11 14:16:59,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3618.4, 300 sec: 3341.5). Total num frames: 634880. Throughput: 0: 884.6. Samples: 158236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-11 14:16:59,545][00562] Avg episode reward: [(0, '5.539')]
[2024-11-11 14:16:59,620][03122] Saving new best policy, reward=5.539!
[2024-11-11 14:17:04,115][03135] Updated weights for policy 0, policy_version 160 (0.0030)
[2024-11-11 14:17:04,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3360.8). Total num frames: 655360. Throughput: 0: 921.9. Samples: 164106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:17:04,540][00562] Avg episode reward: [(0, '5.798')]
[2024-11-11 14:17:04,552][03122] Saving new best policy, reward=5.798!
[2024-11-11 14:17:09,537][00562] Fps is (10 sec: 3276.5, 60 sec: 3481.6, 300 sec: 3338.2). Total num frames: 667648. Throughput: 0: 894.6. Samples: 165982. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-11 14:17:09,542][00562] Avg episode reward: [(0, '5.654')]
[2024-11-11 14:17:14,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3356.7). Total num frames: 688128. Throughput: 0: 870.3. Samples: 171386. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-11 14:17:14,542][00562] Avg episode reward: [(0, '5.711')]
[2024-11-11 14:17:14,552][03122] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000168_688128.pth...
[2024-11-11 14:17:15,909][03135] Updated weights for policy 0, policy_version 170 (0.0018)
[2024-11-11 14:17:19,536][00562] Fps is (10 sec: 4096.3, 60 sec: 3686.4, 300 sec: 3374.3). Total num frames: 708608. Throughput: 0: 917.2. Samples: 177662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:17:19,538][00562] Avg episode reward: [(0, '5.479')]
[2024-11-11 14:17:24,541][00562] Fps is (10 sec: 3684.5, 60 sec: 3549.6, 300 sec: 3372.0). Total num frames: 724992. Throughput: 0: 911.6. Samples: 179940. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-11 14:17:24,546][00562] Avg episode reward: [(0, '5.572')]
[2024-11-11 14:17:28,184][03135] Updated weights for policy 0, policy_version 180 (0.0019)
[2024-11-11 14:17:29,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3369.9). Total num frames: 741376. Throughput: 0: 866.4. Samples: 184310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-11 14:17:29,541][00562] Avg episode reward: [(0, '5.582')]
[2024-11-11 14:17:34,536][00562] Fps is (10 sec: 3688.3, 60 sec: 3618.1, 300 sec: 3386.0). Total num frames: 761856. Throughput: 0: 903.1. Samples: 190712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-11 14:17:34,540][00562] Avg episode reward: [(0, '5.400')]
[2024-11-11 14:17:38,535][03135] Updated weights for policy 0, policy_version 190 (0.0019)
[2024-11-11 14:17:39,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3383.7). Total num frames: 778240. Throughput: 0: 926.4. Samples: 193854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:17:39,539][00562] Avg episode reward: [(0, '5.618')]
[2024-11-11 14:17:44,536][00562] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3363.9). Total num frames: 790528. Throughput: 0: 880.0. Samples: 197836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-11 14:17:44,541][00562] Avg episode reward: [(0, '5.695')]
[2024-11-11 14:17:49,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3396.3). Total num frames: 815104. Throughput: 0: 879.5. Samples: 203684. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-11 14:17:49,539][00562] Avg episode reward: [(0, '6.146')]
[2024-11-11 14:17:49,544][03122] Saving new best policy, reward=6.146!
[2024-11-11 14:17:50,420][03135] Updated weights for policy 0, policy_version 200 (0.0033)
[2024-11-11 14:17:54,541][00562] Fps is (10 sec: 4503.4, 60 sec: 3686.1, 300 sec: 3410.5). Total num frames: 835584. Throughput: 0: 908.1. Samples: 206850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-11 14:17:54,544][00562] Avg episode reward: [(0, '6.938')]
[2024-11-11 14:17:54,556][03122] Saving new best policy, reward=6.938!
[2024-11-11 14:17:59,538][00562] Fps is (10 sec: 3276.1, 60 sec: 3549.7, 300 sec: 3391.5). Total num frames: 847872. Throughput: 0: 893.9. Samples: 211612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:17:59,541][00562] Avg episode reward: [(0, '7.284')]
[2024-11-11 14:17:59,543][03122] Saving new best policy, reward=7.284!
[2024-11-11 14:18:02,770][03135] Updated weights for policy 0, policy_version 210 (0.0032)
[2024-11-11 14:18:04,537][00562] Fps is (10 sec: 2868.3, 60 sec: 3481.6, 300 sec: 3389.2). Total num frames: 864256. Throughput: 0: 866.4. Samples: 216650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:18:04,539][00562] Avg episode reward: [(0, '7.289')]
[2024-11-11 14:18:04,553][03122] Saving new best policy, reward=7.289!
[2024-11-11 14:18:09,536][00562] Fps is (10 sec: 3687.0, 60 sec: 3618.2, 300 sec: 3402.8). Total num frames: 884736. Throughput: 0: 884.5. Samples: 219738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:18:09,541][00562] Avg episode reward: [(0, '6.772')]
[2024-11-11 14:18:13,268][03135] Updated weights for policy 0, policy_version 220 (0.0029)
[2024-11-11 14:18:14,537][00562] Fps is (10 sec: 3686.5, 60 sec: 3549.8, 300 sec: 3400.4). Total num frames: 901120. Throughput: 0: 914.7. Samples: 225474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:18:14,544][00562] Avg episode reward: [(0, '6.888')]
[2024-11-11 14:18:19,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3398.2). Total num frames: 917504. Throughput: 0: 869.5. Samples: 229838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:18:19,542][00562] Avg episode reward: [(0, '6.874')]
[2024-11-11 14:18:24,536][00562] Fps is (10 sec: 3686.6, 60 sec: 3550.2, 300 sec: 3410.9). Total num frames: 937984. Throughput: 0: 871.2. Samples: 233058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-11 14:18:24,543][00562] Avg episode reward: [(0, '7.313')]
[2024-11-11 14:18:24,564][03122] Saving new best policy, reward=7.313!
[2024-11-11 14:18:24,562][03135] Updated weights for policy 0, policy_version 230 (0.0019)
[2024-11-11 14:18:29,538][00562] Fps is (10 sec: 4095.4, 60 sec: 3618.0, 300 sec: 3423.1). Total num frames: 958464. Throughput: 0: 925.4. Samples: 239480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:18:29,542][00562] Avg episode reward: [(0, '7.727')]
[2024-11-11 14:18:29,544][03122] Saving new best policy, reward=7.727!
[2024-11-11 14:18:34,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3406.1). Total num frames: 970752. Throughput: 0: 880.6. Samples: 243310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:18:34,541][00562] Avg episode reward: [(0, '7.795')]
[2024-11-11 14:18:34,551][03122] Saving new best policy, reward=7.795!
[2024-11-11 14:18:37,011][03135] Updated weights for policy 0, policy_version 240 (0.0025)
[2024-11-11 14:18:39,536][00562] Fps is (10 sec: 3277.3, 60 sec: 3549.9, 300 sec: 3418.0). Total num frames: 991232. Throughput: 0: 873.4. Samples: 246148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:18:39,541][00562] Avg episode reward: [(0, '8.393')]
[2024-11-11 14:18:39,544][03122] Saving new best policy, reward=8.393!
[2024-11-11 14:18:44,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3429.5). Total num frames: 1011712. Throughput: 0: 910.6. Samples: 252588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:18:44,539][00562] Avg episode reward: [(0, '8.526')]
[2024-11-11 14:18:44,549][03122] Saving new best policy, reward=8.526!
[2024-11-11 14:18:48,068][03135] Updated weights for policy 0, policy_version 250 (0.0015)
[2024-11-11 14:18:49,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1024000. Throughput: 0: 898.2. Samples: 257066. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-11 14:18:49,539][00562] Avg episode reward: [(0, '9.033')]
[2024-11-11 14:18:49,559][03122] Saving new best policy, reward=9.033!
[2024-11-11 14:18:54,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3481.9, 300 sec: 3540.6). Total num frames: 1044480. Throughput: 0: 874.0. Samples: 259066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:18:54,543][00562] Avg episode reward: [(0, '8.546')]
[2024-11-11 14:18:59,081][03135] Updated weights for policy 0, policy_version 260 (0.0025)
[2024-11-11 14:18:59,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3568.4). Total num frames: 1064960. Throughput: 0: 890.6. Samples: 265550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:18:59,540][00562] Avg episode reward: [(0, '7.860')]
[2024-11-11 14:19:04,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 1081344. Throughput: 0: 913.6. Samples: 270950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:19:04,546][00562] Avg episode reward: [(0, '7.881')]
[2024-11-11 14:19:09,536][00562] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1097728. Throughput: 0: 887.1. Samples: 272976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:19:09,538][00562] Avg episode reward: [(0, '8.809')]
[2024-11-11 14:19:11,099][03135] Updated weights for policy 0, policy_version 270 (0.0031)
[2024-11-11 14:19:14,536][00562] Fps is (10 sec: 3686.3, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 1118208. Throughput: 0: 878.3. Samples: 279000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-11 14:19:14,539][00562] Avg episode reward: [(0, '9.228')]
[2024-11-11 14:19:14,555][03122] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000273_1118208.pth...
[2024-11-11 14:19:14,675][03122] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000064_262144.pth
[2024-11-11 14:19:14,692][03122] Saving new best policy, reward=9.228!
[2024-11-11 14:19:19,536][00562] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1134592. Throughput: 0: 920.3. Samples: 284724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:19:19,540][00562] Avg episode reward: [(0, '9.538')]
[2024-11-11 14:19:19,604][03122] Saving new best policy, reward=9.538!
[2024-11-11 14:19:22,954][03135] Updated weights for policy 0, policy_version 280 (0.0018)
[2024-11-11 14:19:24,536][00562] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1150976. Throughput: 0: 898.6. Samples: 286586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-11 14:19:24,540][00562] Avg episode reward: [(0, '9.325')]
[2024-11-11 14:19:29,536][00562] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 1171456. Throughput: 0: 873.1. Samples: 291876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-11 14:19:29,538][00562] Avg episode reward: [(0, '9.553')]
[2024-11-11 14:19:29,542][03122] Saving new best policy, reward=9.553!
[2024-11-11 14:19:33,246][03135] Updated weights for policy 0, policy_version 290 (0.0019)
[2024-11-11 14:19:34,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1191936. Throughput: 0: 910.7. Samples: 298046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-11 14:19:34,541][00562] Avg episode reward: [(0, '10.099')]
[2024-11-11 14:19:34,551][03122] Saving new best policy, reward=10.099!
[2024-11-11 14:19:39,541][00562] Fps is (10 sec: 3275.2, 60 sec: 3549.6, 300 sec: 3554.4). Total num frames: 1204224. Throughput: 0: 915.3. Samples: 300258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-11 14:19:39,544][00562] Avg episode reward: [(0, '9.851')]
[2024-11-11 14:19:44,542][00562] Fps is (10 sec: 3274.9, 60 sec: 3549.5, 300 sec: 3582.2). Total num frames: 1224704. Throughput: 0: 873.6. Samples: 304868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:19:44,544][00562] Avg episode reward: [(0, '9.566')]
[2024-11-11 14:19:45,424][03135] Updated weights for policy 0, policy_version 300 (0.0022)
[2024-11-11 14:19:49,536][00562] Fps is (10 sec: 4098.1, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1245184. Throughput: 0: 899.8. Samples: 311440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:19:49,544][00562] Avg episode reward: [(0, '9.843')]
[2024-11-11 14:19:54,536][00562] Fps is (10 sec: 3688.6, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1261568. Throughput: 0: 932.5. Samples: 314936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:19:54,539][00562] Avg episode reward: [(0, '9.722')]
[2024-11-11 14:19:56,009][03135] Updated weights for policy 0, policy_version 310 (0.0016)
[2024-11-11 14:19:59,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 1277952. Throughput: 0: 893.7. Samples: 319216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:19:59,538][00562] Avg episode reward: [(0, '9.716')]
[2024-11-11 14:20:04,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1302528. Throughput: 0: 914.5. Samples: 325876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-11 14:20:04,541][00562] Avg episode reward: [(0, '10.107')]
[2024-11-11 14:20:04,556][03122] Saving new best policy, reward=10.107!
[2024-11-11 14:20:05,974][03135] Updated weights for policy 0, policy_version 320 (0.0017)
[2024-11-11 14:20:09,539][00562] Fps is (10 sec: 4504.2, 60 sec: 3754.5, 300 sec: 3596.1). Total num frames: 1323008. Throughput: 0: 949.0. Samples: 329296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-11 14:20:09,541][00562] Avg episode reward: [(0, '11.050')]
[2024-11-11 14:20:09,552][03122] Saving new best policy, reward=11.050!
[2024-11-11 14:20:14,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1335296. Throughput: 0: 941.1. Samples: 334226. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-11 14:20:14,539][00562] Avg episode reward: [(0, '11.349')]
[2024-11-11 14:20:14,558][03122] Saving new best policy, reward=11.349!
[2024-11-11 14:20:17,364][03135] Updated weights for policy 0, policy_version 330 (0.0037)
[2024-11-11 14:20:19,536][00562] Fps is (10 sec: 3687.5, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 1359872. Throughput: 0: 932.2. Samples: 339996.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-11 14:20:19,538][00562] Avg episode reward: [(0, '11.071')] [2024-11-11 14:20:24,536][00562] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3637.8). Total num frames: 1384448. Throughput: 0: 961.4. Samples: 343514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:20:24,538][00562] Avg episode reward: [(0, '10.360')] [2024-11-11 14:20:26,823][03135] Updated weights for policy 0, policy_version 340 (0.0031) [2024-11-11 14:20:29,536][00562] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 1396736. Throughput: 0: 988.7. Samples: 349354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:20:29,546][00562] Avg episode reward: [(0, '10.946')] [2024-11-11 14:20:34,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 1417216. Throughput: 0: 961.4. Samples: 354702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:20:34,538][00562] Avg episode reward: [(0, '11.230')] [2024-11-11 14:20:37,353][03135] Updated weights for policy 0, policy_version 350 (0.0020) [2024-11-11 14:20:39,536][00562] Fps is (10 sec: 4505.7, 60 sec: 3959.8, 300 sec: 3651.7). Total num frames: 1441792. Throughput: 0: 960.0. Samples: 358138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:20:39,538][00562] Avg episode reward: [(0, '13.045')] [2024-11-11 14:20:39,542][03122] Saving new best policy, reward=13.045! [2024-11-11 14:20:44,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3891.6, 300 sec: 3637.8). Total num frames: 1458176. Throughput: 0: 1009.6. Samples: 364648. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-11 14:20:44,538][00562] Avg episode reward: [(0, '14.305')] [2024-11-11 14:20:44,555][03122] Saving new best policy, reward=14.305! [2024-11-11 14:20:48,981][03135] Updated weights for policy 0, policy_version 360 (0.0027) [2024-11-11 14:20:49,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3623.9). 
Total num frames: 1474560. Throughput: 0: 958.2. Samples: 368996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:20:49,538][00562] Avg episode reward: [(0, '14.481')] [2024-11-11 14:20:49,540][03122] Saving new best policy, reward=14.481! [2024-11-11 14:20:54,536][00562] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3665.6). Total num frames: 1499136. Throughput: 0: 958.9. Samples: 372442. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:20:54,543][00562] Avg episode reward: [(0, '14.633')] [2024-11-11 14:20:54,551][03122] Saving new best policy, reward=14.633! [2024-11-11 14:20:57,890][03135] Updated weights for policy 0, policy_version 370 (0.0025) [2024-11-11 14:20:59,536][00562] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3665.6). Total num frames: 1519616. Throughput: 0: 1002.1. Samples: 379322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:20:59,542][00562] Avg episode reward: [(0, '13.241')] [2024-11-11 14:21:04,536][00562] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3637.8). Total num frames: 1531904. Throughput: 0: 974.9. Samples: 383868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:21:04,539][00562] Avg episode reward: [(0, '13.089')] [2024-11-11 14:21:09,174][03135] Updated weights for policy 0, policy_version 380 (0.0021) [2024-11-11 14:21:09,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3665.6). Total num frames: 1556480. Throughput: 0: 962.0. Samples: 386802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:21:09,539][00562] Avg episode reward: [(0, '13.534')] [2024-11-11 14:21:14,536][00562] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3707.2). Total num frames: 1581056. Throughput: 0: 989.8. Samples: 393894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:21:14,538][00562] Avg episode reward: [(0, '14.577')] [2024-11-11 14:21:14,546][03122] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000386_1581056.pth... 
[2024-11-11 14:21:14,667][03122] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000168_688128.pth [2024-11-11 14:21:19,540][00562] Fps is (10 sec: 3684.8, 60 sec: 3890.9, 300 sec: 3665.5). Total num frames: 1593344. Throughput: 0: 988.3. Samples: 399180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:21:19,546][00562] Avg episode reward: [(0, '15.060')] [2024-11-11 14:21:19,628][03135] Updated weights for policy 0, policy_version 390 (0.0020) [2024-11-11 14:21:19,632][03122] Saving new best policy, reward=15.060! [2024-11-11 14:21:24,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 1613824. Throughput: 0: 960.8. Samples: 401372. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-11 14:21:24,539][00562] Avg episode reward: [(0, '15.528')] [2024-11-11 14:21:24,549][03122] Saving new best policy, reward=15.528! [2024-11-11 14:21:29,315][03135] Updated weights for policy 0, policy_version 400 (0.0027) [2024-11-11 14:21:29,536][00562] Fps is (10 sec: 4507.6, 60 sec: 4027.7, 300 sec: 3707.2). Total num frames: 1638400. Throughput: 0: 969.6. Samples: 408280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:21:29,544][00562] Avg episode reward: [(0, '14.303')] [2024-11-11 14:21:34,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3693.3). Total num frames: 1654784. Throughput: 0: 1011.9. Samples: 414530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:21:34,542][00562] Avg episode reward: [(0, '12.693')] [2024-11-11 14:21:39,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 1671168. Throughput: 0: 983.8. Samples: 416712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:21:39,542][00562] Avg episode reward: [(0, '11.913')] [2024-11-11 14:21:40,711][03135] Updated weights for policy 0, policy_version 410 (0.0020) [2024-11-11 14:21:44,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3721.1). 
Total num frames: 1695744. Throughput: 0: 969.8. Samples: 422964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:21:44,539][00562] Avg episode reward: [(0, '12.488')] [2024-11-11 14:21:49,537][00562] Fps is (10 sec: 4505.1, 60 sec: 4027.7, 300 sec: 3735.0). Total num frames: 1716224. Throughput: 0: 1025.6. Samples: 430020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:21:49,540][00562] Avg episode reward: [(0, '13.037')] [2024-11-11 14:21:49,650][03135] Updated weights for policy 0, policy_version 420 (0.0014) [2024-11-11 14:21:54,542][00562] Fps is (10 sec: 3684.2, 60 sec: 3890.8, 300 sec: 3721.0). Total num frames: 1732608. Throughput: 0: 1008.1. Samples: 432174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:21:54,551][00562] Avg episode reward: [(0, '13.699')] [2024-11-11 14:21:59,536][00562] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 1753088. Throughput: 0: 967.2. Samples: 437418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:21:59,539][00562] Avg episode reward: [(0, '15.479')] [2024-11-11 14:22:01,010][03135] Updated weights for policy 0, policy_version 430 (0.0020) [2024-11-11 14:22:04,537][00562] Fps is (10 sec: 4507.9, 60 sec: 4095.9, 300 sec: 3762.8). Total num frames: 1777664. Throughput: 0: 998.2. Samples: 444096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:22:04,544][00562] Avg episode reward: [(0, '15.164')] [2024-11-11 14:22:09,537][00562] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3735.0). Total num frames: 1789952. Throughput: 0: 1012.5. Samples: 446936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:22:09,543][00562] Avg episode reward: [(0, '15.847')] [2024-11-11 14:22:09,548][03122] Saving new best policy, reward=15.847! [2024-11-11 14:22:13,001][03135] Updated weights for policy 0, policy_version 440 (0.0038) [2024-11-11 14:22:14,536][00562] Fps is (10 sec: 2867.5, 60 sec: 3754.7, 300 sec: 3721.1). 
Total num frames: 1806336. Throughput: 0: 948.5. Samples: 450962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:22:14,539][00562] Avg episode reward: [(0, '15.502')] [2024-11-11 14:22:19,536][00562] Fps is (10 sec: 4096.6, 60 sec: 3959.8, 300 sec: 3748.9). Total num frames: 1830912. Throughput: 0: 954.4. Samples: 457478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:22:19,542][00562] Avg episode reward: [(0, '14.958')] [2024-11-11 14:22:22,092][03135] Updated weights for policy 0, policy_version 450 (0.0028) [2024-11-11 14:22:24,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 1847296. Throughput: 0: 982.1. Samples: 460906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:22:24,540][00562] Avg episode reward: [(0, '15.776')] [2024-11-11 14:22:29,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1863680. Throughput: 0: 939.7. Samples: 465250. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-11 14:22:29,538][00562] Avg episode reward: [(0, '16.064')] [2024-11-11 14:22:29,540][03122] Saving new best policy, reward=16.064! [2024-11-11 14:22:34,004][03135] Updated weights for policy 0, policy_version 460 (0.0025) [2024-11-11 14:22:34,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1884160. Throughput: 0: 916.6. Samples: 471266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:22:34,538][00562] Avg episode reward: [(0, '15.748')] [2024-11-11 14:22:39,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1904640. Throughput: 0: 943.1. Samples: 474608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:22:39,541][00562] Avg episode reward: [(0, '15.982')] [2024-11-11 14:22:44,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1921024. Throughput: 0: 939.2. Samples: 479680. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:22:44,541][00562] Avg episode reward: [(0, '14.914')] [2024-11-11 14:22:45,999][03135] Updated weights for policy 0, policy_version 470 (0.0026) [2024-11-11 14:22:49,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3735.1). Total num frames: 1937408. Throughput: 0: 906.5. Samples: 484888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:22:49,539][00562] Avg episode reward: [(0, '14.322')] [2024-11-11 14:22:54,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3823.3, 300 sec: 3776.7). Total num frames: 1961984. Throughput: 0: 916.4. Samples: 488174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:22:54,538][00562] Avg episode reward: [(0, '12.973')] [2024-11-11 14:22:55,248][03135] Updated weights for policy 0, policy_version 480 (0.0021) [2024-11-11 14:22:59,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1978368. Throughput: 0: 959.3. Samples: 494130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:22:59,539][00562] Avg episode reward: [(0, '13.627')] [2024-11-11 14:23:04,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3762.8). Total num frames: 1994752. Throughput: 0: 918.2. Samples: 498798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:23:04,543][00562] Avg episode reward: [(0, '15.122')] [2024-11-11 14:23:06,911][03135] Updated weights for policy 0, policy_version 490 (0.0036) [2024-11-11 14:23:09,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 2015232. Throughput: 0: 917.5. Samples: 502194. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:23:09,542][00562] Avg episode reward: [(0, '16.168')] [2024-11-11 14:23:09,550][03122] Saving new best policy, reward=16.168! [2024-11-11 14:23:14,538][00562] Fps is (10 sec: 4095.1, 60 sec: 3822.8, 300 sec: 3790.5). Total num frames: 2035712. Throughput: 0: 969.3. Samples: 508872. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-11 14:23:14,545][00562] Avg episode reward: [(0, '17.297')] [2024-11-11 14:23:14,560][03122] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000497_2035712.pth... [2024-11-11 14:23:14,723][03122] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000273_1118208.pth [2024-11-11 14:23:14,744][03122] Saving new best policy, reward=17.297! [2024-11-11 14:23:18,676][03135] Updated weights for policy 0, policy_version 500 (0.0024) [2024-11-11 14:23:19,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 2048000. Throughput: 0: 918.7. Samples: 512608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:23:19,539][00562] Avg episode reward: [(0, '18.508')] [2024-11-11 14:23:19,553][03122] Saving new best policy, reward=18.508! [2024-11-11 14:23:24,536][00562] Fps is (10 sec: 3687.2, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2072576. Throughput: 0: 910.7. Samples: 515588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:23:24,538][00562] Avg episode reward: [(0, '17.975')] [2024-11-11 14:23:28,016][03135] Updated weights for policy 0, policy_version 510 (0.0013) [2024-11-11 14:23:29,537][00562] Fps is (10 sec: 4505.0, 60 sec: 3822.8, 300 sec: 3804.4). Total num frames: 2093056. Throughput: 0: 950.5. Samples: 522452. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:23:29,545][00562] Avg episode reward: [(0, '17.632')] [2024-11-11 14:23:34,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2105344. Throughput: 0: 944.7. Samples: 527398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:23:34,543][00562] Avg episode reward: [(0, '18.036')] [2024-11-11 14:23:39,536][00562] Fps is (10 sec: 3277.3, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2125824. Throughput: 0: 923.7. Samples: 529740. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:23:39,542][00562] Avg episode reward: [(0, '18.040')] [2024-11-11 14:23:39,830][03135] Updated weights for policy 0, policy_version 520 (0.0023) [2024-11-11 14:23:44,536][00562] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2150400. Throughput: 0: 939.6. Samples: 536414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:23:44,542][00562] Avg episode reward: [(0, '18.160')] [2024-11-11 14:23:49,536][00562] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2166784. Throughput: 0: 958.8. Samples: 541944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:23:49,542][00562] Avg episode reward: [(0, '19.216')] [2024-11-11 14:23:49,543][03122] Saving new best policy, reward=19.216! [2024-11-11 14:23:50,734][03135] Updated weights for policy 0, policy_version 530 (0.0014) [2024-11-11 14:23:54,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 2183168. Throughput: 0: 926.4. Samples: 543880. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:23:54,543][00562] Avg episode reward: [(0, '19.125')] [2024-11-11 14:23:59,536][00562] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2203648. Throughput: 0: 915.8. Samples: 550082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-11 14:23:59,540][00562] Avg episode reward: [(0, '18.306')] [2024-11-11 14:24:00,831][03135] Updated weights for policy 0, policy_version 540 (0.0028) [2024-11-11 14:24:04,536][00562] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2224128. Throughput: 0: 982.9. Samples: 556838. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:24:04,538][00562] Avg episode reward: [(0, '17.906')] [2024-11-11 14:24:09,536][00562] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2240512. Throughput: 0: 962.2. Samples: 558888. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:24:09,541][00562] Avg episode reward: [(0, '17.494')] [2024-11-11 14:24:12,425][03135] Updated weights for policy 0, policy_version 550 (0.0024) [2024-11-11 14:24:14,536][00562] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3818.3). Total num frames: 2260992. Throughput: 0: 928.3. Samples: 564222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:24:14,543][00562] Avg episode reward: [(0, '17.795')] [2024-11-11 14:24:19,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2281472. Throughput: 0: 973.9. Samples: 571224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:24:19,544][00562] Avg episode reward: [(0, '17.263')] [2024-11-11 14:24:22,313][03135] Updated weights for policy 0, policy_version 560 (0.0018) [2024-11-11 14:24:24,537][00562] Fps is (10 sec: 3686.1, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 2297856. Throughput: 0: 978.9. Samples: 573792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:24:24,539][00562] Avg episode reward: [(0, '17.846')] [2024-11-11 14:24:29,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2318336. Throughput: 0: 937.1. Samples: 578584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:24:29,544][00562] Avg episode reward: [(0, '18.910')] [2024-11-11 14:24:32,729][03135] Updated weights for policy 0, policy_version 570 (0.0031) [2024-11-11 14:24:34,536][00562] Fps is (10 sec: 4506.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2342912. Throughput: 0: 968.3. Samples: 585518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:24:34,539][00562] Avg episode reward: [(0, '19.948')] [2024-11-11 14:24:34,549][03122] Saving new best policy, reward=19.948! [2024-11-11 14:24:39,536][00562] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3846.2). Total num frames: 2359296. Throughput: 0: 999.1. Samples: 588840. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:24:39,549][00562] Avg episode reward: [(0, '19.368')] [2024-11-11 14:24:44,415][03135] Updated weights for policy 0, policy_version 580 (0.0024) [2024-11-11 14:24:44,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2375680. Throughput: 0: 955.0. Samples: 593056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:24:44,538][00562] Avg episode reward: [(0, '20.561')] [2024-11-11 14:24:44,546][03122] Saving new best policy, reward=20.561! [2024-11-11 14:24:49,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2396160. Throughput: 0: 951.2. Samples: 599644. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:24:49,538][00562] Avg episode reward: [(0, '20.524')] [2024-11-11 14:24:53,193][03135] Updated weights for policy 0, policy_version 590 (0.0025) [2024-11-11 14:24:54,536][00562] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 2420736. Throughput: 0: 980.6. Samples: 603014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:24:54,540][00562] Avg episode reward: [(0, '19.929')] [2024-11-11 14:24:59,542][00562] Fps is (10 sec: 3684.2, 60 sec: 3822.5, 300 sec: 3832.1). Total num frames: 2433024. Throughput: 0: 975.2. Samples: 608110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:24:59,545][00562] Avg episode reward: [(0, '19.025')] [2024-11-11 14:25:04,536][00562] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2453504. Throughput: 0: 950.8. Samples: 614012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:25:04,538][00562] Avg episode reward: [(0, '18.559')] [2024-11-11 14:25:04,664][03135] Updated weights for policy 0, policy_version 600 (0.0028) [2024-11-11 14:25:09,536][00562] Fps is (10 sec: 4508.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2478080. Throughput: 0: 973.1. Samples: 617580. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:25:09,538][00562] Avg episode reward: [(0, '18.975')] [2024-11-11 14:25:14,538][00562] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 2494464. Throughput: 0: 995.4. Samples: 623378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:25:14,543][00562] Avg episode reward: [(0, '19.347')] [2024-11-11 14:25:14,559][03122] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... [2024-11-11 14:25:14,724][03122] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000386_1581056.pth [2024-11-11 14:25:15,624][03135] Updated weights for policy 0, policy_version 610 (0.0032) [2024-11-11 14:25:19,538][00562] Fps is (10 sec: 3276.2, 60 sec: 3822.8, 300 sec: 3818.3). Total num frames: 2510848. Throughput: 0: 952.2. Samples: 628368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:25:19,542][00562] Avg episode reward: [(0, '18.871')] [2024-11-11 14:25:24,536][00562] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2535424. Throughput: 0: 947.9. Samples: 631496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:25:24,542][00562] Avg episode reward: [(0, '19.056')] [2024-11-11 14:25:24,980][03135] Updated weights for policy 0, policy_version 620 (0.0026) [2024-11-11 14:25:29,538][00562] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3859.9). Total num frames: 2555904. Throughput: 0: 1006.5. Samples: 638350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:25:29,540][00562] Avg episode reward: [(0, '18.963')] [2024-11-11 14:25:34,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2568192. Throughput: 0: 954.9. Samples: 642616. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:25:34,539][00562] Avg episode reward: [(0, '18.796')] [2024-11-11 14:25:36,392][03135] Updated weights for policy 0, policy_version 630 (0.0023) [2024-11-11 14:25:39,536][00562] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2592768. Throughput: 0: 958.7. Samples: 646154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:25:39,538][00562] Avg episode reward: [(0, '19.304')] [2024-11-11 14:25:44,536][00562] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 2617344. Throughput: 0: 1002.5. Samples: 653216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:25:44,539][00562] Avg episode reward: [(0, '19.259')] [2024-11-11 14:25:45,777][03135] Updated weights for policy 0, policy_version 640 (0.0039) [2024-11-11 14:25:49,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2629632. Throughput: 0: 975.9. Samples: 657928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:25:49,539][00562] Avg episode reward: [(0, '19.747')] [2024-11-11 14:25:54,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 2650112. Throughput: 0: 956.8. Samples: 660636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:25:54,543][00562] Avg episode reward: [(0, '19.441')] [2024-11-11 14:25:56,442][03135] Updated weights for policy 0, policy_version 650 (0.0032) [2024-11-11 14:25:59,536][00562] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 3873.8). Total num frames: 2674688. Throughput: 0: 984.3. Samples: 667670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-11 14:25:59,538][00562] Avg episode reward: [(0, '19.029')] [2024-11-11 14:26:04,539][00562] Fps is (10 sec: 4094.8, 60 sec: 3959.3, 300 sec: 3846.0). Total num frames: 2691072. Throughput: 0: 996.0. Samples: 673188. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-11-11 14:26:04,542][00562] Avg episode reward: [(0, '19.359')] [2024-11-11 14:26:07,949][03135] Updated weights for policy 0, policy_version 660 (0.0031) [2024-11-11 14:26:09,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2707456. Throughput: 0: 973.6. Samples: 675308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:26:09,538][00562] Avg episode reward: [(0, '19.579')] [2024-11-11 14:26:14,536][00562] Fps is (10 sec: 4097.2, 60 sec: 3959.6, 300 sec: 3860.0). Total num frames: 2732032. Throughput: 0: 973.4. Samples: 682150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:26:14,546][00562] Avg episode reward: [(0, '18.167')] [2024-11-11 14:26:16,721][03135] Updated weights for policy 0, policy_version 670 (0.0020) [2024-11-11 14:26:19,536][00562] Fps is (10 sec: 4505.6, 60 sec: 4027.9, 300 sec: 3860.0). Total num frames: 2752512. Throughput: 0: 1017.9. Samples: 688422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:26:19,538][00562] Avg episode reward: [(0, '18.763')] [2024-11-11 14:26:24,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2768896. Throughput: 0: 986.2. Samples: 690532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-11 14:26:24,538][00562] Avg episode reward: [(0, '18.673')] [2024-11-11 14:26:27,909][03135] Updated weights for policy 0, policy_version 680 (0.0024) [2024-11-11 14:26:29,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 2789376. Throughput: 0: 963.3. Samples: 696564. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-11 14:26:29,544][00562] Avg episode reward: [(0, '17.968')] [2024-11-11 14:26:34,537][00562] Fps is (10 sec: 4505.1, 60 sec: 4095.9, 300 sec: 3873.8). Total num frames: 2813952. Throughput: 0: 1016.3. Samples: 703662. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:26:34,540][00562] Avg episode reward: [(0, '18.548')] [2024-11-11 14:26:38,381][03135] Updated weights for policy 0, policy_version 690 (0.0019) [2024-11-11 14:26:39,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2826240. Throughput: 0: 1006.8. Samples: 705944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:26:39,541][00562] Avg episode reward: [(0, '18.296')] [2024-11-11 14:26:44,536][00562] Fps is (10 sec: 3277.2, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2846720. Throughput: 0: 965.7. Samples: 711128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:26:44,541][00562] Avg episode reward: [(0, '17.230')] [2024-11-11 14:26:48,080][03135] Updated weights for policy 0, policy_version 700 (0.0031) [2024-11-11 14:26:49,536][00562] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 2871296. Throughput: 0: 1000.8. Samples: 718222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:26:49,545][00562] Avg episode reward: [(0, '17.083')] [2024-11-11 14:26:54,537][00562] Fps is (10 sec: 4095.5, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 2887680. Throughput: 0: 1019.0. Samples: 721164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:26:54,543][00562] Avg episode reward: [(0, '17.814')] [2024-11-11 14:26:59,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2904064. Throughput: 0: 965.1. Samples: 725578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:26:59,543][00562] Avg episode reward: [(0, '18.477')] [2024-11-11 14:26:59,565][03135] Updated weights for policy 0, policy_version 710 (0.0029) [2024-11-11 14:27:04,543][00562] Fps is (10 sec: 4093.5, 60 sec: 3959.2, 300 sec: 3859.9). Total num frames: 2928640. Throughput: 0: 983.2. Samples: 732674. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:27:04,549][00562] Avg episode reward: [(0, '19.283')] [2024-11-11 14:27:08,764][03135] Updated weights for policy 0, policy_version 720 (0.0028) [2024-11-11 14:27:09,536][00562] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 2949120. Throughput: 0: 1014.0. Samples: 736160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:27:09,540][00562] Avg episode reward: [(0, '20.200')] [2024-11-11 14:27:14,536][00562] Fps is (10 sec: 3689.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2965504. Throughput: 0: 984.0. Samples: 740842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:27:14,538][00562] Avg episode reward: [(0, '18.441')] [2024-11-11 14:27:14,550][03122] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000724_2965504.pth... [2024-11-11 14:27:14,671][03122] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000497_2035712.pth [2024-11-11 14:27:19,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2985984. Throughput: 0: 966.7. Samples: 747164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:27:19,538][00562] Avg episode reward: [(0, '18.515')] [2024-11-11 14:27:19,552][03135] Updated weights for policy 0, policy_version 730 (0.0013) [2024-11-11 14:27:24,537][00562] Fps is (10 sec: 4504.9, 60 sec: 4027.6, 300 sec: 3887.7). Total num frames: 3010560. Throughput: 0: 987.8. Samples: 750398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:27:24,542][00562] Avg episode reward: [(0, '18.392')] [2024-11-11 14:27:29,537][00562] Fps is (10 sec: 3686.0, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 3022848. Throughput: 0: 991.1. Samples: 755730. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:27:29,539][00562] Avg episode reward: [(0, '17.663')] [2024-11-11 14:27:31,246][03135] Updated weights for policy 0, policy_version 740 (0.0023) [2024-11-11 14:27:34,536][00562] Fps is (10 sec: 3277.3, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 3043328. Throughput: 0: 960.1. Samples: 761426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:27:34,544][00562] Avg episode reward: [(0, '18.847')] [2024-11-11 14:27:39,536][00562] Fps is (10 sec: 4506.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3067904. Throughput: 0: 973.3. Samples: 764960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:27:39,543][00562] Avg episode reward: [(0, '18.547')] [2024-11-11 14:27:39,839][03135] Updated weights for policy 0, policy_version 750 (0.0013) [2024-11-11 14:27:44,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3084288. Throughput: 0: 1009.1. Samples: 770986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:27:44,540][00562] Avg episode reward: [(0, '19.671')] [2024-11-11 14:27:49,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3104768. Throughput: 0: 962.2. Samples: 775968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:27:49,537][00562] Avg episode reward: [(0, '20.647')] [2024-11-11 14:27:49,540][03122] Saving new best policy, reward=20.647! [2024-11-11 14:27:51,277][03135] Updated weights for policy 0, policy_version 760 (0.0017) [2024-11-11 14:27:54,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3125248. Throughput: 0: 958.9. Samples: 779310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:27:54,543][00562] Avg episode reward: [(0, '20.763')] [2024-11-11 14:27:54,555][03122] Saving new best policy, reward=20.763! [2024-11-11 14:27:59,537][00562] Fps is (10 sec: 4095.6, 60 sec: 4027.7, 300 sec: 3901.6). 
Total num frames: 3145728. Throughput: 0: 1007.4. Samples: 786178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:27:59,539][00562] Avg episode reward: [(0, '20.604')] [2024-11-11 14:28:01,918][03135] Updated weights for policy 0, policy_version 770 (0.0014) [2024-11-11 14:28:04,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3891.7, 300 sec: 3887.7). Total num frames: 3162112. Throughput: 0: 961.4. Samples: 790426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:28:04,539][00562] Avg episode reward: [(0, '19.923')] [2024-11-11 14:28:09,536][00562] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3887.8). Total num frames: 3182592. Throughput: 0: 964.2. Samples: 793784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:28:09,543][00562] Avg episode reward: [(0, '18.911')] [2024-11-11 14:28:11,370][03135] Updated weights for policy 0, policy_version 780 (0.0016) [2024-11-11 14:28:14,536][00562] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 3207168. Throughput: 0: 1001.4. Samples: 800794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:28:14,538][00562] Avg episode reward: [(0, '19.310')] [2024-11-11 14:28:19,538][00562] Fps is (10 sec: 3685.7, 60 sec: 3891.1, 300 sec: 3887.7). Total num frames: 3219456. Throughput: 0: 982.7. Samples: 805648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:28:19,543][00562] Avg episode reward: [(0, '19.464')] [2024-11-11 14:28:23,046][03135] Updated weights for policy 0, policy_version 790 (0.0027) [2024-11-11 14:28:24,536][00562] Fps is (10 sec: 3276.7, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 3239936. Throughput: 0: 956.8. Samples: 808016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:28:24,539][00562] Avg episode reward: [(0, '20.023')] [2024-11-11 14:28:29,536][00562] Fps is (10 sec: 4506.4, 60 sec: 4027.8, 300 sec: 3929.4). Total num frames: 3264512. Throughput: 0: 978.3. Samples: 815008. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-11 14:28:29,538][00562] Avg episode reward: [(0, '20.631')] [2024-11-11 14:28:32,363][03135] Updated weights for policy 0, policy_version 800 (0.0015) [2024-11-11 14:28:34,536][00562] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3280896. Throughput: 0: 995.2. Samples: 820750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:28:34,542][00562] Avg episode reward: [(0, '21.092')] [2024-11-11 14:28:34,561][03122] Saving new best policy, reward=21.092! [2024-11-11 14:28:39,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3297280. Throughput: 0: 966.7. Samples: 822812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:28:39,538][00562] Avg episode reward: [(0, '20.886')] [2024-11-11 14:28:43,414][03135] Updated weights for policy 0, policy_version 810 (0.0022) [2024-11-11 14:28:44,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3321856. Throughput: 0: 957.8. Samples: 829276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:28:44,542][00562] Avg episode reward: [(0, '19.328')] [2024-11-11 14:28:49,536][00562] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3929.4). Total num frames: 3342336. Throughput: 0: 1011.5. Samples: 835946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-11 14:28:49,540][00562] Avg episode reward: [(0, '20.569')] [2024-11-11 14:28:54,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3354624. Throughput: 0: 983.5. Samples: 838042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-11 14:28:54,540][00562] Avg episode reward: [(0, '20.043')] [2024-11-11 14:28:54,723][03135] Updated weights for policy 0, policy_version 820 (0.0017) [2024-11-11 14:28:59,536][00562] Fps is (10 sec: 3686.5, 60 sec: 3891.3, 300 sec: 3915.5). Total num frames: 3379200. Throughput: 0: 953.2. Samples: 843686. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-11 14:28:59,542][00562] Avg episode reward: [(0, '18.447')] [2024-11-11 14:29:03,547][03135] Updated weights for policy 0, policy_version 830 (0.0029) [2024-11-11 14:29:04,536][00562] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3403776. Throughput: 0: 1002.9. Samples: 850778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-11 14:29:04,541][00562] Avg episode reward: [(0, '18.528')] [2024-11-11 14:29:09,536][00562] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3416064. Throughput: 0: 1007.3. Samples: 853346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:29:09,545][00562] Avg episode reward: [(0, '19.173')] [2024-11-11 14:29:14,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3436544. Throughput: 0: 962.7. Samples: 858330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:29:14,543][00562] Avg episode reward: [(0, '18.941')] [2024-11-11 14:29:14,552][03122] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000839_3436544.pth... [2024-11-11 14:29:14,688][03122] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth [2024-11-11 14:29:14,788][03135] Updated weights for policy 0, policy_version 840 (0.0025) [2024-11-11 14:29:19,536][00562] Fps is (10 sec: 4505.7, 60 sec: 4027.9, 300 sec: 3943.3). Total num frames: 3461120. Throughput: 0: 989.2. Samples: 865264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:29:19,542][00562] Avg episode reward: [(0, '20.902')] [2024-11-11 14:29:24,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3477504. Throughput: 0: 1016.5. Samples: 868554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:29:24,542][00562] Avg episode reward: [(0, '21.155')] [2024-11-11 14:29:24,553][03122] Saving new best policy, reward=21.155! 
[2024-11-11 14:29:25,327][03135] Updated weights for policy 0, policy_version 850 (0.0025) [2024-11-11 14:29:29,536][00562] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3493888. Throughput: 0: 965.4. Samples: 872718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-11 14:29:29,545][00562] Avg episode reward: [(0, '21.123')] [2024-11-11 14:29:34,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3518464. Throughput: 0: 969.6. Samples: 879576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:29:34,541][00562] Avg episode reward: [(0, '20.929')] [2024-11-11 14:29:35,193][03135] Updated weights for policy 0, policy_version 860 (0.0026) [2024-11-11 14:29:39,536][00562] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3538944. Throughput: 0: 1002.3. Samples: 883144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:29:39,538][00562] Avg episode reward: [(0, '20.835')] [2024-11-11 14:29:44,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3551232. Throughput: 0: 987.0. Samples: 888100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:29:44,542][00562] Avg episode reward: [(0, '20.317')] [2024-11-11 14:29:46,450][03135] Updated weights for policy 0, policy_version 870 (0.0030) [2024-11-11 14:29:49,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3575808. Throughput: 0: 966.3. Samples: 894262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:29:49,538][00562] Avg episode reward: [(0, '21.075')] [2024-11-11 14:29:54,536][00562] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 3600384. Throughput: 0: 987.7. Samples: 897792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:29:54,538][00562] Avg episode reward: [(0, '21.177')] [2024-11-11 14:29:54,546][03122] Saving new best policy, reward=21.177! 
[2024-11-11 14:29:55,461][03135] Updated weights for policy 0, policy_version 880 (0.0033) [2024-11-11 14:29:59,538][00562] Fps is (10 sec: 3685.6, 60 sec: 3891.1, 300 sec: 3929.4). Total num frames: 3612672. Throughput: 0: 998.1. Samples: 903246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-11 14:29:59,542][00562] Avg episode reward: [(0, '19.835')] [2024-11-11 14:30:04,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3633152. Throughput: 0: 956.1. Samples: 908288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:30:04,542][00562] Avg episode reward: [(0, '20.637')] [2024-11-11 14:30:06,895][03135] Updated weights for policy 0, policy_version 890 (0.0029) [2024-11-11 14:30:09,536][00562] Fps is (10 sec: 4506.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3657728. Throughput: 0: 960.6. Samples: 911780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:30:09,539][00562] Avg episode reward: [(0, '20.359')] [2024-11-11 14:30:14,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3674112. Throughput: 0: 1016.9. Samples: 918478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:30:14,542][00562] Avg episode reward: [(0, '19.161')] [2024-11-11 14:30:18,074][03135] Updated weights for policy 0, policy_version 900 (0.0013) [2024-11-11 14:30:19,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3690496. Throughput: 0: 960.4. Samples: 922794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:30:19,540][00562] Avg episode reward: [(0, '19.307')] [2024-11-11 14:30:24,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3715072. Throughput: 0: 959.4. Samples: 926316. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:30:24,539][00562] Avg episode reward: [(0, '20.115')] [2024-11-11 14:30:27,067][03135] Updated weights for policy 0, policy_version 910 (0.0042) [2024-11-11 14:30:29,536][00562] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3735552. Throughput: 0: 1002.1. Samples: 933196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:30:29,540][00562] Avg episode reward: [(0, '20.229')] [2024-11-11 14:30:34,536][00562] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3747840. Throughput: 0: 968.5. Samples: 937844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:30:34,546][00562] Avg episode reward: [(0, '20.060')] [2024-11-11 14:30:38,637][03135] Updated weights for policy 0, policy_version 920 (0.0019) [2024-11-11 14:30:39,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3772416. Throughput: 0: 953.4. Samples: 940696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:30:39,543][00562] Avg episode reward: [(0, '21.008')] [2024-11-11 14:30:44,538][00562] Fps is (10 sec: 4504.7, 60 sec: 4027.6, 300 sec: 3943.2). Total num frames: 3792896. Throughput: 0: 985.7. Samples: 947602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-11 14:30:44,544][00562] Avg episode reward: [(0, '22.849')] [2024-11-11 14:30:44,552][03122] Saving new best policy, reward=22.849! [2024-11-11 14:30:48,980][03135] Updated weights for policy 0, policy_version 930 (0.0017) [2024-11-11 14:30:49,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3809280. Throughput: 0: 987.6. Samples: 952728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:30:49,541][00562] Avg episode reward: [(0, '23.123')] [2024-11-11 14:30:49,547][03122] Saving new best policy, reward=23.123! [2024-11-11 14:30:54,536][00562] Fps is (10 sec: 3277.6, 60 sec: 3754.7, 300 sec: 3901.6). 
Total num frames: 3825664. Throughput: 0: 956.2. Samples: 954808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:30:54,539][00562] Avg episode reward: [(0, '24.388')] [2024-11-11 14:30:54,547][03122] Saving new best policy, reward=24.388! [2024-11-11 14:30:59,079][03135] Updated weights for policy 0, policy_version 940 (0.0023) [2024-11-11 14:30:59,536][00562] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3929.4). Total num frames: 3850240. Throughput: 0: 958.2. Samples: 961598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:30:59,544][00562] Avg episode reward: [(0, '23.414')] [2024-11-11 14:31:04,536][00562] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3866624. Throughput: 0: 996.1. Samples: 967618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:31:04,544][00562] Avg episode reward: [(0, '23.652')] [2024-11-11 14:31:09,536][00562] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 3883008. Throughput: 0: 966.4. Samples: 969804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:31:09,538][00562] Avg episode reward: [(0, '24.082')] [2024-11-11 14:31:10,466][03135] Updated weights for policy 0, policy_version 950 (0.0017) [2024-11-11 14:31:14,536][00562] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3907584. Throughput: 0: 950.8. Samples: 975984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:31:14,538][00562] Avg episode reward: [(0, '21.893')] [2024-11-11 14:31:14,547][03122] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000954_3907584.pth... [2024-11-11 14:31:14,665][03122] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000724_2965504.pth [2024-11-11 14:31:19,537][00562] Fps is (10 sec: 4505.2, 60 sec: 3959.4, 300 sec: 3929.4). Total num frames: 3928064. Throughput: 0: 999.0. Samples: 982798. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-11 14:31:19,539][00562] Avg episode reward: [(0, '20.911')] [2024-11-11 14:31:19,945][03135] Updated weights for policy 0, policy_version 960 (0.0022) [2024-11-11 14:31:24,536][00562] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3944448. Throughput: 0: 981.1. Samples: 984846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:31:24,543][00562] Avg episode reward: [(0, '22.390')] [2024-11-11 14:31:29,536][00562] Fps is (10 sec: 3686.7, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3964928. Throughput: 0: 944.8. Samples: 990116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-11 14:31:29,540][00562] Avg episode reward: [(0, '21.640')] [2024-11-11 14:31:31,036][03135] Updated weights for policy 0, policy_version 970 (0.0018) [2024-11-11 14:31:34,536][00562] Fps is (10 sec: 4505.7, 60 sec: 4027.8, 300 sec: 3943.3). Total num frames: 3989504. Throughput: 0: 985.9. Samples: 997094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-11 14:31:34,545][00562] Avg episode reward: [(0, '21.698')] [2024-11-11 14:31:39,536][00562] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 4001792. Throughput: 0: 1002.7. Samples: 999930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-11 14:31:39,546][00562] Avg episode reward: [(0, '21.392')] [2024-11-11 14:31:39,657][03122] Stopping Batcher_0... [2024-11-11 14:31:39,659][03122] Loop batcher_evt_loop terminating... [2024-11-11 14:31:39,664][03122] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-11 14:31:39,660][00562] Component Batcher_0 stopped! [2024-11-11 14:31:39,766][03135] Weights refcount: 2 0 [2024-11-11 14:31:39,772][00562] Component InferenceWorker_p0-w0 stopped! [2024-11-11 14:31:39,778][03135] Stopping InferenceWorker_p0-w0... [2024-11-11 14:31:39,778][03135] Loop inference_proc0-0_evt_loop terminating... 
[2024-11-11 14:31:39,854][03122] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000839_3436544.pth [2024-11-11 14:31:39,872][03122] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-11 14:31:40,081][00562] Component LearnerWorker_p0 stopped! [2024-11-11 14:31:40,088][03122] Stopping LearnerWorker_p0... [2024-11-11 14:31:40,088][03122] Loop learner_proc0_evt_loop terminating... [2024-11-11 14:31:40,394][00562] Component RolloutWorker_w1 stopped! [2024-11-11 14:31:40,405][03137] Stopping RolloutWorker_w1... [2024-11-11 14:31:40,405][03137] Loop rollout_proc1_evt_loop terminating... [2024-11-11 14:31:40,449][03143] Stopping RolloutWorker_w7... [2024-11-11 14:31:40,451][00562] Component RolloutWorker_w7 stopped! [2024-11-11 14:31:40,450][03143] Loop rollout_proc7_evt_loop terminating... [2024-11-11 14:31:40,471][03141] Stopping RolloutWorker_w5... [2024-11-11 14:31:40,472][03141] Loop rollout_proc5_evt_loop terminating... [2024-11-11 14:31:40,473][00562] Component RolloutWorker_w5 stopped! [2024-11-11 14:31:40,483][00562] Component RolloutWorker_w3 stopped! [2024-11-11 14:31:40,490][03139] Stopping RolloutWorker_w3... [2024-11-11 14:31:40,490][03139] Loop rollout_proc3_evt_loop terminating... [2024-11-11 14:31:40,541][03138] Stopping RolloutWorker_w2... [2024-11-11 14:31:40,540][00562] Component RolloutWorker_w2 stopped! [2024-11-11 14:31:40,541][03138] Loop rollout_proc2_evt_loop terminating... [2024-11-11 14:31:40,577][03140] Stopping RolloutWorker_w4... [2024-11-11 14:31:40,577][03140] Loop rollout_proc4_evt_loop terminating... [2024-11-11 14:31:40,577][00562] Component RolloutWorker_w4 stopped! [2024-11-11 14:31:40,591][00562] Component RolloutWorker_w0 stopped! [2024-11-11 14:31:40,590][03136] Stopping RolloutWorker_w0... [2024-11-11 14:31:40,593][03136] Loop rollout_proc0_evt_loop terminating... [2024-11-11 14:31:40,636][00562] Component RolloutWorker_w6 stopped! 
[2024-11-11 14:31:40,638][00562] Waiting for process learner_proc0 to stop... [2024-11-11 14:31:40,641][03142] Stopping RolloutWorker_w6... [2024-11-11 14:31:40,641][03142] Loop rollout_proc6_evt_loop terminating... [2024-11-11 14:31:41,983][00562] Waiting for process inference_proc0-0 to join... [2024-11-11 14:31:41,987][00562] Waiting for process rollout_proc0 to join... [2024-11-11 14:31:44,119][00562] Waiting for process rollout_proc1 to join... [2024-11-11 14:31:44,143][00562] Waiting for process rollout_proc2 to join... [2024-11-11 14:31:44,149][00562] Waiting for process rollout_proc3 to join... [2024-11-11 14:31:44,154][00562] Waiting for process rollout_proc4 to join... [2024-11-11 14:31:44,157][00562] Waiting for process rollout_proc5 to join... [2024-11-11 14:31:44,161][00562] Waiting for process rollout_proc6 to join... [2024-11-11 14:31:44,164][00562] Waiting for process rollout_proc7 to join... [2024-11-11 14:31:44,167][00562] Batcher 0 profile tree view: batching: 28.3437, releasing_batches: 0.0330 [2024-11-11 14:31:44,170][00562] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0025 wait_policy_total: 436.3030 update_model: 8.7112 weight_update: 0.0019 one_step: 0.0036 handle_policy_step: 583.4895 deserialize: 15.2080, stack: 3.1268, obs_to_device_normalize: 124.0045, forward: 294.0283, send_messages: 29.0137 prepare_outputs: 88.5107 to_cpu: 52.8766 [2024-11-11 14:31:44,172][00562] Learner 0 profile tree view: misc: 0.0051, prepare_batch: 13.6783 train: 75.4392 epoch_init: 0.0055, minibatch_init: 0.0159, losses_postprocess: 0.5740, kl_divergence: 0.5243, after_optimizer: 33.4716 calculate_losses: 27.7655 losses_init: 0.0037, forward_head: 1.4214, bptt_initial: 18.8187, tail: 1.2498, advantages_returns: 0.2608, losses: 3.7434 bptt: 1.9478 bptt_forward_core: 1.8665 update: 12.4283 clip: 1.0064 [2024-11-11 14:31:44,174][00562] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.4060, enqueue_policy_requests: 106.8350, env_step: 
839.8568, overhead: 14.1801, complete_rollouts: 7.3456 save_policy_outputs: 22.4906 split_output_tensors: 8.6830 [2024-11-11 14:31:44,176][00562] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3708, enqueue_policy_requests: 109.4797, env_step: 834.1560, overhead: 14.2467, complete_rollouts: 7.0637 save_policy_outputs: 21.3815 split_output_tensors: 8.4240 [2024-11-11 14:31:44,178][00562] Loop Runner_EvtLoop terminating... [2024-11-11 14:31:44,180][00562] Runner profile tree view: main_loop: 1103.4782 [2024-11-11 14:31:44,181][00562] Collected {0: 4005888}, FPS: 3630.2 [2024-11-11 14:31:56,379][00562] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-11 14:31:56,381][00562] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-11 14:31:56,383][00562] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-11 14:31:56,385][00562] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-11 14:31:56,387][00562] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-11 14:31:56,388][00562] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-11 14:31:56,389][00562] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-11-11 14:31:56,391][00562] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-11 14:31:56,392][00562] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-11-11 14:31:56,393][00562] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-11-11 14:31:56,394][00562] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-11 14:31:56,395][00562] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
[2024-11-11 14:31:56,396][00562] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-11 14:31:56,397][00562] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-11 14:31:56,398][00562] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-11 14:31:56,446][00562] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-11 14:31:56,450][00562] RunningMeanStd input shape: (3, 72, 128) [2024-11-11 14:31:56,453][00562] RunningMeanStd input shape: (1,) [2024-11-11 14:31:56,468][00562] ConvEncoder: input_channels=3 [2024-11-11 14:31:56,569][00562] Conv encoder output size: 512 [2024-11-11 14:31:56,571][00562] Policy head output size: 512 [2024-11-11 14:31:56,748][00562] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-11 14:31:57,562][00562] Num frames 100... [2024-11-11 14:31:57,684][00562] Num frames 200... [2024-11-11 14:31:57,807][00562] Num frames 300... [2024-11-11 14:31:57,936][00562] Num frames 400... [2024-11-11 14:31:58,058][00562] Num frames 500... [2024-11-11 14:31:58,185][00562] Num frames 600... [2024-11-11 14:31:58,309][00562] Num frames 700... [2024-11-11 14:31:58,449][00562] Avg episode rewards: #0: 18.690, true rewards: #0: 7.690 [2024-11-11 14:31:58,451][00562] Avg episode reward: 18.690, avg true_objective: 7.690 [2024-11-11 14:31:58,493][00562] Num frames 800... [2024-11-11 14:31:58,614][00562] Num frames 900... [2024-11-11 14:31:58,737][00562] Num frames 1000... [2024-11-11 14:31:58,857][00562] Num frames 1100... [2024-11-11 14:31:58,999][00562] Num frames 1200... [2024-11-11 14:31:59,134][00562] Num frames 1300... [2024-11-11 14:31:59,257][00562] Num frames 1400... 
[2024-11-11 14:31:59,325][00562] Avg episode rewards: #0: 14.545, true rewards: #0: 7.045 [2024-11-11 14:31:59,326][00562] Avg episode reward: 14.545, avg true_objective: 7.045 [2024-11-11 14:31:59,436][00562] Num frames 1500... [2024-11-11 14:31:59,558][00562] Num frames 1600... [2024-11-11 14:31:59,682][00562] Num frames 1700... [2024-11-11 14:31:59,804][00562] Num frames 1800... [2024-11-11 14:31:59,925][00562] Num frames 1900... [2024-11-11 14:32:00,054][00562] Num frames 2000... [2024-11-11 14:32:00,184][00562] Num frames 2100... [2024-11-11 14:32:00,310][00562] Num frames 2200... [2024-11-11 14:32:00,433][00562] Num frames 2300... [2024-11-11 14:32:00,554][00562] Num frames 2400... [2024-11-11 14:32:00,676][00562] Num frames 2500... [2024-11-11 14:32:00,797][00562] Num frames 2600... [2024-11-11 14:32:00,918][00562] Num frames 2700... [2024-11-11 14:32:01,046][00562] Num frames 2800... [2024-11-11 14:32:01,177][00562] Num frames 2900... [2024-11-11 14:32:01,303][00562] Num frames 3000... [2024-11-11 14:32:01,425][00562] Num frames 3100... [2024-11-11 14:32:01,546][00562] Num frames 3200... [2024-11-11 14:32:01,658][00562] Avg episode rewards: #0: 23.487, true rewards: #0: 10.820 [2024-11-11 14:32:01,659][00562] Avg episode reward: 23.487, avg true_objective: 10.820 [2024-11-11 14:32:01,727][00562] Num frames 3300... [2024-11-11 14:32:01,850][00562] Num frames 3400... [2024-11-11 14:32:01,969][00562] Num frames 3500... [2024-11-11 14:32:02,102][00562] Num frames 3600... [2024-11-11 14:32:02,271][00562] Avg episode rewards: #0: 20.220, true rewards: #0: 9.220 [2024-11-11 14:32:02,273][00562] Avg episode reward: 20.220, avg true_objective: 9.220 [2024-11-11 14:32:02,299][00562] Num frames 3700... [2024-11-11 14:32:02,467][00562] Num frames 3800... [2024-11-11 14:32:02,632][00562] Num frames 3900... [2024-11-11 14:32:02,795][00562] Num frames 4000... [2024-11-11 14:32:02,960][00562] Num frames 4100... [2024-11-11 14:32:03,131][00562] Num frames 4200... 
[2024-11-11 14:32:03,295][00562] Num frames 4300... [2024-11-11 14:32:03,462][00562] Num frames 4400... [2024-11-11 14:32:03,635][00562] Num frames 4500... [2024-11-11 14:32:03,813][00562] Num frames 4600... [2024-11-11 14:32:03,987][00562] Num frames 4700... [2024-11-11 14:32:04,175][00562] Num frames 4800... [2024-11-11 14:32:04,247][00562] Avg episode rewards: #0: 20.816, true rewards: #0: 9.616 [2024-11-11 14:32:04,249][00562] Avg episode reward: 20.816, avg true_objective: 9.616 [2024-11-11 14:32:04,413][00562] Num frames 4900... [2024-11-11 14:32:04,546][00562] Num frames 5000... [2024-11-11 14:32:04,668][00562] Num frames 5100... [2024-11-11 14:32:04,787][00562] Num frames 5200... [2024-11-11 14:32:04,904][00562] Num frames 5300... [2024-11-11 14:32:05,020][00562] Avg episode rewards: #0: 18.587, true rewards: #0: 8.920 [2024-11-11 14:32:05,022][00562] Avg episode reward: 18.587, avg true_objective: 8.920 [2024-11-11 14:32:05,084][00562] Num frames 5400... [2024-11-11 14:32:05,221][00562] Num frames 5500... [2024-11-11 14:32:05,342][00562] Num frames 5600... [2024-11-11 14:32:05,465][00562] Num frames 5700... [2024-11-11 14:32:05,583][00562] Num frames 5800... [2024-11-11 14:32:05,703][00562] Num frames 5900... [2024-11-11 14:32:05,823][00562] Num frames 6000... [2024-11-11 14:32:05,954][00562] Num frames 6100... [2024-11-11 14:32:06,132][00562] Avg episode rewards: #0: 18.704, true rewards: #0: 8.847 [2024-11-11 14:32:06,133][00562] Avg episode reward: 18.704, avg true_objective: 8.847 [2024-11-11 14:32:06,145][00562] Num frames 6200... [2024-11-11 14:32:06,276][00562] Num frames 6300... [2024-11-11 14:32:06,398][00562] Num frames 6400... [2024-11-11 14:32:06,517][00562] Num frames 6500... [2024-11-11 14:32:06,636][00562] Num frames 6600... [2024-11-11 14:32:06,754][00562] Num frames 6700... [2024-11-11 14:32:06,875][00562] Num frames 6800... [2024-11-11 14:32:06,997][00562] Num frames 6900... [2024-11-11 14:32:07,135][00562] Num frames 7000... 
[2024-11-11 14:32:07,243][00562] Avg episode rewards: #0: 18.555, true rewards: #0: 8.805 [2024-11-11 14:32:07,245][00562] Avg episode reward: 18.555, avg true_objective: 8.805 [2024-11-11 14:32:07,315][00562] Num frames 7100... [2024-11-11 14:32:07,436][00562] Num frames 7200... [2024-11-11 14:32:07,556][00562] Num frames 7300... [2024-11-11 14:32:07,674][00562] Num frames 7400... [2024-11-11 14:32:07,796][00562] Num frames 7500... [2024-11-11 14:32:07,915][00562] Num frames 7600... [2024-11-11 14:32:08,037][00562] Num frames 7700... [2024-11-11 14:32:08,167][00562] Num frames 7800... [2024-11-11 14:32:08,295][00562] Num frames 7900... [2024-11-11 14:32:08,416][00562] Num frames 8000... [2024-11-11 14:32:08,535][00562] Num frames 8100... [2024-11-11 14:32:08,656][00562] Num frames 8200... [2024-11-11 14:32:08,777][00562] Num frames 8300... [2024-11-11 14:32:08,900][00562] Num frames 8400... [2024-11-11 14:32:09,020][00562] Num frames 8500... [2024-11-11 14:32:09,147][00562] Num frames 8600... [2024-11-11 14:32:09,275][00562] Num frames 8700... [2024-11-11 14:32:09,398][00562] Num frames 8800... [2024-11-11 14:32:09,519][00562] Num frames 8900... [2024-11-11 14:32:09,641][00562] Num frames 9000... [2024-11-11 14:32:09,763][00562] Num frames 9100... [2024-11-11 14:32:09,871][00562] Avg episode rewards: #0: 22.715, true rewards: #0: 10.160 [2024-11-11 14:32:09,872][00562] Avg episode reward: 22.715, avg true_objective: 10.160 [2024-11-11 14:32:09,944][00562] Num frames 9200... [2024-11-11 14:32:10,061][00562] Num frames 9300... [2024-11-11 14:32:10,193][00562] Num frames 9400... [2024-11-11 14:32:10,320][00562] Num frames 9500... [2024-11-11 14:32:10,442][00562] Num frames 9600... [2024-11-11 14:32:10,564][00562] Avg episode rewards: #0: 21.256, true rewards: #0: 9.656 [2024-11-11 14:32:10,565][00562] Avg episode reward: 21.256, avg true_objective: 9.656 [2024-11-11 14:33:08,223][00562] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
[2024-11-11 14:41:13,430][00562] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-11 14:41:13,432][00562] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-11 14:41:13,434][00562] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-11 14:41:13,435][00562] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-11 14:41:13,437][00562] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-11 14:41:13,439][00562] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-11 14:41:13,441][00562] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-11-11 14:41:13,442][00562] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-11 14:41:13,443][00562] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-11-11 14:41:13,444][00562] Adding new argument 'hf_repository'='usamabuttar/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-11-11 14:41:13,445][00562] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-11 14:41:13,446][00562] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-11 14:41:13,447][00562] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-11 14:41:13,448][00562] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-11-11 14:41:13,449][00562] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-11 14:41:13,478][00562] RunningMeanStd input shape: (3, 72, 128) [2024-11-11 14:41:13,480][00562] RunningMeanStd input shape: (1,) [2024-11-11 14:41:13,492][00562] ConvEncoder: input_channels=3 [2024-11-11 14:41:13,529][00562] Conv encoder output size: 512 [2024-11-11 14:41:13,530][00562] Policy head output size: 512 [2024-11-11 14:41:13,551][00562] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-11 14:41:13,972][00562] Num frames 100... [2024-11-11 14:41:14,107][00562] Num frames 200... [2024-11-11 14:41:14,247][00562] Num frames 300... [2024-11-11 14:41:14,369][00562] Num frames 400... [2024-11-11 14:41:14,489][00562] Num frames 500... [2024-11-11 14:41:14,614][00562] Num frames 600... [2024-11-11 14:41:14,732][00562] Num frames 700... [2024-11-11 14:41:14,866][00562] Num frames 800... [2024-11-11 14:41:14,988][00562] Num frames 900... [2024-11-11 14:41:15,110][00562] Num frames 1000... [2024-11-11 14:41:15,240][00562] Num frames 1100... [2024-11-11 14:41:15,363][00562] Num frames 1200... [2024-11-11 14:41:15,482][00562] Num frames 1300... [2024-11-11 14:41:15,603][00562] Num frames 1400... [2024-11-11 14:41:15,727][00562] Num frames 1500... [2024-11-11 14:41:15,847][00562] Num frames 1600... [2024-11-11 14:41:15,969][00562] Num frames 1700... [2024-11-11 14:41:16,089][00562] Num frames 1800... [2024-11-11 14:41:16,214][00562] Num frames 1900... [2024-11-11 14:41:16,343][00562] Num frames 2000... [2024-11-11 14:41:16,469][00562] Num frames 2100... [2024-11-11 14:41:16,521][00562] Avg episode rewards: #0: 59.999, true rewards: #0: 21.000 [2024-11-11 14:41:16,524][00562] Avg episode reward: 59.999, avg true_objective: 21.000 [2024-11-11 14:41:16,643][00562] Num frames 2200... [2024-11-11 14:41:16,761][00562] Num frames 2300... [2024-11-11 14:41:16,875][00562] Num frames 2400... 
[2024-11-11 14:41:16,993][00562] Num frames 2500... [2024-11-11 14:41:17,115][00562] Num frames 2600... [2024-11-11 14:41:17,236][00562] Num frames 2700... [2024-11-11 14:41:17,363][00562] Num frames 2800... [2024-11-11 14:41:17,485][00562] Num frames 2900... [2024-11-11 14:41:17,603][00562] Num frames 3000... [2024-11-11 14:41:17,724][00562] Num frames 3100... [2024-11-11 14:41:17,848][00562] Num frames 3200... [2024-11-11 14:41:17,937][00562] Avg episode rewards: #0: 41.600, true rewards: #0: 16.100 [2024-11-11 14:41:17,940][00562] Avg episode reward: 41.600, avg true_objective: 16.100 [2024-11-11 14:41:18,074][00562] Num frames 3300... [2024-11-11 14:41:18,236][00562] Num frames 3400... [2024-11-11 14:41:18,407][00562] Num frames 3500... [2024-11-11 14:41:18,570][00562] Num frames 3600... [2024-11-11 14:41:18,738][00562] Num frames 3700... [2024-11-11 14:41:18,899][00562] Num frames 3800... [2024-11-11 14:41:19,060][00562] Num frames 3900... [2024-11-11 14:41:19,233][00562] Num frames 4000... [2024-11-11 14:41:19,416][00562] Num frames 4100... [2024-11-11 14:41:19,591][00562] Num frames 4200... [2024-11-11 14:41:19,766][00562] Num frames 4300... [2024-11-11 14:41:19,941][00562] Num frames 4400... [2024-11-11 14:41:20,124][00562] Num frames 4500... [2024-11-11 14:41:20,297][00562] Num frames 4600... [2024-11-11 14:41:20,452][00562] Num frames 4700... [2024-11-11 14:41:20,577][00562] Num frames 4800... [2024-11-11 14:41:20,701][00562] Num frames 4900... [2024-11-11 14:41:20,822][00562] Num frames 5000... [2024-11-11 14:41:20,946][00562] Num frames 5100... [2024-11-11 14:41:21,066][00562] Num frames 5200... [2024-11-11 14:41:21,195][00562] Num frames 5300... [2024-11-11 14:41:21,276][00562] Avg episode rewards: #0: 45.733, true rewards: #0: 17.733 [2024-11-11 14:41:21,278][00562] Avg episode reward: 45.733, avg true_objective: 17.733 [2024-11-11 14:41:21,380][00562] Num frames 5400... [2024-11-11 14:41:21,505][00562] Num frames 5500... 
[2024-11-11 14:41:21,624][00562] Num frames 5600... [2024-11-11 14:41:21,747][00562] Num frames 5700... [2024-11-11 14:41:21,899][00562] Num frames 5800... [2024-11-11 14:41:22,018][00562] Num frames 5900... [2024-11-11 14:41:22,170][00562] Avg episode rewards: #0: 37.435, true rewards: #0: 14.935 [2024-11-11 14:41:22,172][00562] Avg episode reward: 37.435, avg true_objective: 14.935 [2024-11-11 14:41:22,206][00562] Num frames 6000... [2024-11-11 14:41:22,325][00562] Num frames 6100... [2024-11-11 14:41:22,454][00562] Num frames 6200... [2024-11-11 14:41:22,607][00562] Num frames 6300... [2024-11-11 14:41:22,728][00562] Num frames 6400... [2024-11-11 14:41:22,845][00562] Num frames 6500... [2024-11-11 14:41:22,965][00562] Num frames 6600... [2024-11-11 14:41:23,085][00562] Num frames 6700... [2024-11-11 14:41:23,213][00562] Num frames 6800... [2024-11-11 14:41:23,334][00562] Num frames 6900... [2024-11-11 14:41:23,466][00562] Num frames 7000... [2024-11-11 14:41:23,598][00562] Num frames 7100... [2024-11-11 14:41:23,720][00562] Num frames 7200... [2024-11-11 14:41:23,843][00562] Num frames 7300... [2024-11-11 14:41:23,963][00562] Num frames 7400... [2024-11-11 14:41:24,073][00562] Avg episode rewards: #0: 36.892, true rewards: #0: 14.892 [2024-11-11 14:41:24,075][00562] Avg episode reward: 36.892, avg true_objective: 14.892 [2024-11-11 14:41:24,148][00562] Num frames 7500... [2024-11-11 14:41:24,267][00562] Num frames 7600... [2024-11-11 14:41:24,388][00562] Num frames 7700... [2024-11-11 14:41:24,517][00562] Num frames 7800... [2024-11-11 14:41:24,610][00562] Avg episode rewards: #0: 31.383, true rewards: #0: 13.050 [2024-11-11 14:41:24,612][00562] Avg episode reward: 31.383, avg true_objective: 13.050 [2024-11-11 14:41:24,699][00562] Num frames 7900... [2024-11-11 14:41:24,817][00562] Num frames 8000... [2024-11-11 14:41:24,937][00562] Num frames 8100... [2024-11-11 14:41:25,058][00562] Num frames 8200... [2024-11-11 14:41:25,185][00562] Num frames 8300... 
[2024-11-11 14:41:25,311][00562] Num frames 8400... [2024-11-11 14:41:25,431][00562] Num frames 8500... [2024-11-11 14:41:25,559][00562] Num frames 8600... [2024-11-11 14:41:25,690][00562] Num frames 8700... [2024-11-11 14:41:25,809][00562] Num frames 8800... [2024-11-11 14:41:25,930][00562] Num frames 8900... [2024-11-11 14:41:26,084][00562] Avg episode rewards: #0: 30.403, true rewards: #0: 12.831 [2024-11-11 14:41:26,086][00562] Avg episode reward: 30.403, avg true_objective: 12.831 [2024-11-11 14:41:26,118][00562] Num frames 9000... [2024-11-11 14:41:26,241][00562] Num frames 9100... [2024-11-11 14:41:26,362][00562] Num frames 9200... [2024-11-11 14:41:26,485][00562] Num frames 9300... [2024-11-11 14:41:26,645][00562] Num frames 9400... [2024-11-11 14:41:26,768][00562] Num frames 9500... [2024-11-11 14:41:26,856][00562] Avg episode rewards: #0: 27.657, true rewards: #0: 11.908 [2024-11-11 14:41:26,858][00562] Avg episode reward: 27.657, avg true_objective: 11.908 [2024-11-11 14:41:26,948][00562] Num frames 9600... [2024-11-11 14:41:27,067][00562] Num frames 9700... [2024-11-11 14:41:27,195][00562] Num frames 9800... [2024-11-11 14:41:27,316][00562] Num frames 9900... [2024-11-11 14:41:27,440][00562] Num frames 10000... [2024-11-11 14:41:27,562][00562] Num frames 10100... [2024-11-11 14:41:27,693][00562] Num frames 10200... [2024-11-11 14:41:27,861][00562] Avg episode rewards: #0: 25.993, true rewards: #0: 11.438 [2024-11-11 14:41:27,863][00562] Avg episode reward: 25.993, avg true_objective: 11.438 [2024-11-11 14:41:27,874][00562] Num frames 10300... [2024-11-11 14:41:27,998][00562] Num frames 10400... [2024-11-11 14:41:28,081][00562] Avg episode rewards: #0: 23.522, true rewards: #0: 10.422 [2024-11-11 14:41:28,082][00562] Avg episode reward: 23.522, avg true_objective: 10.422 [2024-11-11 14:42:30,928][00562] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
[2024-11-11 14:42:38,969][00562] The model has been pushed to https://huggingface.co/usamabuttar/rl_course_vizdoom_health_gathering_supreme [2024-11-11 14:45:46,383][00562] Loading legacy config file train_dir/doom_health_gathering_supreme_2222/cfg.json instead of train_dir/doom_health_gathering_supreme_2222/config.json [2024-11-11 14:45:46,385][00562] Loading existing experiment configuration from train_dir/doom_health_gathering_supreme_2222/config.json [2024-11-11 14:45:46,387][00562] Overriding arg 'experiment' with value 'doom_health_gathering_supreme_2222' passed from command line [2024-11-11 14:45:46,389][00562] Overriding arg 'train_dir' with value 'train_dir' passed from command line [2024-11-11 14:45:46,390][00562] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-11 14:45:46,396][00562] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! [2024-11-11 14:45:46,397][00562] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! [2024-11-11 14:45:46,398][00562] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! [2024-11-11 14:45:46,401][00562] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-11 14:45:46,402][00562] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-11 14:45:46,403][00562] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-11 14:45:46,404][00562] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-11 14:45:46,405][00562] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-11-11 14:45:46,407][00562] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-11 14:45:46,408][00562] Adding new argument 'push_to_hub'=False that is not in the saved config file! 
[2024-11-11 14:45:46,409][00562] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-11-11 14:45:46,410][00562] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-11 14:45:46,410][00562] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-11 14:45:46,411][00562] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-11 14:45:46,412][00562] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-11 14:45:46,414][00562] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-11 14:45:46,458][00562] RunningMeanStd input shape: (3, 72, 128) [2024-11-11 14:45:46,460][00562] RunningMeanStd input shape: (1,) [2024-11-11 14:45:46,472][00562] ConvEncoder: input_channels=3 [2024-11-11 14:45:46,522][00562] Conv encoder output size: 512 [2024-11-11 14:45:46,524][00562] Policy head output size: 512 [2024-11-11 14:45:46,546][00562] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth... [2024-11-11 14:45:46,964][00562] Num frames 100... [2024-11-11 14:45:47,093][00562] Num frames 200... [2024-11-11 14:45:47,243][00562] Num frames 300... [2024-11-11 14:45:47,373][00562] Num frames 400... [2024-11-11 14:45:47,505][00562] Num frames 500... [2024-11-11 14:45:47,627][00562] Num frames 600... [2024-11-11 14:45:47,752][00562] Num frames 700... [2024-11-11 14:45:47,877][00562] Num frames 800... [2024-11-11 14:45:48,011][00562] Num frames 900... [2024-11-11 14:45:48,141][00562] Num frames 1000... [2024-11-11 14:45:48,263][00562] Num frames 1100... [2024-11-11 14:45:48,387][00562] Num frames 1200... [2024-11-11 14:45:48,521][00562] Num frames 1300... [2024-11-11 14:45:48,640][00562] Num frames 1400... [2024-11-11 14:45:48,762][00562] Num frames 1500... [2024-11-11 14:45:48,886][00562] Num frames 1600... 
[2024-11-11 14:45:49,006][00562] Num frames 1700... [2024-11-11 14:45:49,132][00562] Num frames 1800... [2024-11-11 14:45:49,251][00562] Num frames 1900... [2024-11-11 14:45:49,381][00562] Num frames 2000... [2024-11-11 14:45:49,511][00562] Num frames 2100... [2024-11-11 14:45:49,564][00562] Avg episode rewards: #0: 65.998, true rewards: #0: 21.000 [2024-11-11 14:45:49,567][00562] Avg episode reward: 65.998, avg true_objective: 21.000 [2024-11-11 14:45:49,685][00562] Num frames 2200... [2024-11-11 14:45:49,804][00562] Num frames 2300... [2024-11-11 14:45:49,925][00562] Num frames 2400... [2024-11-11 14:45:50,044][00562] Num frames 2500... [2024-11-11 14:45:50,174][00562] Num frames 2600... [2024-11-11 14:45:50,292][00562] Num frames 2700... [2024-11-11 14:45:50,418][00562] Num frames 2800... [2024-11-11 14:45:50,548][00562] Num frames 2900... [2024-11-11 14:45:50,672][00562] Num frames 3000... [2024-11-11 14:45:50,795][00562] Num frames 3100... [2024-11-11 14:45:50,918][00562] Num frames 3200... [2024-11-11 14:45:51,039][00562] Num frames 3300... [2024-11-11 14:45:51,166][00562] Num frames 3400... [2024-11-11 14:45:51,294][00562] Num frames 3500... [2024-11-11 14:45:51,420][00562] Num frames 3600... [2024-11-11 14:45:51,542][00562] Num frames 3700... [2024-11-11 14:45:51,670][00562] Num frames 3800... [2024-11-11 14:45:51,796][00562] Num frames 3900... [2024-11-11 14:45:51,916][00562] Num frames 4000... [2024-11-11 14:45:52,040][00562] Num frames 4100... [2024-11-11 14:45:52,176][00562] Num frames 4200... [2024-11-11 14:45:52,228][00562] Avg episode rewards: #0: 63.999, true rewards: #0: 21.000 [2024-11-11 14:45:52,230][00562] Avg episode reward: 63.999, avg true_objective: 21.000 [2024-11-11 14:45:52,363][00562] Num frames 4300... [2024-11-11 14:45:52,491][00562] Num frames 4400... [2024-11-11 14:45:52,625][00562] Num frames 4500... [2024-11-11 14:45:52,786][00562] Num frames 4600... [2024-11-11 14:45:52,956][00562] Num frames 4700... 
[2024-11-11 14:45:53,124][00562] Num frames 4800... [2024-11-11 14:45:53,284][00562] Num frames 4900... [2024-11-11 14:45:53,450][00562] Num frames 5000... [2024-11-11 14:45:53,623][00562] Num frames 5100... [2024-11-11 14:45:53,794][00562] Num frames 5200... [2024-11-11 14:45:53,959][00562] Num frames 5300... [2024-11-11 14:45:54,134][00562] Num frames 5400... [2024-11-11 14:45:54,309][00562] Num frames 5500... [2024-11-11 14:45:54,499][00562] Num frames 5600... [2024-11-11 14:45:54,676][00562] Num frames 5700... [2024-11-11 14:45:54,861][00562] Num frames 5800... [2024-11-11 14:45:55,035][00562] Num frames 5900... [2024-11-11 14:45:55,207][00562] Num frames 6000... [2024-11-11 14:45:55,329][00562] Num frames 6100... [2024-11-11 14:45:55,453][00562] Num frames 6200... [2024-11-11 14:45:55,580][00562] Num frames 6300... [2024-11-11 14:45:55,633][00562] Avg episode rewards: #0: 65.332, true rewards: #0: 21.000 [2024-11-11 14:45:55,634][00562] Avg episode reward: 65.332, avg true_objective: 21.000 [2024-11-11 14:45:55,771][00562] Num frames 6400... [2024-11-11 14:45:55,897][00562] Num frames 6500... [2024-11-11 14:45:56,019][00562] Num frames 6600... [2024-11-11 14:45:56,148][00562] Num frames 6700... [2024-11-11 14:45:56,271][00562] Num frames 6800... [2024-11-11 14:45:56,393][00562] Num frames 6900... [2024-11-11 14:45:56,515][00562] Num frames 7000... [2024-11-11 14:45:56,636][00562] Num frames 7100... [2024-11-11 14:45:56,756][00562] Num frames 7200... [2024-11-11 14:45:56,884][00562] Num frames 7300... [2024-11-11 14:45:57,003][00562] Num frames 7400... [2024-11-11 14:45:57,131][00562] Num frames 7500... [2024-11-11 14:45:57,251][00562] Num frames 7600... [2024-11-11 14:45:57,375][00562] Num frames 7700... [2024-11-11 14:45:57,497][00562] Num frames 7800... [2024-11-11 14:45:57,621][00562] Num frames 7900... [2024-11-11 14:45:57,772][00562] Num frames 8000... [2024-11-11 14:45:57,908][00562] Num frames 8100... [2024-11-11 14:45:58,033][00562] Num frames 8200... 
[2024-11-11 14:45:58,171][00562] Num frames 8300... [2024-11-11 14:45:58,304][00562] Num frames 8400... [2024-11-11 14:45:58,356][00562] Avg episode rewards: #0: 65.499, true rewards: #0: 21.000 [2024-11-11 14:45:58,358][00562] Avg episode reward: 65.499, avg true_objective: 21.000 [2024-11-11 14:45:58,482][00562] Num frames 8500... [2024-11-11 14:45:58,607][00562] Num frames 8600... [2024-11-11 14:45:58,727][00562] Num frames 8700... [2024-11-11 14:45:58,857][00562] Num frames 8800... [2024-11-11 14:45:58,986][00562] Num frames 8900... [2024-11-11 14:45:59,121][00562] Num frames 9000... [2024-11-11 14:45:59,242][00562] Num frames 9100... [2024-11-11 14:45:59,368][00562] Num frames 9200... [2024-11-11 14:45:59,499][00562] Num frames 9300... [2024-11-11 14:45:59,621][00562] Num frames 9400... [2024-11-11 14:45:59,745][00562] Num frames 9500... [2024-11-11 14:45:59,873][00562] Num frames 9600... [2024-11-11 14:45:59,998][00562] Num frames 9700... [2024-11-11 14:46:00,127][00562] Num frames 9800... [2024-11-11 14:46:00,247][00562] Num frames 9900... [2024-11-11 14:46:00,371][00562] Num frames 10000... [2024-11-11 14:46:00,494][00562] Num frames 10100... [2024-11-11 14:46:00,616][00562] Num frames 10200... [2024-11-11 14:46:00,738][00562] Num frames 10300... [2024-11-11 14:46:00,860][00562] Num frames 10400... [2024-11-11 14:46:00,995][00562] Num frames 10500... [2024-11-11 14:46:01,047][00562] Avg episode rewards: #0: 65.399, true rewards: #0: 21.000 [2024-11-11 14:46:01,049][00562] Avg episode reward: 65.399, avg true_objective: 21.000 [2024-11-11 14:46:01,179][00562] Num frames 10600... [2024-11-11 14:46:01,304][00562] Num frames 10700... [2024-11-11 14:46:01,428][00562] Num frames 10800... [2024-11-11 14:46:01,550][00562] Num frames 10900... [2024-11-11 14:46:01,672][00562] Num frames 11000... [2024-11-11 14:46:01,797][00562] Num frames 11100... [2024-11-11 14:46:01,927][00562] Num frames 11200... [2024-11-11 14:46:02,050][00562] Num frames 11300... 
[2024-11-11 14:46:02,182][00562] Num frames 11400... [2024-11-11 14:46:02,308][00562] Num frames 11500... [2024-11-11 14:46:02,435][00562] Num frames 11600... [2024-11-11 14:46:02,559][00562] Num frames 11700... [2024-11-11 14:46:02,685][00562] Num frames 11800... [2024-11-11 14:46:02,810][00562] Num frames 11900... [2024-11-11 14:46:02,935][00562] Num frames 12000... [2024-11-11 14:46:03,064][00562] Num frames 12100... [2024-11-11 14:46:03,194][00562] Num frames 12200... [2024-11-11 14:46:03,319][00562] Num frames 12300... [2024-11-11 14:46:03,458][00562] Num frames 12400... [2024-11-11 14:46:03,581][00562] Num frames 12500... [2024-11-11 14:46:03,712][00562] Num frames 12600... [2024-11-11 14:46:03,765][00562] Avg episode rewards: #0: 65.499, true rewards: #0: 21.000 [2024-11-11 14:46:03,767][00562] Avg episode reward: 65.499, avg true_objective: 21.000 [2024-11-11 14:46:03,893][00562] Num frames 12700... [2024-11-11 14:46:04,027][00562] Num frames 12800... [2024-11-11 14:46:04,159][00562] Num frames 12900... [2024-11-11 14:46:04,286][00562] Num frames 13000... [2024-11-11 14:46:04,409][00562] Num frames 13100... [2024-11-11 14:46:04,532][00562] Num frames 13200... [2024-11-11 14:46:04,656][00562] Num frames 13300... [2024-11-11 14:46:04,781][00562] Num frames 13400... [2024-11-11 14:46:04,904][00562] Num frames 13500... [2024-11-11 14:46:05,035][00562] Num frames 13600... [2024-11-11 14:46:05,176][00562] Num frames 13700... [2024-11-11 14:46:05,376][00562] Num frames 13800... [2024-11-11 14:46:05,564][00562] Num frames 13900... [2024-11-11 14:46:05,734][00562] Num frames 14000... [2024-11-11 14:46:05,905][00562] Num frames 14100... [2024-11-11 14:46:06,082][00562] Num frames 14200... [2024-11-11 14:46:06,249][00562] Num frames 14300... [2024-11-11 14:46:06,422][00562] Num frames 14400... [2024-11-11 14:46:06,597][00562] Num frames 14500... [2024-11-11 14:46:06,775][00562] Num frames 14600... [2024-11-11 14:46:06,950][00562] Num frames 14700... 
[2024-11-11 14:46:07,005][00562] Avg episode rewards: #0: 65.427, true rewards: #0: 21.000 [2024-11-11 14:46:07,007][00562] Avg episode reward: 65.427, avg true_objective: 21.000 [2024-11-11 14:46:07,194][00562] Num frames 14800... [2024-11-11 14:46:07,367][00562] Num frames 14900... [2024-11-11 14:46:07,538][00562] Num frames 15000... [2024-11-11 14:46:07,719][00562] Num frames 15100... [2024-11-11 14:46:07,869][00562] Num frames 15200... [2024-11-11 14:46:07,991][00562] Num frames 15300... [2024-11-11 14:46:08,126][00562] Num frames 15400... [2024-11-11 14:46:08,250][00562] Num frames 15500... [2024-11-11 14:46:08,373][00562] Num frames 15600... [2024-11-11 14:46:08,503][00562] Num frames 15700... [2024-11-11 14:46:08,628][00562] Num frames 15800... [2024-11-11 14:46:08,753][00562] Num frames 15900... [2024-11-11 14:46:08,878][00562] Num frames 16000... [2024-11-11 14:46:09,002][00562] Num frames 16100... [2024-11-11 14:46:09,140][00562] Num frames 16200... [2024-11-11 14:46:09,264][00562] Num frames 16300... [2024-11-11 14:46:09,397][00562] Num frames 16400... [2024-11-11 14:46:09,524][00562] Num frames 16500... [2024-11-11 14:46:09,653][00562] Num frames 16600... [2024-11-11 14:46:09,778][00562] Num frames 16700... [2024-11-11 14:46:09,906][00562] Num frames 16800... [2024-11-11 14:46:09,958][00562] Avg episode rewards: #0: 64.874, true rewards: #0: 21.000 [2024-11-11 14:46:09,960][00562] Avg episode reward: 64.874, avg true_objective: 21.000 [2024-11-11 14:46:10,082][00562] Num frames 16900... [2024-11-11 14:46:10,220][00562] Num frames 17000... [2024-11-11 14:46:10,344][00562] Num frames 17100... [2024-11-11 14:46:10,468][00562] Num frames 17200... [2024-11-11 14:46:10,589][00562] Num frames 17300... [2024-11-11 14:46:10,713][00562] Num frames 17400... [2024-11-11 14:46:10,835][00562] Num frames 17500... [2024-11-11 14:46:10,962][00562] Num frames 17600... [2024-11-11 14:46:11,086][00562] Num frames 17700... 
[2024-11-11 14:46:11,224][00562] Num frames 17800... [2024-11-11 14:46:11,348][00562] Num frames 17900... [2024-11-11 14:46:11,471][00562] Num frames 18000... [2024-11-11 14:46:11,596][00562] Num frames 18100... [2024-11-11 14:46:11,719][00562] Num frames 18200... [2024-11-11 14:46:11,843][00562] Num frames 18300... [2024-11-11 14:46:11,906][00562] Avg episode rewards: #0: 62.670, true rewards: #0: 20.338 [2024-11-11 14:46:11,908][00562] Avg episode reward: 62.670, avg true_objective: 20.338 [2024-11-11 14:46:12,031][00562] Num frames 18400... [2024-11-11 14:46:12,163][00562] Num frames 18500... [2024-11-11 14:46:12,293][00562] Num frames 18600... [2024-11-11 14:46:12,421][00562] Num frames 18700... [2024-11-11 14:46:12,544][00562] Num frames 18800... [2024-11-11 14:46:12,665][00562] Num frames 18900... [2024-11-11 14:46:12,792][00562] Num frames 19000... [2024-11-11 14:46:12,897][00562] Avg episode rewards: #0: 57.739, true rewards: #0: 19.040 [2024-11-11 14:46:12,899][00562] Avg episode reward: 57.739, avg true_objective: 19.040 [2024-11-11 14:48:09,151][00562] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4! [2024-11-11 14:50:05,788][00562] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-11 14:50:05,790][00562] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-11 14:50:05,791][00562] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-11 14:50:05,793][00562] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-11 14:50:05,795][00562] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-11 14:50:05,796][00562] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-11 14:50:05,797][00562] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
[2024-11-11 14:50:05,799][00562] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-11 14:50:05,800][00562] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-11-11 14:50:05,802][00562] Adding new argument 'hf_repository'='usamabuttar/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-11-11 14:50:05,803][00562] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-11 14:50:05,808][00562] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-11 14:50:05,809][00562] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-11 14:50:05,810][00562] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-11 14:50:05,811][00562] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-11 14:50:05,840][00562] RunningMeanStd input shape: (3, 72, 128) [2024-11-11 14:50:05,841][00562] RunningMeanStd input shape: (1,) [2024-11-11 14:50:05,855][00562] ConvEncoder: input_channels=3 [2024-11-11 14:50:05,894][00562] Conv encoder output size: 512 [2024-11-11 14:50:05,896][00562] Policy head output size: 512 [2024-11-11 14:50:05,914][00562] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-11 14:50:06,364][00562] Num frames 100... [2024-11-11 14:50:06,482][00562] Num frames 200... [2024-11-11 14:50:06,601][00562] Num frames 300... [2024-11-11 14:50:06,722][00562] Num frames 400... [2024-11-11 14:50:06,848][00562] Num frames 500... [2024-11-11 14:50:06,972][00562] Num frames 600... [2024-11-11 14:50:07,093][00562] Num frames 700... [2024-11-11 14:50:07,215][00562] Num frames 800... [2024-11-11 14:50:07,333][00562] Num frames 900... [2024-11-11 14:50:07,455][00562] Num frames 1000... 
[2024-11-11 14:50:07,596][00562] Avg episode rewards: #0: 23.690, true rewards: #0: 10.690 [2024-11-11 14:50:07,598][00562] Avg episode reward: 23.690, avg true_objective: 10.690 [2024-11-11 14:50:07,638][00562] Num frames 1100... [2024-11-11 14:50:07,756][00562] Num frames 1200... [2024-11-11 14:50:07,880][00562] Num frames 1300... [2024-11-11 14:50:08,006][00562] Num frames 1400... [2024-11-11 14:50:08,129][00562] Num frames 1500... [2024-11-11 14:50:08,246][00562] Num frames 1600... [2024-11-11 14:50:08,364][00562] Num frames 1700... [2024-11-11 14:50:08,484][00562] Num frames 1800... [2024-11-11 14:50:08,605][00562] Num frames 1900... [2024-11-11 14:50:08,727][00562] Num frames 2000... [2024-11-11 14:50:08,854][00562] Num frames 2100... [2024-11-11 14:50:08,982][00562] Num frames 2200... [2024-11-11 14:50:09,078][00562] Avg episode rewards: #0: 24.160, true rewards: #0: 11.160 [2024-11-11 14:50:09,080][00562] Avg episode reward: 24.160, avg true_objective: 11.160 [2024-11-11 14:50:09,171][00562] Num frames 2300... [2024-11-11 14:50:09,291][00562] Num frames 2400... [2024-11-11 14:50:09,412][00562] Num frames 2500... [2024-11-11 14:50:09,535][00562] Num frames 2600... [2024-11-11 14:50:09,658][00562] Num frames 2700... [2024-11-11 14:50:09,781][00562] Num frames 2800... [2024-11-11 14:50:09,909][00562] Num frames 2900... [2024-11-11 14:50:10,031][00562] Num frames 3000... [2024-11-11 14:50:10,159][00562] Num frames 3100... [2024-11-11 14:50:10,278][00562] Num frames 3200... [2024-11-11 14:50:10,401][00562] Num frames 3300... [2024-11-11 14:50:10,525][00562] Num frames 3400... [2024-11-11 14:50:10,647][00562] Num frames 3500... [2024-11-11 14:50:10,766][00562] Num frames 3600... [2024-11-11 14:50:10,889][00562] Num frames 3700... [2024-11-11 14:50:11,058][00562] Num frames 3800... [2024-11-11 14:50:11,236][00562] Num frames 3900... [2024-11-11 14:50:11,402][00562] Num frames 4000... [2024-11-11 14:50:11,574][00562] Num frames 4100... 
[2024-11-11 14:50:11,740][00562] Num frames 4200... [2024-11-11 14:50:11,908][00562] Num frames 4300... [2024-11-11 14:50:12,024][00562] Avg episode rewards: #0: 34.440, true rewards: #0: 14.440 [2024-11-11 14:50:12,028][00562] Avg episode reward: 34.440, avg true_objective: 14.440 [2024-11-11 14:50:12,154][00562] Num frames 4400... [2024-11-11 14:50:12,323][00562] Num frames 4500... [2024-11-11 14:50:12,489][00562] Num frames 4600... [2024-11-11 14:50:12,660][00562] Num frames 4700... [2024-11-11 14:50:12,831][00562] Num frames 4800... [2024-11-11 14:50:13,006][00562] Num frames 4900... [2024-11-11 14:50:13,180][00562] Num frames 5000... [2024-11-11 14:50:13,351][00562] Num frames 5100... [2024-11-11 14:50:13,502][00562] Num frames 5200... [2024-11-11 14:50:13,618][00562] Num frames 5300... [2024-11-11 14:50:13,741][00562] Num frames 5400... [2024-11-11 14:50:13,878][00562] Num frames 5500... [2024-11-11 14:50:14,014][00562] Num frames 5600... [2024-11-11 14:50:14,143][00562] Num frames 5700... [2024-11-11 14:50:14,265][00562] Num frames 5800... [2024-11-11 14:50:14,385][00562] Num frames 5900... [2024-11-11 14:50:14,503][00562] Num frames 6000... [2024-11-11 14:50:14,622][00562] Num frames 6100... [2024-11-11 14:50:14,742][00562] Num frames 6200... [2024-11-11 14:50:14,862][00562] Num frames 6300... [2024-11-11 14:50:14,983][00562] Num frames 6400... [2024-11-11 14:50:15,077][00562] Avg episode rewards: #0: 40.329, true rewards: #0: 16.080 [2024-11-11 14:50:15,079][00562] Avg episode reward: 40.329, avg true_objective: 16.080 [2024-11-11 14:50:15,166][00562] Num frames 6500... [2024-11-11 14:50:15,282][00562] Num frames 6600... [2024-11-11 14:50:15,406][00562] Num frames 6700... [2024-11-11 14:50:15,525][00562] Num frames 6800... [2024-11-11 14:50:15,644][00562] Num frames 6900... [2024-11-11 14:50:15,765][00562] Num frames 7000... [2024-11-11 14:50:15,883][00562] Num frames 7100... [2024-11-11 14:50:16,005][00562] Num frames 7200... 
[2024-11-11 14:50:16,202][00562] Avg episode rewards: #0: 35.794, true rewards: #0: 14.594 [2024-11-11 14:50:16,204][00562] Avg episode reward: 35.794, avg true_objective: 14.594 [2024-11-11 14:50:16,211][00562] Num frames 7300... [2024-11-11 14:50:16,329][00562] Num frames 7400... [2024-11-11 14:50:16,450][00562] Num frames 7500... [2024-11-11 14:50:16,570][00562] Num frames 7600... [2024-11-11 14:50:16,691][00562] Num frames 7700... [2024-11-11 14:50:16,819][00562] Num frames 7800... [2024-11-11 14:50:16,936][00562] Num frames 7900... [2024-11-11 14:50:17,074][00562] Num frames 8000... [2024-11-11 14:50:17,210][00562] Num frames 8100... [2024-11-11 14:50:17,327][00562] Num frames 8200... [2024-11-11 14:50:17,452][00562] Num frames 8300... [2024-11-11 14:50:17,571][00562] Num frames 8400... [2024-11-11 14:50:17,648][00562] Avg episode rewards: #0: 33.528, true rewards: #0: 14.028 [2024-11-11 14:50:17,649][00562] Avg episode reward: 33.528, avg true_objective: 14.028 [2024-11-11 14:50:17,750][00562] Num frames 8500... [2024-11-11 14:50:17,866][00562] Num frames 8600... [2024-11-11 14:50:17,984][00562] Num frames 8700... [2024-11-11 14:50:18,117][00562] Num frames 8800... [2024-11-11 14:50:18,251][00562] Avg episode rewards: #0: 29.521, true rewards: #0: 12.664 [2024-11-11 14:50:18,253][00562] Avg episode reward: 29.521, avg true_objective: 12.664 [2024-11-11 14:50:18,295][00562] Num frames 8900... [2024-11-11 14:50:18,411][00562] Num frames 9000... [2024-11-11 14:50:18,530][00562] Num frames 9100... [2024-11-11 14:50:18,654][00562] Num frames 9200... [2024-11-11 14:50:18,768][00562] Avg episode rewards: #0: 26.311, true rewards: #0: 11.561 [2024-11-11 14:50:18,770][00562] Avg episode reward: 26.311, avg true_objective: 11.561 [2024-11-11 14:50:18,837][00562] Num frames 9300... [2024-11-11 14:50:18,960][00562] Num frames 9400... [2024-11-11 14:50:19,081][00562] Num frames 9500... [2024-11-11 14:50:19,221][00562] Num frames 9600... 
[2024-11-11 14:50:19,339][00562] Num frames 9700... [2024-11-11 14:50:19,458][00562] Num frames 9800... [2024-11-11 14:50:19,579][00562] Num frames 9900... [2024-11-11 14:50:19,698][00562] Num frames 10000... [2024-11-11 14:50:19,819][00562] Num frames 10100... [2024-11-11 14:50:19,975][00562] Avg episode rewards: #0: 25.426, true rewards: #0: 11.316 [2024-11-11 14:50:19,977][00562] Avg episode reward: 25.426, avg true_objective: 11.316 [2024-11-11 14:50:19,999][00562] Num frames 10200... [2024-11-11 14:50:20,124][00562] Num frames 10300... [2024-11-11 14:50:20,252][00562] Num frames 10400... [2024-11-11 14:50:20,374][00562] Num frames 10500... [2024-11-11 14:50:20,497][00562] Num frames 10600... [2024-11-11 14:50:20,617][00562] Num frames 10700... [2024-11-11 14:50:20,739][00562] Num frames 10800... [2024-11-11 14:50:20,863][00562] Avg episode rewards: #0: 24.356, true rewards: #0: 10.856 [2024-11-11 14:50:20,865][00562] Avg episode reward: 24.356, avg true_objective: 10.856 [2024-11-11 14:51:25,155][00562] Replay video saved to /content/train_dir/default_experiment/replay.mp4!