[2024-09-22 05:59:43,579][04746] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-22 05:59:43,581][04746] Rollout worker 0 uses device cpu
[2024-09-22 05:59:43,582][04746] Rollout worker 1 uses device cpu
[2024-09-22 05:59:43,584][04746] Rollout worker 2 uses device cpu
[2024-09-22 05:59:43,586][04746] Rollout worker 3 uses device cpu
[2024-09-22 05:59:43,587][04746] Rollout worker 4 uses device cpu
[2024-09-22 05:59:43,589][04746] Rollout worker 5 uses device cpu
[2024-09-22 05:59:43,590][04746] Rollout worker 6 uses device cpu
[2024-09-22 05:59:43,592][04746] Rollout worker 7 uses device cpu
[2024-09-22 05:59:43,714][04746] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 05:59:43,716][04746] InferenceWorker_p0-w0: min num requests: 2
[2024-09-22 05:59:43,750][04746] Starting all processes...
[2024-09-22 05:59:43,753][04746] Starting process learner_proc0
[2024-09-22 05:59:44,501][04746] Starting all processes...
[2024-09-22 05:59:44,507][04746] Starting process inference_proc0-0
[2024-09-22 05:59:44,508][04746] Starting process rollout_proc0
[2024-09-22 05:59:44,508][04746] Starting process rollout_proc1
[2024-09-22 05:59:44,509][04746] Starting process rollout_proc2
[2024-09-22 05:59:44,510][04746] Starting process rollout_proc3
[2024-09-22 05:59:44,510][04746] Starting process rollout_proc4
[2024-09-22 05:59:44,511][04746] Starting process rollout_proc5
[2024-09-22 05:59:44,520][04746] Starting process rollout_proc6
[2024-09-22 05:59:44,525][04746] Starting process rollout_proc7
[2024-09-22 05:59:48,487][06918] Worker 7 uses CPU cores [7]
[2024-09-22 05:59:48,681][06893] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 05:59:48,681][06893] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-22 05:59:48,700][06893] Num visible devices: 1
[2024-09-22 05:59:48,736][06910] Worker 3 uses CPU cores [3]
[2024-09-22 05:59:48,761][06893] Starting seed is not provided
[2024-09-22 05:59:48,761][06893] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 05:59:48,762][06893] Initializing actor-critic model on device cuda:0
[2024-09-22 05:59:48,762][06893] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 05:59:48,765][06893] RunningMeanStd input shape: (1,)
[2024-09-22 05:59:48,801][06893] ConvEncoder: input_channels=3
[2024-09-22 05:59:48,897][06906] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 05:59:48,898][06906] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-22 05:59:48,921][06906] Num visible devices: 1
[2024-09-22 05:59:48,976][06909] Worker 2 uses CPU cores [2]
[2024-09-22 05:59:49,076][06908] Worker 1 uses CPU cores [1]
[2024-09-22 05:59:49,172][06893] Conv encoder output size: 512
[2024-09-22 05:59:49,172][06893] Policy head output size: 512
[2024-09-22 05:59:49,177][06913] Worker 4 uses CPU cores [4]
[2024-09-22 05:59:49,196][06907] Worker 0 uses CPU cores [0]
[2024-09-22 05:59:49,230][06912] Worker 6 uses CPU cores [6]
[2024-09-22 05:59:49,240][06893] Created Actor Critic model with architecture:
[2024-09-22 05:59:49,240][06893] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-22 05:59:49,362][06911] Worker 5 uses CPU cores [5]
[2024-09-22 05:59:49,667][06893] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-09-22 05:59:50,419][06893] No checkpoints found
[2024-09-22 05:59:50,419][06893] Did not load from checkpoint, starting from scratch!
[2024-09-22 05:59:50,419][06893] Initialized policy 0 weights for model version 0
[2024-09-22 05:59:50,423][06893] LearnerWorker_p0 finished initialization!
[2024-09-22 05:59:50,424][06893] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 05:59:50,602][06906] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 05:59:50,603][06906] RunningMeanStd input shape: (1,)
[2024-09-22 05:59:50,616][06906] ConvEncoder: input_channels=3
[2024-09-22 05:59:50,731][06906] Conv encoder output size: 512
[2024-09-22 05:59:50,731][06906] Policy head output size: 512
[2024-09-22 05:59:50,788][04746] Inference worker 0-0 is ready!
[2024-09-22 05:59:50,789][04746] All inference workers are ready! Signal rollout workers to start!
[2024-09-22 05:59:50,846][06911] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 05:59:50,847][06909] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 05:59:50,846][06908] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 05:59:50,848][06910] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 05:59:50,848][06918] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 05:59:50,847][06913] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 05:59:50,848][06907] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 05:59:50,848][06912] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 05:59:51,197][06910] Decorrelating experience for 0 frames...
[2024-09-22 05:59:51,197][06911] Decorrelating experience for 0 frames...
[2024-09-22 05:59:51,197][06909] Decorrelating experience for 0 frames...
[2024-09-22 05:59:51,324][06913] Decorrelating experience for 0 frames...
[2024-09-22 05:59:51,326][06907] Decorrelating experience for 0 frames...
[2024-09-22 05:59:51,457][06910] Decorrelating experience for 32 frames...
[2024-09-22 05:59:51,462][06908] Decorrelating experience for 0 frames...
[2024-09-22 05:59:51,639][06907] Decorrelating experience for 32 frames...
[2024-09-22 05:59:51,755][06909] Decorrelating experience for 32 frames...
[2024-09-22 05:59:51,781][06912] Decorrelating experience for 0 frames...
[2024-09-22 05:59:51,819][06910] Decorrelating experience for 64 frames...
[2024-09-22 05:59:51,842][06913] Decorrelating experience for 32 frames...
[2024-09-22 05:59:51,852][06908] Decorrelating experience for 32 frames...
[2024-09-22 05:59:51,915][06911] Decorrelating experience for 32 frames...
[2024-09-22 05:59:52,106][06918] Decorrelating experience for 0 frames...
[2024-09-22 05:59:52,182][06907] Decorrelating experience for 64 frames...
[2024-09-22 05:59:52,251][06912] Decorrelating experience for 32 frames...
[2024-09-22 05:59:52,259][06908] Decorrelating experience for 64 frames...
[2024-09-22 05:59:52,265][06909] Decorrelating experience for 64 frames...
[2024-09-22 05:59:52,361][06918] Decorrelating experience for 32 frames...
[2024-09-22 05:59:52,477][06913] Decorrelating experience for 64 frames...
[2024-09-22 05:59:52,547][06908] Decorrelating experience for 96 frames...
[2024-09-22 05:59:52,636][06907] Decorrelating experience for 96 frames...
[2024-09-22 05:59:52,736][06909] Decorrelating experience for 96 frames...
[2024-09-22 05:59:52,786][06912] Decorrelating experience for 64 frames...
[2024-09-22 05:59:52,797][06910] Decorrelating experience for 96 frames...
[2024-09-22 05:59:52,998][06918] Decorrelating experience for 64 frames...
[2024-09-22 05:59:53,031][04746] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-22 05:59:53,072][06911] Decorrelating experience for 64 frames...
[2024-09-22 05:59:53,097][06912] Decorrelating experience for 96 frames...
[2024-09-22 05:59:53,296][06913] Decorrelating experience for 96 frames...
[2024-09-22 05:59:53,393][06918] Decorrelating experience for 96 frames...
[2024-09-22 05:59:53,421][06911] Decorrelating experience for 96 frames...
[2024-09-22 05:59:55,184][06893] Signal inference workers to stop experience collection...
[2024-09-22 05:59:55,192][06906] InferenceWorker_p0-w0: stopping experience collection
[2024-09-22 05:59:58,031][04746] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 460.4. Samples: 2302. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-22 05:59:58,034][04746] Avg episode reward: [(0, '1.889')]
[2024-09-22 05:59:58,912][06893] Signal inference workers to resume experience collection...
[2024-09-22 05:59:58,913][06906] InferenceWorker_p0-w0: resuming experience collection
[2024-09-22 06:00:01,784][06906] Updated weights for policy 0, policy_version 10 (0.0163)
[2024-09-22 06:00:03,033][04746] Fps is (10 sec: 5324.0, 60 sec: 5324.0, 300 sec: 5324.0). Total num frames: 53248. Throughput: 0: 1365.4. Samples: 13656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 06:00:03,035][04746] Avg episode reward: [(0, '4.332')]
[2024-09-22 06:00:03,705][04746] Heartbeat connected on Batcher_0
[2024-09-22 06:00:03,709][04746] Heartbeat connected on LearnerWorker_p0
[2024-09-22 06:00:03,721][04746] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-22 06:00:03,726][04746] Heartbeat connected on RolloutWorker_w0
[2024-09-22 06:00:03,729][04746] Heartbeat connected on RolloutWorker_w1
[2024-09-22 06:00:03,733][04746] Heartbeat connected on RolloutWorker_w2
[2024-09-22 06:00:03,737][04746] Heartbeat connected on RolloutWorker_w3
[2024-09-22 06:00:03,738][04746] Heartbeat connected on RolloutWorker_w4
[2024-09-22 06:00:03,748][04746] Heartbeat connected on RolloutWorker_w6
[2024-09-22 06:00:03,756][04746] Heartbeat connected on RolloutWorker_w5
[2024-09-22 06:00:03,758][04746] Heartbeat connected on RolloutWorker_w7
[2024-09-22 06:00:05,230][06906] Updated weights for policy 0, policy_version 20 (0.0017)
[2024-09-22 06:00:08,031][04746] Fps is (10 sec: 11468.9, 60 sec: 7645.9, 300 sec: 7645.9). Total num frames: 114688. Throughput: 0: 1535.2. Samples: 23028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:00:08,034][04746] Avg episode reward: [(0, '4.432')]
[2024-09-22 06:00:08,084][06893] Saving new best policy, reward=4.432!
[2024-09-22 06:00:08,391][06906] Updated weights for policy 0, policy_version 30 (0.0014)
[2024-09-22 06:00:11,361][06906] Updated weights for policy 0, policy_version 40 (0.0015)
[2024-09-22 06:00:13,031][04746] Fps is (10 sec: 13109.1, 60 sec: 9216.0, 300 sec: 9216.0). Total num frames: 184320. Throughput: 0: 2153.9. Samples: 43078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:00:13,033][04746] Avg episode reward: [(0, '4.490')]
[2024-09-22 06:00:13,036][06893] Saving new best policy, reward=4.490!
[2024-09-22 06:00:14,385][06906] Updated weights for policy 0, policy_version 50 (0.0016)
[2024-09-22 06:00:17,617][06906] Updated weights for policy 0, policy_version 60 (0.0014)
[2024-09-22 06:00:18,034][04746] Fps is (10 sec: 13513.4, 60 sec: 9993.3, 300 sec: 9993.3). Total num frames: 249856. Throughput: 0: 2506.8. Samples: 62676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 06:00:18,036][04746] Avg episode reward: [(0, '4.243')]
[2024-09-22 06:00:20,589][06906] Updated weights for policy 0, policy_version 70 (0.0014)
[2024-09-22 06:00:23,031][04746] Fps is (10 sec: 13516.8, 60 sec: 10649.6, 300 sec: 10649.6). Total num frames: 319488. Throughput: 0: 2431.7. Samples: 72952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:00:23,034][04746] Avg episode reward: [(0, '4.546')]
[2024-09-22 06:00:23,037][06893] Saving new best policy, reward=4.546!
[2024-09-22 06:00:23,601][06906] Updated weights for policy 0, policy_version 80 (0.0014)
[2024-09-22 06:00:26,497][06906] Updated weights for policy 0, policy_version 90 (0.0015)
[2024-09-22 06:00:28,031][04746] Fps is (10 sec: 13929.9, 60 sec: 11117.7, 300 sec: 11117.7). Total num frames: 389120. Throughput: 0: 2685.5. Samples: 93994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:00:28,034][04746] Avg episode reward: [(0, '4.462')]
[2024-09-22 06:00:29,428][06906] Updated weights for policy 0, policy_version 100 (0.0016)
[2024-09-22 06:00:32,699][06906] Updated weights for policy 0, policy_version 110 (0.0015)
[2024-09-22 06:00:33,031][04746] Fps is (10 sec: 13516.7, 60 sec: 11366.4, 300 sec: 11366.4). Total num frames: 454656. Throughput: 0: 2842.0. Samples: 113682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 06:00:33,033][04746] Avg episode reward: [(0, '4.527')]
[2024-09-22 06:00:35,664][06906] Updated weights for policy 0, policy_version 120 (0.0017)
[2024-09-22 06:00:38,031][04746] Fps is (10 sec: 13107.0, 60 sec: 11559.8, 300 sec: 11559.8). Total num frames: 520192. Throughput: 0: 2755.9. Samples: 124018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:00:38,033][04746] Avg episode reward: [(0, '4.583')]
[2024-09-22 06:00:38,060][06893] Saving new best policy, reward=4.583!
[2024-09-22 06:00:38,640][06906] Updated weights for policy 0, policy_version 130 (0.0014)
[2024-09-22 06:00:41,589][06906] Updated weights for policy 0, policy_version 140 (0.0016)
[2024-09-22 06:00:43,031][04746] Fps is (10 sec: 13517.0, 60 sec: 11796.5, 300 sec: 11796.5). Total num frames: 589824. Throughput: 0: 3167.9. Samples: 144858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:00:43,034][04746] Avg episode reward: [(0, '4.505')]
[2024-09-22 06:00:44,648][06906] Updated weights for policy 0, policy_version 150 (0.0015)
[2024-09-22 06:00:47,953][06906] Updated weights for policy 0, policy_version 160 (0.0016)
[2024-09-22 06:00:48,031][04746] Fps is (10 sec: 13516.7, 60 sec: 11915.6, 300 sec: 11915.6). Total num frames: 655360. Throughput: 0: 3341.2. Samples: 164004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:00:48,034][04746] Avg episode reward: [(0, '4.547')]
[2024-09-22 06:00:50,901][06906] Updated weights for policy 0, policy_version 170 (0.0015)
[2024-09-22 06:00:53,031][04746] Fps is (10 sec: 13516.8, 60 sec: 12083.2, 300 sec: 12083.2). Total num frames: 724992. Throughput: 0: 3367.3. Samples: 174558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:00:53,034][04746] Avg episode reward: [(0, '4.534')]
[2024-09-22 06:00:53,858][06906] Updated weights for policy 0, policy_version 180 (0.0018)
[2024-09-22 06:00:56,747][06906] Updated weights for policy 0, policy_version 190 (0.0019)
[2024-09-22 06:00:58,031][04746] Fps is (10 sec: 13926.4, 60 sec: 13243.7, 300 sec: 12225.0). Total num frames: 794624. Throughput: 0: 3385.9. Samples: 195444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:00:58,033][04746] Avg episode reward: [(0, '4.582')]
[2024-09-22 06:00:59,870][06906] Updated weights for policy 0, policy_version 200 (0.0017)
[2024-09-22 06:01:03,031][04746] Fps is (10 sec: 13107.3, 60 sec: 13380.6, 300 sec: 12229.5). Total num frames: 856064. Throughput: 0: 3377.6. Samples: 214660. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-22 06:01:03,033][04746] Avg episode reward: [(0, '4.789')]
[2024-09-22 06:01:03,056][06893] Saving new best policy, reward=4.789!
[2024-09-22 06:01:03,063][06906] Updated weights for policy 0, policy_version 210 (0.0017)
[2024-09-22 06:01:06,026][06906] Updated weights for policy 0, policy_version 220 (0.0015)
[2024-09-22 06:01:08,031][04746] Fps is (10 sec: 13107.3, 60 sec: 13516.8, 300 sec: 12342.6). Total num frames: 925696. Throughput: 0: 3381.0. Samples: 225098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:01:08,034][04746] Avg episode reward: [(0, '4.693')]
[2024-09-22 06:01:09,098][06906] Updated weights for policy 0, policy_version 230 (0.0015)
[2024-09-22 06:01:12,029][06906] Updated weights for policy 0, policy_version 240 (0.0017)
[2024-09-22 06:01:13,031][04746] Fps is (10 sec: 13926.3, 60 sec: 13516.8, 300 sec: 12441.6). Total num frames: 995328. Throughput: 0: 3367.8. Samples: 245544. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-22 06:01:13,035][04746] Avg episode reward: [(0, '4.842')]
[2024-09-22 06:01:13,039][06893] Saving new best policy, reward=4.842!
[2024-09-22 06:01:15,348][06906] Updated weights for policy 0, policy_version 250 (0.0014)
[2024-09-22 06:01:18,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13449.1, 300 sec: 12432.6). Total num frames: 1056768. Throughput: 0: 3350.1. Samples: 264436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:01:18,033][04746] Avg episode reward: [(0, '5.062')]
[2024-09-22 06:01:18,043][06893] Saving new best policy, reward=5.062!
[2024-09-22 06:01:18,583][06906] Updated weights for policy 0, policy_version 260 (0.0015)
[2024-09-22 06:01:21,557][06906] Updated weights for policy 0, policy_version 270 (0.0016)
[2024-09-22 06:01:23,031][04746] Fps is (10 sec: 13107.4, 60 sec: 13448.6, 300 sec: 12515.6). Total num frames: 1126400. Throughput: 0: 3346.1. Samples: 274590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:01:23,034][04746] Avg episode reward: [(0, '5.321')]
[2024-09-22 06:01:23,036][06893] Saving new best policy, reward=5.321!
[2024-09-22 06:01:24,489][06906] Updated weights for policy 0, policy_version 280 (0.0018)
[2024-09-22 06:01:27,668][06906] Updated weights for policy 0, policy_version 290 (0.0014)
[2024-09-22 06:01:28,031][04746] Fps is (10 sec: 13516.9, 60 sec: 13380.2, 300 sec: 12546.7). Total num frames: 1191936. Throughput: 0: 3335.5. Samples: 294956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:01:28,033][04746] Avg episode reward: [(0, '5.955')]
[2024-09-22 06:01:28,042][06893] Saving new best policy, reward=5.955!
[2024-09-22 06:01:30,955][06906] Updated weights for policy 0, policy_version 300 (0.0017)
[2024-09-22 06:01:33,031][04746] Fps is (10 sec: 12697.5, 60 sec: 13312.0, 300 sec: 12533.8). Total num frames: 1253376. Throughput: 0: 3336.7. Samples: 314156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:01:33,033][04746] Avg episode reward: [(0, '6.551')]
[2024-09-22 06:01:33,055][06893] Saving new best policy, reward=6.551!
[2024-09-22 06:01:33,984][06906] Updated weights for policy 0, policy_version 310 (0.0016)
[2024-09-22 06:01:36,885][06906] Updated weights for policy 0, policy_version 320 (0.0014)
[2024-09-22 06:01:38,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13380.3, 300 sec: 12600.1). Total num frames: 1323008. Throughput: 0: 3331.8. Samples: 324488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 06:01:38,035][04746] Avg episode reward: [(0, '7.361')]
[2024-09-22 06:01:38,093][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000324_1327104.pth...
[2024-09-22 06:01:38,180][06893] Saving new best policy, reward=7.361!
[2024-09-22 06:01:39,935][06906] Updated weights for policy 0, policy_version 330 (0.0016)
[2024-09-22 06:01:43,031][04746] Fps is (10 sec: 13516.6, 60 sec: 13312.0, 300 sec: 12623.1). Total num frames: 1388544. Throughput: 0: 3315.4. Samples: 344638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:01:43,033][04746] Avg episode reward: [(0, '6.732')]
[2024-09-22 06:01:43,143][06906] Updated weights for policy 0, policy_version 340 (0.0019)
[2024-09-22 06:01:46,376][06906] Updated weights for policy 0, policy_version 350 (0.0014)
[2024-09-22 06:01:48,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13312.0, 300 sec: 12644.2). Total num frames: 1454080. Throughput: 0: 3315.5. Samples: 363858. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-22 06:01:48,035][04746] Avg episode reward: [(0, '7.363')]
[2024-09-22 06:01:48,043][06893] Saving new best policy, reward=7.363!
[2024-09-22 06:01:49,345][06906] Updated weights for policy 0, policy_version 360 (0.0016)
[2024-09-22 06:01:52,347][06906] Updated weights for policy 0, policy_version 370 (0.0018)
[2024-09-22 06:01:53,031][04746] Fps is (10 sec: 13516.9, 60 sec: 13312.0, 300 sec: 12697.6). Total num frames: 1523712. Throughput: 0: 3316.4. Samples: 374334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:01:53,034][04746] Avg episode reward: [(0, '8.228')]
[2024-09-22 06:01:53,038][06893] Saving new best policy, reward=8.228!
[2024-09-22 06:01:55,338][06906] Updated weights for policy 0, policy_version 380 (0.0015)
[2024-09-22 06:01:58,031][04746] Fps is (10 sec: 13516.7, 60 sec: 13243.7, 300 sec: 12714.0). Total num frames: 1589248. Throughput: 0: 3304.1. Samples: 394228. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-22 06:01:58,035][04746] Avg episode reward: [(0, '10.239')]
[2024-09-22 06:01:58,041][06893] Saving new best policy, reward=10.239!
[2024-09-22 06:01:58,675][06906] Updated weights for policy 0, policy_version 390 (0.0017)
[2024-09-22 06:02:01,808][06906] Updated weights for policy 0, policy_version 400 (0.0015)
[2024-09-22 06:02:03,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13312.0, 300 sec: 12729.1). Total num frames: 1654784. Throughput: 0: 3315.7. Samples: 413642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:02:03,034][04746] Avg episode reward: [(0, '10.943')]
[2024-09-22 06:02:03,037][06893] Saving new best policy, reward=10.943!
[2024-09-22 06:02:04,782][06906] Updated weights for policy 0, policy_version 410 (0.0014)
[2024-09-22 06:02:07,832][06906] Updated weights for policy 0, policy_version 420 (0.0015)
[2024-09-22 06:02:08,031][04746] Fps is (10 sec: 13107.3, 60 sec: 13243.7, 300 sec: 12743.1). Total num frames: 1720320. Throughput: 0: 3319.8. Samples: 423980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:02:08,033][04746] Avg episode reward: [(0, '11.171')]
[2024-09-22 06:02:08,041][06893] Saving new best policy, reward=11.171!
[2024-09-22 06:02:10,963][06906] Updated weights for policy 0, policy_version 430 (0.0018)
[2024-09-22 06:02:13,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13175.5, 300 sec: 12756.1). Total num frames: 1785856. Throughput: 0: 3302.7. Samples: 443576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:02:13,033][04746] Avg episode reward: [(0, '9.843')]
[2024-09-22 06:02:14,230][06906] Updated weights for policy 0, policy_version 440 (0.0015)
[2024-09-22 06:02:17,315][06906] Updated weights for policy 0, policy_version 450 (0.0015)
[2024-09-22 06:02:18,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13243.7, 300 sec: 12768.2). Total num frames: 1851392. Throughput: 0: 3307.7. Samples: 463004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-22 06:02:18,033][04746] Avg episode reward: [(0, '11.811')]
[2024-09-22 06:02:18,040][06893] Saving new best policy, reward=11.811!
[2024-09-22 06:02:20,387][06906] Updated weights for policy 0, policy_version 460 (0.0019)
[2024-09-22 06:02:23,031][04746] Fps is (10 sec: 13516.7, 60 sec: 13243.7, 300 sec: 12806.8). Total num frames: 1921024. Throughput: 0: 3302.8. Samples: 473116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:02:23,033][04746] Avg episode reward: [(0, '15.738')]
[2024-09-22 06:02:23,037][06893] Saving new best policy, reward=15.738!
[2024-09-22 06:02:23,336][06906] Updated weights for policy 0, policy_version 470 (0.0016)
[2024-09-22 06:02:26,473][06906] Updated weights for policy 0, policy_version 480 (0.0017)
[2024-09-22 06:02:28,031][04746] Fps is (10 sec: 13107.1, 60 sec: 13175.5, 300 sec: 12790.1). Total num frames: 1982464. Throughput: 0: 3297.9. Samples: 493042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:02:28,033][04746] Avg episode reward: [(0, '17.847')]
[2024-09-22 06:02:28,042][06893] Saving new best policy, reward=17.847!
[2024-09-22 06:02:29,836][06906] Updated weights for policy 0, policy_version 490 (0.0016)
[2024-09-22 06:02:32,843][06906] Updated weights for policy 0, policy_version 500 (0.0017)
[2024-09-22 06:02:33,031][04746] Fps is (10 sec: 12697.7, 60 sec: 13243.7, 300 sec: 12800.0). Total num frames: 2048000. Throughput: 0: 3304.2. Samples: 512546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:02:33,033][04746] Avg episode reward: [(0, '18.660')]
[2024-09-22 06:02:33,036][06893] Saving new best policy, reward=18.660!
[2024-09-22 06:02:35,862][06906] Updated weights for policy 0, policy_version 510 (0.0014)
[2024-09-22 06:02:38,031][04746] Fps is (10 sec: 13516.8, 60 sec: 13243.7, 300 sec: 12834.1). Total num frames: 2117632. Throughput: 0: 3298.8. Samples: 522778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:02:38,034][04746] Avg episode reward: [(0, '17.260')]
[2024-09-22 06:02:38,822][06906] Updated weights for policy 0, policy_version 520 (0.0015)
[2024-09-22 06:02:42,162][06906] Updated weights for policy 0, policy_version 530 (0.0020)
[2024-09-22 06:02:43,031][04746] Fps is (10 sec: 13107.1, 60 sec: 13175.5, 300 sec: 12818.1). Total num frames: 2179072. Throughput: 0: 3289.2. Samples: 542240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:02:43,033][04746] Avg episode reward: [(0, '18.519')]
[2024-09-22 06:02:45,374][06906] Updated weights for policy 0, policy_version 540 (0.0014)
[2024-09-22 06:02:48,031][04746] Fps is (10 sec: 13107.3, 60 sec: 13243.7, 300 sec: 12849.7). Total num frames: 2248704. Throughput: 0: 3299.2. Samples: 562106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:02:48,034][04746] Avg episode reward: [(0, '17.896')]
[2024-09-22 06:02:48,298][06906] Updated weights for policy 0, policy_version 550 (0.0015)
[2024-09-22 06:02:51,351][06906] Updated weights for policy 0, policy_version 560 (0.0017)
[2024-09-22 06:02:53,031][04746] Fps is (10 sec: 13516.9, 60 sec: 13175.5, 300 sec: 12856.9). Total num frames: 2314240. Throughput: 0: 3293.2. Samples: 572176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 06:02:53,033][04746] Avg episode reward: [(0, '19.252')]
[2024-09-22 06:02:53,037][06893] Saving new best policy, reward=19.252!
[2024-09-22 06:02:54,461][06906] Updated weights for policy 0, policy_version 570 (0.0020)
[2024-09-22 06:02:57,854][06906] Updated weights for policy 0, policy_version 580 (0.0016)
[2024-09-22 06:02:58,031][04746] Fps is (10 sec: 12697.6, 60 sec: 13107.2, 300 sec: 12841.5). Total num frames: 2375680. Throughput: 0: 3285.8. Samples: 591436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:02:58,033][04746] Avg episode reward: [(0, '19.334')]
[2024-09-22 06:02:58,043][06893] Saving new best policy, reward=19.334!
[2024-09-22 06:03:00,813][06906] Updated weights for policy 0, policy_version 590 (0.0015)
[2024-09-22 06:03:03,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13175.5, 300 sec: 12870.1). Total num frames: 2445312. Throughput: 0: 3301.7. Samples: 611580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:03:03,033][04746] Avg episode reward: [(0, '19.455')]
[2024-09-22 06:03:03,035][06893] Saving new best policy, reward=19.455!
[2024-09-22 06:03:03,857][06906] Updated weights for policy 0, policy_version 600 (0.0014)
[2024-09-22 06:03:06,865][06906] Updated weights for policy 0, policy_version 610 (0.0014)
[2024-09-22 06:03:08,031][04746] Fps is (10 sec: 13516.7, 60 sec: 13175.5, 300 sec: 12876.1). Total num frames: 2510848. Throughput: 0: 3301.4. Samples: 621680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:03:08,034][04746] Avg episode reward: [(0, '21.272')]
[2024-09-22 06:03:08,042][06893] Saving new best policy, reward=21.272!
[2024-09-22 06:03:10,156][06906] Updated weights for policy 0, policy_version 620 (0.0016)
[2024-09-22 06:03:13,031][04746] Fps is (10 sec: 12697.5, 60 sec: 13107.2, 300 sec: 12861.4). Total num frames: 2572288. Throughput: 0: 3277.0. Samples: 640508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 06:03:13,033][04746] Avg episode reward: [(0, '21.166')]
[2024-09-22 06:03:13,485][06906] Updated weights for policy 0, policy_version 630 (0.0015)
[2024-09-22 06:03:16,486][06906] Updated weights for policy 0, policy_version 640 (0.0015)
[2024-09-22 06:03:18,031][04746] Fps is (10 sec: 13107.1, 60 sec: 13175.4, 300 sec: 12887.4). Total num frames: 2641920. Throughput: 0: 3288.1. Samples: 660510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:03:18,034][04746] Avg episode reward: [(0, '20.795')]
[2024-09-22 06:03:19,482][06906] Updated weights for policy 0, policy_version 650 (0.0017)
[2024-09-22 06:03:22,441][06906] Updated weights for policy 0, policy_version 660 (0.0017)
[2024-09-22 06:03:23,031][04746] Fps is (10 sec: 13516.8, 60 sec: 13107.2, 300 sec: 12892.6). Total num frames: 2707456. Throughput: 0: 3287.4. Samples: 670712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:03:23,034][04746] Avg episode reward: [(0, '19.731')]
[2024-09-22 06:03:25,766][06906] Updated weights for policy 0, policy_version 670 (0.0017)
[2024-09-22 06:03:28,031][04746] Fps is (10 sec: 13107.3, 60 sec: 13175.5, 300 sec: 12897.6). Total num frames: 2772992. Throughput: 0: 3283.5. Samples: 689998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:03:28,033][04746] Avg episode reward: [(0, '21.732')]
[2024-09-22 06:03:28,043][06893] Saving new best policy, reward=21.732!
[2024-09-22 06:03:28,876][06906] Updated weights for policy 0, policy_version 680 (0.0020)
[2024-09-22 06:03:31,786][06906] Updated weights for policy 0, policy_version 690 (0.0016)
[2024-09-22 06:03:33,031][04746] Fps is (10 sec: 13516.8, 60 sec: 13243.7, 300 sec: 12921.0). Total num frames: 2842624. Throughput: 0: 3303.7. Samples: 710774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 06:03:33,033][04746] Avg episode reward: [(0, '21.135')]
[2024-09-22 06:03:34,764][06906] Updated weights for policy 0, policy_version 700 (0.0015)
[2024-09-22 06:03:37,840][06906] Updated weights for policy 0, policy_version 710 (0.0014)
[2024-09-22 06:03:38,031][04746] Fps is (10 sec: 13516.6, 60 sec: 13175.4, 300 sec: 12925.1). Total num frames: 2908160. Throughput: 0: 3307.5. Samples: 721014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:03:38,034][04746] Avg episode reward: [(0, '19.791')]
[2024-09-22 06:03:38,044][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000710_2908160.pth...
[2024-09-22 06:03:41,191][06906] Updated weights for policy 0, policy_version 720 (0.0017)
[2024-09-22 06:03:43,031][04746] Fps is (10 sec: 13106.9, 60 sec: 13243.7, 300 sec: 12929.1). Total num frames: 2973696. Throughput: 0: 3304.0. Samples: 740116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 06:03:43,034][04746] Avg episode reward: [(0, '21.972')]
[2024-09-22 06:03:43,037][06893] Saving new best policy, reward=21.972!
[2024-09-22 06:03:44,083][06906] Updated weights for policy 0, policy_version 730 (0.0015)
[2024-09-22 06:03:46,998][06906] Updated weights for policy 0, policy_version 740 (0.0014)
[2024-09-22 06:03:48,031][04746] Fps is (10 sec: 13517.1, 60 sec: 13243.7, 300 sec: 12950.3). Total num frames: 3043328. Throughput: 0: 3329.4. Samples: 761404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:03:48,033][04746] Avg episode reward: [(0, '26.497')]
[2024-09-22 06:03:48,044][06893] Saving new best policy, reward=26.497!
[2024-09-22 06:03:49,881][06906] Updated weights for policy 0, policy_version 750 (0.0015)
[2024-09-22 06:03:52,913][06906] Updated weights for policy 0, policy_version 760 (0.0019)
[2024-09-22 06:03:53,031][04746] Fps is (10 sec: 13926.7, 60 sec: 13312.0, 300 sec: 12970.7). Total num frames: 3112960. Throughput: 0: 3336.3. Samples: 771812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 06:03:53,033][04746] Avg episode reward: [(0, '27.469')]
[2024-09-22 06:03:53,035][06893] Saving new best policy, reward=27.469!
[2024-09-22 06:03:56,124][06906] Updated weights for policy 0, policy_version 770 (0.0021)
[2024-09-22 06:03:58,031][04746] Fps is (10 sec: 13516.8, 60 sec: 13380.3, 300 sec: 12973.5). Total num frames: 3178496. Throughput: 0: 3348.5. Samples: 791188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:03:58,034][04746] Avg episode reward: [(0, '24.919')]
[2024-09-22 06:03:59,087][06906] Updated weights for policy 0, policy_version 780 (0.0015)
[2024-09-22 06:04:02,101][06906] Updated weights for policy 0, policy_version 790 (0.0017)
[2024-09-22 06:04:03,031][04746] Fps is (10 sec: 13516.8, 60 sec: 13380.3, 300 sec: 12992.5). Total num frames: 3248128. Throughput: 0: 3368.8. Samples: 812106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:04:03,034][04746] Avg episode reward: [(0, '24.168')]
[2024-09-22 06:04:05,093][06906] Updated weights for policy 0, policy_version 800 (0.0017)
[2024-09-22 06:04:08,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13312.0, 300 sec: 12978.7). Total num frames: 3309568. Throughput: 0: 3366.3. Samples: 822196. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:04:08,033][04746] Avg episode reward: [(0, '23.725')]
[2024-09-22 06:04:08,410][06906] Updated weights for policy 0, policy_version 810 (0.0014)
[2024-09-22 06:04:11,477][06906] Updated weights for policy 0, policy_version 820 (0.0015)
[2024-09-22 06:04:13,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13448.5, 300 sec: 12996.9). Total num frames: 3379200. Throughput: 0: 3366.3. Samples: 841482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 06:04:13,034][04746] Avg episode reward: [(0, '23.472')]
[2024-09-22 06:04:14,400][06906] Updated weights for policy 0, policy_version 830 (0.0014)
[2024-09-22 06:04:17,234][06906] Updated weights for policy 0, policy_version 840 (0.0015)
[2024-09-22 06:04:18,031][04746] Fps is (10 sec: 13926.3, 60 sec: 13448.5, 300 sec: 13014.5). Total num frames: 3448832. Throughput: 0: 3378.0. Samples: 862784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:04:18,033][04746] Avg episode reward: [(0, '21.957')]
[2024-09-22 06:04:20,179][06906] Updated weights for policy 0, policy_version 850 (0.0016)
[2024-09-22 06:04:23,031][04746] Fps is (10 sec: 13516.9, 60 sec: 13448.5, 300 sec: 13016.2). Total num frames: 3514368. Throughput: 0: 3377.9. Samples: 873020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:04:23,034][04746] Avg episode reward: [(0, '22.854')]
[2024-09-22 06:04:23,411][06906] Updated weights for policy 0, policy_version 860 (0.0016)
[2024-09-22 06:04:26,393][06906] Updated weights for policy 0, policy_version 870 (0.0016)
[2024-09-22 06:04:28,031][04746] Fps is (10 sec: 13516.5, 60 sec: 13516.7, 300 sec: 13032.7). Total num frames: 3584000. Throughput: 0: 3400.0. Samples: 893116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:04:28,034][04746] Avg episode reward: [(0, '24.346')]
[2024-09-22 06:04:29,222][06906] Updated weights for policy 0, policy_version 880 (0.0015)
[2024-09-22 06:04:32,062][06906] Updated weights for policy 0, policy_version 890 (0.0016)
[2024-09-22 06:04:33,031][04746] Fps is (10 sec: 14336.0, 60 sec: 13585.1, 300 sec: 13063.3). Total num frames: 3657728. Throughput: 0: 3406.0. Samples: 914676. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-22 06:04:33,034][04746] Avg episode reward: [(0, '25.572')]
[2024-09-22 06:04:34,926][06906] Updated weights for policy 0, policy_version 900 (0.0015)
[2024-09-22 06:04:38,031][04746] Fps is (10 sec: 13926.8, 60 sec: 13585.1, 300 sec: 13064.1). Total num frames: 3723264. Throughput: 0: 3406.2. Samples: 925090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:04:38,033][04746] Avg episode reward: [(0, '25.821')]
[2024-09-22 06:04:38,157][06906] Updated weights for policy 0, policy_version 910 (0.0017)
[2024-09-22 06:04:41,098][06906] Updated weights for policy 0, policy_version 920 (0.0017)
[2024-09-22 06:04:43,031][04746] Fps is (10 sec: 13516.8, 60 sec: 13653.4, 300 sec: 13079.0). Total num frames: 3792896. Throughput: 0: 3424.8. Samples: 945304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 06:04:43,033][04746] Avg episode reward: [(0, '25.851')]
[2024-09-22 06:04:43,917][06906] Updated weights for policy 0, policy_version 930 (0.0015)
[2024-09-22 06:04:46,694][06906] Updated weights for policy 0, policy_version 940 (0.0014)
[2024-09-22 06:04:48,031][04746] Fps is (10 sec: 14335.9, 60 sec: 13721.6, 300 sec: 13107.2). Total num frames: 3866624. Throughput: 0: 3445.4. Samples: 967150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:04:48,035][04746] Avg episode reward: [(0, '27.156')]
[2024-09-22 06:04:49,569][06906] Updated weights for policy 0, policy_version 950 (0.0014)
[2024-09-22 06:04:52,744][06906] Updated weights for policy 0, policy_version 960 (0.0016)
[2024-09-22 06:04:53,031][04746] Fps is (10 sec: 13926.3, 60 sec: 13653.3, 300 sec: 13329.4). Total num frames: 3932160. Throughput: 0: 3449.2. Samples: 977410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:04:53,034][04746] Avg episode reward: [(0, '28.118')]
[2024-09-22 06:04:53,073][06893] Saving new best policy, reward=28.118!
[2024-09-22 06:04:55,653][06906] Updated weights for policy 0, policy_version 970 (0.0015)
[2024-09-22 06:04:58,031][04746] Fps is (10 sec: 13926.5, 60 sec: 13789.8, 300 sec: 13398.8). Total num frames: 4005888. Throughput: 0: 3480.9. Samples: 998122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:04:58,033][04746] Avg episode reward: [(0, '26.885')]
[2024-09-22 06:04:58,467][06906] Updated weights for policy 0, policy_version 980 (0.0014)
[2024-09-22 06:05:01,257][06906] Updated weights for policy 0, policy_version 990 (0.0014)
[2024-09-22 06:05:03,031][04746] Fps is (10 sec: 14745.6, 60 sec: 13858.1, 300 sec: 13440.4). Total num frames: 4079616. Throughput: 0: 3496.9. Samples: 1020144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:05:03,034][04746] Avg episode reward: [(0, '26.385')]
[2024-09-22 06:05:04,146][06906] Updated weights for policy 0, policy_version 1000 (0.0014)
[2024-09-22 06:05:07,210][06906] Updated weights for policy 0, policy_version 1010 (0.0014)
[2024-09-22 06:05:08,031][04746] Fps is (10 sec: 13926.4, 60 sec: 13926.4, 300 sec: 13426.5). Total num frames: 4145152. Throughput: 0: 3498.6. Samples: 1030456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:05:08,033][04746] Avg episode reward: [(0, '23.512')]
[2024-09-22 06:05:10,028][06906] Updated weights for policy 0, policy_version 1020 (0.0014)
[2024-09-22 06:05:12,812][06906] Updated weights for policy 0, policy_version 1030 (0.0016)
[2024-09-22 06:05:13,031][04746] Fps is (10 sec: 13926.5, 60 sec: 13994.7, 300 sec: 13454.4). Total num frames: 4218880. Throughput: 0: 3524.5. Samples: 1051718. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-22 06:05:13,034][04746] Avg episode reward: [(0, '25.543')]
[2024-09-22 06:05:15,610][06906] Updated weights for policy 0, policy_version 1040 (0.0018)
[2024-09-22 06:05:18,031][04746] Fps is (10 sec: 14745.3, 60 sec: 14062.9, 300 sec: 13468.2). Total num frames: 4292608. Throughput: 0: 3532.4. Samples: 1073634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:05:18,033][04746] Avg episode reward: [(0, '26.777')]
[2024-09-22 06:05:18,500][06906] Updated weights for policy 0, policy_version 1050 (0.0016)
[2024-09-22 06:05:21,655][06906] Updated weights for policy 0, policy_version 1060 (0.0021)
[2024-09-22 06:05:23,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14062.9, 300 sec: 13454.3). Total num frames: 4358144. Throughput: 0: 3519.3. Samples: 1083460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:05:23,034][04746] Avg episode reward: [(0, '25.521')]
[2024-09-22 06:05:24,450][06906] Updated weights for policy 0, policy_version 1070 (0.0014)
[2024-09-22 06:05:27,263][06906] Updated weights for policy 0, policy_version 1080 (0.0014)
[2024-09-22 06:05:28,031][04746] Fps is (10 sec: 13926.6, 60 sec: 14131.2, 300 sec: 13482.1). Total num frames: 4431872. Throughput: 0: 3545.8. Samples: 1104864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:05:28,033][04746] Avg episode reward: [(0, '25.581')]
[2024-09-22 06:05:30,185][06906] Updated weights for policy 0, policy_version 1090 (0.0014)
[2024-09-22 06:05:33,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14062.9, 300 sec: 13496.0). Total num frames: 4501504. Throughput: 0: 3531.2. Samples: 1126052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-22 06:05:33,034][04746] Avg episode reward: [(0, '24.798')]
[2024-09-22 06:05:33,088][06906] Updated weights for policy 0, policy_version 1100 (0.0019)
[2024-09-22 06:05:36,247][06906] Updated weights for policy 0, policy_version 1110 (0.0015)
[2024-09-22 06:05:38,031][04746] Fps is (10 sec: 13926.5, 60 sec: 14131.2, 300 sec: 13496.0). Total num frames: 4571136. Throughput: 0: 3521.8. Samples: 1135890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:05:38,034][04746] Avg episode reward: [(0, '22.580')]
[2024-09-22 06:05:38,043][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001116_4571136.pth...
[2024-09-22 06:05:38,124][06893] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000324_1327104.pth
[2024-09-22 06:05:39,129][06906] Updated weights for policy 0, policy_version 1120 (0.0015)
[2024-09-22 06:05:41,966][06906] Updated weights for policy 0, policy_version 1130 (0.0016)
[2024-09-22 06:05:43,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14131.2, 300 sec: 13509.9). Total num frames: 4640768. Throughput: 0: 3534.8. Samples: 1157188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:05:43,035][04746] Avg episode reward: [(0, '25.251')]
[2024-09-22 06:05:44,780][06906] Updated weights for policy 0, policy_version 1140 (0.0016)
[2024-09-22 06:05:47,685][06906] Updated weights for policy 0, policy_version 1150 (0.0016)
[2024-09-22 06:05:48,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 13523.7). Total num frames: 4714496. Throughput: 0: 3521.1. Samples: 1178592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:05:48,034][04746] Avg episode reward: [(0, '25.028')]
[2024-09-22 06:05:50,830][06906] Updated weights for policy 0, policy_version 1160 (0.0018)
[2024-09-22 06:05:53,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 13509.9). Total num frames: 4780032. Throughput: 0: 3511.0. Samples: 1188452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:05:53,033][04746] Avg episode reward: [(0, '25.848')]
[2024-09-22 06:05:53,683][06906] Updated weights for policy 0, policy_version 1170 (0.0016)
[2024-09-22 06:05:56,521][06906] Updated weights for policy 0, policy_version 1180 (0.0015)
[2024-09-22 06:05:58,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14131.2, 300 sec: 13551.5). Total num frames: 4853760. Throughput: 0: 3520.9. Samples: 1210160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:05:58,033][04746] Avg episode reward: [(0, '27.013')]
[2024-09-22 06:05:59,297][06906] Updated weights for policy 0, policy_version 1190 (0.0014)
[2024-09-22 06:06:02,256][06906] Updated weights for policy 0, policy_version 1200 (0.0014)
[2024-09-22 06:06:03,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14062.9, 300 sec: 13551.5). Total num frames: 4923392. Throughput: 0: 3504.4. Samples: 1231332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:06:03,034][04746] Avg episode reward: [(0, '26.650')]
[2024-09-22 06:06:05,409][06906] Updated weights for policy 0, policy_version 1210 (0.0015)
[2024-09-22 06:06:08,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14131.2, 300 sec: 13551.5). Total num frames: 4993024. Throughput: 0: 3504.7. Samples: 1241172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 06:06:08,034][04746] Avg episode reward: [(0, '28.227')]
[2024-09-22 06:06:08,044][06893] Saving new best policy, reward=28.227!
[2024-09-22 06:06:08,226][06906] Updated weights for policy 0, policy_version 1220 (0.0015)
[2024-09-22 06:06:11,012][06906] Updated weights for policy 0, policy_version 1230 (0.0017)
[2024-09-22 06:06:13,031][04746] Fps is (10 sec: 14336.1, 60 sec: 14131.2, 300 sec: 13593.2). Total num frames: 5066752. Throughput: 0: 3515.6. Samples: 1263066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:06:13,033][04746] Avg episode reward: [(0, '27.215')]
[2024-09-22 06:06:13,844][06906] Updated weights for policy 0, policy_version 1240 (0.0016)
[2024-09-22 06:06:16,820][06906] Updated weights for policy 0, policy_version 1250 (0.0014)
[2024-09-22 06:06:18,031][04746] Fps is (10 sec: 13926.6, 60 sec: 13994.7, 300 sec: 13579.3). Total num frames: 5132288. Throughput: 0: 3506.6. Samples: 1283850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:06:18,034][04746] Avg episode reward: [(0, '26.215')]
[2024-09-22 06:06:19,919][06906] Updated weights for policy 0, policy_version 1260 (0.0016)
[2024-09-22 06:06:22,675][06906] Updated weights for policy 0, policy_version 1270 (0.0015)
[2024-09-22 06:06:23,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 13607.1). Total num frames: 5206016. Throughput: 0: 3515.7. Samples: 1294098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 06:06:23,033][04746] Avg episode reward: [(0, '27.755')]
[2024-09-22 06:06:25,477][06906] Updated weights for policy 0, policy_version 1280 (0.0017)
[2024-09-22 06:06:28,031][04746] Fps is (10 sec: 14745.7, 60 sec: 14131.2, 300 sec: 13648.7). Total num frames: 5279744. Throughput: 0: 3536.1. Samples: 1316314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:06:28,033][04746] Avg episode reward: [(0, '28.448')]
[2024-09-22 06:06:28,042][06893] Saving new best policy, reward=28.448!
[2024-09-22 06:06:28,227][06906] Updated weights for policy 0, policy_version 1290 (0.0016)
[2024-09-22 06:06:31,196][06906] Updated weights for policy 0, policy_version 1300 (0.0014)
[2024-09-22 06:06:33,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14062.9, 300 sec: 13634.8). Total num frames: 5345280. Throughput: 0: 3520.6. Samples: 1337020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:06:33,033][04746] Avg episode reward: [(0, '29.035')]
[2024-09-22 06:06:33,088][06893] Saving new best policy, reward=29.035!
[2024-09-22 06:06:34,328][06906] Updated weights for policy 0, policy_version 1310 (0.0016)
[2024-09-22 06:06:37,134][06906] Updated weights for policy 0, policy_version 1320 (0.0014)
[2024-09-22 06:06:38,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 13662.6). Total num frames: 5419008. Throughput: 0: 3534.7. Samples: 1347514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:06:38,033][04746] Avg episode reward: [(0, '25.313')]
[2024-09-22 06:06:39,945][06906] Updated weights for policy 0, policy_version 1330 (0.0014)
[2024-09-22 06:06:42,784][06906] Updated weights for policy 0, policy_version 1340 (0.0013)
[2024-09-22 06:06:43,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 13676.5). Total num frames: 5488640. Throughput: 0: 3534.7. Samples: 1369222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:06:43,034][04746] Avg episode reward: [(0, '29.276')]
[2024-09-22 06:06:43,069][06893] Saving new best policy, reward=29.276!
[2024-09-22 06:06:45,725][06906] Updated weights for policy 0, policy_version 1350 (0.0014)
[2024-09-22 06:06:48,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14062.9, 300 sec: 13676.5). Total num frames: 5558272. Throughput: 0: 3523.2. Samples: 1389876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:06:48,035][04746] Avg episode reward: [(0, '28.923')]
[2024-09-22 06:06:48,756][06906] Updated weights for policy 0, policy_version 1360 (0.0018)
[2024-09-22 06:06:51,574][06906] Updated weights for policy 0, policy_version 1370 (0.0015)
[2024-09-22 06:06:53,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14199.5, 300 sec: 13704.2). Total num frames: 5632000. Throughput: 0: 3545.9. Samples: 1400736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:06:53,034][04746] Avg episode reward: [(0, '28.842')]
[2024-09-22 06:06:54,417][06906] Updated weights for policy 0, policy_version 1380 (0.0015)
[2024-09-22 06:06:57,212][06906] Updated weights for policy 0, policy_version 1390 (0.0013)
[2024-09-22 06:06:58,031][04746] Fps is (10 sec: 14745.6, 60 sec: 14199.5, 300 sec: 13732.0). Total num frames: 5705728. Throughput: 0: 3544.0. Samples: 1422546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:06:58,034][04746] Avg episode reward: [(0, '29.792')]
[2024-09-22 06:06:58,044][06893] Saving new best policy, reward=29.792!
[2024-09-22 06:07:00,170][06906] Updated weights for policy 0, policy_version 1400 (0.0016)
[2024-09-22 06:07:03,031][04746] Fps is (10 sec: 13926.0, 60 sec: 14131.1, 300 sec: 13732.0). Total num frames: 5771264. Throughput: 0: 3535.4. Samples: 1442942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:07:03,034][04746] Avg episode reward: [(0, '26.261')]
[2024-09-22 06:07:03,245][06906] Updated weights for policy 0, policy_version 1410 (0.0015)
[2024-09-22 06:07:06,026][06906] Updated weights for policy 0, policy_version 1420 (0.0014)
[2024-09-22 06:07:08,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14199.5, 300 sec: 13759.8). Total num frames: 5844992. Throughput: 0: 3553.7. Samples: 1454014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:07:08,033][04746] Avg episode reward: [(0, '23.809')]
[2024-09-22 06:07:08,746][06906] Updated weights for policy 0, policy_version 1430 (0.0013)
[2024-09-22 06:07:11,538][06906] Updated weights for policy 0, policy_version 1440 (0.0016)
[2024-09-22 06:07:13,031][04746] Fps is (10 sec: 14746.0, 60 sec: 14199.5, 300 sec: 13787.6). Total num frames: 5918720. Throughput: 0: 3553.0. Samples: 1476198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:07:13,034][04746] Avg episode reward: [(0, '24.877')]
[2024-09-22 06:07:14,501][06906] Updated weights for policy 0, policy_version 1450 (0.0014)
[2024-09-22 06:07:17,502][06906] Updated weights for policy 0, policy_version 1460 (0.0018)
[2024-09-22 06:07:18,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14199.5, 300 sec: 13773.7). Total num frames: 5984256. Throughput: 0: 3550.3. Samples: 1496784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:07:18,033][04746] Avg episode reward: [(0, '23.963')]
[2024-09-22 06:07:20,270][06906] Updated weights for policy 0, policy_version 1470 (0.0014)
[2024-09-22 06:07:23,031][04746] Fps is (10 sec: 13926.5, 60 sec: 14199.5, 300 sec: 13815.3). Total num frames: 6057984. Throughput: 0: 3567.2. Samples: 1508040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-22 06:07:23,035][04746] Avg episode reward: [(0, '24.729')]
[2024-09-22 06:07:23,054][06906] Updated weights for policy 0, policy_version 1480 (0.0016)
[2024-09-22 06:07:25,760][06906] Updated weights for policy 0, policy_version 1490 (0.0014)
[2024-09-22 06:07:28,031][04746] Fps is (10 sec: 14745.5, 60 sec: 14199.4, 300 sec: 13843.1). Total num frames: 6131712. Throughput: 0: 3574.1. Samples: 1530058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:07:28,033][04746] Avg episode reward: [(0, '27.785')]
[2024-09-22 06:07:28,824][06906] Updated weights for policy 0, policy_version 1500 (0.0018)
[2024-09-22 06:07:31,944][06906] Updated weights for policy 0, policy_version 1510 (0.0018)
[2024-09-22 06:07:33,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14199.5, 300 sec: 13829.2). Total num frames: 6197248. Throughput: 0: 3557.4. Samples: 1549960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:07:33,034][04746] Avg episode reward: [(0, '29.769')]
[2024-09-22 06:07:34,749][06906] Updated weights for policy 0, policy_version 1520 (0.0016)
[2024-09-22 06:07:37,616][06906] Updated weights for policy 0, policy_version 1530 (0.0014)
[2024-09-22 06:07:38,031][04746] Fps is (10 sec: 13926.6, 60 sec: 14199.5, 300 sec: 13870.9). Total num frames: 6270976. Throughput: 0: 3557.6. Samples: 1560828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-22 06:07:38,033][04746] Avg episode reward: [(0, '26.364')]
[2024-09-22 06:07:38,042][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001531_6270976.pth...
[2024-09-22 06:07:38,123][06893] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000710_2908160.pth
[2024-09-22 06:07:40,498][06906] Updated weights for policy 0, policy_version 1540 (0.0015)
[2024-09-22 06:07:43,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14199.5, 300 sec: 13870.9). Total num frames: 6340608. Throughput: 0: 3542.5. Samples: 1581960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:07:43,034][04746] Avg episode reward: [(0, '24.922')]
[2024-09-22 06:07:43,629][06906] Updated weights for policy 0, policy_version 1550 (0.0015)
[2024-09-22 06:07:46,667][06906] Updated weights for policy 0, policy_version 1560 (0.0015)
[2024-09-22 06:07:48,031][04746] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 13870.9). Total num frames: 6406144. Throughput: 0: 3544.8. Samples: 1602458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:07:48,034][04746] Avg episode reward: [(0, '27.267')]
[2024-09-22 06:07:49,407][06906] Updated weights for policy 0, policy_version 1570 (0.0014)
[2024-09-22 06:07:52,150][06906] Updated weights for policy 0, policy_version 1580 (0.0016)
[2024-09-22 06:07:53,031][04746] Fps is (10 sec: 14336.1, 60 sec: 14199.5, 300 sec: 13926.4). Total num frames: 6483968. Throughput: 0: 3545.2. Samples: 1613550. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:07:53,034][04746] Avg episode reward: [(0, '26.000')]
[2024-09-22 06:07:54,976][06906] Updated weights for policy 0, policy_version 1590 (0.0015)
[2024-09-22 06:07:57,903][06906] Updated weights for policy 0, policy_version 1600 (0.0019)
[2024-09-22 06:07:58,031][04746] Fps is (10 sec: 14745.6, 60 sec: 14131.2, 300 sec: 13926.4). Total num frames: 6553600. Throughput: 0: 3533.6. Samples: 1635212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:07:58,034][04746] Avg episode reward: [(0, '28.912')]
[2024-09-22 06:08:00,923][06906] Updated weights for policy 0, policy_version 1610 (0.0015)
[2024-09-22 06:08:03,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14199.5, 300 sec: 13940.3). Total num frames: 6623232. Throughput: 0: 3544.4. Samples: 1656284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:08:03,033][04746] Avg episode reward: [(0, '31.208')]
[2024-09-22 06:08:03,037][06893] Saving new best policy, reward=31.208!
[2024-09-22 06:08:03,740][06906] Updated weights for policy 0, policy_version 1620 (0.0013)
[2024-09-22 06:08:06,459][06906] Updated weights for policy 0, policy_version 1630 (0.0014)
[2024-09-22 06:08:08,031][04746] Fps is (10 sec: 14335.9, 60 sec: 14199.5, 300 sec: 13981.9). Total num frames: 6696960. Throughput: 0: 3540.1. Samples: 1667344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:08:08,034][04746] Avg episode reward: [(0, '29.081')]
[2024-09-22 06:08:09,240][06906] Updated weights for policy 0, policy_version 1640 (0.0014)
[2024-09-22 06:08:12,326][06906] Updated weights for policy 0, policy_version 1650 (0.0016)
[2024-09-22 06:08:13,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 13981.9). Total num frames: 6766592. Throughput: 0: 3523.4. Samples: 1688610. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-22 06:08:13,034][04746] Avg episode reward: [(0, '28.647')]
[2024-09-22 06:08:15,293][06906] Updated weights for policy 0, policy_version 1660 (0.0015)
[2024-09-22 06:08:18,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14199.5, 300 sec: 13995.8). Total num frames: 6836224. Throughput: 0: 3552.0. Samples: 1709802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:08:18,034][04746] Avg episode reward: [(0, '27.663')]
[2024-09-22 06:08:18,055][06906] Updated weights for policy 0, policy_version 1670 (0.0014)
[2024-09-22 06:08:20,785][06906] Updated weights for policy 0, policy_version 1680 (0.0014)
[2024-09-22 06:08:23,031][04746] Fps is (10 sec: 14745.7, 60 sec: 14267.7, 300 sec: 14037.5). Total num frames: 6914048. Throughput: 0: 3558.1. Samples: 1720942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:08:23,034][04746] Avg episode reward: [(0, '27.779')]
[2024-09-22 06:08:23,555][06906] Updated weights for policy 0, policy_version 1690 (0.0014)
[2024-09-22 06:08:26,565][06906] Updated weights for policy 0, policy_version 1700 (0.0014)
[2024-09-22 06:08:28,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 14023.6). Total num frames: 6979584. Throughput: 0: 3560.0. Samples: 1742160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:08:28,034][04746] Avg episode reward: [(0, '26.484')]
[2024-09-22 06:08:29,575][06906] Updated weights for policy 0, policy_version 1710 (0.0014)
[2024-09-22 06:08:32,341][06906] Updated weights for policy 0, policy_version 1720 (0.0014)
[2024-09-22 06:08:33,031][04746] Fps is (10 sec: 13926.5, 60 sec: 14267.8, 300 sec: 14051.4). Total num frames: 7053312. Throughput: 0: 3584.4. Samples: 1763758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:08:33,034][04746] Avg episode reward: [(0, '28.002')]
[2024-09-22 06:08:35,066][06906] Updated weights for policy 0, policy_version 1730 (0.0014)
[2024-09-22 06:08:37,861][06906] Updated weights for policy 0, policy_version 1740 (0.0017)
[2024-09-22 06:08:38,031][04746] Fps is (10 sec: 14745.6, 60 sec: 14267.7, 300 sec: 14079.1). Total num frames: 7127040. Throughput: 0: 3585.8. Samples: 1774912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:08:38,035][04746] Avg episode reward: [(0, '25.099')]
[2024-09-22 06:08:40,840][06906] Updated weights for policy 0, policy_version 1750 (0.0015)
[2024-09-22 06:08:43,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14267.8, 300 sec: 14079.1). Total num frames: 7196672. Throughput: 0: 3568.1. Samples: 1795776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:08:43,034][04746] Avg episode reward: [(0, '24.603')]
[2024-09-22 06:08:43,786][06906] Updated weights for policy 0, policy_version 1760 (0.0015)
[2024-09-22 06:08:46,470][06906] Updated weights for policy 0, policy_version 1770 (0.0014)
[2024-09-22 06:08:48,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14404.3, 300 sec: 14093.0). Total num frames: 7270400. Throughput: 0: 3595.8. Samples: 1818094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:08:48,033][04746] Avg episode reward: [(0, '28.280')]
[2024-09-22 06:08:49,259][06906] Updated weights for policy 0, policy_version 1780 (0.0018)
[2024-09-22 06:08:51,996][06906] Updated weights for policy 0, policy_version 1790 (0.0015)
[2024-09-22 06:08:53,031][04746] Fps is (10 sec: 14745.4, 60 sec: 14336.0, 300 sec: 14120.8). Total num frames: 7344128. Throughput: 0: 3596.8. Samples: 1829200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:08:53,034][04746] Avg episode reward: [(0, '30.855')]
[2024-09-22 06:08:54,933][06906] Updated weights for policy 0, policy_version 1800 (0.0017)
[2024-09-22 06:08:57,835][06906] Updated weights for policy 0, policy_version 1810 (0.0015)
[2024-09-22 06:08:58,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14336.0, 300 sec: 14120.8). Total num frames: 7413760. Throughput: 0: 3586.4. Samples: 1849996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:08:58,034][04746] Avg episode reward: [(0, '27.938')]
[2024-09-22 06:09:00,571][06906] Updated weights for policy 0, policy_version 1820 (0.0014)
[2024-09-22 06:09:03,031][04746] Fps is (10 sec: 14745.8, 60 sec: 14472.6, 300 sec: 14176.3). Total num frames: 7491584. Throughput: 0: 3621.7. Samples: 1872776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:09:03,033][04746] Avg episode reward: [(0, '27.574')]
[2024-09-22 06:09:03,308][06906] Updated weights for policy 0, policy_version 1830 (0.0015)
[2024-09-22 06:09:05,943][06906] Updated weights for policy 0, policy_version 1840 (0.0016)
[2024-09-22 06:09:08,031][04746] Fps is (10 sec: 15155.2, 60 sec: 14472.5, 300 sec: 14190.2). Total num frames: 7565312. Throughput: 0: 3629.4. Samples: 1884264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:09:08,034][04746] Avg episode reward: [(0, '30.259')]
[2024-09-22 06:09:08,883][06906] Updated weights for policy 0, policy_version 1850 (0.0015)
[2024-09-22 06:09:11,811][06906] Updated weights for policy 0, policy_version 1860 (0.0016)
[2024-09-22 06:09:13,031][04746] Fps is (10 sec: 14335.9, 60 sec: 14472.6, 300 sec: 14190.2). Total num frames: 7634944. Throughput: 0: 3622.0. Samples: 1905148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:09:13,034][04746] Avg episode reward: [(0, '26.642')]
[2024-09-22 06:09:14,570][06906] Updated weights for policy 0, policy_version 1870 (0.0015)
[2024-09-22 06:09:17,253][06906] Updated weights for policy 0, policy_version 1880 (0.0014)
[2024-09-22 06:09:18,031][04746] Fps is (10 sec: 14335.9, 60 sec: 14540.8, 300 sec: 14218.0). Total num frames: 7708672. Throughput: 0: 3644.3. Samples: 1927752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:09:18,033][04746] Avg episode reward: [(0, '25.555')]
[2024-09-22 06:09:20,033][06906] Updated weights for policy 0, policy_version 1890 (0.0016)
[2024-09-22 06:09:23,025][06906] Updated weights for policy 0, policy_version 1900 (0.0014)
[2024-09-22 06:09:23,038][04746] Fps is (10 sec: 14734.9, 60 sec: 14470.8, 300 sec: 14231.5). Total num frames: 7782400. Throughput: 0: 3644.5. Samples: 1938942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:09:23,043][04746] Avg episode reward: [(0, '28.122')]
[2024-09-22 06:09:26,050][06906] Updated weights for policy 0, policy_version 1910 (0.0014)
[2024-09-22 06:09:28,031][04746] Fps is (10 sec: 14336.3, 60 sec: 14540.8, 300 sec: 14218.0). Total num frames: 7852032. Throughput: 0: 3632.6. Samples: 1959244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:09:28,033][04746] Avg episode reward: [(0, '28.462')]
[2024-09-22 06:09:28,842][06906] Updated weights for policy 0, policy_version 1920 (0.0013)
[2024-09-22 06:09:31,567][06906] Updated weights for policy 0, policy_version 1930 (0.0017)
[2024-09-22 06:09:33,031][04746] Fps is (10 sec: 14346.3, 60 sec: 14540.8, 300 sec: 14245.7). Total num frames: 7925760. Throughput: 0: 3632.2. Samples: 1981544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:09:33,033][04746] Avg episode reward: [(0, '33.180')]
[2024-09-22 06:09:33,035][06893] Saving new best policy, reward=33.180!
[2024-09-22 06:09:34,368][06906] Updated weights for policy 0, policy_version 1940 (0.0014)
[2024-09-22 06:09:37,321][06906] Updated weights for policy 0, policy_version 1950 (0.0014)
[2024-09-22 06:09:38,031][04746] Fps is (10 sec: 14335.8, 60 sec: 14472.5, 300 sec: 14245.7). Total num frames: 7995392. Throughput: 0: 3627.2. Samples: 1992422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 06:09:38,033][04746] Avg episode reward: [(0, '31.870')]
[2024-09-22 06:09:38,044][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001952_7995392.pth...
[2024-09-22 06:09:38,118][06893] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001116_4571136.pth
[2024-09-22 06:09:40,305][06906] Updated weights for policy 0, policy_version 1960 (0.0015)
[2024-09-22 06:09:43,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14472.5, 300 sec: 14231.9). Total num frames: 8065024. Throughput: 0: 3618.8. Samples: 2012844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:09:43,034][04746] Avg episode reward: [(0, '28.848')]
[2024-09-22 06:09:43,249][06906] Updated weights for policy 0, policy_version 1970 (0.0015)
[2024-09-22 06:09:46,191][06906] Updated weights for policy 0, policy_version 1980 (0.0014)
[2024-09-22 06:09:48,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 14245.7). Total num frames: 8134656. Throughput: 0: 3579.4. Samples: 2033850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:09:48,033][04746] Avg episode reward: [(0, '30.720')]
[2024-09-22 06:09:49,107][06906] Updated weights for policy 0, policy_version 1990 (0.0016)
[2024-09-22 06:09:52,217][06906] Updated weights for policy 0, policy_version 2000 (0.0017)
[2024-09-22 06:09:53,031][04746] Fps is (10 sec: 13516.9, 60 sec: 14267.8, 300 sec: 14218.0). Total num frames: 8200192. Throughput: 0: 3547.8. Samples: 2043916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-22 06:09:53,033][04746] Avg episode reward: [(0, '31.790')]
[2024-09-22 06:09:55,358][06906] Updated weights for policy 0, policy_version 2010 (0.0014)
[2024-09-22 06:09:58,031][04746] Fps is (10 sec: 13107.2, 60 sec: 14199.5, 300 sec: 14190.2). Total num frames: 8265728. Throughput: 0: 3515.7. Samples: 2063356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 06:09:58,033][04746] Avg episode reward: [(0, '30.821')]
[2024-09-22 06:09:58,682][06906] Updated weights for policy 0, policy_version 2020 (0.0017)
[2024-09-22 06:10:02,065][06906] Updated weights for policy 0, policy_version 2030 (0.0017)
[2024-09-22 06:10:03,031][04746] Fps is (10 sec: 12287.9, 60 sec: 13858.1, 300 sec: 14162.4). Total num frames: 8323072. Throughput: 0: 3419.3. Samples: 2081622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:10:03,034][04746] Avg episode reward: [(0, '27.214')]
[2024-09-22 06:10:05,664][06906] Updated weights for policy 0, policy_version 2040 (0.0017)
[2024-09-22 06:10:08,031][04746] Fps is (10 sec: 11468.7, 60 sec: 13585.1, 300 sec: 14106.9). Total num frames: 8380416. Throughput: 0: 3358.8. Samples: 2090066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:10:08,033][04746] Avg episode reward: [(0, '28.579')]
[2024-09-22 06:10:09,296][06906] Updated weights for policy 0, policy_version 2050 (0.0017)
[2024-09-22 06:10:12,735][06906] Updated weights for policy 0, policy_version 2060 (0.0017)
[2024-09-22 06:10:13,031][04746] Fps is (10 sec: 11468.7, 60 sec: 13380.2, 300 sec: 14051.4). Total num frames: 8437760. Throughput: 0: 3288.1. Samples: 2107210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:10:13,036][04746] Avg episode reward: [(0, '30.279')]
[2024-09-22 06:10:16,120][06906] Updated weights for policy 0, policy_version 2070 (0.0017)
[2024-09-22 06:10:18,031][04746] Fps is (10 sec: 11878.6, 60 sec: 13175.5, 300 sec: 14037.5). Total num frames: 8499200. Throughput: 0: 3198.1. Samples: 2125458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:10:18,034][04746] Avg episode reward: [(0, '29.975')]
[2024-09-22 06:10:19,656][06906] Updated weights for policy 0, policy_version 2080 (0.0018)
[2024-09-22 06:10:23,031][04746] Fps is (10 sec: 11878.4, 60 sec: 12903.9, 300 sec: 13981.9). Total num frames: 8556544. Throughput: 0: 3141.6. Samples: 2133794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:10:23,033][04746] Avg episode reward: [(0, '27.525')]
[2024-09-22 06:10:23,348][06906] Updated weights for policy 0, policy_version 2090 (0.0018)
[2024-09-22 06:10:26,726][06906] Updated weights for policy 0, policy_version 2100 (0.0016)
[2024-09-22 06:10:28,031][04746] Fps is (10 sec: 11468.8, 60 sec: 12697.6, 300 sec: 13940.3). Total num frames: 8613888. Throughput: 0: 3077.3. Samples: 2151320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:10:28,033][04746] Avg episode reward: [(0, '24.633')]
[2024-09-22 06:10:30,089][06906] Updated weights for policy 0, policy_version 2110 (0.0015)
[2024-09-22 06:10:33,031][04746] Fps is (10 sec: 11878.4, 60 sec: 12492.8, 300 sec: 13912.5). Total num frames: 8675328. Throughput: 0: 3015.4. Samples: 2169544. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-22 06:10:33,034][04746] Avg episode reward: [(0, '27.902')]
[2024-09-22 06:10:33,512][06906] Updated weights for policy 0, policy_version 2120 (0.0014)
[2024-09-22 06:10:37,238][06906] Updated weights for policy 0, policy_version 2130 (0.0017)
[2024-09-22 06:10:38,031][04746] Fps is (10 sec: 11878.2, 60 sec: 12288.0, 300 sec: 13870.9). Total num frames: 8732672. Throughput: 0: 2976.3. Samples: 2177850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:10:38,034][04746] Avg episode reward: [(0, '28.677')]
[2024-09-22 06:10:40,661][06906] Updated weights for policy 0, policy_version 2140 (0.0015)
[2024-09-22 06:10:43,031][04746] Fps is (10 sec: 11468.8, 60 sec: 12083.2, 300 sec: 13815.3). Total num frames: 8790016. Throughput: 0: 2932.5. Samples: 2195320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:10:43,034][04746] Avg episode reward: [(0, '27.919')]
[2024-09-22 06:10:44,061][06906] Updated weights for policy 0, policy_version 2150 (0.0016)
[2024-09-22 06:10:47,398][06906] Updated weights for policy 0, policy_version 2160 (0.0020)
[2024-09-22 06:10:48,031][04746] Fps is (10 sec: 11878.5, 60 sec: 11946.7, 300 sec: 13801.4). Total num frames: 8851456. Throughput: 0: 2936.0. Samples: 2213744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:10:48,033][04746] Avg episode reward: [(0, '29.776')]
[2024-09-22 06:10:50,642][06906] Updated weights for policy 0, policy_version 2170 (0.0017)
[2024-09-22 06:10:53,031][04746] Fps is (10 sec: 12697.3, 60 sec: 11946.6, 300 sec: 13773.7). Total num frames: 8916992. Throughput: 0: 2957.7. Samples: 2223164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:10:53,035][04746] Avg episode reward: [(0, '30.713')]
[2024-09-22 06:10:53,743][06906] Updated weights for policy 0, policy_version 2180 (0.0017)
[2024-09-22 06:10:56,608][06906] Updated weights for policy 0, policy_version 2190 (0.0014)
[2024-09-22 06:10:58,031][04746] Fps is (10 sec: 13516.7, 60 sec: 12014.9, 300 sec: 13773.7). Total num frames: 8986624. Throughput: 0: 3037.3. Samples: 2243888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:10:58,034][04746] Avg episode reward: [(0, '28.134')]
[2024-09-22 06:10:59,455][06906] Updated weights for policy 0, policy_version 2200 (0.0015)
[2024-09-22 06:11:02,390][06906] Updated weights for policy 0, policy_version 2210 (0.0019)
[2024-09-22 06:11:03,033][04746] Fps is (10 sec: 13923.8, 60 sec: 12219.3, 300 sec: 13773.6). Total num frames: 9056256. Throughput: 0: 3093.6. Samples: 2264678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:11:03,035][04746] Avg episode reward: [(0, '27.638')]
[2024-09-22 06:11:06,169][06906] Updated weights for policy 0, policy_version 2220 (0.0016)
[2024-09-22 06:11:08,031][04746] Fps is (10 sec: 12697.7, 60 sec: 12219.8, 300 sec: 13718.1). Total num frames: 9113600. Throughput: 0: 3091.7. Samples: 2272920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:11:08,033][04746] Avg episode reward: [(0, '28.320')]
[2024-09-22 06:11:09,625][06906] Updated weights for policy 0, policy_version 2230 (0.0016)
[2024-09-22 06:11:13,014][06906] Updated weights for policy 0, policy_version 2240 (0.0016)
[2024-09-22 06:11:13,031][04746] Fps is (10 sec: 11880.9, 60 sec: 12288.0, 300 sec: 13704.2). Total num frames: 9175040. Throughput: 0: 3097.7. Samples: 2290716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:11:13,034][04746] Avg episode reward: [(0, '29.334')]
[2024-09-22 06:11:16,343][06906] Updated weights for policy 0, policy_version 2250 (0.0016)
[2024-09-22 06:11:18,031][04746] Fps is (10 sec: 11878.4, 60 sec: 12219.7, 300 sec: 13648.7). Total num frames: 9232384. Throughput: 0: 3094.1. Samples: 2308778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:11:18,033][04746] Avg episode reward: [(0, '31.025')]
[2024-09-22 06:11:20,034][06906] Updated weights for policy 0, policy_version 2260 (0.0018)
[2024-09-22 06:11:23,031][04746] Fps is (10 sec: 11468.9, 60 sec: 12219.7, 300 sec: 13593.2). Total num frames: 9289728. Throughput: 0: 3092.1. Samples: 2316994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:11:23,035][04746] Avg episode reward: [(0, '29.830')]
[2024-09-22 06:11:23,554][06906] Updated weights for policy 0, policy_version 2270 (0.0015)
[2024-09-22 06:11:26,999][06906] Updated weights for policy 0, policy_version 2280 (0.0016)
[2024-09-22 06:11:28,031][04746] Fps is (10 sec: 11878.4, 60 sec: 12288.0, 300 sec: 13579.3). Total num frames: 9351168. Throughput: 0: 3100.1. Samples: 2334826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:11:28,034][04746] Avg episode reward: [(0, '28.612')]
[2024-09-22 06:11:30,359][06906] Updated weights for policy 0, policy_version 2290 (0.0014)
[2024-09-22 06:11:33,031][04746] Fps is (10 sec: 11878.3, 60 sec: 12219.7, 300 sec: 13523.7). Total num frames: 9408512. Throughput: 0: 3083.4. Samples: 2352498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-22 06:11:33,034][04746] Avg episode reward: [(0, '30.104')]
[2024-09-22 06:11:33,991][06906] Updated weights for policy 0, policy_version 2300 (0.0015)
[2024-09-22 06:11:37,626][06906] Updated weights for policy 0, policy_version 2310 (0.0017)
[2024-09-22 06:11:38,031][04746] Fps is (10 sec: 11468.6, 60 sec: 12219.7, 300 sec: 13482.1). Total num frames: 9465856. Throughput: 0: 3056.4. Samples: 2360700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 06:11:38,034][04746] Avg episode reward: [(0, '30.210')]
[2024-09-22 06:11:38,043][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002311_9465856.pth...
[2024-09-22 06:11:38,126][06893] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001531_6270976.pth
[2024-09-22 06:11:40,997][06906] Updated weights for policy 0, policy_version 2320 (0.0017)
[2024-09-22 06:11:43,031][04746] Fps is (10 sec: 11468.9, 60 sec: 12219.7, 300 sec: 13440.4). Total num frames: 9523200. Throughput: 0: 2996.4. Samples: 2378726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:11:43,034][04746] Avg episode reward: [(0, '28.804')]
[2024-09-22 06:11:44,444][06906] Updated weights for policy 0, policy_version 2330 (0.0015)
[2024-09-22 06:11:48,018][06906] Updated weights for policy 0, policy_version 2340 (0.0017)
[2024-09-22 06:11:48,031][04746] Fps is (10 sec: 11878.3, 60 sec: 12219.7, 300 sec: 13398.8). Total num frames: 9584640. Throughput: 0: 2922.2. Samples: 2396170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-22 06:11:48,034][04746] Avg episode reward: [(0, '31.504')]
[2024-09-22 06:11:51,642][06906] Updated weights for policy 0, policy_version 2350 (0.0017)
[2024-09-22 06:11:53,031][04746] Fps is (10 sec: 11878.1, 60 sec: 12083.2, 300 sec: 13343.2). Total num frames: 9641984. Throughput: 0: 2923.3. Samples: 2404470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:11:53,034][04746] Avg episode reward: [(0, '31.838')]
[2024-09-22 06:11:55,083][06906] Updated weights for policy 0, policy_version 2360 (0.0018)
[2024-09-22 06:11:58,031][04746] Fps is (10 sec: 11469.1, 60 sec: 11878.4, 300 sec: 13315.5). Total num frames: 9699328. Throughput: 0: 2926.8. Samples: 2422422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:11:58,033][04746] Avg episode reward: [(0, '30.794')]
[2024-09-22 06:11:58,469][06906] Updated weights for policy 0, policy_version 2370 (0.0015)
[2024-09-22 06:12:01,564][06906] Updated weights for policy 0, policy_version 2380 (0.0018)
[2024-09-22 06:12:03,031][04746] Fps is (10 sec: 12288.3, 60 sec: 11810.6, 300 sec: 13287.7). Total num frames: 9764864. Throughput: 0: 2956.6. Samples: 2441824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-22 06:12:03,034][04746] Avg episode reward: [(0, '28.643')]
[2024-09-22 06:12:04,847][06906] Updated weights for policy 0, policy_version 2390 (0.0018)
[2024-09-22 06:12:07,678][06906] Updated weights for policy 0, policy_version 2400 (0.0017)
[2024-09-22 06:12:08,032][04746] Fps is (10 sec: 13516.3, 60 sec: 12014.8, 300 sec: 13273.8). Total num frames: 9834496. Throughput: 0: 2990.0. Samples: 2451544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-22 06:12:08,033][04746] Avg episode reward: [(0, '27.010')]
[2024-09-22 06:12:10,518][06906] Updated weights for policy 0, policy_version 2410 (0.0015)
[2024-09-22 06:12:13,031][04746] Fps is (10 sec: 13926.3, 60 sec: 12151.5, 300 sec: 13287.7). Total num frames: 9904128. Throughput: 0: 3068.1. Samples: 2472890. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-22 06:12:13,033][04746] Avg episode reward: [(0, '29.198')]
[2024-09-22 06:12:13,580][06906] Updated weights for policy 0, policy_version 2420 (0.0014)
[2024-09-22 06:12:17,131][06906] Updated weights for policy 0, policy_version 2430 (0.0017)
[2024-09-22 06:12:18,031][04746] Fps is (10 sec: 12698.1, 60 sec: 12151.5, 300 sec: 13232.2). Total num frames: 9961472. Throughput: 0: 3073.9. Samples: 2490824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-22 06:12:18,034][04746] Avg episode reward: [(0, '29.191')]
[2024-09-22 06:12:20,788][06906] Updated weights for policy 0, policy_version 2440 (0.0018)
[2024-09-22 06:12:21,773][06893] Stopping Batcher_0...
[2024-09-22 06:12:21,774][06893] Loop batcher_evt_loop terminating...
[2024-09-22 06:12:21,777][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-09-22 06:12:21,774][04746] Component Batcher_0 stopped!
[2024-09-22 06:12:21,808][06906] Weights refcount: 2 0
[2024-09-22 06:12:21,810][06906] Stopping InferenceWorker_p0-w0...
[2024-09-22 06:12:21,810][06906] Loop inference_proc0-0_evt_loop terminating...
[2024-09-22 06:12:21,811][04746] Component InferenceWorker_p0-w0 stopped!
[2024-09-22 06:12:21,838][06913] Stopping RolloutWorker_w4...
[2024-09-22 06:12:21,838][06913] Loop rollout_proc4_evt_loop terminating...
[2024-09-22 06:12:21,838][04746] Component RolloutWorker_w4 stopped!
[2024-09-22 06:12:21,851][06911] Stopping RolloutWorker_w5...
[2024-09-22 06:12:21,852][06911] Loop rollout_proc5_evt_loop terminating...
[2024-09-22 06:12:21,851][04746] Component RolloutWorker_w5 stopped!
[2024-09-22 06:12:21,864][06910] Stopping RolloutWorker_w3...
[2024-09-22 06:12:21,865][06910] Loop rollout_proc3_evt_loop terminating...
[2024-09-22 06:12:21,867][06918] Stopping RolloutWorker_w7...
[2024-09-22 06:12:21,864][04746] Component RolloutWorker_w3 stopped!
[2024-09-22 06:12:21,868][06918] Loop rollout_proc7_evt_loop terminating...
[2024-09-22 06:12:21,868][04746] Component RolloutWorker_w7 stopped!
[2024-09-22 06:12:21,872][06893] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001952_7995392.pth
[2024-09-22 06:12:21,878][06912] Stopping RolloutWorker_w6...
[2024-09-22 06:12:21,879][04746] Component RolloutWorker_w6 stopped!
[2024-09-22 06:12:21,882][06912] Loop rollout_proc6_evt_loop terminating...
[2024-09-22 06:12:21,885][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-09-22 06:12:21,898][06908] Stopping RolloutWorker_w1...
[2024-09-22 06:12:21,899][06908] Loop rollout_proc1_evt_loop terminating...
[2024-09-22 06:12:21,898][04746] Component RolloutWorker_w1 stopped!
[2024-09-22 06:12:21,921][06909] Stopping RolloutWorker_w2...
[2024-09-22 06:12:21,922][06909] Loop rollout_proc2_evt_loop terminating...
[2024-09-22 06:12:21,924][04746] Component RolloutWorker_w2 stopped!
[2024-09-22 06:12:21,966][06907] Stopping RolloutWorker_w0...
[2024-09-22 06:12:21,967][06907] Loop rollout_proc0_evt_loop terminating...
[2024-09-22 06:12:21,967][04746] Component RolloutWorker_w0 stopped!
[2024-09-22 06:12:22,064][06893] Stopping LearnerWorker_p0...
[2024-09-22 06:12:22,064][06893] Loop learner_proc0_evt_loop terminating...
[2024-09-22 06:12:22,063][04746] Component LearnerWorker_p0 stopped!
[2024-09-22 06:12:22,068][04746] Waiting for process learner_proc0 to stop...
[2024-09-22 06:12:23,148][04746] Waiting for process inference_proc0-0 to join...
[2024-09-22 06:12:23,151][04746] Waiting for process rollout_proc0 to join...
[2024-09-22 06:12:23,154][04746] Waiting for process rollout_proc1 to join...
[2024-09-22 06:12:23,156][04746] Waiting for process rollout_proc2 to join...
[2024-09-22 06:12:23,158][04746] Waiting for process rollout_proc3 to join...
[2024-09-22 06:12:23,160][04746] Waiting for process rollout_proc4 to join...
[2024-09-22 06:12:23,162][04746] Waiting for process rollout_proc5 to join...
[2024-09-22 06:12:23,165][04746] Waiting for process rollout_proc6 to join...
[2024-09-22 06:12:23,168][04746] Waiting for process rollout_proc7 to join...
[2024-09-22 06:12:23,170][04746] Batcher 0 profile tree view:
batching: 41.2143, releasing_batches: 0.0861
[2024-09-22 06:12:23,171][04746] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 12.9840
update_model: 12.5052
  weight_update: 0.0017
one_step: 0.0084
  handle_policy_step: 680.9309
    deserialize: 27.9295, stack: 4.8215, obs_to_device_normalize: 159.6845, forward: 336.5609, send_messages: 40.8327
    prepare_outputs: 78.3299
      to_cpu: 49.7910
[2024-09-22 06:12:23,173][04746] Learner 0 profile tree view:
misc: 0.0139, prepare_batch: 26.0761
train: 112.9910
  epoch_init: 0.0147, minibatch_init: 0.0160, losses_postprocess: 0.8946, kl_divergence: 0.9136, after_optimizer: 52.6494
  calculate_losses: 38.7085
    losses_init: 0.0096, forward_head: 1.8256, bptt_initial: 27.6678, tail: 1.5628, advantages_returns: 0.4044, losses: 3.8285
    bptt: 2.9188
      bptt_forward_core: 2.7751
  update: 18.7262
    clip: 1.8584
[2024-09-22 06:12:23,176][04746] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4337, enqueue_policy_requests: 22.8105, env_step: 304.0444, overhead: 17.7295, complete_rollouts: 1.0132
save_policy_outputs: 26.1419
  split_output_tensors: 10.4431
[2024-09-22 06:12:23,178][04746] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4597, enqueue_policy_requests: 23.9968, env_step: 320.4870, overhead: 18.6543, complete_rollouts: 1.3253
save_policy_outputs: 27.0024
  split_output_tensors: 10.7421
[2024-09-22 06:12:23,180][04746] Loop Runner_EvtLoop terminating...
[2024-09-22 06:12:23,181][04746] Runner profile tree view:
main_loop: 759.4308
[2024-09-22 06:12:23,182][04746] Collected {0: 10006528}, FPS: 13176.4
[2024-09-22 06:12:23,547][04746] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-22 06:12:23,550][04746] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-22 06:12:23,552][04746] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-22 06:12:23,553][04746] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-22 06:12:23,554][04746] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-22 06:12:23,556][04746] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-22 06:12:23,559][04746] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-22 06:12:23,561][04746] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-22 06:12:23,563][04746] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-22 06:12:23,564][04746] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-22 06:12:23,565][04746] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-22 06:12:23,566][04746] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-22 06:12:23,568][04746] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-22 06:12:23,571][04746] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-22 06:12:23,572][04746] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-22 06:12:23,605][04746] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:12:23,609][04746] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 06:12:23,612][04746] RunningMeanStd input shape: (1,)
[2024-09-22 06:12:23,630][04746] ConvEncoder: input_channels=3
[2024-09-22 06:12:23,759][04746] Conv encoder output size: 512
[2024-09-22 06:12:23,761][04746] Policy head output size: 512
[2024-09-22 06:12:24,038][04746] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-09-22 06:12:24,942][04746] Num frames 100...
[2024-09-22 06:12:25,115][04746] Num frames 200...
[2024-09-22 06:12:25,269][04746] Num frames 300...
[2024-09-22 06:12:25,426][04746] Num frames 400...
[2024-09-22 06:12:25,577][04746] Num frames 500...
[2024-09-22 06:12:25,746][04746] Num frames 600...
[2024-09-22 06:12:25,902][04746] Num frames 700...
[2024-09-22 06:12:26,054][04746] Num frames 800...
[2024-09-22 06:12:26,213][04746] Num frames 900...
[2024-09-22 06:12:26,381][04746] Num frames 1000...
[2024-09-22 06:12:26,539][04746] Num frames 1100...
[2024-09-22 06:12:26,687][04746] Num frames 1200...
[2024-09-22 06:12:26,864][04746] Num frames 1300...
[2024-09-22 06:12:27,021][04746] Num frames 1400...
[2024-09-22 06:12:27,174][04746] Num frames 1500...
[2024-09-22 06:12:27,329][04746] Num frames 1600...
[2024-09-22 06:12:27,547][04746] Avg episode rewards: #0: 39.959, true rewards: #0: 16.960
[2024-09-22 06:12:27,549][04746] Avg episode reward: 39.959, avg true_objective: 16.960
[2024-09-22 06:12:27,558][04746] Num frames 1700...
[2024-09-22 06:12:27,714][04746] Num frames 1800...
[2024-09-22 06:12:27,857][04746] Num frames 1900...
[2024-09-22 06:12:28,003][04746] Avg episode rewards: #0: 21.260, true rewards: #0: 9.760
[2024-09-22 06:12:28,005][04746] Avg episode reward: 21.260, avg true_objective: 9.760
[2024-09-22 06:12:28,080][04746] Num frames 2000...
[2024-09-22 06:12:28,220][04746] Num frames 2100...
[2024-09-22 06:12:28,369][04746] Num frames 2200...
[2024-09-22 06:12:28,537][04746] Num frames 2300...
[2024-09-22 06:12:28,700][04746] Num frames 2400...
[2024-09-22 06:12:28,753][04746] Avg episode rewards: #0: 17.333, true rewards: #0: 8.000
[2024-09-22 06:12:28,755][04746] Avg episode reward: 17.333, avg true_objective: 8.000
[2024-09-22 06:12:28,896][04746] Num frames 2500...
[2024-09-22 06:12:29,065][04746] Num frames 2600...
[2024-09-22 06:12:29,220][04746] Num frames 2700...
[2024-09-22 06:12:29,368][04746] Num frames 2800...
[2024-09-22 06:12:29,544][04746] Num frames 2900...
[2024-09-22 06:12:29,704][04746] Num frames 3000...
[2024-09-22 06:12:29,869][04746] Num frames 3100...
[2024-09-22 06:12:29,931][04746] Avg episode rewards: #0: 17.010, true rewards: #0: 7.760
[2024-09-22 06:12:29,933][04746] Avg episode reward: 17.010, avg true_objective: 7.760
[2024-09-22 06:12:30,089][04746] Num frames 3200...
[2024-09-22 06:12:30,263][04746] Num frames 3300...
[2024-09-22 06:12:30,419][04746] Num frames 3400...
[2024-09-22 06:12:30,579][04746] Num frames 3500...
[2024-09-22 06:12:30,757][04746] Num frames 3600...
[2024-09-22 06:12:30,909][04746] Num frames 3700...
[2024-09-22 06:12:31,071][04746] Num frames 3800...
[2024-09-22 06:12:31,236][04746] Num frames 3900...
[2024-09-22 06:12:31,394][04746] Num frames 4000...
[2024-09-22 06:12:31,552][04746] Num frames 4100...
[2024-09-22 06:12:31,697][04746] Num frames 4200...
[2024-09-22 06:12:31,863][04746] Num frames 4300...
[2024-09-22 06:12:32,023][04746] Num frames 4400...
[2024-09-22 06:12:32,187][04746] Num frames 4500...
[2024-09-22 06:12:32,342][04746] Num frames 4600...
[2024-09-22 06:12:32,504][04746] Num frames 4700...
[2024-09-22 06:12:32,651][04746] Num frames 4800...
[2024-09-22 06:12:32,856][04746] Avg episode rewards: #0: 23.392, true rewards: #0: 9.792
[2024-09-22 06:12:32,857][04746] Avg episode reward: 23.392, avg true_objective: 9.792
[2024-09-22 06:12:32,867][04746] Num frames 4900...
[2024-09-22 06:12:33,037][04746] Num frames 5000...
[2024-09-22 06:12:33,185][04746] Num frames 5100...
[2024-09-22 06:12:33,346][04746] Num frames 5200...
[2024-09-22 06:12:33,503][04746] Num frames 5300...
[2024-09-22 06:12:33,661][04746] Num frames 5400...
[2024-09-22 06:12:33,817][04746] Num frames 5500...
[2024-09-22 06:12:33,968][04746] Num frames 5600...
[2024-09-22 06:12:34,125][04746] Num frames 5700...
[2024-09-22 06:12:34,214][04746] Avg episode rewards: #0: 22.366, true rewards: #0: 9.533
[2024-09-22 06:12:34,215][04746] Avg episode reward: 22.366, avg true_objective: 9.533
[2024-09-22 06:12:34,350][04746] Num frames 5800...
[2024-09-22 06:12:34,489][04746] Num frames 5900...
[2024-09-22 06:12:34,652][04746] Num frames 6000...
[2024-09-22 06:12:34,807][04746] Num frames 6100...
[2024-09-22 06:12:34,979][04746] Num frames 6200...
[2024-09-22 06:12:35,123][04746] Num frames 6300...
[2024-09-22 06:12:35,277][04746] Num frames 6400...
[2024-09-22 06:12:35,448][04746] Num frames 6500...
[2024-09-22 06:12:35,606][04746] Num frames 6600...
[2024-09-22 06:12:35,753][04746] Num frames 6700...
[2024-09-22 06:12:35,913][04746] Num frames 6800...
[2024-09-22 06:12:36,072][04746] Num frames 6900...
[2024-09-22 06:12:36,217][04746] Num frames 7000...
[2024-09-22 06:12:36,375][04746] Num frames 7100...
[2024-09-22 06:12:36,543][04746] Num frames 7200...
[2024-09-22 06:12:36,687][04746] Num frames 7300...
[2024-09-22 06:12:36,843][04746] Num frames 7400...
[2024-09-22 06:12:37,003][04746] Num frames 7500...
[2024-09-22 06:12:37,157][04746] Num frames 7600...
[2024-09-22 06:12:37,313][04746] Num frames 7700...
[2024-09-22 06:12:37,479][04746] Num frames 7800...
[2024-09-22 06:12:37,573][04746] Avg episode rewards: #0: 28.028, true rewards: #0: 11.171
[2024-09-22 06:12:37,574][04746] Avg episode reward: 28.028, avg true_objective: 11.171
[2024-09-22 06:12:37,706][04746] Num frames 7900...
[2024-09-22 06:12:37,854][04746] Num frames 8000...
[2024-09-22 06:12:38,021][04746] Num frames 8100...
[2024-09-22 06:12:38,197][04746] Num frames 8200...
[2024-09-22 06:12:38,358][04746] Num frames 8300...
[2024-09-22 06:12:38,510][04746] Num frames 8400...
[2024-09-22 06:12:38,687][04746] Num frames 8500...
[2024-09-22 06:12:38,841][04746] Num frames 8600...
[2024-09-22 06:12:38,995][04746] Num frames 8700...
[2024-09-22 06:12:39,166][04746] Num frames 8800...
[2024-09-22 06:12:39,329][04746] Num frames 8900...
[2024-09-22 06:12:39,484][04746] Num frames 9000...
[2024-09-22 06:12:39,649][04746] Num frames 9100...
[2024-09-22 06:12:39,835][04746] Num frames 9200...
[2024-09-22 06:12:40,038][04746] Avg episode rewards: #0: 29.365, true rewards: #0: 11.615
[2024-09-22 06:12:40,039][04746] Avg episode reward: 29.365, avg true_objective: 11.615
[2024-09-22 06:12:40,056][04746] Num frames 9300...
[2024-09-22 06:12:40,215][04746] Num frames 9400...
[2024-09-22 06:12:40,393][04746] Num frames 9500...
[2024-09-22 06:12:40,553][04746] Num frames 9600...
[2024-09-22 06:12:40,718][04746] Num frames 9700...
[2024-09-22 06:12:40,899][04746] Num frames 9800...
[2024-09-22 06:12:41,058][04746] Num frames 9900...
[2024-09-22 06:12:41,221][04746] Num frames 10000...
[2024-09-22 06:12:41,410][04746] Num frames 10100...
[2024-09-22 06:12:41,585][04746] Num frames 10200...
[2024-09-22 06:12:41,750][04746] Num frames 10300...
[2024-09-22 06:12:41,924][04746] Num frames 10400...
[2024-09-22 06:12:42,091][04746] Num frames 10500...
[2024-09-22 06:12:42,271][04746] Num frames 10600...
[2024-09-22 06:12:42,437][04746] Num frames 10700...
[2024-09-22 06:12:42,602][04746] Num frames 10800...
[2024-09-22 06:12:42,789][04746] Num frames 10900...
[2024-09-22 06:12:42,954][04746] Num frames 11000...
[2024-09-22 06:12:43,120][04746] Num frames 11100...
[2024-09-22 06:12:43,279][04746] Num frames 11200...
[2024-09-22 06:12:43,456][04746] Num frames 11300...
[2024-09-22 06:12:43,668][04746] Avg episode rewards: #0: 32.546, true rewards: #0: 12.658
[2024-09-22 06:12:43,670][04746] Avg episode reward: 32.546, avg true_objective: 12.658
[2024-09-22 06:12:43,687][04746] Num frames 11400...
[2024-09-22 06:12:43,854][04746] Num frames 11500...
[2024-09-22 06:12:44,023][04746] Num frames 11600...
[2024-09-22 06:12:44,199][04746] Num frames 11700...
[2024-09-22 06:12:44,351][04746] Num frames 11800...
[2024-09-22 06:12:44,507][04746] Num frames 11900...
[2024-09-22 06:12:44,682][04746] Num frames 12000...
[2024-09-22 06:12:44,829][04746] Num frames 12100...
[2024-09-22 06:12:44,985][04746] Num frames 12200...
[2024-09-22 06:12:45,160][04746] Num frames 12300...
[2024-09-22 06:12:45,324][04746] Num frames 12400...
[2024-09-22 06:12:45,483][04746] Num frames 12500...
[2024-09-22 06:12:45,642][04746] Num frames 12600...
[2024-09-22 06:12:45,814][04746] Num frames 12700...
[2024-09-22 06:12:45,972][04746] Num frames 12800...
[2024-09-22 06:12:46,141][04746] Num frames 12900...
[2024-09-22 06:12:46,299][04746] Num frames 13000...
[2024-09-22 06:12:46,460][04746] Avg episode rewards: #0: 33.657, true rewards: #0: 13.057
[2024-09-22 06:12:46,462][04746] Avg episode reward: 33.657, avg true_objective: 13.057
[2024-09-22 06:13:23,381][04746] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-22 06:14:25,599][04746] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-22 06:14:25,601][04746] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-22 06:14:25,603][04746] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-22 06:14:25,604][04746] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-22 06:14:25,607][04746] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-22 06:14:25,608][04746] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-22 06:14:25,609][04746] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-22 06:14:25,611][04746] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-22 06:14:25,612][04746] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-22 06:14:25,613][04746] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-22 06:14:25,614][04746] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-22 06:14:25,616][04746] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-22 06:14:25,617][04746] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-22 06:14:25,618][04746] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-22 06:14:25,623][04746] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-22 06:14:25,650][04746] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 06:14:25,653][04746] RunningMeanStd input shape: (1,)
[2024-09-22 06:14:25,667][04746] ConvEncoder: input_channels=3
[2024-09-22 06:14:25,717][04746] Conv encoder output size: 512
[2024-09-22 06:14:25,719][04746] Policy head output size: 512
[2024-09-22 06:14:25,742][04746] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-09-22 06:14:26,239][04746] Num frames 100...
[2024-09-22 06:14:26,399][04746] Num frames 200...
[2024-09-22 06:14:26,561][04746] Num frames 300...
[2024-09-22 06:14:26,716][04746] Num frames 400...
[2024-09-22 06:14:26,878][04746] Num frames 500...
[2024-09-22 06:14:27,036][04746] Num frames 600...
[2024-09-22 06:14:27,212][04746] Num frames 700...
[2024-09-22 06:14:27,367][04746] Num frames 800...
[2024-09-22 06:14:27,531][04746] Num frames 900...
[2024-09-22 06:14:27,697][04746] Num frames 1000...
[2024-09-22 06:14:27,867][04746] Num frames 1100...
[2024-09-22 06:14:28,071][04746] Avg episode rewards: #0: 28.950, true rewards: #0: 11.950
[2024-09-22 06:14:28,073][04746] Avg episode reward: 28.950, avg true_objective: 11.950
[2024-09-22 06:14:28,084][04746] Num frames 1200...
[2024-09-22 06:14:28,257][04746] Num frames 1300...
[2024-09-22 06:14:28,436][04746] Num frames 1400...
[2024-09-22 06:14:28,602][04746] Num frames 1500...
[2024-09-22 06:14:28,767][04746] Num frames 1600...
[2024-09-22 06:14:28,901][04746] Num frames 1700...
[2024-09-22 06:14:29,041][04746] Num frames 1800...
[2024-09-22 06:14:29,148][04746] Avg episode rewards: #0: 19.645, true rewards: #0: 9.145
[2024-09-22 06:14:29,150][04746] Avg episode reward: 19.645, avg true_objective: 9.145
[2024-09-22 06:14:29,250][04746] Num frames 1900...
[2024-09-22 06:14:29,382][04746] Num frames 2000...
[2024-09-22 06:14:29,514][04746] Num frames 2100...
[2024-09-22 06:14:29,648][04746] Num frames 2200...
[2024-09-22 06:14:29,785][04746] Num frames 2300...
[2024-09-22 06:14:29,923][04746] Num frames 2400...
[2024-09-22 06:14:30,060][04746] Num frames 2500...
[2024-09-22 06:14:30,190][04746] Num frames 2600...
[2024-09-22 06:14:30,352][04746] Num frames 2700...
[2024-09-22 06:14:30,483][04746] Num frames 2800...
[2024-09-22 06:14:30,614][04746] Num frames 2900...
[2024-09-22 06:14:30,745][04746] Num frames 3000...
[2024-09-22 06:14:30,875][04746] Num frames 3100...
[2024-09-22 06:14:30,945][04746] Avg episode rewards: #0: 22.364, true rewards: #0: 10.363
[2024-09-22 06:14:30,947][04746] Avg episode reward: 22.364, avg true_objective: 10.363
[2024-09-22 06:14:31,071][04746] Num frames 3200...
[2024-09-22 06:14:31,216][04746] Num frames 3300...
[2024-09-22 06:14:31,359][04746] Num frames 3400...
[2024-09-22 06:14:31,489][04746] Num frames 3500...
[2024-09-22 06:14:31,623][04746] Num frames 3600...
[2024-09-22 06:14:31,781][04746] Num frames 3700...
[2024-09-22 06:14:31,942][04746] Num frames 3800...
[2024-09-22 06:14:32,079][04746] Num frames 3900...
[2024-09-22 06:14:32,210][04746] Num frames 4000...
[2024-09-22 06:14:32,339][04746] Num frames 4100...
[2024-09-22 06:14:32,471][04746] Num frames 4200...
[2024-09-22 06:14:32,602][04746] Num frames 4300...
[2024-09-22 06:14:32,730][04746] Num frames 4400...
[2024-09-22 06:14:32,861][04746] Num frames 4500...
[2024-09-22 06:14:32,996][04746] Num frames 4600...
[2024-09-22 06:14:33,126][04746] Num frames 4700...
[2024-09-22 06:14:33,262][04746] Num frames 4800...
[2024-09-22 06:14:33,444][04746] Num frames 4900...
[2024-09-22 06:14:33,599][04746] Num frames 5000...
[2024-09-22 06:14:33,729][04746] Num frames 5100...
[2024-09-22 06:14:33,879][04746] Avg episode rewards: #0: 29.935, true rewards: #0: 12.935
[2024-09-22 06:14:33,881][04746] Avg episode reward: 29.935, avg true_objective: 12.935
[2024-09-22 06:14:33,922][04746] Num frames 5200...
[2024-09-22 06:14:34,053][04746] Num frames 5300...
[2024-09-22 06:14:34,184][04746] Num frames 5400...
[2024-09-22 06:14:34,314][04746] Num frames 5500...
[2024-09-22 06:14:34,446][04746] Num frames 5600...
[2024-09-22 06:14:34,576][04746] Num frames 5700...
[2024-09-22 06:14:34,703][04746] Num frames 5800...
[2024-09-22 06:14:34,830][04746] Num frames 5900...
[2024-09-22 06:14:34,964][04746] Num frames 6000...
[2024-09-22 06:14:35,094][04746] Num frames 6100...
[2024-09-22 06:14:35,219][04746] Num frames 6200...
[2024-09-22 06:14:35,347][04746] Num frames 6300...
[2024-09-22 06:14:35,482][04746] Num frames 6400...
[2024-09-22 06:14:35,619][04746] Num frames 6500...
[2024-09-22 06:14:35,747][04746] Num frames 6600...
[2024-09-22 06:14:35,879][04746] Num frames 6700...
[2024-09-22 06:14:36,016][04746] Num frames 6800...
[2024-09-22 06:14:36,150][04746] Num frames 6900...
[2024-09-22 06:14:36,279][04746] Num frames 7000...
[2024-09-22 06:14:36,407][04746] Num frames 7100...
[2024-09-22 06:14:36,538][04746] Num frames 7200...
[2024-09-22 06:14:36,690][04746] Avg episode rewards: #0: 36.148, true rewards: #0: 14.548
[2024-09-22 06:14:36,693][04746] Avg episode reward: 36.148, avg true_objective: 14.548
[2024-09-22 06:14:36,728][04746] Num frames 7300...
[2024-09-22 06:14:36,853][04746] Num frames 7400...
[2024-09-22 06:14:36,991][04746] Num frames 7500...
[2024-09-22 06:14:37,126][04746] Num frames 7600...
[2024-09-22 06:14:37,254][04746] Num frames 7700...
[2024-09-22 06:14:37,381][04746] Num frames 7800...
[2024-09-22 06:14:37,508][04746] Num frames 7900...
[2024-09-22 06:14:37,635][04746] Avg episode rewards: #0: 32.596, true rewards: #0: 13.263
[2024-09-22 06:14:37,637][04746] Avg episode reward: 32.596, avg true_objective: 13.263
[2024-09-22 06:14:37,692][04746] Num frames 8000...
[2024-09-22 06:14:37,818][04746] Num frames 8100...
[2024-09-22 06:14:37,945][04746] Num frames 8200...
[2024-09-22 06:14:38,070][04746] Num frames 8300...
[2024-09-22 06:14:38,200][04746] Num frames 8400...
[2024-09-22 06:14:38,326][04746] Num frames 8500...
[2024-09-22 06:14:38,458][04746] Num frames 8600...
[2024-09-22 06:14:38,590][04746] Num frames 8700...
[2024-09-22 06:14:38,725][04746] Num frames 8800...
[2024-09-22 06:14:38,854][04746] Num frames 8900...
[2024-09-22 06:14:38,992][04746] Num frames 9000...
[2024-09-22 06:14:39,124][04746] Num frames 9100...
[2024-09-22 06:14:39,263][04746] Num frames 9200...
[2024-09-22 06:14:39,399][04746] Num frames 9300...
[2024-09-22 06:17:24,401][11734] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-22 06:17:24,404][11734] Rollout worker 0 uses device cpu
[2024-09-22 06:17:24,406][11734] Rollout worker 1 uses device cpu
[2024-09-22 06:17:24,407][11734] Rollout worker 2 uses device cpu
[2024-09-22 06:17:24,409][11734] Rollout worker 3 uses device cpu
[2024-09-22 06:17:24,410][11734] Rollout worker 4 uses device cpu
[2024-09-22 06:17:24,411][11734] Rollout worker 5 uses device cpu
[2024-09-22 06:17:24,413][11734] Rollout worker 6 uses device cpu
[2024-09-22 06:17:24,414][11734] Rollout worker 7 uses device cpu
[2024-09-22 06:17:24,484][11734] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 06:17:24,486][11734] InferenceWorker_p0-w0: min num requests: 2
[2024-09-22 06:17:24,526][11734] Starting all processes...
[2024-09-22 06:17:24,528][11734] Starting process learner_proc0
[2024-09-22 06:17:24,938][11734] Starting all processes...
[2024-09-22 06:17:24,947][11734] Starting process inference_proc0-0
[2024-09-22 06:17:24,947][11734] Starting process rollout_proc0
[2024-09-22 06:17:24,947][11734] Starting process rollout_proc1
[2024-09-22 06:17:24,949][11734] Starting process rollout_proc2
[2024-09-22 06:17:24,950][11734] Starting process rollout_proc3
[2024-09-22 06:17:24,955][11734] Starting process rollout_proc4
[2024-09-22 06:17:24,970][11734] Starting process rollout_proc5
[2024-09-22 06:17:24,973][11734] Starting process rollout_proc6
[2024-09-22 06:17:24,982][11734] Starting process rollout_proc7
[2024-09-22 06:17:29,220][12533] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 06:17:29,221][12533] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-22 06:17:29,251][12533] Num visible devices: 1
[2024-09-22 06:17:29,320][12536] Worker 2 uses CPU cores [2]
[2024-09-22 06:17:29,320][12520] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 06:17:29,321][12520] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-22 06:17:29,341][12520] Num visible devices: 1
[2024-09-22 06:17:29,352][12537] Worker 3 uses CPU cores [3]
[2024-09-22 06:17:29,391][12520] Starting seed is not provided
[2024-09-22 06:17:29,392][12520] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 06:17:29,393][12520] Initializing actor-critic model on device cuda:0
[2024-09-22 06:17:29,395][12520] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 06:17:29,397][12520] RunningMeanStd input shape: (1,)
[2024-09-22 06:17:29,421][12520] ConvEncoder: input_channels=3
[2024-09-22 06:17:29,426][12540] Worker 7 uses CPU cores [7]
[2024-09-22 06:17:29,502][12538] Worker 4 uses CPU cores [4]
[2024-09-22 06:17:29,503][12535] Worker 1 uses CPU cores [1]
[2024-09-22 06:17:29,515][12534] Worker 0 uses CPU cores [0]
[2024-09-22 06:17:29,589][12539] Worker 6 uses CPU cores [6]
[2024-09-22 06:17:29,592][12520] Conv encoder output size: 512
[2024-09-22 06:17:29,593][12520] Policy head output size: 512
[2024-09-22 06:17:29,609][12520] Created Actor Critic model with architecture:
[2024-09-22 06:17:29,609][12520] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-22 06:17:29,872][12520] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-09-22 06:17:29,892][12541] Worker 5 uses CPU cores [5]
[2024-09-22 06:17:30,625][12520] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-09-22 06:17:30,665][12520] Loading model from checkpoint
[2024-09-22 06:17:30,667][12520] Loaded experiment state at self.train_step=2443, self.env_steps=10006528
[2024-09-22 06:17:30,667][12520] Initialized policy 0 weights for model version 2443
[2024-09-22 06:17:30,672][12520] LearnerWorker_p0 finished initialization!
[2024-09-22 06:17:30,672][12520] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 06:17:30,860][12533] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 06:17:30,861][12533] RunningMeanStd input shape: (1,)
[2024-09-22 06:17:30,875][12533] ConvEncoder: input_channels=3
[2024-09-22 06:17:31,004][12533] Conv encoder output size: 512
[2024-09-22 06:17:31,005][12533] Policy head output size: 512
[2024-09-22 06:17:31,064][11734] Inference worker 0-0 is ready!
[2024-09-22 06:17:31,066][11734] All inference workers are ready! Signal rollout workers to start!
[2024-09-22 06:17:31,122][12540] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:17:31,124][12534] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:17:31,124][12539] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:17:31,126][12538] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:17:31,126][12541] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:17:31,126][12535] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:17:31,134][12536] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:17:31,140][12537] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:17:31,458][12540] Decorrelating experience for 0 frames...
[2024-09-22 06:17:31,459][12539] Decorrelating experience for 0 frames...
[2024-09-22 06:17:31,459][12541] Decorrelating experience for 0 frames...
[2024-09-22 06:17:31,619][12538] Decorrelating experience for 0 frames...
[2024-09-22 06:17:31,621][12534] Decorrelating experience for 0 frames...
[2024-09-22 06:17:31,724][12539] Decorrelating experience for 32 frames...
[2024-09-22 06:17:31,870][12541] Decorrelating experience for 32 frames...
[2024-09-22 06:17:31,914][12535] Decorrelating experience for 0 frames...
[2024-09-22 06:17:32,052][12534] Decorrelating experience for 32 frames...
[2024-09-22 06:17:32,052][12538] Decorrelating experience for 32 frames...
[2024-09-22 06:17:32,149][12539] Decorrelating experience for 64 frames...
[2024-09-22 06:17:32,266][12535] Decorrelating experience for 32 frames...
[2024-09-22 06:17:32,310][12536] Decorrelating experience for 0 frames...
[2024-09-22 06:17:32,349][12540] Decorrelating experience for 32 frames...
[2024-09-22 06:17:32,511][12541] Decorrelating experience for 64 frames...
[2024-09-22 06:17:32,621][12538] Decorrelating experience for 64 frames...
[2024-09-22 06:17:32,622][12534] Decorrelating experience for 64 frames...
[2024-09-22 06:17:32,703][12536] Decorrelating experience for 32 frames...
[2024-09-22 06:17:32,767][12537] Decorrelating experience for 0 frames...
[2024-09-22 06:17:32,785][12539] Decorrelating experience for 96 frames...
[2024-09-22 06:17:32,810][12535] Decorrelating experience for 64 frames...
[2024-09-22 06:17:32,862][12540] Decorrelating experience for 64 frames...
[2024-09-22 06:17:32,943][12541] Decorrelating experience for 96 frames...
[2024-09-22 06:17:33,088][12534] Decorrelating experience for 96 frames...
[2024-09-22 06:17:33,157][12538] Decorrelating experience for 96 frames...
[2024-09-22 06:17:33,253][12537] Decorrelating experience for 32 frames...
[2024-09-22 06:17:33,257][12535] Decorrelating experience for 96 frames...
[2024-09-22 06:17:33,298][12536] Decorrelating experience for 64 frames...
[2024-09-22 06:17:33,368][12540] Decorrelating experience for 96 frames...
[2024-09-22 06:17:33,615][12536] Decorrelating experience for 96 frames...
[2024-09-22 06:17:33,697][12537] Decorrelating experience for 64 frames...
[2024-09-22 06:17:34,075][12537] Decorrelating experience for 96 frames...
[2024-09-22 06:17:34,420][11734] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10006528. Throughput: 0: nan. Samples: 772. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-22 06:17:34,422][11734] Avg episode reward: [(0, '1.866')]
[2024-09-22 06:17:34,924][12520] Signal inference workers to stop experience collection...
[2024-09-22 06:17:34,935][12533] InferenceWorker_p0-w0: stopping experience collection
[2024-09-22 06:17:37,581][12520] Signal inference workers to resume experience collection...
[2024-09-22 06:17:37,582][12520] Stopping Batcher_0...
[2024-09-22 06:17:37,582][12520] Loop batcher_evt_loop terminating...
[2024-09-22 06:17:37,593][11734] Component Batcher_0 stopped!
[2024-09-22 06:17:37,604][12533] Weights refcount: 2 0
[2024-09-22 06:17:37,611][12533] Stopping InferenceWorker_p0-w0...
[2024-09-22 06:17:37,610][11734] Component InferenceWorker_p0-w0 stopped!
[2024-09-22 06:17:37,613][12533] Loop inference_proc0-0_evt_loop terminating...
[2024-09-22 06:17:37,629][12534] Stopping RolloutWorker_w0...
[2024-09-22 06:17:37,630][12534] Loop rollout_proc0_evt_loop terminating...
[2024-09-22 06:17:37,630][11734] Component RolloutWorker_w0 stopped!
[2024-09-22 06:17:37,632][12540] Stopping RolloutWorker_w7...
[2024-09-22 06:17:37,634][12540] Loop rollout_proc7_evt_loop terminating...
[2024-09-22 06:17:37,634][12536] Stopping RolloutWorker_w2...
[2024-09-22 06:17:37,633][11734] Component RolloutWorker_w7 stopped!
[2024-09-22 06:17:37,635][12536] Loop rollout_proc2_evt_loop terminating...
[2024-09-22 06:17:37,635][11734] Component RolloutWorker_w2 stopped!
[2024-09-22 06:17:37,639][12539] Stopping RolloutWorker_w6...
[2024-09-22 06:17:37,639][12539] Loop rollout_proc6_evt_loop terminating...
[2024-09-22 06:17:37,641][11734] Component RolloutWorker_w6 stopped!
[2024-09-22 06:17:37,642][12537] Stopping RolloutWorker_w3...
[2024-09-22 06:17:37,644][12537] Loop rollout_proc3_evt_loop terminating...
[2024-09-22 06:17:37,645][11734] Component RolloutWorker_w3 stopped!
[2024-09-22 06:17:37,670][12541] Stopping RolloutWorker_w5...
[2024-09-22 06:17:37,670][11734] Component RolloutWorker_w5 stopped!
[2024-09-22 06:17:37,672][12541] Loop rollout_proc5_evt_loop terminating...
[2024-09-22 06:17:37,725][12535] Stopping RolloutWorker_w1...
[2024-09-22 06:17:37,726][12535] Loop rollout_proc1_evt_loop terminating...
[2024-09-22 06:17:37,725][11734] Component RolloutWorker_w1 stopped!
[2024-09-22 06:17:37,848][11734] Component RolloutWorker_w4 stopped!
[2024-09-22 06:17:37,847][12538] Stopping RolloutWorker_w4...
[2024-09-22 06:17:37,853][12538] Loop rollout_proc4_evt_loop terminating...
[2024-09-22 06:17:38,311][12520] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth...
[2024-09-22 06:17:38,415][12520] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002311_9465856.pth
[2024-09-22 06:17:38,430][12520] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth...
[2024-09-22 06:17:38,536][12520] Stopping LearnerWorker_p0...
[2024-09-22 06:17:38,537][12520] Loop learner_proc0_evt_loop terminating...
[2024-09-22 06:17:38,536][11734] Component LearnerWorker_p0 stopped!
[2024-09-22 06:17:38,540][11734] Waiting for process learner_proc0 to stop...
[2024-09-22 06:17:39,375][11734] Waiting for process inference_proc0-0 to join...
[2024-09-22 06:17:39,377][11734] Waiting for process rollout_proc0 to join...
[2024-09-22 06:17:39,380][11734] Waiting for process rollout_proc1 to join...
[2024-09-22 06:17:39,382][11734] Waiting for process rollout_proc2 to join...
[2024-09-22 06:17:39,384][11734] Waiting for process rollout_proc3 to join...
[2024-09-22 06:17:39,387][11734] Waiting for process rollout_proc4 to join...
[2024-09-22 06:17:39,389][11734] Waiting for process rollout_proc5 to join...
[2024-09-22 06:17:39,391][11734] Waiting for process rollout_proc6 to join...
[2024-09-22 06:17:39,394][11734] Waiting for process rollout_proc7 to join...
[2024-09-22 06:17:39,396][11734] Batcher 0 profile tree view:
batching: 0.0272, releasing_batches: 0.0006
[2024-09-22 06:17:39,399][11734] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0150
wait_policy: 0.0001
  wait_policy_total: 1.7781
one_step: 0.0072
  handle_policy_step: 1.9771
    deserialize: 0.0586, stack: 0.0103, obs_to_device_normalize: 0.3707, forward: 1.2566, send_messages: 0.1221
    prepare_outputs: 0.1041
      to_cpu: 0.0496
[2024-09-22 06:17:39,402][11734] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 1.5120
train: 2.3528
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0005, kl_divergence: 0.0125, after_optimizer: 0.0419
  calculate_losses: 0.9267
    losses_init: 0.0000, forward_head: 0.3230, bptt_initial: 0.5264, tail: 0.0353, advantages_returns: 0.0010, losses: 0.0367
    bptt: 0.0038
      bptt_forward_core: 0.0036
  update: 1.3702
    clip: 0.0448
[2024-09-22 06:17:39,405][11734] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.0398, env_step: 0.4080, overhead: 0.0266, complete_rollouts: 0.0008
save_policy_outputs: 0.0351
  split_output_tensors: 0.0139
[2024-09-22 06:17:39,406][11734] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0010, enqueue_policy_requests: 0.0449, env_step: 0.4532, overhead: 0.0323, complete_rollouts: 0.0014
save_policy_outputs: 0.0410
  split_output_tensors: 0.0159
[2024-09-22 06:17:39,409][11734] Loop Runner_EvtLoop terminating...
[2024-09-22 06:17:39,410][11734] Runner profile tree view:
main_loop: 14.8845
[2024-09-22 06:17:39,412][11734] Collected {0: 10014720}, FPS: 550.4
[2024-09-22 06:17:39,433][11734] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-22 06:17:39,434][11734] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-22 06:17:39,435][11734] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-22 06:17:39,438][11734] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-22 06:17:39,439][11734] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-22 06:17:39,440][11734] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-22 06:17:39,442][11734] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-22 06:17:39,443][11734] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-22 06:17:39,444][11734] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-22 06:17:39,446][11734] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-22 06:17:39,447][11734] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-22 06:17:39,449][11734] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-22 06:17:39,450][11734] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-22 06:17:39,452][11734] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-22 06:17:39,454][11734] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-22 06:17:39,486][11734] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:17:39,490][11734] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 06:17:39,493][11734] RunningMeanStd input shape: (1,)
[2024-09-22 06:17:39,510][11734] ConvEncoder: input_channels=3
[2024-09-22 06:17:39,647][11734] Conv encoder output size: 512
[2024-09-22 06:17:39,649][11734] Policy head output size: 512
[2024-09-22 06:17:39,942][11734] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth...
[2024-09-22 06:17:40,875][11734] Num frames 100...
[2024-09-22 06:17:41,034][11734] Num frames 200...
[2024-09-22 06:17:41,208][11734] Num frames 300...
[2024-09-22 06:17:41,367][11734] Num frames 400...
[2024-09-22 06:17:41,530][11734] Num frames 500...
[2024-09-22 06:17:41,691][11734] Num frames 600...
[2024-09-22 06:17:41,865][11734] Num frames 700...
[2024-09-22 06:17:42,034][11734] Num frames 800...
[2024-09-22 06:17:42,189][11734] Num frames 900...
[2024-09-22 06:17:42,365][11734] Num frames 1000...
[2024-09-22 06:17:42,537][11734] Num frames 1100...
[2024-09-22 06:17:42,708][11734] Num frames 1200...
[2024-09-22 06:17:42,874][11734] Num frames 1300...
[2024-09-22 06:17:43,048][11734] Num frames 1400...
[2024-09-22 06:17:43,217][11734] Num frames 1500...
[2024-09-22 06:17:43,376][11734] Num frames 1600...
[2024-09-22 06:17:43,525][11734] Num frames 1700...
[2024-09-22 06:17:43,702][11734] Num frames 1800...
[2024-09-22 06:17:43,842][11734] Num frames 1900...
[2024-09-22 06:17:43,987][11734] Num frames 2000...
[2024-09-22 06:17:44,100][11734] Avg episode rewards: #0: 54.329, true rewards: #0: 20.330
[2024-09-22 06:17:44,102][11734] Avg episode reward: 54.329, avg true_objective: 20.330
[2024-09-22 06:17:44,216][11734] Num frames 2100...
[2024-09-22 06:17:44,378][11734] Num frames 2200...
[2024-09-22 06:17:44,535][11734] Num frames 2300...
[2024-09-22 06:17:44,683][11734] Num frames 2400...
[2024-09-22 06:17:44,870][11734] Avg episode rewards: #0: 29.904, true rewards: #0: 12.405
[2024-09-22 06:17:44,872][11734] Avg episode reward: 29.904, avg true_objective: 12.405
[2024-09-22 06:17:44,909][11734] Num frames 2500...
[2024-09-22 06:17:45,059][11734] Num frames 2600...
[2024-09-22 06:17:45,215][11734] Num frames 2700...
[2024-09-22 06:17:45,387][11734] Num frames 2800...
[2024-09-22 06:17:45,547][11734] Num frames 2900...
[2024-09-22 06:17:45,693][11734] Num frames 3000...
[2024-09-22 06:17:45,849][11734] Num frames 3100...
[2024-09-22 06:17:46,021][11734] Num frames 3200...
[2024-09-22 06:17:46,169][11734] Num frames 3300...
[2024-09-22 06:17:46,317][11734] Num frames 3400...
[2024-09-22 06:17:46,489][11734] Num frames 3500...
[2024-09-22 06:17:46,640][11734] Num frames 3600...
[2024-09-22 06:17:46,776][11734] Num frames 3700...
[2024-09-22 06:17:46,959][11734] Num frames 3800...
[2024-09-22 06:17:47,123][11734] Num frames 3900...
[2024-09-22 06:17:47,298][11734] Num frames 4000...
[2024-09-22 06:17:47,471][11734] Num frames 4100...
[2024-09-22 06:17:47,656][11734] Num frames 4200...
[2024-09-22 06:17:47,811][11734] Num frames 4300...
[2024-09-22 06:17:47,973][11734] Num frames 4400...
[2024-09-22 06:17:48,146][11734] Num frames 4500...
[2024-09-22 06:17:48,361][11734] Avg episode rewards: #0: 40.936, true rewards: #0: 15.270
[2024-09-22 06:17:48,364][11734] Avg episode reward: 40.936, avg true_objective: 15.270
[2024-09-22 06:17:48,400][11734] Num frames 4600...
[2024-09-22 06:17:48,575][11734] Num frames 4700...
[2024-09-22 06:17:48,748][11734] Num frames 4800...
[2024-09-22 06:17:48,937][11734] Num frames 4900...
[2024-09-22 06:17:49,090][11734] Num frames 5000...
[2024-09-22 06:17:49,238][11734] Num frames 5100...
[2024-09-22 06:17:49,398][11734] Num frames 5200...
[2024-09-22 06:17:49,568][11734] Num frames 5300...
[2024-09-22 06:17:49,722][11734] Num frames 5400...
[2024-09-22 06:17:49,869][11734] Num frames 5500...
[2024-09-22 06:17:50,043][11734] Num frames 5600...
[2024-09-22 06:17:50,213][11734] Avg episode rewards: #0: 36.672, true rewards: #0: 14.173
[2024-09-22 06:17:50,215][11734] Avg episode reward: 36.672, avg true_objective: 14.173
[2024-09-22 06:17:50,265][11734] Num frames 5700...
[2024-09-22 06:17:50,420][11734] Num frames 5800...
[2024-09-22 06:17:50,573][11734] Num frames 5900...
[2024-09-22 06:17:50,739][11734] Num frames 6000...
[2024-09-22 06:17:50,899][11734] Num frames 6100...
[2024-09-22 06:17:51,053][11734] Num frames 6200...
[2024-09-22 06:17:51,205][11734] Num frames 6300...
[2024-09-22 06:17:51,381][11734] Num frames 6400...
[2024-09-22 06:17:51,535][11734] Num frames 6500...
[2024-09-22 06:17:51,684][11734] Num frames 6600...
[2024-09-22 06:17:51,860][11734] Num frames 6700...
[2024-09-22 06:17:52,030][11734] Num frames 6800...
[2024-09-22 06:17:52,189][11734] Num frames 6900...
[2024-09-22 06:17:52,342][11734] Num frames 7000...
[2024-09-22 06:17:52,514][11734] Num frames 7100...
[2024-09-22 06:17:52,657][11734] Num frames 7200...
[2024-09-22 06:17:52,722][11734] Avg episode rewards: #0: 37.209, true rewards: #0: 14.410
[2024-09-22 06:17:52,724][11734] Avg episode reward: 37.209, avg true_objective: 14.410
[2024-09-22 06:17:52,867][11734] Num frames 7300...
[2024-09-22 06:17:53,040][11734] Num frames 7400...
[2024-09-22 06:17:53,184][11734] Num frames 7500...
[2024-09-22 06:17:53,357][11734] Num frames 7600...
[2024-09-22 06:17:53,515][11734] Num frames 7700...
[2024-09-22 06:17:53,682][11734] Num frames 7800...
[2024-09-22 06:17:53,839][11734] Num frames 7900...
[2024-09-22 06:17:53,941][11734] Avg episode rewards: #0: 34.208, true rewards: #0: 13.208
[2024-09-22 06:17:53,943][11734] Avg episode reward: 34.208, avg true_objective: 13.208
[2024-09-22 06:17:54,063][11734] Num frames 8000...
[2024-09-22 06:17:54,241][11734] Num frames 8100...
[2024-09-22 06:17:54,386][11734] Num frames 8200...
[2024-09-22 06:17:54,554][11734] Num frames 8300...
[2024-09-22 06:17:54,725][11734] Num frames 8400...
[2024-09-22 06:17:54,892][11734] Num frames 8500...
[2024-09-22 06:17:55,045][11734] Num frames 8600...
[2024-09-22 06:17:55,208][11734] Num frames 8700...
[2024-09-22 06:17:55,387][11734] Num frames 8800...
[2024-09-22 06:17:55,541][11734] Num frames 8900...
[2024-09-22 06:17:55,707][11734] Num frames 9000...
[2024-09-22 06:17:55,875][11734] Num frames 9100...
[2024-09-22 06:17:56,028][11734] Num frames 9200...
[2024-09-22 06:17:56,185][11734] Num frames 9300...
[2024-09-22 06:17:56,367][11734] Num frames 9400...
[2024-09-22 06:17:56,535][11734] Num frames 9500...
[2024-09-22 06:17:56,692][11734] Num frames 9600...
[2024-09-22 06:17:56,844][11734] Num frames 9700...
[2024-09-22 06:17:57,016][11734] Num frames 9800...
[2024-09-22 06:17:57,179][11734] Num frames 9900...
[2024-09-22 06:17:57,344][11734] Num frames 10000...
[2024-09-22 06:17:57,446][11734] Avg episode rewards: #0: 37.607, true rewards: #0: 14.321
[2024-09-22 06:17:57,448][11734] Avg episode reward: 37.607, avg true_objective: 14.321
[2024-09-22 06:17:57,563][11734] Num frames 10100...
[2024-09-22 06:17:57,749][11734] Num frames 10200...
[2024-09-22 06:17:57,907][11734] Num frames 10300...
[2024-09-22 06:17:58,065][11734] Num frames 10400...
[2024-09-22 06:17:58,228][11734] Num frames 10500...
[2024-09-22 06:17:58,400][11734] Num frames 10600...
[2024-09-22 06:17:58,559][11734] Num frames 10700...
[2024-09-22 06:17:58,731][11734] Num frames 10800...
[2024-09-22 06:17:58,890][11734] Num frames 10900...
[2024-09-22 06:17:59,071][11734] Num frames 11000...
[2024-09-22 06:17:59,239][11734] Num frames 11100...
[2024-09-22 06:17:59,394][11734] Num frames 11200...
[2024-09-22 06:17:59,587][11734] Num frames 11300...
[2024-09-22 06:17:59,757][11734] Num frames 11400...
[2024-09-22 06:17:59,924][11734] Num frames 11500...
[2024-09-22 06:18:00,108][11734] Num frames 11600...
[2024-09-22 06:18:00,295][11734] Num frames 11700...
[2024-09-22 06:18:00,474][11734] Num frames 11800...
[2024-09-22 06:18:00,643][11734] Num frames 11900...
[2024-09-22 06:18:00,829][11734] Num frames 12000...
[2024-09-22 06:18:00,980][11734] Num frames 12100...
[2024-09-22 06:18:01,084][11734] Avg episode rewards: #0: 40.406, true rewards: #0: 15.156
[2024-09-22 06:18:01,086][11734] Avg episode reward: 40.406, avg true_objective: 15.156
[2024-09-22 06:18:01,199][11734] Num frames 12200...
[2024-09-22 06:18:01,366][11734] Num frames 12300...
[2024-09-22 06:18:01,519][11734] Num frames 12400...
[2024-09-22 06:18:01,673][11734] Num frames 12500...
[2024-09-22 06:18:01,836][11734] Num frames 12600...
[2024-09-22 06:18:02,021][11734] Num frames 12700...
[2024-09-22 06:18:02,176][11734] Num frames 12800...
[2024-09-22 06:18:02,321][11734] Num frames 12900...
[2024-09-22 06:18:02,447][11734] Avg episode rewards: #0: 37.938, true rewards: #0: 14.383
[2024-09-22 06:18:02,449][11734] Avg episode reward: 37.938, avg true_objective: 14.383
[2024-09-22 06:18:02,543][11734] Num frames 13000...
[2024-09-22 06:18:02,691][11734] Num frames 13100...
[2024-09-22 06:18:02,860][11734] Num frames 13200...
[2024-09-22 06:18:03,019][11734] Num frames 13300...
[2024-09-22 06:18:03,185][11734] Avg episode rewards: #0: 34.760, true rewards: #0: 13.361
[2024-09-22 06:18:03,187][11734] Avg episode reward: 34.760, avg true_objective: 13.361
[2024-09-22 06:18:40,287][11734] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-22 06:18:40,316][11734] Environment doom_basic already registered, overwriting...
[2024-09-22 06:18:40,319][11734] Environment doom_two_colors_easy already registered, overwriting...
[2024-09-22 06:18:40,320][11734] Environment doom_two_colors_hard already registered, overwriting...
[2024-09-22 06:18:40,321][11734] Environment doom_dm already registered, overwriting...
[2024-09-22 06:18:40,322][11734] Environment doom_dwango5 already registered, overwriting...
[2024-09-22 06:18:40,325][11734] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-09-22 06:18:40,326][11734] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-09-22 06:18:40,327][11734] Environment doom_my_way_home already registered, overwriting...
[2024-09-22 06:18:40,328][11734] Environment doom_deadly_corridor already registered, overwriting...
[2024-09-22 06:18:40,331][11734] Environment doom_defend_the_center already registered, overwriting...
[2024-09-22 06:18:40,332][11734] Environment doom_defend_the_line already registered, overwriting...
[2024-09-22 06:18:40,334][11734] Environment doom_health_gathering already registered, overwriting...
[2024-09-22 06:18:40,336][11734] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-09-22 06:18:40,337][11734] Environment doom_battle already registered, overwriting...
[2024-09-22 06:18:40,338][11734] Environment doom_battle2 already registered, overwriting...
[2024-09-22 06:18:40,339][11734] Environment doom_duel_bots already registered, overwriting...
[2024-09-22 06:18:40,342][11734] Environment doom_deathmatch_bots already registered, overwriting...
[2024-09-22 06:18:40,343][11734] Environment doom_duel already registered, overwriting...
[2024-09-22 06:18:40,345][11734] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-22 06:18:40,346][11734] Environment doom_benchmark already registered, overwriting...
[2024-09-22 06:18:40,348][11734] register_encoder_factory: <function make_vizdoom_encoder at 0x7b467399b910>
[2024-09-22 06:18:40,360][11734] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-22 06:18:40,366][11734] Experiment dir /content/train_dir/default_experiment already exists!
[2024-09-22 06:18:40,367][11734] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-09-22 06:18:40,369][11734] Weights and Biases integration disabled
[2024-09-22 06:18:40,372][11734] Environment var CUDA_VISIBLE_DEVICES is 0

[2024-09-22 06:18:42,971][11734] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=10000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-22 06:18:42,973][11734] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-22 06:18:42,976][11734] Rollout worker 0 uses device cpu
[2024-09-22 06:18:42,977][11734] Rollout worker 1 uses device cpu
[2024-09-22 06:18:42,978][11734] Rollout worker 2 uses device cpu
[2024-09-22 06:18:42,981][11734] Rollout worker 3 uses device cpu
[2024-09-22 06:18:42,982][11734] Rollout worker 4 uses device cpu
[2024-09-22 06:18:42,983][11734] Rollout worker 5 uses device cpu
[2024-09-22 06:18:42,986][11734] Rollout worker 6 uses device cpu
[2024-09-22 06:18:42,987][11734] Rollout worker 7 uses device cpu
[2024-09-22 06:18:43,028][11734] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 06:18:43,030][11734] InferenceWorker_p0-w0: min num requests: 2
[2024-09-22 06:18:43,068][11734] Starting all processes...
[2024-09-22 06:18:43,070][11734] Starting process learner_proc0
[2024-09-22 06:18:43,118][11734] Starting all processes...
[2024-09-22 06:18:43,123][11734] Starting process inference_proc0-0
[2024-09-22 06:18:43,125][11734] Starting process rollout_proc0
[2024-09-22 06:18:43,126][11734] Starting process rollout_proc1
[2024-09-22 06:18:43,129][11734] Starting process rollout_proc2
[2024-09-22 06:18:43,131][11734] Starting process rollout_proc3
[2024-09-22 06:18:43,133][11734] Starting process rollout_proc4
[2024-09-22 06:18:43,140][11734] Starting process rollout_proc5
[2024-09-22 06:18:43,141][11734] Starting process rollout_proc6
[2024-09-22 06:18:43,155][11734] Starting process rollout_proc7
[2024-09-22 06:18:47,451][13282] Worker 4 uses CPU cores [4]
[2024-09-22 06:18:47,487][13280] Worker 2 uses CPU cores [2]
[2024-09-22 06:18:47,594][13281] Worker 3 uses CPU cores [3]
[2024-09-22 06:18:47,605][13284] Worker 6 uses CPU cores [6]
[2024-09-22 06:18:47,701][13285] Worker 7 uses CPU cores [7]
[2024-09-22 06:18:47,722][13283] Worker 5 uses CPU cores [5]
[2024-09-22 06:18:47,769][13260] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 06:18:47,769][13260] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-22 06:18:47,788][13260] Num visible devices: 1
[2024-09-22 06:18:47,794][13279] Worker 1 uses CPU cores [1]
[2024-09-22 06:18:47,809][13260] Starting seed is not provided
[2024-09-22 06:18:47,809][13260] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 06:18:47,809][13260] Initializing actor-critic model on device cuda:0
[2024-09-22 06:18:47,810][13260] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 06:18:47,811][13260] RunningMeanStd input shape: (1,)
[2024-09-22 06:18:47,832][13260] ConvEncoder: input_channels=3
[2024-09-22 06:18:47,846][13274] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 06:18:47,846][13274] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-22 06:18:47,862][13274] Num visible devices: 1
[2024-09-22 06:18:47,918][13278] Worker 0 uses CPU cores [0]
[2024-09-22 06:18:47,975][13260] Conv encoder output size: 512
[2024-09-22 06:18:47,975][13260] Policy head output size: 512
[2024-09-22 06:18:47,992][13260] Created Actor Critic model with architecture:
[2024-09-22 06:18:47,992][13260] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-22 06:18:48,219][13260] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-09-22 06:18:48,952][13260] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth...
[2024-09-22 06:18:48,997][13260] Loading model from checkpoint
[2024-09-22 06:18:48,999][13260] Loaded experiment state at self.train_step=2445, self.env_steps=10014720
[2024-09-22 06:18:49,000][13260] Initialized policy 0 weights for model version 2445
[2024-09-22 06:18:49,004][13260] LearnerWorker_p0 finished initialization!
[2024-09-22 06:18:49,004][13260] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-22 06:18:49,191][13274] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 06:18:49,193][13274] RunningMeanStd input shape: (1,)
[2024-09-22 06:18:49,207][13274] ConvEncoder: input_channels=3
[2024-09-22 06:18:49,327][13274] Conv encoder output size: 512
[2024-09-22 06:18:49,327][13274] Policy head output size: 512
[2024-09-22 06:18:49,388][11734] Inference worker 0-0 is ready!
[2024-09-22 06:18:49,390][11734] All inference workers are ready! Signal rollout workers to start!
[2024-09-22 06:18:49,443][13278] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:18:49,446][13284] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:18:49,446][13279] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:18:49,449][13281] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:18:49,450][13283] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:18:49,451][13285] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:18:49,461][13280] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:18:49,508][13282] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-22 06:18:49,858][13281] Decorrelating experience for 0 frames...
[2024-09-22 06:18:49,938][13284] Decorrelating experience for 0 frames...
[2024-09-22 06:18:49,938][13278] Decorrelating experience for 0 frames...
[2024-09-22 06:18:49,957][13279] Decorrelating experience for 0 frames...
[2024-09-22 06:18:49,957][13283] Decorrelating experience for 0 frames...
[2024-09-22 06:18:49,958][13280] Decorrelating experience for 0 frames...
[2024-09-22 06:18:49,978][13282] Decorrelating experience for 0 frames...
[2024-09-22 06:18:50,224][13278] Decorrelating experience for 32 frames...
[2024-09-22 06:18:50,232][13283] Decorrelating experience for 32 frames...
[2024-09-22 06:18:50,240][13280] Decorrelating experience for 32 frames...
[2024-09-22 06:18:50,373][11734] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10014720. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-22 06:18:50,377][13281] Decorrelating experience for 32 frames...
[2024-09-22 06:18:50,407][13285] Decorrelating experience for 0 frames...
[2024-09-22 06:18:50,544][13282] Decorrelating experience for 32 frames...
[2024-09-22 06:18:50,583][13283] Decorrelating experience for 64 frames...
[2024-09-22 06:18:50,706][13285] Decorrelating experience for 32 frames...
[2024-09-22 06:18:50,833][13280] Decorrelating experience for 64 frames...
[2024-09-22 06:18:50,853][13279] Decorrelating experience for 32 frames...
[2024-09-22 06:18:50,868][13284] Decorrelating experience for 32 frames...
[2024-09-22 06:18:51,025][13278] Decorrelating experience for 64 frames...
[2024-09-22 06:18:51,063][13282] Decorrelating experience for 64 frames...
[2024-09-22 06:18:51,222][13285] Decorrelating experience for 64 frames...
[2024-09-22 06:18:51,262][13280] Decorrelating experience for 96 frames...
[2024-09-22 06:18:51,307][13279] Decorrelating experience for 64 frames...
[2024-09-22 06:18:51,314][13281] Decorrelating experience for 64 frames...
[2024-09-22 06:18:51,366][13282] Decorrelating experience for 96 frames...
[2024-09-22 06:18:51,460][13284] Decorrelating experience for 64 frames...
[2024-09-22 06:18:51,469][13283] Decorrelating experience for 96 frames...
[2024-09-22 06:18:51,703][13279] Decorrelating experience for 96 frames...
[2024-09-22 06:18:51,756][13284] Decorrelating experience for 96 frames...
[2024-09-22 06:18:51,759][13281] Decorrelating experience for 96 frames...
[2024-09-22 06:18:51,876][13278] Decorrelating experience for 96 frames...
[2024-09-22 06:18:51,905][13285] Decorrelating experience for 96 frames...
[2024-09-22 06:18:53,021][13260] Signal inference workers to stop experience collection...
[2024-09-22 06:18:53,028][13274] InferenceWorker_p0-w0: stopping experience collection
[2024-09-22 06:18:55,372][11734] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10014720. Throughput: 0: 84.8. Samples: 424. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-22 06:18:55,375][11734] Avg episode reward: [(0, '2.298')]
[2024-09-22 06:18:55,699][13260] Signal inference workers to resume experience collection...
[2024-09-22 06:18:55,700][13260] Stopping Batcher_0...
[2024-09-22 06:18:55,701][13260] Loop batcher_evt_loop terminating...
[2024-09-22 06:18:55,713][11734] Component Batcher_0 stopped!
[2024-09-22 06:18:55,723][13274] Weights refcount: 2 0
[2024-09-22 06:18:55,725][13274] Stopping InferenceWorker_p0-w0...
[2024-09-22 06:18:55,726][13274] Loop inference_proc0-0_evt_loop terminating...
[2024-09-22 06:18:55,726][11734] Component InferenceWorker_p0-w0 stopped!
[2024-09-22 06:18:55,748][13281] Stopping RolloutWorker_w3...
[2024-09-22 06:18:55,749][13281] Loop rollout_proc3_evt_loop terminating...
[2024-09-22 06:18:55,749][13278] Stopping RolloutWorker_w0...
[2024-09-22 06:18:55,750][13278] Loop rollout_proc0_evt_loop terminating...
[2024-09-22 06:18:55,749][11734] Component RolloutWorker_w3 stopped!
[2024-09-22 06:18:55,751][11734] Component RolloutWorker_w0 stopped!
[2024-09-22 06:18:55,753][13279] Stopping RolloutWorker_w1...
[2024-09-22 06:18:55,753][13279] Loop rollout_proc1_evt_loop terminating...
[2024-09-22 06:18:55,755][13280] Stopping RolloutWorker_w2...
[2024-09-22 06:18:55,755][13280] Loop rollout_proc2_evt_loop terminating...
[2024-09-22 06:18:55,752][13282] Stopping RolloutWorker_w4...
[2024-09-22 06:18:55,756][13282] Loop rollout_proc4_evt_loop terminating...
[2024-09-22 06:18:55,759][13284] Stopping RolloutWorker_w6...
[2024-09-22 06:18:55,754][11734] Component RolloutWorker_w4 stopped!
[2024-09-22 06:18:55,760][13284] Loop rollout_proc6_evt_loop terminating...
[2024-09-22 06:18:55,760][11734] Component RolloutWorker_w1 stopped!
[2024-09-22 06:18:55,761][11734] Component RolloutWorker_w2 stopped!
[2024-09-22 06:18:55,764][11734] Component RolloutWorker_w6 stopped!
[2024-09-22 06:18:55,838][13283] Stopping RolloutWorker_w5...
[2024-09-22 06:18:55,839][13283] Loop rollout_proc5_evt_loop terminating...
[2024-09-22 06:18:55,838][11734] Component RolloutWorker_w5 stopped!
[2024-09-22 06:18:55,956][13285] Stopping RolloutWorker_w7...
[2024-09-22 06:18:55,957][13285] Loop rollout_proc7_evt_loop terminating...
[2024-09-22 06:18:55,957][11734] Component RolloutWorker_w7 stopped!
[2024-09-22 06:18:56,413][13260] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
[2024-09-22 06:18:56,518][13260] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth
[2024-09-22 06:18:56,532][13260] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
[2024-09-22 06:18:56,656][13260] Stopping LearnerWorker_p0...
[2024-09-22 06:18:56,656][13260] Loop learner_proc0_evt_loop terminating...
[2024-09-22 06:18:56,656][11734] Component LearnerWorker_p0 stopped!
[2024-09-22 06:18:56,658][11734] Waiting for process learner_proc0 to stop...
[2024-09-22 06:18:57,551][11734] Waiting for process inference_proc0-0 to join...
[2024-09-22 06:18:57,555][11734] Waiting for process rollout_proc0 to join...
[2024-09-22 06:18:57,557][11734] Waiting for process rollout_proc1 to join...
[2024-09-22 06:18:57,560][11734] Waiting for process rollout_proc2 to join...
[2024-09-22 06:18:57,562][11734] Waiting for process rollout_proc3 to join...
[2024-09-22 06:18:57,564][11734] Waiting for process rollout_proc4 to join...
[2024-09-22 06:18:57,567][11734] Waiting for process rollout_proc5 to join...
[2024-09-22 06:18:57,569][11734] Waiting for process rollout_proc6 to join...
[2024-09-22 06:18:57,572][11734] Waiting for process rollout_proc7 to join...
[2024-09-22 06:18:57,574][11734] Batcher 0 profile tree view:
batching: 0.0279, releasing_batches: 0.0006
[2024-09-22 06:18:57,575][11734] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0078
wait_policy: 0.0001
  wait_policy_total: 1.9361
one_step: 0.0729
  handle_policy_step: 1.6765
    deserialize: 0.0400, stack: 0.0044, obs_to_device_normalize: 0.3127, forward: 1.1167, send_messages: 0.1127
    prepare_outputs: 0.0566
      to_cpu: 0.0275
[2024-09-22 06:18:57,577][11734] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 1.4660
train: 2.3107
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0121, after_optimizer: 0.0508
  calculate_losses: 0.8574
    losses_init: 0.0000, forward_head: 0.2691, bptt_initial: 0.5100, tail: 0.0361, advantages_returns: 0.0010, losses: 0.0366
    bptt: 0.0041
      bptt_forward_core: 0.0039
  update: 1.3889
    clip: 0.0478
[2024-09-22 06:18:57,579][11734] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0006, enqueue_policy_requests: 0.0289, env_step: 0.2787, overhead: 0.0196, complete_rollouts: 0.0007
save_policy_outputs: 0.0275
  split_output_tensors: 0.0102
[2024-09-22 06:18:57,581][11734] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0006, enqueue_policy_requests: 0.0295, env_step: 0.2849, overhead: 0.0200, complete_rollouts: 0.0007
save_policy_outputs: 0.0273
  split_output_tensors: 0.0109
[2024-09-22 06:18:57,584][11734] Loop Runner_EvtLoop terminating...
[2024-09-22 06:18:57,586][11734] Runner profile tree view:
main_loop: 14.5178
[2024-09-22 06:18:57,587][11734] Collected {0: 10022912}, FPS: 564.3
[2024-09-22 06:19:03,049][11734] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-22 06:19:03,050][11734] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-22 06:19:03,052][11734] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-22 06:19:03,054][11734] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-22 06:19:03,055][11734] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-22 06:19:03,058][11734] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-22 06:19:03,059][11734] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-22 06:19:03,060][11734] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-22 06:19:03,061][11734] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-22 06:19:03,066][11734] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-22 06:19:03,067][11734] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-22 06:19:03,070][11734] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-22 06:19:03,071][11734] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-22 06:19:03,072][11734] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-22 06:19:03,076][11734] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-22 06:19:03,102][11734] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 06:19:03,105][11734] RunningMeanStd input shape: (1,)
[2024-09-22 06:19:03,119][11734] ConvEncoder: input_channels=3
[2024-09-22 06:19:03,167][11734] Conv encoder output size: 512
[2024-09-22 06:19:03,170][11734] Policy head output size: 512
[2024-09-22 06:19:03,192][11734] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
[2024-09-22 06:19:03,692][11734] Num frames 100...
[2024-09-22 06:19:03,869][11734] Num frames 200...
[2024-09-22 06:19:04,041][11734] Num frames 300...
[2024-09-22 06:19:04,201][11734] Num frames 400...
[2024-09-22 06:19:04,362][11734] Num frames 500...
[2024-09-22 06:19:04,543][11734] Num frames 600...
[2024-09-22 06:19:04,707][11734] Num frames 700...
[2024-09-22 06:19:04,862][11734] Num frames 800...
[2024-09-22 06:19:05,010][11734] Num frames 900...
[2024-09-22 06:19:05,165][11734] Avg episode rewards: #0: 20.600, true rewards: #0: 9.600
[2024-09-22 06:19:05,167][11734] Avg episode reward: 20.600, avg true_objective: 9.600
[2024-09-22 06:19:05,227][11734] Num frames 1000...
[2024-09-22 06:19:05,361][11734] Num frames 1100...
[2024-09-22 06:19:05,507][11734] Num frames 1200...
[2024-09-22 06:19:05,640][11734] Avg episode rewards: #0: 13.240, true rewards: #0: 6.240
[2024-09-22 06:19:05,643][11734] Avg episode reward: 13.240, avg true_objective: 6.240
[2024-09-22 06:19:05,721][11734] Num frames 1300...
[2024-09-22 06:19:05,869][11734] Num frames 1400...
[2024-09-22 06:19:06,016][11734] Num frames 1500...
[2024-09-22 06:19:06,183][11734] Num frames 1600...
[2024-09-22 06:19:06,341][11734] Num frames 1700...
[2024-09-22 06:19:06,500][11734] Num frames 1800...
[2024-09-22 06:19:06,663][11734] Num frames 1900...
[2024-09-22 06:19:06,821][11734] Num frames 2000...
[2024-09-22 06:19:06,965][11734] Num frames 2100...
[2024-09-22 06:19:07,115][11734] Num frames 2200...
[2024-09-22 06:19:07,302][11734] Num frames 2300...
[2024-09-22 06:19:07,457][11734] Num frames 2400...
[2024-09-22 06:19:07,631][11734] Num frames 2500...
[2024-09-22 06:19:07,787][11734] Num frames 2600...
[2024-09-22 06:19:07,949][11734] Num frames 2700...
[2024-09-22 06:19:08,090][11734] Num frames 2800...
[2024-09-22 06:19:08,258][11734] Num frames 2900...
[2024-09-22 06:19:08,442][11734] Num frames 3000...
[2024-09-22 06:19:08,603][11734] Num frames 3100...
[2024-09-22 06:19:08,755][11734] Num frames 3200...
[2024-09-22 06:19:08,907][11734] Avg episode rewards: #0: 28.213, true rewards: #0: 10.880
[2024-09-22 06:19:08,908][11734] Avg episode reward: 28.213, avg true_objective: 10.880
[2024-09-22 06:19:08,968][11734] Num frames 3300...
[2024-09-22 06:19:09,139][11734] Num frames 3400...
[2024-09-22 06:19:09,305][11734] Num frames 3500...
[2024-09-22 06:19:09,472][11734] Num frames 3600...
[2024-09-22 06:19:09,637][11734] Num frames 3700...
[2024-09-22 06:19:09,818][11734] Num frames 3800...
[2024-09-22 06:19:09,973][11734] Num frames 3900...
[2024-09-22 06:19:10,128][11734] Num frames 4000...
[2024-09-22 06:19:10,281][11734] Num frames 4100...
[2024-09-22 06:19:10,462][11734] Num frames 4200...
[2024-09-22 06:19:10,633][11734] Num frames 4300...
[2024-09-22 06:19:10,796][11734] Num frames 4400...
[2024-09-22 06:19:10,956][11734] Num frames 4500...
[2024-09-22 06:19:11,119][11734] Num frames 4600...
[2024-09-22 06:19:11,264][11734] Num frames 4700...
[2024-09-22 06:19:11,418][11734] Num frames 4800...
[2024-09-22 06:19:11,608][11734] Num frames 4900...
[2024-09-22 06:19:11,775][11734] Num frames 5000...
[2024-09-22 06:19:11,925][11734] Num frames 5100...
[2024-09-22 06:19:12,103][11734] Num frames 5200...
[2024-09-22 06:19:12,245][11734] Avg episode rewards: #0: 34.384, true rewards: #0: 13.135
[2024-09-22 06:19:12,247][11734] Avg episode reward: 34.384, avg true_objective: 13.135
[2024-09-22 06:19:12,324][11734] Num frames 5300...
[2024-09-22 06:19:12,481][11734] Num frames 5400...
[2024-09-22 06:19:12,659][11734] Num frames 5500...
[2024-09-22 06:19:12,820][11734] Num frames 5600...
[2024-09-22 06:19:13,009][11734] Num frames 5700...
[2024-09-22 06:19:13,209][11734] Num frames 5800...
[2024-09-22 06:19:13,384][11734] Num frames 5900...
[2024-09-22 06:19:13,549][11734] Num frames 6000...
[2024-09-22 06:19:13,703][11734] Num frames 6100...
[2024-09-22 06:19:13,876][11734] Num frames 6200...
[2024-09-22 06:19:14,040][11734] Num frames 6300...
[2024-09-22 06:19:14,199][11734] Num frames 6400...
[2024-09-22 06:19:14,363][11734] Num frames 6500...
[2024-09-22 06:19:14,533][11734] Num frames 6600...
[2024-09-22 06:19:14,702][11734] Num frames 6700...
[2024-09-22 06:19:14,859][11734] Num frames 6800...
[2024-09-22 06:19:14,923][11734] Avg episode rewards: #0: 36.407, true rewards: #0: 13.608
[2024-09-22 06:19:14,925][11734] Avg episode reward: 36.407, avg true_objective: 13.608
[2024-09-22 06:19:15,075][11734] Num frames 6900...
[2024-09-22 06:19:15,243][11734] Num frames 7000...
[2024-09-22 06:19:15,402][11734] Num frames 7100...
[2024-09-22 06:19:15,560][11734] Num frames 7200...
[2024-09-22 06:19:15,724][11734] Num frames 7300...
[2024-09-22 06:19:15,885][11734] Num frames 7400...
[2024-09-22 06:19:16,037][11734] Num frames 7500...
[2024-09-22 06:19:16,107][11734] Avg episode rewards: #0: 33.013, true rewards: #0: 12.513
[2024-09-22 06:19:16,109][11734] Avg episode reward: 33.013, avg true_objective: 12.513
[2024-09-22 06:19:16,246][11734] Num frames 7600...
[2024-09-22 06:19:16,411][11734] Num frames 7700...
[2024-09-22 06:19:16,545][11734] Num frames 7800...
[2024-09-22 06:19:16,699][11734] Num frames 7900...
[2024-09-22 06:19:16,858][11734] Num frames 8000...
[2024-09-22 06:19:17,023][11734] Num frames 8100...
[2024-09-22 06:19:17,104][11734] Avg episode rewards: #0: 29.880, true rewards: #0: 11.594
[2024-09-22 06:19:17,106][11734] Avg episode reward: 29.880, avg true_objective: 11.594
[2024-09-22 06:19:17,228][11734] Num frames 8200...
[2024-09-22 06:19:17,390][11734] Num frames 8300...
[2024-09-22 06:19:17,565][11734] Num frames 8400...
[2024-09-22 06:19:17,734][11734] Num frames 8500...
[2024-09-22 06:19:17,881][11734] Num frames 8600...
[2024-09-22 06:19:18,053][11734] Num frames 8700...
[2024-09-22 06:19:18,219][11734] Num frames 8800...
[2024-09-22 06:19:18,361][11734] Num frames 8900...
[2024-09-22 06:19:18,528][11734] Num frames 9000...
[2024-09-22 06:19:18,701][11734] Num frames 9100...
[2024-09-22 06:19:18,851][11734] Num frames 9200...
[2024-09-22 06:19:19,007][11734] Num frames 9300...
[2024-09-22 06:19:19,183][11734] Num frames 9400...
[2024-09-22 06:19:19,348][11734] Num frames 9500...
[2024-09-22 06:19:19,502][11734] Num frames 9600...
[2024-09-22 06:19:19,654][11734] Num frames 9700...
[2024-09-22 06:19:19,836][11734] Num frames 9800...
[2024-09-22 06:19:19,986][11734] Num frames 9900...
[2024-09-22 06:19:20,145][11734] Num frames 10000...
[2024-09-22 06:19:20,313][11734] Num frames 10100...
[2024-09-22 06:19:20,488][11734] Num frames 10200...
[2024-09-22 06:19:20,563][11734] Avg episode rewards: #0: 33.016, true rewards: #0: 12.766
[2024-09-22 06:19:20,565][11734] Avg episode reward: 33.016, avg true_objective: 12.766
[2024-09-22 06:19:20,694][11734] Num frames 10300...
[2024-09-22 06:19:20,851][11734] Num frames 10400...
[2024-09-22 06:19:21,017][11734] Num frames 10500...
[2024-09-22 06:19:21,188][11734] Num frames 10600...
[2024-09-22 06:19:21,347][11734] Num frames 10700...
[2024-09-22 06:19:21,507][11734] Num frames 10800...
[2024-09-22 06:19:21,687][11734] Num frames 10900...
[2024-09-22 06:19:21,845][11734] Num frames 11000...
[2024-09-22 06:19:21,991][11734] Num frames 11100...
[2024-09-22 06:19:22,162][11734] Num frames 11200...
[2024-09-22 06:19:22,313][11734] Num frames 11300...
[2024-09-22 06:19:22,467][11734] Num frames 11400...
[2024-09-22 06:19:22,623][11734] Num frames 11500...
[2024-09-22 06:19:22,726][11734] Avg episode rewards: #0: 33.694, true rewards: #0: 12.806
[2024-09-22 06:19:22,727][11734] Avg episode reward: 33.694, avg true_objective: 12.806
[2024-09-22 06:19:22,847][11734] Num frames 11600...
[2024-09-22 06:19:23,010][11734] Num frames 11700...
[2024-09-22 06:19:23,161][11734] Num frames 11800...
[2024-09-22 06:19:23,332][11734] Avg episode rewards: #0: 30.777, true rewards: #0: 11.877
[2024-09-22 06:19:23,334][11734] Avg episode reward: 30.777, avg true_objective: 11.877
[2024-09-22 06:19:56,200][11734] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-22 06:20:31,391][11734] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-22 06:20:31,393][11734] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-22 06:20:31,394][11734] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-22 06:20:31,395][11734] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-22 06:20:31,398][11734] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-22 06:20:31,400][11734] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-22 06:20:31,401][11734] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-22 06:20:31,402][11734] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-22 06:20:31,405][11734] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-22 06:20:31,405][11734] Adding new argument 'hf_repository'='Vivek-huggingface/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-22 06:20:31,407][11734] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-22 06:20:31,409][11734] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-22 06:20:31,411][11734] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-22 06:20:31,412][11734] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-22 06:20:31,414][11734] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-22 06:20:31,440][11734] RunningMeanStd input shape: (3, 72, 128)
[2024-09-22 06:20:31,442][11734] RunningMeanStd input shape: (1,)
[2024-09-22 06:20:31,457][11734] ConvEncoder: input_channels=3
[2024-09-22 06:20:31,502][11734] Conv encoder output size: 512
[2024-09-22 06:20:31,504][11734] Policy head output size: 512
[2024-09-22 06:20:31,532][11734] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
[2024-09-22 06:20:32,034][11734] Num frames 100...
[2024-09-22 06:20:32,187][11734] Num frames 200...
[2024-09-22 06:20:32,360][11734] Num frames 300...
[2024-09-22 06:20:32,527][11734] Num frames 400...
[2024-09-22 06:20:32,684][11734] Num frames 500...
[2024-09-22 06:20:32,844][11734] Num frames 600...
[2024-09-22 06:20:33,013][11734] Num frames 700...
[2024-09-22 06:20:33,168][11734] Num frames 800...
[2024-09-22 06:20:33,327][11734] Num frames 900...
[2024-09-22 06:20:33,498][11734] Num frames 1000...
[2024-09-22 06:20:33,669][11734] Num frames 1100...
[2024-09-22 06:20:33,839][11734] Num frames 1200...
[2024-09-22 06:20:34,001][11734] Num frames 1300...
[2024-09-22 06:20:34,177][11734] Num frames 1400...
[2024-09-22 06:20:34,354][11734] Avg episode rewards: #0: 33.720, true rewards: #0: 14.720
[2024-09-22 06:20:34,356][11734] Avg episode reward: 33.720, avg true_objective: 14.720
[2024-09-22 06:20:34,408][11734] Num frames 1500...
[2024-09-22 06:20:34,590][11734] Num frames 1600...
[2024-09-22 06:20:34,754][11734] Num frames 1700...
[2024-09-22 06:20:34,937][11734] Num frames 1800...
[2024-09-22 06:20:35,107][11734] Num frames 1900...
[2024-09-22 06:20:35,252][11734] Num frames 2000...
[2024-09-22 06:20:35,398][11734] Avg episode rewards: #0: 21.740, true rewards: #0: 10.240
[2024-09-22 06:20:35,400][11734] Avg episode reward: 21.740, avg true_objective: 10.240
[2024-09-22 06:20:35,496][11734] Num frames 2100...
[2024-09-22 06:20:35,668][11734] Num frames 2200...
[2024-09-22 06:20:35,827][11734] Num frames 2300...
[2024-09-22 06:20:36,009][11734] Num frames 2400...
[2024-09-22 06:20:36,163][11734] Num frames 2500...
[2024-09-22 06:20:36,329][11734] Num frames 2600...
[2024-09-22 06:20:36,462][11734] Num frames 2700...
[2024-09-22 06:20:36,591][11734] Num frames 2800...
[2024-09-22 06:20:36,723][11734] Num frames 2900...
[2024-09-22 06:20:36,860][11734] Num frames 3000...
[2024-09-22 06:20:36,996][11734] Num frames 3100...
[2024-09-22 06:20:37,128][11734] Num frames 3200...
[2024-09-22 06:20:37,262][11734] Avg episode rewards: #0: 23.523, true rewards: #0: 10.857
[2024-09-22 06:20:37,263][11734] Avg episode reward: 23.523, avg true_objective: 10.857
[2024-09-22 06:20:37,322][11734] Num frames 3300...
[2024-09-22 06:20:37,458][11734] Num frames 3400...
[2024-09-22 06:20:37,594][11734] Num frames 3500...
[2024-09-22 06:20:37,725][11734] Num frames 3600...
[2024-09-22 06:20:37,861][11734] Num frames 3700...
[2024-09-22 06:20:37,993][11734] Num frames 3800...
[2024-09-22 06:20:38,131][11734] Avg episode rewards: #0: 20.913, true rewards: #0: 9.662
[2024-09-22 06:20:38,132][11734] Avg episode reward: 20.913, avg true_objective: 9.662
[2024-09-22 06:20:38,181][11734] Num frames 3900...
[2024-09-22 06:20:38,311][11734] Num frames 4000...
[2024-09-22 06:20:38,444][11734] Num frames 4100...
[2024-09-22 06:20:38,577][11734] Num frames 4200...
[2024-09-22 06:20:38,716][11734] Num frames 4300...
[2024-09-22 06:20:38,854][11734] Num frames 4400...
[2024-09-22 06:20:38,984][11734] Num frames 4500...
[2024-09-22 06:20:39,116][11734] Num frames 4600...
[2024-09-22 06:20:39,255][11734] Num frames 4700...
[2024-09-22 06:20:39,393][11734] Num frames 4800...
[2024-09-22 06:20:39,527][11734] Num frames 4900...
[2024-09-22 06:20:39,667][11734] Num frames 5000...
[2024-09-22 06:20:39,798][11734] Num frames 5100...
[2024-09-22 06:20:39,934][11734] Num frames 5200...
[2024-09-22 06:20:40,073][11734] Num frames 5300...
[2024-09-22 06:20:40,211][11734] Num frames 5400...
[2024-09-22 06:20:40,398][11734] Avg episode rewards: #0: 25.394, true rewards: #0: 10.994
[2024-09-22 06:20:40,400][11734] Avg episode reward: 25.394, avg true_objective: 10.994
[2024-09-22 06:20:40,406][11734] Num frames 5500...
[2024-09-22 06:20:40,544][11734] Num frames 5600...
[2024-09-22 06:20:40,679][11734] Num frames 5700...
[2024-09-22 06:20:40,809][11734] Num frames 5800...
[2024-09-22 06:20:40,942][11734] Num frames 5900...
[2024-09-22 06:20:41,075][11734] Num frames 6000...
[2024-09-22 06:20:41,205][11734] Num frames 6100...
[2024-09-22 06:20:41,340][11734] Num frames 6200...
[2024-09-22 06:20:41,475][11734] Num frames 6300...
[2024-09-22 06:20:41,608][11734] Num frames 6400...
[2024-09-22 06:20:41,743][11734] Num frames 6500...
[2024-09-22 06:20:41,877][11734] Num frames 6600...
[2024-09-22 06:20:42,043][11734] Avg episode rewards: #0: 25.468, true rewards: #0: 11.135
[2024-09-22 06:20:42,046][11734] Avg episode reward: 25.468, avg true_objective: 11.135
[2024-09-22 06:20:42,074][11734] Num frames 6700...
[2024-09-22 06:20:42,211][11734] Num frames 6800...
[2024-09-22 06:20:42,342][11734] Num frames 6900...
[2024-09-22 06:20:42,474][11734] Num frames 7000...
[2024-09-22 06:20:42,605][11734] Num frames 7100...
[2024-09-22 06:20:42,745][11734] Num frames 7200...
[2024-09-22 06:20:42,878][11734] Num frames 7300...
[2024-09-22 06:20:43,012][11734] Num frames 7400...
[2024-09-22 06:20:43,146][11734] Num frames 7500...
[2024-09-22 06:20:43,282][11734] Num frames 7600...
[2024-09-22 06:20:43,370][11734] Avg episode rewards: #0: 24.889, true rewards: #0: 10.889
[2024-09-22 06:20:43,373][11734] Avg episode reward: 24.889, avg true_objective: 10.889
[2024-09-22 06:20:43,478][11734] Num frames 7700...
[2024-09-22 06:20:43,615][11734] Num frames 7800...
[2024-09-22 06:20:43,751][11734] Num frames 7900...
[2024-09-22 06:20:43,885][11734] Num frames 8000...
[2024-09-22 06:20:44,019][11734] Num frames 8100...
[2024-09-22 06:20:44,152][11734] Num frames 8200...
[2024-09-22 06:20:44,286][11734] Num frames 8300...
[2024-09-22 06:20:44,421][11734] Num frames 8400...
[2024-09-22 06:20:44,558][11734] Num frames 8500...
[2024-09-22 06:20:44,691][11734] Num frames 8600...
[2024-09-22 06:20:44,825][11734] Num frames 8700...
[2024-09-22 06:20:44,956][11734] Num frames 8800...
[2024-09-22 06:20:45,086][11734] Num frames 8900...
[2024-09-22 06:20:45,213][11734] Num frames 9000...
[2024-09-22 06:20:45,343][11734] Num frames 9100...
[2024-09-22 06:20:45,515][11734] Avg episode rewards: #0: 27.112, true rewards: #0: 11.487
[2024-09-22 06:20:45,517][11734] Avg episode reward: 27.112, avg true_objective: 11.487
[2024-09-22 06:20:45,532][11734] Num frames 9200...
[2024-09-22 06:20:45,660][11734] Num frames 9300...
[2024-09-22 06:20:45,798][11734] Num frames 9400...
[2024-09-22 06:20:45,927][11734] Num frames 9500...
[2024-09-22 06:20:46,054][11734] Num frames 9600...
[2024-09-22 06:20:46,184][11734] Num frames 9700...
[2024-09-22 06:20:46,312][11734] Num frames 9800...
[2024-09-22 06:20:46,453][11734] Num frames 9900...
[2024-09-22 06:20:46,589][11734] Num frames 10000...
[2024-09-22 06:20:46,714][11734] Avg episode rewards: #0: 26.282, true rewards: #0: 11.171
[2024-09-22 06:20:46,716][11734] Avg episode reward: 26.282, avg true_objective: 11.171
[2024-09-22 06:20:46,780][11734] Num frames 10100...
[2024-09-22 06:20:46,917][11734] Num frames 10200...
[2024-09-22 06:20:47,052][11734] Num frames 10300...
[2024-09-22 06:20:47,191][11734] Num frames 10400...
[2024-09-22 06:20:47,326][11734] Num frames 10500...
[2024-09-22 06:20:47,465][11734] Num frames 10600...
[2024-09-22 06:20:47,603][11734] Num frames 10700...
[2024-09-22 06:20:47,740][11734] Num frames 10800...
[2024-09-22 06:20:47,880][11734] Num frames 10900...
[2024-09-22 06:20:48,017][11734] Num frames 11000...
[2024-09-22 06:20:48,150][11734] Num frames 11100...
[2024-09-22 06:20:48,284][11734] Num frames 11200...
[2024-09-22 06:20:48,413][11734] Num frames 11300...
[2024-09-22 06:20:48,599][11734] Avg episode rewards: #0: 26.798, true rewards: #0: 11.398
[2024-09-22 06:20:48,601][11734] Avg episode reward: 26.798, avg true_objective: 11.398
[2024-09-22 06:20:48,606][11734] Num frames 11400...
[2024-09-22 06:21:20,460][11734] Replay video saved to /content/train_dir/default_experiment/replay.mp4!