diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,1015 @@
+[2023-02-22 17:27:38,294][00114] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2023-02-22 17:27:38,300][00114] Rollout worker 0 uses device cpu
+[2023-02-22 17:27:38,303][00114] Rollout worker 1 uses device cpu
+[2023-02-22 17:27:38,306][00114] Rollout worker 2 uses device cpu
+[2023-02-22 17:27:38,310][00114] Rollout worker 3 uses device cpu
+[2023-02-22 17:27:38,314][00114] Rollout worker 4 uses device cpu
+[2023-02-22 17:27:38,321][00114] Rollout worker 5 uses device cpu
+[2023-02-22 17:27:38,324][00114] Rollout worker 6 uses device cpu
+[2023-02-22 17:27:38,328][00114] Rollout worker 7 uses device cpu
+[2023-02-22 17:27:38,605][00114] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-22 17:27:38,611][00114] InferenceWorker_p0-w0: min num requests: 2
+[2023-02-22 17:27:38,663][00114] Starting all processes...
+[2023-02-22 17:27:38,675][00114] Starting process learner_proc0
+[2023-02-22 17:27:38,772][00114] Starting all processes...
+[2023-02-22 17:27:38,845][00114] Starting process inference_proc0-0
+[2023-02-22 17:27:38,846][00114] Starting process rollout_proc0
+[2023-02-22 17:27:38,848][00114] Starting process rollout_proc1
+[2023-02-22 17:27:38,853][00114] Starting process rollout_proc2
+[2023-02-22 17:27:38,853][00114] Starting process rollout_proc3
+[2023-02-22 17:27:38,853][00114] Starting process rollout_proc4
+[2023-02-22 17:27:38,853][00114] Starting process rollout_proc5
+[2023-02-22 17:27:38,853][00114] Starting process rollout_proc6
+[2023-02-22 17:27:38,853][00114] Starting process rollout_proc7
+[2023-02-22 17:27:49,858][13425] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-22 17:27:49,863][13425] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2023-02-22 17:27:50,083][13444] Worker 6 uses CPU cores [0]
+[2023-02-22 17:27:50,307][13446] Worker 5 uses CPU cores [1]
+[2023-02-22 17:27:50,420][13440] Worker 1 uses CPU cores [1]
+[2023-02-22 17:27:50,608][13439] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-22 17:27:50,611][13439] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2023-02-22 17:27:50,715][13441] Worker 0 uses CPU cores [0]
+[2023-02-22 17:27:50,748][13442] Worker 2 uses CPU cores [0]
+[2023-02-22 17:27:50,768][13447] Worker 7 uses CPU cores [1]
+[2023-02-22 17:27:50,866][13443] Worker 4 uses CPU cores [0]
+[2023-02-22 17:27:50,941][13445] Worker 3 uses CPU cores [1]
+[2023-02-22 17:27:51,296][13439] Num visible devices: 1
+[2023-02-22 17:27:51,323][13425] Num visible devices: 1
+[2023-02-22 17:27:51,365][13425] Starting seed is not provided
+[2023-02-22 17:27:51,366][13425] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-22 17:27:51,366][13425] Initializing actor-critic model on device cuda:0
+[2023-02-22 17:27:51,366][13425] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-22 17:27:51,393][13425] RunningMeanStd input shape: (1,)
+[2023-02-22 17:27:51,498][13425] ConvEncoder: input_channels=3
+[2023-02-22 17:27:52,352][13425] Conv encoder output size: 512
+[2023-02-22 17:27:52,353][13425] Policy head output size: 512
+[2023-02-22 17:27:52,457][13425] Created Actor Critic model with architecture:
+[2023-02-22 17:27:52,458][13425] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2023-02-22 17:27:58,595][00114] Heartbeat connected on Batcher_0
+[2023-02-22 17:27:58,605][00114] Heartbeat connected on InferenceWorker_p0-w0
+[2023-02-22 17:27:58,622][00114] Heartbeat connected on RolloutWorker_w0
+[2023-02-22 17:27:58,627][00114] Heartbeat connected on RolloutWorker_w1
+[2023-02-22 17:27:58,631][00114] Heartbeat connected on RolloutWorker_w2
+[2023-02-22 17:27:58,636][00114] Heartbeat connected on RolloutWorker_w3
+[2023-02-22 17:27:58,640][00114] Heartbeat connected on RolloutWorker_w4
+[2023-02-22 17:27:58,646][00114] Heartbeat connected on RolloutWorker_w5
+[2023-02-22 17:27:58,651][00114] Heartbeat connected on RolloutWorker_w6
+[2023-02-22 17:27:58,664][00114] Heartbeat connected on RolloutWorker_w7
+[2023-02-22 17:28:00,982][13425] Using optimizer
+[2023-02-22 17:28:00,984][13425] No checkpoints found
+[2023-02-22 17:28:00,984][13425] Did not load from checkpoint, starting from scratch!
+[2023-02-22 17:28:00,984][13425] Initialized policy 0 weights for model version 0
+[2023-02-22 17:28:00,989][13425] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-22 17:28:00,995][13425] LearnerWorker_p0 finished initialization!
+[2023-02-22 17:28:01,000][00114] Heartbeat connected on LearnerWorker_p0
+[2023-02-22 17:28:01,190][13439] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-22 17:28:01,191][13439] RunningMeanStd input shape: (1,)
+[2023-02-22 17:28:01,203][13439] ConvEncoder: input_channels=3
+[2023-02-22 17:28:01,303][13439] Conv encoder output size: 512
+[2023-02-22 17:28:01,304][13439] Policy head output size: 512
+[2023-02-22 17:28:02,104][00114] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-22 17:28:03,487][00114] Inference worker 0-0 is ready!
+[2023-02-22 17:28:03,489][00114] All inference workers are ready! Signal rollout workers to start!
+[2023-02-22 17:28:03,652][13443] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-22 17:28:03,672][13442] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-22 17:28:03,676][13441] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-22 17:28:03,682][13444] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-22 17:28:03,699][13445] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-22 17:28:03,751][13440] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-22 17:28:03,747][13446] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-22 17:28:03,777][13447] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-22 17:28:05,037][13446] Decorrelating experience for 0 frames...
+[2023-02-22 17:28:05,039][13440] Decorrelating experience for 0 frames...
+[2023-02-22 17:28:05,658][13443] Decorrelating experience for 0 frames...
+[2023-02-22 17:28:05,656][13442] Decorrelating experience for 0 frames...
+[2023-02-22 17:28:05,662][13441] Decorrelating experience for 0 frames...
+[2023-02-22 17:28:05,674][13444] Decorrelating experience for 0 frames...
+[2023-02-22 17:28:05,935][13446] Decorrelating experience for 32 frames...
+[2023-02-22 17:28:07,104][00114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-22 17:28:07,237][13445] Decorrelating experience for 0 frames...
+[2023-02-22 17:28:07,244][13440] Decorrelating experience for 32 frames...
+[2023-02-22 17:28:07,279][13447] Decorrelating experience for 0 frames...
+[2023-02-22 17:28:07,711][13442] Decorrelating experience for 32 frames...
+[2023-02-22 17:28:07,719][13443] Decorrelating experience for 32 frames...
+[2023-02-22 17:28:07,734][13441] Decorrelating experience for 32 frames...
+[2023-02-22 17:28:07,962][13444] Decorrelating experience for 32 frames...
+[2023-02-22 17:28:08,605][13447] Decorrelating experience for 32 frames...
+[2023-02-22 17:28:08,755][13440] Decorrelating experience for 64 frames...
+[2023-02-22 17:28:09,335][13446] Decorrelating experience for 64 frames...
+[2023-02-22 17:28:09,560][13442] Decorrelating experience for 64 frames...
+[2023-02-22 17:28:09,555][13443] Decorrelating experience for 64 frames...
+[2023-02-22 17:28:09,576][13441] Decorrelating experience for 64 frames...
+[2023-02-22 17:28:10,085][13445] Decorrelating experience for 32 frames...
+[2023-02-22 17:28:10,711][13444] Decorrelating experience for 64 frames...
+[2023-02-22 17:28:10,802][13446] Decorrelating experience for 96 frames...
+[2023-02-22 17:28:10,917][13447] Decorrelating experience for 64 frames...
+[2023-02-22 17:28:11,015][13443] Decorrelating experience for 96 frames...
+[2023-02-22 17:28:11,019][13441] Decorrelating experience for 96 frames...
+[2023-02-22 17:28:11,485][13440] Decorrelating experience for 96 frames...
+[2023-02-22 17:28:11,770][13442] Decorrelating experience for 96 frames...
+[2023-02-22 17:28:11,858][13444] Decorrelating experience for 96 frames...
+[2023-02-22 17:28:12,082][13445] Decorrelating experience for 64 frames...
+[2023-02-22 17:28:12,105][00114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-22 17:28:12,466][13447] Decorrelating experience for 96 frames...
+[2023-02-22 17:28:12,645][13445] Decorrelating experience for 96 frames...
+[2023-02-22 17:28:17,104][00114] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 40.5. Samples: 608. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-22 17:28:17,107][00114] Avg episode reward: [(0, '2.008')]
+[2023-02-22 17:28:17,287][13425] Signal inference workers to stop experience collection...
+[2023-02-22 17:28:17,304][13439] InferenceWorker_p0-w0: stopping experience collection
+[2023-02-22 17:28:20,066][13425] Signal inference workers to resume experience collection...
+[2023-02-22 17:28:20,069][13439] InferenceWorker_p0-w0: resuming experience collection
+[2023-02-22 17:28:22,104][00114] Fps is (10 sec: 819.3, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 8192. Throughput: 0: 142.8. Samples: 2856. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-02-22 17:28:22,109][00114] Avg episode reward: [(0, '2.545')]
+[2023-02-22 17:28:27,104][00114] Fps is (10 sec: 2457.6, 60 sec: 983.0, 300 sec: 983.0). Total num frames: 24576. Throughput: 0: 197.6. Samples: 4940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-22 17:28:27,107][00114] Avg episode reward: [(0, '3.829')]
+[2023-02-22 17:28:31,440][13439] Updated weights for policy 0, policy_version 10 (0.0346)
+[2023-02-22 17:28:32,104][00114] Fps is (10 sec: 3276.8, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 40960. Throughput: 0: 319.0. Samples: 9570. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-22 17:28:32,110][00114] Avg episode reward: [(0, '4.199')]
+[2023-02-22 17:28:37,104][00114] Fps is (10 sec: 4096.1, 60 sec: 1872.5, 300 sec: 1872.5). Total num frames: 65536. Throughput: 0: 482.1. Samples: 16872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-22 17:28:37,106][00114] Avg episode reward: [(0, '4.387')]
+[2023-02-22 17:28:39,825][13439] Updated weights for policy 0, policy_version 20 (0.0015)
+[2023-02-22 17:28:42,104][00114] Fps is (10 sec: 4915.1, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 90112. Throughput: 0: 513.3. Samples: 20534. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-22 17:28:42,108][00114] Avg episode reward: [(0, '4.391')]
+[2023-02-22 17:28:47,109][00114] Fps is (10 sec: 3684.7, 60 sec: 2275.3, 300 sec: 2275.3). Total num frames: 102400. Throughput: 0: 566.6. Samples: 25500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-22 17:28:47,111][00114] Avg episode reward: [(0, '4.307')]
+[2023-02-22 17:28:47,115][13425] Saving new best policy, reward=4.307!
+[2023-02-22 17:28:52,017][13439] Updated weights for policy 0, policy_version 30 (0.0019)
+[2023-02-22 17:28:52,104][00114] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 122880. Throughput: 0: 682.5. Samples: 30714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:28:52,107][00114] Avg episode reward: [(0, '4.352')]
+[2023-02-22 17:28:52,114][13425] Saving new best policy, reward=4.352!
+[2023-02-22 17:28:57,104][00114] Fps is (10 sec: 4507.7, 60 sec: 2681.0, 300 sec: 2681.0). Total num frames: 147456. Throughput: 0: 762.1. Samples: 34294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:28:57,107][00114] Avg episode reward: [(0, '4.515')]
+[2023-02-22 17:28:57,110][13425] Saving new best policy, reward=4.515!
+[2023-02-22 17:29:00,539][13439] Updated weights for policy 0, policy_version 40 (0.0013)
+[2023-02-22 17:29:02,104][00114] Fps is (10 sec: 4505.6, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 167936. Throughput: 0: 908.6. Samples: 41496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:29:02,109][00114] Avg episode reward: [(0, '4.474')]
+[2023-02-22 17:29:07,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2772.7). Total num frames: 180224. Throughput: 0: 962.8. Samples: 46184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:29:07,106][00114] Avg episode reward: [(0, '4.479')]
+[2023-02-22 17:29:12,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 2867.2). Total num frames: 200704. Throughput: 0: 969.3. Samples: 48558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:29:12,109][00114] Avg episode reward: [(0, '4.342')]
+[2023-02-22 17:29:12,323][13439] Updated weights for policy 0, policy_version 50 (0.0027)
+[2023-02-22 17:29:17,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3003.7). Total num frames: 225280. Throughput: 0: 1026.0. Samples: 55738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-22 17:29:17,111][00114] Avg episode reward: [(0, '4.273')]
+[2023-02-22 17:29:21,097][13439] Updated weights for policy 0, policy_version 60 (0.0023)
+[2023-02-22 17:29:22,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 1009.9. Samples: 62316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:29:22,110][00114] Avg episode reward: [(0, '4.458')]
+[2023-02-22 17:29:27,104][00114] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3084.0). Total num frames: 262144. Throughput: 0: 979.4. Samples: 64606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:29:27,107][00114] Avg episode reward: [(0, '4.474')]
+[2023-02-22 17:29:32,104][00114] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3140.3). Total num frames: 282624. Throughput: 0: 982.2. Samples: 69696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:29:32,106][00114] Avg episode reward: [(0, '4.430')]
+[2023-02-22 17:29:32,118][13425] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth...
+[2023-02-22 17:29:33,093][13439] Updated weights for policy 0, policy_version 70 (0.0012)
+[2023-02-22 17:29:37,104][00114] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3190.6). Total num frames: 303104. Throughput: 0: 1015.7. Samples: 76420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:29:37,110][00114] Avg episode reward: [(0, '4.442')]
+[2023-02-22 17:29:42,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3235.8). Total num frames: 323584. Throughput: 0: 1018.2. Samples: 80112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-22 17:29:42,109][00114] Avg episode reward: [(0, '4.416')]
+[2023-02-22 17:29:42,485][13439] Updated weights for policy 0, policy_version 80 (0.0019)
+[2023-02-22 17:29:47,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3959.8, 300 sec: 3237.8). Total num frames: 339968. Throughput: 0: 962.4. Samples: 84804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:29:47,108][00114] Avg episode reward: [(0, '4.401')]
+[2023-02-22 17:29:52,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3276.8). Total num frames: 360448. Throughput: 0: 985.2. Samples: 90520. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-22 17:29:52,111][00114] Avg episode reward: [(0, '4.442')]
+[2023-02-22 17:29:53,367][13439] Updated weights for policy 0, policy_version 90 (0.0031)
+[2023-02-22 17:29:57,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3348.0). Total num frames: 385024. Throughput: 0: 1013.8. Samples: 94180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:29:57,109][00114] Avg episode reward: [(0, '4.496')]
+[2023-02-22 17:30:02,109][00114] Fps is (10 sec: 4503.5, 60 sec: 3959.2, 300 sec: 3379.1). Total num frames: 405504. Throughput: 0: 1004.4. Samples: 100942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:30:02,111][00114] Avg episode reward: [(0, '4.460')]
+[2023-02-22 17:30:03,146][13439] Updated weights for policy 0, policy_version 100 (0.0020)
+[2023-02-22 17:30:07,104][00114] Fps is (10 sec: 3276.7, 60 sec: 3959.5, 300 sec: 3342.3). Total num frames: 417792. Throughput: 0: 957.2. Samples: 105388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-22 17:30:07,111][00114] Avg episode reward: [(0, '4.518')]
+[2023-02-22 17:30:07,117][13425] Saving new best policy, reward=4.518!
+[2023-02-22 17:30:12,104][00114] Fps is (10 sec: 3688.1, 60 sec: 4027.7, 300 sec: 3402.8). Total num frames: 442368. Throughput: 0: 965.1. Samples: 108034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-22 17:30:12,111][00114] Avg episode reward: [(0, '4.630')]
+[2023-02-22 17:30:12,121][13425] Saving new best policy, reward=4.630!
+[2023-02-22 17:30:13,868][13439] Updated weights for policy 0, policy_version 110 (0.0025)
+[2023-02-22 17:30:17,104][00114] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3428.5). Total num frames: 462848. Throughput: 0: 1013.1. Samples: 115284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:30:17,113][00114] Avg episode reward: [(0, '4.611')]
+[2023-02-22 17:30:22,106][00114] Fps is (10 sec: 4095.1, 60 sec: 3959.3, 300 sec: 3452.3). Total num frames: 483328. Throughput: 0: 994.6. Samples: 121180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-22 17:30:22,109][00114] Avg episode reward: [(0, '4.589')]
+[2023-02-22 17:30:24,459][13439] Updated weights for policy 0, policy_version 120 (0.0013)
+[2023-02-22 17:30:27,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3418.0). Total num frames: 495616. Throughput: 0: 962.8. Samples: 123436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-22 17:30:27,106][00114] Avg episode reward: [(0, '4.669')]
+[2023-02-22 17:30:27,114][13425] Saving new best policy, reward=4.669!
+[2023-02-22 17:30:32,105][00114] Fps is (10 sec: 2457.8, 60 sec: 3754.6, 300 sec: 3386.0). Total num frames: 507904. Throughput: 0: 942.1. Samples: 127198. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-22 17:30:32,108][00114] Avg episode reward: [(0, '4.715')]
+[2023-02-22 17:30:32,132][13425] Saving new best policy, reward=4.715!
+[2023-02-22 17:30:37,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3408.9). Total num frames: 528384. Throughput: 0: 924.9. Samples: 132140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-22 17:30:37,106][00114] Avg episode reward: [(0, '4.829')]
+[2023-02-22 17:30:37,114][13425] Saving new best policy, reward=4.829!
+[2023-02-22 17:30:37,961][13439] Updated weights for policy 0, policy_version 130 (0.0028)
+[2023-02-22 17:30:42,104][00114] Fps is (10 sec: 4096.6, 60 sec: 3754.7, 300 sec: 3430.4). Total num frames: 548864. Throughput: 0: 922.3. Samples: 135684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:30:42,109][00114] Avg episode reward: [(0, '4.799')]
+[2023-02-22 17:30:47,105][00114] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3400.9). Total num frames: 561152. Throughput: 0: 875.9. Samples: 140356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:30:47,111][00114] Avg episode reward: [(0, '4.793')]
+[2023-02-22 17:30:49,660][13439] Updated weights for policy 0, policy_version 140 (0.0025)
+[2023-02-22 17:30:52,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3421.4). Total num frames: 581632. Throughput: 0: 902.8. Samples: 146014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:30:52,106][00114] Avg episode reward: [(0, '4.697')]
+[2023-02-22 17:30:57,104][00114] Fps is (10 sec: 4505.9, 60 sec: 3686.4, 300 sec: 3464.0). Total num frames: 606208. Throughput: 0: 926.1. Samples: 149710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:30:57,107][00114] Avg episode reward: [(0, '4.661')]
+[2023-02-22 17:30:58,116][13439] Updated weights for policy 0, policy_version 150 (0.0012)
+[2023-02-22 17:31:02,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3686.7, 300 sec: 3481.6). Total num frames: 626688. Throughput: 0: 913.3. Samples: 156382. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-22 17:31:02,113][00114] Avg episode reward: [(0, '4.597')]
+[2023-02-22 17:31:07,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3476.1). Total num frames: 643072. Throughput: 0: 884.0. Samples: 160956. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-22 17:31:07,112][00114] Avg episode reward: [(0, '4.740')]
+[2023-02-22 17:31:10,095][13439] Updated weights for policy 0, policy_version 160 (0.0025)
+[2023-02-22 17:31:12,104][00114] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3492.4). Total num frames: 663552. Throughput: 0: 892.6. Samples: 163604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-22 17:31:12,107][00114] Avg episode reward: [(0, '4.871')]
+[2023-02-22 17:31:12,120][13425] Saving new best policy, reward=4.871!
+[2023-02-22 17:31:17,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3528.9). Total num frames: 688128. Throughput: 0: 968.3. Samples: 170772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-22 17:31:17,107][00114] Avg episode reward: [(0, '4.715')]
+[2023-02-22 17:31:18,750][13439] Updated weights for policy 0, policy_version 170 (0.0020)
+[2023-02-22 17:31:22,107][00114] Fps is (10 sec: 4094.9, 60 sec: 3686.3, 300 sec: 3522.5). Total num frames: 704512. Throughput: 0: 994.2. Samples: 176884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:31:22,109][00114] Avg episode reward: [(0, '4.555')]
+[2023-02-22 17:31:27,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3516.6). Total num frames: 720896. Throughput: 0: 966.3. Samples: 179166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-22 17:31:27,109][00114] Avg episode reward: [(0, '4.626')]
+[2023-02-22 17:31:30,709][13439] Updated weights for policy 0, policy_version 180 (0.0019)
+[2023-02-22 17:31:32,104][00114] Fps is (10 sec: 3687.5, 60 sec: 3891.3, 300 sec: 3530.4). Total num frames: 741376. Throughput: 0: 984.0. Samples: 184636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:31:32,111][00114] Avg episode reward: [(0, '4.538')]
+[2023-02-22 17:31:32,120][13425] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000181_741376.pth...
+[2023-02-22 17:31:37,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3562.6). Total num frames: 765952. Throughput: 0: 1018.3. Samples: 191838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:31:37,111][00114] Avg episode reward: [(0, '4.621')]
+[2023-02-22 17:31:42,104][00114] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3518.8). Total num frames: 774144. Throughput: 0: 986.4. Samples: 194100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:31:42,115][00114] Avg episode reward: [(0, '4.770')]
+[2023-02-22 17:31:42,182][13439] Updated weights for policy 0, policy_version 190 (0.0013)
+[2023-02-22 17:31:47,104][00114] Fps is (10 sec: 2457.6, 60 sec: 3823.0, 300 sec: 3513.5). Total num frames: 790528. Throughput: 0: 910.5. Samples: 197356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:31:47,109][00114] Avg episode reward: [(0, '4.724')]
+[2023-02-22 17:31:52,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3543.9). Total num frames: 815104. Throughput: 0: 949.6. Samples: 203690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:31:52,106][00114] Avg episode reward: [(0, '4.725')]
+[2023-02-22 17:31:52,969][13439] Updated weights for policy 0, policy_version 200 (0.0031)
+[2023-02-22 17:31:57,104][00114] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3573.1). Total num frames: 839680. Throughput: 0: 972.6. Samples: 207372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:31:57,106][00114] Avg episode reward: [(0, '4.692')]
+[2023-02-22 17:32:02,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3566.9). Total num frames: 856064. Throughput: 0: 950.3. Samples: 213536. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-22 17:32:02,106][00114] Avg episode reward: [(0, '4.589')]
+[2023-02-22 17:32:03,203][13439] Updated weights for policy 0, policy_version 210 (0.0014)
+[2023-02-22 17:32:07,104][00114] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3561.0). Total num frames: 872448. Throughput: 0: 917.9. Samples: 218186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:32:07,107][00114] Avg episode reward: [(0, '4.543')]
+[2023-02-22 17:32:12,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3571.7). Total num frames: 892928. Throughput: 0: 938.1. Samples: 221382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:32:12,107][00114] Avg episode reward: [(0, '4.495')]
+[2023-02-22 17:32:13,223][13439] Updated weights for policy 0, policy_version 220 (0.0027)
+[2023-02-22 17:32:17,104][00114] Fps is (10 sec: 4505.8, 60 sec: 3822.9, 300 sec: 3598.1). Total num frames: 917504. Throughput: 0: 980.1. Samples: 228742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-22 17:32:17,106][00114] Avg episode reward: [(0, '4.311')]
+[2023-02-22 17:32:22,104][00114] Fps is (10 sec: 4095.8, 60 sec: 3823.1, 300 sec: 3591.9). Total num frames: 933888. Throughput: 0: 940.5. Samples: 234160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-22 17:32:22,108][00114] Avg episode reward: [(0, '4.334')]
+[2023-02-22 17:32:23,946][13439] Updated weights for policy 0, policy_version 230 (0.0014)
+[2023-02-22 17:32:27,105][00114] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 3585.9). Total num frames: 950272. Throughput: 0: 939.8. Samples: 236390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-22 17:32:27,109][00114] Avg episode reward: [(0, '4.595')]
+[2023-02-22 17:32:32,104][00114] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3610.5). Total num frames: 974848. Throughput: 0: 1004.7. Samples: 242566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-22 17:32:32,110][00114] Avg episode reward: [(0, '4.869')]
+[2023-02-22 17:32:33,520][13439] Updated weights for policy 0, policy_version 240 (0.0013)
+[2023-02-22 17:32:37,104][00114] Fps is (10 sec: 4915.5, 60 sec: 3891.2, 300 sec: 3634.3). Total num frames: 999424. Throughput: 0: 1028.4. Samples: 249970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:32:37,109][00114] Avg episode reward: [(0, '4.728')]
+[2023-02-22 17:32:42,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3613.3). Total num frames: 1011712. Throughput: 0: 1002.7. Samples: 252492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:32:42,109][00114] Avg episode reward: [(0, '4.656')]
+[2023-02-22 17:32:44,945][13439] Updated weights for policy 0, policy_version 250 (0.0018)
+[2023-02-22 17:32:47,104][00114] Fps is (10 sec: 2867.1, 60 sec: 3959.4, 300 sec: 3607.4). Total num frames: 1028096. Throughput: 0: 966.5. Samples: 257030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:32:47,107][00114] Avg episode reward: [(0, '4.492')]
+[2023-02-22 17:32:52,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3629.9). Total num frames: 1052672. Throughput: 0: 1011.4. Samples: 263700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:32:52,111][00114] Avg episode reward: [(0, '4.749')]
+[2023-02-22 17:32:54,040][13439] Updated weights for policy 0, policy_version 260 (0.0014)
+[2023-02-22 17:32:57,104][00114] Fps is (10 sec: 4915.4, 60 sec: 3959.5, 300 sec: 3651.7). Total num frames: 1077248. Throughput: 0: 1020.4. Samples: 267302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:32:57,109][00114] Avg episode reward: [(0, '4.862')]
+[2023-02-22 17:33:02,106][00114] Fps is (10 sec: 3685.5, 60 sec: 3891.0, 300 sec: 3693.3). Total num frames: 1089536. Throughput: 0: 978.3. Samples: 272766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-22 17:33:02,109][00114] Avg episode reward: [(0, '4.743')]
+[2023-02-22 17:33:06,112][13439] Updated weights for policy 0, policy_version 270 (0.0039)
+[2023-02-22 17:33:07,104][00114] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 1105920. Throughput: 0: 963.3. Samples: 277510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:33:07,106][00114] Avg episode reward: [(0, '4.630')]
+[2023-02-22 17:33:12,104][00114] Fps is (10 sec: 4097.0, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1130496. Throughput: 0: 994.1. Samples: 281122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-22 17:33:12,109][00114] Avg episode reward: [(0, '4.897')]
+[2023-02-22 17:33:12,123][13425] Saving new best policy, reward=4.897!
+[2023-02-22 17:33:14,702][13439] Updated weights for policy 0, policy_version 280 (0.0012)
+[2023-02-22 17:33:17,114][00114] Fps is (10 sec: 4910.4, 60 sec: 3958.8, 300 sec: 3887.6). Total num frames: 1155072. Throughput: 0: 1013.6. Samples: 288188. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-22 17:33:17,116][00114] Avg episode reward: [(0, '4.789')]
+[2023-02-22 17:33:22,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1171456. Throughput: 0: 959.9. Samples: 293166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:33:22,111][00114] Avg episode reward: [(0, '4.693')]
+[2023-02-22 17:33:26,849][13439] Updated weights for policy 0, policy_version 290 (0.0031)
+[2023-02-22 17:33:27,104][00114] Fps is (10 sec: 3280.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1187840. Throughput: 0: 954.7. Samples: 295452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:33:27,109][00114] Avg episode reward: [(0, '4.730')]
+[2023-02-22 17:33:32,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1212416. Throughput: 0: 999.2. Samples: 301992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:33:32,109][00114] Avg episode reward: [(0, '4.830')]
+[2023-02-22 17:33:32,122][13425] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth...
+[2023-02-22 17:33:32,232][13425] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth
+[2023-02-22 17:33:35,332][13439] Updated weights for policy 0, policy_version 300 (0.0013)
+[2023-02-22 17:33:37,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1232896. Throughput: 0: 1007.7. Samples: 309046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:33:37,110][00114] Avg episode reward: [(0, '4.642')]
+[2023-02-22 17:33:42,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.8). Total num frames: 1249280. Throughput: 0: 977.2. Samples: 311278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:33:42,107][00114] Avg episode reward: [(0, '4.664')]
+[2023-02-22 17:33:47,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1265664. Throughput: 0: 958.6. Samples: 315902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:33:47,107][00114] Avg episode reward: [(0, '4.744')]
+[2023-02-22 17:33:47,514][13439] Updated weights for policy 0, policy_version 310 (0.0028)
+[2023-02-22 17:33:52,104][00114] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1290240. Throughput: 0: 1009.9. Samples: 322958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:33:52,107][00114] Avg episode reward: [(0, '4.751')]
+[2023-02-22 17:33:55,977][13439] Updated weights for policy 0, policy_version 320 (0.0012)
+[2023-02-22 17:33:57,104][00114] Fps is (10 sec: 4505.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1310720. Throughput: 0: 1009.6. Samples: 326554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-22 17:33:57,114][00114] Avg episode reward: [(0, '4.677')]
+[2023-02-22 17:34:02,104][00114] Fps is (10 sec: 3686.5, 60 sec: 3959.6, 300 sec: 3887.7). Total num frames: 1327104. Throughput: 0: 969.8. Samples: 331820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:34:02,109][00114] Avg episode reward: [(0, '4.685')]
+[2023-02-22 17:34:07,104][00114] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 1347584. Throughput: 0: 969.8. Samples: 336808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:34:07,106][00114] Avg episode reward: [(0, '4.680')]
+[2023-02-22 17:34:07,874][13439] Updated weights for policy 0, policy_version 330 (0.0014)
+[2023-02-22 17:34:12,104][00114] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1368064. Throughput: 0: 999.7. Samples: 340438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:34:12,109][00114] Avg episode reward: [(0, '4.616')]
+[2023-02-22 17:34:17,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3891.8, 300 sec: 3873.8). Total num frames: 1388544. Throughput: 0: 1012.9. Samples: 347572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:34:17,107][00114] Avg episode reward: [(0, '4.743')]
+[2023-02-22 17:34:17,169][13439] Updated weights for policy 0, policy_version 340 (0.0022)
+[2023-02-22 17:34:22,109][00114] Fps is (10 sec: 3684.7, 60 sec: 3890.9, 300 sec: 3873.8). Total num frames: 1404928. Throughput: 0: 958.1. Samples: 352164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:34:22,114][00114] Avg episode reward: [(0, '4.645')]
+[2023-02-22 17:34:27,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1425408. Throughput: 0: 958.5. Samples: 354410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-22 17:34:27,106][00114] Avg episode reward: [(0, '4.707')]
+[2023-02-22 17:34:28,599][13439] Updated weights for policy 0, policy_version 350 (0.0026)
+[2023-02-22 17:34:32,106][00114] Fps is (10 sec: 4506.9, 60 sec: 3959.3, 300 sec: 3887.7). Total num frames: 1449984. Throughput: 0: 1011.6. Samples: 361424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-22 17:34:32,108][00114] Avg episode reward: [(0, '4.761')]
+[2023-02-22 17:34:37,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1470464. Throughput: 0: 1002.1. Samples: 368054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2023-02-22 17:34:37,108][00114] Avg episode reward: [(0, '4.930')]
+[2023-02-22 17:34:37,111][13425] Saving new best policy, reward=4.930!
+[2023-02-22 17:34:38,168][13439] Updated weights for policy 0, policy_version 360 (0.0018)
+[2023-02-22 17:34:42,108][00114] Fps is (10 sec: 3275.9, 60 sec: 3890.9, 300 sec: 3873.8). Total num frames: 1482752. Throughput: 0: 970.0. Samples: 370208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-22 17:34:42,111][00114] Avg episode reward: [(0, '4.969')]
+[2023-02-22 17:34:42,194][13425] Saving new best policy, reward=4.969!
+[2023-02-22 17:34:47,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1503232. Throughput: 0: 960.9. Samples: 375060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-22 17:34:47,109][00114] Avg episode reward: [(0, '5.006')]
+[2023-02-22 17:34:47,114][13425] Saving new best policy, reward=5.006!
+[2023-02-22 17:34:49,275][13439] Updated weights for policy 0, policy_version 370 (0.0032)
+[2023-02-22 17:34:52,104][00114] Fps is (10 sec: 4507.7, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1527808. Throughput: 0: 1010.6. Samples: 382284.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:34:52,107][00114] Avg episode reward: [(0, '4.955')] +[2023-02-22 17:34:57,104][00114] Fps is (10 sec: 4505.3, 60 sec: 3959.4, 300 sec: 3873.9). Total num frames: 1548288. Throughput: 0: 1009.4. Samples: 385860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:34:57,109][00114] Avg episode reward: [(0, '4.755')] +[2023-02-22 17:34:59,170][13439] Updated weights for policy 0, policy_version 380 (0.0020) +[2023-02-22 17:35:02,106][00114] Fps is (10 sec: 3685.7, 60 sec: 3959.3, 300 sec: 3887.7). Total num frames: 1564672. Throughput: 0: 958.5. Samples: 390706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-22 17:35:02,112][00114] Avg episode reward: [(0, '4.758')] +[2023-02-22 17:35:07,104][00114] Fps is (10 sec: 3277.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1581056. Throughput: 0: 978.4. Samples: 396188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-22 17:35:07,111][00114] Avg episode reward: [(0, '4.986')] +[2023-02-22 17:35:09,599][13439] Updated weights for policy 0, policy_version 390 (0.0024) +[2023-02-22 17:35:12,104][00114] Fps is (10 sec: 4506.5, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 1609728. Throughput: 0: 1009.6. Samples: 399842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:35:12,106][00114] Avg episode reward: [(0, '4.762')] +[2023-02-22 17:35:17,107][00114] Fps is (10 sec: 4504.2, 60 sec: 3959.3, 300 sec: 3873.8). Total num frames: 1626112. Throughput: 0: 1006.5. Samples: 406720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:35:17,114][00114] Avg episode reward: [(0, '4.940')] +[2023-02-22 17:35:20,077][13439] Updated weights for policy 0, policy_version 400 (0.0026) +[2023-02-22 17:35:22,106][00114] Fps is (10 sec: 3275.9, 60 sec: 3959.6, 300 sec: 3887.7). Total num frames: 1642496. Throughput: 0: 960.9. Samples: 411296. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:35:22,114][00114] Avg episode reward: [(0, '5.216')] +[2023-02-22 17:35:22,124][13425] Saving new best policy, reward=5.216! +[2023-02-22 17:35:27,104][00114] Fps is (10 sec: 3687.5, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1662976. Throughput: 0: 965.4. Samples: 413646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:35:27,106][00114] Avg episode reward: [(0, '5.392')] +[2023-02-22 17:35:27,112][13425] Saving new best policy, reward=5.392! +[2023-02-22 17:35:30,290][13439] Updated weights for policy 0, policy_version 410 (0.0022) +[2023-02-22 17:35:32,104][00114] Fps is (10 sec: 4506.7, 60 sec: 3959.6, 300 sec: 3929.4). Total num frames: 1687552. Throughput: 0: 1015.6. Samples: 420764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:35:32,106][00114] Avg episode reward: [(0, '5.597')] +[2023-02-22 17:35:32,122][13425] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000412_1687552.pth... +[2023-02-22 17:35:32,254][13425] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000181_741376.pth +[2023-02-22 17:35:32,261][13425] Saving new best policy, reward=5.597! +[2023-02-22 17:35:37,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1703936. Throughput: 0: 992.2. Samples: 426932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-22 17:35:37,111][00114] Avg episode reward: [(0, '5.570')] +[2023-02-22 17:35:41,636][13439] Updated weights for policy 0, policy_version 420 (0.0018) +[2023-02-22 17:35:42,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3959.8, 300 sec: 3929.4). Total num frames: 1720320. Throughput: 0: 963.0. Samples: 429194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:35:42,108][00114] Avg episode reward: [(0, '5.611')] +[2023-02-22 17:35:42,126][13425] Saving new best policy, reward=5.611! 
+[2023-02-22 17:35:47,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1740800. Throughput: 0: 969.6. Samples: 434334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:35:47,106][00114] Avg episode reward: [(0, '5.644')] +[2023-02-22 17:35:47,114][13425] Saving new best policy, reward=5.644! +[2023-02-22 17:35:50,898][13439] Updated weights for policy 0, policy_version 430 (0.0029) +[2023-02-22 17:35:52,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1765376. Throughput: 0: 1009.8. Samples: 441630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:35:52,107][00114] Avg episode reward: [(0, '5.396')] +[2023-02-22 17:35:57,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1781760. Throughput: 0: 1005.8. Samples: 445102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:35:57,110][00114] Avg episode reward: [(0, '5.372')] +[2023-02-22 17:36:02,105][00114] Fps is (10 sec: 3276.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1798144. Throughput: 0: 955.7. Samples: 449726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:36:02,107][00114] Avg episode reward: [(0, '5.185')] +[2023-02-22 17:36:02,632][13439] Updated weights for policy 0, policy_version 440 (0.0011) +[2023-02-22 17:36:07,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1814528. Throughput: 0: 955.3. Samples: 454284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:36:07,111][00114] Avg episode reward: [(0, '5.303')] +[2023-02-22 17:36:12,104][00114] Fps is (10 sec: 3277.0, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 1830912. Throughput: 0: 952.1. Samples: 456492. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:36:12,110][00114] Avg episode reward: [(0, '5.188')] +[2023-02-22 17:36:15,565][13439] Updated weights for policy 0, policy_version 450 (0.0019) +[2023-02-22 17:36:17,107][00114] Fps is (10 sec: 3275.9, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 1847296. Throughput: 0: 898.1. Samples: 461180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-22 17:36:17,110][00114] Avg episode reward: [(0, '4.991')] +[2023-02-22 17:36:22,106][00114] Fps is (10 sec: 2866.6, 60 sec: 3618.1, 300 sec: 3859.9). Total num frames: 1859584. Throughput: 0: 861.1. Samples: 465682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:36:22,109][00114] Avg episode reward: [(0, '4.542')] +[2023-02-22 17:36:27,072][13439] Updated weights for policy 0, policy_version 460 (0.0027) +[2023-02-22 17:36:27,104][00114] Fps is (10 sec: 3687.5, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 1884160. Throughput: 0: 872.1. Samples: 468438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:36:27,106][00114] Avg episode reward: [(0, '4.716')] +[2023-02-22 17:36:32,104][00114] Fps is (10 sec: 4506.8, 60 sec: 3618.1, 300 sec: 3860.0). Total num frames: 1904640. Throughput: 0: 918.2. Samples: 475652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:36:32,110][00114] Avg episode reward: [(0, '4.784')] +[2023-02-22 17:36:37,105][00114] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3887.7). Total num frames: 1921024. Throughput: 0: 883.9. Samples: 481406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:36:37,108][00114] Avg episode reward: [(0, '4.668')] +[2023-02-22 17:36:37,218][13439] Updated weights for policy 0, policy_version 470 (0.0014) +[2023-02-22 17:36:42,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3887.7). Total num frames: 1937408. Throughput: 0: 858.6. Samples: 483738. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-22 17:36:42,108][00114] Avg episode reward: [(0, '4.727')] +[2023-02-22 17:36:47,104][00114] Fps is (10 sec: 3687.0, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 1957888. Throughput: 0: 878.5. Samples: 489258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2023-02-22 17:36:47,106][00114] Avg episode reward: [(0, '4.755')] +[2023-02-22 17:36:48,050][13439] Updated weights for policy 0, policy_version 480 (0.0012) +[2023-02-22 17:36:52,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 1982464. Throughput: 0: 938.5. Samples: 496516. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:36:52,106][00114] Avg episode reward: [(0, '4.712')] +[2023-02-22 17:36:57,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 2002944. Throughput: 0: 955.6. Samples: 499494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:36:57,111][00114] Avg episode reward: [(0, '4.480')] +[2023-02-22 17:36:58,144][13439] Updated weights for policy 0, policy_version 490 (0.0020) +[2023-02-22 17:37:02,105][00114] Fps is (10 sec: 3686.1, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 2019328. Throughput: 0: 952.6. Samples: 504044. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-22 17:37:02,110][00114] Avg episode reward: [(0, '4.305')] +[2023-02-22 17:37:07,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 2039808. Throughput: 0: 992.5. Samples: 510344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-22 17:37:07,111][00114] Avg episode reward: [(0, '4.500')] +[2023-02-22 17:37:08,498][13439] Updated weights for policy 0, policy_version 500 (0.0022) +[2023-02-22 17:37:12,104][00114] Fps is (10 sec: 4506.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2064384. Throughput: 0: 1011.1. Samples: 513936. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:37:12,111][00114] Avg episode reward: [(0, '4.689')] +[2023-02-22 17:37:17,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3887.7). Total num frames: 2080768. Throughput: 0: 983.9. Samples: 519926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-22 17:37:17,110][00114] Avg episode reward: [(0, '4.807')] +[2023-02-22 17:37:19,110][13439] Updated weights for policy 0, policy_version 510 (0.0018) +[2023-02-22 17:37:22,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3959.6, 300 sec: 3887.7). Total num frames: 2097152. Throughput: 0: 960.0. Samples: 524604. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-22 17:37:22,106][00114] Avg episode reward: [(0, '4.837')] +[2023-02-22 17:37:27,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2117632. Throughput: 0: 979.2. Samples: 527802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:37:27,106][00114] Avg episode reward: [(0, '4.721')] +[2023-02-22 17:37:28,830][13439] Updated weights for policy 0, policy_version 520 (0.0030) +[2023-02-22 17:37:32,107][00114] Fps is (10 sec: 4504.4, 60 sec: 3959.3, 300 sec: 3873.8). Total num frames: 2142208. Throughput: 0: 1018.3. Samples: 535086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-22 17:37:32,109][00114] Avg episode reward: [(0, '4.553')] +[2023-02-22 17:37:32,149][13425] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000524_2146304.pth... +[2023-02-22 17:37:32,273][13425] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth +[2023-02-22 17:37:37,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3887.7). Total num frames: 2158592. Throughput: 0: 974.1. Samples: 540350. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-22 17:37:37,109][00114] Avg episode reward: [(0, '4.394')] +[2023-02-22 17:37:40,296][13439] Updated weights for policy 0, policy_version 530 (0.0026) +[2023-02-22 17:37:42,104][00114] Fps is (10 sec: 3277.7, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2174976. Throughput: 0: 958.1. Samples: 542610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-22 17:37:42,106][00114] Avg episode reward: [(0, '4.421')] +[2023-02-22 17:37:47,104][00114] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 2199552. Throughput: 0: 991.0. Samples: 548640. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:37:47,107][00114] Avg episode reward: [(0, '4.731')] +[2023-02-22 17:37:49,507][13439] Updated weights for policy 0, policy_version 540 (0.0015) +[2023-02-22 17:37:52,104][00114] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 2224128. Throughput: 0: 1012.3. Samples: 555898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:37:52,107][00114] Avg episode reward: [(0, '4.736')] +[2023-02-22 17:37:57,104][00114] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3887.8). Total num frames: 2236416. Throughput: 0: 989.5. Samples: 558466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:37:57,111][00114] Avg episode reward: [(0, '4.648')] +[2023-02-22 17:38:01,408][13439] Updated weights for policy 0, policy_version 550 (0.0031) +[2023-02-22 17:38:02,104][00114] Fps is (10 sec: 2867.2, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 2252800. Throughput: 0: 956.7. Samples: 562978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:38:02,106][00114] Avg episode reward: [(0, '4.579')] +[2023-02-22 17:38:07,111][00114] Fps is (10 sec: 4093.4, 60 sec: 3959.0, 300 sec: 3887.6). Total num frames: 2277376. Throughput: 0: 1001.8. Samples: 569694. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:38:07,113][00114] Avg episode reward: [(0, '4.616')] +[2023-02-22 17:38:10,035][13439] Updated weights for policy 0, policy_version 560 (0.0018) +[2023-02-22 17:38:12,106][00114] Fps is (10 sec: 4914.3, 60 sec: 3959.4, 300 sec: 3887.8). Total num frames: 2301952. Throughput: 0: 1009.1. Samples: 573212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-22 17:38:12,111][00114] Avg episode reward: [(0, '4.689')] +[2023-02-22 17:38:17,104][00114] Fps is (10 sec: 3688.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2314240. Throughput: 0: 969.3. Samples: 578702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-22 17:38:17,110][00114] Avg episode reward: [(0, '4.571')] +[2023-02-22 17:38:22,104][00114] Fps is (10 sec: 2867.7, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2330624. Throughput: 0: 955.1. Samples: 583328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-22 17:38:22,111][00114] Avg episode reward: [(0, '4.787')] +[2023-02-22 17:38:22,371][13439] Updated weights for policy 0, policy_version 570 (0.0016) +[2023-02-22 17:38:27,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2355200. Throughput: 0: 980.9. Samples: 586750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:38:27,112][00114] Avg episode reward: [(0, '4.646')] +[2023-02-22 17:38:30,843][13439] Updated weights for policy 0, policy_version 580 (0.0021) +[2023-02-22 17:38:32,104][00114] Fps is (10 sec: 4915.2, 60 sec: 3959.6, 300 sec: 3887.7). Total num frames: 2379776. Throughput: 0: 1009.1. Samples: 594050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:38:32,109][00114] Avg episode reward: [(0, '4.653')] +[2023-02-22 17:38:37,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2392064. Throughput: 0: 960.2. Samples: 599108. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-22 17:38:37,110][00114] Avg episode reward: [(0, '4.633')] +[2023-02-22 17:38:42,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2412544. Throughput: 0: 952.8. Samples: 601342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-22 17:38:42,106][00114] Avg episode reward: [(0, '4.507')] +[2023-02-22 17:38:43,031][13439] Updated weights for policy 0, policy_version 590 (0.0022) +[2023-02-22 17:38:47,104][00114] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2433024. Throughput: 0: 996.3. Samples: 607812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:38:47,107][00114] Avg episode reward: [(0, '4.567')] +[2023-02-22 17:38:51,582][13439] Updated weights for policy 0, policy_version 600 (0.0013) +[2023-02-22 17:38:52,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2457600. Throughput: 0: 1005.2. Samples: 614920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:38:52,114][00114] Avg episode reward: [(0, '4.670')] +[2023-02-22 17:38:57,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2473984. Throughput: 0: 976.6. Samples: 617156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-22 17:38:57,108][00114] Avg episode reward: [(0, '4.583')] +[2023-02-22 17:39:02,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2490368. Throughput: 0: 957.5. Samples: 621788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-22 17:39:02,107][00114] Avg episode reward: [(0, '4.602')] +[2023-02-22 17:39:03,524][13439] Updated weights for policy 0, policy_version 610 (0.0032) +[2023-02-22 17:39:07,104][00114] Fps is (10 sec: 4096.1, 60 sec: 3959.9, 300 sec: 3887.7). Total num frames: 2514944. Throughput: 0: 1011.9. Samples: 628862. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:39:07,107][00114] Avg episode reward: [(0, '4.700')] +[2023-02-22 17:39:12,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 2535424. Throughput: 0: 1016.3. Samples: 632482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:39:12,106][00114] Avg episode reward: [(0, '4.572')] +[2023-02-22 17:39:12,755][13439] Updated weights for policy 0, policy_version 620 (0.0019) +[2023-02-22 17:39:17,104][00114] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3887.8). Total num frames: 2551808. Throughput: 0: 970.0. Samples: 637700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:39:17,111][00114] Avg episode reward: [(0, '4.578')] +[2023-02-22 17:39:22,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2568192. Throughput: 0: 967.0. Samples: 642622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:39:22,110][00114] Avg episode reward: [(0, '4.816')] +[2023-02-22 17:39:24,020][13439] Updated weights for policy 0, policy_version 630 (0.0017) +[2023-02-22 17:39:27,104][00114] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 2592768. Throughput: 0: 998.0. Samples: 646250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-22 17:39:27,107][00114] Avg episode reward: [(0, '5.055')] +[2023-02-22 17:39:32,104][00114] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2613248. Throughput: 0: 1013.6. Samples: 653424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:39:32,106][00114] Avg episode reward: [(0, '4.793')] +[2023-02-22 17:39:32,123][13425] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000638_2613248.pth... 
+[2023-02-22 17:39:32,258][13425] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000412_1687552.pth +[2023-02-22 17:39:33,634][13439] Updated weights for policy 0, policy_version 640 (0.0020) +[2023-02-22 17:39:37,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.8). Total num frames: 2629632. Throughput: 0: 956.8. Samples: 657974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:39:37,108][00114] Avg episode reward: [(0, '4.645')] +[2023-02-22 17:39:42,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2646016. Throughput: 0: 956.7. Samples: 660206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-22 17:39:42,111][00114] Avg episode reward: [(0, '4.774')] +[2023-02-22 17:39:44,789][13439] Updated weights for policy 0, policy_version 650 (0.0020) +[2023-02-22 17:39:47,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2670592. Throughput: 0: 1006.1. Samples: 667064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:39:47,110][00114] Avg episode reward: [(0, '4.644')] +[2023-02-22 17:39:52,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 2691072. Throughput: 0: 998.6. Samples: 673798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:39:52,112][00114] Avg episode reward: [(0, '4.650')] +[2023-02-22 17:39:55,114][13439] Updated weights for policy 0, policy_version 660 (0.0013) +[2023-02-22 17:39:57,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 2707456. Throughput: 0: 968.4. Samples: 676060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-22 17:39:57,110][00114] Avg episode reward: [(0, '4.782')] +[2023-02-22 17:40:02,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2727936. Throughput: 0: 959.4. Samples: 680872. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:40:02,106][00114] Avg episode reward: [(0, '4.791')] +[2023-02-22 17:40:05,322][13439] Updated weights for policy 0, policy_version 670 (0.0018) +[2023-02-22 17:40:07,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2752512. Throughput: 0: 1011.2. Samples: 688126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:40:07,106][00114] Avg episode reward: [(0, '4.707')] +[2023-02-22 17:40:12,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.8). Total num frames: 2772992. Throughput: 0: 1011.0. Samples: 691744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:40:12,106][00114] Avg episode reward: [(0, '4.781')] +[2023-02-22 17:40:15,898][13439] Updated weights for policy 0, policy_version 680 (0.0024) +[2023-02-22 17:40:17,109][00114] Fps is (10 sec: 3274.9, 60 sec: 3890.9, 300 sec: 3873.8). Total num frames: 2785280. Throughput: 0: 959.5. Samples: 696606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:40:17,112][00114] Avg episode reward: [(0, '4.664')] +[2023-02-22 17:40:22,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2805760. Throughput: 0: 975.8. Samples: 701886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:40:22,106][00114] Avg episode reward: [(0, '4.758')] +[2023-02-22 17:40:25,922][13439] Updated weights for policy 0, policy_version 690 (0.0012) +[2023-02-22 17:40:27,104][00114] Fps is (10 sec: 4508.1, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2830336. Throughput: 0: 1005.1. Samples: 705436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:40:27,107][00114] Avg episode reward: [(0, '4.657')] +[2023-02-22 17:40:32,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2850816. Throughput: 0: 1010.0. Samples: 712514. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:40:32,110][00114] Avg episode reward: [(0, '4.522')] +[2023-02-22 17:40:37,038][13439] Updated weights for policy 0, policy_version 700 (0.0018) +[2023-02-22 17:40:37,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2867200. Throughput: 0: 960.7. Samples: 717028. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-22 17:40:37,110][00114] Avg episode reward: [(0, '4.638')] +[2023-02-22 17:40:42,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2883584. Throughput: 0: 960.4. Samples: 719278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:40:42,112][00114] Avg episode reward: [(0, '4.675')] +[2023-02-22 17:40:46,534][13439] Updated weights for policy 0, policy_version 710 (0.0027) +[2023-02-22 17:40:47,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2908160. Throughput: 0: 1012.0. Samples: 726414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:40:47,106][00114] Avg episode reward: [(0, '4.631')] +[2023-02-22 17:40:52,104][00114] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 2928640. Throughput: 0: 988.9. Samples: 732626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-22 17:40:52,107][00114] Avg episode reward: [(0, '4.694')] +[2023-02-22 17:40:57,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 2940928. Throughput: 0: 958.4. Samples: 734874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:40:57,107][00114] Avg episode reward: [(0, '4.496')] +[2023-02-22 17:40:58,485][13439] Updated weights for policy 0, policy_version 720 (0.0031) +[2023-02-22 17:41:02,104][00114] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2961408. Throughput: 0: 964.1. Samples: 739986. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:41:02,110][00114] Avg episode reward: [(0, '4.451')] +[2023-02-22 17:41:07,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2985984. Throughput: 0: 1009.0. Samples: 747292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:41:07,111][00114] Avg episode reward: [(0, '4.474')] +[2023-02-22 17:41:07,222][13439] Updated weights for policy 0, policy_version 730 (0.0016) +[2023-02-22 17:41:12,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3006464. Throughput: 0: 1008.2. Samples: 750806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:41:12,112][00114] Avg episode reward: [(0, '4.574')] +[2023-02-22 17:41:17,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3959.8, 300 sec: 3943.3). Total num frames: 3022848. Throughput: 0: 950.3. Samples: 755276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-22 17:41:17,108][00114] Avg episode reward: [(0, '4.614')] +[2023-02-22 17:41:19,420][13439] Updated weights for policy 0, policy_version 740 (0.0027) +[2023-02-22 17:41:22,104][00114] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3043328. Throughput: 0: 976.3. Samples: 760964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:41:22,111][00114] Avg episode reward: [(0, '4.648')] +[2023-02-22 17:41:27,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3063808. Throughput: 0: 1004.9. Samples: 764500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:41:27,106][00114] Avg episode reward: [(0, '4.722')] +[2023-02-22 17:41:27,882][13439] Updated weights for policy 0, policy_version 750 (0.0013) +[2023-02-22 17:41:32,104][00114] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3084288. Throughput: 0: 990.5. Samples: 770986. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:41:32,116][00114] Avg episode reward: [(0, '4.602')] +[2023-02-22 17:41:32,134][13425] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000753_3084288.pth... +[2023-02-22 17:41:32,288][13425] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000524_2146304.pth +[2023-02-22 17:41:37,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3100672. Throughput: 0: 954.1. Samples: 775560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:41:37,111][00114] Avg episode reward: [(0, '4.628')] +[2023-02-22 17:41:40,016][13439] Updated weights for policy 0, policy_version 760 (0.0017) +[2023-02-22 17:41:42,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3117056. Throughput: 0: 965.1. Samples: 778302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-22 17:41:42,109][00114] Avg episode reward: [(0, '4.715')] +[2023-02-22 17:41:47,105][00114] Fps is (10 sec: 3276.4, 60 sec: 3754.6, 300 sec: 3901.6). Total num frames: 3133440. Throughput: 0: 966.8. Samples: 783494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-22 17:41:47,107][00114] Avg episode reward: [(0, '4.632')] +[2023-02-22 17:41:52,104][00114] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3873.8). Total num frames: 3145728. Throughput: 0: 896.4. Samples: 787630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:41:52,111][00114] Avg episode reward: [(0, '4.712')] +[2023-02-22 17:41:53,569][13439] Updated weights for policy 0, policy_version 770 (0.0021) +[2023-02-22 17:41:57,104][00114] Fps is (10 sec: 2867.6, 60 sec: 3686.4, 300 sec: 3873.9). Total num frames: 3162112. Throughput: 0: 868.4. Samples: 789884. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:41:57,107][00114] Avg episode reward: [(0, '4.827')] +[2023-02-22 17:42:02,106][00114] Fps is (10 sec: 3685.4, 60 sec: 3686.2, 300 sec: 3873.8). Total num frames: 3182592. Throughput: 0: 885.4. Samples: 795120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:42:02,109][00114] Avg episode reward: [(0, '4.764')] +[2023-02-22 17:42:04,132][13439] Updated weights for policy 0, policy_version 780 (0.0023) +[2023-02-22 17:42:07,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 3207168. Throughput: 0: 917.6. Samples: 802254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:42:07,106][00114] Avg episode reward: [(0, '4.572')] +[2023-02-22 17:42:12,104][00114] Fps is (10 sec: 4506.7, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 3227648. Throughput: 0: 916.9. Samples: 805760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:42:12,109][00114] Avg episode reward: [(0, '4.588')] +[2023-02-22 17:42:14,432][13439] Updated weights for policy 0, policy_version 790 (0.0022) +[2023-02-22 17:42:17,106][00114] Fps is (10 sec: 3276.0, 60 sec: 3618.0, 300 sec: 3873.8). Total num frames: 3239936. Throughput: 0: 872.6. Samples: 810256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-22 17:42:17,111][00114] Avg episode reward: [(0, '4.600')] +[2023-02-22 17:42:22,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 3260416. Throughput: 0: 895.6. Samples: 815862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:42:22,111][00114] Avg episode reward: [(0, '4.537')] +[2023-02-22 17:42:24,740][13439] Updated weights for policy 0, policy_version 800 (0.0014) +[2023-02-22 17:42:27,104][00114] Fps is (10 sec: 4506.7, 60 sec: 3686.4, 300 sec: 3873.9). Total num frames: 3284992. Throughput: 0: 915.0. Samples: 819478. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:42:27,107][00114] Avg episode reward: [(0, '4.560')] +[2023-02-22 17:42:32,104][00114] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 3305472. Throughput: 0: 945.6. Samples: 826044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:42:32,111][00114] Avg episode reward: [(0, '4.675')] +[2023-02-22 17:42:35,612][13439] Updated weights for policy 0, policy_version 810 (0.0018) +[2023-02-22 17:42:37,104][00114] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 3317760. Throughput: 0: 953.3. Samples: 830528. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-22 17:42:37,108][00114] Avg episode reward: [(0, '4.496')] +[2023-02-22 17:42:42,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 3342336. Throughput: 0: 962.2. Samples: 833182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-22 17:42:42,106][00114] Avg episode reward: [(0, '4.533')] +[2023-02-22 17:42:45,437][13439] Updated weights for policy 0, policy_version 820 (0.0015) +[2023-02-22 17:42:47,104][00114] Fps is (10 sec: 4505.8, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 3362816. Throughput: 0: 1006.2. Samples: 840398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:42:47,106][00114] Avg episode reward: [(0, '4.546')] +[2023-02-22 17:42:52,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3383296. Throughput: 0: 982.6. Samples: 846472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:42:52,112][00114] Avg episode reward: [(0, '4.591')] +[2023-02-22 17:42:56,792][13439] Updated weights for policy 0, policy_version 830 (0.0012) +[2023-02-22 17:42:57,105][00114] Fps is (10 sec: 3685.8, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 3399680. Throughput: 0: 955.3. Samples: 848750. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-22 17:42:57,108][00114] Avg episode reward: [(0, '4.544')] +[2023-02-22 17:43:02,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3873.9). Total num frames: 3420160. Throughput: 0: 979.3. Samples: 854324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:43:02,107][00114] Avg episode reward: [(0, '4.498')] +[2023-02-22 17:43:05,860][13439] Updated weights for policy 0, policy_version 840 (0.0023) +[2023-02-22 17:43:07,104][00114] Fps is (10 sec: 4506.3, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 3444736. Throughput: 0: 1016.1. Samples: 861586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-22 17:43:07,106][00114] Avg episode reward: [(0, '4.517')] +[2023-02-22 17:43:12,104][00114] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3461120. Throughput: 0: 1005.5. Samples: 864726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-22 17:43:12,106][00114] Avg episode reward: [(0, '4.812')] +[2023-02-22 17:43:17,105][00114] Fps is (10 sec: 3276.6, 60 sec: 3959.6, 300 sec: 3887.7). Total num frames: 3477504. Throughput: 0: 959.1. Samples: 869202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:43:17,110][00114] Avg episode reward: [(0, '4.813')] +[2023-02-22 17:43:17,832][13439] Updated weights for policy 0, policy_version 850 (0.0011) +[2023-02-22 17:43:22,104][00114] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3497984. Throughput: 0: 992.7. Samples: 875200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:43:22,110][00114] Avg episode reward: [(0, '4.726')] +[2023-02-22 17:43:26,441][13439] Updated weights for policy 0, policy_version 860 (0.0015) +[2023-02-22 17:43:27,104][00114] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3522560. Throughput: 0: 1013.6. Samples: 878794. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-22 17:43:27,106][00114] Avg episode reward: [(0, '4.778')] +[2023-02-22 17:43:32,109][00114] Fps is (10 sec: 4094.1, 60 sec: 3890.9, 300 sec: 3887.7). Total num frames: 3538944. Throughput: 0: 989.1. Samples: 884912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:43:32,113][00114] Avg episode reward: [(0, '4.816')] +[2023-02-22 17:43:32,129][13425] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000864_3538944.pth... +[2023-02-22 17:43:32,286][13425] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000638_2613248.pth +[2023-02-22 17:43:37,104][00114] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3555328. Throughput: 0: 952.1. Samples: 889316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:43:37,111][00114] Avg episode reward: [(0, '4.481')] +[2023-02-22 17:43:38,795][13439] Updated weights for policy 0, policy_version 870 (0.0026) +[2023-02-22 17:43:42,104][00114] Fps is (10 sec: 3688.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3575808. Throughput: 0: 968.6. Samples: 892334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:43:42,111][00114] Avg episode reward: [(0, '4.504')] +[2023-02-22 17:43:47,104][00114] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3600384. Throughput: 0: 1009.1. Samples: 899734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:43:47,110][00114] Avg episode reward: [(0, '4.786')] +[2023-02-22 17:43:47,239][13439] Updated weights for policy 0, policy_version 880 (0.0014) +[2023-02-22 17:43:52,104][00114] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3620864. Throughput: 0: 976.8. Samples: 905540. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-22 17:43:52,107][00114] Avg episode reward: [(0, '4.802')] +[2023-02-22 17:43:57,105][00114] Fps is (10 sec: 3686.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3637248. Throughput: 0: 957.7. Samples: 907822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:43:57,109][00114] Avg episode reward: [(0, '4.718')] +[2023-02-22 17:43:58,779][13439] Updated weights for policy 0, policy_version 890 (0.0016) +[2023-02-22 17:44:02,104][00114] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 3657728. Throughput: 0: 995.7. Samples: 914008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-22 17:44:02,110][00114] Avg episode reward: [(0, '4.508')] +[2023-02-22 17:44:07,107][00114] Fps is (10 sec: 4504.4, 60 sec: 3959.2, 300 sec: 3887.7). Total num frames: 3682304. Throughput: 0: 1024.5. Samples: 921306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:44:07,110][00114] Avg episode reward: [(0, '4.563')] +[2023-02-22 17:44:07,231][13439] Updated weights for policy 0, policy_version 900 (0.0013) +[2023-02-22 17:44:12,104][00114] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3698688. Throughput: 0: 1006.8. Samples: 924102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:44:12,113][00114] Avg episode reward: [(0, '4.607')] +[2023-02-22 17:44:17,105][00114] Fps is (10 sec: 3277.7, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3715072. Throughput: 0: 974.5. Samples: 928760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-22 17:44:17,113][00114] Avg episode reward: [(0, '4.708')] +[2023-02-22 17:44:18,996][13439] Updated weights for policy 0, policy_version 910 (0.0052) +[2023-02-22 17:44:22,104][00114] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3739648. Throughput: 0: 1025.6. Samples: 935470. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:44:22,106][00114] Avg episode reward: [(0, '4.870')] +[2023-02-22 17:44:27,104][00114] Fps is (10 sec: 4915.6, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 3764224. Throughput: 0: 1040.2. Samples: 939144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:44:27,106][00114] Avg episode reward: [(0, '4.644')] +[2023-02-22 17:44:27,563][13439] Updated weights for policy 0, policy_version 920 (0.0012) +[2023-02-22 17:44:32,104][00114] Fps is (10 sec: 4096.0, 60 sec: 4028.1, 300 sec: 3901.6). Total num frames: 3780608. Throughput: 0: 1007.1. Samples: 945054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:44:32,107][00114] Avg episode reward: [(0, '4.432')] +[2023-02-22 17:44:37,104][00114] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 3796992. Throughput: 0: 981.7. Samples: 949716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:44:37,111][00114] Avg episode reward: [(0, '4.319')] +[2023-02-22 17:44:39,347][13439] Updated weights for policy 0, policy_version 930 (0.0019) +[2023-02-22 17:44:42,104][00114] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3901.6). Total num frames: 3821568. Throughput: 0: 1008.8. Samples: 953218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:44:42,110][00114] Avg episode reward: [(0, '4.353')] +[2023-02-22 17:44:47,108][00114] Fps is (10 sec: 4913.4, 60 sec: 4095.7, 300 sec: 3915.4). Total num frames: 3846144. Throughput: 0: 1036.1. Samples: 960638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2023-02-22 17:44:47,110][00114] Avg episode reward: [(0, '4.649')] +[2023-02-22 17:44:47,842][13439] Updated weights for policy 0, policy_version 940 (0.0018) +[2023-02-22 17:44:52,106][00114] Fps is (10 sec: 4095.0, 60 sec: 4027.6, 300 sec: 3915.5). Total num frames: 3862528. Throughput: 0: 991.8. Samples: 965938. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:44:52,109][00114] Avg episode reward: [(0, '4.650')] +[2023-02-22 17:44:57,104][00114] Fps is (10 sec: 3278.0, 60 sec: 4027.8, 300 sec: 3901.6). Total num frames: 3878912. Throughput: 0: 980.4. Samples: 968222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:44:57,112][00114] Avg episode reward: [(0, '4.810')] +[2023-02-22 17:44:59,402][13439] Updated weights for policy 0, policy_version 950 (0.0030) +[2023-02-22 17:45:02,104][00114] Fps is (10 sec: 4097.0, 60 sec: 4096.0, 300 sec: 3901.6). Total num frames: 3903488. Throughput: 0: 1022.3. Samples: 974764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-22 17:45:02,107][00114] Avg episode reward: [(0, '4.869')] +[2023-02-22 17:45:07,104][00114] Fps is (10 sec: 4914.9, 60 sec: 4096.2, 300 sec: 3915.5). Total num frames: 3928064. Throughput: 0: 1036.7. Samples: 982122. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-22 17:45:07,108][00114] Avg episode reward: [(0, '4.711')] +[2023-02-22 17:45:08,221][13439] Updated weights for policy 0, policy_version 960 (0.0018) +[2023-02-22 17:45:12,104][00114] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3915.6). Total num frames: 3940352. Throughput: 0: 1006.7. Samples: 984446. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-22 17:45:12,111][00114] Avg episode reward: [(0, '4.439')] +[2023-02-22 17:45:17,104][00114] Fps is (10 sec: 3277.0, 60 sec: 4096.1, 300 sec: 3915.5). Total num frames: 3960832. Throughput: 0: 980.0. Samples: 989156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-22 17:45:17,110][00114] Avg episode reward: [(0, '4.448')] +[2023-02-22 17:45:19,424][13439] Updated weights for policy 0, policy_version 970 (0.0025) +[2023-02-22 17:45:22,104][00114] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 3985408. Throughput: 0: 1037.7. Samples: 996414. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-22 17:45:22,106][00114] Avg episode reward: [(0, '4.970')] +[2023-02-22 17:45:26,305][13425] Stopping Batcher_0... +[2023-02-22 17:45:26,307][00114] Component Batcher_0 stopped! +[2023-02-22 17:45:26,313][13425] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2023-02-22 17:45:26,364][13425] Loop batcher_evt_loop terminating... +[2023-02-22 17:45:26,392][13439] Weights refcount: 2 0 +[2023-02-22 17:45:26,400][00114] Component InferenceWorker_p0-w0 stopped! +[2023-02-22 17:45:26,407][13439] Stopping InferenceWorker_p0-w0... +[2023-02-22 17:45:26,408][13439] Loop inference_proc0-0_evt_loop terminating... +[2023-02-22 17:45:26,437][13443] Stopping RolloutWorker_w4... +[2023-02-22 17:45:26,437][00114] Component RolloutWorker_w4 stopped! +[2023-02-22 17:45:26,445][00114] Component RolloutWorker_w5 stopped! +[2023-02-22 17:45:26,449][13446] Stopping RolloutWorker_w5... +[2023-02-22 17:45:26,450][13446] Loop rollout_proc5_evt_loop terminating... +[2023-02-22 17:45:26,461][00114] Component RolloutWorker_w7 stopped! +[2023-02-22 17:45:26,463][13447] Stopping RolloutWorker_w7... +[2023-02-22 17:45:26,463][13447] Loop rollout_proc7_evt_loop terminating... +[2023-02-22 17:45:26,472][00114] Component RolloutWorker_w1 stopped! +[2023-02-22 17:45:26,479][13445] Stopping RolloutWorker_w3... +[2023-02-22 17:45:26,479][13445] Loop rollout_proc3_evt_loop terminating... +[2023-02-22 17:45:26,481][00114] Component RolloutWorker_w3 stopped! +[2023-02-22 17:45:26,474][13440] Stopping RolloutWorker_w1... +[2023-02-22 17:45:26,485][13440] Loop rollout_proc1_evt_loop terminating... +[2023-02-22 17:45:26,502][13444] Stopping RolloutWorker_w6... +[2023-02-22 17:45:26,503][13444] Loop rollout_proc6_evt_loop terminating... +[2023-02-22 17:45:26,503][00114] Component RolloutWorker_w6 stopped! +[2023-02-22 17:45:26,437][13443] Loop rollout_proc4_evt_loop terminating... 
+[2023-02-22 17:45:26,539][13442] Stopping RolloutWorker_w2... +[2023-02-22 17:45:26,540][00114] Component RolloutWorker_w2 stopped! +[2023-02-22 17:45:26,549][13425] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000753_3084288.pth +[2023-02-22 17:45:26,568][13425] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2023-02-22 17:45:26,540][13442] Loop rollout_proc2_evt_loop terminating... +[2023-02-22 17:45:26,602][13441] Stopping RolloutWorker_w0... +[2023-02-22 17:45:26,602][00114] Component RolloutWorker_w0 stopped! +[2023-02-22 17:45:26,622][13441] Loop rollout_proc0_evt_loop terminating... +[2023-02-22 17:45:26,901][13425] Stopping LearnerWorker_p0... +[2023-02-22 17:45:26,902][13425] Loop learner_proc0_evt_loop terminating... +[2023-02-22 17:45:26,901][00114] Component LearnerWorker_p0 stopped! +[2023-02-22 17:45:26,904][00114] Waiting for process learner_proc0 to stop... +[2023-02-22 17:45:29,276][00114] Waiting for process inference_proc0-0 to join... +[2023-02-22 17:45:29,846][00114] Waiting for process rollout_proc0 to join... +[2023-02-22 17:45:30,618][00114] Waiting for process rollout_proc1 to join... +[2023-02-22 17:45:30,621][00114] Waiting for process rollout_proc2 to join... +[2023-02-22 17:45:30,622][00114] Waiting for process rollout_proc3 to join... +[2023-02-22 17:45:30,624][00114] Waiting for process rollout_proc4 to join... +[2023-02-22 17:45:30,625][00114] Waiting for process rollout_proc5 to join... +[2023-02-22 17:45:30,626][00114] Waiting for process rollout_proc6 to join... +[2023-02-22 17:45:30,627][00114] Waiting for process rollout_proc7 to join... 
+[2023-02-22 17:45:30,631][00114] Batcher 0 profile tree view: +batching: 24.9339, releasing_batches: 0.0228 +[2023-02-22 17:45:30,633][00114] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0001 + wait_policy_total: 506.5642 +update_model: 7.7233 + weight_update: 0.0016 +one_step: 0.0026 + handle_policy_step: 488.5414 + deserialize: 14.1411, stack: 2.7926, obs_to_device_normalize: 111.2670, forward: 231.2570, send_messages: 24.6882 + prepare_outputs: 80.4690 + to_cpu: 50.6445 +[2023-02-22 17:45:30,634][00114] Learner 0 profile tree view: +misc: 0.0120, prepare_batch: 16.0681 +train: 76.4609 + epoch_init: 0.0061, minibatch_init: 0.0059, losses_postprocess: 0.6485, kl_divergence: 0.6033, after_optimizer: 32.9669 + calculate_losses: 27.4887 + losses_init: 0.0048, forward_head: 1.8068, bptt_initial: 18.1434, tail: 1.1346, advantages_returns: 0.2888, losses: 3.6040 + bptt: 2.1645 + bptt_forward_core: 2.0977 + update: 14.1680 + clip: 1.3975 +[2023-02-22 17:45:30,635][00114] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.3204, enqueue_policy_requests: 128.7518, env_step: 790.1169, overhead: 17.6947, complete_rollouts: 6.8153 +save_policy_outputs: 18.6688 + split_output_tensors: 9.0017 +[2023-02-22 17:45:30,636][00114] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.3282, enqueue_policy_requests: 127.2620, env_step: 790.8325, overhead: 17.9588, complete_rollouts: 6.8302 +save_policy_outputs: 19.4265 + split_output_tensors: 9.3786 +[2023-02-22 17:45:30,639][00114] Loop Runner_EvtLoop terminating... 
+[2023-02-22 17:45:30,644][00114] Runner profile tree view: +main_loop: 1071.9823 +[2023-02-22 17:45:30,646][00114] Collected {0: 4005888}, FPS: 3736.9 +[2023-02-22 17:45:30,941][00114] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2023-02-22 17:45:30,944][00114] Overriding arg 'num_workers' with value 1 passed from command line +[2023-02-22 17:45:30,946][00114] Adding new argument 'no_render'=True that is not in the saved config file! +[2023-02-22 17:45:30,948][00114] Adding new argument 'save_video'=True that is not in the saved config file! +[2023-02-22 17:45:30,950][00114] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2023-02-22 17:45:30,953][00114] Adding new argument 'video_name'=None that is not in the saved config file! +[2023-02-22 17:45:30,955][00114] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2023-02-22 17:45:30,956][00114] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2023-02-22 17:45:30,957][00114] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2023-02-22 17:45:30,958][00114] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2023-02-22 17:45:30,960][00114] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2023-02-22 17:45:30,962][00114] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2023-02-22 17:45:30,964][00114] Adding new argument 'train_script'=None that is not in the saved config file! +[2023-02-22 17:45:30,965][00114] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2023-02-22 17:45:30,967][00114] Using frameskip 1 and render_action_repeat=4 for evaluation +[2023-02-22 17:45:31,001][00114] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-22 17:45:31,004][00114] RunningMeanStd input shape: (3, 72, 128) +[2023-02-22 17:45:31,006][00114] RunningMeanStd input shape: (1,) +[2023-02-22 17:45:31,032][00114] ConvEncoder: input_channels=3 +[2023-02-22 17:45:31,806][00114] Conv encoder output size: 512 +[2023-02-22 17:45:31,809][00114] Policy head output size: 512 +[2023-02-22 17:45:34,182][00114] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2023-02-22 17:45:35,474][00114] Num frames 100... +[2023-02-22 17:45:35,583][00114] Num frames 200... +[2023-02-22 17:45:35,696][00114] Num frames 300... +[2023-02-22 17:45:35,850][00114] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2023-02-22 17:45:35,852][00114] Avg episode reward: 3.840, avg true_objective: 3.840 +[2023-02-22 17:45:35,874][00114] Num frames 400... +[2023-02-22 17:45:35,984][00114] Num frames 500... +[2023-02-22 17:45:36,095][00114] Num frames 600... +[2023-02-22 17:45:36,211][00114] Num frames 700... +[2023-02-22 17:45:36,310][00114] Avg episode rewards: #0: 4.180, true rewards: #0: 3.680 +[2023-02-22 17:45:36,313][00114] Avg episode reward: 4.180, avg true_objective: 3.680 +[2023-02-22 17:45:36,384][00114] Num frames 800... +[2023-02-22 17:45:36,498][00114] Num frames 900... +[2023-02-22 17:45:36,609][00114] Num frames 1000... +[2023-02-22 17:45:36,715][00114] Num frames 1100... +[2023-02-22 17:45:36,865][00114] Avg episode rewards: #0: 4.613, true rewards: #0: 3.947 +[2023-02-22 17:45:36,866][00114] Avg episode reward: 4.613, avg true_objective: 3.947 +[2023-02-22 17:45:36,887][00114] Num frames 1200... +[2023-02-22 17:45:36,997][00114] Num frames 1300... +[2023-02-22 17:45:37,106][00114] Num frames 1400... +[2023-02-22 17:45:37,217][00114] Num frames 1500... 
+[2023-02-22 17:45:37,332][00114] Num frames 1600... +[2023-02-22 17:45:37,423][00114] Avg episode rewards: #0: 4.830, true rewards: #0: 4.080 +[2023-02-22 17:45:37,426][00114] Avg episode reward: 4.830, avg true_objective: 4.080 +[2023-02-22 17:45:37,507][00114] Num frames 1700... +[2023-02-22 17:45:37,616][00114] Num frames 1800... +[2023-02-22 17:45:37,725][00114] Num frames 1900... +[2023-02-22 17:45:37,842][00114] Num frames 2000... +[2023-02-22 17:45:37,917][00114] Avg episode rewards: #0: 4.632, true rewards: #0: 4.032 +[2023-02-22 17:45:37,919][00114] Avg episode reward: 4.632, avg true_objective: 4.032 +[2023-02-22 17:45:38,010][00114] Num frames 2100... +[2023-02-22 17:45:38,118][00114] Num frames 2200... +[2023-02-22 17:45:38,231][00114] Num frames 2300... +[2023-02-22 17:45:38,340][00114] Num frames 2400... +[2023-02-22 17:45:38,450][00114] Num frames 2500... +[2023-02-22 17:45:38,569][00114] Avg episode rewards: #0: 5.100, true rewards: #0: 4.267 +[2023-02-22 17:45:38,572][00114] Avg episode reward: 5.100, avg true_objective: 4.267 +[2023-02-22 17:45:38,618][00114] Num frames 2600... +[2023-02-22 17:45:38,729][00114] Num frames 2700... +[2023-02-22 17:45:38,840][00114] Num frames 2800... +[2023-02-22 17:45:38,955][00114] Num frames 2900... +[2023-02-22 17:45:39,059][00114] Avg episode rewards: #0: 4.920, true rewards: #0: 4.206 +[2023-02-22 17:45:39,061][00114] Avg episode reward: 4.920, avg true_objective: 4.206 +[2023-02-22 17:45:39,123][00114] Num frames 3000... +[2023-02-22 17:45:39,242][00114] Num frames 3100... +[2023-02-22 17:45:39,349][00114] Num frames 3200... +[2023-02-22 17:45:39,463][00114] Num frames 3300... +[2023-02-22 17:45:39,616][00114] Avg episode rewards: #0: 4.990, true rewards: #0: 4.240 +[2023-02-22 17:45:39,618][00114] Avg episode reward: 4.990, avg true_objective: 4.240 +[2023-02-22 17:45:39,631][00114] Num frames 3400... +[2023-02-22 17:45:39,742][00114] Num frames 3500... +[2023-02-22 17:45:39,852][00114] Num frames 3600... 
+[2023-02-22 17:45:39,964][00114] Num frames 3700... +[2023-02-22 17:45:40,102][00114] Avg episode rewards: #0: 4.862, true rewards: #0: 4.196 +[2023-02-22 17:45:40,103][00114] Avg episode reward: 4.862, avg true_objective: 4.196 +[2023-02-22 17:45:40,132][00114] Num frames 3800... +[2023-02-22 17:45:40,241][00114] Num frames 3900... +[2023-02-22 17:45:40,357][00114] Num frames 4000... +[2023-02-22 17:45:40,468][00114] Num frames 4100... +[2023-02-22 17:45:40,589][00114] Avg episode rewards: #0: 4.760, true rewards: #0: 4.160 +[2023-02-22 17:45:40,590][00114] Avg episode reward: 4.760, avg true_objective: 4.160 +[2023-02-22 17:46:00,099][00114] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2023-02-22 18:03:00,508][00114] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2023-02-22 18:03:00,513][00114] Overriding arg 'num_workers' with value 1 passed from command line +[2023-02-22 18:03:00,517][00114] Adding new argument 'no_render'=True that is not in the saved config file! +[2023-02-22 18:03:00,519][00114] Adding new argument 'save_video'=True that is not in the saved config file! +[2023-02-22 18:03:00,522][00114] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2023-02-22 18:03:00,525][00114] Adding new argument 'video_name'=None that is not in the saved config file! +[2023-02-22 18:03:00,527][00114] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2023-02-22 18:03:00,529][00114] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2023-02-22 18:03:00,530][00114] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2023-02-22 18:03:00,533][00114] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! 
+[2023-02-22 18:03:00,534][00114] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2023-02-22 18:03:00,537][00114] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2023-02-22 18:03:00,538][00114] Adding new argument 'train_script'=None that is not in the saved config file! +[2023-02-22 18:03:00,540][00114] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2023-02-22 18:03:00,541][00114] Using frameskip 1 and render_action_repeat=4 for evaluation +[2023-02-22 18:03:00,574][00114] RunningMeanStd input shape: (3, 72, 128) +[2023-02-22 18:03:00,579][00114] RunningMeanStd input shape: (1,) +[2023-02-22 18:03:00,599][00114] ConvEncoder: input_channels=3 +[2023-02-22 18:03:00,659][00114] Conv encoder output size: 512 +[2023-02-22 18:03:00,662][00114] Policy head output size: 512 +[2023-02-22 18:03:00,688][00114] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2023-02-22 18:03:01,298][00114] Num frames 100... +[2023-02-22 18:03:01,414][00114] Num frames 200... +[2023-02-22 18:03:01,538][00114] Num frames 300... +[2023-02-22 18:03:01,650][00114] Num frames 400... +[2023-02-22 18:03:01,765][00114] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2023-02-22 18:03:01,766][00114] Avg episode reward: 5.480, avg true_objective: 4.480 +[2023-02-22 18:03:01,827][00114] Num frames 500... +[2023-02-22 18:03:01,939][00114] Num frames 600... +[2023-02-22 18:03:02,054][00114] Num frames 700... +[2023-02-22 18:03:02,170][00114] Num frames 800... +[2023-02-22 18:03:02,287][00114] Num frames 900... +[2023-02-22 18:03:02,405][00114] Num frames 1000... +[2023-02-22 18:03:02,522][00114] Avg episode rewards: #0: 7.280, true rewards: #0: 5.280 +[2023-02-22 18:03:02,524][00114] Avg episode reward: 7.280, avg true_objective: 5.280 +[2023-02-22 18:03:02,581][00114] Num frames 1100... 
+[2023-02-22 18:03:02,689][00114] Num frames 1200... +[2023-02-22 18:03:02,800][00114] Num frames 1300... +[2023-02-22 18:03:02,908][00114] Num frames 1400... +[2023-02-22 18:03:03,063][00114] Avg episode rewards: #0: 6.240, true rewards: #0: 4.907 +[2023-02-22 18:03:03,065][00114] Avg episode reward: 6.240, avg true_objective: 4.907 +[2023-02-22 18:03:03,102][00114] Num frames 1500... +[2023-02-22 18:03:03,212][00114] Num frames 1600... +[2023-02-22 18:03:03,322][00114] Num frames 1700... +[2023-02-22 18:03:03,440][00114] Num frames 1800... +[2023-02-22 18:03:03,552][00114] Num frames 1900... +[2023-02-22 18:03:03,667][00114] Avg episode rewards: #0: 6.380, true rewards: #0: 4.880 +[2023-02-22 18:03:03,668][00114] Avg episode reward: 6.380, avg true_objective: 4.880 +[2023-02-22 18:03:03,733][00114] Num frames 2000... +[2023-02-22 18:03:03,858][00114] Num frames 2100... +[2023-02-22 18:03:03,970][00114] Num frames 2200... +[2023-02-22 18:03:04,086][00114] Num frames 2300... +[2023-02-22 18:03:04,180][00114] Avg episode rewards: #0: 5.872, true rewards: #0: 4.672 +[2023-02-22 18:03:04,183][00114] Avg episode reward: 5.872, avg true_objective: 4.672 +[2023-02-22 18:03:04,262][00114] Num frames 2400... +[2023-02-22 18:03:04,371][00114] Num frames 2500... +[2023-02-22 18:03:04,485][00114] Num frames 2600... +[2023-02-22 18:03:04,609][00114] Num frames 2700... +[2023-02-22 18:03:04,686][00114] Avg episode rewards: #0: 5.533, true rewards: #0: 4.533 +[2023-02-22 18:03:04,688][00114] Avg episode reward: 5.533, avg true_objective: 4.533 +[2023-02-22 18:03:04,777][00114] Num frames 2800... +[2023-02-22 18:03:04,889][00114] Num frames 2900... +[2023-02-22 18:03:05,001][00114] Num frames 3000... +[2023-02-22 18:03:05,112][00114] Num frames 3100... +[2023-02-22 18:03:05,206][00114] Avg episode rewards: #0: 5.623, true rewards: #0: 4.480 +[2023-02-22 18:03:05,207][00114] Avg episode reward: 5.623, avg true_objective: 4.480 +[2023-02-22 18:03:05,280][00114] Num frames 3200... 
+[2023-02-22 18:03:05,387][00114] Num frames 3300... +[2023-02-22 18:03:19,816][00114] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2023-02-22 18:03:19,818][00114] Overriding arg 'num_workers' with value 1 passed from command line +[2023-02-22 18:03:19,820][00114] Adding new argument 'no_render'=True that is not in the saved config file! +[2023-02-22 18:03:19,823][00114] Adding new argument 'save_video'=True that is not in the saved config file! +[2023-02-22 18:03:19,825][00114] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2023-02-22 18:03:19,827][00114] Adding new argument 'video_name'=None that is not in the saved config file! +[2023-02-22 18:03:19,829][00114] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2023-02-22 18:03:19,830][00114] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2023-02-22 18:03:19,831][00114] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2023-02-22 18:03:19,833][00114] Adding new argument 'hf_repository'='pneubauer/basic-_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2023-02-22 18:03:19,834][00114] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2023-02-22 18:03:19,835][00114] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2023-02-22 18:03:19,836][00114] Adding new argument 'train_script'=None that is not in the saved config file! +[2023-02-22 18:03:19,838][00114] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2023-02-22 18:03:19,839][00114] Using frameskip 1 and render_action_repeat=4 for evaluation +[2023-02-22 18:03:19,868][00114] RunningMeanStd input shape: (3, 72, 128) +[2023-02-22 18:03:19,871][00114] RunningMeanStd input shape: (1,) +[2023-02-22 18:03:19,884][00114] ConvEncoder: input_channels=3 +[2023-02-22 18:03:19,920][00114] Conv encoder output size: 512 +[2023-02-22 18:03:19,922][00114] Policy head output size: 512 +[2023-02-22 18:03:19,941][00114] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2023-02-22 18:03:20,356][00114] Num frames 100... +[2023-02-22 18:03:20,469][00114] Num frames 200... +[2023-02-22 18:03:20,585][00114] Num frames 300... +[2023-02-22 18:03:20,697][00114] Num frames 400... +[2023-02-22 18:03:20,829][00114] Num frames 500... +[2023-02-22 18:03:20,937][00114] Avg episode rewards: #0: 7.440, true rewards: #0: 5.440 +[2023-02-22 18:03:20,939][00114] Avg episode reward: 7.440, avg true_objective: 5.440 +[2023-02-22 18:03:21,002][00114] Num frames 600... +[2023-02-22 18:03:21,115][00114] Num frames 700... +[2023-02-22 18:03:21,223][00114] Num frames 800... +[2023-02-22 18:03:21,331][00114] Num frames 900... +[2023-02-22 18:03:21,451][00114] Avg episode rewards: #0: 6.300, true rewards: #0: 4.800 +[2023-02-22 18:03:21,452][00114] Avg episode reward: 6.300, avg true_objective: 4.800 +[2023-02-22 18:03:21,500][00114] Num frames 1000... +[2023-02-22 18:03:21,611][00114] Num frames 1100... +[2023-02-22 18:03:21,722][00114] Num frames 1200... +[2023-02-22 18:03:21,837][00114] Num frames 1300... +[2023-02-22 18:03:21,947][00114] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2023-02-22 18:03:21,949][00114] Avg episode reward: 5.480, avg true_objective: 4.480 +[2023-02-22 18:03:22,013][00114] Num frames 1400... +[2023-02-22 18:03:22,129][00114] Num frames 1500... +[2023-02-22 18:03:22,246][00114] Num frames 1600... 
+[2023-02-22 18:03:22,360][00114] Num frames 1700... +[2023-02-22 18:03:22,449][00114] Avg episode rewards: #0: 5.070, true rewards: #0: 4.320 +[2023-02-22 18:03:22,450][00114] Avg episode reward: 5.070, avg true_objective: 4.320 +[2023-02-22 18:03:22,533][00114] Num frames 1800... +[2023-02-22 18:03:22,644][00114] Num frames 1900... +[2023-02-22 18:03:22,756][00114] Num frames 2000... +[2023-02-22 18:03:22,872][00114] Num frames 2100... +[2023-02-22 18:03:22,980][00114] Num frames 2200... +[2023-02-22 18:03:23,091][00114] Num frames 2300... +[2023-02-22 18:03:23,153][00114] Avg episode rewards: #0: 6.008, true rewards: #0: 4.608 +[2023-02-22 18:03:23,154][00114] Avg episode reward: 6.008, avg true_objective: 4.608 +[2023-02-22 18:03:23,259][00114] Num frames 2400... +[2023-02-22 18:03:23,374][00114] Num frames 2500... +[2023-02-22 18:03:23,483][00114] Num frames 2600... +[2023-02-22 18:03:23,592][00114] Num frames 2700... +[2023-02-22 18:03:23,704][00114] Avg episode rewards: #0: 5.920, true rewards: #0: 4.587 +[2023-02-22 18:03:23,706][00114] Avg episode reward: 5.920, avg true_objective: 4.587 +[2023-02-22 18:03:23,762][00114] Num frames 2800... +[2023-02-22 18:03:23,878][00114] Num frames 2900... +[2023-02-22 18:03:23,992][00114] Num frames 3000... +[2023-02-22 18:03:24,101][00114] Num frames 3100... +[2023-02-22 18:03:24,195][00114] Avg episode rewards: #0: 5.623, true rewards: #0: 4.480 +[2023-02-22 18:03:24,198][00114] Avg episode reward: 5.623, avg true_objective: 4.480 +[2023-02-22 18:03:24,296][00114] Num frames 3200... +[2023-02-22 18:03:24,452][00114] Num frames 3300... +[2023-02-22 18:03:24,603][00114] Num frames 3400... +[2023-02-22 18:03:24,752][00114] Num frames 3500... +[2023-02-22 18:03:24,840][00114] Avg episode rewards: #0: 5.400, true rewards: #0: 4.400 +[2023-02-22 18:03:24,843][00114] Avg episode reward: 5.400, avg true_objective: 4.400 +[2023-02-22 18:03:24,979][00114] Num frames 3600... +[2023-02-22 18:03:25,149][00114] Num frames 3700... 
+[2023-02-22 18:03:25,302][00114] Num frames 3800... +[2023-02-22 18:03:25,374][00114] Avg episode rewards: #0: 5.120, true rewards: #0: 4.231 +[2023-02-22 18:03:25,380][00114] Avg episode reward: 5.120, avg true_objective: 4.231 +[2023-02-22 18:03:25,522][00114] Num frames 3900... +[2023-02-22 18:03:25,670][00114] Num frames 4000... +[2023-02-22 18:03:25,817][00114] Num frames 4100... +[2023-02-22 18:03:26,027][00114] Avg episode rewards: #0: 4.992, true rewards: #0: 4.192 +[2023-02-22 18:03:26,030][00114] Avg episode reward: 4.992, avg true_objective: 4.192 +[2023-02-22 18:03:45,042][00114] Replay video saved to /content/train_dir/default_experiment/replay.mp4!