diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -1,50 +1,50 @@ -[2023-02-25 01:10:29,963][00389] Saving configuration to /content/train_dir/default_experiment/config.json... -[2023-02-25 01:10:29,970][00389] Rollout worker 0 uses device cpu -[2023-02-25 01:10:29,980][00389] Rollout worker 1 uses device cpu -[2023-02-25 01:10:29,983][00389] Rollout worker 2 uses device cpu -[2023-02-25 01:10:29,986][00389] Rollout worker 3 uses device cpu -[2023-02-25 01:10:29,990][00389] Rollout worker 4 uses device cpu -[2023-02-25 01:10:29,992][00389] Rollout worker 5 uses device cpu -[2023-02-25 01:10:29,995][00389] Rollout worker 6 uses device cpu -[2023-02-25 01:10:29,998][00389] Rollout worker 7 uses device cpu -[2023-02-25 01:10:30,432][00389] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-25 01:10:30,440][00389] InferenceWorker_p0-w0: min num requests: 2 -[2023-02-25 01:10:30,503][00389] Starting all processes... -[2023-02-25 01:10:30,509][00389] Starting process learner_proc0 -[2023-02-25 01:10:30,614][00389] Starting all processes... -[2023-02-25 01:10:30,672][00389] Starting process inference_proc0-0 -[2023-02-25 01:10:30,673][00389] Starting process rollout_proc0 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc1 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc2 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc3 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc4 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc5 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc6 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc7 -[2023-02-25 01:10:41,746][10405] Worker 7 uses CPU cores [1] -[2023-02-25 01:10:41,812][10383] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-25 01:10:41,815][10383] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2023-02-25 01:10:42,183][10399] Worker 0 uses CPU cores [0] -[2023-02-25 01:10:42,192][10397] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-25 01:10:42,195][10397] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2023-02-25 01:10:42,375][10404] Worker 6 uses CPU cores [0] -[2023-02-25 01:10:42,479][10400] Worker 2 uses CPU cores [0] -[2023-02-25 01:10:42,507][10402] Worker 3 uses CPU cores [1] -[2023-02-25 01:10:42,537][10398] Worker 1 uses CPU cores [1] -[2023-02-25 01:10:42,556][10403] Worker 5 uses CPU cores [1] -[2023-02-25 01:10:42,766][10401] Worker 4 uses CPU cores [0] -[2023-02-25 01:10:43,299][10397] Num visible devices: 1 -[2023-02-25 01:10:43,302][10383] Num visible devices: 1 -[2023-02-25 01:10:43,322][10383] Starting seed is not provided -[2023-02-25 01:10:43,323][10383] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-25 01:10:43,324][10383] Initializing actor-critic model on device cuda:0 -[2023-02-25 01:10:43,325][10383] RunningMeanStd input shape: (3, 72, 128) -[2023-02-25 01:10:43,327][10383] RunningMeanStd input shape: (1,) -[2023-02-25 01:10:43,369][10383] ConvEncoder: input_channels=3 -[2023-02-25 01:10:44,099][10383] Conv encoder output size: 512 -[2023-02-25 01:10:44,101][10383] Policy head output size: 512 -[2023-02-25 01:10:44,205][10383] Created Actor Critic model with architecture: -[2023-02-25 01:10:44,206][10383] ActorCriticSharedWeights( +[2023-03-01 03:17:06,195][00674] Saving configuration to /content/train_dir/default_experiment/config.json... 
+[2023-03-01 03:17:06,199][00674] Rollout worker 0 uses device cpu
+[2023-03-01 03:17:06,200][00674] Rollout worker 1 uses device cpu
+[2023-03-01 03:17:06,201][00674] Rollout worker 2 uses device cpu
+[2023-03-01 03:17:06,203][00674] Rollout worker 3 uses device cpu
+[2023-03-01 03:17:06,204][00674] Rollout worker 4 uses device cpu
+[2023-03-01 03:17:06,206][00674] Rollout worker 5 uses device cpu
+[2023-03-01 03:17:06,207][00674] Rollout worker 6 uses device cpu
+[2023-03-01 03:17:06,208][00674] Rollout worker 7 uses device cpu
+[2023-03-01 03:17:06,400][00674] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-03-01 03:17:06,402][00674] InferenceWorker_p0-w0: min num requests: 2
+[2023-03-01 03:17:06,435][00674] Starting all processes...
+[2023-03-01 03:17:06,437][00674] Starting process learner_proc0
+[2023-03-01 03:17:06,489][00674] Starting all processes...
+[2023-03-01 03:17:06,504][00674] Starting process inference_proc0-0
+[2023-03-01 03:17:06,505][00674] Starting process rollout_proc0
+[2023-03-01 03:17:06,506][00674] Starting process rollout_proc1
+[2023-03-01 03:17:06,507][00674] Starting process rollout_proc2
+[2023-03-01 03:17:06,507][00674] Starting process rollout_proc3
+[2023-03-01 03:17:06,507][00674] Starting process rollout_proc4
+[2023-03-01 03:17:06,507][00674] Starting process rollout_proc5
+[2023-03-01 03:17:06,507][00674] Starting process rollout_proc6
+[2023-03-01 03:17:06,507][00674] Starting process rollout_proc7
+[2023-03-01 03:17:15,653][11907] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-03-01 03:17:15,660][11907] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2023-03-01 03:17:16,111][11921] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-03-01 03:17:16,131][11921] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2023-03-01 03:17:16,213][11930] Worker 4 uses CPU cores [0]
+[2023-03-01 03:17:16,300][11929] Worker 5 uses CPU cores [1]
+[2023-03-01 03:17:16,314][11933] Worker 7 uses CPU cores [1]
+[2023-03-01 03:17:16,514][11927] Worker 0 uses CPU cores [0]
+[2023-03-01 03:17:16,528][11931] Worker 2 uses CPU cores [0]
+[2023-03-01 03:17:16,652][11932] Worker 6 uses CPU cores [0]
+[2023-03-01 03:17:16,793][11926] Worker 1 uses CPU cores [1]
+[2023-03-01 03:17:16,794][11928] Worker 3 uses CPU cores [1]
+[2023-03-01 03:17:16,904][11921] Num visible devices: 1
+[2023-03-01 03:17:16,906][11907] Num visible devices: 1
+[2023-03-01 03:17:16,918][11907] Starting seed is not provided
+[2023-03-01 03:17:16,918][11907] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-03-01 03:17:16,918][11907] Initializing actor-critic model on device cuda:0
+[2023-03-01 03:17:16,919][11907] RunningMeanStd input shape: (3, 72, 128)
+[2023-03-01 03:17:16,921][11907] RunningMeanStd input shape: (1,)
+[2023-03-01 03:17:16,942][11907] ConvEncoder: input_channels=3
+[2023-03-01 03:17:17,298][11907] Conv encoder output size: 512
+[2023-03-01 03:17:17,299][11907] Policy head output size: 512
+[2023-03-01 03:17:17,359][11907] Created Actor Critic model with architecture:
+[2023-03-01 03:17:17,359][11907] ActorCriticSharedWeights(
   (obs_normalizer): ObservationNormalizer(
     (running_mean_std): RunningMeanStdDictInPlace(
       (running_mean_std): ModuleDict(
@@ -85,1210 +85,324 @@
     (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
   )
 )
-[2023-02-25 01:10:50,418][00389] Heartbeat connected on Batcher_0
-[2023-02-25 01:10:50,432][00389] Heartbeat connected on InferenceWorker_p0-w0
-[2023-02-25 01:10:50,456][00389] Heartbeat connected on RolloutWorker_w0
-[2023-02-25 01:10:50,462][00389] Heartbeat connected on RolloutWorker_w1
-[2023-02-25 01:10:50,472][00389] Heartbeat connected on RolloutWorker_w2
-[2023-02-25 01:10:50,486][00389] Heartbeat connected on RolloutWorker_w3
-[2023-02-25 01:10:50,488][00389] Heartbeat connected on RolloutWorker_w4
-[2023-02-25 01:10:50,492][00389] Heartbeat connected on RolloutWorker_w5
-[2023-02-25 01:10:50,497][00389] Heartbeat connected on RolloutWorker_w6
-[2023-02-25 01:10:50,502][00389] Heartbeat connected on RolloutWorker_w7
-[2023-02-25 01:10:52,969][10383] Using optimizer
-[2023-02-25 01:10:52,970][10383] No checkpoints found
-[2023-02-25 01:10:52,971][10383] Did not load from checkpoint, starting from scratch!
-[2023-02-25 01:10:52,971][10383] Initialized policy 0 weights for model version 0
-[2023-02-25 01:10:52,976][10383] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:10:52,983][10383] LearnerWorker_p0 finished initialization!
-[2023-02-25 01:10:52,984][00389] Heartbeat connected on LearnerWorker_p0
-[2023-02-25 01:10:53,188][10397] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:10:53,189][10397] RunningMeanStd input shape: (1,)
-[2023-02-25 01:10:53,202][10397] ConvEncoder: input_channels=3
-[2023-02-25 01:10:53,312][10397] Conv encoder output size: 512
-[2023-02-25 01:10:53,312][10397] Policy head output size: 512
-[2023-02-25 01:10:53,377][00389] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:10:55,603][00389] Inference worker 0-0 is ready!
-[2023-02-25 01:10:55,605][00389] All inference workers are ready! Signal rollout workers to start!
-[2023-02-25 01:10:55,733][10405] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,739][10402] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,749][10403] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,770][10404] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,765][10401] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,767][10398] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,780][10400] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,784][10399] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:56,651][10399] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:56,653][10400] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:57,028][10402] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:57,032][10403] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:57,037][10398] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:57,039][10405] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:58,228][10401] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:58,246][10400] Decorrelating experience for 32 frames...
-[2023-02-25 01:10:58,377][00389] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:10:58,393][10402] Decorrelating experience for 32 frames...
-[2023-02-25 01:10:58,409][10398] Decorrelating experience for 32 frames...
-[2023-02-25 01:10:58,414][10405] Decorrelating experience for 32 frames...
-[2023-02-25 01:10:59,716][10401] Decorrelating experience for 32 frames...
-[2023-02-25 01:10:59,739][10403] Decorrelating experience for 32 frames...
-[2023-02-25 01:10:59,950][10402] Decorrelating experience for 64 frames...
-[2023-02-25 01:10:59,953][10400] Decorrelating experience for 64 frames...
-[2023-02-25 01:10:59,961][10399] Decorrelating experience for 32 frames...
-[2023-02-25 01:11:01,604][10404] Decorrelating experience for 0 frames...
-[2023-02-25 01:11:01,607][10401] Decorrelating experience for 64 frames...
-[2023-02-25 01:11:01,665][10398] Decorrelating experience for 64 frames...
-[2023-02-25 01:11:01,668][10403] Decorrelating experience for 64 frames...
-[2023-02-25 01:11:01,833][10402] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:01,893][10399] Decorrelating experience for 64 frames...
-[2023-02-25 01:11:03,270][10404] Decorrelating experience for 32 frames...
-[2023-02-25 01:11:03,373][10400] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:03,380][00389] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:11:03,534][10401] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:04,314][10399] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:04,408][10404] Decorrelating experience for 64 frames...
-[2023-02-25 01:11:05,267][10405] Decorrelating experience for 64 frames...
-[2023-02-25 01:11:05,430][10398] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:05,439][10403] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:05,556][10404] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:06,072][10405] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:08,377][00389] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 38.5. Samples: 578. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:11:08,380][00389] Avg episode reward: [(0, '1.173')]
-[2023-02-25 01:11:09,514][10383] Signal inference workers to stop experience collection...
-[2023-02-25 01:11:09,530][10397] InferenceWorker_p0-w0: stopping experience collection
-[2023-02-25 01:11:12,293][10383] Signal inference workers to resume experience collection...
-[2023-02-25 01:11:12,294][10397] InferenceWorker_p0-w0: resuming experience collection
-[2023-02-25 01:11:13,377][00389] Fps is (10 sec: 409.7, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 111.2. Samples: 2224. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2023-02-25 01:11:13,383][00389] Avg episode reward: [(0, '2.592')]
-[2023-02-25 01:11:18,377][00389] Fps is (10 sec: 2457.6, 60 sec: 983.0, 300 sec: 983.0). Total num frames: 24576. Throughput: 0: 227.4. Samples: 5686. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-25 01:11:18,383][00389] Avg episode reward: [(0, '3.547')]
-[2023-02-25 01:11:23,377][00389] Fps is (10 sec: 3276.8, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 36864. Throughput: 0: 326.9. Samples: 9808. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-25 01:11:23,380][00389] Avg episode reward: [(0, '3.829')]
-[2023-02-25 01:11:23,802][10397] Updated weights for policy 0, policy_version 10 (0.0017)
-[2023-02-25 01:11:28,377][00389] Fps is (10 sec: 3276.7, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 57344. Throughput: 0: 341.4. Samples: 11948. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2023-02-25 01:11:28,379][00389] Avg episode reward: [(0, '4.336')]
-[2023-02-25 01:11:33,377][00389] Fps is (10 sec: 4096.0, 60 sec: 1945.6, 300 sec: 1945.6). Total num frames: 77824. Throughput: 0: 459.8. Samples: 18392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-25 01:11:33,380][00389] Avg episode reward: [(0, '4.361')]
-[2023-02-25 01:11:33,949][10397] Updated weights for policy 0, policy_version 20 (0.0014)
-[2023-02-25 01:11:38,377][00389] Fps is (10 sec: 3686.5, 60 sec: 2093.5, 300 sec: 2093.5). Total num frames: 94208. Throughput: 0: 544.2. Samples: 24490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-25 01:11:38,380][00389] Avg episode reward: [(0, '4.380')]
-[2023-02-25 01:11:41,603][00389] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 389], exiting...
-[2023-02-25 01:11:41,611][10383] Stopping Batcher_0...
-[2023-02-25 01:11:41,612][10383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000026_106496.pth...
-[2023-02-25 01:11:41,610][00389] Runner profile tree view:
-main_loop: 71.1077
-[2023-02-25 01:11:41,614][00389] Collected {0: 106496}, FPS: 1497.7
-[2023-02-25 01:11:41,700][10397] Weights refcount: 2 0
-[2023-02-25 01:11:41,613][10383] Loop batcher_evt_loop terminating...
-[2023-02-25 01:11:41,687][10403] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,658][10402] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,721][10402] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc3_evt_loop
-[2023-02-25 01:11:41,684][10400] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,725][10400] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc2_evt_loop
-[2023-02-25 01:11:41,721][10403] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc5_evt_loop
-[2023-02-25 01:11:41,726][10398] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,757][10398] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc1_evt_loop
-[2023-02-25 01:11:41,596][10404] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(0, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,758][10404] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop
-[2023-02-25 01:11:41,665][10401] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,763][10401] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc4_evt_loop
-[2023-02-25 01:11:41,755][10397] Stopping InferenceWorker_p0-w0...
-[2023-02-25 01:11:41,772][10397] Loop inference_proc0-0_evt_loop terminating...
-[2023-02-25 01:11:41,747][10399] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,831][10399] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc0_evt_loop
-[2023-02-25 01:11:41,910][10405] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(0, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,932][10405] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc7_evt_loop
-[2023-02-25 01:11:42,126][10383] Stopping LearnerWorker_p0...
-[2023-02-25 01:11:42,127][10383] Loop learner_proc0_evt_loop terminating...
-[2023-02-25 01:11:46,723][00389] Environment doom_basic already registered, overwriting...
-[2023-02-25 01:11:46,725][00389] Environment doom_two_colors_easy already registered, overwriting...
-[2023-02-25 01:11:46,729][00389] Environment doom_two_colors_hard already registered, overwriting...
-[2023-02-25 01:11:46,731][00389] Environment doom_dm already registered, overwriting...
-[2023-02-25 01:11:46,732][00389] Environment doom_dwango5 already registered, overwriting...
-[2023-02-25 01:11:46,736][00389] Environment doom_my_way_home_flat_actions already registered, overwriting...
-[2023-02-25 01:11:46,738][00389] Environment doom_defend_the_center_flat_actions already registered, overwriting...
-[2023-02-25 01:11:46,741][00389] Environment doom_my_way_home already registered, overwriting...
-[2023-02-25 01:11:46,743][00389] Environment doom_deadly_corridor already registered, overwriting...
-[2023-02-25 01:11:46,746][00389] Environment doom_defend_the_center already registered, overwriting...
-[2023-02-25 01:11:46,747][00389] Environment doom_defend_the_line already registered, overwriting...
-[2023-02-25 01:11:46,749][00389] Environment doom_health_gathering already registered, overwriting...
-[2023-02-25 01:11:46,751][00389] Environment doom_health_gathering_supreme already registered, overwriting...
-[2023-02-25 01:11:46,753][00389] Environment doom_battle already registered, overwriting...
-[2023-02-25 01:11:46,754][00389] Environment doom_battle2 already registered, overwriting...
-[2023-02-25 01:11:46,756][00389] Environment doom_duel_bots already registered, overwriting...
-[2023-02-25 01:11:46,758][00389] Environment doom_deathmatch_bots already registered, overwriting...
-[2023-02-25 01:11:46,760][00389] Environment doom_duel already registered, overwriting...
-[2023-02-25 01:11:46,764][00389] Environment doom_deathmatch_full already registered, overwriting...
-[2023-02-25 01:11:46,766][00389] Environment doom_benchmark already registered, overwriting...
-[2023-02-25 01:11:46,768][00389] register_encoder_factory:
-[2023-02-25 01:11:46,797][00389] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2023-02-25 01:11:46,798][00389] Overriding arg 'train_for_env_steps' with value 40000 passed from command line
-[2023-02-25 01:11:46,805][00389] Experiment dir /content/train_dir/default_experiment already exists!
-[2023-02-25 01:11:46,806][00389] Resuming existing experiment from /content/train_dir/default_experiment...
-[2023-02-25 01:11:46,809][00389] Weights and Biases integration disabled
-[2023-02-25 01:11:46,813][00389] Environment var CUDA_VISIBLE_DEVICES is 0
-
-[2023-02-25 01:11:49,428][00389] Starting experiment with the following configuration:
-help=False
-algo=APPO
-env=doom_health_gathering_supreme
-experiment=default_experiment
-train_dir=/content/train_dir
-restart_behavior=resume
-device=gpu
-seed=None
-num_policies=1
-async_rl=True
-serial_mode=False
-batched_sampling=False
-num_batches_to_accumulate=2
-worker_num_splits=2
-policy_workers_per_policy=1
-max_policy_lag=1000
-num_workers=8
-num_envs_per_worker=4
-batch_size=1024
-num_batches_per_epoch=1
-num_epochs=1
-rollout=32
-recurrence=32
-shuffle_minibatches=False
-gamma=0.99
-reward_scale=1.0
-reward_clip=1000.0
-value_bootstrap=False
-normalize_returns=True
-exploration_loss_coeff=0.001
-value_loss_coeff=0.5
-kl_loss_coeff=0.0
-exploration_loss=symmetric_kl
-gae_lambda=0.95
-ppo_clip_ratio=0.1
-ppo_clip_value=0.2
-with_vtrace=False
-vtrace_rho=1.0
-vtrace_c=1.0
-optimizer=adam
-adam_eps=1e-06
-adam_beta1=0.9
-adam_beta2=0.999
-max_grad_norm=4.0
-learning_rate=0.0001
-lr_schedule=constant
-lr_schedule_kl_threshold=0.008
-lr_adaptive_min=1e-06
-lr_adaptive_max=0.01
-obs_subtract_mean=0.0
-obs_scale=255.0
-normalize_input=True
-normalize_input_keys=None
-decorrelate_experience_max_seconds=0
-decorrelate_envs_on_one_worker=True
-actor_worker_gpus=[]
-set_workers_cpu_affinity=True
-force_envs_single_thread=False
-default_niceness=0
-log_to_file=True
-experiment_summaries_interval=10
-flush_summaries_interval=30
-stats_avg=100
-summaries_use_frameskip=True
-heartbeat_interval=20
-heartbeat_reporting_interval=600
-train_for_env_steps=40000
-train_for_seconds=10000000000
-save_every_sec=120
-keep_checkpoints=2
-load_checkpoint_kind=latest
-save_milestones_sec=-1
-save_best_every_sec=5
-save_best_metric=reward
-save_best_after=100000
-benchmark=False
-encoder_mlp_layers=[512, 512]
-encoder_conv_architecture=convnet_simple
-encoder_conv_mlp_layers=[512]
-use_rnn=True
-rnn_size=512
-rnn_type=gru
-rnn_num_layers=1
-decoder_mlp_layers=[]
-nonlinearity=elu
-policy_initialization=orthogonal
-policy_init_gain=1.0
-actor_critic_share_weights=True
-adaptive_stddev=True
-continuous_tanh_scale=0.0
-initial_stddev=1.0
-use_env_info_cache=False
-env_gpu_actions=False
-env_gpu_observations=True
-env_frameskip=4
-env_framestack=1
-pixel_format=CHW
-use_record_episode_statistics=False
-with_wandb=False
-wandb_user=None
-wandb_project=sample_factory
-wandb_group=None
-wandb_job_type=SF
-wandb_tags=[]
-with_pbt=False
-pbt_mix_policies_in_one_env=True
-pbt_period_env_steps=5000000
-pbt_start_mutation=20000000
-pbt_replace_fraction=0.3
-pbt_mutation_rate=0.15
-pbt_replace_reward_gap=0.1
-pbt_replace_reward_gap_absolute=1e-06
-pbt_optimize_gamma=False
-pbt_target_objective=true_objective
-pbt_perturb_min=1.1
-pbt_perturb_max=1.5
-num_agents=-1
-num_humans=0
-num_bots=-1
-start_bot_difficulty=None
-timelimit=None
-res_w=128
-res_h=72
-wide_aspect_ratio=False
-eval_env_frameskip=1
-fps=35
-command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
-cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
-git_hash=unknown
-git_repo_name=not a git repository
-[2023-02-25 01:11:49,433][00389] Saving configuration to /content/train_dir/default_experiment/config.json...
-[2023-02-25 01:11:49,440][00389] Rollout worker 0 uses device cpu
-[2023-02-25 01:11:49,442][00389] Rollout worker 1 uses device cpu
-[2023-02-25 01:11:49,444][00389] Rollout worker 2 uses device cpu
-[2023-02-25 01:11:49,453][00389] Rollout worker 3 uses device cpu
-[2023-02-25 01:11:49,457][00389] Rollout worker 4 uses device cpu
-[2023-02-25 01:11:49,467][00389] Rollout worker 5 uses device cpu
-[2023-02-25 01:11:49,471][00389] Rollout worker 6 uses device cpu
-[2023-02-25 01:11:49,474][00389] Rollout worker 7 uses device cpu
-[2023-02-25 01:11:49,789][00389] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:11:49,791][00389] InferenceWorker_p0-w0: min num requests: 2
-[2023-02-25 01:11:49,842][00389] Starting all processes...
-[2023-02-25 01:11:49,844][00389] Starting process learner_proc0
-[2023-02-25 01:11:49,927][00389] Starting all processes...
-[2023-02-25 01:11:50,033][00389] Starting process inference_proc0-0
-[2023-02-25 01:11:50,034][00389] Starting process rollout_proc0
-[2023-02-25 01:11:50,034][00389] Starting process rollout_proc1
-[2023-02-25 01:11:50,034][00389] Starting process rollout_proc2
-[2023-02-25 01:11:50,034][00389] Starting process rollout_proc3
-[2023-02-25 01:11:50,034][00389] Starting process rollout_proc4
-[2023-02-25 01:11:50,035][00389] Starting process rollout_proc5
-[2023-02-25 01:11:50,036][00389] Starting process rollout_proc6
-[2023-02-25 01:11:50,039][00389] Starting process rollout_proc7
-[2023-02-25 01:12:01,907][13885] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:12:01,916][13885] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2023-02-25 01:12:03,079][13905] Worker 4 uses CPU cores [0]
-[2023-02-25 01:12:03,135][13901] Worker 1 uses CPU cores [1]
-[2023-02-25 01:12:03,143][13885] Num visible devices: 1
-[2023-02-25 01:12:03,186][13885] Starting seed is not provided
-[2023-02-25 01:12:03,186][13885] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:12:03,186][13885] Initializing actor-critic model on device cuda:0
-[2023-02-25 01:12:03,187][13885] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:12:03,188][13885] RunningMeanStd input shape: (1,)
-[2023-02-25 01:12:03,204][13902] Worker 3 uses CPU cores [1]
-[2023-02-25 01:12:03,217][13904] Worker 5 uses CPU cores [1]
-[2023-02-25 01:12:03,241][13900] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:12:03,246][13900] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2023-02-25 01:12:03,263][13885] ConvEncoder: input_channels=3
-[2023-02-25 01:12:03,282][13899] Worker 0 uses CPU cores [0]
-[2023-02-25 01:12:03,302][13903] Worker 2 uses CPU cores [0]
-[2023-02-25 01:12:03,302][13900] Num visible devices: 1
-[2023-02-25 01:12:03,354][13907] Worker 7 uses CPU cores [1]
-[2023-02-25 01:12:03,373][13906] Worker 6 uses CPU cores [0]
-[2023-02-25 01:12:03,467][13885] Conv encoder output size: 512
-[2023-02-25 01:12:03,468][13885] Policy head output size: 512
-[2023-02-25 01:12:03,483][13885] Created Actor Critic model with architecture:
-[2023-02-25 01:12:03,483][13885] ActorCriticSharedWeights(
-  (obs_normalizer): ObservationNormalizer(
-    (running_mean_std): RunningMeanStdDictInPlace(
-      (running_mean_std): ModuleDict(
-        (obs): RunningMeanStdInPlace()
-      )
-    )
-  )
-  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
-  (encoder): VizdoomEncoder(
-    (basic_encoder): ConvEncoder(
-      (enc): RecursiveScriptModule(
-        original_name=ConvEncoderImpl
-        (conv_head): RecursiveScriptModule(
-          original_name=Sequential
-          (0): RecursiveScriptModule(original_name=Conv2d)
-          (1): RecursiveScriptModule(original_name=ELU)
-          (2): RecursiveScriptModule(original_name=Conv2d)
-          (3): RecursiveScriptModule(original_name=ELU)
-          (4): RecursiveScriptModule(original_name=Conv2d)
-          (5): RecursiveScriptModule(original_name=ELU)
-        )
-        (mlp_layers): RecursiveScriptModule(
-          original_name=Sequential
-          (0): RecursiveScriptModule(original_name=Linear)
-          (1): RecursiveScriptModule(original_name=ELU)
-        )
-      )
-    )
-  )
-  (core): ModelCoreRNN(
-    (core): GRU(512, 512)
-  )
-  (decoder): MlpDecoder(
-    (mlp): Identity()
-  )
-  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
-  (action_parameterization): ActionParameterizationDefault(
-    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
-  )
-)
-[2023-02-25 01:12:05,920][13885] Using optimizer
-[2023-02-25 01:12:05,920][13885] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000026_106496.pth...
-[2023-02-25 01:12:05,955][13885] Loading model from checkpoint
-[2023-02-25 01:12:05,962][13885] Loaded experiment state at self.train_step=26, self.env_steps=106496
-[2023-02-25 01:12:05,962][13885] Initialized policy 0 weights for model version 26
-[2023-02-25 01:12:05,967][13885] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:12:05,974][13885] LearnerWorker_p0 finished initialization!
-[2023-02-25 01:12:06,181][13900] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:12:06,182][13900] RunningMeanStd input shape: (1,)
-[2023-02-25 01:12:06,195][13900] ConvEncoder: input_channels=3
-[2023-02-25 01:12:06,299][13900] Conv encoder output size: 512
-[2023-02-25 01:12:06,299][13900] Policy head output size: 512
-[2023-02-25 01:12:06,813][00389] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 106496. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:12:08,878][00389] Inference worker 0-0 is ready!
-[2023-02-25 01:12:08,879][00389] All inference workers are ready! Signal rollout workers to start!
-[2023-02-25 01:12:08,979][13903] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,981][13906] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,982][13905] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,977][13899] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,981][13904] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,990][13901] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,989][13907] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,985][13902] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:09,778][00389] Heartbeat connected on Batcher_0
-[2023-02-25 01:12:09,784][00389] Heartbeat connected on LearnerWorker_p0
-[2023-02-25 01:12:09,821][00389] Heartbeat connected on InferenceWorker_p0-w0
-[2023-02-25 01:12:10,369][13906] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:10,374][13905] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:10,379][13899] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:10,392][13901] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:10,396][13904] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:10,402][13907] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:11,100][13902] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:11,106][13901] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:11,454][13906] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:11,457][13903] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:11,556][13905] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:11,813][00389] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 106496. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:12:12,390][13904] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:12,415][13902] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:12,445][13907] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:13,067][13903] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:13,254][13899] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:13,606][13905] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:14,275][13906] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:14,319][13901] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:14,601][13904] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:14,611][13902] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:15,236][13903] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:15,999][13907] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:16,173][13899] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:16,176][13901] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:16,324][13906] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:16,472][13902] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:16,487][00389] Heartbeat connected on RolloutWorker_w1
-[2023-02-25 01:12:16,555][00389] Heartbeat connected on RolloutWorker_w6
-[2023-02-25 01:12:16,804][00389] Heartbeat connected on RolloutWorker_w3
-[2023-02-25 01:12:16,814][00389] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 106496. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:12:16,873][13905] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:17,294][00389] Heartbeat connected on RolloutWorker_w4
-[2023-02-25 01:12:19,782][13907] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:20,397][00389] Heartbeat connected on RolloutWorker_w7
-[2023-02-25 01:12:21,813][00389] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 106496. Throughput: 0: 84.8. Samples: 1272. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:12:21,817][00389] Avg episode reward: [(0, '2.183')]
-[2023-02-25 01:12:24,886][13885] Signal inference workers to stop experience collection...
-[2023-02-25 01:12:24,932][13900] InferenceWorker_p0-w0: stopping experience collection
-[2023-02-25 01:12:25,028][13903] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:25,176][00389] Heartbeat connected on RolloutWorker_w2
-[2023-02-25 01:12:25,198][13885] Signal inference workers to resume experience collection...
-[2023-02-25 01:12:25,199][13885] Saving new best policy, reward=2.183!
-[2023-02-25 01:12:25,199][13900] InferenceWorker_p0-w0: resuming experience collection
-[2023-02-25 01:12:25,227][13885] Stopping Batcher_0...
-[2023-02-25 01:12:25,228][13885] Loop batcher_evt_loop terminating...
-[2023-02-25 01:12:25,229][00389] Component Batcher_0 stopped!
-[2023-02-25 01:12:25,260][13905] Stopping RolloutWorker_w4...
-[2023-02-25 01:12:25,261][13905] Loop rollout_proc4_evt_loop terminating...
-[2023-02-25 01:12:25,260][00389] Component RolloutWorker_w2 stopped!
-[2023-02-25 01:12:25,268][00389] Component RolloutWorker_w4 stopped!
-[2023-02-25 01:12:25,250][13900] Weights refcount: 2 0
-[2023-02-25 01:12:25,282][13900] Stopping InferenceWorker_p0-w0...
-[2023-02-25 01:12:25,274][00389] Component RolloutWorker_w3 stopped!
-[2023-02-25 01:12:25,256][13903] Stopping RolloutWorker_w2...
-[2023-02-25 01:12:25,284][13906] Stopping RolloutWorker_w6...
-[2023-02-25 01:12:25,283][00389] Component InferenceWorker_p0-w0 stopped!
-[2023-02-25 01:12:25,283][13900] Loop inference_proc0-0_evt_loop terminating...
-[2023-02-25 01:12:25,273][13902] Stopping RolloutWorker_w3...
-[2023-02-25 01:12:25,294][13902] Loop rollout_proc3_evt_loop terminating...
-[2023-02-25 01:12:25,293][00389] Component RolloutWorker_w6 stopped!
-[2023-02-25 01:12:25,283][13903] Loop rollout_proc2_evt_loop terminating...
-[2023-02-25 01:12:25,314][00389] Component RolloutWorker_w1 stopped!
-[2023-02-25 01:12:25,292][13906] Loop rollout_proc6_evt_loop terminating...
-[2023-02-25 01:12:25,315][13901] Stopping RolloutWorker_w1...
-[2023-02-25 01:12:25,330][13901] Loop rollout_proc1_evt_loop terminating...
-[2023-02-25 01:12:25,321][13907] Stopping RolloutWorker_w7...
-[2023-02-25 01:12:25,321][00389] Component RolloutWorker_w7 stopped!
-[2023-02-25 01:12:25,339][13907] Loop rollout_proc7_evt_loop terminating...
-[2023-02-25 01:12:28,175][13885] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000028_114688.pth...
-[2023-02-25 01:12:28,296][13885] Saving new best policy, reward=2.621!
-[2023-02-25 01:12:28,546][13885] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000028_114688.pth...
-[2023-02-25 01:12:28,768][13885] Stopping LearnerWorker_p0...
-[2023-02-25 01:12:28,768][13885] Loop learner_proc0_evt_loop terminating...
-[2023-02-25 01:12:28,769][00389] Component LearnerWorker_p0 stopped!
-[2023-02-25 01:12:28,905][13899] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:29,426][00389] Component RolloutWorker_w0 stopped!
-[2023-02-25 01:12:29,436][13899] Stopping RolloutWorker_w0...
-[2023-02-25 01:12:29,438][13899] Loop rollout_proc0_evt_loop terminating...
-[2023-02-25 01:12:30,527][13904] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:30,583][13904] Stopping RolloutWorker_w5...
-[2023-02-25 01:12:30,584][13904] Loop rollout_proc5_evt_loop terminating...
-[2023-02-25 01:12:30,583][00389] Component RolloutWorker_w5 stopped!
-[2023-02-25 01:12:30,586][00389] Waiting for process learner_proc0 to stop...
-[2023-02-25 01:12:30,587][00389] Waiting for process inference_proc0-0 to join...
-[2023-02-25 01:12:30,589][00389] Waiting for process rollout_proc0 to join...
-[2023-02-25 01:12:30,590][00389] Waiting for process rollout_proc1 to join...
-[2023-02-25 01:12:30,593][00389] Waiting for process rollout_proc2 to join...
-[2023-02-25 01:12:30,595][00389] Waiting for process rollout_proc3 to join...
-[2023-02-25 01:12:30,597][00389] Waiting for process rollout_proc4 to join...
-[2023-02-25 01:12:30,600][00389] Waiting for process rollout_proc5 to join...
-[2023-02-25 01:12:30,826][00389] Waiting for process rollout_proc6 to join...
-[2023-02-25 01:12:30,830][00389] Waiting for process rollout_proc7 to join...
-[2023-02-25 01:12:30,831][00389] Batcher 0 profile tree view:
-batching: 0.0708, releasing_batches: 0.0005
-[2023-02-25 01:12:30,834][00389] InferenceWorker_p0-w0 profile tree view:
-update_model: 0.0226
-wait_policy: 0.0024
-  wait_policy_total: 11.7130
-one_step: 0.0128
-  handle_policy_step: 3.9862
-    deserialize: 0.0532, stack: 0.0110, obs_to_device_normalize: 0.4729, forward: 2.9253, send_messages: 0.1043
-    prepare_outputs: 0.3162
-      to_cpu: 0.1608
-[2023-02-25 01:12:30,836][00389] Learner 0 profile tree view:
-misc: 0.0000, prepare_batch: 6.5606
-train: 1.0002
-  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0005, after_optimizer: 0.0132
-  calculate_losses: 0.1732
-    losses_init: 0.0000, forward_head: 0.1298, bptt_initial: 0.0213, tail: 0.0013, advantages_returns: 0.0052, losses: 0.0119
+[2023-03-01 03:17:24,882][11907] Using optimizer
+[2023-03-01 03:17:24,883][11907] No checkpoints found
+[2023-03-01 03:17:24,883][11907] Did not load from checkpoint, starting from scratch!
+[2023-03-01 03:17:24,884][11907] Initialized policy 0 weights for model version 0
+[2023-03-01 03:17:24,887][11907] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-03-01 03:17:24,894][11907] LearnerWorker_p0 finished initialization!
+[2023-03-01 03:17:25,090][11921] RunningMeanStd input shape: (3, 72, 128)
+[2023-03-01 03:17:25,091][11921] RunningMeanStd input shape: (1,)
+[2023-03-01 03:17:25,103][11921] ConvEncoder: input_channels=3
+[2023-03-01 03:17:25,203][11921] Conv encoder output size: 512
+[2023-03-01 03:17:25,204][11921] Policy head output size: 512
+[2023-03-01 03:17:26,393][00674] Heartbeat connected on Batcher_0
+[2023-03-01 03:17:26,398][00674] Heartbeat connected on LearnerWorker_p0
+[2023-03-01 03:17:26,413][00674] Heartbeat connected on RolloutWorker_w0
+[2023-03-01 03:17:26,417][00674] Heartbeat connected on RolloutWorker_w1
+[2023-03-01 03:17:26,422][00674] Heartbeat connected on RolloutWorker_w2
+[2023-03-01 03:17:26,424][00674] Heartbeat connected on RolloutWorker_w3
+[2023-03-01 03:17:26,427][00674] Heartbeat connected on RolloutWorker_w4
+[2023-03-01 03:17:26,429][00674] Heartbeat connected on RolloutWorker_w5
+[2023-03-01 03:17:26,432][00674] Heartbeat connected on RolloutWorker_w6
+[2023-03-01 03:17:26,435][00674] Heartbeat connected on RolloutWorker_w7
+[2023-03-01 03:17:27,008][00674] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-03-01 03:17:27,574][00674] Inference worker 0-0 is ready!
+[2023-03-01 03:17:27,576][00674] All inference workers are ready! Signal rollout workers to start!
+[2023-03-01 03:17:27,580][00674] Heartbeat connected on InferenceWorker_p0-w0
+[2023-03-01 03:17:27,683][11928] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,707][11926] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,715][11933] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,723][11931] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,735][11927] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,742][11930] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,745][11932] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,759][11929] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:28,900][11926] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:28,901][11933] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:28,902][11928] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:28,900][11927] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:28,904][11930] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:28,903][11932] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:29,577][11933] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:29,613][11929] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:30,174][11927] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:30,180][11931] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:30,184][11930] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:31,014][11929] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:31,135][11933] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:31,355][11932] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:31,866][11931] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:32,011][00674] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-03-01 03:17:32,094][11927] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:32,116][11930] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:32,601][11933] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:32,775][11929] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:33,397][11928] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:34,167][11928] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:34,321][11931] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:34,460][11927] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:34,493][11930] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:34,904][11928] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:35,586][11932] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:36,049][11929] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:36,375][11931] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:36,844][11926] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:37,009][00674] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-03-01 03:17:37,195][11932] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:37,520][11926] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:38,041][11926] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:40,783][11907] Signal inference workers to stop experience collection...
+[2023-03-01 03:17:40,812][11921] InferenceWorker_p0-w0: stopping experience collection
+[2023-03-01 03:17:42,008][00674] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 98.9. Samples: 1484. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-03-01 03:17:42,010][00674] Avg episode reward: [(0, '1.828')]
+[2023-03-01 03:17:43,230][11907] Signal inference workers to resume experience collection...
+[2023-03-01 03:17:43,231][11921] InferenceWorker_p0-w0: resuming experience collection
+[2023-03-01 03:17:44,876][11926] Stopping RolloutWorker_w1...
+[2023-03-01 03:17:44,878][11926] Loop rollout_proc1_evt_loop terminating...
+[2023-03-01 03:17:44,879][11907] Stopping Batcher_0...
+[2023-03-01 03:17:44,880][11907] Loop batcher_evt_loop terminating...
+[2023-03-01 03:17:44,877][00674] Component RolloutWorker_w1 stopped!
+[2023-03-01 03:17:44,889][00674] Component Batcher_0 stopped!
+[2023-03-01 03:17:44,913][00674] Component RolloutWorker_w0 stopped!
+[2023-03-01 03:17:44,914][11928] Stopping RolloutWorker_w3...
+[2023-03-01 03:17:44,917][00674] Component RolloutWorker_w3 stopped!
+[2023-03-01 03:17:44,922][11927] Stopping RolloutWorker_w0...
+[2023-03-01 03:17:44,923][00674] Component RolloutWorker_w7 stopped!
+[2023-03-01 03:17:44,924][11933] Stopping RolloutWorker_w7...
+[2023-03-01 03:17:44,930][11929] Stopping RolloutWorker_w5...
+[2023-03-01 03:17:44,930][00674] Component RolloutWorker_w5 stopped!
+[2023-03-01 03:17:44,919][11928] Loop rollout_proc3_evt_loop terminating...
+[2023-03-01 03:17:44,932][11907] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000003_12288.pth...
+[2023-03-01 03:17:44,937][11921] Weights refcount: 2 0
+[2023-03-01 03:17:44,937][11927] Loop rollout_proc0_evt_loop terminating...
+[2023-03-01 03:17:44,939][00674] Component InferenceWorker_p0-w0 stopped!
+[2023-03-01 03:17:44,927][11933] Loop rollout_proc7_evt_loop terminating...
+[2023-03-01 03:17:44,931][11929] Loop rollout_proc5_evt_loop terminating...
+[2023-03-01 03:17:44,939][11921] Stopping InferenceWorker_p0-w0...
+[2023-03-01 03:17:44,944][11921] Loop inference_proc0-0_evt_loop terminating...
+[2023-03-01 03:17:44,959][00674] Component RolloutWorker_w4 stopped!
+[2023-03-01 03:17:44,969][11932] Stopping RolloutWorker_w6...
+[2023-03-01 03:17:44,969][00674] Component RolloutWorker_w6 stopped!
+[2023-03-01 03:17:44,959][11930] Stopping RolloutWorker_w4...
+[2023-03-01 03:17:44,980][11931] Stopping RolloutWorker_w2...
+[2023-03-01 03:17:44,981][00674] Component RolloutWorker_w2 stopped!
+[2023-03-01 03:17:44,970][11932] Loop rollout_proc6_evt_loop terminating...
+[2023-03-01 03:17:44,974][11930] Loop rollout_proc4_evt_loop terminating...
+[2023-03-01 03:17:44,982][11931] Loop rollout_proc2_evt_loop terminating...
+[2023-03-01 03:17:45,054][11907] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000003_12288.pth...
+[2023-03-01 03:17:45,205][00674] Component LearnerWorker_p0 stopped!
+[2023-03-01 03:17:45,207][00674] Waiting for process learner_proc0 to stop...
+[2023-03-01 03:17:45,211][11907] Stopping LearnerWorker_p0...
+[2023-03-01 03:17:45,213][11907] Loop learner_proc0_evt_loop terminating...
+[2023-03-01 03:17:46,991][00674] Waiting for process inference_proc0-0 to join...
+[2023-03-01 03:17:47,419][00674] Waiting for process rollout_proc0 to join...
+[2023-03-01 03:17:47,798][00674] Waiting for process rollout_proc1 to join...
+[2023-03-01 03:17:47,800][00674] Waiting for process rollout_proc2 to join...
+[2023-03-01 03:17:47,812][00674] Waiting for process rollout_proc3 to join...
+[2023-03-01 03:17:47,813][00674] Waiting for process rollout_proc4 to join...
+[2023-03-01 03:17:47,815][00674] Waiting for process rollout_proc5 to join...
+[2023-03-01 03:17:47,816][00674] Waiting for process rollout_proc6 to join...
+[2023-03-01 03:17:47,817][00674] Waiting for process rollout_proc7 to join...
+[2023-03-01 03:17:47,818][00674] Batcher 0 profile tree view:
+batching: 0.0695, releasing_batches: 0.0007
+[2023-03-01 03:17:47,820][00674] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+ wait_policy_total: 7.1780
+update_model: 0.0171
+ weight_update: 0.0010
+one_step: 0.0020
+ handle_policy_step: 6.9707
+ deserialize: 0.0485, stack: 0.0087, obs_to_device_normalize: 0.3685, forward: 6.1493, send_messages: 0.0999
+ prepare_outputs: 0.2249
+ to_cpu: 0.1262
+[2023-03-01 03:17:47,821][00674] Learner 0 profile tree view:
+misc: 0.0000, prepare_batch: 4.4091
+train: 1.3496
+ epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0069, after_optimizer: 0.0264
+ calculate_losses: 0.1852
+ losses_init: 0.0000, forward_head: 0.1145, bptt_initial: 0.0597, tail: 0.0022, advantages_returns: 0.0009, losses: 0.0042
 bptt: 0.0031
- bptt_forward_core: 0.0029
- update: 0.7983
- clip: 0.0030
-[2023-02-25 01:12:30,838][00389] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0006
-[2023-02-25 01:12:30,841][00389] RolloutWorker_w7 profile tree view:
-wait_for_trajectories: 0.0005, enqueue_policy_requests: 1.1856, env_step: 2.8701, overhead: 0.2014, complete_rollouts: 0.0310
-save_policy_outputs: 0.0911
- split_output_tensors: 0.0448
-[2023-02-25 01:12:30,844][00389] Loop Runner_EvtLoop terminating...
-[2023-02-25 01:12:30,851][00389] Runner profile tree view:
-main_loop: 41.0095
-[2023-02-25 01:12:30,854][00389] Collected {0: 114688}, FPS: 199.8
-[2023-02-25 01:14:08,563][00389] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2023-02-25 01:14:08,565][00389] Overriding arg 'num_workers' with value 1 passed from command line
-[2023-02-25 01:14:08,567][00389] Adding new argument 'no_render'=True that is not in the saved config file!
-[2023-02-25 01:14:08,570][00389] Adding new argument 'save_video'=True that is not in the saved config file!
-[2023-02-25 01:14:08,572][00389] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2023-02-25 01:14:08,573][00389] Adding new argument 'video_name'=None that is not in the saved config file!
-[2023-02-25 01:14:08,576][00389] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2023-02-25 01:14:08,577][00389] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2023-02-25 01:14:08,578][00389] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2023-02-25 01:14:08,580][00389] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2023-02-25 01:14:08,581][00389] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2023-02-25 01:14:08,582][00389] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2023-02-25 01:14:08,583][00389] Adding new argument 'train_script'=None that is not in the saved config file!
-[2023-02-25 01:14:08,585][00389] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2023-02-25 01:14:08,586][00389] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2023-02-25 01:14:08,617][00389] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:14:08,619][00389] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:14:08,623][00389] RunningMeanStd input shape: (1,)
-[2023-02-25 01:14:08,642][00389] ConvEncoder: input_channels=3
-[2023-02-25 01:14:09,306][00389] Conv encoder output size: 512
-[2023-02-25 01:14:09,308][00389] Policy head output size: 512
-[2023-02-25 01:14:11,975][00389] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000028_114688.pth...
-[2023-02-25 01:14:13,821][00389] Num frames 100...
-[2023-02-25 01:14:13,989][00389] Num frames 200...
-[2023-02-25 01:14:14,161][00389] Num frames 300...
-[2023-02-25 01:14:14,362][00389] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
-[2023-02-25 01:14:14,364][00389] Avg episode reward: 3.840, avg true_objective: 3.840
-[2023-02-25 01:14:14,394][00389] Num frames 400...
-[2023-02-25 01:14:14,565][00389] Num frames 500...
-[2023-02-25 01:14:14,731][00389] Num frames 600...
-[2023-02-25 01:14:14,900][00389] Num frames 700...
-[2023-02-25 01:14:15,033][00389] Num frames 800...
-[2023-02-25 01:14:15,127][00389] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
-[2023-02-25 01:14:15,128][00389] Avg episode reward: 4.660, avg true_objective: 4.160
-[2023-02-25 01:14:15,211][00389] Num frames 900...
-[2023-02-25 01:14:15,343][00389] Num frames 1000...
-[2023-02-25 01:14:15,456][00389] Num frames 1100...
-[2023-02-25 01:14:15,568][00389] Num frames 1200...
-[2023-02-25 01:14:15,641][00389] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053
-[2023-02-25 01:14:15,643][00389] Avg episode reward: 4.387, avg true_objective: 4.053
-[2023-02-25 01:14:15,740][00389] Num frames 1300...
-[2023-02-25 01:14:15,857][00389] Num frames 1400...
-[2023-02-25 01:14:15,982][00389] Num frames 1500...
-[2023-02-25 01:14:16,098][00389] Num frames 1600...
-[2023-02-25 01:14:16,150][00389] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000
-[2023-02-25 01:14:16,152][00389] Avg episode reward: 4.250, avg true_objective: 4.000
-[2023-02-25 01:14:16,276][00389] Num frames 1700...
-[2023-02-25 01:14:16,398][00389] Num frames 1800...
-[2023-02-25 01:14:16,523][00389] Num frames 1900...
-[2023-02-25 01:14:16,672][00389] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968
-[2023-02-25 01:14:16,673][00389] Avg episode reward: 4.168, avg true_objective: 3.968
-[2023-02-25 01:14:16,696][00389] Num frames 2000...
-[2023-02-25 01:14:16,808][00389] Num frames 2100...
-[2023-02-25 01:14:16,929][00389] Num frames 2200...
-[2023-02-25 01:14:17,047][00389] Num frames 2300...
-[2023-02-25 01:14:17,108][00389] Avg episode rewards: #0: 4.003, true rewards: #0: 3.837
-[2023-02-25 01:14:17,113][00389] Avg episode reward: 4.003, avg true_objective: 3.837
-[2023-02-25 01:14:17,231][00389] Num frames 2400...
-[2023-02-25 01:14:17,353][00389] Num frames 2500...
-[2023-02-25 01:14:17,471][00389] Num frames 2600...
-[2023-02-25 01:14:17,626][00389] Avg episode rewards: #0: 3.980, true rewards: #0: 3.837
-[2023-02-25 01:14:17,628][00389] Avg episode reward: 3.980, avg true_objective: 3.837
-[2023-02-25 01:14:17,647][00389] Num frames 2700...
-[2023-02-25 01:14:17,763][00389] Num frames 2800...
-[2023-02-25 01:14:17,875][00389] Num frames 2900...
-[2023-02-25 01:14:17,988][00389] Num frames 3000...
-[2023-02-25 01:14:18,131][00389] Avg episode rewards: #0: 3.963, true rewards: #0: 3.837
-[2023-02-25 01:14:18,133][00389] Avg episode reward: 3.963, avg true_objective: 3.837
-[2023-02-25 01:14:18,169][00389] Num frames 3100...
-[2023-02-25 01:14:18,292][00389] Num frames 3200...
-[2023-02-25 01:14:18,412][00389] Num frames 3300...
-[2023-02-25 01:14:18,529][00389] Num frames 3400...
-[2023-02-25 01:14:18,650][00389] Avg episode rewards: #0: 3.949, true rewards: #0: 3.838
-[2023-02-25 01:14:18,651][00389] Avg episode reward: 3.949, avg true_objective: 3.838
-[2023-02-25 01:14:18,711][00389] Num frames 3500...
-[2023-02-25 01:14:18,824][00389] Num frames 3600...
-[2023-02-25 01:14:18,950][00389] Num frames 3700...
-[2023-02-25 01:14:19,066][00389] Num frames 3800...
-[2023-02-25 01:14:19,182][00389] Num frames 3900...
-[2023-02-25 01:14:19,242][00389] Avg episode rewards: #0: 4.102, true rewards: #0: 3.902
-[2023-02-25 01:14:19,245][00389] Avg episode reward: 4.102, avg true_objective: 3.902
-[2023-02-25 01:14:41,404][00389] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
-[2023-02-25 01:16:11,150][00389] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2023-02-25 01:16:11,152][00389] Overriding arg 'num_workers' with value 1 passed from command line
-[2023-02-25 01:16:11,155][00389] Adding new argument 'no_render'=True that is not in the saved config file!
-[2023-02-25 01:16:11,158][00389] Adding new argument 'save_video'=True that is not in the saved config file!
-[2023-02-25 01:16:11,159][00389] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2023-02-25 01:16:11,161][00389] Adding new argument 'video_name'=None that is not in the saved config file!
-[2023-02-25 01:16:11,163][00389] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2023-02-25 01:16:11,164][00389] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2023-02-25 01:16:11,165][00389] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2023-02-25 01:16:11,167][00389] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2023-02-25 01:16:11,168][00389] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2023-02-25 01:16:11,170][00389] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2023-02-25 01:16:11,171][00389] Adding new argument 'train_script'=None that is not in the saved config file!
-[2023-02-25 01:16:11,172][00389] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2023-02-25 01:16:11,174][00389] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2023-02-25 01:16:11,204][00389] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:16:11,206][00389] RunningMeanStd input shape: (1,)
-[2023-02-25 01:16:11,222][00389] ConvEncoder: input_channels=3
-[2023-02-25 01:16:11,260][00389] Conv encoder output size: 512
-[2023-02-25 01:16:11,265][00389] Policy head output size: 512
-[2023-02-25 01:16:11,285][00389] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000028_114688.pth...
-[2023-02-25 01:16:11,747][00389] Num frames 100...
-[2023-02-25 01:16:11,871][00389] Num frames 200...
-[2023-02-25 01:16:11,991][00389] Num frames 300...
-[2023-02-25 01:16:12,116][00389] Num frames 400...
-[2023-02-25 01:16:12,238][00389] Num frames 500...
-[2023-02-25 01:16:12,351][00389] Avg episode rewards: #0: 7.440, true rewards: #0: 5.440
-[2023-02-25 01:16:12,354][00389] Avg episode reward: 7.440, avg true_objective: 5.440
-[2023-02-25 01:16:12,421][00389] Num frames 600...
-[2023-02-25 01:16:12,535][00389] Num frames 700...
-[2023-02-25 01:16:12,659][00389] Num frames 800...
-[2023-02-25 01:16:12,787][00389] Num frames 900...
-[2023-02-25 01:16:12,951][00389] Avg episode rewards: #0: 6.460, true rewards: #0: 4.960
-[2023-02-25 01:16:12,956][00389] Avg episode reward: 6.460, avg true_objective: 4.960
-[2023-02-25 01:16:12,969][00389] Num frames 1000...
-[2023-02-25 01:16:13,086][00389] Num frames 1100...
-[2023-02-25 01:16:13,209][00389] Num frames 1200...
-[2023-02-25 01:16:13,327][00389] Num frames 1300...
-[2023-02-25 01:16:13,469][00389] Num frames 1400...
-[2023-02-25 01:16:13,583][00389] Num frames 1500...
-[2023-02-25 01:16:13,702][00389] Avg episode rewards: #0: 6.787, true rewards: #0: 5.120
-[2023-02-25 01:16:13,705][00389] Avg episode reward: 6.787, avg true_objective: 5.120
-[2023-02-25 01:16:13,784][00389] Num frames 1600...
-[2023-02-25 01:16:13,937][00389] Num frames 1700...
-[2023-02-25 01:16:14,055][00389] Num frames 1800...
-[2023-02-25 01:16:14,220][00389] Num frames 1900...
-[2023-02-25 01:16:14,318][00389] Avg episode rewards: #0: 6.050, true rewards: #0: 4.800
-[2023-02-25 01:16:14,320][00389] Avg episode reward: 6.050, avg true_objective: 4.800
-[2023-02-25 01:16:14,430][00389] Num frames 2000...
-[2023-02-25 01:16:14,590][00389] Num frames 2100...
-[2023-02-25 01:16:14,715][00389] Num frames 2200...
-[2023-02-25 01:16:14,780][00389] Avg episode rewards: #0: 5.412, true rewards: #0: 4.412
-[2023-02-25 01:16:14,782][00389] Avg episode reward: 5.412, avg true_objective: 4.412
-[2023-02-25 01:16:14,984][00389] Num frames 2300...
-[2023-02-25 01:16:15,258][00389] Num frames 2400...
-[2023-02-25 01:16:15,437][00389] Num frames 2500...
-[2023-02-25 01:16:15,667][00389] Avg episode rewards: #0: 5.150, true rewards: #0: 4.317
-[2023-02-25 01:16:15,673][00389] Avg episode reward: 5.150, avg true_objective: 4.317
-[2023-02-25 01:16:15,708][00389] Num frames 2600...
-[2023-02-25 01:16:16,148][00389] Num frames 2700...
-[2023-02-25 01:16:16,394][00389] Num frames 2800...
-[2023-02-25 01:16:16,533][00389] Avg episode rewards: #0: 4.780, true rewards: #0: 4.066
-[2023-02-25 01:16:16,538][00389] Avg episode reward: 4.780, avg true_objective: 4.066
-[2023-02-25 01:16:16,775][00389] Num frames 2900...
-[2023-02-25 01:16:17,029][00389] Num frames 3000...
-[2023-02-25 01:16:17,154][00389] Num frames 3100...
-[2023-02-25 01:16:17,282][00389] Num frames 3200...
-[2023-02-25 01:16:17,445][00389] Avg episode rewards: #0: 4.868, true rewards: #0: 4.117
-[2023-02-25 01:16:17,447][00389] Avg episode reward: 4.868, avg true_objective: 4.117
-[2023-02-25 01:16:17,461][00389] Num frames 3300...
-[2023-02-25 01:16:17,588][00389] Num frames 3400...
-[2023-02-25 01:16:17,710][00389] Num frames 3500...
-[2023-02-25 01:16:17,834][00389] Num frames 3600...
-[2023-02-25 01:16:17,981][00389] Avg episode rewards: #0: 4.753, true rewards: #0: 4.087
-[2023-02-25 01:16:17,984][00389] Avg episode reward: 4.753, avg true_objective: 4.087
-[2023-02-25 01:16:18,014][00389] Num frames 3700...
-[2023-02-25 01:16:18,138][00389] Num frames 3800...
-[2023-02-25 01:16:18,256][00389] Num frames 3900...
-[2023-02-25 01:16:18,382][00389] Num frames 4000...
-[2023-02-25 01:16:18,508][00389] Avg episode rewards: #0: 4.662, true rewards: #0: 4.062
-[2023-02-25 01:16:18,509][00389] Avg episode reward: 4.662, avg true_objective: 4.062
-[2023-02-25 01:17:54,848][15272] Saving configuration to /content/train_dir/default_experiment/config.json...
-[2023-02-25 01:17:54,850][15272] Rollout worker 0 uses device cpu
-[2023-02-25 01:17:54,854][15272] Rollout worker 1 uses device cpu
-[2023-02-25 01:17:54,855][15272] Rollout worker 2 uses device cpu
-[2023-02-25 01:17:54,861][15272] Rollout worker 3 uses device cpu
-[2023-02-25 01:17:54,862][15272] Rollout worker 4 uses device cpu
-[2023-02-25 01:17:54,866][15272] Rollout worker 5 uses device cpu
-[2023-02-25 01:17:54,867][15272] Rollout worker 6 uses device cpu
-[2023-02-25 01:17:54,869][15272] Rollout worker 7 uses device cpu
-[2023-02-25 01:17:55,097][15272] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:17:55,100][15272] InferenceWorker_p0-w0: min num requests: 2
-[2023-02-25 01:17:55,150][15272] Starting all processes...
-[2023-02-25 01:17:55,152][15272] Starting process learner_proc0
-[2023-02-25 01:17:55,241][15272] Starting all processes...
-[2023-02-25 01:17:55,337][15272] Starting process inference_proc0-0
-[2023-02-25 01:17:55,338][15272] Starting process rollout_proc0
-[2023-02-25 01:17:55,340][15272] Starting process rollout_proc1
-[2023-02-25 01:17:55,340][15272] Starting process rollout_proc2
-[2023-02-25 01:17:55,340][15272] Starting process rollout_proc3
-[2023-02-25 01:17:55,341][15272] Starting process rollout_proc4
-[2023-02-25 01:17:55,343][15272] Starting process rollout_proc5
-[2023-02-25 01:17:55,343][15272] Starting process rollout_proc6
-[2023-02-25 01:17:55,343][15272] Starting process rollout_proc7
-[2023-02-25 01:18:08,346][15762] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:18:08,346][15762] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2023-02-25 01:18:08,381][15782] Worker 1 uses CPU cores [1]
-[2023-02-25 01:18:09,027][15787] Worker 7 uses CPU cores [1]
-[2023-02-25 01:18:09,242][15776] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:18:09,246][15776] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2023-02-25 01:18:09,292][15785] Worker 6 uses CPU cores [0]
-[2023-02-25 01:18:09,313][15788] Worker 5 uses CPU cores [1]
-[2023-02-25 01:18:09,477][15786] Worker 4 uses CPU cores [0]
-[2023-02-25 01:18:09,562][15784] Worker 3 uses CPU cores [1]
-[2023-02-25 01:18:09,714][15783] Worker 2 uses CPU cores [0]
-[2023-02-25 01:18:09,717][15777] Worker 0 uses CPU cores [0]
-[2023-02-25 01:18:09,818][15776] Num visible devices: 1
-[2023-02-25 01:18:09,822][15762] Num visible devices: 1
-[2023-02-25 01:18:09,844][15762] Starting seed is not provided
-[2023-02-25 01:18:09,845][15762] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:18:09,846][15762] Initializing actor-critic model on device cuda:0
-[2023-02-25 01:18:09,847][15762] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:18:09,848][15762] RunningMeanStd input shape: (1,)
-[2023-02-25 01:18:09,884][15762] ConvEncoder: input_channels=3
-[2023-02-25 01:18:10,126][15762] Conv encoder output size: 512
-[2023-02-25 01:18:10,128][15762] Policy head output size: 512
-[2023-02-25 01:18:10,152][15762] Created Actor Critic model with architecture:
-[2023-02-25 01:18:10,153][15762] ActorCriticSharedWeights(
- (obs_normalizer): ObservationNormalizer(
- (running_mean_std): RunningMeanStdDictInPlace(
- (running_mean_std): ModuleDict(
- (obs): RunningMeanStdInPlace()
- )
- )
- )
- (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
- (encoder): VizdoomEncoder(
- (basic_encoder): ConvEncoder(
- (enc): RecursiveScriptModule(
- original_name=ConvEncoderImpl
- (conv_head): RecursiveScriptModule(
- original_name=Sequential
- (0): RecursiveScriptModule(original_name=Conv2d)
- (1): RecursiveScriptModule(original_name=ELU)
- (2): RecursiveScriptModule(original_name=Conv2d)
- (3): RecursiveScriptModule(original_name=ELU)
- (4): RecursiveScriptModule(original_name=Conv2d)
- (5): RecursiveScriptModule(original_name=ELU)
- )
- (mlp_layers): RecursiveScriptModule(
- original_name=Sequential
- (0): RecursiveScriptModule(original_name=Linear)
- (1): RecursiveScriptModule(original_name=ELU)
- )
- )
- )
- )
- (core): ModelCoreRNN(
- (core): GRU(512, 512)
- )
- (decoder): MlpDecoder(
- (mlp): Identity()
- )
- (critic_linear): Linear(in_features=512, out_features=1, bias=True)
- (action_parameterization): ActionParameterizationDefault(
- (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
- )
-)
-[2023-02-25 01:18:14,035][15762] Using optimizer
-[2023-02-25 01:18:14,036][15762] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000028_114688.pth...
-[2023-02-25 01:18:14,065][15762] Loading model from checkpoint
-[2023-02-25 01:18:14,072][15762] Loaded experiment state at self.train_step=28, self.env_steps=114688
-[2023-02-25 01:18:14,073][15762] Initialized policy 0 weights for model version 28
-[2023-02-25 01:18:14,078][15762] LearnerWorker_p0 finished initialization!
-[2023-02-25 01:18:14,082][15762] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:18:14,277][15776] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:18:14,278][15776] RunningMeanStd input shape: (1,)
-[2023-02-25 01:18:14,290][15776] ConvEncoder: input_channels=3
-[2023-02-25 01:18:14,396][15776] Conv encoder output size: 512
-[2023-02-25 01:18:14,396][15776] Policy head output size: 512
-[2023-02-25 01:18:15,076][15272] Heartbeat connected on Batcher_0
-[2023-02-25 01:18:15,090][15272] Heartbeat connected on LearnerWorker_p0
-[2023-02-25 01:18:15,118][15272] Heartbeat connected on RolloutWorker_w0
-[2023-02-25 01:18:15,123][15272] Heartbeat connected on RolloutWorker_w1
-[2023-02-25 01:18:15,127][15272] Heartbeat connected on RolloutWorker_w2
-[2023-02-25 01:18:15,132][15272] Heartbeat connected on RolloutWorker_w3
-[2023-02-25 01:18:15,137][15272] Heartbeat connected on RolloutWorker_w4
-[2023-02-25 01:18:15,143][15272] Heartbeat connected on RolloutWorker_w5
-[2023-02-25 01:18:15,145][15272] Heartbeat connected on RolloutWorker_w6
-[2023-02-25 01:18:15,150][15272] Heartbeat connected on RolloutWorker_w7
-[2023-02-25 01:18:15,442][15272] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 114688. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:18:16,646][15272] Inference worker 0-0 is ready!
-[2023-02-25 01:18:16,648][15272] All inference workers are ready! Signal rollout workers to start!
-[2023-02-25 01:18:16,650][15272] Heartbeat connected on InferenceWorker_p0-w0
-[2023-02-25 01:18:16,748][15785] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,749][15786] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,751][15783] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,750][15777] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,761][15787] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,755][15784] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,758][15788] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,760][15782] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:17,949][15782] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:17,953][15787] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:17,958][15788] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:18,263][15783] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:18,268][15785] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:18,270][15786] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:18,272][15777] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:18,625][15785] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:19,280][15788] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:19,456][15784] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:19,493][15782] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:19,721][15777] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:19,852][15787] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:19,986][15785] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:20,442][15272] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 114688. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:18:21,161][15782] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:21,479][15787] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:21,828][15786] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:22,110][15783] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:22,797][15777] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:23,028][15785] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:23,171][15782] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:24,143][15788] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:24,528][15787] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:25,354][15786] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:25,442][15272] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 114688. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:18:26,075][15783] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:27,264][15784] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:28,566][15786] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:28,824][15788] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:30,444][15272] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 114688. Throughput: 0: 65.1. Samples: 976. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:18:30,448][15272] Avg episode reward: [(0, '2.692')]
-[2023-02-25 01:18:32,543][15762] Signal inference workers to stop experience collection...
-[2023-02-25 01:18:32,567][15776] InferenceWorker_p0-w0: stopping experience collection
-[2023-02-25 01:18:32,633][15784] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:32,689][15777] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:33,259][15784] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:33,460][15783] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:33,730][15762] Signal inference workers to resume experience collection...
-[2023-02-25 01:18:33,739][15272] Component Batcher_0 stopped!
-[2023-02-25 01:18:33,739][15762] Stopping Batcher_0...
-[2023-02-25 01:18:33,748][15762] Loop batcher_evt_loop terminating...
-[2023-02-25 01:18:33,754][15762] Saving new best policy, reward=2.692!
-[2023-02-25 01:18:33,776][15272] Component RolloutWorker_w6 stopped!
-[2023-02-25 01:18:33,776][15777] Stopping RolloutWorker_w0...
-[2023-02-25 01:18:33,783][15272] Component RolloutWorker_w1 stopped!
-[2023-02-25 01:18:33,787][15272] Component RolloutWorker_w0 stopped!
-[2023-02-25 01:18:33,776][15782] Stopping RolloutWorker_w1...
-[2023-02-25 01:18:33,778][15785] Stopping RolloutWorker_w6...
-[2023-02-25 01:18:33,793][15782] Loop rollout_proc1_evt_loop terminating...
-[2023-02-25 01:18:33,783][15777] Loop rollout_proc0_evt_loop terminating...
-[2023-02-25 01:18:33,797][15776] Weights refcount: 2 0
-[2023-02-25 01:18:33,791][15785] Loop rollout_proc6_evt_loop terminating...
-[2023-02-25 01:18:33,804][15272] Component RolloutWorker_w7 stopped!
-[2023-02-25 01:18:33,811][15776] Stopping InferenceWorker_p0-w0...
-[2023-02-25 01:18:33,809][15272] Component RolloutWorker_w5 stopped!
-[2023-02-25 01:18:33,813][15776] Loop inference_proc0-0_evt_loop terminating...
-[2023-02-25 01:18:33,814][15788] Stopping RolloutWorker_w5...
-[2023-02-25 01:18:33,812][15272] Component InferenceWorker_p0-w0 stopped!
-[2023-02-25 01:18:33,815][15783] Stopping RolloutWorker_w2...
-[2023-02-25 01:18:33,815][15272] Component RolloutWorker_w2 stopped!
-[2023-02-25 01:18:33,823][15787] Stopping RolloutWorker_w7...
-[2023-02-25 01:18:33,825][15786] Stopping RolloutWorker_w4...
-[2023-02-25 01:18:33,825][15272] Component RolloutWorker_w4 stopped!
-[2023-02-25 01:18:33,816][15783] Loop rollout_proc2_evt_loop terminating...
-[2023-02-25 01:18:33,832][15787] Loop rollout_proc7_evt_loop terminating...
-[2023-02-25 01:18:33,824][15788] Loop rollout_proc5_evt_loop terminating...
-[2023-02-25 01:18:33,833][15272] Component RolloutWorker_w3 stopped!
-[2023-02-25 01:18:33,837][15784] Stopping RolloutWorker_w3...
-[2023-02-25 01:18:33,826][15786] Loop rollout_proc4_evt_loop terminating...
-[2023-02-25 01:18:33,839][15784] Loop rollout_proc3_evt_loop terminating...
-[2023-02-25 01:18:36,027][15762] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000030_122880.pth...
-[2023-02-25 01:18:36,126][15762] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000026_106496.pth
-[2023-02-25 01:18:36,138][15762] Saving new best policy, reward=2.960!
-[2023-02-25 01:18:36,295][15762] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000030_122880.pth...
-[2023-02-25 01:18:36,438][15272] Component LearnerWorker_p0 stopped!
-[2023-02-25 01:18:36,445][15272] Waiting for process learner_proc0 to stop...
-[2023-02-25 01:18:36,439][15762] Stopping LearnerWorker_p0...
-[2023-02-25 01:18:36,454][15762] Loop learner_proc0_evt_loop terminating...
-[2023-02-25 01:18:37,633][15272] Waiting for process inference_proc0-0 to join...
-[2023-02-25 01:18:37,635][15272] Waiting for process rollout_proc0 to join...
-[2023-02-25 01:18:37,640][15272] Waiting for process rollout_proc1 to join...
-[2023-02-25 01:18:37,646][15272] Waiting for process rollout_proc2 to join...
-[2023-02-25 01:18:37,649][15272] Waiting for process rollout_proc3 to join...
-[2023-02-25 01:18:37,651][15272] Waiting for process rollout_proc4 to join...
-[2023-02-25 01:18:37,654][15272] Waiting for process rollout_proc5 to join...
-[2023-02-25 01:18:37,659][15272] Waiting for process rollout_proc6 to join...
-[2023-02-25 01:18:37,660][15272] Waiting for process rollout_proc7 to join...
-[2023-02-25 01:18:37,661][15272] Batcher 0 profile tree view:
-batching: 0.0579, releasing_batches: 0.0089
-[2023-02-25 01:18:37,664][15272] InferenceWorker_p0-w0 profile tree view:
-update_model: 0.0253
-wait_policy: 0.0012
- wait_policy_total: 11.0478
-one_step: 0.0019
- handle_policy_step: 4.5369
- deserialize: 0.0550, stack: 0.0135, obs_to_device_normalize: 0.4126, forward: 3.5764, send_messages: 0.1127
- prepare_outputs: 0.2696
- to_cpu: 0.1420
-[2023-02-25 01:18:37,666][15272] Learner 0 profile tree view:
-misc: 0.0000, prepare_batch: 5.4236
-train: 0.7007
- epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0005, after_optimizer: 0.0076
- calculate_losses: 0.1402
- losses_init: 0.0000, forward_head: 0.1134, bptt_initial: 0.0170, tail: 0.0014, advantages_returns: 0.0011, losses: 0.0041
- bptt: 0.0027
- bptt_forward_core: 0.0026
- update: 0.5509
- clip: 0.0024
-[2023-02-25 01:18:37,668][15272] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0005
-[2023-02-25 01:18:37,669][15272] RolloutWorker_w7 profile tree view:
-wait_for_trajectories: 0.0011, enqueue_policy_requests: 1.0850, env_step: 4.9020, overhead: 0.1307, complete_rollouts: 0.0696
-save_policy_outputs: 0.1046
- split_output_tensors: 0.0584
-[2023-02-25 01:18:37,672][15272] Loop Runner_EvtLoop terminating...
-[2023-02-25 01:18:37,675][15272] Runner profile tree view:
-main_loop: 42.5253
-[2023-02-25 01:18:37,677][15272] Collected {0: 122880}, FPS: 192.6
-[2023-02-25 01:18:37,901][15272] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2023-02-25 01:18:37,904][15272] Overriding arg 'num_workers' with value 1 passed from command line
-[2023-02-25 01:18:37,906][15272] Adding new argument 'no_render'=True that is not in the saved config file!
-[2023-02-25 01:18:37,907][15272] Adding new argument 'save_video'=True that is not in the saved config file!
-[2023-02-25 01:18:37,910][15272] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2023-02-25 01:18:37,912][15272] Adding new argument 'video_name'=None that is not in the saved config file!
-[2023-02-25 01:18:37,913][15272] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2023-02-25 01:18:37,914][15272] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2023-02-25 01:18:37,915][15272] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2023-02-25 01:18:37,917][15272] Adding new argument 'hf_repository'='Antiraedus/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2023-02-25 01:18:37,922][15272] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2023-02-25 01:18:37,923][15272] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2023-02-25 01:18:37,924][15272] Adding new argument 'train_script'=None that is not in the saved config file!
-[2023-02-25 01:18:37,925][15272] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2023-02-25 01:18:37,927][15272] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2023-02-25 01:18:37,955][15272] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:37,957][15272] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:18:37,960][15272] RunningMeanStd input shape: (1,)
-[2023-02-25 01:18:37,976][15272] ConvEncoder: input_channels=3
-[2023-02-25 01:18:38,646][15272] Conv encoder output size: 512
-[2023-02-25 01:18:38,648][15272] Policy head output size: 512
-[2023-02-25 01:18:41,511][15272] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000030_122880.pth...
-[2023-02-25 01:18:44,517][15272] Num frames 100...
-[2023-02-25 01:18:44,869][15272] Num frames 200...
-[2023-02-25 01:18:45,105][15272] Num frames 300...
-[2023-02-25 01:18:45,351][15272] Num frames 400...
-[2023-02-25 01:18:45,562][15272] Num frames 500...
-[2023-02-25 01:18:45,696][15272] Avg episode rewards: #0: 7.440, true rewards: #0: 5.440
-[2023-02-25 01:18:45,703][15272] Avg episode reward: 7.440, avg true_objective: 5.440
-[2023-02-25 01:18:45,814][15272] Num frames 600...
-[2023-02-25 01:18:46,001][15272] Num frames 700...
-[2023-02-25 01:18:46,188][15272] Num frames 800...
-[2023-02-25 01:18:46,352][15272] Avg episode rewards: #0: 5.305, true rewards: #0: 4.305
-[2023-02-25 01:18:46,355][15272] Avg episode reward: 5.305, avg true_objective: 4.305
-[2023-02-25 01:18:46,509][15272] Num frames 900...
-[2023-02-25 01:18:46,711][15272] Num frames 1000...
-[2023-02-25 01:18:46,904][15272] Num frames 1100...
-[2023-02-25 01:18:47,068][15272] Num frames 1200...
-[2023-02-25 01:18:47,230][15272] Avg episode rewards: #0: 4.817, true rewards: #0: 4.150
-[2023-02-25 01:18:47,237][15272] Avg episode reward: 4.817, avg true_objective: 4.150
-[2023-02-25 01:18:47,369][15272] Num frames 1300...
-[2023-02-25 01:18:47,673][15272] Num frames 1400...
-[2023-02-25 01:18:47,850][15272] Num frames 1500...
-[2023-02-25 01:18:48,120][15272] Num frames 1600...
-[2023-02-25 01:18:48,301][15272] Avg episode rewards: #0: 4.573, true rewards: #0: 4.072
-[2023-02-25 01:18:48,303][15272] Avg episode reward: 4.573, avg true_objective: 4.072
-[2023-02-25 01:18:48,443][15272] Num frames 1700...
-[2023-02-25 01:18:48,671][15272] Num frames 1800...
-[2023-02-25 01:18:48,927][15272] Num frames 1900...
-[2023-02-25 01:18:49,152][15272] Num frames 2000...
-[2023-02-25 01:18:49,263][15272] Avg episode rewards: #0: 4.426, true rewards: #0: 4.026
-[2023-02-25 01:18:49,271][15272] Avg episode reward: 4.426, avg true_objective: 4.026
-[2023-02-25 01:18:49,479][15272] Num frames 2100...
-[2023-02-25 01:18:49,678][15272] Num frames 2200...
-[2023-02-25 01:18:49,860][15272] Num frames 2300...
-[2023-02-25 01:18:50,056][15272] Num frames 2400...
-[2023-02-25 01:18:50,173][15272] Avg episode rewards: #0: 4.382, true rewards: #0: 4.048
-[2023-02-25 01:18:50,175][15272] Avg episode reward: 4.382, avg true_objective: 4.048
-[2023-02-25 01:18:50,261][15272] Num frames 2500...
-[2023-02-25 01:18:50,385][15272] Num frames 2600...
-[2023-02-25 01:18:50,502][15272] Num frames 2700...
-[2023-02-25 01:18:50,628][15272] Num frames 2800...
-[2023-02-25 01:18:50,701][15272] Avg episode rewards: #0: 4.304, true rewards: #0: 4.019
-[2023-02-25 01:18:50,704][15272] Avg episode reward: 4.304, avg true_objective: 4.019
-[2023-02-25 01:18:50,813][15272] Num frames 2900...
-[2023-02-25 01:18:50,925][15272] Num frames 3000...
-[2023-02-25 01:18:51,048][15272] Num frames 3100...
-[2023-02-25 01:18:51,166][15272] Num frames 3200...
-[2023-02-25 01:18:51,257][15272] Avg episode rewards: #0: 4.411, true rewards: #0: 4.036
-[2023-02-25 01:18:51,262][15272] Avg episode reward: 4.411, avg true_objective: 4.036
-[2023-02-25 01:18:51,345][15272] Num frames 3300...
-[2023-02-25 01:18:51,463][15272] Num frames 3400...
-[2023-02-25 01:18:51,586][15272] Num frames 3500...
-[2023-02-25 01:18:51,711][15272] Num frames 3600...
-[2023-02-25 01:18:51,837][15272] Num frames 3700...
-[2023-02-25 01:18:51,960][15272] Num frames 3800...
-[2023-02-25 01:18:52,024][15272] Avg episode rewards: #0: 4.894, true rewards: #0: 4.228
-[2023-02-25 01:18:52,026][15272] Avg episode reward: 4.894, avg true_objective: 4.228
-[2023-02-25 01:18:52,137][15272] Num frames 3900...
-[2023-02-25 01:18:52,263][15272] Num frames 4000...
-[2023-02-25 01:18:52,383][15272] Num frames 4100...
-[2023-02-25 01:18:52,505][15272] Avg episode rewards: #0: 4.857, true rewards: #0: 4.157
-[2023-02-25 01:18:52,507][15272] Avg episode reward: 4.857, avg true_objective: 4.157
-[2023-02-25 01:19:15,633][15272] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+ bptt_forward_core: 0.0030
+ update: 1.1296
+ clip: 0.0035
+[2023-03-01 03:17:47,822][00674] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.6873, env_step: 2.6950, overhead: 0.0839, complete_rollouts: 0.0183
+save_policy_outputs: 0.0687
+ split_output_tensors: 0.0276
+[2023-03-01 03:17:47,824][00674] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.4166, env_step: 3.0030, overhead: 0.0586, complete_rollouts: 0.0292
+save_policy_outputs: 0.0462
+ split_output_tensors: 0.0194
+[2023-03-01 03:17:47,826][00674] Loop Runner_EvtLoop terminating...
+[2023-03-01 03:17:47,827][00674] Runner profile tree view:
+main_loop: 41.3926
+[2023-03-01 03:17:47,829][00674] Collected {0: 12288}, FPS: 296.9
+[2023-03-01 03:18:18,296][00674] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-03-01 03:18:18,298][00674] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-03-01 03:18:18,300][00674] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-03-01 03:18:18,303][00674] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-03-01 03:18:18,305][00674] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-03-01 03:18:18,306][00674] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-03-01 03:18:18,308][00674] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2023-03-01 03:18:18,311][00674] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-03-01 03:18:18,313][00674] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2023-03-01 03:18:18,314][00674] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2023-03-01 03:18:18,319][00674] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-03-01 03:18:18,320][00674] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-03-01 03:18:18,324][00674] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-03-01 03:18:18,326][00674] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-03-01 03:18:18,328][00674] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-03-01 03:18:18,352][00674] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:18:18,355][00674] RunningMeanStd input shape: (3, 72, 128)
+[2023-03-01 03:18:18,358][00674] RunningMeanStd input shape: (1,)
+[2023-03-01 03:18:18,378][00674] ConvEncoder: input_channels=3
+[2023-03-01 03:18:19,118][00674] Conv encoder output size: 512
+[2023-03-01 03:18:19,121][00674] Policy head output size: 512
+[2023-03-01 03:18:21,517][00674] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000003_12288.pth...
+[2023-03-01 03:18:22,782][00674] Num frames 100...
+[2023-03-01 03:18:22,916][00674] Num frames 200...
+[2023-03-01 03:18:23,034][00674] Num frames 300...
+[2023-03-01 03:18:23,158][00674] Num frames 400...
+[2023-03-01 03:18:23,314][00674] Avg episode rewards: #0: 6.800, true rewards: #0: 4.800
+[2023-03-01 03:18:23,317][00674] Avg episode reward: 6.800, avg true_objective: 4.800
+[2023-03-01 03:18:23,344][00674] Num frames 500...
+[2023-03-01 03:18:23,460][00674] Num frames 600...
+[2023-03-01 03:18:23,578][00674] Num frames 700...
+[2023-03-01 03:18:23,697][00674] Num frames 800...
+[2023-03-01 03:18:23,830][00674] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320
+[2023-03-01 03:18:23,831][00674] Avg episode reward: 5.320, avg true_objective: 4.320
+[2023-03-01 03:18:23,879][00674] Num frames 900...
+[2023-03-01 03:18:24,006][00674] Num frames 1000...
+[2023-03-01 03:18:24,121][00674] Num frames 1100...
+[2023-03-01 03:18:24,238][00674] Num frames 1200...
+[2023-03-01 03:18:24,346][00674] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160
+[2023-03-01 03:18:24,348][00674] Avg episode reward: 4.827, avg true_objective: 4.160
+[2023-03-01 03:18:24,411][00674] Num frames 1300...
+[2023-03-01 03:18:24,534][00674] Num frames 1400...
+[2023-03-01 03:18:24,653][00674] Num frames 1500...
+[2023-03-01 03:18:24,767][00674] Num frames 1600...
+[2023-03-01 03:18:24,936][00674] Avg episode rewards: #0: 4.990, true rewards: #0: 4.240
+[2023-03-01 03:18:24,937][00674] Avg episode reward: 4.990, avg true_objective: 4.240
+[2023-03-01 03:18:24,947][00674] Num frames 1700...
+[2023-03-01 03:18:25,063][00674] Num frames 1800...
+[2023-03-01 03:18:25,182][00674] Num frames 1900...
+[2023-03-01 03:18:25,297][00674] Num frames 2000...
+[2023-03-01 03:18:25,417][00674] Num frames 2100...
+[2023-03-01 03:18:25,549][00674] Avg episode rewards: #0: 5.088, true rewards: #0: 4.288
+[2023-03-01 03:18:25,551][00674] Avg episode reward: 5.088, avg true_objective: 4.288
+[2023-03-01 03:18:25,650][00674] Num frames 2200...
+[2023-03-01 03:18:25,814][00674] Num frames 2300...
+[2023-03-01 03:18:25,982][00674] Num frames 2400...
+[2023-03-01 03:18:26,152][00674] Num frames 2500...
+[2023-03-01 03:18:26,362][00674] Avg episode rewards: #0: 5.153, true rewards: #0: 4.320
+[2023-03-01 03:18:26,364][00674] Avg episode reward: 5.153, avg true_objective: 4.320
+[2023-03-01 03:18:26,383][00674] Num frames 2600...
+[2023-03-01 03:18:26,551][00674] Num frames 2700...
+[2023-03-01 03:18:26,721][00674] Num frames 2800...
+[2023-03-01 03:18:26,886][00674] Num frames 2900...
+[2023-03-01 03:18:27,067][00674] Num frames 3000...
+[2023-03-01 03:18:27,188][00674] Avg episode rewards: #0: 5.200, true rewards: #0: 4.343
+[2023-03-01 03:18:27,194][00674] Avg episode reward: 5.200, avg true_objective: 4.343
+[2023-03-01 03:18:27,299][00674] Num frames 3100...
+[2023-03-01 03:18:27,461][00674] Num frames 3200...
+[2023-03-01 03:18:27,625][00674] Num frames 3300...
+[2023-03-01 03:18:27,789][00674] Num frames 3400...
+[2023-03-01 03:18:27,885][00674] Avg episode rewards: #0: 5.030, true rewards: #0: 4.280
+[2023-03-01 03:18:27,887][00674] Avg episode reward: 5.030, avg true_objective: 4.280
+[2023-03-01 03:18:28,020][00674] Num frames 3500...
+[2023-03-01 03:18:28,203][00674] Num frames 3600...
+[2023-03-01 03:18:28,389][00674] Num frames 3700...
+[2023-03-01 03:18:28,562][00674] Num frames 3800...
+[2023-03-01 03:18:28,637][00674] Avg episode rewards: #0: 4.898, true rewards: #0: 4.231
+[2023-03-01 03:18:28,639][00674] Avg episode reward: 4.898, avg true_objective: 4.231
+[2023-03-01 03:18:28,800][00674] Num frames 3900...
+[2023-03-01 03:18:28,969][00674] Num frames 4000...
+[2023-03-01 03:18:29,093][00674] Num frames 4100...
+[2023-03-01 03:18:29,218][00674] Num frames 4200...
+[2023-03-01 03:18:29,339][00674] Avg episode rewards: #0: 4.956, true rewards: #0: 4.256
+[2023-03-01 03:18:29,341][00674] Avg episode reward: 4.956, avg true_objective: 4.256
+[2023-03-01 03:18:51,695][00674] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2023-03-01 03:20:42,520][00674] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-03-01 03:20:42,522][00674] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-03-01 03:20:42,523][00674] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-03-01 03:20:42,525][00674] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-03-01 03:20:42,526][00674] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-03-01 03:20:42,528][00674] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-03-01 03:20:42,530][00674] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2023-03-01 03:20:42,532][00674] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-03-01 03:20:42,533][00674] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2023-03-01 03:20:42,534][00674] Adding new argument 'hf_repository'='Antiraedus/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2023-03-01 03:20:42,535][00674] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-03-01 03:20:42,536][00674] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-03-01 03:20:42,538][00674] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-03-01 03:20:42,539][00674] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-03-01 03:20:42,540][00674] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-03-01 03:20:42,564][00674] RunningMeanStd input shape: (3, 72, 128)
+[2023-03-01 03:20:42,566][00674] RunningMeanStd input shape: (1,)
+[2023-03-01 03:20:42,581][00674] ConvEncoder: input_channels=3
+[2023-03-01 03:20:42,623][00674] Conv encoder output size: 512
+[2023-03-01 03:20:42,625][00674] Policy head output size: 512
+[2023-03-01 03:20:42,644][00674] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000003_12288.pth...
+[2023-03-01 03:20:43,085][00674] Num frames 100...
+[2023-03-01 03:20:43,205][00674] Num frames 200...
+[2023-03-01 03:20:43,330][00674] Num frames 300...
+[2023-03-01 03:20:43,462][00674] Num frames 400...
+[2023-03-01 03:20:43,575][00674] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
+[2023-03-01 03:20:43,576][00674] Avg episode reward: 5.480, avg true_objective: 4.480
+[2023-03-01 03:20:43,642][00674] Num frames 500...
+[2023-03-01 03:20:43,754][00674] Num frames 600...
+[2023-03-01 03:20:43,866][00674] Num frames 700...
+[2023-03-01 03:20:43,986][00674] Num frames 800...
+[2023-03-01 03:20:44,079][00674] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
+[2023-03-01 03:20:44,081][00674] Avg episode reward: 4.660, avg true_objective: 4.160
+[2023-03-01 03:20:44,170][00674] Num frames 900...
+[2023-03-01 03:20:44,341][00674] Num frames 1000...
+[2023-03-01 03:20:44,505][00674] Num frames 1100...
+[2023-03-01 03:20:44,667][00674] Num frames 1200...
+[2023-03-01 03:20:44,751][00674] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053
+[2023-03-01 03:20:44,757][00674] Avg episode reward: 4.387, avg true_objective: 4.053
+[2023-03-01 03:20:44,896][00674] Num frames 1300...
+[2023-03-01 03:20:45,060][00674] Num frames 1400...
+[2023-03-01 03:20:45,224][00674] Num frames 1500...
+[2023-03-01 03:20:45,388][00674] Num frames 1600...
+[2023-03-01 03:20:45,444][00674] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000
+[2023-03-01 03:20:45,450][00674] Avg episode reward: 4.250, avg true_objective: 4.000
+[2023-03-01 03:20:45,622][00674] Num frames 1700...
+[2023-03-01 03:20:45,796][00674] Num frames 1800...
+[2023-03-01 03:20:45,965][00674] Num frames 1900...
+[2023-03-01 03:20:46,173][00674] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968
+[2023-03-01 03:20:46,176][00674] Avg episode reward: 4.168, avg true_objective: 3.968
+[2023-03-01 03:20:46,213][00674] Num frames 2000...
+[2023-03-01 03:20:46,385][00674] Num frames 2100...
+[2023-03-01 03:20:46,557][00674] Num frames 2200...
+[2023-03-01 03:20:46,729][00674] Num frames 2300...
+[2023-03-01 03:20:46,906][00674] Num frames 2400...
+[2023-03-01 03:20:46,962][00674] Avg episode rewards: #0: 4.333, true rewards: #0: 4.000
+[2023-03-01 03:20:46,964][00674] Avg episode reward: 4.333, avg true_objective: 4.000
+[2023-03-01 03:20:47,133][00674] Num frames 2500...
+[2023-03-01 03:20:47,307][00674] Num frames 2600...
+[2023-03-01 03:20:47,469][00674] Num frames 2700...
+[2023-03-01 03:20:47,639][00674] Num frames 2800...
+[2023-03-01 03:20:47,757][00674] Avg episode rewards: #0: 4.497, true rewards: #0: 4.069
+[2023-03-01 03:20:47,758][00674] Avg episode reward: 4.497, avg true_objective: 4.069
+[2023-03-01 03:20:47,822][00674] Num frames 2900...
+[2023-03-01 03:20:47,935][00674] Num frames 3000...
+[2023-03-01 03:20:48,055][00674] Num frames 3100...
+[2023-03-01 03:20:48,173][00674] Num frames 3200...
+[2023-03-01 03:20:48,342][00674] Avg episode rewards: #0: 4.620, true rewards: #0: 4.120
+[2023-03-01 03:20:48,344][00674] Avg episode reward: 4.620, avg true_objective: 4.120
+[2023-03-01 03:20:48,353][00674] Num frames 3300...
+[2023-03-01 03:20:48,467][00674] Num frames 3400...
+[2023-03-01 03:20:48,588][00674] Num frames 3500...
+[2023-03-01 03:20:48,702][00674] Num frames 3600...
+[2023-03-01 03:20:48,818][00674] Num frames 3700...
+[2023-03-01 03:20:48,934][00674] Num frames 3800...
+[2023-03-01 03:20:49,002][00674] Avg episode rewards: #0: 4.898, true rewards: #0: 4.231
+[2023-03-01 03:20:49,004][00674] Avg episode reward: 4.898, avg true_objective: 4.231
+[2023-03-01 03:20:49,108][00674] Num frames 3900...
+[2023-03-01 03:20:49,227][00674] Num frames 4000...
+[2023-03-01 03:20:49,347][00674] Num frames 4100...
+[2023-03-01 03:20:49,509][00674] Avg episode rewards: #0: 4.792, true rewards: #0: 4.192
+[2023-03-01 03:20:49,511][00674] Avg episode reward: 4.792, avg true_objective: 4.192
+[2023-03-01 03:21:02,994][00674] Replay video saved to /content/train_dir/default_experiment/replay.mp4!