diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -1,50 +1,50 @@ -[2023-02-25 01:10:29,963][00389] Saving configuration to /content/train_dir/default_experiment/config.json... -[2023-02-25 01:10:29,970][00389] Rollout worker 0 uses device cpu -[2023-02-25 01:10:29,980][00389] Rollout worker 1 uses device cpu -[2023-02-25 01:10:29,983][00389] Rollout worker 2 uses device cpu -[2023-02-25 01:10:29,986][00389] Rollout worker 3 uses device cpu -[2023-02-25 01:10:29,990][00389] Rollout worker 4 uses device cpu -[2023-02-25 01:10:29,992][00389] Rollout worker 5 uses device cpu -[2023-02-25 01:10:29,995][00389] Rollout worker 6 uses device cpu -[2023-02-25 01:10:29,998][00389] Rollout worker 7 uses device cpu -[2023-02-25 01:10:30,432][00389] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-25 01:10:30,440][00389] InferenceWorker_p0-w0: min num requests: 2 -[2023-02-25 01:10:30,503][00389] Starting all processes... -[2023-02-25 01:10:30,509][00389] Starting process learner_proc0 -[2023-02-25 01:10:30,614][00389] Starting all processes... -[2023-02-25 01:10:30,672][00389] Starting process inference_proc0-0 -[2023-02-25 01:10:30,673][00389] Starting process rollout_proc0 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc1 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc2 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc3 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc4 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc5 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc6 -[2023-02-25 01:10:30,676][00389] Starting process rollout_proc7 -[2023-02-25 01:10:41,746][10405] Worker 7 uses CPU cores [1] -[2023-02-25 01:10:41,812][10383] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-25 01:10:41,815][10383] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2023-02-25 01:10:42,183][10399] Worker 0 uses CPU cores [0] -[2023-02-25 01:10:42,192][10397] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-25 01:10:42,195][10397] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2023-02-25 01:10:42,375][10404] Worker 6 uses CPU cores [0] -[2023-02-25 01:10:42,479][10400] Worker 2 uses CPU cores [0] -[2023-02-25 01:10:42,507][10402] Worker 3 uses CPU cores [1] -[2023-02-25 01:10:42,537][10398] Worker 1 uses CPU cores [1] -[2023-02-25 01:10:42,556][10403] Worker 5 uses CPU cores [1] -[2023-02-25 01:10:42,766][10401] Worker 4 uses CPU cores [0] -[2023-02-25 01:10:43,299][10397] Num visible devices: 1 -[2023-02-25 01:10:43,302][10383] Num visible devices: 1 -[2023-02-25 01:10:43,322][10383] Starting seed is not provided -[2023-02-25 01:10:43,323][10383] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-25 01:10:43,324][10383] Initializing actor-critic model on device cuda:0 -[2023-02-25 01:10:43,325][10383] RunningMeanStd input shape: (3, 72, 128) -[2023-02-25 01:10:43,327][10383] RunningMeanStd input shape: (1,) -[2023-02-25 01:10:43,369][10383] ConvEncoder: input_channels=3 -[2023-02-25 01:10:44,099][10383] Conv encoder output size: 512 -[2023-02-25 01:10:44,101][10383] Policy head output size: 512 -[2023-02-25 01:10:44,205][10383] Created Actor Critic model with architecture: -[2023-02-25 01:10:44,206][10383] ActorCriticSharedWeights( +[2023-03-01 03:17:06,195][00674] Saving configuration to /content/train_dir/default_experiment/config.json... 
+[2023-03-01 03:17:06,199][00674] Rollout worker 0 uses device cpu
+[2023-03-01 03:17:06,200][00674] Rollout worker 1 uses device cpu
+[2023-03-01 03:17:06,201][00674] Rollout worker 2 uses device cpu
+[2023-03-01 03:17:06,203][00674] Rollout worker 3 uses device cpu
+[2023-03-01 03:17:06,204][00674] Rollout worker 4 uses device cpu
+[2023-03-01 03:17:06,206][00674] Rollout worker 5 uses device cpu
+[2023-03-01 03:17:06,207][00674] Rollout worker 6 uses device cpu
+[2023-03-01 03:17:06,208][00674] Rollout worker 7 uses device cpu
+[2023-03-01 03:17:06,400][00674] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-03-01 03:17:06,402][00674] InferenceWorker_p0-w0: min num requests: 2
+[2023-03-01 03:17:06,435][00674] Starting all processes...
+[2023-03-01 03:17:06,437][00674] Starting process learner_proc0
+[2023-03-01 03:17:06,489][00674] Starting all processes...
+[2023-03-01 03:17:06,504][00674] Starting process inference_proc0-0
+[2023-03-01 03:17:06,505][00674] Starting process rollout_proc0
+[2023-03-01 03:17:06,506][00674] Starting process rollout_proc1
+[2023-03-01 03:17:06,507][00674] Starting process rollout_proc2
+[2023-03-01 03:17:06,507][00674] Starting process rollout_proc3
+[2023-03-01 03:17:06,507][00674] Starting process rollout_proc4
+[2023-03-01 03:17:06,507][00674] Starting process rollout_proc5
+[2023-03-01 03:17:06,507][00674] Starting process rollout_proc6
+[2023-03-01 03:17:06,507][00674] Starting process rollout_proc7
+[2023-03-01 03:17:15,653][11907] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-03-01 03:17:15,660][11907] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2023-03-01 03:17:16,111][11921] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-03-01 03:17:16,131][11921] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2023-03-01 03:17:16,213][11930] Worker 4 uses CPU cores [0]
+[2023-03-01 03:17:16,300][11929] Worker 5 uses CPU cores [1]
+[2023-03-01 03:17:16,314][11933] Worker 7 uses CPU cores [1]
+[2023-03-01 03:17:16,514][11927] Worker 0 uses CPU cores [0]
+[2023-03-01 03:17:16,528][11931] Worker 2 uses CPU cores [0]
+[2023-03-01 03:17:16,652][11932] Worker 6 uses CPU cores [0]
+[2023-03-01 03:17:16,793][11926] Worker 1 uses CPU cores [1]
+[2023-03-01 03:17:16,794][11928] Worker 3 uses CPU cores [1]
+[2023-03-01 03:17:16,904][11921] Num visible devices: 1
+[2023-03-01 03:17:16,906][11907] Num visible devices: 1
+[2023-03-01 03:17:16,918][11907] Starting seed is not provided
+[2023-03-01 03:17:16,918][11907] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-03-01 03:17:16,918][11907] Initializing actor-critic model on device cuda:0
+[2023-03-01 03:17:16,919][11907] RunningMeanStd input shape: (3, 72, 128)
+[2023-03-01 03:17:16,921][11907] RunningMeanStd input shape: (1,)
+[2023-03-01 03:17:16,942][11907] ConvEncoder: input_channels=3
+[2023-03-01 03:17:17,298][11907] Conv encoder output size: 512
+[2023-03-01 03:17:17,299][11907] Policy head output size: 512
+[2023-03-01 03:17:17,359][11907] Created Actor Critic model with architecture:
+[2023-03-01 03:17:17,359][11907] ActorCriticSharedWeights(
   (obs_normalizer): ObservationNormalizer(
     (running_mean_std): RunningMeanStdDictInPlace(
       (running_mean_std): ModuleDict(
@@ -85,1210 +85,324 @@
     (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
   )
 )
-[2023-02-25 01:10:50,418][00389] Heartbeat connected on Batcher_0
-[2023-02-25 01:10:50,432][00389] Heartbeat connected on InferenceWorker_p0-w0
-[2023-02-25 01:10:50,456][00389] Heartbeat connected on RolloutWorker_w0
-[2023-02-25 01:10:50,462][00389] Heartbeat connected on RolloutWorker_w1
-[2023-02-25 01:10:50,472][00389] Heartbeat connected on RolloutWorker_w2
-[2023-02-25 01:10:50,486][00389] Heartbeat connected on RolloutWorker_w3
-[2023-02-25 01:10:50,488][00389] Heartbeat connected on RolloutWorker_w4
-[2023-02-25 01:10:50,492][00389] Heartbeat connected on RolloutWorker_w5
-[2023-02-25 01:10:50,497][00389] Heartbeat connected on RolloutWorker_w6
-[2023-02-25 01:10:50,502][00389] Heartbeat connected on RolloutWorker_w7
-[2023-02-25 01:10:52,969][10383] Using optimizer
-[2023-02-25 01:10:52,970][10383] No checkpoints found
-[2023-02-25 01:10:52,971][10383] Did not load from checkpoint, starting from scratch!
-[2023-02-25 01:10:52,971][10383] Initialized policy 0 weights for model version 0
-[2023-02-25 01:10:52,976][10383] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:10:52,983][10383] LearnerWorker_p0 finished initialization!
-[2023-02-25 01:10:52,984][00389] Heartbeat connected on LearnerWorker_p0
-[2023-02-25 01:10:53,188][10397] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:10:53,189][10397] RunningMeanStd input shape: (1,)
-[2023-02-25 01:10:53,202][10397] ConvEncoder: input_channels=3
-[2023-02-25 01:10:53,312][10397] Conv encoder output size: 512
-[2023-02-25 01:10:53,312][10397] Policy head output size: 512
-[2023-02-25 01:10:53,377][00389] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:10:55,603][00389] Inference worker 0-0 is ready!
-[2023-02-25 01:10:55,605][00389] All inference workers are ready! Signal rollout workers to start!
-[2023-02-25 01:10:55,733][10405] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,739][10402] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,749][10403] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,770][10404] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,765][10401] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,767][10398] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,780][10400] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:55,784][10399] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:10:56,651][10399] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:56,653][10400] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:57,028][10402] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:57,032][10403] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:57,037][10398] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:57,039][10405] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:58,228][10401] Decorrelating experience for 0 frames...
-[2023-02-25 01:10:58,246][10400] Decorrelating experience for 32 frames...
-[2023-02-25 01:10:58,377][00389] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:10:58,393][10402] Decorrelating experience for 32 frames...
-[2023-02-25 01:10:58,409][10398] Decorrelating experience for 32 frames...
-[2023-02-25 01:10:58,414][10405] Decorrelating experience for 32 frames...
-[2023-02-25 01:10:59,716][10401] Decorrelating experience for 32 frames...
-[2023-02-25 01:10:59,739][10403] Decorrelating experience for 32 frames...
-[2023-02-25 01:10:59,950][10402] Decorrelating experience for 64 frames...
-[2023-02-25 01:10:59,953][10400] Decorrelating experience for 64 frames...
-[2023-02-25 01:10:59,961][10399] Decorrelating experience for 32 frames...
-[2023-02-25 01:11:01,604][10404] Decorrelating experience for 0 frames...
-[2023-02-25 01:11:01,607][10401] Decorrelating experience for 64 frames...
-[2023-02-25 01:11:01,665][10398] Decorrelating experience for 64 frames...
-[2023-02-25 01:11:01,668][10403] Decorrelating experience for 64 frames...
-[2023-02-25 01:11:01,833][10402] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:01,893][10399] Decorrelating experience for 64 frames...
-[2023-02-25 01:11:03,270][10404] Decorrelating experience for 32 frames...
-[2023-02-25 01:11:03,373][10400] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:03,380][00389] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:11:03,534][10401] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:04,314][10399] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:04,408][10404] Decorrelating experience for 64 frames...
-[2023-02-25 01:11:05,267][10405] Decorrelating experience for 64 frames...
-[2023-02-25 01:11:05,430][10398] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:05,439][10403] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:05,556][10404] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:06,072][10405] Decorrelating experience for 96 frames...
-[2023-02-25 01:11:08,377][00389] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 38.5. Samples: 578. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:11:08,380][00389] Avg episode reward: [(0, '1.173')]
-[2023-02-25 01:11:09,514][10383] Signal inference workers to stop experience collection...
-[2023-02-25 01:11:09,530][10397] InferenceWorker_p0-w0: stopping experience collection
-[2023-02-25 01:11:12,293][10383] Signal inference workers to resume experience collection...
-[2023-02-25 01:11:12,294][10397] InferenceWorker_p0-w0: resuming experience collection
-[2023-02-25 01:11:13,377][00389] Fps is (10 sec: 409.7, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 111.2. Samples: 2224. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2023-02-25 01:11:13,383][00389] Avg episode reward: [(0, '2.592')]
-[2023-02-25 01:11:18,377][00389] Fps is (10 sec: 2457.6, 60 sec: 983.0, 300 sec: 983.0). Total num frames: 24576. Throughput: 0: 227.4. Samples: 5686. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-25 01:11:18,383][00389] Avg episode reward: [(0, '3.547')]
-[2023-02-25 01:11:23,377][00389] Fps is (10 sec: 3276.8, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 36864. Throughput: 0: 326.9. Samples: 9808. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-25 01:11:23,380][00389] Avg episode reward: [(0, '3.829')]
-[2023-02-25 01:11:23,802][10397] Updated weights for policy 0, policy_version 10 (0.0017)
-[2023-02-25 01:11:28,377][00389] Fps is (10 sec: 3276.7, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 57344. Throughput: 0: 341.4. Samples: 11948. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2023-02-25 01:11:28,379][00389] Avg episode reward: [(0, '4.336')]
-[2023-02-25 01:11:33,377][00389] Fps is (10 sec: 4096.0, 60 sec: 1945.6, 300 sec: 1945.6). Total num frames: 77824. Throughput: 0: 459.8. Samples: 18392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-25 01:11:33,380][00389] Avg episode reward: [(0, '4.361')]
-[2023-02-25 01:11:33,949][10397] Updated weights for policy 0, policy_version 20 (0.0014)
-[2023-02-25 01:11:38,377][00389] Fps is (10 sec: 3686.5, 60 sec: 2093.5, 300 sec: 2093.5). Total num frames: 94208. Throughput: 0: 544.2. Samples: 24490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-25 01:11:38,380][00389] Avg episode reward: [(0, '4.380')]
-[2023-02-25 01:11:41,603][00389] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 389], exiting...
-[2023-02-25 01:11:41,611][10383] Stopping Batcher_0...
-[2023-02-25 01:11:41,612][10383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000026_106496.pth...
-[2023-02-25 01:11:41,610][00389] Runner profile tree view:
-main_loop: 71.1077
-[2023-02-25 01:11:41,614][00389] Collected {0: 106496}, FPS: 1497.7
-[2023-02-25 01:11:41,700][10397] Weights refcount: 2 0
-[2023-02-25 01:11:41,613][10383] Loop batcher_evt_loop terminating...
-[2023-02-25 01:11:41,687][10403] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,658][10402] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,721][10402] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc3_evt_loop
-[2023-02-25 01:11:41,684][10400] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,725][10400] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc2_evt_loop
-[2023-02-25 01:11:41,721][10403] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc5_evt_loop
-[2023-02-25 01:11:41,726][10398] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,757][10398] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc1_evt_loop
-[2023-02-25 01:11:41,596][10404] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(0, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,758][10404] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop
-[2023-02-25 01:11:41,665][10401] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,763][10401] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc4_evt_loop
-[2023-02-25 01:11:41,755][10397] Stopping InferenceWorker_p0-w0...
-[2023-02-25 01:11:41,772][10397] Loop inference_proc0-0_evt_loop terminating...
-[2023-02-25 01:11:41,747][10399] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(1, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,831][10399] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc0_evt_loop
-[2023-02-25 01:11:41,910][10405] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(0, 0)
-Traceback (most recent call last):
-  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
-    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
-    new_obs, rewards, terminated, truncated, infos = e.step(actions)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
-    obs, rew, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
-    observation, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
-    return self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
-    obs, reward, terminated, truncated, info = self.env.step(action)
-  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
-    reward = self.game.make_action(actions_flattened, self.skip_frames)
-vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
-[2023-02-25 01:11:41,932][10405] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc7_evt_loop
-[2023-02-25 01:11:42,126][10383] Stopping LearnerWorker_p0...
-[2023-02-25 01:11:42,127][10383] Loop learner_proc0_evt_loop terminating...
-[2023-02-25 01:11:46,723][00389] Environment doom_basic already registered, overwriting...
-[2023-02-25 01:11:46,725][00389] Environment doom_two_colors_easy already registered, overwriting...
-[2023-02-25 01:11:46,729][00389] Environment doom_two_colors_hard already registered, overwriting...
-[2023-02-25 01:11:46,731][00389] Environment doom_dm already registered, overwriting...
-[2023-02-25 01:11:46,732][00389] Environment doom_dwango5 already registered, overwriting...
-[2023-02-25 01:11:46,736][00389] Environment doom_my_way_home_flat_actions already registered, overwriting...
-[2023-02-25 01:11:46,738][00389] Environment doom_defend_the_center_flat_actions already registered, overwriting...
-[2023-02-25 01:11:46,741][00389] Environment doom_my_way_home already registered, overwriting...
-[2023-02-25 01:11:46,743][00389] Environment doom_deadly_corridor already registered, overwriting...
-[2023-02-25 01:11:46,746][00389] Environment doom_defend_the_center already registered, overwriting...
-[2023-02-25 01:11:46,747][00389] Environment doom_defend_the_line already registered, overwriting...
-[2023-02-25 01:11:46,749][00389] Environment doom_health_gathering already registered, overwriting...
-[2023-02-25 01:11:46,751][00389] Environment doom_health_gathering_supreme already registered, overwriting...
-[2023-02-25 01:11:46,753][00389] Environment doom_battle already registered, overwriting...
-[2023-02-25 01:11:46,754][00389] Environment doom_battle2 already registered, overwriting...
-[2023-02-25 01:11:46,756][00389] Environment doom_duel_bots already registered, overwriting...
-[2023-02-25 01:11:46,758][00389] Environment doom_deathmatch_bots already registered, overwriting...
-[2023-02-25 01:11:46,760][00389] Environment doom_duel already registered, overwriting...
-[2023-02-25 01:11:46,764][00389] Environment doom_deathmatch_full already registered, overwriting...
-[2023-02-25 01:11:46,766][00389] Environment doom_benchmark already registered, overwriting...
-[2023-02-25 01:11:46,768][00389] register_encoder_factory:
-[2023-02-25 01:11:46,797][00389] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2023-02-25 01:11:46,798][00389] Overriding arg 'train_for_env_steps' with value 40000 passed from command line
-[2023-02-25 01:11:46,805][00389] Experiment dir /content/train_dir/default_experiment already exists!
-[2023-02-25 01:11:46,806][00389] Resuming existing experiment from /content/train_dir/default_experiment...
-[2023-02-25 01:11:46,809][00389] Weights and Biases integration disabled
-[2023-02-25 01:11:46,813][00389] Environment var CUDA_VISIBLE_DEVICES is 0
-
-[2023-02-25 01:11:49,428][00389] Starting experiment with the following configuration:
-help=False
-algo=APPO
-env=doom_health_gathering_supreme
-experiment=default_experiment
-train_dir=/content/train_dir
-restart_behavior=resume
-device=gpu
-seed=None
-num_policies=1
-async_rl=True
-serial_mode=False
-batched_sampling=False
-num_batches_to_accumulate=2
-worker_num_splits=2
-policy_workers_per_policy=1
-max_policy_lag=1000
-num_workers=8
-num_envs_per_worker=4
-batch_size=1024
-num_batches_per_epoch=1
-num_epochs=1
-rollout=32
-recurrence=32
-shuffle_minibatches=False
-gamma=0.99
-reward_scale=1.0
-reward_clip=1000.0
-value_bootstrap=False
-normalize_returns=True
-exploration_loss_coeff=0.001
-value_loss_coeff=0.5
-kl_loss_coeff=0.0
-exploration_loss=symmetric_kl
-gae_lambda=0.95
-ppo_clip_ratio=0.1
-ppo_clip_value=0.2
-with_vtrace=False
-vtrace_rho=1.0
-vtrace_c=1.0
-optimizer=adam
-adam_eps=1e-06
-adam_beta1=0.9
-adam_beta2=0.999
-max_grad_norm=4.0
-learning_rate=0.0001
-lr_schedule=constant
-lr_schedule_kl_threshold=0.008
-lr_adaptive_min=1e-06
-lr_adaptive_max=0.01
-obs_subtract_mean=0.0
-obs_scale=255.0
-normalize_input=True
-normalize_input_keys=None
-decorrelate_experience_max_seconds=0
-decorrelate_envs_on_one_worker=True
-actor_worker_gpus=[]
-set_workers_cpu_affinity=True
-force_envs_single_thread=False
-default_niceness=0
-log_to_file=True
-experiment_summaries_interval=10
-flush_summaries_interval=30
-stats_avg=100
-summaries_use_frameskip=True
-heartbeat_interval=20
-heartbeat_reporting_interval=600
-train_for_env_steps=40000
-train_for_seconds=10000000000
-save_every_sec=120
-keep_checkpoints=2
-load_checkpoint_kind=latest
-save_milestones_sec=-1
-save_best_every_sec=5
-save_best_metric=reward
-save_best_after=100000
-benchmark=False
-encoder_mlp_layers=[512, 512]
-encoder_conv_architecture=convnet_simple
-encoder_conv_mlp_layers=[512]
-use_rnn=True
-rnn_size=512
-rnn_type=gru
-rnn_num_layers=1
-decoder_mlp_layers=[]
-nonlinearity=elu
-policy_initialization=orthogonal
-policy_init_gain=1.0
-actor_critic_share_weights=True
-adaptive_stddev=True
-continuous_tanh_scale=0.0
-initial_stddev=1.0
-use_env_info_cache=False
-env_gpu_actions=False
-env_gpu_observations=True
-env_frameskip=4
-env_framestack=1
-pixel_format=CHW
-use_record_episode_statistics=False
-with_wandb=False
-wandb_user=None
-wandb_project=sample_factory
-wandb_group=None
-wandb_job_type=SF
-wandb_tags=[]
-with_pbt=False
-pbt_mix_policies_in_one_env=True
-pbt_period_env_steps=5000000
-pbt_start_mutation=20000000
-pbt_replace_fraction=0.3
-pbt_mutation_rate=0.15
-pbt_replace_reward_gap=0.1
-pbt_replace_reward_gap_absolute=1e-06
-pbt_optimize_gamma=False
-pbt_target_objective=true_objective
-pbt_perturb_min=1.1
-pbt_perturb_max=1.5
-num_agents=-1
-num_humans=0
-num_bots=-1
-start_bot_difficulty=None
-timelimit=None
-res_w=128
-res_h=72
-wide_aspect_ratio=False
-eval_env_frameskip=1
-fps=35
-command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
-cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
-git_hash=unknown
-git_repo_name=not a git repository
-[2023-02-25 01:11:49,433][00389] Saving configuration to /content/train_dir/default_experiment/config.json...
-[2023-02-25 01:11:49,440][00389] Rollout worker 0 uses device cpu
-[2023-02-25 01:11:49,442][00389] Rollout worker 1 uses device cpu
-[2023-02-25 01:11:49,444][00389] Rollout worker 2 uses device cpu
-[2023-02-25 01:11:49,453][00389] Rollout worker 3 uses device cpu
-[2023-02-25 01:11:49,457][00389] Rollout worker 4 uses device cpu
-[2023-02-25 01:11:49,467][00389] Rollout worker 5 uses device cpu
-[2023-02-25 01:11:49,471][00389] Rollout worker 6 uses device cpu
-[2023-02-25 01:11:49,474][00389] Rollout worker 7 uses device cpu
-[2023-02-25 01:11:49,789][00389] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:11:49,791][00389] InferenceWorker_p0-w0: min num requests: 2
-[2023-02-25 01:11:49,842][00389] Starting all processes...
-[2023-02-25 01:11:49,844][00389] Starting process learner_proc0
-[2023-02-25 01:11:49,927][00389] Starting all processes...
-[2023-02-25 01:11:50,033][00389] Starting process inference_proc0-0
-[2023-02-25 01:11:50,034][00389] Starting process rollout_proc0
-[2023-02-25 01:11:50,034][00389] Starting process rollout_proc1
-[2023-02-25 01:11:50,034][00389] Starting process rollout_proc2
-[2023-02-25 01:11:50,034][00389] Starting process rollout_proc3
-[2023-02-25 01:11:50,034][00389] Starting process rollout_proc4
-[2023-02-25 01:11:50,035][00389] Starting process rollout_proc5
-[2023-02-25 01:11:50,036][00389] Starting process rollout_proc6
-[2023-02-25 01:11:50,039][00389] Starting process rollout_proc7
-[2023-02-25 01:12:01,907][13885] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:12:01,916][13885] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2023-02-25 01:12:03,079][13905] Worker 4 uses CPU cores [0]
-[2023-02-25 01:12:03,135][13901] Worker 1 uses CPU cores [1]
-[2023-02-25 01:12:03,143][13885] Num visible devices: 1
-[2023-02-25 01:12:03,186][13885] Starting seed is not provided
-[2023-02-25 01:12:03,186][13885] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:12:03,186][13885] Initializing actor-critic model on device cuda:0
-[2023-02-25 01:12:03,187][13885] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:12:03,188][13885] RunningMeanStd input shape: (1,)
-[2023-02-25 01:12:03,204][13902] Worker 3 uses CPU cores [1]
-[2023-02-25 01:12:03,217][13904] Worker 5 uses CPU cores [1]
-[2023-02-25 01:12:03,241][13900] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:12:03,246][13900] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2023-02-25 01:12:03,263][13885] ConvEncoder: input_channels=3
-[2023-02-25 01:12:03,282][13899] Worker 0 uses CPU cores [0]
-[2023-02-25 01:12:03,302][13903] Worker 2 uses CPU cores [0]
-[2023-02-25 01:12:03,302][13900] Num visible devices: 1
-[2023-02-25 01:12:03,354][13907] Worker 7 uses CPU cores [1]
-[2023-02-25 01:12:03,373][13906] Worker 6 uses CPU cores [0]
-[2023-02-25 01:12:03,467][13885] Conv encoder output size: 512
-[2023-02-25 01:12:03,468][13885] Policy head output size: 512
-[2023-02-25 01:12:03,483][13885] Created Actor Critic model with architecture:
-[2023-02-25 01:12:03,483][13885] ActorCriticSharedWeights(
-  (obs_normalizer): ObservationNormalizer(
-    (running_mean_std): RunningMeanStdDictInPlace(
-      (running_mean_std): ModuleDict(
-        (obs): RunningMeanStdInPlace()
-      )
-    )
-  )
-  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
-  (encoder): VizdoomEncoder(
-    (basic_encoder): ConvEncoder(
-      (enc): RecursiveScriptModule(
-        original_name=ConvEncoderImpl
-        (conv_head): RecursiveScriptModule(
-          original_name=Sequential
-          (0): RecursiveScriptModule(original_name=Conv2d)
-          (1): RecursiveScriptModule(original_name=ELU)
-          (2): RecursiveScriptModule(original_name=Conv2d)
-          (3): RecursiveScriptModule(original_name=ELU)
-          (4): RecursiveScriptModule(original_name=Conv2d)
-          (5): RecursiveScriptModule(original_name=ELU)
-        )
-        (mlp_layers): RecursiveScriptModule(
-          original_name=Sequential
-          (0): RecursiveScriptModule(original_name=Linear)
-          (1): RecursiveScriptModule(original_name=ELU)
-        )
-      )
-    )
-  )
-  (core): ModelCoreRNN(
-    (core): GRU(512, 512)
-  )
-  (decoder): MlpDecoder(
-    (mlp): Identity()
-  )
-  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
-  (action_parameterization): ActionParameterizationDefault(
-    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
-  )
-)
-[2023-02-25 01:12:05,920][13885] Using optimizer
-[2023-02-25 01:12:05,920][13885] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000026_106496.pth...
-[2023-02-25 01:12:05,955][13885] Loading model from checkpoint
-[2023-02-25 01:12:05,962][13885] Loaded experiment state at self.train_step=26, self.env_steps=106496
-[2023-02-25 01:12:05,962][13885] Initialized policy 0 weights for model version 26
-[2023-02-25 01:12:05,967][13885] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:12:05,974][13885] LearnerWorker_p0 finished initialization!
-[2023-02-25 01:12:06,181][13900] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:12:06,182][13900] RunningMeanStd input shape: (1,)
-[2023-02-25 01:12:06,195][13900] ConvEncoder: input_channels=3
-[2023-02-25 01:12:06,299][13900] Conv encoder output size: 512
-[2023-02-25 01:12:06,299][13900] Policy head output size: 512
-[2023-02-25 01:12:06,813][00389] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 106496. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:12:08,878][00389] Inference worker 0-0 is ready!
-[2023-02-25 01:12:08,879][00389] All inference workers are ready! Signal rollout workers to start!
-[2023-02-25 01:12:08,979][13903] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,981][13906] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,982][13905] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,977][13899] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,981][13904] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,990][13901] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,989][13907] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:08,985][13902] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:12:09,778][00389] Heartbeat connected on Batcher_0
-[2023-02-25 01:12:09,784][00389] Heartbeat connected on LearnerWorker_p0
-[2023-02-25 01:12:09,821][00389] Heartbeat connected on InferenceWorker_p0-w0
-[2023-02-25 01:12:10,369][13906] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:10,374][13905] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:10,379][13899] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:10,392][13901] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:10,396][13904] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:10,402][13907] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:11,100][13902] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:11,106][13901] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:11,454][13906] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:11,457][13903] Decorrelating experience for 0 frames...
-[2023-02-25 01:12:11,556][13905] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:11,813][00389] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 106496. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:12:12,390][13904] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:12,415][13902] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:12,445][13907] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:13,067][13903] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:13,254][13899] Decorrelating experience for 32 frames...
-[2023-02-25 01:12:13,606][13905] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:14,275][13906] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:14,319][13901] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:14,601][13904] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:14,611][13902] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:15,236][13903] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:15,999][13907] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:16,173][13899] Decorrelating experience for 64 frames...
-[2023-02-25 01:12:16,176][13901] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:16,324][13906] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:16,472][13902] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:16,487][00389] Heartbeat connected on RolloutWorker_w1
-[2023-02-25 01:12:16,555][00389] Heartbeat connected on RolloutWorker_w6
-[2023-02-25 01:12:16,804][00389] Heartbeat connected on RolloutWorker_w3
-[2023-02-25 01:12:16,814][00389] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 106496. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:12:16,873][13905] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:17,294][00389] Heartbeat connected on RolloutWorker_w4
-[2023-02-25 01:12:19,782][13907] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:20,397][00389] Heartbeat connected on RolloutWorker_w7
-[2023-02-25 01:12:21,813][00389] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 106496. Throughput: 0: 84.8. Samples: 1272. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:12:21,817][00389] Avg episode reward: [(0, '2.183')]
-[2023-02-25 01:12:24,886][13885] Signal inference workers to stop experience collection...
-[2023-02-25 01:12:24,932][13900] InferenceWorker_p0-w0: stopping experience collection
-[2023-02-25 01:12:25,028][13903] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:25,176][00389] Heartbeat connected on RolloutWorker_w2
-[2023-02-25 01:12:25,198][13885] Signal inference workers to resume experience collection...
-[2023-02-25 01:12:25,199][13885] Saving new best policy, reward=2.183!
-[2023-02-25 01:12:25,199][13900] InferenceWorker_p0-w0: resuming experience collection
-[2023-02-25 01:12:25,227][13885] Stopping Batcher_0...
-[2023-02-25 01:12:25,228][13885] Loop batcher_evt_loop terminating...
-[2023-02-25 01:12:25,229][00389] Component Batcher_0 stopped!
-[2023-02-25 01:12:25,260][13905] Stopping RolloutWorker_w4...
-[2023-02-25 01:12:25,261][13905] Loop rollout_proc4_evt_loop terminating...
-[2023-02-25 01:12:25,260][00389] Component RolloutWorker_w2 stopped!
-[2023-02-25 01:12:25,268][00389] Component RolloutWorker_w4 stopped!
-[2023-02-25 01:12:25,250][13900] Weights refcount: 2 0
-[2023-02-25 01:12:25,282][13900] Stopping InferenceWorker_p0-w0...
-[2023-02-25 01:12:25,274][00389] Component RolloutWorker_w3 stopped!
-[2023-02-25 01:12:25,256][13903] Stopping RolloutWorker_w2...
-[2023-02-25 01:12:25,284][13906] Stopping RolloutWorker_w6...
-[2023-02-25 01:12:25,283][00389] Component InferenceWorker_p0-w0 stopped!
-[2023-02-25 01:12:25,283][13900] Loop inference_proc0-0_evt_loop terminating...
-[2023-02-25 01:12:25,273][13902] Stopping RolloutWorker_w3...
-[2023-02-25 01:12:25,294][13902] Loop rollout_proc3_evt_loop terminating...
-[2023-02-25 01:12:25,293][00389] Component RolloutWorker_w6 stopped!
-[2023-02-25 01:12:25,283][13903] Loop rollout_proc2_evt_loop terminating...
-[2023-02-25 01:12:25,314][00389] Component RolloutWorker_w1 stopped!
-[2023-02-25 01:12:25,292][13906] Loop rollout_proc6_evt_loop terminating...
-[2023-02-25 01:12:25,315][13901] Stopping RolloutWorker_w1...
-[2023-02-25 01:12:25,330][13901] Loop rollout_proc1_evt_loop terminating...
-[2023-02-25 01:12:25,321][13907] Stopping RolloutWorker_w7...
-[2023-02-25 01:12:25,321][00389] Component RolloutWorker_w7 stopped!
-[2023-02-25 01:12:25,339][13907] Loop rollout_proc7_evt_loop terminating...
-[2023-02-25 01:12:28,175][13885] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000028_114688.pth...
-[2023-02-25 01:12:28,296][13885] Saving new best policy, reward=2.621!
-[2023-02-25 01:12:28,546][13885] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000028_114688.pth...
-[2023-02-25 01:12:28,768][13885] Stopping LearnerWorker_p0...
-[2023-02-25 01:12:28,768][13885] Loop learner_proc0_evt_loop terminating...
-[2023-02-25 01:12:28,769][00389] Component LearnerWorker_p0 stopped!
-[2023-02-25 01:12:28,905][13899] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:29,426][00389] Component RolloutWorker_w0 stopped!
-[2023-02-25 01:12:29,436][13899] Stopping RolloutWorker_w0...
-[2023-02-25 01:12:29,438][13899] Loop rollout_proc0_evt_loop terminating...
-[2023-02-25 01:12:30,527][13904] Decorrelating experience for 96 frames...
-[2023-02-25 01:12:30,583][13904] Stopping RolloutWorker_w5...
-[2023-02-25 01:12:30,584][13904] Loop rollout_proc5_evt_loop terminating...
-[2023-02-25 01:12:30,583][00389] Component RolloutWorker_w5 stopped!
-[2023-02-25 01:12:30,586][00389] Waiting for process learner_proc0 to stop...
-[2023-02-25 01:12:30,587][00389] Waiting for process inference_proc0-0 to join...
-[2023-02-25 01:12:30,589][00389] Waiting for process rollout_proc0 to join...
-[2023-02-25 01:12:30,590][00389] Waiting for process rollout_proc1 to join...
-[2023-02-25 01:12:30,593][00389] Waiting for process rollout_proc2 to join...
-[2023-02-25 01:12:30,595][00389] Waiting for process rollout_proc3 to join...
-[2023-02-25 01:12:30,597][00389] Waiting for process rollout_proc4 to join...
-[2023-02-25 01:12:30,600][00389] Waiting for process rollout_proc5 to join...
-[2023-02-25 01:12:30,826][00389] Waiting for process rollout_proc6 to join...
-[2023-02-25 01:12:30,830][00389] Waiting for process rollout_proc7 to join...
-[2023-02-25 01:12:30,831][00389] Batcher 0 profile tree view:
-batching: 0.0708, releasing_batches: 0.0005
-[2023-02-25 01:12:30,834][00389] InferenceWorker_p0-w0 profile tree view:
-update_model: 0.0226
-wait_policy: 0.0024
-  wait_policy_total: 11.7130
-one_step: 0.0128
-  handle_policy_step: 3.9862
-    deserialize: 0.0532, stack: 0.0110, obs_to_device_normalize: 0.4729, forward: 2.9253, send_messages: 0.1043
-    prepare_outputs: 0.3162
-      to_cpu: 0.1608
-[2023-02-25 01:12:30,836][00389] Learner 0 profile tree view:
-misc: 0.0000, prepare_batch: 6.5606
-train: 1.0002
-  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0005, after_optimizer: 0.0132
-  calculate_losses: 0.1732
-    losses_init: 0.0000, forward_head: 0.1298, bptt_initial: 0.0213, tail: 0.0013, advantages_returns: 0.0052, losses: 0.0119
+[2023-03-01 03:17:24,882][11907] Using optimizer
+[2023-03-01 03:17:24,883][11907] No checkpoints found
+[2023-03-01 03:17:24,883][11907] Did not load from checkpoint, starting from scratch!
+[2023-03-01 03:17:24,884][11907] Initialized policy 0 weights for model version 0
+[2023-03-01 03:17:24,887][11907] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-03-01 03:17:24,894][11907] LearnerWorker_p0 finished initialization!
+[2023-03-01 03:17:25,090][11921] RunningMeanStd input shape: (3, 72, 128)
+[2023-03-01 03:17:25,091][11921] RunningMeanStd input shape: (1,)
+[2023-03-01 03:17:25,103][11921] ConvEncoder: input_channels=3
+[2023-03-01 03:17:25,203][11921] Conv encoder output size: 512
+[2023-03-01 03:17:25,204][11921] Policy head output size: 512
+[2023-03-01 03:17:26,393][00674] Heartbeat connected on Batcher_0
+[2023-03-01 03:17:26,398][00674] Heartbeat connected on LearnerWorker_p0
+[2023-03-01 03:17:26,413][00674] Heartbeat connected on RolloutWorker_w0
+[2023-03-01 03:17:26,417][00674] Heartbeat connected on RolloutWorker_w1
+[2023-03-01 03:17:26,422][00674] Heartbeat connected on RolloutWorker_w2
+[2023-03-01 03:17:26,424][00674] Heartbeat connected on RolloutWorker_w3
+[2023-03-01 03:17:26,427][00674] Heartbeat connected on RolloutWorker_w4
+[2023-03-01 03:17:26,429][00674] Heartbeat connected on RolloutWorker_w5
+[2023-03-01 03:17:26,432][00674] Heartbeat connected on RolloutWorker_w6
+[2023-03-01 03:17:26,435][00674] Heartbeat connected on RolloutWorker_w7
+[2023-03-01 03:17:27,008][00674] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-03-01 03:17:27,574][00674] Inference worker 0-0 is ready!
+[2023-03-01 03:17:27,576][00674] All inference workers are ready! Signal rollout workers to start!
+[2023-03-01 03:17:27,580][00674] Heartbeat connected on InferenceWorker_p0-w0
+[2023-03-01 03:17:27,683][11928] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,707][11926] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,715][11933] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,723][11931] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,735][11927] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,742][11930] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,745][11932] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:27,759][11929] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:17:28,900][11926] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:28,901][11933] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:28,902][11928] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:28,900][11927] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:28,904][11930] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:28,903][11932] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:29,577][11933] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:29,613][11929] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:30,174][11927] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:30,180][11931] Decorrelating experience for 0 frames...
+[2023-03-01 03:17:30,184][11930] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:31,014][11929] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:31,135][11933] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:31,355][11932] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:31,866][11931] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:32,011][00674] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-03-01 03:17:32,094][11927] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:32,116][11930] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:32,601][11933] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:32,775][11929] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:33,397][11928] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:34,167][11928] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:34,321][11931] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:34,460][11927] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:34,493][11930] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:34,904][11928] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:35,586][11932] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:36,049][11929] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:36,375][11931] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:36,844][11926] Decorrelating experience for 32 frames...
+[2023-03-01 03:17:37,009][00674] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-03-01 03:17:37,195][11932] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:37,520][11926] Decorrelating experience for 64 frames...
+[2023-03-01 03:17:38,041][11926] Decorrelating experience for 96 frames...
+[2023-03-01 03:17:40,783][11907] Signal inference workers to stop experience collection...
+[2023-03-01 03:17:40,812][11921] InferenceWorker_p0-w0: stopping experience collection
+[2023-03-01 03:17:42,008][00674] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 98.9. Samples: 1484. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-03-01 03:17:42,010][00674] Avg episode reward: [(0, '1.828')]
+[2023-03-01 03:17:43,230][11907] Signal inference workers to resume experience collection...
+[2023-03-01 03:17:43,231][11921] InferenceWorker_p0-w0: resuming experience collection
+[2023-03-01 03:17:44,876][11926] Stopping RolloutWorker_w1...
+[2023-03-01 03:17:44,878][11926] Loop rollout_proc1_evt_loop terminating...
+[2023-03-01 03:17:44,879][11907] Stopping Batcher_0...
+[2023-03-01 03:17:44,880][11907] Loop batcher_evt_loop terminating...
+[2023-03-01 03:17:44,877][00674] Component RolloutWorker_w1 stopped!
+[2023-03-01 03:17:44,889][00674] Component Batcher_0 stopped!
+[2023-03-01 03:17:44,913][00674] Component RolloutWorker_w0 stopped!
+[2023-03-01 03:17:44,914][11928] Stopping RolloutWorker_w3...
+[2023-03-01 03:17:44,917][00674] Component RolloutWorker_w3 stopped!
+[2023-03-01 03:17:44,922][11927] Stopping RolloutWorker_w0...
+[2023-03-01 03:17:44,923][00674] Component RolloutWorker_w7 stopped!
+[2023-03-01 03:17:44,924][11933] Stopping RolloutWorker_w7...
+[2023-03-01 03:17:44,930][11929] Stopping RolloutWorker_w5...
+[2023-03-01 03:17:44,930][00674] Component RolloutWorker_w5 stopped!
+[2023-03-01 03:17:44,919][11928] Loop rollout_proc3_evt_loop terminating...
+[2023-03-01 03:17:44,932][11907] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000003_12288.pth...
+[2023-03-01 03:17:44,937][11921] Weights refcount: 2 0
+[2023-03-01 03:17:44,937][11927] Loop rollout_proc0_evt_loop terminating...
+[2023-03-01 03:17:44,939][00674] Component InferenceWorker_p0-w0 stopped!
+[2023-03-01 03:17:44,927][11933] Loop rollout_proc7_evt_loop terminating...
+[2023-03-01 03:17:44,931][11929] Loop rollout_proc5_evt_loop terminating...
+[2023-03-01 03:17:44,939][11921] Stopping InferenceWorker_p0-w0...
+[2023-03-01 03:17:44,944][11921] Loop inference_proc0-0_evt_loop terminating...
+[2023-03-01 03:17:44,959][00674] Component RolloutWorker_w4 stopped!
+[2023-03-01 03:17:44,969][11932] Stopping RolloutWorker_w6...
+[2023-03-01 03:17:44,969][00674] Component RolloutWorker_w6 stopped!
+[2023-03-01 03:17:44,959][11930] Stopping RolloutWorker_w4...
+[2023-03-01 03:17:44,980][11931] Stopping RolloutWorker_w2...
+[2023-03-01 03:17:44,981][00674] Component RolloutWorker_w2 stopped!
+[2023-03-01 03:17:44,970][11932] Loop rollout_proc6_evt_loop terminating...
+[2023-03-01 03:17:44,974][11930] Loop rollout_proc4_evt_loop terminating...
+[2023-03-01 03:17:44,982][11931] Loop rollout_proc2_evt_loop terminating...
+[2023-03-01 03:17:45,054][11907] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000003_12288.pth...
+[2023-03-01 03:17:45,205][00674] Component LearnerWorker_p0 stopped!
+[2023-03-01 03:17:45,207][00674] Waiting for process learner_proc0 to stop...
+[2023-03-01 03:17:45,211][11907] Stopping LearnerWorker_p0...
+[2023-03-01 03:17:45,213][11907] Loop learner_proc0_evt_loop terminating...
+[2023-03-01 03:17:46,991][00674] Waiting for process inference_proc0-0 to join...
+[2023-03-01 03:17:47,419][00674] Waiting for process rollout_proc0 to join...
+[2023-03-01 03:17:47,798][00674] Waiting for process rollout_proc1 to join...
+[2023-03-01 03:17:47,800][00674] Waiting for process rollout_proc2 to join...
+[2023-03-01 03:17:47,812][00674] Waiting for process rollout_proc3 to join...
+[2023-03-01 03:17:47,813][00674] Waiting for process rollout_proc4 to join...
+[2023-03-01 03:17:47,815][00674] Waiting for process rollout_proc5 to join...
+[2023-03-01 03:17:47,816][00674] Waiting for process rollout_proc6 to join...
+[2023-03-01 03:17:47,817][00674] Waiting for process rollout_proc7 to join...
+[2023-03-01 03:17:47,818][00674] Batcher 0 profile tree view:
+batching: 0.0695, releasing_batches: 0.0007
+[2023-03-01 03:17:47,820][00674] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+ wait_policy_total: 7.1780
+update_model: 0.0171
+ weight_update: 0.0010
+one_step: 0.0020
+ handle_policy_step: 6.9707
+ deserialize: 0.0485, stack: 0.0087, obs_to_device_normalize: 0.3685, forward: 6.1493, send_messages: 0.0999
+ prepare_outputs: 0.2249
+ to_cpu: 0.1262
+[2023-03-01 03:17:47,821][00674] Learner 0 profile tree view:
+misc: 0.0000, prepare_batch: 4.4091
+train: 1.3496
+ epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0069, after_optimizer: 0.0264
+ calculate_losses: 0.1852
+ losses_init: 0.0000, forward_head: 0.1145, bptt_initial: 0.0597, tail: 0.0022, advantages_returns: 0.0009, losses: 0.0042
 bptt: 0.0031
- bptt_forward_core: 0.0029
- update: 0.7983
- clip: 0.0030
-[2023-02-25 01:12:30,838][00389] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0006
-[2023-02-25 01:12:30,841][00389] RolloutWorker_w7 profile tree view:
-wait_for_trajectories: 0.0005, enqueue_policy_requests: 1.1856, env_step: 2.8701, overhead: 0.2014, complete_rollouts: 0.0310
-save_policy_outputs: 0.0911
- split_output_tensors: 0.0448
-[2023-02-25 01:12:30,844][00389] Loop Runner_EvtLoop terminating...
-[2023-02-25 01:12:30,851][00389] Runner profile tree view:
-main_loop: 41.0095
-[2023-02-25 01:12:30,854][00389] Collected {0: 114688}, FPS: 199.8
-[2023-02-25 01:14:08,563][00389] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2023-02-25 01:14:08,565][00389] Overriding arg 'num_workers' with value 1 passed from command line
-[2023-02-25 01:14:08,567][00389] Adding new argument 'no_render'=True that is not in the saved config file!
-[2023-02-25 01:14:08,570][00389] Adding new argument 'save_video'=True that is not in the saved config file!
-[2023-02-25 01:14:08,572][00389] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2023-02-25 01:14:08,573][00389] Adding new argument 'video_name'=None that is not in the saved config file!
-[2023-02-25 01:14:08,576][00389] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2023-02-25 01:14:08,577][00389] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2023-02-25 01:14:08,578][00389] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2023-02-25 01:14:08,580][00389] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2023-02-25 01:14:08,581][00389] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2023-02-25 01:14:08,582][00389] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2023-02-25 01:14:08,583][00389] Adding new argument 'train_script'=None that is not in the saved config file!
-[2023-02-25 01:14:08,585][00389] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2023-02-25 01:14:08,586][00389] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2023-02-25 01:14:08,617][00389] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:14:08,619][00389] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:14:08,623][00389] RunningMeanStd input shape: (1,)
-[2023-02-25 01:14:08,642][00389] ConvEncoder: input_channels=3
-[2023-02-25 01:14:09,306][00389] Conv encoder output size: 512
-[2023-02-25 01:14:09,308][00389] Policy head output size: 512
-[2023-02-25 01:14:11,975][00389] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000028_114688.pth...
-[2023-02-25 01:14:13,821][00389] Num frames 100...
-[2023-02-25 01:14:13,989][00389] Num frames 200...
-[2023-02-25 01:14:14,161][00389] Num frames 300...
-[2023-02-25 01:14:14,362][00389] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
-[2023-02-25 01:14:14,364][00389] Avg episode reward: 3.840, avg true_objective: 3.840
-[2023-02-25 01:14:14,394][00389] Num frames 400...
-[2023-02-25 01:14:14,565][00389] Num frames 500...
-[2023-02-25 01:14:14,731][00389] Num frames 600...
-[2023-02-25 01:14:14,900][00389] Num frames 700...
-[2023-02-25 01:14:15,033][00389] Num frames 800...
-[2023-02-25 01:14:15,127][00389] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
-[2023-02-25 01:14:15,128][00389] Avg episode reward: 4.660, avg true_objective: 4.160
-[2023-02-25 01:14:15,211][00389] Num frames 900...
-[2023-02-25 01:14:15,343][00389] Num frames 1000...
-[2023-02-25 01:14:15,456][00389] Num frames 1100...
-[2023-02-25 01:14:15,568][00389] Num frames 1200...
-[2023-02-25 01:14:15,641][00389] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053
-[2023-02-25 01:14:15,643][00389] Avg episode reward: 4.387, avg true_objective: 4.053
-[2023-02-25 01:14:15,740][00389] Num frames 1300...
-[2023-02-25 01:14:15,857][00389] Num frames 1400...
-[2023-02-25 01:14:15,982][00389] Num frames 1500...
-[2023-02-25 01:14:16,098][00389] Num frames 1600...
-[2023-02-25 01:14:16,150][00389] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000
-[2023-02-25 01:14:16,152][00389] Avg episode reward: 4.250, avg true_objective: 4.000
-[2023-02-25 01:14:16,276][00389] Num frames 1700...
-[2023-02-25 01:14:16,398][00389] Num frames 1800...
-[2023-02-25 01:14:16,523][00389] Num frames 1900...
-[2023-02-25 01:14:16,672][00389] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968
-[2023-02-25 01:14:16,673][00389] Avg episode reward: 4.168, avg true_objective: 3.968
-[2023-02-25 01:14:16,696][00389] Num frames 2000...
-[2023-02-25 01:14:16,808][00389] Num frames 2100...
-[2023-02-25 01:14:16,929][00389] Num frames 2200...
-[2023-02-25 01:14:17,047][00389] Num frames 2300...
-[2023-02-25 01:14:17,108][00389] Avg episode rewards: #0: 4.003, true rewards: #0: 3.837
-[2023-02-25 01:14:17,113][00389] Avg episode reward: 4.003, avg true_objective: 3.837
-[2023-02-25 01:14:17,231][00389] Num frames 2400...
-[2023-02-25 01:14:17,353][00389] Num frames 2500...
-[2023-02-25 01:14:17,471][00389] Num frames 2600...
-[2023-02-25 01:14:17,626][00389] Avg episode rewards: #0: 3.980, true rewards: #0: 3.837
-[2023-02-25 01:14:17,628][00389] Avg episode reward: 3.980, avg true_objective: 3.837
-[2023-02-25 01:14:17,647][00389] Num frames 2700...
-[2023-02-25 01:14:17,763][00389] Num frames 2800...
-[2023-02-25 01:14:17,875][00389] Num frames 2900...
-[2023-02-25 01:14:17,988][00389] Num frames 3000...
-[2023-02-25 01:14:18,131][00389] Avg episode rewards: #0: 3.963, true rewards: #0: 3.837
-[2023-02-25 01:14:18,133][00389] Avg episode reward: 3.963, avg true_objective: 3.837
-[2023-02-25 01:14:18,169][00389] Num frames 3100...
-[2023-02-25 01:14:18,292][00389] Num frames 3200...
-[2023-02-25 01:14:18,412][00389] Num frames 3300...
-[2023-02-25 01:14:18,529][00389] Num frames 3400...
-[2023-02-25 01:14:18,650][00389] Avg episode rewards: #0: 3.949, true rewards: #0: 3.838
-[2023-02-25 01:14:18,651][00389] Avg episode reward: 3.949, avg true_objective: 3.838
-[2023-02-25 01:14:18,711][00389] Num frames 3500...
-[2023-02-25 01:14:18,824][00389] Num frames 3600...
-[2023-02-25 01:14:18,950][00389] Num frames 3700...
-[2023-02-25 01:14:19,066][00389] Num frames 3800...
-[2023-02-25 01:14:19,182][00389] Num frames 3900...
-[2023-02-25 01:14:19,242][00389] Avg episode rewards: #0: 4.102, true rewards: #0: 3.902
-[2023-02-25 01:14:19,245][00389] Avg episode reward: 4.102, avg true_objective: 3.902
-[2023-02-25 01:14:41,404][00389] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
-[2023-02-25 01:16:11,150][00389] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2023-02-25 01:16:11,152][00389] Overriding arg 'num_workers' with value 1 passed from command line
-[2023-02-25 01:16:11,155][00389] Adding new argument 'no_render'=True that is not in the saved config file!
-[2023-02-25 01:16:11,158][00389] Adding new argument 'save_video'=True that is not in the saved config file!
-[2023-02-25 01:16:11,159][00389] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2023-02-25 01:16:11,161][00389] Adding new argument 'video_name'=None that is not in the saved config file!
-[2023-02-25 01:16:11,163][00389] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2023-02-25 01:16:11,164][00389] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2023-02-25 01:16:11,165][00389] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2023-02-25 01:16:11,167][00389] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2023-02-25 01:16:11,168][00389] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2023-02-25 01:16:11,170][00389] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2023-02-25 01:16:11,171][00389] Adding new argument 'train_script'=None that is not in the saved config file!
-[2023-02-25 01:16:11,172][00389] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2023-02-25 01:16:11,174][00389] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2023-02-25 01:16:11,204][00389] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:16:11,206][00389] RunningMeanStd input shape: (1,)
-[2023-02-25 01:16:11,222][00389] ConvEncoder: input_channels=3
-[2023-02-25 01:16:11,260][00389] Conv encoder output size: 512
-[2023-02-25 01:16:11,265][00389] Policy head output size: 512
-[2023-02-25 01:16:11,285][00389] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000028_114688.pth...
-[2023-02-25 01:16:11,747][00389] Num frames 100...
-[2023-02-25 01:16:11,871][00389] Num frames 200...
-[2023-02-25 01:16:11,991][00389] Num frames 300...
-[2023-02-25 01:16:12,116][00389] Num frames 400...
-[2023-02-25 01:16:12,238][00389] Num frames 500...
-[2023-02-25 01:16:12,351][00389] Avg episode rewards: #0: 7.440, true rewards: #0: 5.440
-[2023-02-25 01:16:12,354][00389] Avg episode reward: 7.440, avg true_objective: 5.440
-[2023-02-25 01:16:12,421][00389] Num frames 600...
-[2023-02-25 01:16:12,535][00389] Num frames 700...
-[2023-02-25 01:16:12,659][00389] Num frames 800...
-[2023-02-25 01:16:12,787][00389] Num frames 900...
-[2023-02-25 01:16:12,951][00389] Avg episode rewards: #0: 6.460, true rewards: #0: 4.960
-[2023-02-25 01:16:12,956][00389] Avg episode reward: 6.460, avg true_objective: 4.960
-[2023-02-25 01:16:12,969][00389] Num frames 1000...
-[2023-02-25 01:16:13,086][00389] Num frames 1100...
-[2023-02-25 01:16:13,209][00389] Num frames 1200...
-[2023-02-25 01:16:13,327][00389] Num frames 1300...
-[2023-02-25 01:16:13,469][00389] Num frames 1400...
-[2023-02-25 01:16:13,583][00389] Num frames 1500...
-[2023-02-25 01:16:13,702][00389] Avg episode rewards: #0: 6.787, true rewards: #0: 5.120
-[2023-02-25 01:16:13,705][00389] Avg episode reward: 6.787, avg true_objective: 5.120
-[2023-02-25 01:16:13,784][00389] Num frames 1600...
-[2023-02-25 01:16:13,937][00389] Num frames 1700...
-[2023-02-25 01:16:14,055][00389] Num frames 1800...
-[2023-02-25 01:16:14,220][00389] Num frames 1900...
-[2023-02-25 01:16:14,318][00389] Avg episode rewards: #0: 6.050, true rewards: #0: 4.800
-[2023-02-25 01:16:14,320][00389] Avg episode reward: 6.050, avg true_objective: 4.800
-[2023-02-25 01:16:14,430][00389] Num frames 2000...
-[2023-02-25 01:16:14,590][00389] Num frames 2100...
-[2023-02-25 01:16:14,715][00389] Num frames 2200...
-[2023-02-25 01:16:14,780][00389] Avg episode rewards: #0: 5.412, true rewards: #0: 4.412
-[2023-02-25 01:16:14,782][00389] Avg episode reward: 5.412, avg true_objective: 4.412
-[2023-02-25 01:16:14,984][00389] Num frames 2300...
-[2023-02-25 01:16:15,258][00389] Num frames 2400...
-[2023-02-25 01:16:15,437][00389] Num frames 2500...
-[2023-02-25 01:16:15,667][00389] Avg episode rewards: #0: 5.150, true rewards: #0: 4.317
-[2023-02-25 01:16:15,673][00389] Avg episode reward: 5.150, avg true_objective: 4.317
-[2023-02-25 01:16:15,708][00389] Num frames 2600...
-[2023-02-25 01:16:16,148][00389] Num frames 2700...
-[2023-02-25 01:16:16,394][00389] Num frames 2800...
-[2023-02-25 01:16:16,533][00389] Avg episode rewards: #0: 4.780, true rewards: #0: 4.066
-[2023-02-25 01:16:16,538][00389] Avg episode reward: 4.780, avg true_objective: 4.066
-[2023-02-25 01:16:16,775][00389] Num frames 2900...
-[2023-02-25 01:16:17,029][00389] Num frames 3000...
-[2023-02-25 01:16:17,154][00389] Num frames 3100...
-[2023-02-25 01:16:17,282][00389] Num frames 3200...
-[2023-02-25 01:16:17,445][00389] Avg episode rewards: #0: 4.868, true rewards: #0: 4.117
-[2023-02-25 01:16:17,447][00389] Avg episode reward: 4.868, avg true_objective: 4.117
-[2023-02-25 01:16:17,461][00389] Num frames 3300...
-[2023-02-25 01:16:17,588][00389] Num frames 3400...
-[2023-02-25 01:16:17,710][00389] Num frames 3500...
-[2023-02-25 01:16:17,834][00389] Num frames 3600...
-[2023-02-25 01:16:17,981][00389] Avg episode rewards: #0: 4.753, true rewards: #0: 4.087
-[2023-02-25 01:16:17,984][00389] Avg episode reward: 4.753, avg true_objective: 4.087
-[2023-02-25 01:16:18,014][00389] Num frames 3700...
-[2023-02-25 01:16:18,138][00389] Num frames 3800...
-[2023-02-25 01:16:18,256][00389] Num frames 3900...
-[2023-02-25 01:16:18,382][00389] Num frames 4000...
-[2023-02-25 01:16:18,508][00389] Avg episode rewards: #0: 4.662, true rewards: #0: 4.062
-[2023-02-25 01:16:18,509][00389] Avg episode reward: 4.662, avg true_objective: 4.062
-[2023-02-25 01:17:54,848][15272] Saving configuration to /content/train_dir/default_experiment/config.json...
-[2023-02-25 01:17:54,850][15272] Rollout worker 0 uses device cpu
-[2023-02-25 01:17:54,854][15272] Rollout worker 1 uses device cpu
-[2023-02-25 01:17:54,855][15272] Rollout worker 2 uses device cpu
-[2023-02-25 01:17:54,861][15272] Rollout worker 3 uses device cpu
-[2023-02-25 01:17:54,862][15272] Rollout worker 4 uses device cpu
-[2023-02-25 01:17:54,866][15272] Rollout worker 5 uses device cpu
-[2023-02-25 01:17:54,867][15272] Rollout worker 6 uses device cpu
-[2023-02-25 01:17:54,869][15272] Rollout worker 7 uses device cpu
-[2023-02-25 01:17:55,097][15272] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:17:55,100][15272] InferenceWorker_p0-w0: min num requests: 2
-[2023-02-25 01:17:55,150][15272] Starting all processes...
-[2023-02-25 01:17:55,152][15272] Starting process learner_proc0
-[2023-02-25 01:17:55,241][15272] Starting all processes...
-[2023-02-25 01:17:55,337][15272] Starting process inference_proc0-0
-[2023-02-25 01:17:55,338][15272] Starting process rollout_proc0
-[2023-02-25 01:17:55,340][15272] Starting process rollout_proc1
-[2023-02-25 01:17:55,340][15272] Starting process rollout_proc2
-[2023-02-25 01:17:55,340][15272] Starting process rollout_proc3
-[2023-02-25 01:17:55,341][15272] Starting process rollout_proc4
-[2023-02-25 01:17:55,343][15272] Starting process rollout_proc5
-[2023-02-25 01:17:55,343][15272] Starting process rollout_proc6
-[2023-02-25 01:17:55,343][15272] Starting process rollout_proc7
-[2023-02-25 01:18:08,346][15762] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:18:08,346][15762] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2023-02-25 01:18:08,381][15782] Worker 1 uses CPU cores [1]
-[2023-02-25 01:18:09,027][15787] Worker 7 uses CPU cores [1]
-[2023-02-25 01:18:09,242][15776] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:18:09,246][15776] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2023-02-25 01:18:09,292][15785] Worker 6 uses CPU cores [0]
-[2023-02-25 01:18:09,313][15788] Worker 5 uses CPU cores [1]
-[2023-02-25 01:18:09,477][15786] Worker 4 uses CPU cores [0]
-[2023-02-25 01:18:09,562][15784] Worker 3 uses CPU cores [1]
-[2023-02-25 01:18:09,714][15783] Worker 2 uses CPU cores [0]
-[2023-02-25 01:18:09,717][15777] Worker 0 uses CPU cores [0]
-[2023-02-25 01:18:09,818][15776] Num visible devices: 1
-[2023-02-25 01:18:09,822][15762] Num visible devices: 1
-[2023-02-25 01:18:09,844][15762] Starting seed is not provided
-[2023-02-25 01:18:09,845][15762] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:18:09,846][15762] Initializing actor-critic model on device cuda:0
-[2023-02-25 01:18:09,847][15762] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:18:09,848][15762] RunningMeanStd input shape: (1,)
-[2023-02-25 01:18:09,884][15762] ConvEncoder: input_channels=3
-[2023-02-25 01:18:10,126][15762] Conv encoder output size: 512
-[2023-02-25 01:18:10,128][15762] Policy head output size: 512
-[2023-02-25 01:18:10,152][15762] Created Actor Critic model with architecture:
-[2023-02-25 01:18:10,153][15762] ActorCriticSharedWeights(
- (obs_normalizer): ObservationNormalizer(
- (running_mean_std): RunningMeanStdDictInPlace(
- (running_mean_std): ModuleDict(
- (obs): RunningMeanStdInPlace()
- )
- )
- )
- (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
- (encoder): VizdoomEncoder(
- (basic_encoder): ConvEncoder(
- (enc): RecursiveScriptModule(
- original_name=ConvEncoderImpl
- (conv_head): RecursiveScriptModule(
- original_name=Sequential
- (0): RecursiveScriptModule(original_name=Conv2d)
- (1): RecursiveScriptModule(original_name=ELU)
- (2): RecursiveScriptModule(original_name=Conv2d)
- (3): RecursiveScriptModule(original_name=ELU)
- (4): RecursiveScriptModule(original_name=Conv2d)
- (5): RecursiveScriptModule(original_name=ELU)
- )
- (mlp_layers): RecursiveScriptModule(
- original_name=Sequential
- (0): RecursiveScriptModule(original_name=Linear)
- (1): RecursiveScriptModule(original_name=ELU)
- )
- )
- )
- )
- (core): ModelCoreRNN(
- (core): GRU(512, 512)
- )
- (decoder): MlpDecoder(
- (mlp): Identity()
- )
- (critic_linear): Linear(in_features=512, out_features=1, bias=True)
- (action_parameterization): ActionParameterizationDefault(
- (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
- )
-)
-[2023-02-25 01:18:14,035][15762] Using optimizer
-[2023-02-25 01:18:14,036][15762] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000028_114688.pth...
-[2023-02-25 01:18:14,065][15762] Loading model from checkpoint
-[2023-02-25 01:18:14,072][15762] Loaded experiment state at self.train_step=28, self.env_steps=114688
-[2023-02-25 01:18:14,073][15762] Initialized policy 0 weights for model version 28
-[2023-02-25 01:18:14,078][15762] LearnerWorker_p0 finished initialization!
-[2023-02-25 01:18:14,082][15762] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-25 01:18:14,277][15776] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:18:14,278][15776] RunningMeanStd input shape: (1,)
-[2023-02-25 01:18:14,290][15776] ConvEncoder: input_channels=3
-[2023-02-25 01:18:14,396][15776] Conv encoder output size: 512
-[2023-02-25 01:18:14,396][15776] Policy head output size: 512
-[2023-02-25 01:18:15,076][15272] Heartbeat connected on Batcher_0
-[2023-02-25 01:18:15,090][15272] Heartbeat connected on LearnerWorker_p0
-[2023-02-25 01:18:15,118][15272] Heartbeat connected on RolloutWorker_w0
-[2023-02-25 01:18:15,123][15272] Heartbeat connected on RolloutWorker_w1
-[2023-02-25 01:18:15,127][15272] Heartbeat connected on RolloutWorker_w2
-[2023-02-25 01:18:15,132][15272] Heartbeat connected on RolloutWorker_w3
-[2023-02-25 01:18:15,137][15272] Heartbeat connected on RolloutWorker_w4
-[2023-02-25 01:18:15,143][15272] Heartbeat connected on RolloutWorker_w5
-[2023-02-25 01:18:15,145][15272] Heartbeat connected on RolloutWorker_w6
-[2023-02-25 01:18:15,150][15272] Heartbeat connected on RolloutWorker_w7
-[2023-02-25 01:18:15,442][15272] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 114688. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:18:16,646][15272] Inference worker 0-0 is ready!
-[2023-02-25 01:18:16,648][15272] All inference workers are ready! Signal rollout workers to start!
-[2023-02-25 01:18:16,650][15272] Heartbeat connected on InferenceWorker_p0-w0
-[2023-02-25 01:18:16,748][15785] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,749][15786] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,751][15783] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,750][15777] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,761][15787] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,755][15784] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,758][15788] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:16,760][15782] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:17,949][15782] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:17,953][15787] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:17,958][15788] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:18,263][15783] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:18,268][15785] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:18,270][15786] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:18,272][15777] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:18,625][15785] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:19,280][15788] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:19,456][15784] Decorrelating experience for 0 frames...
-[2023-02-25 01:18:19,493][15782] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:19,721][15777] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:19,852][15787] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:19,986][15785] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:20,442][15272] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 114688. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:18:21,161][15782] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:21,479][15787] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:21,828][15786] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:22,110][15783] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:22,797][15777] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:23,028][15785] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:23,171][15782] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:24,143][15788] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:24,528][15787] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:25,354][15786] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:25,442][15272] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 114688. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:18:26,075][15783] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:27,264][15784] Decorrelating experience for 32 frames...
-[2023-02-25 01:18:28,566][15786] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:28,824][15788] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:30,444][15272] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 114688. Throughput: 0: 65.1. Samples: 976. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-25 01:18:30,448][15272] Avg episode reward: [(0, '2.692')]
-[2023-02-25 01:18:32,543][15762] Signal inference workers to stop experience collection...
-[2023-02-25 01:18:32,567][15776] InferenceWorker_p0-w0: stopping experience collection
-[2023-02-25 01:18:32,633][15784] Decorrelating experience for 64 frames...
-[2023-02-25 01:18:32,689][15777] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:33,259][15784] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:33,460][15783] Decorrelating experience for 96 frames...
-[2023-02-25 01:18:33,730][15762] Signal inference workers to resume experience collection...
-[2023-02-25 01:18:33,739][15272] Component Batcher_0 stopped!
-[2023-02-25 01:18:33,739][15762] Stopping Batcher_0...
-[2023-02-25 01:18:33,748][15762] Loop batcher_evt_loop terminating...
-[2023-02-25 01:18:33,754][15762] Saving new best policy, reward=2.692!
-[2023-02-25 01:18:33,776][15272] Component RolloutWorker_w6 stopped!
-[2023-02-25 01:18:33,776][15777] Stopping RolloutWorker_w0...
-[2023-02-25 01:18:33,783][15272] Component RolloutWorker_w1 stopped!
-[2023-02-25 01:18:33,787][15272] Component RolloutWorker_w0 stopped!
-[2023-02-25 01:18:33,776][15782] Stopping RolloutWorker_w1...
-[2023-02-25 01:18:33,778][15785] Stopping RolloutWorker_w6...
-[2023-02-25 01:18:33,793][15782] Loop rollout_proc1_evt_loop terminating...
-[2023-02-25 01:18:33,783][15777] Loop rollout_proc0_evt_loop terminating...
-[2023-02-25 01:18:33,797][15776] Weights refcount: 2 0
-[2023-02-25 01:18:33,791][15785] Loop rollout_proc6_evt_loop terminating...
-[2023-02-25 01:18:33,804][15272] Component RolloutWorker_w7 stopped!
-[2023-02-25 01:18:33,811][15776] Stopping InferenceWorker_p0-w0...
-[2023-02-25 01:18:33,809][15272] Component RolloutWorker_w5 stopped!
-[2023-02-25 01:18:33,813][15776] Loop inference_proc0-0_evt_loop terminating...
-[2023-02-25 01:18:33,814][15788] Stopping RolloutWorker_w5...
-[2023-02-25 01:18:33,812][15272] Component InferenceWorker_p0-w0 stopped!
-[2023-02-25 01:18:33,815][15783] Stopping RolloutWorker_w2...
-[2023-02-25 01:18:33,815][15272] Component RolloutWorker_w2 stopped!
-[2023-02-25 01:18:33,823][15787] Stopping RolloutWorker_w7...
-[2023-02-25 01:18:33,825][15786] Stopping RolloutWorker_w4...
-[2023-02-25 01:18:33,825][15272] Component RolloutWorker_w4 stopped!
-[2023-02-25 01:18:33,816][15783] Loop rollout_proc2_evt_loop terminating...
-[2023-02-25 01:18:33,832][15787] Loop rollout_proc7_evt_loop terminating...
-[2023-02-25 01:18:33,824][15788] Loop rollout_proc5_evt_loop terminating...
-[2023-02-25 01:18:33,833][15272] Component RolloutWorker_w3 stopped!
-[2023-02-25 01:18:33,837][15784] Stopping RolloutWorker_w3...
-[2023-02-25 01:18:33,826][15786] Loop rollout_proc4_evt_loop terminating...
-[2023-02-25 01:18:33,839][15784] Loop rollout_proc3_evt_loop terminating...
-[2023-02-25 01:18:36,027][15762] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000030_122880.pth...
-[2023-02-25 01:18:36,126][15762] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000026_106496.pth
-[2023-02-25 01:18:36,138][15762] Saving new best policy, reward=2.960!
-[2023-02-25 01:18:36,295][15762] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000030_122880.pth...
-[2023-02-25 01:18:36,438][15272] Component LearnerWorker_p0 stopped!
-[2023-02-25 01:18:36,445][15272] Waiting for process learner_proc0 to stop...
-[2023-02-25 01:18:36,439][15762] Stopping LearnerWorker_p0...
-[2023-02-25 01:18:36,454][15762] Loop learner_proc0_evt_loop terminating...
-[2023-02-25 01:18:37,633][15272] Waiting for process inference_proc0-0 to join...
-[2023-02-25 01:18:37,635][15272] Waiting for process rollout_proc0 to join...
-[2023-02-25 01:18:37,640][15272] Waiting for process rollout_proc1 to join...
-[2023-02-25 01:18:37,646][15272] Waiting for process rollout_proc2 to join...
-[2023-02-25 01:18:37,649][15272] Waiting for process rollout_proc3 to join...
-[2023-02-25 01:18:37,651][15272] Waiting for process rollout_proc4 to join...
-[2023-02-25 01:18:37,654][15272] Waiting for process rollout_proc5 to join...
-[2023-02-25 01:18:37,659][15272] Waiting for process rollout_proc6 to join...
-[2023-02-25 01:18:37,660][15272] Waiting for process rollout_proc7 to join...
-[2023-02-25 01:18:37,661][15272] Batcher 0 profile tree view:
-batching: 0.0579, releasing_batches: 0.0089
-[2023-02-25 01:18:37,664][15272] InferenceWorker_p0-w0 profile tree view:
-update_model: 0.0253
-wait_policy: 0.0012
- wait_policy_total: 11.0478
-one_step: 0.0019
- handle_policy_step: 4.5369
- deserialize: 0.0550, stack: 0.0135, obs_to_device_normalize: 0.4126, forward: 3.5764, send_messages: 0.1127
- prepare_outputs: 0.2696
- to_cpu: 0.1420
-[2023-02-25 01:18:37,666][15272] Learner 0 profile tree view:
-misc: 0.0000, prepare_batch: 5.4236
-train: 0.7007
- epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0005, after_optimizer: 0.0076
- calculate_losses: 0.1402
- losses_init: 0.0000, forward_head: 0.1134, bptt_initial: 0.0170, tail: 0.0014, advantages_returns: 0.0011, losses: 0.0041
- bptt: 0.0027
- bptt_forward_core: 0.0026
- update: 0.5509
- clip: 0.0024
-[2023-02-25 01:18:37,668][15272] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0005
-[2023-02-25 01:18:37,669][15272] RolloutWorker_w7 profile tree view:
-wait_for_trajectories: 0.0011, enqueue_policy_requests: 1.0850, env_step: 4.9020, overhead: 0.1307, complete_rollouts: 0.0696
-save_policy_outputs: 0.1046
- split_output_tensors: 0.0584
-[2023-02-25 01:18:37,672][15272] Loop Runner_EvtLoop terminating...
-[2023-02-25 01:18:37,675][15272] Runner profile tree view:
-main_loop: 42.5253
-[2023-02-25 01:18:37,677][15272] Collected {0: 122880}, FPS: 192.6
-[2023-02-25 01:18:37,901][15272] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2023-02-25 01:18:37,904][15272] Overriding arg 'num_workers' with value 1 passed from command line
-[2023-02-25 01:18:37,906][15272] Adding new argument 'no_render'=True that is not in the saved config file!
-[2023-02-25 01:18:37,907][15272] Adding new argument 'save_video'=True that is not in the saved config file!
-[2023-02-25 01:18:37,910][15272] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2023-02-25 01:18:37,912][15272] Adding new argument 'video_name'=None that is not in the saved config file!
-[2023-02-25 01:18:37,913][15272] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2023-02-25 01:18:37,914][15272] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2023-02-25 01:18:37,915][15272] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2023-02-25 01:18:37,917][15272] Adding new argument 'hf_repository'='Antiraedus/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2023-02-25 01:18:37,922][15272] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2023-02-25 01:18:37,923][15272] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2023-02-25 01:18:37,924][15272] Adding new argument 'train_script'=None that is not in the saved config file!
-[2023-02-25 01:18:37,925][15272] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2023-02-25 01:18:37,927][15272] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2023-02-25 01:18:37,955][15272] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-25 01:18:37,957][15272] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-25 01:18:37,960][15272] RunningMeanStd input shape: (1,)
-[2023-02-25 01:18:37,976][15272] ConvEncoder: input_channels=3
-[2023-02-25 01:18:38,646][15272] Conv encoder output size: 512
-[2023-02-25 01:18:38,648][15272] Policy head output size: 512
-[2023-02-25 01:18:41,511][15272] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000030_122880.pth...
-[2023-02-25 01:18:44,517][15272] Num frames 100...
-[2023-02-25 01:18:44,869][15272] Num frames 200...
-[2023-02-25 01:18:45,105][15272] Num frames 300...
-[2023-02-25 01:18:45,351][15272] Num frames 400...
-[2023-02-25 01:18:45,562][15272] Num frames 500...
-[2023-02-25 01:18:45,696][15272] Avg episode rewards: #0: 7.440, true rewards: #0: 5.440
-[2023-02-25 01:18:45,703][15272] Avg episode reward: 7.440, avg true_objective: 5.440
-[2023-02-25 01:18:45,814][15272] Num frames 600...
-[2023-02-25 01:18:46,001][15272] Num frames 700...
-[2023-02-25 01:18:46,188][15272] Num frames 800...
-[2023-02-25 01:18:46,352][15272] Avg episode rewards: #0: 5.305, true rewards: #0: 4.305
-[2023-02-25 01:18:46,355][15272] Avg episode reward: 5.305, avg true_objective: 4.305
-[2023-02-25 01:18:46,509][15272] Num frames 900...
-[2023-02-25 01:18:46,711][15272] Num frames 1000...
-[2023-02-25 01:18:46,904][15272] Num frames 1100...
-[2023-02-25 01:18:47,068][15272] Num frames 1200...
-[2023-02-25 01:18:47,230][15272] Avg episode rewards: #0: 4.817, true rewards: #0: 4.150
-[2023-02-25 01:18:47,237][15272] Avg episode reward: 4.817, avg true_objective: 4.150
-[2023-02-25 01:18:47,369][15272] Num frames 1300...
-[2023-02-25 01:18:47,673][15272] Num frames 1400...
-[2023-02-25 01:18:47,850][15272] Num frames 1500...
-[2023-02-25 01:18:48,120][15272] Num frames 1600...
-[2023-02-25 01:18:48,301][15272] Avg episode rewards: #0: 4.573, true rewards: #0: 4.072
-[2023-02-25 01:18:48,303][15272] Avg episode reward: 4.573, avg true_objective: 4.072
-[2023-02-25 01:18:48,443][15272] Num frames 1700...
-[2023-02-25 01:18:48,671][15272] Num frames 1800...
-[2023-02-25 01:18:48,927][15272] Num frames 1900...
-[2023-02-25 01:18:49,152][15272] Num frames 2000...
-[2023-02-25 01:18:49,263][15272] Avg episode rewards: #0: 4.426, true rewards: #0: 4.026
-[2023-02-25 01:18:49,271][15272] Avg episode reward: 4.426, avg true_objective: 4.026
-[2023-02-25 01:18:49,479][15272] Num frames 2100...
-[2023-02-25 01:18:49,678][15272] Num frames 2200...
-[2023-02-25 01:18:49,860][15272] Num frames 2300...
-[2023-02-25 01:18:50,056][15272] Num frames 2400...
-[2023-02-25 01:18:50,173][15272] Avg episode rewards: #0: 4.382, true rewards: #0: 4.048
-[2023-02-25 01:18:50,175][15272] Avg episode reward: 4.382, avg true_objective: 4.048
-[2023-02-25 01:18:50,261][15272] Num frames 2500...
-[2023-02-25 01:18:50,385][15272] Num frames 2600...
-[2023-02-25 01:18:50,502][15272] Num frames 2700...
-[2023-02-25 01:18:50,628][15272] Num frames 2800...
-[2023-02-25 01:18:50,701][15272] Avg episode rewards: #0: 4.304, true rewards: #0: 4.019
-[2023-02-25 01:18:50,704][15272] Avg episode reward: 4.304, avg true_objective: 4.019
-[2023-02-25 01:18:50,813][15272] Num frames 2900...
-[2023-02-25 01:18:50,925][15272] Num frames 3000...
-[2023-02-25 01:18:51,048][15272] Num frames 3100...
-[2023-02-25 01:18:51,166][15272] Num frames 3200...
-[2023-02-25 01:18:51,257][15272] Avg episode rewards: #0: 4.411, true rewards: #0: 4.036
-[2023-02-25 01:18:51,262][15272] Avg episode reward: 4.411, avg true_objective: 4.036
-[2023-02-25 01:18:51,345][15272] Num frames 3300...
-[2023-02-25 01:18:51,463][15272] Num frames 3400...
-[2023-02-25 01:18:51,586][15272] Num frames 3500...
-[2023-02-25 01:18:51,711][15272] Num frames 3600...
-[2023-02-25 01:18:51,837][15272] Num frames 3700...
-[2023-02-25 01:18:51,960][15272] Num frames 3800...
-[2023-02-25 01:18:52,024][15272] Avg episode rewards: #0: 4.894, true rewards: #0: 4.228
-[2023-02-25 01:18:52,026][15272] Avg episode reward: 4.894, avg true_objective: 4.228
-[2023-02-25 01:18:52,137][15272] Num frames 3900...
-[2023-02-25 01:18:52,263][15272] Num frames 4000...
-[2023-02-25 01:18:52,383][15272] Num frames 4100...
-[2023-02-25 01:18:52,505][15272] Avg episode rewards: #0: 4.857, true rewards: #0: 4.157
-[2023-02-25 01:18:52,507][15272] Avg episode reward: 4.857, avg true_objective: 4.157
-[2023-02-25 01:19:15,633][15272] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+ bptt_forward_core: 0.0030
+ update: 1.1296
+ clip: 0.0035
+[2023-03-01 03:17:47,822][00674] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.6873, env_step: 2.6950, overhead: 0.0839, complete_rollouts: 0.0183
+save_policy_outputs: 0.0687
+ split_output_tensors: 0.0276
+[2023-03-01 03:17:47,824][00674] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.4166, env_step: 3.0030, overhead: 0.0586, complete_rollouts: 0.0292
+save_policy_outputs: 0.0462
+ split_output_tensors: 0.0194
+[2023-03-01 03:17:47,826][00674] Loop Runner_EvtLoop terminating...
+[2023-03-01 03:17:47,827][00674] Runner profile tree view:
+main_loop: 41.3926
+[2023-03-01 03:17:47,829][00674] Collected {0: 12288}, FPS: 296.9
+[2023-03-01 03:18:18,296][00674] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-03-01 03:18:18,298][00674] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-03-01 03:18:18,300][00674] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-03-01 03:18:18,303][00674] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-03-01 03:18:18,305][00674] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-03-01 03:18:18,306][00674] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-03-01 03:18:18,308][00674] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2023-03-01 03:18:18,311][00674] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-03-01 03:18:18,313][00674] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2023-03-01 03:18:18,314][00674] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2023-03-01 03:18:18,319][00674] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-03-01 03:18:18,320][00674] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-03-01 03:18:18,324][00674] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-03-01 03:18:18,326][00674] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-03-01 03:18:18,328][00674] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-03-01 03:18:18,352][00674] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-03-01 03:18:18,355][00674] RunningMeanStd input shape: (3, 72, 128)
+[2023-03-01 03:18:18,358][00674] RunningMeanStd input shape: (1,)
+[2023-03-01 03:18:18,378][00674] ConvEncoder: input_channels=3
+[2023-03-01 03:18:19,118][00674] Conv encoder output size: 512
+[2023-03-01 03:18:19,121][00674] Policy head output size: 512
+[2023-03-01 03:18:21,517][00674] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000003_12288.pth...
+[2023-03-01 03:18:22,782][00674] Num frames 100...
+[2023-03-01 03:18:22,916][00674] Num frames 200...
+[2023-03-01 03:18:23,034][00674] Num frames 300...
+[2023-03-01 03:18:23,158][00674] Num frames 400...
+[2023-03-01 03:18:23,314][00674] Avg episode rewards: #0: 6.800, true rewards: #0: 4.800
+[2023-03-01 03:18:23,317][00674] Avg episode reward: 6.800, avg true_objective: 4.800
+[2023-03-01 03:18:23,344][00674] Num frames 500...
+[2023-03-01 03:18:23,460][00674] Num frames 600...
+[2023-03-01 03:18:23,578][00674] Num frames 700...
+[2023-03-01 03:18:23,697][00674] Num frames 800...
+[2023-03-01 03:18:23,830][00674] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320
+[2023-03-01 03:18:23,831][00674] Avg episode reward: 5.320, avg true_objective: 4.320
+[2023-03-01 03:18:23,879][00674] Num frames 900...
+[2023-03-01 03:18:24,006][00674] Num frames 1000...
+[2023-03-01 03:18:24,121][00674] Num frames 1100...
+[2023-03-01 03:18:24,238][00674] Num frames 1200...
+[2023-03-01 03:18:24,346][00674] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160
+[2023-03-01 03:18:24,348][00674] Avg episode reward: 4.827, avg true_objective: 4.160
+[2023-03-01 03:18:24,411][00674] Num frames 1300...
+[2023-03-01 03:18:24,534][00674] Num frames 1400...
+[2023-03-01 03:18:24,653][00674] Num frames 1500...
+[2023-03-01 03:18:24,767][00674] Num frames 1600...
+[2023-03-01 03:18:24,936][00674] Avg episode rewards: #0: 4.990, true rewards: #0: 4.240
+[2023-03-01 03:18:24,937][00674] Avg episode reward: 4.990, avg true_objective: 4.240
+[2023-03-01 03:18:24,947][00674] Num frames 1700...
+[2023-03-01 03:18:25,063][00674] Num frames 1800...
+[2023-03-01 03:18:25,182][00674] Num frames 1900...
+[2023-03-01 03:18:25,297][00674] Num frames 2000...
+[2023-03-01 03:18:25,417][00674] Num frames 2100...
+[2023-03-01 03:18:25,549][00674] Avg episode rewards: #0: 5.088, true rewards: #0: 4.288
+[2023-03-01 03:18:25,551][00674] Avg episode reward: 5.088, avg true_objective: 4.288
+[2023-03-01 03:18:25,650][00674] Num frames 2200...
+[2023-03-01 03:18:25,814][00674] Num frames 2300...
+[2023-03-01 03:18:25,982][00674] Num frames 2400...
+[2023-03-01 03:18:26,152][00674] Num frames 2500...
+[2023-03-01 03:18:26,362][00674] Avg episode rewards: #0: 5.153, true rewards: #0: 4.320
+[2023-03-01 03:18:26,364][00674] Avg episode reward: 5.153, avg true_objective: 4.320
+[2023-03-01 03:18:26,383][00674] Num frames 2600...
+[2023-03-01 03:18:26,551][00674] Num frames 2700...
+[2023-03-01 03:18:26,721][00674] Num frames 2800...
+[2023-03-01 03:18:26,886][00674] Num frames 2900...
+[2023-03-01 03:18:27,067][00674] Num frames 3000...
+[2023-03-01 03:18:27,188][00674] Avg episode rewards: #0: 5.200, true rewards: #0: 4.343
+[2023-03-01 03:18:27,194][00674] Avg episode reward: 5.200, avg true_objective: 4.343
+[2023-03-01 03:18:27,299][00674] Num frames 3100...
+[2023-03-01 03:18:27,461][00674] Num frames 3200...
+[2023-03-01 03:18:27,625][00674] Num frames 3300...
+[2023-03-01 03:18:27,789][00674] Num frames 3400...
+[2023-03-01 03:18:27,885][00674] Avg episode rewards: #0: 5.030, true rewards: #0: 4.280
+[2023-03-01 03:18:27,887][00674] Avg episode reward: 5.030, avg true_objective: 4.280
+[2023-03-01 03:18:28,020][00674] Num frames 3500...
+[2023-03-01 03:18:28,203][00674] Num frames 3600...
+[2023-03-01 03:18:28,389][00674] Num frames 3700...
+[2023-03-01 03:18:28,562][00674] Num frames 3800...
+[2023-03-01 03:18:28,637][00674] Avg episode rewards: #0: 4.898, true rewards: #0: 4.231
+[2023-03-01 03:18:28,639][00674] Avg episode reward: 4.898, avg true_objective: 4.231
+[2023-03-01 03:18:28,800][00674] Num frames 3900...
+[2023-03-01 03:18:28,969][00674] Num frames 4000...
+[2023-03-01 03:18:29,093][00674] Num frames 4100...
+[2023-03-01 03:18:29,218][00674] Num frames 4200...
+[2023-03-01 03:18:29,339][00674] Avg episode rewards: #0: 4.956, true rewards: #0: 4.256
+[2023-03-01 03:18:29,341][00674] Avg episode reward: 4.956, avg true_objective: 4.256
+[2023-03-01 03:18:51,695][00674] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2023-03-01 03:20:42,520][00674] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-03-01 03:20:42,522][00674] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-03-01 03:20:42,523][00674] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-03-01 03:20:42,525][00674] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-03-01 03:20:42,526][00674] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-03-01 03:20:42,528][00674] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-03-01 03:20:42,530][00674] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2023-03-01 03:20:42,532][00674] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-03-01 03:20:42,533][00674] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2023-03-01 03:20:42,534][00674] Adding new argument 'hf_repository'='Antiraedus/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2023-03-01 03:20:42,535][00674] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-03-01 03:20:42,536][00674] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-03-01 03:20:42,538][00674] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-03-01 03:20:42,539][00674] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-03-01 03:20:42,540][00674] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-03-01 03:20:42,564][00674] RunningMeanStd input shape: (3, 72, 128)
+[2023-03-01 03:20:42,566][00674] RunningMeanStd input shape: (1,)
+[2023-03-01 03:20:42,581][00674] ConvEncoder: input_channels=3
+[2023-03-01 03:20:42,623][00674] Conv encoder output size: 512
+[2023-03-01 03:20:42,625][00674] Policy head output size: 512
+[2023-03-01 03:20:42,644][00674] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000003_12288.pth...
+[2023-03-01 03:20:43,085][00674] Num frames 100...
+[2023-03-01 03:20:43,205][00674] Num frames 200...
+[2023-03-01 03:20:43,330][00674] Num frames 300...
+[2023-03-01 03:20:43,462][00674] Num frames 400...
+[2023-03-01 03:20:43,575][00674] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
+[2023-03-01 03:20:43,576][00674] Avg episode reward: 5.480, avg true_objective: 4.480
+[2023-03-01 03:20:43,642][00674] Num frames 500...
+[2023-03-01 03:20:43,754][00674] Num frames 600...
+[2023-03-01 03:20:43,866][00674] Num frames 700...
+[2023-03-01 03:20:43,986][00674] Num frames 800...
+[2023-03-01 03:20:44,079][00674] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
+[2023-03-01 03:20:44,081][00674] Avg episode reward: 4.660, avg true_objective: 4.160
+[2023-03-01 03:20:44,170][00674] Num frames 900...
+[2023-03-01 03:20:44,341][00674] Num frames 1000...
+[2023-03-01 03:20:44,505][00674] Num frames 1100...
+[2023-03-01 03:20:44,667][00674] Num frames 1200...
+[2023-03-01 03:20:44,751][00674] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053
+[2023-03-01 03:20:44,757][00674] Avg episode reward: 4.387, avg true_objective: 4.053
+[2023-03-01 03:20:44,896][00674] Num frames 1300...
+[2023-03-01 03:20:45,060][00674] Num frames 1400...
+[2023-03-01 03:20:45,224][00674] Num frames 1500...
+[2023-03-01 03:20:45,388][00674] Num frames 1600...
+[2023-03-01 03:20:45,444][00674] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000
+[2023-03-01 03:20:45,450][00674] Avg episode reward: 4.250, avg true_objective: 4.000
+[2023-03-01 03:20:45,622][00674] Num frames 1700...
+[2023-03-01 03:20:45,796][00674] Num frames 1800...
+[2023-03-01 03:20:45,965][00674] Num frames 1900...
+[2023-03-01 03:20:46,173][00674] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968
+[2023-03-01 03:20:46,176][00674] Avg episode reward: 4.168, avg true_objective: 3.968
+[2023-03-01 03:20:46,213][00674] Num frames 2000...
+[2023-03-01 03:20:46,385][00674] Num frames 2100...
+[2023-03-01 03:20:46,557][00674] Num frames 2200...
+[2023-03-01 03:20:46,729][00674] Num frames 2300...
+[2023-03-01 03:20:46,906][00674] Num frames 2400...
+[2023-03-01 03:20:46,962][00674] Avg episode rewards: #0: 4.333, true rewards: #0: 4.000
+[2023-03-01 03:20:46,964][00674] Avg episode reward: 4.333, avg true_objective: 4.000
+[2023-03-01 03:20:47,133][00674] Num frames 2500...
+[2023-03-01 03:20:47,307][00674] Num frames 2600...
+[2023-03-01 03:20:47,469][00674] Num frames 2700...
+[2023-03-01 03:20:47,639][00674] Num frames 2800...
+[2023-03-01 03:20:47,757][00674] Avg episode rewards: #0: 4.497, true rewards: #0: 4.069
+[2023-03-01 03:20:47,758][00674] Avg episode reward: 4.497, avg true_objective: 4.069
+[2023-03-01 03:20:47,822][00674] Num frames 2900...
+[2023-03-01 03:20:47,935][00674] Num frames 3000...
+[2023-03-01 03:20:48,055][00674] Num frames 3100...
+[2023-03-01 03:20:48,173][00674] Num frames 3200...
+[2023-03-01 03:20:48,342][00674] Avg episode rewards: #0: 4.620, true rewards: #0: 4.120
+[2023-03-01 03:20:48,344][00674] Avg episode reward: 4.620, avg true_objective: 4.120
+[2023-03-01 03:20:48,353][00674] Num frames 3300...
+[2023-03-01 03:20:48,467][00674] Num frames 3400...
+[2023-03-01 03:20:48,588][00674] Num frames 3500...
+[2023-03-01 03:20:48,702][00674] Num frames 3600...
+[2023-03-01 03:20:48,818][00674] Num frames 3700...
+[2023-03-01 03:20:48,934][00674] Num frames 3800...
+[2023-03-01 03:20:49,002][00674] Avg episode rewards: #0: 4.898, true rewards: #0: 4.231
+[2023-03-01 03:20:49,004][00674] Avg episode reward: 4.898, avg true_objective: 4.231
+[2023-03-01 03:20:49,108][00674] Num frames 3900...
+[2023-03-01 03:20:49,227][00674] Num frames 4000...
+[2023-03-01 03:20:49,347][00674] Num frames 4100...
+[2023-03-01 03:20:49,509][00674] Avg episode rewards: #0: 4.792, true rewards: #0: 4.192
+[2023-03-01 03:20:49,511][00674] Avg episode reward: 4.792, avg true_objective: 4.192
+[2023-03-01 03:21:02,994][00674] Replay video saved to /content/train_dir/default_experiment/replay.mp4!