[2024-04-25 12:11:47,233][18010] Saving configuration to /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/config.json... [2024-04-25 12:11:47,250][18010] Rollout worker 0 uses device cpu [2024-04-25 12:11:47,251][18010] Rollout worker 1 uses device cpu [2024-04-25 12:11:47,251][18010] Rollout worker 2 uses device cpu [2024-04-25 12:11:47,251][18010] Rollout worker 3 uses device cpu [2024-04-25 12:11:47,251][18010] Rollout worker 4 uses device cpu [2024-04-25 12:11:47,251][18010] Rollout worker 5 uses device cpu [2024-04-25 12:11:47,251][18010] Rollout worker 6 uses device cpu [2024-04-25 12:11:47,251][18010] Rollout worker 7 uses device cpu [2024-04-25 12:11:47,251][18010] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1 [2024-04-25 12:11:47,288][18010] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-04-25 12:11:47,288][18010] InferenceWorker_p0-w0: min num requests: 2 [2024-04-25 12:11:47,314][18010] Starting all processes... [2024-04-25 12:11:47,314][18010] Starting process learner_proc0 [2024-04-25 12:11:47,316][18010] Starting all processes... 
[2024-04-25 12:11:47,319][18010] Starting process inference_proc0-0 [2024-04-25 12:11:47,319][18010] Starting process rollout_proc0 [2024-04-25 12:11:47,320][18010] Starting process rollout_proc1 [2024-04-25 12:11:47,320][18010] Starting process rollout_proc2 [2024-04-25 12:11:47,320][18010] Starting process rollout_proc3 [2024-04-25 12:11:47,320][18010] Starting process rollout_proc4 [2024-04-25 12:11:47,320][18010] Starting process rollout_proc5 [2024-04-25 12:11:47,320][18010] Starting process rollout_proc6 [2024-04-25 12:11:47,322][18010] Starting process rollout_proc7 [2024-04-25 12:11:49,472][18188] Worker 3 uses CPU cores [6, 7] [2024-04-25 12:11:49,481][18187] Worker 6 uses CPU cores [12, 13] [2024-04-25 12:11:49,609][18189] Worker 7 uses CPU cores [14, 15] [2024-04-25 12:11:49,675][18184] Worker 1 uses CPU cores [2, 3] [2024-04-25 12:11:49,693][18185] Worker 2 uses CPU cores [4, 5] [2024-04-25 12:11:49,708][18183] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-04-25 12:11:49,708][18183] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-04-25 12:11:49,751][18183] Num visible devices: 1 [2024-04-25 12:11:49,976][18186] Worker 4 uses CPU cores [8, 9] [2024-04-25 12:11:50,022][18190] Worker 5 uses CPU cores [10, 11] [2024-04-25 12:11:50,166][18182] Worker 0 uses CPU cores [0, 1] [2024-04-25 12:11:50,192][18169] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-04-25 12:11:50,192][18169] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-04-25 12:11:50,229][18169] Num visible devices: 1 [2024-04-25 12:11:50,257][18169] Starting seed is not provided [2024-04-25 12:11:50,258][18169] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-04-25 12:11:50,258][18169] Initializing actor-critic model on device cuda:0 [2024-04-25 12:11:50,258][18169] RunningMeanStd input shape: (27,) [2024-04-25 12:11:50,264][18169] RunningMeanStd input shape: 
(1,) [2024-04-25 12:11:50,332][18169] Created Actor Critic model with architecture:
[2024-04-25 12:11:50,332][18169] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): MlpEncoder(
        (mlp_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=Tanh)
          (2): RecursiveScriptModule(original_name=Linear)
          (3): RecursiveScriptModule(original_name=Tanh)
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=64, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
    (distribution_linear): Linear(in_features=64, out_features=8, bias=True)
  )
)
[2024-04-25 12:11:50,440][18169] Using optimizer [2024-04-25 12:11:50,776][18169] No checkpoints found [2024-04-25 12:11:50,776][18169] Did not load from checkpoint, starting from scratch! [2024-04-25 12:11:50,777][18169] Initialized policy 0 weights for model version 0 [2024-04-25 12:11:50,780][18169] LearnerWorker_p0 finished initialization! [2024-04-25 12:11:50,780][18169] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-04-25 12:11:50,838][18183] RunningMeanStd input shape: (27,) [2024-04-25 12:11:50,839][18183] RunningMeanStd input shape: (1,) [2024-04-25 12:11:50,901][18010] Inference worker 0-0 is ready! [2024-04-25 12:11:50,901][18010] All inference workers are ready! Signal rollout workers to start! [2024-04-25 12:11:50,994][18184] Decorrelating experience for 0 frames... [2024-04-25 12:11:50,994][18190] Decorrelating experience for 0 frames... [2024-04-25 12:11:50,994][18186] Decorrelating experience for 0 frames... 
[2024-04-25 12:11:50,994][18184] Decorrelating experience for 64 frames... [2024-04-25 12:11:50,994][18190] Decorrelating experience for 64 frames... [2024-04-25 12:11:50,994][18185] Decorrelating experience for 0 frames... [2024-04-25 12:11:50,994][18188] Decorrelating experience for 0 frames... [2024-04-25 12:11:50,995][18186] Decorrelating experience for 64 frames... [2024-04-25 12:11:50,994][18182] Decorrelating experience for 0 frames... [2024-04-25 12:11:50,994][18189] Decorrelating experience for 0 frames... [2024-04-25 12:11:50,995][18185] Decorrelating experience for 64 frames... [2024-04-25 12:11:50,995][18188] Decorrelating experience for 64 frames... [2024-04-25 12:11:50,995][18189] Decorrelating experience for 64 frames... [2024-04-25 12:11:50,995][18182] Decorrelating experience for 64 frames... [2024-04-25 12:11:51,014][18184] Decorrelating experience for 128 frames... [2024-04-25 12:11:51,014][18190] Decorrelating experience for 128 frames... [2024-04-25 12:11:51,015][18189] Decorrelating experience for 128 frames... [2024-04-25 12:11:51,015][18185] Decorrelating experience for 128 frames... [2024-04-25 12:11:51,015][18188] Decorrelating experience for 128 frames... [2024-04-25 12:11:51,015][18182] Decorrelating experience for 128 frames... [2024-04-25 12:11:51,017][18186] Decorrelating experience for 128 frames... [2024-04-25 12:11:51,030][18187] Decorrelating experience for 0 frames... [2024-04-25 12:11:51,031][18187] Decorrelating experience for 64 frames... [2024-04-25 12:11:51,061][18187] Decorrelating experience for 128 frames... [2024-04-25 12:11:51,065][18182] Decorrelating experience for 192 frames... [2024-04-25 12:11:51,065][18184] Decorrelating experience for 192 frames... [2024-04-25 12:11:51,065][18188] Decorrelating experience for 192 frames... [2024-04-25 12:11:51,066][18189] Decorrelating experience for 192 frames... [2024-04-25 12:11:51,066][18186] Decorrelating experience for 192 frames... 
[2024-04-25 12:11:51,070][18185] Decorrelating experience for 192 frames... [2024-04-25 12:11:51,076][18190] Decorrelating experience for 192 frames... [2024-04-25 12:11:51,105][18187] Decorrelating experience for 192 frames... [2024-04-25 12:11:51,138][18188] Decorrelating experience for 256 frames... [2024-04-25 12:11:51,142][18189] Decorrelating experience for 256 frames... [2024-04-25 12:11:51,143][18186] Decorrelating experience for 256 frames... [2024-04-25 12:11:51,145][18184] Decorrelating experience for 256 frames... [2024-04-25 12:11:51,146][18185] Decorrelating experience for 256 frames... [2024-04-25 12:11:51,153][18190] Decorrelating experience for 256 frames... [2024-04-25 12:11:51,166][18182] Decorrelating experience for 256 frames... [2024-04-25 12:11:51,179][18187] Decorrelating experience for 256 frames... [2024-04-25 12:11:51,214][18188] Decorrelating experience for 320 frames... [2024-04-25 12:11:51,222][18189] Decorrelating experience for 320 frames... [2024-04-25 12:11:51,222][18184] Decorrelating experience for 320 frames... [2024-04-25 12:11:51,225][18185] Decorrelating experience for 320 frames... [2024-04-25 12:11:51,231][18190] Decorrelating experience for 320 frames... [2024-04-25 12:11:51,246][18182] Decorrelating experience for 320 frames... [2024-04-25 12:11:51,255][18187] Decorrelating experience for 320 frames... [2024-04-25 12:11:51,269][18186] Decorrelating experience for 320 frames... [2024-04-25 12:11:51,308][18188] Decorrelating experience for 384 frames... [2024-04-25 12:11:51,319][18184] Decorrelating experience for 384 frames... [2024-04-25 12:11:51,320][18185] Decorrelating experience for 384 frames... [2024-04-25 12:11:51,321][18189] Decorrelating experience for 384 frames... [2024-04-25 12:11:51,326][18190] Decorrelating experience for 384 frames... [2024-04-25 12:11:51,346][18182] Decorrelating experience for 384 frames... [2024-04-25 12:11:51,368][18186] Decorrelating experience for 384 frames... 
[2024-04-25 12:11:51,392][18187] Decorrelating experience for 384 frames... [2024-04-25 12:11:51,420][18188] Decorrelating experience for 448 frames... [2024-04-25 12:11:51,432][18184] Decorrelating experience for 448 frames... [2024-04-25 12:11:51,435][18189] Decorrelating experience for 448 frames... [2024-04-25 12:11:51,436][18185] Decorrelating experience for 448 frames... [2024-04-25 12:11:51,438][18190] Decorrelating experience for 448 frames... [2024-04-25 12:11:51,464][18182] Decorrelating experience for 448 frames... [2024-04-25 12:11:51,490][18186] Decorrelating experience for 448 frames... [2024-04-25 12:11:51,542][18187] Decorrelating experience for 448 frames... [2024-04-25 12:11:53,329][18169] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000000 [2024-04-25 12:11:53,677][18010] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4096. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-04-25 12:11:53,678][18010] Avg episode reward: [(0, '-121.104')] [2024-04-25 12:11:56,699][18183] Updated weights for policy 0, policy_version 80 (0.0007) [2024-04-25 12:11:58,678][18010] Fps is (10 sec: 11468.6, 60 sec: 11468.6, 300 sec: 11468.6). Total num frames: 61440. Throughput: 0: 8898.2. Samples: 44492. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:11:58,678][18010] Avg episode reward: [(0, '-308.244')] [2024-04-25 12:11:58,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000000120_61440.pth... [2024-04-25 12:12:00,253][18183] Updated weights for policy 0, policy_version 160 (0.0008) [2024-04-25 12:12:03,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11468.7, 300 sec: 11468.7). Total num frames: 118784. Throughput: 0: 11237.9. Samples: 112380. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:12:03,678][18010] Avg episode reward: [(0, '-163.524')] [2024-04-25 12:12:03,678][18169] Saving new best policy, reward=-163.524! 
[2024-04-25 12:12:03,950][18183] Updated weights for policy 0, policy_version 240 (0.0008) [2024-04-25 12:12:07,281][18010] Heartbeat connected on Batcher_0 [2024-04-25 12:12:07,285][18010] Heartbeat connected on LearnerWorker_p0 [2024-04-25 12:12:07,293][18010] Heartbeat connected on RolloutWorker_w0 [2024-04-25 12:12:07,294][18010] Heartbeat connected on InferenceWorker_p0-w0 [2024-04-25 12:12:07,295][18010] Heartbeat connected on RolloutWorker_w1 [2024-04-25 12:12:07,298][18010] Heartbeat connected on RolloutWorker_w2 [2024-04-25 12:12:07,301][18010] Heartbeat connected on RolloutWorker_w3 [2024-04-25 12:12:07,304][18010] Heartbeat connected on RolloutWorker_w4 [2024-04-25 12:12:07,307][18010] Heartbeat connected on RolloutWorker_w5 [2024-04-25 12:12:07,310][18010] Heartbeat connected on RolloutWorker_w6 [2024-04-25 12:12:07,313][18010] Heartbeat connected on RolloutWorker_w7 [2024-04-25 12:12:07,814][18183] Updated weights for policy 0, policy_version 320 (0.0007) [2024-04-25 12:12:08,678][18010] Fps is (10 sec: 11059.3, 60 sec: 11195.7, 300 sec: 11195.7). Total num frames: 172032. Throughput: 0: 9562.1. Samples: 143432. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:12:08,678][18010] Avg episode reward: [(0, '-132.096')] [2024-04-25 12:12:08,678][18169] Saving new best policy, reward=-132.096! [2024-04-25 12:12:11,517][18183] Updated weights for policy 0, policy_version 400 (0.0008) [2024-04-25 12:12:13,678][18010] Fps is (10 sec: 10649.6, 60 sec: 11059.1, 300 sec: 11059.1). Total num frames: 225280. Throughput: 0: 10448.4. Samples: 208968. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:12:13,678][18010] Avg episode reward: [(0, '-76.525')] [2024-04-25 12:12:13,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000000440_225280.pth... [2024-04-25 12:12:13,684][18169] Saving new best policy, reward=-76.525! 
[2024-04-25 12:12:15,231][18183] Updated weights for policy 0, policy_version 480 (0.0008) [2024-04-25 12:12:18,677][18010] Fps is (10 sec: 11059.2, 60 sec: 11141.1, 300 sec: 11141.1). Total num frames: 282624. Throughput: 0: 11117.9. Samples: 277948. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:12:18,678][18010] Avg episode reward: [(0, '-20.892')] [2024-04-25 12:12:18,678][18169] Saving new best policy, reward=-20.892! [2024-04-25 12:12:18,804][18183] Updated weights for policy 0, policy_version 560 (0.0007) [2024-04-25 12:12:22,271][18183] Updated weights for policy 0, policy_version 640 (0.0007) [2024-04-25 12:12:23,678][18010] Fps is (10 sec: 11878.5, 60 sec: 11332.3, 300 sec: 11332.3). Total num frames: 344064. Throughput: 0: 10378.9. Samples: 311368. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:12:23,678][18010] Avg episode reward: [(0, '3.339')] [2024-04-25 12:12:23,678][18169] Saving new best policy, reward=3.339! [2024-04-25 12:12:25,808][18183] Updated weights for policy 0, policy_version 720 (0.0009) [2024-04-25 12:12:28,678][18010] Fps is (10 sec: 11468.4, 60 sec: 11234.6, 300 sec: 11234.6). Total num frames: 397312. Throughput: 0: 10885.6. Samples: 381000. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-04-25 12:12:28,678][18010] Avg episode reward: [(0, '75.000')] [2024-04-25 12:12:28,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000000776_397312.pth... [2024-04-25 12:12:28,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000000120_61440.pth [2024-04-25 12:12:28,684][18169] Saving new best policy, reward=75.000! [2024-04-25 12:12:29,492][18183] Updated weights for policy 0, policy_version 800 (0.0007) [2024-04-25 12:12:33,095][18183] Updated weights for policy 0, policy_version 880 (0.0008) [2024-04-25 12:12:33,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11264.0, 300 sec: 11264.0). 
Total num frames: 454656. Throughput: 0: 11223.8. Samples: 448952. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:12:33,678][18010] Avg episode reward: [(0, '211.768')] [2024-04-25 12:12:33,678][18169] Saving new best policy, reward=211.768! [2024-04-25 12:12:36,697][18183] Updated weights for policy 0, policy_version 960 (0.0008) [2024-04-25 12:12:38,678][18010] Fps is (10 sec: 11469.1, 60 sec: 11286.7, 300 sec: 11286.7). Total num frames: 512000. Throughput: 0: 10739.7. Samples: 483288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:12:38,678][18010] Avg episode reward: [(0, '364.076')] [2024-04-25 12:12:38,678][18169] Saving new best policy, reward=364.076! [2024-04-25 12:12:40,380][18183] Updated weights for policy 0, policy_version 1040 (0.0007) [2024-04-25 12:12:43,678][18010] Fps is (10 sec: 11059.1, 60 sec: 11223.0, 300 sec: 11223.0). Total num frames: 565248. Throughput: 0: 11208.0. Samples: 548852. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:12:43,678][18010] Avg episode reward: [(0, '537.335')] [2024-04-25 12:12:43,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000001104_565248.pth... [2024-04-25 12:12:43,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000000440_225280.pth [2024-04-25 12:12:43,686][18169] Saving new best policy, reward=537.335! [2024-04-25 12:12:44,271][18183] Updated weights for policy 0, policy_version 1120 (0.0008) [2024-04-25 12:12:47,684][18183] Updated weights for policy 0, policy_version 1200 (0.0007) [2024-04-25 12:12:48,678][18010] Fps is (10 sec: 11058.7, 60 sec: 11245.3, 300 sec: 11245.3). Total num frames: 622592. Throughput: 0: 11228.7. Samples: 617676. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:12:48,678][18010] Avg episode reward: [(0, '687.033')] [2024-04-25 12:12:48,679][18169] Saving new best policy, reward=687.033! 
[2024-04-25 12:12:51,086][18183] Updated weights for policy 0, policy_version 1280 (0.0006) [2024-04-25 12:12:53,678][18010] Fps is (10 sec: 11878.5, 60 sec: 11332.3, 300 sec: 11332.3). Total num frames: 684032. Throughput: 0: 11343.7. Samples: 653900. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:12:53,678][18010] Avg episode reward: [(0, '804.876')] [2024-04-25 12:12:53,678][18169] Saving new best policy, reward=804.876! [2024-04-25 12:12:54,620][18183] Updated weights for policy 0, policy_version 1360 (0.0008) [2024-04-25 12:12:58,170][18183] Updated weights for policy 0, policy_version 1440 (0.0007) [2024-04-25 12:12:58,678][18010] Fps is (10 sec: 11878.9, 60 sec: 11332.3, 300 sec: 11342.8). Total num frames: 741376. Throughput: 0: 11450.4. Samples: 724236. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:12:58,678][18010] Avg episode reward: [(0, '928.015')] [2024-04-25 12:12:58,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000001448_741376.pth... [2024-04-25 12:12:58,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000000776_397312.pth [2024-04-25 12:12:58,686][18169] Saving new best policy, reward=928.015! [2024-04-25 12:13:01,759][18183] Updated weights for policy 0, policy_version 1520 (0.0007) [2024-04-25 12:13:03,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11332.3, 300 sec: 11351.8). Total num frames: 798720. Throughput: 0: 11417.9. Samples: 791752. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:13:03,678][18010] Avg episode reward: [(0, '1025.920')] [2024-04-25 12:13:03,678][18169] Saving new best policy, reward=1025.920! [2024-04-25 12:13:05,237][18183] Updated weights for policy 0, policy_version 1600 (0.0008) [2024-04-25 12:13:08,678][18010] Fps is (10 sec: 11468.9, 60 sec: 11400.5, 300 sec: 11359.6). Total num frames: 856064. Throughput: 0: 11467.2. Samples: 827392. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:13:08,678][18010] Avg episode reward: [(0, '1189.863')] [2024-04-25 12:13:08,678][18169] Saving new best policy, reward=1189.863! [2024-04-25 12:13:08,808][18183] Updated weights for policy 0, policy_version 1680 (0.0007) [2024-04-25 12:13:12,440][18183] Updated weights for policy 0, policy_version 1760 (0.0007) [2024-04-25 12:13:13,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11468.8, 300 sec: 11366.4). Total num frames: 913408. Throughput: 0: 11441.0. Samples: 895844. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:13:13,678][18010] Avg episode reward: [(0, '1406.626')] [2024-04-25 12:13:13,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000001784_913408.pth... [2024-04-25 12:13:13,687][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000001104_565248.pth [2024-04-25 12:13:13,687][18169] Saving new best policy, reward=1406.626! [2024-04-25 12:13:16,101][18183] Updated weights for policy 0, policy_version 1840 (0.0007) [2024-04-25 12:13:18,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 11372.4). Total num frames: 970752. Throughput: 0: 11414.9. Samples: 962624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:13:18,678][18010] Avg episode reward: [(0, '1587.237')] [2024-04-25 12:13:18,679][18169] Saving new best policy, reward=1587.237! [2024-04-25 12:13:19,720][18183] Updated weights for policy 0, policy_version 1920 (0.0006) [2024-04-25 12:13:23,245][18183] Updated weights for policy 0, policy_version 2000 (0.0007) [2024-04-25 12:13:23,678][18010] Fps is (10 sec: 11468.9, 60 sec: 11400.5, 300 sec: 11377.8). Total num frames: 1028096. Throughput: 0: 11415.1. Samples: 996968. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:13:23,678][18010] Avg episode reward: [(0, '1807.736')] [2024-04-25 12:13:23,678][18169] Saving new best policy, reward=1807.736! [2024-04-25 12:13:26,750][18183] Updated weights for policy 0, policy_version 2080 (0.0007) [2024-04-25 12:13:28,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 11382.6). Total num frames: 1085440. Throughput: 0: 11519.2. Samples: 1067216. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-04-25 12:13:28,678][18010] Avg episode reward: [(0, '2055.129')] [2024-04-25 12:13:28,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000002120_1085440.pth... [2024-04-25 12:13:28,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000001448_741376.pth [2024-04-25 12:13:28,684][18169] Saving new best policy, reward=2055.129! [2024-04-25 12:13:30,398][18183] Updated weights for policy 0, policy_version 2160 (0.0008) [2024-04-25 12:13:33,677][18010] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 11386.9). Total num frames: 1142784. Throughput: 0: 11489.6. Samples: 1134704. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:13:33,678][18010] Avg episode reward: [(0, '2186.135')] [2024-04-25 12:13:33,678][18169] Saving new best policy, reward=2186.135! [2024-04-25 12:13:33,914][18183] Updated weights for policy 0, policy_version 2240 (0.0008) [2024-04-25 12:13:37,648][18183] Updated weights for policy 0, policy_version 2320 (0.0008) [2024-04-25 12:13:38,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11400.5, 300 sec: 11351.8). Total num frames: 1196032. Throughput: 0: 11470.3. Samples: 1170064. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:13:38,678][18010] Avg episode reward: [(0, '2408.001')] [2024-04-25 12:13:38,678][18169] Saving new best policy, reward=2408.001! 
[2024-04-25 12:13:41,237][18183] Updated weights for policy 0, policy_version 2400 (0.0008) [2024-04-25 12:13:43,678][18010] Fps is (10 sec: 11059.1, 60 sec: 11468.8, 300 sec: 11357.1). Total num frames: 1253376. Throughput: 0: 11391.4. Samples: 1236848. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:13:43,678][18010] Avg episode reward: [(0, '2600.142')] [2024-04-25 12:13:43,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000002448_1253376.pth... [2024-04-25 12:13:43,683][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000001784_913408.pth [2024-04-25 12:13:43,684][18169] Saving new best policy, reward=2600.142! [2024-04-25 12:13:44,814][18183] Updated weights for policy 0, policy_version 2480 (0.0007) [2024-04-25 12:13:48,445][18183] Updated weights for policy 0, policy_version 2560 (0.0006) [2024-04-25 12:13:48,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11468.9, 300 sec: 11361.9). Total num frames: 1310720. Throughput: 0: 11405.9. Samples: 1305020. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:13:48,678][18010] Avg episode reward: [(0, '2588.456')] [2024-04-25 12:13:52,094][18183] Updated weights for policy 0, policy_version 2640 (0.0008) [2024-04-25 12:13:53,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11400.5, 300 sec: 11366.4). Total num frames: 1368064. Throughput: 0: 11355.4. Samples: 1338384. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:13:53,678][18010] Avg episode reward: [(0, '2755.965')] [2024-04-25 12:13:53,678][18169] Saving new best policy, reward=2755.965! [2024-04-25 12:13:55,883][18183] Updated weights for policy 0, policy_version 2720 (0.0007) [2024-04-25 12:13:58,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11332.3, 300 sec: 11337.7). Total num frames: 1421312. Throughput: 0: 11304.5. Samples: 1404548. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:13:58,678][18010] Avg episode reward: [(0, '2868.845')] [2024-04-25 12:13:58,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000002776_1421312.pth... [2024-04-25 12:13:58,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000002120_1085440.pth [2024-04-25 12:13:58,685][18169] Saving new best policy, reward=2868.845! [2024-04-25 12:13:59,641][18183] Updated weights for policy 0, policy_version 2800 (0.0008) [2024-04-25 12:14:03,023][18183] Updated weights for policy 0, policy_version 2880 (0.0007) [2024-04-25 12:14:03,679][18010] Fps is (10 sec: 11058.2, 60 sec: 11332.1, 300 sec: 11342.7). Total num frames: 1478656. Throughput: 0: 11366.0. Samples: 1474104. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:14:03,679][18010] Avg episode reward: [(0, '3015.231')] [2024-04-25 12:14:03,680][18169] Saving new best policy, reward=3015.231! [2024-04-25 12:14:06,497][18183] Updated weights for policy 0, policy_version 2960 (0.0008) [2024-04-25 12:14:08,678][18010] Fps is (10 sec: 11878.5, 60 sec: 11400.5, 300 sec: 11377.8). Total num frames: 1540096. Throughput: 0: 11379.0. Samples: 1509024. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-04-25 12:14:08,678][18010] Avg episode reward: [(0, '3148.293')] [2024-04-25 12:14:08,678][18169] Saving new best policy, reward=3148.293! [2024-04-25 12:14:09,958][18183] Updated weights for policy 0, policy_version 3040 (0.0009) [2024-04-25 12:14:13,391][18183] Updated weights for policy 0, policy_version 3120 (0.0008) [2024-04-25 12:14:13,678][18010] Fps is (10 sec: 11879.1, 60 sec: 11400.5, 300 sec: 11381.0). Total num frames: 1597440. Throughput: 0: 11414.1. Samples: 1580852. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:14:13,678][18010] Avg episode reward: [(0, '3094.504')] [2024-04-25 12:14:13,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000003120_1597440.pth... [2024-04-25 12:14:13,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000002448_1253376.pth [2024-04-25 12:14:16,859][18183] Updated weights for policy 0, policy_version 3200 (0.0008) [2024-04-25 12:14:18,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11400.5, 300 sec: 11384.1). Total num frames: 1654784. Throughput: 0: 11469.9. Samples: 1650852. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:14:18,678][18010] Avg episode reward: [(0, '3108.484')] [2024-04-25 12:14:20,455][18183] Updated weights for policy 0, policy_version 3280 (0.0008) [2024-04-25 12:14:23,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11468.7, 300 sec: 11414.2). Total num frames: 1716224. Throughput: 0: 11459.7. Samples: 1685756. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:14:23,678][18010] Avg episode reward: [(0, '3123.165')] [2024-04-25 12:14:23,792][18183] Updated weights for policy 0, policy_version 3360 (0.0007) [2024-04-25 12:14:27,278][18183] Updated weights for policy 0, policy_version 3440 (0.0006) [2024-04-25 12:14:28,677][18010] Fps is (10 sec: 11878.5, 60 sec: 11468.8, 300 sec: 11415.9). Total num frames: 1773568. Throughput: 0: 11564.6. Samples: 1757256. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:14:28,678][18010] Avg episode reward: [(0, '3275.058')] [2024-04-25 12:14:28,691][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000003472_1777664.pth... [2024-04-25 12:14:28,693][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000002776_1421312.pth [2024-04-25 12:14:28,693][18169] Saving new best policy, reward=3275.058! 
[2024-04-25 12:14:30,832][18183] Updated weights for policy 0, policy_version 3520 (0.0008) [2024-04-25 12:14:33,678][18010] Fps is (10 sec: 11878.8, 60 sec: 11537.1, 300 sec: 11443.2). Total num frames: 1835008. Throughput: 0: 11595.7. Samples: 1826824. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:14:33,678][18010] Avg episode reward: [(0, '3335.740')] [2024-04-25 12:14:33,678][18169] Saving new best policy, reward=3335.740! [2024-04-25 12:14:34,384][18183] Updated weights for policy 0, policy_version 3600 (0.0009) [2024-04-25 12:14:37,809][18183] Updated weights for policy 0, policy_version 3680 (0.0007) [2024-04-25 12:14:38,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11605.3, 300 sec: 11444.0). Total num frames: 1892352. Throughput: 0: 11632.3. Samples: 1861836. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:14:38,678][18010] Avg episode reward: [(0, '3403.285')] [2024-04-25 12:14:38,678][18169] Saving new best policy, reward=3403.285! [2024-04-25 12:14:41,345][18183] Updated weights for policy 0, policy_version 3760 (0.0006) [2024-04-25 12:14:43,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11444.7). Total num frames: 1949696. Throughput: 0: 11735.0. Samples: 1932624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:14:43,678][18010] Avg episode reward: [(0, '3713.850')] [2024-04-25 12:14:43,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000003808_1949696.pth... [2024-04-25 12:14:43,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000003120_1597440.pth [2024-04-25 12:14:43,686][18169] Saving new best policy, reward=3713.850! 
[2024-04-25 12:14:44,845][18183] Updated weights for policy 0, policy_version 3840 (0.0006) [2024-04-25 12:14:48,338][18183] Updated weights for policy 0, policy_version 3920 (0.0005) [2024-04-25 12:14:48,678][18010] Fps is (10 sec: 11468.1, 60 sec: 11605.2, 300 sec: 11445.4). Total num frames: 2007040. Throughput: 0: 11749.2. Samples: 2002816. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:14:48,678][18010] Avg episode reward: [(0, '3555.687')] [2024-04-25 12:14:51,835][18183] Updated weights for policy 0, policy_version 4000 (0.0006) [2024-04-25 12:14:53,678][18010] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11468.8). Total num frames: 2068480. Throughput: 0: 11761.9. Samples: 2038308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:14:53,678][18010] Avg episode reward: [(0, '3542.870')] [2024-04-25 12:14:55,233][18183] Updated weights for policy 0, policy_version 4080 (0.0008) [2024-04-25 12:14:58,678][18010] Fps is (10 sec: 11878.7, 60 sec: 11741.8, 300 sec: 11468.8). Total num frames: 2125824. Throughput: 0: 11746.0. Samples: 2109420. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:14:58,678][18010] Avg episode reward: [(0, '3543.417')] [2024-04-25 12:14:58,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000004152_2125824.pth... [2024-04-25 12:14:58,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000003472_1777664.pth [2024-04-25 12:14:58,711][18183] Updated weights for policy 0, policy_version 4160 (0.0007) [2024-04-25 12:15:02,091][18183] Updated weights for policy 0, policy_version 4240 (0.0008) [2024-04-25 12:15:03,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11810.3, 300 sec: 11490.4). Total num frames: 2187264. Throughput: 0: 11771.9. Samples: 2180588. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:15:03,678][18010] Avg episode reward: [(0, '3463.342')] [2024-04-25 12:15:05,626][18183] Updated weights for policy 0, policy_version 4320 (0.0008) [2024-04-25 12:15:08,678][18010] Fps is (10 sec: 11878.8, 60 sec: 11741.9, 300 sec: 11489.8). Total num frames: 2244608. Throughput: 0: 11781.2. Samples: 2215908. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-04-25 12:15:08,678][18010] Avg episode reward: [(0, '3539.587')] [2024-04-25 12:15:09,063][18183] Updated weights for policy 0, policy_version 4400 (0.0008) [2024-04-25 12:15:12,561][18183] Updated weights for policy 0, policy_version 4480 (0.0008) [2024-04-25 12:15:13,678][18010] Fps is (10 sec: 11878.3, 60 sec: 11810.2, 300 sec: 11509.8). Total num frames: 2306048. Throughput: 0: 11769.1. Samples: 2286868. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:15:13,678][18010] Avg episode reward: [(0, '3758.082')] [2024-04-25 12:15:13,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000004504_2306048.pth... [2024-04-25 12:15:13,685][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000003808_1949696.pth [2024-04-25 12:15:13,686][18169] Saving new best policy, reward=3758.082! [2024-04-25 12:15:16,174][18183] Updated weights for policy 0, policy_version 4560 (0.0007) [2024-04-25 12:15:18,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11488.8). Total num frames: 2359296. Throughput: 0: 11717.0. Samples: 2354088. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:15:18,678][18010] Avg episode reward: [(0, '3679.422')] [2024-04-25 12:15:19,976][18183] Updated weights for policy 0, policy_version 4640 (0.0008) [2024-04-25 12:15:23,678][18010] Fps is (10 sec: 10649.6, 60 sec: 11605.4, 300 sec: 11468.8). Total num frames: 2412544. Throughput: 0: 11655.9. Samples: 2386352. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:15:23,678][18010] Avg episode reward: [(0, '3670.392')] [2024-04-25 12:15:23,830][18183] Updated weights for policy 0, policy_version 4720 (0.0006) [2024-04-25 12:15:27,546][18183] Updated weights for policy 0, policy_version 4800 (0.0008) [2024-04-25 12:15:28,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11605.3, 300 sec: 11468.8). Total num frames: 2469888. Throughput: 0: 11510.1. Samples: 2450576. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:15:28,678][18010] Avg episode reward: [(0, '3763.645')] [2024-04-25 12:15:28,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000004824_2469888.pth... [2024-04-25 12:15:28,685][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000004152_2125824.pth [2024-04-25 12:15:28,686][18169] Saving new best policy, reward=3763.645! [2024-04-25 12:15:31,108][18183] Updated weights for policy 0, policy_version 4880 (0.0007) [2024-04-25 12:15:33,678][18010] Fps is (10 sec: 11058.6, 60 sec: 11468.7, 300 sec: 11450.2). Total num frames: 2523136. Throughput: 0: 11473.1. Samples: 2519104. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:15:33,679][18010] Avg episode reward: [(0, '3813.962')] [2024-04-25 12:15:33,679][18169] Saving new best policy, reward=3813.962! [2024-04-25 12:15:34,906][18183] Updated weights for policy 0, policy_version 4960 (0.0007) [2024-04-25 12:15:38,678][18010] Fps is (10 sec: 10649.6, 60 sec: 11400.5, 300 sec: 11432.4). Total num frames: 2576384. Throughput: 0: 11394.7. Samples: 2551068. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:15:38,678][18010] Avg episode reward: [(0, '4048.412')] [2024-04-25 12:15:38,692][18169] Saving new best policy, reward=4048.412! 
[2024-04-25 12:15:38,694][18183] Updated weights for policy 0, policy_version 5040 (0.0007) [2024-04-25 12:15:42,288][18183] Updated weights for policy 0, policy_version 5120 (0.0008) [2024-04-25 12:15:43,678][18010] Fps is (10 sec: 11059.6, 60 sec: 11400.5, 300 sec: 11433.2). Total num frames: 2633728. Throughput: 0: 11288.8. Samples: 2617416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:15:43,678][18010] Avg episode reward: [(0, '3946.728')] [2024-04-25 12:15:43,687][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000005152_2637824.pth... [2024-04-25 12:15:43,690][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000004504_2306048.pth [2024-04-25 12:15:45,801][18183] Updated weights for policy 0, policy_version 5200 (0.0007) [2024-04-25 12:15:48,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11400.7, 300 sec: 11433.9). Total num frames: 2691072. Throughput: 0: 11203.8. Samples: 2684760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:15:48,678][18010] Avg episode reward: [(0, '3823.764')] [2024-04-25 12:15:49,669][18183] Updated weights for policy 0, policy_version 5280 (0.0008) [2024-04-25 12:15:53,083][18183] Updated weights for policy 0, policy_version 5360 (0.0008) [2024-04-25 12:15:53,678][18010] Fps is (10 sec: 11469.0, 60 sec: 11332.3, 300 sec: 11434.7). Total num frames: 2748416. Throughput: 0: 11180.5. Samples: 2719032. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:15:53,678][18010] Avg episode reward: [(0, '4157.833')] [2024-04-25 12:15:53,678][18169] Saving new best policy, reward=4157.833! [2024-04-25 12:15:56,691][18183] Updated weights for policy 0, policy_version 5440 (0.0008) [2024-04-25 12:15:58,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11332.3, 300 sec: 11435.4). Total num frames: 2805760. Throughput: 0: 11147.3. Samples: 2788496. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:15:58,678][18010] Avg episode reward: [(0, '4217.040')] [2024-04-25 12:15:58,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000005480_2805760.pth... [2024-04-25 12:15:58,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000004824_2469888.pth [2024-04-25 12:15:58,686][18169] Saving new best policy, reward=4217.040! [2024-04-25 12:16:00,183][18183] Updated weights for policy 0, policy_version 5520 (0.0008) [2024-04-25 12:16:03,655][18183] Updated weights for policy 0, policy_version 5600 (0.0008) [2024-04-25 12:16:03,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11332.3, 300 sec: 11452.4). Total num frames: 2867200. Throughput: 0: 11220.4. Samples: 2859008. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:16:03,678][18010] Avg episode reward: [(0, '4298.690')] [2024-04-25 12:16:03,678][18169] Saving new best policy, reward=4298.690! [2024-04-25 12:16:07,092][18183] Updated weights for policy 0, policy_version 5680 (0.0007) [2024-04-25 12:16:08,678][18010] Fps is (10 sec: 11878.5, 60 sec: 11332.3, 300 sec: 11452.7). Total num frames: 2924544. Throughput: 0: 11304.4. Samples: 2895048. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:16:08,678][18010] Avg episode reward: [(0, '4308.525')] [2024-04-25 12:16:08,678][18169] Saving new best policy, reward=4308.525! [2024-04-25 12:16:10,587][18183] Updated weights for policy 0, policy_version 5760 (0.0008) [2024-04-25 12:16:13,677][18010] Fps is (10 sec: 11878.4, 60 sec: 11332.3, 300 sec: 11468.8). Total num frames: 2985984. Throughput: 0: 11444.3. Samples: 2965568. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:16:13,678][18010] Avg episode reward: [(0, '4128.371')] [2024-04-25 12:16:13,680][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000005832_2985984.pth... 
[2024-04-25 12:16:13,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000005152_2637824.pth [2024-04-25 12:16:14,031][18183] Updated weights for policy 0, policy_version 5840 (0.0008) [2024-04-25 12:16:17,359][18183] Updated weights for policy 0, policy_version 5920 (0.0008) [2024-04-25 12:16:18,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11400.5, 300 sec: 11468.8). Total num frames: 3043328. Throughput: 0: 11537.2. Samples: 3038272. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:16:18,678][18010] Avg episode reward: [(0, '3964.065')] [2024-04-25 12:16:20,811][18183] Updated weights for policy 0, policy_version 6000 (0.0008) [2024-04-25 12:16:23,678][18010] Fps is (10 sec: 11878.3, 60 sec: 11537.1, 300 sec: 11484.0). Total num frames: 3104768. Throughput: 0: 11612.6. Samples: 3073636. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:16:23,678][18010] Avg episode reward: [(0, '4031.521')] [2024-04-25 12:16:24,136][18183] Updated weights for policy 0, policy_version 6080 (0.0008) [2024-04-25 12:16:27,688][18183] Updated weights for policy 0, policy_version 6160 (0.0008) [2024-04-25 12:16:28,678][18010] Fps is (10 sec: 11878.0, 60 sec: 11537.0, 300 sec: 11483.7). Total num frames: 3162112. Throughput: 0: 11737.9. Samples: 3145624. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:16:28,678][18010] Avg episode reward: [(0, '4147.268')] [2024-04-25 12:16:28,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000006176_3162112.pth... [2024-04-25 12:16:28,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000005480_2805760.pth [2024-04-25 12:16:31,104][18183] Updated weights for policy 0, policy_version 6240 (0.0008) [2024-04-25 12:16:33,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11673.7, 300 sec: 11498.1). Total num frames: 3223552. Throughput: 0: 11792.7. Samples: 3215432. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:16:33,678][18010] Avg episode reward: [(0, '4159.181')] [2024-04-25 12:16:34,625][18183] Updated weights for policy 0, policy_version 6320 (0.0008) [2024-04-25 12:16:38,100][18183] Updated weights for policy 0, policy_version 6400 (0.0008) [2024-04-25 12:16:38,678][18010] Fps is (10 sec: 11878.8, 60 sec: 11741.9, 300 sec: 11497.5). Total num frames: 3280896. Throughput: 0: 11847.7. Samples: 3252180. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:16:38,678][18010] Avg episode reward: [(0, '3848.930')] [2024-04-25 12:16:41,591][18183] Updated weights for policy 0, policy_version 6480 (0.0009) [2024-04-25 12:16:43,678][18010] Fps is (10 sec: 11468.4, 60 sec: 11741.8, 300 sec: 11497.0). Total num frames: 3338240. Throughput: 0: 11852.5. Samples: 3321864. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:16:43,678][18010] Avg episode reward: [(0, '3760.982')] [2024-04-25 12:16:43,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000006520_3338240.pth... [2024-04-25 12:16:43,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000005832_2985984.pth [2024-04-25 12:16:45,063][18183] Updated weights for policy 0, policy_version 6560 (0.0008) [2024-04-25 12:16:48,545][18183] Updated weights for policy 0, policy_version 6640 (0.0006) [2024-04-25 12:16:48,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11510.5). Total num frames: 3399680. Throughput: 0: 11852.9. Samples: 3392388. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:16:48,678][18010] Avg episode reward: [(0, '3896.960')] [2024-04-25 12:16:52,038][18183] Updated weights for policy 0, policy_version 6720 (0.0007) [2024-04-25 12:16:53,678][18010] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11510.4). Total num frames: 3457024. Throughput: 0: 11849.3. Samples: 3428272. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:16:53,678][18010] Avg episode reward: [(0, '4393.778')] [2024-04-25 12:16:53,679][18169] Saving new best policy, reward=4393.778! [2024-04-25 12:16:55,469][18183] Updated weights for policy 0, policy_version 6800 (0.0007) [2024-04-25 12:16:58,678][18010] Fps is (10 sec: 11878.3, 60 sec: 11878.4, 300 sec: 11524.3). Total num frames: 3518464. Throughput: 0: 11855.6. Samples: 3499072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:16:58,678][18010] Avg episode reward: [(0, '4489.862')] [2024-04-25 12:16:58,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000006872_3518464.pth... [2024-04-25 12:16:58,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000006176_3162112.pth [2024-04-25 12:16:58,686][18169] Saving new best policy, reward=4489.862! [2024-04-25 12:16:58,797][18183] Updated weights for policy 0, policy_version 6880 (0.0006) [2024-04-25 12:17:02,159][18183] Updated weights for policy 0, policy_version 6960 (0.0006) [2024-04-25 12:17:03,678][18010] Fps is (10 sec: 12288.4, 60 sec: 11878.4, 300 sec: 11552.1). Total num frames: 3579904. Throughput: 0: 11872.8. Samples: 3572548. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:17:03,678][18010] Avg episode reward: [(0, '4520.115')] [2024-04-25 12:17:03,678][18169] Saving new best policy, reward=4520.115! [2024-04-25 12:17:05,706][18183] Updated weights for policy 0, policy_version 7040 (0.0008) [2024-04-25 12:17:08,678][18010] Fps is (10 sec: 11878.5, 60 sec: 11878.4, 300 sec: 11566.0). Total num frames: 3637248. Throughput: 0: 11872.4. Samples: 3607896. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:17:08,678][18010] Avg episode reward: [(0, '4370.675')] [2024-04-25 12:17:09,101][18183] Updated weights for policy 0, policy_version 7120 (0.0008) [2024-04-25 12:17:12,481][18183] Updated weights for policy 0, policy_version 7200 (0.0008) [2024-04-25 12:17:13,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11878.4, 300 sec: 11579.9). Total num frames: 3698688. Throughput: 0: 11873.7. Samples: 3679936. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:17:13,678][18010] Avg episode reward: [(0, '4035.377')] [2024-04-25 12:17:13,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000007224_3698688.pth... [2024-04-25 12:17:13,685][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000006520_3338240.pth [2024-04-25 12:17:15,956][18183] Updated weights for policy 0, policy_version 7280 (0.0008) [2024-04-25 12:17:18,678][18010] Fps is (10 sec: 11877.9, 60 sec: 11878.3, 300 sec: 11566.0). Total num frames: 3756032. Throughput: 0: 11912.2. Samples: 3751488. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:17:18,678][18010] Avg episode reward: [(0, '4434.165')] [2024-04-25 12:17:19,396][18183] Updated weights for policy 0, policy_version 7360 (0.0008) [2024-04-25 12:17:22,834][18183] Updated weights for policy 0, policy_version 7440 (0.0008) [2024-04-25 12:17:23,677][18010] Fps is (10 sec: 11878.5, 60 sec: 11878.4, 300 sec: 11593.8). Total num frames: 3817472. Throughput: 0: 11889.1. Samples: 3787188. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-04-25 12:17:23,678][18010] Avg episode reward: [(0, '4432.497')] [2024-04-25 12:17:26,223][18183] Updated weights for policy 0, policy_version 7520 (0.0007) [2024-04-25 12:17:28,678][18010] Fps is (10 sec: 11878.9, 60 sec: 11878.5, 300 sec: 11593.8). Total num frames: 3874816. Throughput: 0: 11924.0. Samples: 3858440. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-04-25 12:17:28,678][18010] Avg episode reward: [(0, '4285.665')] [2024-04-25 12:17:28,710][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000007576_3878912.pth... [2024-04-25 12:17:28,712][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000006872_3518464.pth [2024-04-25 12:17:29,647][18183] Updated weights for policy 0, policy_version 7600 (0.0007) [2024-04-25 12:17:32,993][18183] Updated weights for policy 0, policy_version 7680 (0.0007) [2024-04-25 12:17:33,678][18010] Fps is (10 sec: 12288.0, 60 sec: 11946.7, 300 sec: 11621.5). Total num frames: 3940352. Throughput: 0: 11992.3. Samples: 3932040. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:17:33,678][18010] Avg episode reward: [(0, '4318.483')] [2024-04-25 12:17:36,487][18183] Updated weights for policy 0, policy_version 7760 (0.0008) [2024-04-25 12:17:38,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11878.4, 300 sec: 11621.5). Total num frames: 3993600. Throughput: 0: 11980.3. Samples: 3967384. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:17:38,678][18010] Avg episode reward: [(0, '4313.575')] [2024-04-25 12:17:40,240][18183] Updated weights for policy 0, policy_version 7840 (0.0008) [2024-04-25 12:17:43,678][18010] Fps is (10 sec: 11059.1, 60 sec: 11878.5, 300 sec: 11621.5). Total num frames: 4050944. Throughput: 0: 11884.0. Samples: 4033852. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:17:43,678][18010] Avg episode reward: [(0, '4306.333')] [2024-04-25 12:17:43,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000007912_4050944.pth... 
[2024-04-25 12:17:43,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000007224_3698688.pth [2024-04-25 12:17:43,749][18183] Updated weights for policy 0, policy_version 7920 (0.0008) [2024-04-25 12:17:47,495][18183] Updated weights for policy 0, policy_version 8000 (0.0007) [2024-04-25 12:17:48,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11810.1, 300 sec: 11607.6). Total num frames: 4108288. Throughput: 0: 11726.9. Samples: 4100260. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:17:48,678][18010] Avg episode reward: [(0, '4321.427')] [2024-04-25 12:17:51,165][18183] Updated weights for policy 0, policy_version 8080 (0.0007) [2024-04-25 12:17:53,678][18010] Fps is (10 sec: 10649.7, 60 sec: 11673.7, 300 sec: 11579.9). Total num frames: 4157440. Throughput: 0: 11698.8. Samples: 4134344. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:17:53,678][18010] Avg episode reward: [(0, '4341.977')] [2024-04-25 12:17:55,383][18183] Updated weights for policy 0, policy_version 8160 (0.0006) [2024-04-25 12:17:58,678][18010] Fps is (10 sec: 10239.9, 60 sec: 11537.1, 300 sec: 11566.0). Total num frames: 4210688. Throughput: 0: 11430.8. Samples: 4194324. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:17:58,678][18010] Avg episode reward: [(0, '4369.566')] [2024-04-25 12:17:58,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000008224_4210688.pth... [2024-04-25 12:17:58,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000007576_3878912.pth [2024-04-25 12:17:59,143][18183] Updated weights for policy 0, policy_version 8240 (0.0007) [2024-04-25 12:18:02,673][18183] Updated weights for policy 0, policy_version 8320 (0.0009) [2024-04-25 12:18:03,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11468.8, 300 sec: 11566.0). Total num frames: 4268032. Throughput: 0: 11383.5. Samples: 4263740. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:18:03,678][18010] Avg episode reward: [(0, '4550.807')] [2024-04-25 12:18:03,678][18169] Saving new best policy, reward=4550.807! [2024-04-25 12:18:06,300][18183] Updated weights for policy 0, policy_version 8400 (0.0007) [2024-04-25 12:18:08,678][18010] Fps is (10 sec: 11468.9, 60 sec: 11468.8, 300 sec: 11566.0). Total num frames: 4325376. Throughput: 0: 11322.8. Samples: 4296716. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:18:08,678][18010] Avg episode reward: [(0, '4576.575')] [2024-04-25 12:18:08,704][18169] Saving new best policy, reward=4576.575! [2024-04-25 12:18:09,759][18183] Updated weights for policy 0, policy_version 8480 (0.0007) [2024-04-25 12:18:13,355][18183] Updated weights for policy 0, policy_version 8560 (0.0007) [2024-04-25 12:18:13,678][18010] Fps is (10 sec: 11468.2, 60 sec: 11400.4, 300 sec: 11566.0). Total num frames: 4382720. Throughput: 0: 11294.7. Samples: 4366708. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:18:13,679][18010] Avg episode reward: [(0, '4492.174')] [2024-04-25 12:18:13,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000008560_4382720.pth... [2024-04-25 12:18:13,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000007912_4050944.pth [2024-04-25 12:18:16,853][18183] Updated weights for policy 0, policy_version 8640 (0.0008) [2024-04-25 12:18:18,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11400.6, 300 sec: 11566.0). Total num frames: 4440064. Throughput: 0: 11197.9. Samples: 4435944. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:18:18,678][18010] Avg episode reward: [(0, '4398.239')] [2024-04-25 12:18:20,476][18183] Updated weights for policy 0, policy_version 8720 (0.0008) [2024-04-25 12:18:23,678][18010] Fps is (10 sec: 11879.1, 60 sec: 11400.5, 300 sec: 11579.9). Total num frames: 4501504. 
Throughput: 0: 11155.5. Samples: 4469380. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:18:23,678][18010] Avg episode reward: [(0, '4388.993')] [2024-04-25 12:18:23,984][18183] Updated weights for policy 0, policy_version 8800 (0.0008) [2024-04-25 12:18:27,438][18183] Updated weights for policy 0, policy_version 8880 (0.0007) [2024-04-25 12:18:28,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11400.5, 300 sec: 11579.9). Total num frames: 4558848. Throughput: 0: 11269.3. Samples: 4540968. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:18:28,678][18010] Avg episode reward: [(0, '4380.707')] [2024-04-25 12:18:28,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000008904_4558848.pth... [2024-04-25 12:18:28,685][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000008224_4210688.pth [2024-04-25 12:18:30,867][18183] Updated weights for policy 0, policy_version 8960 (0.0007) [2024-04-25 12:18:33,677][18010] Fps is (10 sec: 11878.4, 60 sec: 11332.3, 300 sec: 11607.6). Total num frames: 4620288. Throughput: 0: 11383.3. Samples: 4612508. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:18:33,678][18010] Avg episode reward: [(0, '4536.185')] [2024-04-25 12:18:34,266][18183] Updated weights for policy 0, policy_version 9040 (0.0007) [2024-04-25 12:18:37,735][18183] Updated weights for policy 0, policy_version 9120 (0.0007) [2024-04-25 12:18:38,678][18010] Fps is (10 sec: 11878.5, 60 sec: 11400.5, 300 sec: 11607.6). Total num frames: 4677632. Throughput: 0: 11427.9. Samples: 4648600. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:18:38,678][18010] Avg episode reward: [(0, '4656.095')] [2024-04-25 12:18:38,678][18169] Saving new best policy, reward=4656.095! 
[2024-04-25 12:18:41,271][18183] Updated weights for policy 0, policy_version 9200 (0.0008) [2024-04-25 12:18:43,678][18010] Fps is (10 sec: 11878.2, 60 sec: 11468.8, 300 sec: 11621.5). Total num frames: 4739072. Throughput: 0: 11648.7. Samples: 4718516. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-04-25 12:18:43,678][18010] Avg episode reward: [(0, '4390.243')] [2024-04-25 12:18:43,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000009256_4739072.pth... [2024-04-25 12:18:43,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000008560_4382720.pth [2024-04-25 12:18:44,935][18183] Updated weights for policy 0, policy_version 9280 (0.0007) [2024-04-25 12:18:48,331][18183] Updated weights for policy 0, policy_version 9360 (0.0008) [2024-04-25 12:18:48,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11468.8, 300 sec: 11621.5). Total num frames: 4796416. Throughput: 0: 11643.5. Samples: 4787696. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:18:48,678][18010] Avg episode reward: [(0, '3944.175')] [2024-04-25 12:18:52,103][18169] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000009 [2024-04-25 12:18:52,105][18183] Updated weights for policy 0, policy_version 9440 (0.0008) [2024-04-25 12:18:53,678][18010] Fps is (10 sec: 11059.4, 60 sec: 11537.1, 300 sec: 11621.5). Total num frames: 4849664. Throughput: 0: 11652.0. Samples: 4821056. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:18:53,678][18010] Avg episode reward: [(0, '3972.322')] [2024-04-25 12:18:55,578][18183] Updated weights for policy 0, policy_version 9520 (0.0006) [2024-04-25 12:18:58,678][18010] Fps is (10 sec: 11058.4, 60 sec: 11605.2, 300 sec: 11621.5). Total num frames: 4907008. Throughput: 0: 11642.4. Samples: 4890620. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:18:58,679][18010] Avg episode reward: [(0, '4178.792')] [2024-04-25 12:18:58,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000009584_4907008.pth... [2024-04-25 12:18:58,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000008904_4558848.pth [2024-04-25 12:18:59,048][18183] Updated weights for policy 0, policy_version 9600 (0.0008) [2024-04-25 12:19:02,632][18183] Updated weights for policy 0, policy_version 9680 (0.0007) [2024-04-25 12:19:03,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11607.6). Total num frames: 4964352. Throughput: 0: 11648.1. Samples: 4960108. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:19:03,678][18010] Avg episode reward: [(0, '4333.809')] [2024-04-25 12:19:06,336][18183] Updated weights for policy 0, policy_version 9760 (0.0008) [2024-04-25 12:19:08,677][18010] Fps is (10 sec: 11060.0, 60 sec: 11537.1, 300 sec: 11593.8). Total num frames: 5017600. Throughput: 0: 11636.9. Samples: 4993040. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:19:08,678][18010] Avg episode reward: [(0, '4301.992')] [2024-04-25 12:19:10,243][18183] Updated weights for policy 0, policy_version 9840 (0.0007) [2024-04-25 12:19:13,678][18010] Fps is (10 sec: 11059.3, 60 sec: 11537.2, 300 sec: 11593.8). Total num frames: 5074944. Throughput: 0: 11477.5. Samples: 5057456. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:19:13,678][18010] Avg episode reward: [(0, '4656.840')] [2024-04-25 12:19:13,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000009912_5074944.pth... [2024-04-25 12:19:13,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000009256_4739072.pth [2024-04-25 12:19:13,686][18169] Saving new best policy, reward=4656.840! 
[2024-04-25 12:19:13,840][18183] Updated weights for policy 0, policy_version 9920 (0.0007) [2024-04-25 12:19:17,387][18183] Updated weights for policy 0, policy_version 10000 (0.0008) [2024-04-25 12:19:18,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11537.1, 300 sec: 11579.9). Total num frames: 5132288. Throughput: 0: 11427.2. Samples: 5126732. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:19:18,678][18010] Avg episode reward: [(0, '4858.170')] [2024-04-25 12:19:18,679][18169] Saving new best policy, reward=4858.170! [2024-04-25 12:19:20,800][18183] Updated weights for policy 0, policy_version 10080 (0.0008) [2024-04-25 12:19:23,677][18010] Fps is (10 sec: 11468.9, 60 sec: 11468.8, 300 sec: 11579.9). Total num frames: 5189632. Throughput: 0: 11425.8. Samples: 5162760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:19:23,678][18010] Avg episode reward: [(0, '4797.994')] [2024-04-25 12:19:24,406][18183] Updated weights for policy 0, policy_version 10160 (0.0007) [2024-04-25 12:19:27,969][18183] Updated weights for policy 0, policy_version 10240 (0.0007) [2024-04-25 12:19:28,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 11566.0). Total num frames: 5246976. Throughput: 0: 11382.4. Samples: 5230724. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:19:28,678][18010] Avg episode reward: [(0, '4576.290')] [2024-04-25 12:19:28,698][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000010256_5251072.pth... [2024-04-25 12:19:28,701][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000009584_4907008.pth [2024-04-25 12:19:31,562][18183] Updated weights for policy 0, policy_version 10320 (0.0007) [2024-04-25 12:19:33,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11400.5, 300 sec: 11566.0). Total num frames: 5304320. Throughput: 0: 11366.3. Samples: 5299180. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:19:33,678][18010] Avg episode reward: [(0, '4202.681')] [2024-04-25 12:19:35,423][18183] Updated weights for policy 0, policy_version 10400 (0.0007) [2024-04-25 12:19:38,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11400.5, 300 sec: 11566.0). Total num frames: 5361664. Throughput: 0: 11325.5. Samples: 5330704. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:19:38,678][18010] Avg episode reward: [(0, '3973.564')] [2024-04-25 12:19:38,955][18183] Updated weights for policy 0, policy_version 10480 (0.0008) [2024-04-25 12:19:42,885][18183] Updated weights for policy 0, policy_version 10560 (0.0006) [2024-04-25 12:19:43,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11264.0, 300 sec: 11552.1). Total num frames: 5414912. Throughput: 0: 11272.4. Samples: 5397868. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:19:43,678][18010] Avg episode reward: [(0, '4061.521')] [2024-04-25 12:19:43,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000010576_5414912.pth... [2024-04-25 12:19:43,685][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000009912_5074944.pth [2024-04-25 12:19:46,728][18183] Updated weights for policy 0, policy_version 10640 (0.0007) [2024-04-25 12:19:48,678][18010] Fps is (10 sec: 10240.0, 60 sec: 11127.5, 300 sec: 11510.5). Total num frames: 5464064. Throughput: 0: 11109.5. Samples: 5460032. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:19:48,678][18010] Avg episode reward: [(0, '3900.484')] [2024-04-25 12:19:50,533][18183] Updated weights for policy 0, policy_version 10720 (0.0008) [2024-04-25 12:19:53,678][18010] Fps is (10 sec: 10649.6, 60 sec: 11195.7, 300 sec: 11510.5). Total num frames: 5521408. Throughput: 0: 11109.7. Samples: 5492976. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:19:53,678][18010] Avg episode reward: [(0, '3818.098')] [2024-04-25 12:19:54,279][18183] Updated weights for policy 0, policy_version 10800 (0.0007) [2024-04-25 12:19:57,996][18183] Updated weights for policy 0, policy_version 10880 (0.0009) [2024-04-25 12:19:58,678][18010] Fps is (10 sec: 11059.1, 60 sec: 11127.6, 300 sec: 11482.7). Total num frames: 5574656. Throughput: 0: 11154.2. Samples: 5559396. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:19:58,678][18010] Avg episode reward: [(0, '3932.364')] [2024-04-25 12:19:58,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000010888_5574656.pth... [2024-04-25 12:19:58,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000010256_5251072.pth [2024-04-25 12:20:01,484][18183] Updated weights for policy 0, policy_version 10960 (0.0007) [2024-04-25 12:20:03,678][18010] Fps is (10 sec: 11059.3, 60 sec: 11127.5, 300 sec: 11482.7). Total num frames: 5632000. Throughput: 0: 11135.8. Samples: 5627844. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:20:03,678][18010] Avg episode reward: [(0, '4309.665')] [2024-04-25 12:20:05,122][18183] Updated weights for policy 0, policy_version 11040 (0.0008) [2024-04-25 12:20:08,667][18183] Updated weights for policy 0, policy_version 11120 (0.0008) [2024-04-25 12:20:08,678][18010] Fps is (10 sec: 11878.5, 60 sec: 11264.0, 300 sec: 11482.7). Total num frames: 5693440. Throughput: 0: 11079.0. Samples: 5661316. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:20:08,678][18010] Avg episode reward: [(0, '4475.545')] [2024-04-25 12:20:12,423][18183] Updated weights for policy 0, policy_version 11200 (0.0008) [2024-04-25 12:20:13,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11195.7, 300 sec: 11482.7). Total num frames: 5746688. Throughput: 0: 11088.7. Samples: 5729716. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:20:13,678][18010] Avg episode reward: [(0, '4747.353')] [2024-04-25 12:20:13,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000011224_5746688.pth... [2024-04-25 12:20:13,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000010576_5414912.pth [2024-04-25 12:20:16,124][18183] Updated weights for policy 0, policy_version 11280 (0.0008) [2024-04-25 12:20:18,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11195.7, 300 sec: 11496.6). Total num frames: 5804032. Throughput: 0: 11038.3. Samples: 5795904. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:20:18,678][18010] Avg episode reward: [(0, '4483.395')] [2024-04-25 12:20:19,650][18183] Updated weights for policy 0, policy_version 11360 (0.0007) [2024-04-25 12:20:23,377][18183] Updated weights for policy 0, policy_version 11440 (0.0008) [2024-04-25 12:20:23,678][18010] Fps is (10 sec: 11058.6, 60 sec: 11127.3, 300 sec: 11482.7). Total num frames: 5857280. Throughput: 0: 11126.2. Samples: 5831392. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:20:23,678][18010] Avg episode reward: [(0, '4281.430')] [2024-04-25 12:20:26,925][18183] Updated weights for policy 0, policy_version 11520 (0.0007) [2024-04-25 12:20:28,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11195.7, 300 sec: 11510.5). Total num frames: 5918720. Throughput: 0: 11120.8. Samples: 5898304. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-04-25 12:20:28,678][18010] Avg episode reward: [(0, '4157.416')] [2024-04-25 12:20:28,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000011560_5918720.pth... 
[2024-04-25 12:20:28,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000010888_5574656.pth [2024-04-25 12:20:30,446][18183] Updated weights for policy 0, policy_version 11600 (0.0007) [2024-04-25 12:20:33,678][18010] Fps is (10 sec: 11879.1, 60 sec: 11195.7, 300 sec: 11524.3). Total num frames: 5976064. Throughput: 0: 11285.1. Samples: 5967864. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-04-25 12:20:33,678][18010] Avg episode reward: [(0, '4287.082')] [2024-04-25 12:20:33,976][18183] Updated weights for policy 0, policy_version 11680 (0.0007) [2024-04-25 12:20:37,537][18183] Updated weights for policy 0, policy_version 11760 (0.0007) [2024-04-25 12:20:38,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11195.7, 300 sec: 11524.3). Total num frames: 6033408. Throughput: 0: 11329.8. Samples: 6002816. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:20:38,678][18010] Avg episode reward: [(0, '4320.669')] [2024-04-25 12:20:41,171][18183] Updated weights for policy 0, policy_version 11840 (0.0008) [2024-04-25 12:20:43,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11264.0, 300 sec: 11524.3). Total num frames: 6090752. Throughput: 0: 11362.2. Samples: 6070696. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:20:43,678][18010] Avg episode reward: [(0, '4318.549')] [2024-04-25 12:20:43,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000011896_6090752.pth... [2024-04-25 12:20:43,685][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000011224_5746688.pth [2024-04-25 12:20:44,740][18183] Updated weights for policy 0, policy_version 11920 (0.0007) [2024-04-25 12:20:48,397][18183] Updated weights for policy 0, policy_version 12000 (0.0007) [2024-04-25 12:20:48,678][18010] Fps is (10 sec: 11059.1, 60 sec: 11332.3, 300 sec: 11510.5). Total num frames: 6144000. Throughput: 0: 11376.3. 
Samples: 6139780. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-04-25 12:20:48,678][18010] Avg episode reward: [(0, '4476.382')] [2024-04-25 12:20:52,223][18183] Updated weights for policy 0, policy_version 12080 (0.0008) [2024-04-25 12:20:53,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11332.3, 300 sec: 11510.5). Total num frames: 6201344. Throughput: 0: 11332.5. Samples: 6171280. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-04-25 12:20:53,678][18010] Avg episode reward: [(0, '4514.714')] [2024-04-25 12:20:55,788][18183] Updated weights for policy 0, policy_version 12160 (0.0007) [2024-04-25 12:20:58,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11332.3, 300 sec: 11482.7). Total num frames: 6254592. Throughput: 0: 11315.4. Samples: 6238908. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-04-25 12:20:58,678][18010] Avg episode reward: [(0, '4246.516')] [2024-04-25 12:20:58,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000012216_6254592.pth... [2024-04-25 12:20:58,683][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000011560_5918720.pth [2024-04-25 12:20:59,522][18183] Updated weights for policy 0, policy_version 12240 (0.0006) [2024-04-25 12:21:03,368][18183] Updated weights for policy 0, policy_version 12320 (0.0007) [2024-04-25 12:21:03,678][18010] Fps is (10 sec: 10649.6, 60 sec: 11264.0, 300 sec: 11468.8). Total num frames: 6307840. Throughput: 0: 11282.8. Samples: 6303632. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-04-25 12:21:03,678][18010] Avg episode reward: [(0, '4152.223')] [2024-04-25 12:21:07,122][18183] Updated weights for policy 0, policy_version 12400 (0.0006) [2024-04-25 12:21:08,678][18010] Fps is (10 sec: 11059.3, 60 sec: 11195.7, 300 sec: 11454.9). Total num frames: 6365184. Throughput: 0: 11223.8. Samples: 6336456. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-04-25 12:21:08,678][18010] Avg episode reward: [(0, '4250.936')] [2024-04-25 12:21:10,916][18183] Updated weights for policy 0, policy_version 12480 (0.0008) [2024-04-25 12:21:13,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11195.7, 300 sec: 11441.0). Total num frames: 6418432. Throughput: 0: 11164.8. Samples: 6400720. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:21:13,678][18010] Avg episode reward: [(0, '4131.012')] [2024-04-25 12:21:13,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000012536_6418432.pth... [2024-04-25 12:21:13,685][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000011896_6090752.pth [2024-04-25 12:21:14,827][18183] Updated weights for policy 0, policy_version 12560 (0.0007) [2024-04-25 12:21:18,677][18010] Fps is (10 sec: 10240.0, 60 sec: 11059.2, 300 sec: 11399.4). Total num frames: 6467584. Throughput: 0: 11015.3. Samples: 6463552. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:21:18,678][18010] Avg episode reward: [(0, '4142.306')] [2024-04-25 12:21:18,714][18183] Updated weights for policy 0, policy_version 12640 (0.0007) [2024-04-25 12:21:22,617][18183] Updated weights for policy 0, policy_version 12720 (0.0007) [2024-04-25 12:21:23,678][18010] Fps is (10 sec: 10239.9, 60 sec: 11059.3, 300 sec: 11385.5). Total num frames: 6520832. Throughput: 0: 10959.5. Samples: 6495996. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:21:23,678][18010] Avg episode reward: [(0, '4296.502')] [2024-04-25 12:21:26,196][18183] Updated weights for policy 0, policy_version 12800 (0.0007) [2024-04-25 12:21:28,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 11385.5). Total num frames: 6582272. Throughput: 0: 10923.7. Samples: 6562264. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:21:28,678][18010] Avg episode reward: [(0, '4260.274')] [2024-04-25 12:21:28,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000012856_6582272.pth... [2024-04-25 12:21:28,685][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000012216_6254592.pth [2024-04-25 12:21:29,792][18183] Updated weights for policy 0, policy_version 12880 (0.0007) [2024-04-25 12:21:33,241][18183] Updated weights for policy 0, policy_version 12960 (0.0006) [2024-04-25 12:21:33,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11059.2, 300 sec: 11385.5). Total num frames: 6639616. Throughput: 0: 10927.0. Samples: 6631496. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:21:33,678][18010] Avg episode reward: [(0, '4263.510')] [2024-04-25 12:21:36,960][18183] Updated weights for policy 0, policy_version 13040 (0.0007) [2024-04-25 12:21:38,678][18010] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11371.6). Total num frames: 6692864. Throughput: 0: 10993.6. Samples: 6665992. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:21:38,678][18010] Avg episode reward: [(0, '3932.444')] [2024-04-25 12:21:40,521][18183] Updated weights for policy 0, policy_version 13120 (0.0008) [2024-04-25 12:21:43,678][18010] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11357.7). Total num frames: 6750208. Throughput: 0: 10993.2. Samples: 6733600. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-04-25 12:21:43,678][18010] Avg episode reward: [(0, '4064.512')] [2024-04-25 12:21:43,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000013184_6750208.pth... 
[2024-04-25 12:21:43,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000012536_6418432.pth [2024-04-25 12:21:44,313][18183] Updated weights for policy 0, policy_version 13200 (0.0007) [2024-04-25 12:21:48,297][18183] Updated weights for policy 0, policy_version 13280 (0.0007) [2024-04-25 12:21:48,678][18010] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 11330.0). Total num frames: 6799360. Throughput: 0: 10932.7. Samples: 6795604. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-04-25 12:21:48,678][18010] Avg episode reward: [(0, '4113.793')] [2024-04-25 12:21:52,114][18183] Updated weights for policy 0, policy_version 13360 (0.0007) [2024-04-25 12:21:53,678][18010] Fps is (10 sec: 10240.0, 60 sec: 10854.4, 300 sec: 11302.2). Total num frames: 6852608. Throughput: 0: 10923.9. Samples: 6828032. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-04-25 12:21:53,678][18010] Avg episode reward: [(0, '4272.755')] [2024-04-25 12:21:55,852][18183] Updated weights for policy 0, policy_version 13440 (0.0008) [2024-04-25 12:21:58,678][18010] Fps is (10 sec: 11468.7, 60 sec: 10990.9, 300 sec: 11302.2). Total num frames: 6914048. Throughput: 0: 10962.5. Samples: 6894032. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-04-25 12:21:58,678][18010] Avg episode reward: [(0, '4323.754')] [2024-04-25 12:21:58,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000013504_6914048.pth... [2024-04-25 12:21:58,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000012856_6582272.pth [2024-04-25 12:21:59,305][18183] Updated weights for policy 0, policy_version 13520 (0.0008) [2024-04-25 12:22:02,918][18183] Updated weights for policy 0, policy_version 13600 (0.0008) [2024-04-25 12:22:03,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11059.2, 300 sec: 11302.2). Total num frames: 6971392. Throughput: 0: 11104.7. 
Samples: 6963264. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:22:03,678][18010] Avg episode reward: [(0, '3916.466')] [2024-04-25 12:22:06,509][18183] Updated weights for policy 0, policy_version 13680 (0.0008) [2024-04-25 12:22:08,678][18010] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11274.4). Total num frames: 7024640. Throughput: 0: 11146.8. Samples: 6997604. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:22:08,678][18010] Avg episode reward: [(0, '4167.148')] [2024-04-25 12:22:10,261][18183] Updated weights for policy 0, policy_version 13760 (0.0008) [2024-04-25 12:22:13,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11059.2, 300 sec: 11274.4). Total num frames: 7081984. Throughput: 0: 11146.1. Samples: 7063840. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:22:13,678][18010] Avg episode reward: [(0, '4199.728')] [2024-04-25 12:22:13,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000013832_7081984.pth... [2024-04-25 12:22:13,685][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000013184_6750208.pth [2024-04-25 12:22:13,933][18183] Updated weights for policy 0, policy_version 13840 (0.0007) [2024-04-25 12:22:17,715][18183] Updated weights for policy 0, policy_version 13920 (0.0007) [2024-04-25 12:22:18,677][18010] Fps is (10 sec: 11059.3, 60 sec: 11127.5, 300 sec: 11246.6). Total num frames: 7135232. Throughput: 0: 11078.1. Samples: 7130008. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-04-25 12:22:18,678][18010] Avg episode reward: [(0, '4416.016')] [2024-04-25 12:22:21,355][18183] Updated weights for policy 0, policy_version 14000 (0.0007) [2024-04-25 12:22:23,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11195.7, 300 sec: 11246.6). Total num frames: 7192576. Throughput: 0: 11049.2. Samples: 7163208. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:22:23,678][18010] Avg episode reward: [(0, '4163.614')] [2024-04-25 12:22:24,890][18183] Updated weights for policy 0, policy_version 14080 (0.0007) [2024-04-25 12:22:28,586][18183] Updated weights for policy 0, policy_version 14160 (0.0007) [2024-04-25 12:22:28,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11127.5, 300 sec: 11218.9). Total num frames: 7249920. Throughput: 0: 11074.8. Samples: 7231964. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:22:28,678][18010] Avg episode reward: [(0, '4266.760')] [2024-04-25 12:22:28,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000014160_7249920.pth... [2024-04-25 12:22:28,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000013504_6914048.pth [2024-04-25 12:22:32,235][18183] Updated weights for policy 0, policy_version 14240 (0.0008) [2024-04-25 12:22:33,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11059.2, 300 sec: 11218.9). Total num frames: 7303168. Throughput: 0: 11185.5. Samples: 7298952. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-04-25 12:22:33,678][18010] Avg episode reward: [(0, '4474.591')] [2024-04-25 12:22:35,926][18183] Updated weights for policy 0, policy_version 14320 (0.0007) [2024-04-25 12:22:38,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11127.5, 300 sec: 11218.9). Total num frames: 7360512. Throughput: 0: 11197.2. Samples: 7331904. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-04-25 12:22:38,678][18010] Avg episode reward: [(0, '4604.267')] [2024-04-25 12:22:39,742][18183] Updated weights for policy 0, policy_version 14400 (0.0008) [2024-04-25 12:22:43,678][18010] Fps is (10 sec: 10649.5, 60 sec: 10990.9, 300 sec: 11191.1). Total num frames: 7409664. Throughput: 0: 11154.1. Samples: 7395968. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-04-25 12:22:43,678][18010] Avg episode reward: [(0, '4548.941')] [2024-04-25 12:22:43,694][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000014480_7413760.pth... [2024-04-25 12:22:43,695][18183] Updated weights for policy 0, policy_version 14480 (0.0008) [2024-04-25 12:22:43,697][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000013832_7081984.pth [2024-04-25 12:22:47,542][18183] Updated weights for policy 0, policy_version 14560 (0.0007) [2024-04-25 12:22:48,678][18010] Fps is (10 sec: 10240.0, 60 sec: 11059.2, 300 sec: 11205.0). Total num frames: 7462912. Throughput: 0: 11013.7. Samples: 7458880. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-04-25 12:22:48,678][18010] Avg episode reward: [(0, '4498.790')] [2024-04-25 12:22:51,320][18183] Updated weights for policy 0, policy_version 14640 (0.0007) [2024-04-25 12:22:53,678][18010] Fps is (10 sec: 11059.4, 60 sec: 11127.5, 300 sec: 11218.9). Total num frames: 7520256. Throughput: 0: 10978.8. Samples: 7491648. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-04-25 12:22:53,678][18010] Avg episode reward: [(0, '4515.162')] [2024-04-25 12:22:55,061][18183] Updated weights for policy 0, policy_version 14720 (0.0007) [2024-04-25 12:22:58,678][18010] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11205.0). Total num frames: 7573504. Throughput: 0: 10982.1. Samples: 7558036. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:22:58,678][18010] Avg episode reward: [(0, '4374.914')] [2024-04-25 12:22:58,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000014792_7573504.pth... 
[2024-04-25 12:22:58,683][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000014160_7249920.pth [2024-04-25 12:22:58,737][18183] Updated weights for policy 0, policy_version 14800 (0.0007) [2024-04-25 12:23:02,480][18183] Updated weights for policy 0, policy_version 14880 (0.0007) [2024-04-25 12:23:03,678][18010] Fps is (10 sec: 11059.1, 60 sec: 10990.9, 300 sec: 11205.0). Total num frames: 7630848. Throughput: 0: 10975.2. Samples: 7623892. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:23:03,678][18010] Avg episode reward: [(0, '4171.956')] [2024-04-25 12:23:06,319][18183] Updated weights for policy 0, policy_version 14960 (0.0007) [2024-04-25 12:23:08,678][18010] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11191.1). Total num frames: 7684096. Throughput: 0: 10935.6. Samples: 7655312. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:23:08,678][18010] Avg episode reward: [(0, '4547.253')] [2024-04-25 12:23:09,932][18183] Updated weights for policy 0, policy_version 15040 (0.0009) [2024-04-25 12:23:13,307][18183] Updated weights for policy 0, policy_version 15120 (0.0008) [2024-04-25 12:23:13,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 11205.0). Total num frames: 7745536. Throughput: 0: 10956.1. Samples: 7724988. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:23:13,678][18010] Avg episode reward: [(0, '4412.580')] [2024-04-25 12:23:13,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000015128_7745536.pth... [2024-04-25 12:23:13,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000014480_7413760.pth [2024-04-25 12:23:16,686][18183] Updated weights for policy 0, policy_version 15200 (0.0007) [2024-04-25 12:23:18,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11127.5, 300 sec: 11191.1). Total num frames: 7802880. Throughput: 0: 11085.2. 
Samples: 7797784. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:23:18,678][18010] Avg episode reward: [(0, '4425.175')] [2024-04-25 12:23:20,152][18183] Updated weights for policy 0, policy_version 15280 (0.0007) [2024-04-25 12:23:23,678][18010] Fps is (10 sec: 11468.9, 60 sec: 11127.5, 300 sec: 11191.1). Total num frames: 7860224. Throughput: 0: 11108.5. Samples: 7831784. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:23:23,678][18010] Avg episode reward: [(0, '4397.829')] [2024-04-25 12:23:23,933][18183] Updated weights for policy 0, policy_version 15360 (0.0008) [2024-04-25 12:23:27,672][18183] Updated weights for policy 0, policy_version 15440 (0.0007) [2024-04-25 12:23:28,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11059.2, 300 sec: 11163.3). Total num frames: 7913472. Throughput: 0: 11137.5. Samples: 7897152. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-04-25 12:23:28,678][18010] Avg episode reward: [(0, '4476.643')] [2024-04-25 12:23:28,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000015456_7913472.pth... [2024-04-25 12:23:28,685][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000014792_7573504.pth [2024-04-25 12:23:31,477][18183] Updated weights for policy 0, policy_version 15520 (0.0007) [2024-04-25 12:23:33,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11127.5, 300 sec: 11163.3). Total num frames: 7970816. Throughput: 0: 11230.1. Samples: 7964232. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-04-25 12:23:33,678][18010] Avg episode reward: [(0, '4598.032')] [2024-04-25 12:23:34,843][18183] Updated weights for policy 0, policy_version 15600 (0.0008) [2024-04-25 12:23:38,421][18183] Updated weights for policy 0, policy_version 15680 (0.0006) [2024-04-25 12:23:38,678][18010] Fps is (10 sec: 11468.3, 60 sec: 11127.4, 300 sec: 11149.4). Total num frames: 8028160. Throughput: 0: 11286.6. Samples: 7999552. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:23:38,678][18010] Avg episode reward: [(0, '4636.045')] [2024-04-25 12:23:42,107][18183] Updated weights for policy 0, policy_version 15760 (0.0007) [2024-04-25 12:23:43,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11264.0, 300 sec: 11149.4). Total num frames: 8085504. Throughput: 0: 11331.0. Samples: 8067932. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:23:43,678][18010] Avg episode reward: [(0, '4589.296')] [2024-04-25 12:23:43,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000015792_8085504.pth... [2024-04-25 12:23:43,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000015128_7745536.pth [2024-04-25 12:23:45,589][18183] Updated weights for policy 0, policy_version 15840 (0.0007) [2024-04-25 12:23:48,678][18010] Fps is (10 sec: 11469.3, 60 sec: 11332.3, 300 sec: 11163.3). Total num frames: 8142848. Throughput: 0: 11378.9. Samples: 8135940. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:23:48,678][18010] Avg episode reward: [(0, '4486.162')] [2024-04-25 12:23:49,404][18183] Updated weights for policy 0, policy_version 15920 (0.0008) [2024-04-25 12:23:53,142][18183] Updated weights for policy 0, policy_version 16000 (0.0008) [2024-04-25 12:23:53,678][18010] Fps is (10 sec: 11059.3, 60 sec: 11264.0, 300 sec: 11149.5). Total num frames: 8196096. Throughput: 0: 11380.5. Samples: 8167436. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:23:53,678][18010] Avg episode reward: [(0, '4730.408')] [2024-04-25 12:23:56,821][18183] Updated weights for policy 0, policy_version 16080 (0.0007) [2024-04-25 12:23:58,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11332.3, 300 sec: 11149.5). Total num frames: 8253440. Throughput: 0: 11317.6. Samples: 8234280. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:23:58,678][18010] Avg episode reward: [(0, '4830.380')] [2024-04-25 12:23:58,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000016120_8253440.pth... [2024-04-25 12:23:58,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000015456_7913472.pth [2024-04-25 12:24:00,509][18183] Updated weights for policy 0, policy_version 16160 (0.0007) [2024-04-25 12:24:03,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11264.0, 300 sec: 11149.4). Total num frames: 8306688. Throughput: 0: 11173.5. Samples: 8300592. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:24:03,678][18010] Avg episode reward: [(0, '4879.093')] [2024-04-25 12:24:03,678][18169] Saving new best policy, reward=4879.093! [2024-04-25 12:24:04,329][18183] Updated weights for policy 0, policy_version 16240 (0.0007) [2024-04-25 12:24:08,128][18183] Updated weights for policy 0, policy_version 16320 (0.0007) [2024-04-25 12:24:08,678][18010] Fps is (10 sec: 10649.6, 60 sec: 11264.0, 300 sec: 11135.6). Total num frames: 8359936. Throughput: 0: 11128.1. Samples: 8332548. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-04-25 12:24:08,678][18010] Avg episode reward: [(0, '4697.971')] [2024-04-25 12:24:11,846][18183] Updated weights for policy 0, policy_version 16400 (0.0007) [2024-04-25 12:24:13,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11195.7, 300 sec: 11135.6). Total num frames: 8417280. Throughput: 0: 11126.7. Samples: 8397856. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-04-25 12:24:13,678][18010] Avg episode reward: [(0, '4722.303')] [2024-04-25 12:24:13,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000016440_8417280.pth... 
[2024-04-25 12:24:13,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000015792_8085504.pth [2024-04-25 12:24:15,245][18183] Updated weights for policy 0, policy_version 16480 (0.0008) [2024-04-25 12:24:18,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11195.7, 300 sec: 11135.6). Total num frames: 8474624. Throughput: 0: 11169.9. Samples: 8466876. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-04-25 12:24:18,678][18010] Avg episode reward: [(0, '4738.537')] [2024-04-25 12:24:18,963][18183] Updated weights for policy 0, policy_version 16560 (0.0007) [2024-04-25 12:24:22,507][18183] Updated weights for policy 0, policy_version 16640 (0.0007) [2024-04-25 12:24:23,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11195.7, 300 sec: 11135.6). Total num frames: 8531968. Throughput: 0: 11161.7. Samples: 8501824. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-04-25 12:24:23,678][18010] Avg episode reward: [(0, '4881.211')] [2024-04-25 12:24:23,678][18169] Saving new best policy, reward=4881.211! [2024-04-25 12:24:26,097][18183] Updated weights for policy 0, policy_version 16720 (0.0008) [2024-04-25 12:24:28,678][18010] Fps is (10 sec: 11059.1, 60 sec: 11195.7, 300 sec: 11121.7). Total num frames: 8585216. Throughput: 0: 11146.0. Samples: 8569500. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-04-25 12:24:28,678][18010] Avg episode reward: [(0, '5088.950')] [2024-04-25 12:24:28,689][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000016776_8589312.pth... [2024-04-25 12:24:28,691][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000016120_8253440.pth [2024-04-25 12:24:28,691][18169] Saving new best policy, reward=5088.950! 
[2024-04-25 12:24:29,837][18183] Updated weights for policy 0, policy_version 16800 (0.0008) [2024-04-25 12:24:33,518][18183] Updated weights for policy 0, policy_version 16880 (0.0007) [2024-04-25 12:24:33,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11195.7, 300 sec: 11121.7). Total num frames: 8642560. Throughput: 0: 11121.6. Samples: 8636412. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-04-25 12:24:33,678][18010] Avg episode reward: [(0, '5203.627')] [2024-04-25 12:24:33,678][18169] Saving new best policy, reward=5203.627! [2024-04-25 12:24:37,095][18183] Updated weights for policy 0, policy_version 16960 (0.0007) [2024-04-25 12:24:38,677][18010] Fps is (10 sec: 11468.9, 60 sec: 11195.8, 300 sec: 11135.6). Total num frames: 8699904. Throughput: 0: 11179.9. Samples: 8670532. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:24:38,678][18010] Avg episode reward: [(0, '5099.735')] [2024-04-25 12:24:40,562][18183] Updated weights for policy 0, policy_version 17040 (0.0007) [2024-04-25 12:24:43,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11195.7, 300 sec: 11163.3). Total num frames: 8757248. Throughput: 0: 11257.6. Samples: 8740872. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:24:43,678][18010] Avg episode reward: [(0, '5215.899')] [2024-04-25 12:24:43,702][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000017112_8761344.pth... [2024-04-25 12:24:43,705][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000016440_8417280.pth [2024-04-25 12:24:43,705][18169] Saving new best policy, reward=5215.899! [2024-04-25 12:24:44,056][18183] Updated weights for policy 0, policy_version 17120 (0.0007) [2024-04-25 12:24:47,808][18183] Updated weights for policy 0, policy_version 17200 (0.0007) [2024-04-25 12:24:48,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11195.7, 300 sec: 11163.3). Total num frames: 8814592. Throughput: 0: 11275.7. 
Samples: 8808000. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:24:48,678][18010] Avg episode reward: [(0, '5267.448')] [2024-04-25 12:24:48,678][18169] Saving new best policy, reward=5267.448! [2024-04-25 12:24:51,409][18183] Updated weights for policy 0, policy_version 17280 (0.0008) [2024-04-25 12:24:53,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11264.0, 300 sec: 11177.2). Total num frames: 8871936. Throughput: 0: 11317.9. Samples: 8841856. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:24:53,678][18010] Avg episode reward: [(0, '5128.761')] [2024-04-25 12:24:55,091][18183] Updated weights for policy 0, policy_version 17360 (0.0007) [2024-04-25 12:24:58,678][18010] Fps is (10 sec: 11058.6, 60 sec: 11195.6, 300 sec: 11163.3). Total num frames: 8925184. Throughput: 0: 11362.4. Samples: 8909172. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-04-25 12:24:58,679][18010] Avg episode reward: [(0, '5259.475')] [2024-04-25 12:24:58,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000017432_8925184.pth... [2024-04-25 12:24:58,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000016776_8589312.pth [2024-04-25 12:24:58,727][18183] Updated weights for policy 0, policy_version 17440 (0.0008) [2024-04-25 12:25:02,274][18183] Updated weights for policy 0, policy_version 17520 (0.0008) [2024-04-25 12:25:03,677][18010] Fps is (10 sec: 11059.3, 60 sec: 11264.0, 300 sec: 11149.5). Total num frames: 8982528. Throughput: 0: 11359.9. Samples: 8978072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:25:03,678][18010] Avg episode reward: [(0, '5285.231')] [2024-04-25 12:25:03,678][18169] Saving new best policy, reward=5285.231! [2024-04-25 12:25:05,937][18183] Updated weights for policy 0, policy_version 17600 (0.0007) [2024-04-25 12:25:08,678][18010] Fps is (10 sec: 11469.6, 60 sec: 11332.3, 300 sec: 11163.3). 
Total num frames: 9039872. Throughput: 0: 11319.5. Samples: 9011200. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:25:08,678][18010] Avg episode reward: [(0, '5284.296')] [2024-04-25 12:25:09,660][18183] Updated weights for policy 0, policy_version 17680 (0.0007) [2024-04-25 12:25:13,397][18183] Updated weights for policy 0, policy_version 17760 (0.0008) [2024-04-25 12:25:13,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11264.0, 300 sec: 11149.5). Total num frames: 9093120. Throughput: 0: 11276.6. Samples: 9076944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:25:13,678][18010] Avg episode reward: [(0, '5270.554')] [2024-04-25 12:25:13,680][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000017760_9093120.pth... [2024-04-25 12:25:13,682][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000017112_8761344.pth [2024-04-25 12:25:17,021][18183] Updated weights for policy 0, policy_version 17840 (0.0008) [2024-04-25 12:25:18,678][18010] Fps is (10 sec: 11059.2, 60 sec: 11264.0, 300 sec: 11163.4). Total num frames: 9150464. Throughput: 0: 11308.9. Samples: 9145312. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-04-25 12:25:18,678][18010] Avg episode reward: [(0, '5293.390')] [2024-04-25 12:25:18,678][18169] Saving new best policy, reward=5293.390! [2024-04-25 12:25:20,567][18183] Updated weights for policy 0, policy_version 17920 (0.0007) [2024-04-25 12:25:23,677][18010] Fps is (10 sec: 11468.8, 60 sec: 11264.0, 300 sec: 11149.5). Total num frames: 9207808. Throughput: 0: 11302.5. Samples: 9179144. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-04-25 12:25:23,678][18010] Avg episode reward: [(0, '5305.687')] [2024-04-25 12:25:23,678][18169] Saving new best policy, reward=5305.687! 
[2024-04-25 12:25:24,126][18183] Updated weights for policy 0, policy_version 18000 (0.0007) [2024-04-25 12:25:27,592][18183] Updated weights for policy 0, policy_version 18080 (0.0007) [2024-04-25 12:25:28,678][18010] Fps is (10 sec: 11878.3, 60 sec: 11400.5, 300 sec: 11163.3). Total num frames: 9269248. Throughput: 0: 11292.7. Samples: 9249044. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:25:28,678][18010] Avg episode reward: [(0, '5436.254')] [2024-04-25 12:25:28,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000018104_9269248.pth... [2024-04-25 12:25:28,685][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000017432_8925184.pth [2024-04-25 12:25:28,686][18169] Saving new best policy, reward=5436.254! [2024-04-25 12:25:31,047][18183] Updated weights for policy 0, policy_version 18160 (0.0008) [2024-04-25 12:25:33,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11400.5, 300 sec: 11163.3). Total num frames: 9326592. Throughput: 0: 11395.9. Samples: 9320816. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:25:33,678][18010] Avg episode reward: [(0, '5302.312')] [2024-04-25 12:25:34,536][18183] Updated weights for policy 0, policy_version 18240 (0.0007) [2024-04-25 12:25:38,253][18183] Updated weights for policy 0, policy_version 18320 (0.0008) [2024-04-25 12:25:38,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11400.5, 300 sec: 11163.3). Total num frames: 9383936. Throughput: 0: 11401.9. Samples: 9354944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:25:38,678][18010] Avg episode reward: [(0, '5162.613')] [2024-04-25 12:25:41,824][18183] Updated weights for policy 0, policy_version 18400 (0.0007) [2024-04-25 12:25:43,678][18010] Fps is (10 sec: 11058.5, 60 sec: 11332.2, 300 sec: 11163.3). Total num frames: 9437184. Throughput: 0: 11399.0. Samples: 9422128. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:25:43,679][18010] Avg episode reward: [(0, '4913.346')] [2024-04-25 12:25:43,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000018432_9437184.pth... [2024-04-25 12:25:43,684][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000017760_9093120.pth [2024-04-25 12:25:45,555][18183] Updated weights for policy 0, policy_version 18480 (0.0007) [2024-04-25 12:25:48,678][18010] Fps is (10 sec: 11059.3, 60 sec: 11332.3, 300 sec: 11163.3). Total num frames: 9494528. Throughput: 0: 11376.9. Samples: 9490032. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:25:48,678][18010] Avg episode reward: [(0, '5144.698')] [2024-04-25 12:25:49,177][18183] Updated weights for policy 0, policy_version 18560 (0.0008) [2024-04-25 12:25:52,866][18183] Updated weights for policy 0, policy_version 18640 (0.0007) [2024-04-25 12:25:53,677][18010] Fps is (10 sec: 11469.5, 60 sec: 11332.3, 300 sec: 11177.2). Total num frames: 9551872. Throughput: 0: 11354.9. Samples: 9522168. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-04-25 12:25:53,678][18010] Avg episode reward: [(0, '5363.992')] [2024-04-25 12:25:56,308][18183] Updated weights for policy 0, policy_version 18720 (0.0007) [2024-04-25 12:25:58,678][18010] Fps is (10 sec: 11468.8, 60 sec: 11400.7, 300 sec: 11191.1). Total num frames: 9609216. Throughput: 0: 11463.2. Samples: 9592788. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-04-25 12:25:58,678][18010] Avg episode reward: [(0, '5390.010')] [2024-04-25 12:25:58,681][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000018768_9609216.pth... 
[2024-04-25 12:25:58,687][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000018104_9269248.pth [2024-04-25 12:25:59,986][18183] Updated weights for policy 0, policy_version 18800 (0.0007) [2024-04-25 12:26:03,552][18183] Updated weights for policy 0, policy_version 18880 (0.0007) [2024-04-25 12:26:03,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11400.5, 300 sec: 11191.1). Total num frames: 9666560. Throughput: 0: 11430.8. Samples: 9659700. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-04-25 12:26:03,678][18010] Avg episode reward: [(0, '5260.982')] [2024-04-25 12:26:07,028][18183] Updated weights for policy 0, policy_version 18960 (0.0008) [2024-04-25 12:26:08,678][18010] Fps is (10 sec: 11468.7, 60 sec: 11400.5, 300 sec: 11205.0). Total num frames: 9723904. Throughput: 0: 11466.7. Samples: 9695148. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:26:08,678][18010] Avg episode reward: [(0, '5310.132')] [2024-04-25 12:26:10,447][18183] Updated weights for policy 0, policy_version 19040 (0.0007) [2024-04-25 12:26:13,678][18010] Fps is (10 sec: 11878.4, 60 sec: 11537.1, 300 sec: 11246.6). Total num frames: 9785344. Throughput: 0: 11489.2. Samples: 9766056. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:26:13,678][18010] Avg episode reward: [(0, '5277.358')] [2024-04-25 12:26:13,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000019112_9785344.pth... [2024-04-25 12:26:13,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000018432_9437184.pth [2024-04-25 12:26:13,952][18183] Updated weights for policy 0, policy_version 19120 (0.0007) [2024-04-25 12:26:17,387][18183] Updated weights for policy 0, policy_version 19200 (0.0006) [2024-04-25 12:26:18,678][18010] Fps is (10 sec: 11878.6, 60 sec: 11537.1, 300 sec: 11260.5). Total num frames: 9842688. Throughput: 0: 11485.2. 
Samples: 9837652. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:26:18,678][18010] Avg episode reward: [(0, '5274.701')] [2024-04-25 12:26:21,084][18183] Updated weights for policy 0, policy_version 19280 (0.0007) [2024-04-25 12:26:23,678][18010] Fps is (10 sec: 11058.9, 60 sec: 11468.7, 300 sec: 11232.7). Total num frames: 9895936. Throughput: 0: 11458.7. Samples: 9870588. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:26:23,678][18010] Avg episode reward: [(0, '5368.373')] [2024-04-25 12:26:24,882][18183] Updated weights for policy 0, policy_version 19360 (0.0007) [2024-04-25 12:26:28,551][18183] Updated weights for policy 0, policy_version 19440 (0.0006) [2024-04-25 12:26:28,678][18010] Fps is (10 sec: 11059.1, 60 sec: 11400.5, 300 sec: 11232.8). Total num frames: 9953280. Throughput: 0: 11426.4. Samples: 9936312. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-04-25 12:26:28,678][18010] Avg episode reward: [(0, '5160.498')] [2024-04-25 12:26:28,682][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000019440_9953280.pth... [2024-04-25 12:26:28,686][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000018768_9609216.pth [2024-04-25 12:26:32,325][18183] Updated weights for policy 0, policy_version 19520 (0.0008) [2024-04-25 12:26:33,366][18169] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000000 [2024-04-25 12:26:33,367][18185] Stopping RolloutWorker_w2... [2024-04-25 12:26:33,368][18010] Component RolloutWorker_w2 stopped! [2024-04-25 12:26:33,368][18185] Loop rollout_proc2_evt_loop terminating... [2024-04-25 12:26:33,368][18169] Stopping Batcher_0... [2024-04-25 12:26:33,368][18187] Stopping RolloutWorker_w6... [2024-04-25 12:26:33,368][18190] Stopping RolloutWorker_w5... [2024-04-25 12:26:33,368][18184] Stopping RolloutWorker_w1... [2024-04-25 12:26:33,368][18182] Stopping RolloutWorker_w0... 
[2024-04-25 12:26:33,368][18187] Loop rollout_proc6_evt_loop terminating... [2024-04-25 12:26:33,368][18186] Stopping RolloutWorker_w4... [2024-04-25 12:26:33,368][18010] Component RolloutWorker_w3 stopped! [2024-04-25 12:26:33,368][18188] Stopping RolloutWorker_w3... [2024-04-25 12:26:33,368][18169] Loop batcher_evt_loop terminating... [2024-04-25 12:26:33,368][18190] Loop rollout_proc5_evt_loop terminating... [2024-04-25 12:26:33,368][18189] Stopping RolloutWorker_w7... [2024-04-25 12:26:33,368][18010] Component RolloutWorker_w1 stopped! [2024-04-25 12:26:33,368][18182] Loop rollout_proc0_evt_loop terminating... [2024-04-25 12:26:33,368][18184] Loop rollout_proc1_evt_loop terminating... [2024-04-25 12:26:33,368][18186] Loop rollout_proc4_evt_loop terminating... [2024-04-25 12:26:33,368][18188] Loop rollout_proc3_evt_loop terminating... [2024-04-25 12:26:33,368][18010] Component RolloutWorker_w5 stopped! [2024-04-25 12:26:33,368][18189] Loop rollout_proc7_evt_loop terminating... [2024-04-25 12:26:33,369][18010] Component RolloutWorker_w4 stopped! [2024-04-25 12:26:33,369][18010] Component RolloutWorker_w7 stopped! [2024-04-25 12:26:33,369][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000019544_10006528.pth... [2024-04-25 12:26:33,369][18010] Component RolloutWorker_w0 stopped! [2024-04-25 12:26:33,369][18010] Component RolloutWorker_w6 stopped! [2024-04-25 12:26:33,369][18010] Component Batcher_0 stopped! [2024-04-25 12:26:33,372][18169] Removing /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000019112_9785344.pth [2024-04-25 12:26:33,373][18169] Saving /home/bartek/Workspace/ideas/sample-factory/train_dir/Ant/checkpoint_p0/checkpoint_000019544_10006528.pth... [2024-04-25 12:26:33,376][18169] Stopping LearnerWorker_p0... [2024-04-25 12:26:33,377][18169] Loop learner_proc0_evt_loop terminating... [2024-04-25 12:26:33,377][18010] Component LearnerWorker_p0 stopped! 
[2024-04-25 12:26:33,449][18183] Weights refcount: 2 0 [2024-04-25 12:26:33,451][18183] Stopping InferenceWorker_p0-w0... [2024-04-25 12:26:33,451][18010] Component InferenceWorker_p0-w0 stopped! [2024-04-25 12:26:33,452][18183] Loop inference_proc0-0_evt_loop terminating... [2024-04-25 12:26:33,452][18010] Waiting for process learner_proc0 to stop... [2024-04-25 12:26:34,382][18010] Waiting for process inference_proc0-0 to join... [2024-04-25 12:26:34,382][18010] Waiting for process rollout_proc0 to join... [2024-04-25 12:26:34,382][18010] Waiting for process rollout_proc1 to join... [2024-04-25 12:26:34,383][18010] Waiting for process rollout_proc2 to join... [2024-04-25 12:26:34,383][18010] Waiting for process rollout_proc3 to join... [2024-04-25 12:26:34,383][18010] Waiting for process rollout_proc4 to join... [2024-04-25 12:26:34,383][18010] Waiting for process rollout_proc5 to join... [2024-04-25 12:26:34,383][18010] Waiting for process rollout_proc6 to join... [2024-04-25 12:26:34,384][18010] Waiting for process rollout_proc7 to join... 
[2024-04-25 12:26:34,384][18010] Batcher 0 profile tree view: batching: 3.2441, releasing_batches: 1.5483
[2024-04-25 12:26:34,384][18010] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0052 wait_policy_total: 210.0314 update_model: 9.7032 weight_update: 0.0008 one_step: 0.0023 handle_policy_step: 618.3589 deserialize: 18.6033, stack: 3.8221, obs_to_device_normalize: 122.5832, forward: 309.1879, send_messages: 55.0457 prepare_outputs: 75.2405 to_cpu: 38.5992
[2024-04-25 12:26:34,384][18010] Learner 0 profile tree view: misc: 0.0079, prepare_batch: 6.5897 train: 67.8258 epoch_init: 0.0345, minibatch_init: 0.9679, losses_postprocess: 1.8979, kl_divergence: 0.8879, after_optimizer: 1.1132 calculate_losses: 20.3445 losses_init: 0.0357, forward_head: 2.4427, bptt_initial: 0.1418, bptt: 0.1515, tail: 8.1449, advantages_returns: 1.0710, losses: 7.0595 update: 41.0535 clip: 4.9322
[2024-04-25 12:26:34,384][18010] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.4502, enqueue_policy_requests: 19.6062, env_step: 370.9715, overhead: 40.8484, complete_rollouts: 0.4321 save_policy_outputs: 41.4738 split_output_tensors: 19.5440
[2024-04-25 12:26:34,384][18010] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.4346, enqueue_policy_requests: 19.7944, env_step: 370.2794, overhead: 40.4007, complete_rollouts: 0.4334 save_policy_outputs: 40.6604 split_output_tensors: 19.1990
[2024-04-25 12:26:34,384][18010] Loop Runner_EvtLoop terminating...
[2024-04-25 12:26:34,385][18010] Runner profile tree view: main_loop: 887.0711
[2024-04-25 12:26:34,385][18010] Collected {0: 10006528}, FPS: 11280.4