[2025-01-07 11:53:38,254][01669] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-01-07 11:53:38,256][01669] Rollout worker 0 uses device cpu
[2025-01-07 11:53:38,258][01669] Rollout worker 1 uses device cpu
[2025-01-07 11:53:38,260][01669] Rollout worker 2 uses device cpu
[2025-01-07 11:53:38,261][01669] Rollout worker 3 uses device cpu
[2025-01-07 11:53:38,263][01669] Rollout worker 4 uses device cpu
[2025-01-07 11:53:38,264][01669] Rollout worker 5 uses device cpu
[2025-01-07 11:53:38,265][01669] Rollout worker 6 uses device cpu
[2025-01-07 11:53:38,266][01669] Rollout worker 7 uses device cpu
[2025-01-07 11:57:00,265][01669] Environment doom_basic already registered, overwriting...
[2025-01-07 11:57:00,269][01669] Environment doom_two_colors_easy already registered, overwriting...
[2025-01-07 11:57:00,271][01669] Environment doom_two_colors_hard already registered, overwriting...
[2025-01-07 11:57:00,273][01669] Environment doom_dm already registered, overwriting...
[2025-01-07 11:57:00,274][01669] Environment doom_dwango5 already registered, overwriting...
[2025-01-07 11:57:00,275][01669] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2025-01-07 11:57:00,278][01669] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2025-01-07 11:57:00,279][01669] Environment doom_my_way_home already registered, overwriting...
[2025-01-07 11:57:00,281][01669] Environment doom_deadly_corridor already registered, overwriting...
[2025-01-07 11:57:00,282][01669] Environment doom_defend_the_center already registered, overwriting...
[2025-01-07 11:57:00,284][01669] Environment doom_defend_the_line already registered, overwriting...
[2025-01-07 11:57:00,285][01669] Environment doom_health_gathering already registered, overwriting...
[2025-01-07 11:57:00,287][01669] Environment doom_health_gathering_supreme already registered, overwriting...
[2025-01-07 11:57:00,293][01669] Environment doom_battle already registered, overwriting...
[2025-01-07 11:57:00,295][01669] Environment doom_battle2 already registered, overwriting...
[2025-01-07 11:57:00,301][01669] Environment doom_duel_bots already registered, overwriting...
[2025-01-07 11:57:00,302][01669] Environment doom_deathmatch_bots already registered, overwriting...
[2025-01-07 11:57:00,303][01669] Environment doom_duel already registered, overwriting...
[2025-01-07 11:57:00,304][01669] Environment doom_deathmatch_full already registered, overwriting...
[2025-01-07 11:57:00,305][01669] Environment doom_benchmark already registered, overwriting...
[2025-01-07 11:57:00,307][01669] register_encoder_factory:
[2025-01-07 11:57:00,331][01669] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-01-07 11:57:00,339][01669] Experiment dir /content/train_dir/default_experiment already exists!
[2025-01-07 11:57:00,340][01669] Resuming existing experiment from /content/train_dir/default_experiment...
[2025-01-07 11:57:00,343][01669] Weights and Biases integration disabled
[2025-01-07 11:57:00,349][01669] Environment var CUDA_VISIBLE_DEVICES is
[2025-01-07 11:57:03,154][01669] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2025-01-07 11:57:03,157][01669] Saving configuration to /content/train_dir/default_experiment/config.json...
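Editor's note: the command_line field above is the whole recipe for reproducing this run. A minimal sketch of launching the same APPO experiment from Python, assuming the sf_examples.vizdoom entry point that ships with Sample Factory 2.x (module and function names not verified against this exact version):

    # Sketch: launch the logged run via Sample Factory's bundled VizDoom example.
    import sys
    from sf_examples.vizdoom.train_vizdoom import main  # assumed SF 2.x layout

    sys.argv = [
        "train_vizdoom",
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=4000000",
        "--train_dir=/content/train_dir",
    ]
    main()  # registers the doom_* envs, parses the config, starts the APPO runner

With restart_behavior=resume (the default seen above), rerunning the same command against an existing experiment dir resumes from the saved config.json rather than starting fresh.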
[2025-01-07 11:57:03,161][01669] Rollout worker 0 uses device cpu
[2025-01-07 11:57:03,163][01669] Rollout worker 1 uses device cpu
[2025-01-07 11:57:03,165][01669] Rollout worker 2 uses device cpu
[2025-01-07 11:57:03,166][01669] Rollout worker 3 uses device cpu
[2025-01-07 11:57:03,167][01669] Rollout worker 4 uses device cpu
[2025-01-07 11:57:03,169][01669] Rollout worker 5 uses device cpu
[2025-01-07 11:57:03,170][01669] Rollout worker 6 uses device cpu
[2025-01-07 11:57:03,171][01669] Rollout worker 7 uses device cpu
[2025-01-07 12:00:35,659][01669] Environment doom_basic already registered, overwriting...
[2025-01-07 12:00:35,661][01669] Environment doom_two_colors_easy already registered, overwriting...
[2025-01-07 12:00:35,665][01669] Environment doom_two_colors_hard already registered, overwriting...
[2025-01-07 12:00:35,666][01669] Environment doom_dm already registered, overwriting...
[2025-01-07 12:00:35,667][01669] Environment doom_dwango5 already registered, overwriting...
[2025-01-07 12:00:35,668][01669] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2025-01-07 12:00:35,669][01669] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2025-01-07 12:00:35,672][01669] Environment doom_my_way_home already registered, overwriting...
[2025-01-07 12:00:35,673][01669] Environment doom_deadly_corridor already registered, overwriting...
[2025-01-07 12:00:35,675][01669] Environment doom_defend_the_center already registered, overwriting...
[2025-01-07 12:00:35,677][01669] Environment doom_defend_the_line already registered, overwriting...
[2025-01-07 12:00:35,679][01669] Environment doom_health_gathering already registered, overwriting...
[2025-01-07 12:00:35,682][01669] Environment doom_health_gathering_supreme already registered, overwriting...
[2025-01-07 12:00:35,684][01669] Environment doom_battle already registered, overwriting...
[2025-01-07 12:00:35,686][01669] Environment doom_battle2 already registered, overwriting...
[2025-01-07 12:00:35,688][01669] Environment doom_duel_bots already registered, overwriting...
[2025-01-07 12:00:35,691][01669] Environment doom_deathmatch_bots already registered, overwriting...
[2025-01-07 12:00:35,693][01669] Environment doom_duel already registered, overwriting...
[2025-01-07 12:00:35,695][01669] Environment doom_deathmatch_full already registered, overwriting...
[2025-01-07 12:00:35,697][01669] Environment doom_benchmark already registered, overwriting...
[2025-01-07 12:00:35,700][01669] register_encoder_factory:
[2025-01-07 12:00:35,751][01669] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-01-07 12:00:35,753][01669] Overriding arg 'device' with value 'cpu' passed from command line
[2025-01-07 12:00:35,757][01669] Experiment dir /content/train_dir/default_experiment already exists!
[2025-01-07 12:00:35,759][01669] Resuming existing experiment from /content/train_dir/default_experiment...
[2025-01-07 12:00:35,762][01669] Weights and Biases integration disabled
[2025-01-07 12:00:35,766][01669] Environment var CUDA_VISIBLE_DEVICES is
[2025-01-07 12:00:39,944][01669] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=cpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2025-01-07 12:00:39,946][01669] Saving configuration to /content/train_dir/default_experiment/config.json...
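Editor's note: the "Overriding arg 'device' with value 'cpu'" message above shows the resume semantics: the saved config.json is reloaded, and only flags explicitly passed on the command line win over it (the first run had device=gpu; this one runs on CPU). An illustrative sketch of that merge logic, not Sample Factory's actual implementation:

    # Sketch of resume-with-override: saved config loses to explicit CLI flags.
    import json

    def load_resumed_cfg(path: str, cli_overrides: dict) -> dict:
        with open(path) as f:
            cfg = json.load(f)  # e.g. device == "gpu" from the earlier run
        for key, value in cli_overrides.items():
            if cfg.get(key) != value:
                print(f"Overriding arg {key!r} with value {value!r} passed from command line")
            cfg[key] = value
        return cfg

    cfg = load_resumed_cfg("/content/train_dir/default_experiment/config.json",
                           {"device": "cpu"})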
[2025-01-07 12:00:39,950][01669] Rollout worker 0 uses device cpu
[2025-01-07 12:00:39,952][01669] Rollout worker 1 uses device cpu
[2025-01-07 12:00:39,954][01669] Rollout worker 2 uses device cpu
[2025-01-07 12:00:39,955][01669] Rollout worker 3 uses device cpu
[2025-01-07 12:00:39,956][01669] Rollout worker 4 uses device cpu
[2025-01-07 12:00:39,957][01669] Rollout worker 5 uses device cpu
[2025-01-07 12:00:39,959][01669] Rollout worker 6 uses device cpu
[2025-01-07 12:00:39,960][01669] Rollout worker 7 uses device cpu
[2025-01-07 12:00:40,071][01669] InferenceWorker_p0-w0: min num requests: 2
[2025-01-07 12:00:40,109][01669] Starting all processes...
[2025-01-07 12:00:40,110][01669] Starting process learner_proc0
[2025-01-07 12:00:40,170][01669] Starting all processes...
[2025-01-07 12:00:40,187][01669] Starting process inference_proc0-0
[2025-01-07 12:00:40,188][01669] Starting process rollout_proc0
[2025-01-07 12:00:40,190][01669] Starting process rollout_proc1
[2025-01-07 12:00:40,190][01669] Starting process rollout_proc2
[2025-01-07 12:00:40,190][01669] Starting process rollout_proc3
[2025-01-07 12:00:40,190][01669] Starting process rollout_proc4
[2025-01-07 12:00:40,190][01669] Starting process rollout_proc5
[2025-01-07 12:00:40,190][01669] Starting process rollout_proc6
[2025-01-07 12:00:40,190][01669] Starting process rollout_proc7
[2025-01-07 12:00:47,910][01669] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 1669], exiting...
[2025-01-07 12:00:47,916][01669] Runner profile tree view:
main_loop: 7.8074
[2025-01-07 12:00:47,920][01669] Collected {}, FPS: 0.0
[2025-01-07 12:01:02,441][05422] Worker 5 uses CPU cores [1]
[2025-01-07 12:01:02,604][05422] Stopping RolloutWorker_w5...
[2025-01-07 12:01:02,631][05422] Loop rollout_proc5_evt_loop terminating...
[2025-01-07 12:01:03,137][05419] Worker 3 uses CPU cores [1]
[2025-01-07 12:01:03,187][05416] Worker 0 uses CPU cores [0]
[2025-01-07 12:01:03,221][05417] Worker 1 uses CPU cores [1]
[2025-01-07 12:01:03,248][05419] Stopping RolloutWorker_w3...
[2025-01-07 12:01:03,266][05419] Loop rollout_proc3_evt_loop terminating...
[2025-01-07 12:01:03,288][05416] Stopping RolloutWorker_w0...
[2025-01-07 12:01:03,289][05416] Loop rollout_proc0_evt_loop terminating...
[2025-01-07 12:01:03,302][05417] Stopping RolloutWorker_w1...
[2025-01-07 12:01:03,313][05417] Loop rollout_proc1_evt_loop terminating...
[2025-01-07 12:01:03,606][05423] Worker 7 uses CPU cores [1]
[2025-01-07 12:01:03,644][05423] Stopping RolloutWorker_w7...
[2025-01-07 12:01:03,644][05423] Loop rollout_proc7_evt_loop terminating...
[2025-01-07 12:01:03,983][05420] Worker 4 uses CPU cores [0]
[2025-01-07 12:01:04,016][05421] Worker 6 uses CPU cores [0]
[2025-01-07 12:01:04,075][05420] Stopping RolloutWorker_w4...
[2025-01-07 12:01:04,089][05420] Loop rollout_proc4_evt_loop terminating...
[2025-01-07 12:01:04,102][05421] Stopping RolloutWorker_w6...
[2025-01-07 12:01:04,105][05421] Loop rollout_proc6_evt_loop terminating...
[2025-01-07 12:01:31,390][01669] Environment doom_basic already registered, overwriting...
[2025-01-07 12:01:31,398][01669] Environment doom_two_colors_easy already registered, overwriting...
[2025-01-07 12:01:31,401][01669] Environment doom_two_colors_hard already registered, overwriting...
[2025-01-07 12:01:31,403][01669] Environment doom_dm already registered, overwriting...
[2025-01-07 12:01:31,406][01669] Environment doom_dwango5 already registered, overwriting...
[2025-01-07 12:01:31,408][01669] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2025-01-07 12:01:31,410][01669] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2025-01-07 12:01:31,412][01669] Environment doom_my_way_home already registered, overwriting...
[2025-01-07 12:01:31,427][01669] Environment doom_deadly_corridor already registered, overwriting...
[2025-01-07 12:01:31,431][01669] Environment doom_defend_the_center already registered, overwriting...
[2025-01-07 12:01:31,441][01669] Environment doom_defend_the_line already registered, overwriting...
[2025-01-07 12:01:31,444][01669] Environment doom_health_gathering already registered, overwriting...
[2025-01-07 12:01:31,447][01669] Environment doom_health_gathering_supreme already registered, overwriting...
[2025-01-07 12:01:31,451][01669] Environment doom_battle already registered, overwriting...
[2025-01-07 12:01:31,454][01669] Environment doom_battle2 already registered, overwriting...
[2025-01-07 12:01:31,457][01669] Environment doom_duel_bots already registered, overwriting...
[2025-01-07 12:01:31,460][01669] Environment doom_deathmatch_bots already registered, overwriting...
[2025-01-07 12:01:31,463][01669] Environment doom_duel already registered, overwriting...
[2025-01-07 12:01:31,501][01669] Environment doom_deathmatch_full already registered, overwriting...
[2025-01-07 12:01:31,503][01669] Environment doom_benchmark already registered, overwriting...
[2025-01-07 12:01:31,506][01669] register_encoder_factory:
[2025-01-07 12:01:31,550][01669] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-01-07 12:01:31,575][01669] Experiment dir /content/train_dir/default_experiment already exists!
[2025-01-07 12:01:31,582][01669] Resuming existing experiment from /content/train_dir/default_experiment...
[2025-01-07 12:01:31,587][01669] Weights and Biases integration disabled
[2025-01-07 12:01:31,606][01669] Environment var CUDA_VISIBLE_DEVICES is
[2025-01-07 12:01:38,590][01669] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=cpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2025-01-07 12:01:38,594][01669] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-01-07 12:01:38,598][01669] Rollout worker 0 uses device cpu
[2025-01-07 12:01:38,600][01669] Rollout worker 1 uses device cpu
[2025-01-07 12:01:38,602][01669] Rollout worker 2 uses device cpu
[2025-01-07 12:01:38,603][01669] Rollout worker 3 uses device cpu
[2025-01-07 12:01:38,605][01669] Rollout worker 4 uses device cpu
[2025-01-07 12:01:38,606][01669] Rollout worker 5 uses device cpu
[2025-01-07 12:01:38,607][01669] Rollout worker 6 uses device cpu
[2025-01-07 12:01:38,609][01669] Rollout worker 7 uses device cpu
[2025-01-07 12:01:38,786][01669] InferenceWorker_p0-w0: min num requests: 2
[2025-01-07 12:01:38,869][01669] Starting all processes...
[2025-01-07 12:01:38,876][01669] Starting process learner_proc0
[2025-01-07 12:01:38,968][01669] Starting all processes...
[2025-01-07 12:01:39,042][01669] Starting process inference_proc0-0
[2025-01-07 12:01:39,042][01669] Starting process rollout_proc0
[2025-01-07 12:01:39,050][01669] Starting process rollout_proc1
[2025-01-07 12:01:39,051][01669] Starting process rollout_proc2
[2025-01-07 12:01:39,051][01669] Starting process rollout_proc3
[2025-01-07 12:01:39,051][01669] Starting process rollout_proc4
[2025-01-07 12:01:39,051][01669] Starting process rollout_proc5
[2025-01-07 12:01:39,054][01669] Starting process rollout_proc6
[2025-01-07 12:01:39,054][01669] Starting process rollout_proc7
[2025-01-07 12:02:04,019][05745] Starting seed is not provided
[2025-01-07 12:02:04,020][05745] Initializing actor-critic model on device cpu
[2025-01-07 12:02:04,021][05745] RunningMeanStd input shape: (3, 72, 128)
[2025-01-07 12:02:04,027][05745] RunningMeanStd input shape: (1,)
[2025-01-07 12:02:04,021][01669] Heartbeat connected on Batcher_0
[2025-01-07 12:02:04,243][05745] ConvEncoder: input_channels=3
[2025-01-07 12:02:04,953][05762] Worker 2 uses CPU cores [0]
[2025-01-07 12:02:05,094][05760] Worker 1 uses CPU cores [1]
[2025-01-07 12:02:05,103][01669] Heartbeat connected on RolloutWorker_w2
[2025-01-07 12:02:05,106][05761] Worker 3 uses CPU cores [1]
[2025-01-07 12:02:05,140][01669] Heartbeat connected on InferenceWorker_p0-w0
[2025-01-07 12:02:05,162][05766] Worker 7 uses CPU cores [1]
[2025-01-07 12:02:05,178][05765] Worker 6 uses CPU cores [0]
[2025-01-07 12:02:05,202][01669] Heartbeat connected on RolloutWorker_w1
[2025-01-07 12:02:05,216][01669] Heartbeat connected on RolloutWorker_w3
[2025-01-07 12:02:05,259][01669] Heartbeat connected on RolloutWorker_w7
[2025-01-07 12:02:05,264][05759] Worker 0 uses CPU cores [0]
[2025-01-07 12:02:05,276][01669] Heartbeat connected on RolloutWorker_w6
[2025-01-07 12:02:05,300][05763] Worker 4 uses CPU cores [0]
[2025-01-07 12:02:05,303][01669] Heartbeat connected on RolloutWorker_w0
[2025-01-07 12:02:05,314][01669] Heartbeat connected on RolloutWorker_w4
[2025-01-07 12:02:05,384][05745] Conv encoder output size: 512
[2025-01-07 12:02:05,385][05745] Policy head output size: 512
[2025-01-07 12:02:05,415][05764] Worker 5 uses CPU cores [1]
[2025-01-07 12:02:05,427][01669] Heartbeat connected on RolloutWorker_w5
[2025-01-07 12:02:05,461][05745] Created Actor Critic model with architecture:
[2025-01-07 12:02:05,462][05745] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-01-07 12:02:06,001][05745] Using optimizer
[2025-01-07 12:02:10,227][05745] No checkpoints found
[2025-01-07 12:02:10,227][05745] Did not load from checkpoint, starting from scratch!
[2025-01-07 12:02:10,228][05745] Initialized policy 0 weights for model version 0
[2025-01-07 12:02:10,230][05745] LearnerWorker_p0 finished initialization!
[2025-01-07 12:02:10,233][01669] Heartbeat connected on LearnerWorker_p0
[2025-01-07 12:02:10,242][05758] RunningMeanStd input shape: (3, 72, 128)
[2025-01-07 12:02:10,244][05758] RunningMeanStd input shape: (1,)
[2025-01-07 12:02:10,269][05758] ConvEncoder: input_channels=3
[2025-01-07 12:02:10,418][05758] Conv encoder output size: 512
[2025-01-07 12:02:10,419][05758] Policy head output size: 512
[2025-01-07 12:02:10,443][01669] Inference worker 0-0 is ready!
[2025-01-07 12:02:10,444][01669] All inference workers are ready! Signal rollout workers to start!
[2025-01-07 12:02:10,703][05761] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 12:02:10,706][05764] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 12:02:10,705][05766] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 12:02:10,708][05760] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 12:02:10,730][05763] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 12:02:10,731][05762] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 12:02:10,729][05759] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 12:02:10,736][05765] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 12:02:11,512][05765] Decorrelating experience for 0 frames...
[2025-01-07 12:02:11,606][01669] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-07 12:02:11,972][05765] Decorrelating experience for 32 frames...
[2025-01-07 12:02:12,309][05764] Decorrelating experience for 0 frames...
[2025-01-07 12:02:12,305][05766] Decorrelating experience for 0 frames...
[2025-01-07 12:02:12,310][05761] Decorrelating experience for 0 frames...
[2025-01-07 12:02:13,109][05765] Decorrelating experience for 64 frames...
[2025-01-07 12:02:13,648][05764] Decorrelating experience for 32 frames...
[2025-01-07 12:02:13,649][05761] Decorrelating experience for 32 frames...
[2025-01-07 12:02:13,646][05766] Decorrelating experience for 32 frames...
[2025-01-07 12:02:13,728][05762] Decorrelating experience for 0 frames...
[2025-01-07 12:02:13,746][05763] Decorrelating experience for 0 frames...
[2025-01-07 12:02:14,588][05765] Decorrelating experience for 96 frames...
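Editor's note: the module tree printed above is a shared-weights actor-critic: a three-layer conv encoder (encoder_conv_architecture=convnet_simple) flattened through a 512-unit linear layer, a single-layer GRU core, a scalar critic head, and a 5-way action-logits head for the 128x72 CHW observations. A self-contained PyTorch sketch under those assumptions; the convnet_simple filter sizes (32/64/128 channels with strides 4/2/2) are the library defaults as I understand them, not read from this log:

    import torch
    from torch import nn

    class DoomActorCritic(nn.Module):
        """Sketch of the logged architecture: conv encoder -> GRU -> two heads."""
        def __init__(self, num_actions: int = 5, rnn_size: int = 512):
            super().__init__()
            self.conv_head = nn.Sequential(  # convnet_simple (assumed filters)
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            )
            with torch.no_grad():  # probe flat size for the 3x72x128 input above
                n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).flatten(1).shape[1]
            self.mlp = nn.Sequential(nn.Linear(n_flat, 512), nn.ELU())
            self.core = nn.GRU(512, rnn_size)          # ModelCoreRNN in the log
            self.critic_linear = nn.Linear(rnn_size, 1)
            self.action_logits = nn.Linear(rnn_size, num_actions)

        def forward(self, obs, rnn_state):
            x = self.mlp(self.conv_head(obs).flatten(1))
            x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # seq length 1
            x = x.squeeze(0)
            return self.action_logits(x), self.critic_linear(x), rnn_state

This matches "Conv encoder output size: 512" and "Policy head output size: 512" above; the probe resolves to a 2304-dim flat vector (128 channels on a 3x6 map) before the linear layer.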
[2025-01-07 12:02:15,457][05763] Decorrelating experience for 32 frames...
[2025-01-07 12:02:15,541][05759] Decorrelating experience for 0 frames...
[2025-01-07 12:02:15,911][05760] Decorrelating experience for 0 frames...
[2025-01-07 12:02:16,404][05766] Decorrelating experience for 64 frames...
[2025-01-07 12:02:16,410][05761] Decorrelating experience for 64 frames...
[2025-01-07 12:02:16,449][05764] Decorrelating experience for 64 frames...
[2025-01-07 12:02:16,606][01669] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 36.0. Samples: 180. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-07 12:02:19,440][05760] Decorrelating experience for 32 frames...
[2025-01-07 12:02:19,582][05762] Decorrelating experience for 32 frames...
[2025-01-07 12:02:19,743][05766] Decorrelating experience for 96 frames...
[2025-01-07 12:02:19,829][05764] Decorrelating experience for 96 frames...
[2025-01-07 12:02:20,048][05763] Decorrelating experience for 64 frames...
[2025-01-07 12:02:20,050][05759] Decorrelating experience for 32 frames...
[2025-01-07 12:02:21,606][01669] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 49.0. Samples: 490. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-07 12:02:21,612][01669] Avg episode reward: [(0, '3.587')]
[2025-01-07 12:02:23,530][05761] Decorrelating experience for 96 frames...
[2025-01-07 12:02:24,627][05763] Decorrelating experience for 96 frames...
[2025-01-07 12:02:24,881][05762] Decorrelating experience for 64 frames...
[2025-01-07 12:02:25,171][05760] Decorrelating experience for 64 frames...
[2025-01-07 12:02:25,872][05759] Decorrelating experience for 64 frames...
[2025-01-07 12:02:26,606][01669] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 103.5. Samples: 1552. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-07 12:02:26,612][01669] Avg episode reward: [(0, '3.210')]
[2025-01-07 12:02:27,878][05745] Signal inference workers to stop experience collection...
[2025-01-07 12:02:27,913][05758] InferenceWorker_p0-w0: stopping experience collection
[2025-01-07 12:02:28,323][05762] Decorrelating experience for 96 frames...
[2025-01-07 12:02:28,433][05760] Decorrelating experience for 96 frames...
[2025-01-07 12:02:28,980][05759] Decorrelating experience for 96 frames...
[2025-01-07 12:02:29,305][05745] Signal inference workers to resume experience collection...
[2025-01-07 12:02:29,306][05758] InferenceWorker_p0-w0: resuming experience collection
[2025-01-07 12:02:31,606][01669] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 153.6. Samples: 3072. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2025-01-07 12:02:31,609][01669] Avg episode reward: [(0, '3.113')]
[2025-01-07 12:02:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 327.7, 300 sec: 327.7). Total num frames: 8192. Throughput: 0: 135.6. Samples: 3390. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2025-01-07 12:02:36,612][01669] Avg episode reward: [(0, '2.967')]
[2025-01-07 12:02:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 12288. Throughput: 0: 146.9. Samples: 4408. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2025-01-07 12:02:41,609][01669] Avg episode reward: [(0, '3.046')]
[2025-01-07 12:02:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 468.1, 300 sec: 468.1). Total num frames: 16384. Throughput: 0: 175.0. Samples: 6126. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
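Editor's note: the "Decorrelating experience for 0/32/64/96 frames..." pattern matches decorrelate_envs_on_one_worker=True with rollout=32 and four envs per worker: each env is warmed up a different multiple of the rollout length so rollout boundaries across envs stop lining up. A toy sketch of that idea (not library code):

    def decorrelation_offsets(num_envs: int = 4, rollout: int = 32) -> list[int]:
        """Warm-up frame counts per env on one worker, matching the log:
        env idx * rollout throwaway frames before real collection starts."""
        return [idx * rollout for idx in range(num_envs)]  # [0, 32, 64, 96]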
[2025-01-07 12:02:46,608][01669] Avg episode reward: [(0, '3.283')]
[2025-01-07 12:02:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 512.0, 300 sec: 512.0). Total num frames: 20480. Throughput: 0: 168.9. Samples: 6756. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:02:51,612][01669] Avg episode reward: [(0, '3.505')]
[2025-01-07 12:02:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 24576. Throughput: 0: 175.1. Samples: 7878. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:02:56,609][01669] Avg episode reward: [(0, '3.536')]
[2025-01-07 12:03:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 573.4, 300 sec: 573.4). Total num frames: 28672. Throughput: 0: 204.8. Samples: 9396. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:03:01,609][01669] Avg episode reward: [(0, '3.607')]
[2025-01-07 12:03:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 595.8, 300 sec: 595.8). Total num frames: 32768. Throughput: 0: 210.7. Samples: 9970. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:03:06,613][01669] Avg episode reward: [(0, '3.700')]
[2025-01-07 12:03:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 36864. Throughput: 0: 221.9. Samples: 11536. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:03:11,609][01669] Avg episode reward: [(0, '3.802')]
[2025-01-07 12:03:12,808][05758] Updated weights for policy 0, policy_version 10 (0.1036)
[2025-01-07 12:03:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 630.2). Total num frames: 40960. Throughput: 0: 211.5. Samples: 12590. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:03:16,614][01669] Avg episode reward: [(0, '3.880')]
[2025-01-07 12:03:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 643.7). Total num frames: 45056. Throughput: 0: 221.1. Samples: 13338. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:03:21,613][01669] Avg episode reward: [(0, '4.159')]
[2025-01-07 12:03:26,612][01669] Fps is (10 sec: 1228.2, 60 sec: 887.4, 300 sec: 709.9). Total num frames: 53248. Throughput: 0: 228.6. Samples: 14694. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:03:26,614][01669] Avg episode reward: [(0, '4.327')]
[2025-01-07 12:03:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 665.6). Total num frames: 53248. Throughput: 0: 213.1. Samples: 15716. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:03:31,612][01669] Avg episode reward: [(0, '4.393')]
[2025-01-07 12:03:36,248][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000015_61440.pth...
[2025-01-07 12:03:36,606][01669] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 722.8). Total num frames: 61440. Throughput: 0: 217.5. Samples: 16542. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:03:36,613][01669] Avg episode reward: [(0, '4.459')]
[2025-01-07 12:03:41,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 728.2). Total num frames: 65536. Throughput: 0: 221.2. Samples: 17834. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:03:41,612][01669] Avg episode reward: [(0, '4.468')]
[2025-01-07 12:03:46,609][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 733.0). Total num frames: 69632. Throughput: 0: 219.0. Samples: 19250. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:03:46,614][01669] Avg episode reward: [(0, '4.512')]
[2025-01-07 12:03:51,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 737.3). Total num frames: 73728. Throughput: 0: 219.1. Samples: 19828. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:03:51,609][01669] Avg episode reward: [(0, '4.514')]
[2025-01-07 12:03:56,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 741.2). Total num frames: 77824. Throughput: 0: 214.6. Samples: 21194. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:03:56,611][01669] Avg episode reward: [(0, '4.495')]
[2025-01-07 12:03:59,142][05758] Updated weights for policy 0, policy_version 20 (0.0495)
[2025-01-07 12:04:01,612][01669] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 744.7). Total num frames: 81920. Throughput: 0: 228.2. Samples: 22860. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:04:01,618][01669] Avg episode reward: [(0, '4.467')]
[2025-01-07 12:04:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 748.0). Total num frames: 86016. Throughput: 0: 218.4. Samples: 23168. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:04:06,611][01669] Avg episode reward: [(0, '4.375')]
[2025-01-07 12:04:11,606][01669] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 750.9). Total num frames: 90112. Throughput: 0: 210.4. Samples: 24162. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:04:11,610][01669] Avg episode reward: [(0, '4.274')]
[2025-01-07 12:04:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 753.7). Total num frames: 94208. Throughput: 0: 227.5. Samples: 25952. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:04:16,609][01669] Avg episode reward: [(0, '4.286')]
[2025-01-07 12:04:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 756.2). Total num frames: 98304. Throughput: 0: 223.6. Samples: 26602. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:04:21,612][01669] Avg episode reward: [(0, '4.230')]
[2025-01-07 12:04:26,607][01669] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 758.5). Total num frames: 102400. Throughput: 0: 216.0. Samples: 27552. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:04:26,610][01669] Avg episode reward: [(0, '4.256')]
[2025-01-07 12:04:28,773][05745] Saving new best policy, reward=4.256!
[2025-01-07 12:04:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 760.7). Total num frames: 106496. Throughput: 0: 221.7. Samples: 29224. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:04:31,614][01669] Avg episode reward: [(0, '4.230')]
[2025-01-07 12:04:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 762.7). Total num frames: 110592. Throughput: 0: 224.5. Samples: 29930. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:04:36,609][01669] Avg episode reward: [(0, '4.317')]
[2025-01-07 12:04:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 764.6). Total num frames: 114688. Throughput: 0: 219.7. Samples: 31082. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:04:41,613][01669] Avg episode reward: [(0, '4.399')]
[2025-01-07 12:04:43,057][05745] Saving new best policy, reward=4.317!
[2025-01-07 12:04:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 766.3). Total num frames: 118784. Throughput: 0: 210.6. Samples: 32336. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
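Editor's note: each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" line reports frame throughput over three trailing time windows computed from (timestamp, total frames) samples. A self-contained sketch of that windowed-rate bookkeeping (illustrative, not the library's code):

    import collections
    import time

    class WindowedFps:
        """Trailing-window FPS like the '10 sec / 60 sec / 300 sec' lines above."""
        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.history = collections.deque()  # (timestamp, total_frames) pairs

        def record(self, total_frames: int) -> None:
            now = time.time()
            self.history.append((now, total_frames))
            while now - self.history[0][0] > max(self.windows):
                self.history.popleft()  # drop samples older than the widest window

        def report(self) -> str:
            now, frames = self.history[-1]
            parts = []
            for w in self.windows:
                past = [(t, f) for t, f in self.history if now - t <= w]
                t0, f0 = past[0]  # oldest sample still inside this window
                rate = (frames - f0) / (now - t0) if now > t0 else 0.0
                parts.append(f"{w} sec: {rate:.1f}")
            return f"Fps is ({', '.join(parts)}). Total num frames: {frames}."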
[2025-01-07 12:04:46,615][01669] Avg episode reward: [(0, '4.382')]
[2025-01-07 12:04:47,544][05745] Saving new best policy, reward=4.399!
[2025-01-07 12:04:47,549][05758] Updated weights for policy 0, policy_version 30 (0.0508)
[2025-01-07 12:04:51,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 793.6). Total num frames: 126976. Throughput: 0: 222.6. Samples: 33186. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:04:51,613][01669] Avg episode reward: [(0, '4.421')]
[2025-01-07 12:04:55,953][05745] Saving new best policy, reward=4.421!
[2025-01-07 12:04:56,607][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 794.4). Total num frames: 131072. Throughput: 0: 226.4. Samples: 34352. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:04:56,615][01669] Avg episode reward: [(0, '4.477')]
[2025-01-07 12:05:01,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.3, 300 sec: 771.0). Total num frames: 131072. Throughput: 0: 209.3. Samples: 35370. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:05:01,611][01669] Avg episode reward: [(0, '4.503')]
[2025-01-07 12:05:01,990][05745] Saving new best policy, reward=4.477!
[2025-01-07 12:05:06,126][05745] Saving new best policy, reward=4.503!
[2025-01-07 12:05:06,613][01669] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 795.8). Total num frames: 139264. Throughput: 0: 217.8. Samples: 36404. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:05:06,615][01669] Avg episode reward: [(0, '4.461')]
[2025-01-07 12:05:11,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 796.4). Total num frames: 143360. Throughput: 0: 220.4. Samples: 37468. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:05:11,609][01669] Avg episode reward: [(0, '4.461')]
[2025-01-07 12:05:16,611][01669] Fps is (10 sec: 819.4, 60 sec: 887.4, 300 sec: 797.0). Total num frames: 147456. Throughput: 0: 204.0. Samples: 38404. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:05:16,618][01669] Avg episode reward: [(0, '4.445')]
[2025-01-07 12:05:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 797.6). Total num frames: 151552. Throughput: 0: 208.1. Samples: 39294. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:05:21,611][01669] Avg episode reward: [(0, '4.497')]
[2025-01-07 12:05:26,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 798.2). Total num frames: 155648. Throughput: 0: 216.4. Samples: 40820. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:05:26,612][01669] Avg episode reward: [(0, '4.513')]
[2025-01-07 12:05:28,583][05745] Saving new best policy, reward=4.513!
[2025-01-07 12:05:31,611][01669] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 798.7). Total num frames: 159744. Throughput: 0: 223.4. Samples: 42388. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 12:05:31,616][01669] Avg episode reward: [(0, '4.519')]
[2025-01-07 12:05:34,248][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000040_163840.pth...
[2025-01-07 12:05:34,257][05758] Updated weights for policy 0, policy_version 40 (0.1072)
[2025-01-07 12:05:34,373][05745] Saving new best policy, reward=4.519!
[2025-01-07 12:05:36,610][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 799.2). Total num frames: 163840. Throughput: 0: 208.3. Samples: 42558. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 12:05:36,613][01669] Avg episode reward: [(0, '4.574')]
[2025-01-07 12:05:39,427][05745] Saving new best policy, reward=4.574!
[2025-01-07 12:05:41,607][01669] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 799.7). Total num frames: 167936. Throughput: 0: 213.6. Samples: 43964. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:05:41,612][01669] Avg episode reward: [(0, '4.637')]
[2025-01-07 12:05:43,444][05745] Saving new best policy, reward=4.637!
[2025-01-07 12:05:46,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 800.1). Total num frames: 172032. Throughput: 0: 227.1. Samples: 45590. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:05:46,612][01669] Avg episode reward: [(0, '4.637')]
[2025-01-07 12:05:51,607][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 800.6). Total num frames: 176128. Throughput: 0: 215.0. Samples: 46078. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:05:51,612][01669] Avg episode reward: [(0, '4.719')]
[2025-01-07 12:05:56,609][01669] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 801.0). Total num frames: 180224. Throughput: 0: 214.8. Samples: 47134. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:05:56,615][01669] Avg episode reward: [(0, '4.762')]
[2025-01-07 12:05:57,989][05745] Saving new best policy, reward=4.719!
[2025-01-07 12:06:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 801.4). Total num frames: 184320. Throughput: 0: 227.4. Samples: 48636. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:06:01,614][01669] Avg episode reward: [(0, '4.827')]
[2025-01-07 12:06:01,745][05745] Saving new best policy, reward=4.762!
[2025-01-07 12:06:05,846][05745] Saving new best policy, reward=4.827!
[2025-01-07 12:06:06,606][01669] Fps is (10 sec: 1229.1, 60 sec: 887.6, 300 sec: 819.2). Total num frames: 192512. Throughput: 0: 230.3. Samples: 49656. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 12:06:06,611][01669] Avg episode reward: [(0, '4.795')]
[2025-01-07 12:06:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 802.1). Total num frames: 192512. Throughput: 0: 219.5. Samples: 50696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 12:06:11,609][01669] Avg episode reward: [(0, '4.745')]
[2025-01-07 12:06:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 200704. Throughput: 0: 212.1. Samples: 51932. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 12:06:16,609][01669] Avg episode reward: [(0, '4.627')]
[2025-01-07 12:06:20,222][05758] Updated weights for policy 0, policy_version 50 (0.0461)
[2025-01-07 12:06:21,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 204800. Throughput: 0: 226.8. Samples: 52762. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 12:06:21,614][01669] Avg episode reward: [(0, '4.484')]
[2025-01-07 12:06:22,406][05745] Signal inference workers to stop experience collection... (50 times)
[2025-01-07 12:06:22,454][05758] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2025-01-07 12:06:23,891][05745] Signal inference workers to resume experience collection... (50 times)
[2025-01-07 12:06:23,892][05758] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2025-01-07 12:06:26,607][01669] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 208896. Throughput: 0: 225.9. Samples: 54130. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 12:06:26,615][01669] Avg episode reward: [(0, '4.432')]
[2025-01-07 12:06:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 212992. Throughput: 0: 216.4. Samples: 55326. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 12:06:31,612][01669] Avg episode reward: [(0, '4.334')]
[2025-01-07 12:06:36,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 217088. Throughput: 0: 221.8. Samples: 56060. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 12:06:36,616][01669] Avg episode reward: [(0, '4.330')]
[2025-01-07 12:06:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 221184. Throughput: 0: 237.6. Samples: 57824. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 12:06:41,609][01669] Avg episode reward: [(0, '4.301')]
[2025-01-07 12:06:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 225280. Throughput: 0: 227.0. Samples: 58852. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:06:46,609][01669] Avg episode reward: [(0, '4.199')]
[2025-01-07 12:06:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 229376. Throughput: 0: 218.1. Samples: 59472. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:06:51,614][01669] Avg episode reward: [(0, '4.291')]
[2025-01-07 12:06:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 833.6). Total num frames: 237568. Throughput: 0: 226.8. Samples: 60904. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:06:56,613][01669] Avg episode reward: [(0, '4.328')]
[2025-01-07 12:07:01,608][01669] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 833.3). Total num frames: 241664. Throughput: 0: 227.2. Samples: 62158. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:07:01,611][01669] Avg episode reward: [(0, '4.403')]
[2025-01-07 12:07:06,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 245760. Throughput: 0: 224.4. Samples: 62860. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:07:06,610][01669] Avg episode reward: [(0, '4.419')]
[2025-01-07 12:07:06,986][05758] Updated weights for policy 0, policy_version 60 (0.0493)
[2025-01-07 12:07:11,606][01669] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 847.0). Total num frames: 249856. Throughput: 0: 219.5. Samples: 64008. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:07:11,609][01669] Avg episode reward: [(0, '4.464')]
[2025-01-07 12:07:16,613][01669] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 253952. Throughput: 0: 233.9. Samples: 65854. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:07:16,621][01669] Avg episode reward: [(0, '4.506')]
[2025-01-07 12:07:21,610][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 258048. Throughput: 0: 221.9. Samples: 66046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:07:21,612][01669] Avg episode reward: [(0, '4.540')]
[2025-01-07 12:07:26,606][01669] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 262144. Throughput: 0: 209.5. Samples: 67250. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:07:26,613][01669] Avg episode reward: [(0, '4.497')]
[2025-01-07 12:07:31,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 266240. Throughput: 0: 229.6. Samples: 69182. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:07:31,612][01669] Avg episode reward: [(0, '4.586')]
[2025-01-07 12:07:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 270336. Throughput: 0: 228.4. Samples: 69752. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:07:36,611][01669] Avg episode reward: [(0, '4.560')]
[2025-01-07 12:07:38,816][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth...
[2025-01-07 12:07:38,930][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000015_61440.pth
[2025-01-07 12:07:41,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 274432. Throughput: 0: 216.4. Samples: 70644. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:07:41,610][01669] Avg episode reward: [(0, '4.596')]
[2025-01-07 12:07:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 278528. Throughput: 0: 225.7. Samples: 72316. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:07:46,614][01669] Avg episode reward: [(0, '4.596')]
[2025-01-07 12:07:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 282624. Throughput: 0: 223.1. Samples: 72898. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:07:51,612][01669] Avg episode reward: [(0, '4.585')]
[2025-01-07 12:07:51,713][05758] Updated weights for policy 0, policy_version 70 (0.0956)
[2025-01-07 12:07:56,609][01669] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 286720. Throughput: 0: 227.9. Samples: 74266. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:07:56,612][01669] Avg episode reward: [(0, '4.608')]
[2025-01-07 12:08:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 290816. Throughput: 0: 211.4. Samples: 75366. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:08:01,609][01669] Avg episode reward: [(0, '4.700')]
[2025-01-07 12:08:06,606][01669] Fps is (10 sec: 1229.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 299008. Throughput: 0: 229.4. Samples: 76368. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:08:06,612][01669] Avg episode reward: [(0, '4.615')]
[2025-01-07 12:08:11,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 303104. Throughput: 0: 226.6. Samples: 77448. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:08:11,613][01669] Avg episode reward: [(0, '4.628')]
[2025-01-07 12:08:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 888.6). Total num frames: 307200. Throughput: 0: 206.6. Samples: 78480. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:08:16,612][01669] Avg episode reward: [(0, '4.609')]
[2025-01-07 12:08:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 311296. Throughput: 0: 216.0. Samples: 79470. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:08:21,609][01669] Avg episode reward: [(0, '4.514')]
[2025-01-07 12:08:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 315392. Throughput: 0: 228.9. Samples: 80944. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:08:26,610][01669] Avg episode reward: [(0, '4.557')]
[2025-01-07 12:08:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 319488. Throughput: 0: 216.9. Samples: 82078. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
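Editor's note: the Saving/Removing pair above is the keep_checkpoints=2 rotation: after each periodic save (save_every_sec=120), the oldest checkpoint beyond the two newest is deleted. A minimal sketch of that policy, assuming only that the zero-padded names (checkpoint_000000067_274432.pth) sort lexicographically by version:

    from pathlib import Path

    def rotate_checkpoints(ckpt_dir: Path, keep: int = 2) -> None:
        """Keep only the newest `keep` checkpoints, as keep_checkpoints=2 does."""
        ckpts = sorted(ckpt_dir.glob("checkpoint_*.pth"))  # zero-padded names sort by version
        for old in ckpts[:-keep]:
            print(f"Removing {old}")
            old.unlink()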
[2025-01-07 12:08:31,609][01669] Avg episode reward: [(0, '4.545')]
[2025-01-07 12:08:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 323584. Throughput: 0: 213.8. Samples: 82518. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:08:36,612][01669] Avg episode reward: [(0, '4.491')]
[2025-01-07 12:08:38,706][05758] Updated weights for policy 0, policy_version 80 (0.0593)
[2025-01-07 12:08:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 327680. Throughput: 0: 222.9. Samples: 84294. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:08:41,613][01669] Avg episode reward: [(0, '4.622')]
[2025-01-07 12:08:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 331776. Throughput: 0: 226.5. Samples: 85560. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:08:46,608][01669] Avg episode reward: [(0, '4.610')]
[2025-01-07 12:08:51,608][01669] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 335872. Throughput: 0: 212.1. Samples: 85914. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:08:51,613][01669] Avg episode reward: [(0, '4.686')]
[2025-01-07 12:08:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 339968. Throughput: 0: 227.0. Samples: 87664. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:08:56,613][01669] Avg episode reward: [(0, '4.695')]
[2025-01-07 12:09:01,607][01669] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 348160. Throughput: 0: 231.6. Samples: 88902. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:09:01,609][01669] Avg episode reward: [(0, '4.750')]
[2025-01-07 12:09:06,608][01669] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 352256. Throughput: 0: 226.7. Samples: 89672. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:09:06,614][01669] Avg episode reward: [(0, '4.678')]
[2025-01-07 12:09:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 356352. Throughput: 0: 216.7. Samples: 90696. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:09:11,612][01669] Avg episode reward: [(0, '4.714')]
[2025-01-07 12:09:16,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 360448. Throughput: 0: 225.5. Samples: 92224. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:09:16,609][01669] Avg episode reward: [(0, '4.699')]
[2025-01-07 12:09:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 364544. Throughput: 0: 230.0. Samples: 92868. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:09:21,612][01669] Avg episode reward: [(0, '4.764')]
[2025-01-07 12:09:24,888][05758] Updated weights for policy 0, policy_version 90 (0.0974)
[2025-01-07 12:09:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 368640. Throughput: 0: 212.7. Samples: 93864. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:09:26,615][01669] Avg episode reward: [(0, '4.718')]
[2025-01-07 12:09:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 372736. Throughput: 0: 221.9. Samples: 95546. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:09:31,609][01669] Avg episode reward: [(0, '4.750')]
[2025-01-07 12:09:33,515][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000092_376832.pth...
[2025-01-07 12:09:33,624][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000040_163840.pth
[2025-01-07 12:09:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 376832. Throughput: 0: 227.5. Samples: 96150. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:09:36,615][01669] Avg episode reward: [(0, '4.748')]
[2025-01-07 12:09:41,610][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 380928. Throughput: 0: 216.2. Samples: 97394. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:09:41,618][01669] Avg episode reward: [(0, '4.784')]
[2025-01-07 12:09:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 385024. Throughput: 0: 217.9. Samples: 98708. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:09:46,609][01669] Avg episode reward: [(0, '4.882')]
[2025-01-07 12:09:51,596][05745] Saving new best policy, reward=4.882!
[2025-01-07 12:09:51,606][01669] Fps is (10 sec: 1229.3, 60 sec: 955.8, 300 sec: 888.6). Total num frames: 393216. Throughput: 0: 217.3. Samples: 99452. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:09:51,613][01669] Avg episode reward: [(0, '4.904')]
[2025-01-07 12:09:55,653][05745] Saving new best policy, reward=4.904!
[2025-01-07 12:09:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 397312. Throughput: 0: 224.8. Samples: 100814. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:09:56,613][01669] Avg episode reward: [(0, '4.933')]
[2025-01-07 12:10:01,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.8). Total num frames: 397312. Throughput: 0: 213.5. Samples: 101830. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:10:01,609][01669] Avg episode reward: [(0, '4.857')]
[2025-01-07 12:10:01,794][05745] Saving new best policy, reward=4.933!
[2025-01-07 12:10:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 405504. Throughput: 0: 221.7. Samples: 102844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:10:06,609][01669] Avg episode reward: [(0, '4.854')]
[2025-01-07 12:10:09,773][05758] Updated weights for policy 0, policy_version 100 (0.0060)
[2025-01-07 12:10:11,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 409600. Throughput: 0: 230.5. Samples: 104238. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:10:11,609][01669] Avg episode reward: [(0, '4.824')]
[2025-01-07 12:10:11,985][05745] Signal inference workers to stop experience collection... (100 times)
[2025-01-07 12:10:12,029][05758] InferenceWorker_p0-w0: stopping experience collection (100 times)
[2025-01-07 12:10:13,969][05745] Signal inference workers to resume experience collection... (100 times)
[2025-01-07 12:10:13,971][05758] InferenceWorker_p0-w0: resuming experience collection (100 times)
[2025-01-07 12:10:16,609][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 413696. Throughput: 0: 221.4. Samples: 105510. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
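Editor's note: the "Saving new best policy, reward=..." lines follow the save_best_metric=reward / save_best_after=100000 settings: the best-policy checkpoint only starts updating once roughly 100k frames have been collected (the first such save above lands near frame 102400), and from then on any improvement in the averaged reward triggers a save. A hedged sketch of that bookkeeping, not the library's code:

    class BestPolicySaver:
        """Mirrors the 'Saving new best policy, reward=...' lines (sketch)."""
        def __init__(self, save_best_after: int = 100_000):
            self.save_best_after = save_best_after  # matches save_best_after=100000
            self.best = float("-inf")

        def maybe_save(self, avg_reward: float, env_frames: int) -> None:
            if env_frames >= self.save_best_after and avg_reward > self.best:
                self.best = avg_reward
                print(f"Saving new best policy, reward={avg_reward:.3f}!")
                # torch.save(model.state_dict(), best_path)  # actual save elided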
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:10:16,614][01669] Avg episode reward: [(0, '4.761')] [2025-01-07 12:10:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 417792. Throughput: 0: 218.0. Samples: 105962. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:10:21,609][01669] Avg episode reward: [(0, '4.702')] [2025-01-07 12:10:26,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 421888. Throughput: 0: 227.9. Samples: 107650. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:10:26,610][01669] Avg episode reward: [(0, '4.641')] [2025-01-07 12:10:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 425984. Throughput: 0: 227.7. Samples: 108956. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:10:31,609][01669] Avg episode reward: [(0, '4.574')] [2025-01-07 12:10:36,611][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 430080. Throughput: 0: 222.0. Samples: 109444. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:10:36,613][01669] Avg episode reward: [(0, '4.611')] [2025-01-07 12:10:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 434176. Throughput: 0: 225.1. Samples: 110944. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:10:41,609][01669] Avg episode reward: [(0, '4.606')] [2025-01-07 12:10:46,606][01669] Fps is (10 sec: 1229.3, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 442368. Throughput: 0: 228.8. Samples: 112128. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:10:46,611][01669] Avg episode reward: [(0, '4.670')] [2025-01-07 12:10:51,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 446464. Throughput: 0: 228.4. Samples: 113122. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:10:51,615][01669] Avg episode reward: [(0, '4.664')] [2025-01-07 12:10:56,347][05758] Updated weights for policy 0, policy_version 110 (0.0942) [2025-01-07 12:10:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 450560. Throughput: 0: 220.9. Samples: 114178. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:10:56,612][01669] Avg episode reward: [(0, '4.681')] [2025-01-07 12:11:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 454656. Throughput: 0: 228.1. Samples: 115772. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:11:01,613][01669] Avg episode reward: [(0, '4.805')] [2025-01-07 12:11:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 458752. Throughput: 0: 234.5. Samples: 116514. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:11:06,613][01669] Avg episode reward: [(0, '4.889')] [2025-01-07 12:11:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 462848. Throughput: 0: 220.7. Samples: 117580. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:11:11,612][01669] Avg episode reward: [(0, '4.850')] [2025-01-07 12:11:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 466944. Throughput: 0: 226.3. Samples: 119140. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:11:16,611][01669] Avg episode reward: [(0, '4.861')] [2025-01-07 12:11:21,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 471040. Throughput: 0: 230.5. 
Samples: 119814. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:11:21,613][01669] Avg episode reward: [(0, '4.825')] [2025-01-07 12:11:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 475136. Throughput: 0: 229.0. Samples: 121248. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:11:26,609][01669] Avg episode reward: [(0, '4.754')] [2025-01-07 12:11:31,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 479232. Throughput: 0: 225.0. Samples: 122254. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:11:31,609][01669] Avg episode reward: [(0, '4.739')] [2025-01-07 12:11:36,166][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000119_487424.pth... [2025-01-07 12:11:36,278][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth [2025-01-07 12:11:36,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 902.5). Total num frames: 487424. Throughput: 0: 223.2. Samples: 123168. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:11:36,612][01669] Avg episode reward: [(0, '4.651')] [2025-01-07 12:11:40,078][05758] Updated weights for policy 0, policy_version 120 (0.0049) [2025-01-07 12:11:41,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 491520. Throughput: 0: 228.0. Samples: 124436. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:11:41,609][01669] Avg episode reward: [(0, '4.683')] [2025-01-07 12:11:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 495616. Throughput: 0: 214.7. Samples: 125432. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:11:46,613][01669] Avg episode reward: [(0, '4.620')] [2025-01-07 12:11:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 499712. Throughput: 0: 218.4. Samples: 126340. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:11:51,608][01669] Avg episode reward: [(0, '4.583')] [2025-01-07 12:11:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 503808. Throughput: 0: 224.9. Samples: 127700. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:11:56,611][01669] Avg episode reward: [(0, '4.449')] [2025-01-07 12:12:01,609][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 507904. Throughput: 0: 219.3. Samples: 129010. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:12:01,612][01669] Avg episode reward: [(0, '4.452')] [2025-01-07 12:12:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 512000. Throughput: 0: 212.4. Samples: 129372. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:12:06,609][01669] Avg episode reward: [(0, '4.499')] [2025-01-07 12:12:11,607][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 516096. Throughput: 0: 216.6. Samples: 130994. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:12:11,614][01669] Avg episode reward: [(0, '4.584')] [2025-01-07 12:12:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 520192. Throughput: 0: 226.9. Samples: 132466. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:12:16,611][01669] Avg episode reward: [(0, '4.544')] [2025-01-07 12:12:21,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 524288. Throughput: 0: 218.5. 
Samples: 133002. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:12:21,613][01669] Avg episode reward: [(0, '4.674')] [2025-01-07 12:12:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 528384. Throughput: 0: 221.2. Samples: 134388. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:12:26,615][01669] Avg episode reward: [(0, '4.572')] [2025-01-07 12:12:27,905][05758] Updated weights for policy 0, policy_version 130 (0.0930) [2025-01-07 12:12:31,606][01669] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 536576. Throughput: 0: 226.4. Samples: 135618. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:12:31,609][01669] Avg episode reward: [(0, '4.563')] [2025-01-07 12:12:36,607][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 540672. Throughput: 0: 228.5. Samples: 136624. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:12:36,614][01669] Avg episode reward: [(0, '4.564')] [2025-01-07 12:12:41,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 540672. Throughput: 0: 219.0. Samples: 137554. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:12:41,609][01669] Avg episode reward: [(0, '4.629')] [2025-01-07 12:12:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 548864. Throughput: 0: 217.8. Samples: 138812. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:12:46,612][01669] Avg episode reward: [(0, '4.663')] [2025-01-07 12:12:51,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 552960. Throughput: 0: 230.0. Samples: 139720. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:12:51,609][01669] Avg episode reward: [(0, '4.568')] [2025-01-07 12:12:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 557056. Throughput: 0: 214.6. Samples: 140650. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:12:56,609][01669] Avg episode reward: [(0, '4.612')] [2025-01-07 12:13:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 561152. Throughput: 0: 213.6. Samples: 142076. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:13:01,609][01669] Avg episode reward: [(0, '4.584')] [2025-01-07 12:13:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 565248. Throughput: 0: 216.0. Samples: 142720. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:13:06,609][01669] Avg episode reward: [(0, '4.607')] [2025-01-07 12:13:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 569344. Throughput: 0: 215.9. Samples: 144104. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:13:11,617][01669] Avg episode reward: [(0, '4.663')] [2025-01-07 12:13:14,628][05758] Updated weights for policy 0, policy_version 140 (0.0537) [2025-01-07 12:13:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 573440. Throughput: 0: 214.6. Samples: 145276. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:13:16,610][01669] Avg episode reward: [(0, '4.622')] [2025-01-07 12:13:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 577536. Throughput: 0: 206.4. Samples: 145912. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:13:21,609][01669] Avg episode reward: [(0, '4.662')] [2025-01-07 12:13:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 581632. Throughput: 0: 228.7. Samples: 147846. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:13:26,614][01669] Avg episode reward: [(0, '4.625')] [2025-01-07 12:13:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 585728. Throughput: 0: 223.5. Samples: 148868. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:13:31,612][01669] Avg episode reward: [(0, '4.658')] [2025-01-07 12:13:33,570][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000144_589824.pth... [2025-01-07 12:13:33,732][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000092_376832.pth [2025-01-07 12:13:36,607][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 589824. Throughput: 0: 212.5. Samples: 149282. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:13:36,617][01669] Avg episode reward: [(0, '4.656')] [2025-01-07 12:13:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 593920. Throughput: 0: 231.0. Samples: 151044. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:13:41,614][01669] Avg episode reward: [(0, '4.624')] [2025-01-07 12:13:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 598016. Throughput: 0: 222.8. Samples: 152100. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:13:46,609][01669] Avg episode reward: [(0, '4.632')] [2025-01-07 12:13:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 602112. Throughput: 0: 224.1. Samples: 152806. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:13:51,613][01669] Avg episode reward: [(0, '4.679')] [2025-01-07 12:13:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 610304. Throughput: 0: 223.7. Samples: 154170. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:13:56,614][01669] Avg episode reward: [(0, '4.576')] [2025-01-07 12:14:00,549][05758] Updated weights for policy 0, policy_version 150 (0.0944) [2025-01-07 12:14:01,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 614400. Throughput: 0: 228.4. Samples: 155556. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:14:01,609][01669] Avg episode reward: [(0, '4.755')] [2025-01-07 12:14:03,010][05745] Signal inference workers to stop experience collection... (150 times) [2025-01-07 12:14:03,090][05758] InferenceWorker_p0-w0: stopping experience collection (150 times) [2025-01-07 12:14:05,617][05745] Signal inference workers to resume experience collection... (150 times) [2025-01-07 12:14:05,618][05758] InferenceWorker_p0-w0: resuming experience collection (150 times) [2025-01-07 12:14:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 618496. Throughput: 0: 227.5. Samples: 156150. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:14:06,612][01669] Avg episode reward: [(0, '4.732')] [2025-01-07 12:14:11,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 622592. Throughput: 0: 206.5. Samples: 157138. 
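
The stop/resume pairs above ("Signal inference workers to stop experience collection... (150 times)", resumed about two seconds later) are backpressure from the learner: collection pauses when experience has accumulated faster than training consumes it, which keeps the policy lag bounded (num_batches_to_accumulate=2 and max_policy_lag=1000 in the config). A rough sketch of such a throttle; the real mechanism in Sample Factory is more involved, and all names here are hypothetical:

    class CollectionThrottle:
        def __init__(self, max_pending_batches: int = 2):
            self.max_pending = max_pending_batches
            self.pending = 0       # batches collected but not yet trained on
            self.collecting = True
            self.stop_count = 0    # the "(N times)" counter in the log

        def on_batch_collected(self):
            self.pending += 1
            if self.collecting and self.pending >= self.max_pending:
                self.collecting = False
                self.stop_count += 1  # "Signal inference workers to stop..."

        def on_batch_trained(self):
            self.pending -= 1
            if not self.collecting and self.pending < self.max_pending:
                self.collecting = True  # "Signal inference workers to resume..."
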
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:14:11,619][01669] Avg episode reward: [(0, '4.745')] [2025-01-07 12:14:16,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 626688. Throughput: 0: 219.6. Samples: 158748. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:14:16,609][01669] Avg episode reward: [(0, '4.797')] [2025-01-07 12:14:21,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 630784. Throughput: 0: 223.9. Samples: 159358. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:14:21,609][01669] Avg episode reward: [(0, '4.813')] [2025-01-07 12:14:26,618][01669] Fps is (10 sec: 818.3, 60 sec: 887.3, 300 sec: 888.6). Total num frames: 634880. Throughput: 0: 207.1. Samples: 160366. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:14:26,623][01669] Avg episode reward: [(0, '4.907')] [2025-01-07 12:14:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 638976. Throughput: 0: 220.4. Samples: 162016. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:14:31,608][01669] Avg episode reward: [(0, '4.897')] [2025-01-07 12:14:36,606][01669] Fps is (10 sec: 820.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 643072. Throughput: 0: 218.6. Samples: 162642. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:14:36,613][01669] Avg episode reward: [(0, '4.862')] [2025-01-07 12:14:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 647168. Throughput: 0: 217.1. Samples: 163940. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:14:41,609][01669] Avg episode reward: [(0, '4.831')] [2025-01-07 12:14:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 651264. Throughput: 0: 213.4. Samples: 165158. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:14:46,609][01669] Avg episode reward: [(0, '4.843')] [2025-01-07 12:14:48,163][05758] Updated weights for policy 0, policy_version 160 (0.0501) [2025-01-07 12:14:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 655360. Throughput: 0: 215.3. Samples: 165840. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:14:51,616][01669] Avg episode reward: [(0, '4.791')] [2025-01-07 12:14:56,607][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 663552. Throughput: 0: 225.1. Samples: 167266. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:14:56,613][01669] Avg episode reward: [(0, '4.862')] [2025-01-07 12:15:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 663552. Throughput: 0: 212.8. Samples: 168322. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:15:01,609][01669] Avg episode reward: [(0, '4.772')] [2025-01-07 12:15:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 671744. Throughput: 0: 217.1. Samples: 169128. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:15:06,612][01669] Avg episode reward: [(0, '4.832')] [2025-01-07 12:15:11,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 675840. Throughput: 0: 222.0. Samples: 170354. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:15:11,609][01669] Avg episode reward: [(0, '4.888')] [2025-01-07 12:15:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 679936. Throughput: 0: 215.2. 
Samples: 171702. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:15:16,609][01669] Avg episode reward: [(0, '4.947')] [2025-01-07 12:15:21,160][05745] Saving new best policy, reward=4.947! [2025-01-07 12:15:21,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 684032. Throughput: 0: 217.7. Samples: 172440. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:15:21,613][01669] Avg episode reward: [(0, '4.868')] [2025-01-07 12:15:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 888.6). Total num frames: 688128. Throughput: 0: 215.9. Samples: 173656. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:15:26,608][01669] Avg episode reward: [(0, '5.039')] [2025-01-07 12:15:28,993][05745] Saving new best policy, reward=5.039! [2025-01-07 12:15:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 692224. Throughput: 0: 228.6. Samples: 175446. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:15:31,614][01669] Avg episode reward: [(0, '5.071')] [2025-01-07 12:15:34,238][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000170_696320.pth... [2025-01-07 12:15:34,249][05758] Updated weights for policy 0, policy_version 170 (0.0974) [2025-01-07 12:15:34,344][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000119_487424.pth [2025-01-07 12:15:34,373][05745] Saving new best policy, reward=5.071! [2025-01-07 12:15:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 696320. Throughput: 0: 216.4. Samples: 175580. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:15:36,609][01669] Avg episode reward: [(0, '5.116')] [2025-01-07 12:15:39,721][05745] Saving new best policy, reward=5.116! [2025-01-07 12:15:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 700416. Throughput: 0: 211.0. Samples: 176760. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:15:41,609][01669] Avg episode reward: [(0, '5.106')] [2025-01-07 12:15:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 704512. Throughput: 0: 227.6. Samples: 178566. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:15:46,614][01669] Avg episode reward: [(0, '5.158')] [2025-01-07 12:15:51,610][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 708608. Throughput: 0: 222.3. Samples: 179130. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:15:51,618][01669] Avg episode reward: [(0, '5.188')] [2025-01-07 12:15:53,432][05745] Saving new best policy, reward=5.158! [2025-01-07 12:15:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 712704. Throughput: 0: 218.1. Samples: 180170. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:15:56,610][01669] Avg episode reward: [(0, '5.247')] [2025-01-07 12:15:58,196][05745] Saving new best policy, reward=5.188! [2025-01-07 12:15:58,345][05745] Saving new best policy, reward=5.247! [2025-01-07 12:16:01,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 716800. Throughput: 0: 222.2. Samples: 181702. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:16:01,612][01669] Avg episode reward: [(0, '5.070')] [2025-01-07 12:16:06,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 724992. Throughput: 0: 223.8. Samples: 182510. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:16:06,609][01669] Avg episode reward: [(0, '5.062')] [2025-01-07 12:16:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 724992. Throughput: 0: 221.0. Samples: 183600. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:16:11,610][01669] Avg episode reward: [(0, '5.000')] [2025-01-07 12:16:16,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 729088. Throughput: 0: 208.8. Samples: 184840. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:16:16,614][01669] Avg episode reward: [(0, '5.065')] [2025-01-07 12:16:21,150][05758] Updated weights for policy 0, policy_version 180 (0.1538) [2025-01-07 12:16:21,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 737280. Throughput: 0: 229.4. Samples: 185902. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:16:21,613][01669] Avg episode reward: [(0, '5.081')] [2025-01-07 12:16:26,609][01669] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 741376. Throughput: 0: 229.1. Samples: 187072. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:16:26,612][01669] Avg episode reward: [(0, '5.094')] [2025-01-07 12:16:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 745472. Throughput: 0: 211.5. Samples: 188084. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:16:31,618][01669] Avg episode reward: [(0, '5.117')] [2025-01-07 12:16:36,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 749568. Throughput: 0: 218.4. Samples: 188956. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:16:36,609][01669] Avg episode reward: [(0, '5.165')] [2025-01-07 12:16:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 753664. Throughput: 0: 231.8. Samples: 190602. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:16:41,613][01669] Avg episode reward: [(0, '5.165')] [2025-01-07 12:16:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 757760. Throughput: 0: 221.6. Samples: 191674. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:16:46,609][01669] Avg episode reward: [(0, '5.089')] [2025-01-07 12:16:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 761856. Throughput: 0: 213.6. Samples: 192120. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:16:51,608][01669] Avg episode reward: [(0, '5.243')] [2025-01-07 12:16:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 765952. Throughput: 0: 227.8. Samples: 193852. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:16:56,614][01669] Avg episode reward: [(0, '5.323')] [2025-01-07 12:17:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 770048. Throughput: 0: 230.2. Samples: 195198. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:17:01,614][01669] Avg episode reward: [(0, '5.432')] [2025-01-07 12:17:03,165][05745] Saving new best policy, reward=5.323! [2025-01-07 12:17:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 774144. Throughput: 0: 213.6. Samples: 195516. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:17:06,615][01669] Avg episode reward: [(0, '5.364')] [2025-01-07 12:17:08,539][05745] Saving new best policy, reward=5.432! 
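
The cascade of "Saving new best policy, reward=..." lines follows the save_best_* options from the config: once past save_best_after=100000 env frames, the averaged episode reward (save_best_metric=reward) is compared against the best seen so far, checked at most every save_best_every_sec=5 seconds, and a separate best-policy checkpoint is rewritten whenever it improves, hence the strictly increasing values (4.882, 4.904, ... 5.432 so far). A minimal sketch under those assumptions, with hypothetical names:

    class BestPolicySaver:
        def __init__(self, save_after_frames: int = 100_000):
            self.save_after = save_after_frames
            self.best = float("-inf")

        def maybe_save(self, env_frames: int, avg_reward: float, save_fn) -> bool:
            """save_fn is a stand-in for writing a best_*.pth checkpoint."""
            if env_frames < self.save_after or avg_reward <= self.best:
                return False
            self.best = avg_reward  # -> "Saving new best policy, reward=...!"
            save_fn()
            return True
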
[2025-01-07 12:17:08,544][05758] Updated weights for policy 0, policy_version 190 (0.0498) [2025-01-07 12:17:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 778240. Throughput: 0: 222.6. Samples: 197088. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:17:11,613][01669] Avg episode reward: [(0, '5.534')] [2025-01-07 12:17:16,436][05745] Saving new best policy, reward=5.534! [2025-01-07 12:17:16,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 786432. Throughput: 0: 226.3. Samples: 198266. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:17:16,609][01669] Avg episode reward: [(0, '5.432')] [2025-01-07 12:17:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 786432. Throughput: 0: 224.3. Samples: 199048. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:17:21,608][01669] Avg episode reward: [(0, '5.484')] [2025-01-07 12:17:26,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 790528. Throughput: 0: 212.8. Samples: 200178. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:17:26,616][01669] Avg episode reward: [(0, '5.493')] [2025-01-07 12:17:31,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 798720. Throughput: 0: 218.9. Samples: 201524. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:17:31,611][01669] Avg episode reward: [(0, '5.447')] [2025-01-07 12:17:35,140][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000196_802816.pth... [2025-01-07 12:17:35,346][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000144_589824.pth [2025-01-07 12:17:36,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 802816. Throughput: 0: 226.0. Samples: 202288. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:17:36,613][01669] Avg episode reward: [(0, '5.503')] [2025-01-07 12:17:41,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 806912. Throughput: 0: 211.9. Samples: 203388. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:17:41,613][01669] Avg episode reward: [(0, '5.435')] [2025-01-07 12:17:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 811008. Throughput: 0: 211.3. Samples: 204706. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:17:46,609][01669] Avg episode reward: [(0, '5.445')] [2025-01-07 12:17:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 815104. Throughput: 0: 220.6. Samples: 205444. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:17:51,614][01669] Avg episode reward: [(0, '5.461')] [2025-01-07 12:17:54,351][05758] Updated weights for policy 0, policy_version 200 (0.0502) [2025-01-07 12:17:56,607][01669] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 819200. Throughput: 0: 214.7. Samples: 206748. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:17:56,612][01669] Avg episode reward: [(0, '5.477')] [2025-01-07 12:17:58,234][05745] Signal inference workers to stop experience collection... (200 times) [2025-01-07 12:17:58,297][05758] InferenceWorker_p0-w0: stopping experience collection (200 times) [2025-01-07 12:18:00,225][05745] Signal inference workers to resume experience collection... 
(200 times) [2025-01-07 12:18:00,226][05758] InferenceWorker_p0-w0: resuming experience collection (200 times) [2025-01-07 12:18:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 823296. Throughput: 0: 216.0. Samples: 207988. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:18:01,612][01669] Avg episode reward: [(0, '5.431')] [2025-01-07 12:18:06,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 827392. Throughput: 0: 209.3. Samples: 208468. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:18:06,608][01669] Avg episode reward: [(0, '5.565')] [2025-01-07 12:18:08,109][05745] Saving new best policy, reward=5.565! [2025-01-07 12:18:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 831488. Throughput: 0: 229.4. Samples: 210500. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:18:11,610][01669] Avg episode reward: [(0, '5.595')] [2025-01-07 12:18:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 835584. Throughput: 0: 215.6. Samples: 211228. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:18:16,609][01669] Avg episode reward: [(0, '5.690')] [2025-01-07 12:18:18,536][05745] Saving new best policy, reward=5.595! [2025-01-07 12:18:18,669][05745] Saving new best policy, reward=5.690! [2025-01-07 12:18:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 839680. Throughput: 0: 214.8. Samples: 211954. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:18:21,609][01669] Avg episode reward: [(0, '5.906')] [2025-01-07 12:18:26,320][05745] Saving new best policy, reward=5.906! [2025-01-07 12:18:26,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 847872. Throughput: 0: 227.6. Samples: 213630. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:18:26,610][01669] Avg episode reward: [(0, '5.911')] [2025-01-07 12:18:31,328][05745] Saving new best policy, reward=5.911! [2025-01-07 12:18:31,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 851968. Throughput: 0: 221.2. Samples: 214662. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:18:31,610][01669] Avg episode reward: [(0, '5.957')] [2025-01-07 12:18:36,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 851968. Throughput: 0: 220.3. Samples: 215358. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:18:36,610][01669] Avg episode reward: [(0, '5.889')] [2025-01-07 12:18:37,015][05745] Saving new best policy, reward=5.957! [2025-01-07 12:18:40,972][05758] Updated weights for policy 0, policy_version 210 (0.0998) [2025-01-07 12:18:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 860160. Throughput: 0: 221.2. Samples: 216700. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:18:41,609][01669] Avg episode reward: [(0, '5.860')] [2025-01-07 12:18:46,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 864256. Throughput: 0: 228.7. Samples: 218278. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:18:46,612][01669] Avg episode reward: [(0, '5.846')] [2025-01-07 12:18:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 868352. Throughput: 0: 227.0. Samples: 218684. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:18:51,612][01669] Avg episode reward: [(0, '5.906')] [2025-01-07 12:18:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 872448. Throughput: 0: 208.7. Samples: 219890. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:18:56,609][01669] Avg episode reward: [(0, '5.791')] [2025-01-07 12:19:01,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 876544. Throughput: 0: 234.1. Samples: 221762. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:19:01,615][01669] Avg episode reward: [(0, '5.706')] [2025-01-07 12:19:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 880640. Throughput: 0: 230.4. Samples: 222320. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:19:06,609][01669] Avg episode reward: [(0, '5.569')] [2025-01-07 12:19:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 884736. Throughput: 0: 213.9. Samples: 223256. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:19:11,609][01669] Avg episode reward: [(0, '5.624')] [2025-01-07 12:19:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 888832. Throughput: 0: 226.4. Samples: 224850. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:19:16,615][01669] Avg episode reward: [(0, '5.676')] [2025-01-07 12:19:21,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.7). Total num frames: 897024. Throughput: 0: 230.2. Samples: 225716. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:19:21,609][01669] Avg episode reward: [(0, '5.604')] [2025-01-07 12:19:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 897024. Throughput: 0: 222.5. Samples: 226714. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:19:26,609][01669] Avg episode reward: [(0, '5.663')] [2025-01-07 12:19:27,842][05758] Updated weights for policy 0, policy_version 220 (0.1150) [2025-01-07 12:19:31,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 901120. Throughput: 0: 214.0. Samples: 227908. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:19:31,609][01669] Avg episode reward: [(0, '5.630')] [2025-01-07 12:19:35,747][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000222_909312.pth... [2025-01-07 12:19:35,867][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000170_696320.pth [2025-01-07 12:19:36,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 909312. Throughput: 0: 228.4. Samples: 228964. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:19:36,609][01669] Avg episode reward: [(0, '5.524')] [2025-01-07 12:19:41,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 913408. Throughput: 0: 226.4. Samples: 230078. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:19:41,609][01669] Avg episode reward: [(0, '5.644')] [2025-01-07 12:19:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 917504. Throughput: 0: 207.1. Samples: 231082. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:19:46,611][01669] Avg episode reward: [(0, '5.650')] [2025-01-07 12:19:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 921600. Throughput: 0: 215.9. Samples: 232036. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:19:51,609][01669] Avg episode reward: [(0, '5.689')] [2025-01-07 12:19:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 925696. Throughput: 0: 229.8. Samples: 233598. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:19:56,615][01669] Avg episode reward: [(0, '5.845')] [2025-01-07 12:20:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 929792. Throughput: 0: 218.1. Samples: 234664. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:20:01,612][01669] Avg episode reward: [(0, '5.663')] [2025-01-07 12:20:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 933888. Throughput: 0: 210.1. Samples: 235170. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:20:06,609][01669] Avg episode reward: [(0, '5.595')] [2025-01-07 12:20:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 937984. Throughput: 0: 226.7. Samples: 236916. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:20:11,614][01669] Avg episode reward: [(0, '5.601')] [2025-01-07 12:20:12,756][05758] Updated weights for policy 0, policy_version 230 (0.0051) [2025-01-07 12:20:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 942080. Throughput: 0: 229.1. Samples: 238218. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:20:16,609][01669] Avg episode reward: [(0, '5.827')] [2025-01-07 12:20:21,607][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 946176. Throughput: 0: 212.2. Samples: 238514. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 12:20:21,614][01669] Avg episode reward: [(0, '5.818')] [2025-01-07 12:20:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 950272. Throughput: 0: 222.4. Samples: 240086. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 12:20:26,609][01669] Avg episode reward: [(0, '5.982')] [2025-01-07 12:20:31,348][05745] Saving new best policy, reward=5.982! [2025-01-07 12:20:31,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 958464. Throughput: 0: 226.9. Samples: 241292. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:20:31,609][01669] Avg episode reward: [(0, '6.056')] [2025-01-07 12:20:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 958464. Throughput: 0: 222.8. Samples: 242060. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:20:36,617][01669] Avg episode reward: [(0, '6.035')] [2025-01-07 12:20:37,336][05745] Saving new best policy, reward=6.056! [2025-01-07 12:20:41,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 962560. Throughput: 0: 213.4. Samples: 243202. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:20:41,613][01669] Avg episode reward: [(0, '5.972')] [2025-01-07 12:20:46,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 970752. Throughput: 0: 217.1. Samples: 244432. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:20:46,611][01669] Avg episode reward: [(0, '6.040')] [2025-01-07 12:20:51,611][01669] Fps is (10 sec: 1228.3, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 974848. Throughput: 0: 225.5. Samples: 245318. 
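
"Policy #0 lag" summarizes, over the samples in a training batch, how many learner updates old the acting policy was when each sample was collected; min/avg/max of 1.0/1.6/3.0 as above means everything in the batch was one to three versions stale. A minimal sketch of the statistic, assuming lag = current_version minus the version that collected the sample (training would only be cut off past the config's max_policy_lag=1000):

    def lag_stats(current_version, sample_versions):
        """min/avg/max staleness of a batch, in policy versions."""
        lags = [current_version - v for v in sample_versions]
        return min(lags), sum(lags) / len(lags), max(lags)

    # e.g. a batch collected under versions 227-229 while the learner is at 230:
    print(lag_stats(230, [229, 228, 229, 227]))  # (1, 1.75, 3)
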
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:20:51,613][01669] Avg episode reward: [(0, '6.140')] [2025-01-07 12:20:56,459][05745] Saving new best policy, reward=6.140! [2025-01-07 12:20:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 978944. Throughput: 0: 208.4. Samples: 246292. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:20:56,611][01669] Avg episode reward: [(0, '6.184')] [2025-01-07 12:21:00,570][05745] Saving new best policy, reward=6.184! [2025-01-07 12:21:00,576][05758] Updated weights for policy 0, policy_version 240 (0.0509) [2025-01-07 12:21:01,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 983040. Throughput: 0: 211.2. Samples: 247724. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:21:01,614][01669] Avg episode reward: [(0, '6.414')] [2025-01-07 12:21:04,555][05745] Saving new best policy, reward=6.414! [2025-01-07 12:21:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 987136. Throughput: 0: 218.7. Samples: 248356. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:21:06,608][01669] Avg episode reward: [(0, '6.603')] [2025-01-07 12:21:09,482][05745] Saving new best policy, reward=6.603! [2025-01-07 12:21:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 991232. Throughput: 0: 210.4. Samples: 249556. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:21:11,612][01669] Avg episode reward: [(0, '6.622')] [2025-01-07 12:21:15,382][05745] Saving new best policy, reward=6.622! [2025-01-07 12:21:16,611][01669] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 995328. Throughput: 0: 212.2. Samples: 250844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:21:16,620][01669] Avg episode reward: [(0, '6.740')] [2025-01-07 12:21:19,427][05745] Saving new best policy, reward=6.740! [2025-01-07 12:21:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 999424. Throughput: 0: 208.9. Samples: 251460. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:21:21,613][01669] Avg episode reward: [(0, '6.606')] [2025-01-07 12:21:26,606][01669] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1003520. Throughput: 0: 221.9. Samples: 253186. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:21:26,612][01669] Avg episode reward: [(0, '6.562')] [2025-01-07 12:21:31,607][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1007616. Throughput: 0: 215.2. Samples: 254116. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:21:31,612][01669] Avg episode reward: [(0, '6.714')] [2025-01-07 12:21:34,178][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000247_1011712.pth... [2025-01-07 12:21:34,290][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000196_802816.pth [2025-01-07 12:21:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1011712. Throughput: 0: 208.2. Samples: 254684. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2025-01-07 12:21:36,613][01669] Avg episode reward: [(0, '6.699')] [2025-01-07 12:21:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1015808. Throughput: 0: 229.0. Samples: 256596. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2025-01-07 12:21:41,611][01669] Avg episode reward: [(0, '6.675')] [2025-01-07 12:21:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1019904. Throughput: 0: 221.1. Samples: 257672. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:21:46,609][01669] Avg episode reward: [(0, '6.571')] [2025-01-07 12:21:48,257][05758] Updated weights for policy 0, policy_version 250 (0.0052) [2025-01-07 12:21:51,440][05745] Signal inference workers to stop experience collection... (250 times) [2025-01-07 12:21:51,483][05758] InferenceWorker_p0-w0: stopping experience collection (250 times) [2025-01-07 12:21:51,607][01669] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 874.7). Total num frames: 1024000. Throughput: 0: 214.0. Samples: 257988. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:21:51,610][01669] Avg episode reward: [(0, '6.584')] [2025-01-07 12:21:52,894][05745] Signal inference workers to resume experience collection... (250 times) [2025-01-07 12:21:52,895][05758] InferenceWorker_p0-w0: resuming experience collection (250 times) [2025-01-07 12:21:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1028096. Throughput: 0: 225.5. Samples: 259702. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:21:56,609][01669] Avg episode reward: [(0, '6.600')] [2025-01-07 12:22:01,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1036288. Throughput: 0: 223.3. Samples: 260890. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 12:22:01,615][01669] Avg episode reward: [(0, '6.467')] [2025-01-07 12:22:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1036288. Throughput: 0: 224.6. Samples: 261568. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 12:22:06,613][01669] Avg episode reward: [(0, '6.474')] [2025-01-07 12:22:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1044480. Throughput: 0: 212.5. Samples: 262750. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:22:11,615][01669] Avg episode reward: [(0, '6.362')] [2025-01-07 12:22:16,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1048576. Throughput: 0: 230.3. Samples: 264480. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:22:16,609][01669] Avg episode reward: [(0, '6.432')] [2025-01-07 12:22:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1052672. Throughput: 0: 226.9. Samples: 264896. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:22:21,613][01669] Avg episode reward: [(0, '6.620')] [2025-01-07 12:22:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1056768. Throughput: 0: 206.8. Samples: 265904. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:22:26,609][01669] Avg episode reward: [(0, '6.374')] [2025-01-07 12:22:31,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1060864. Throughput: 0: 227.3. Samples: 267902. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:22:31,616][01669] Avg episode reward: [(0, '6.462')] [2025-01-07 12:22:32,747][05758] Updated weights for policy 0, policy_version 260 (0.0452) [2025-01-07 12:22:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1064960. Throughput: 0: 237.4. Samples: 268672. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:22:36,608][01669] Avg episode reward: [(0, '6.696')] [2025-01-07 12:22:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1069056. Throughput: 0: 220.4. Samples: 269620. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:22:41,611][01669] Avg episode reward: [(0, '6.830')] [2025-01-07 12:22:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1073152. Throughput: 0: 224.4. Samples: 270988. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:22:46,614][01669] Avg episode reward: [(0, '6.802')] [2025-01-07 12:22:46,740][05745] Saving new best policy, reward=6.830! [2025-01-07 12:22:51,613][01669] Fps is (10 sec: 1228.0, 60 sec: 955.6, 300 sec: 888.6). Total num frames: 1081344. Throughput: 0: 231.7. Samples: 271994. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:22:51,615][01669] Avg episode reward: [(0, '6.885')] [2025-01-07 12:22:54,968][05745] Saving new best policy, reward=6.885! [2025-01-07 12:22:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1085440. Throughput: 0: 230.8. Samples: 273136. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:22:56,609][01669] Avg episode reward: [(0, '7.062')] [2025-01-07 12:23:00,797][05745] Saving new best policy, reward=7.062! [2025-01-07 12:23:01,606][01669] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1089536. Throughput: 0: 218.7. Samples: 274322. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:23:01,608][01669] Avg episode reward: [(0, '7.249')] [2025-01-07 12:23:04,700][05745] Saving new best policy, reward=7.249! [2025-01-07 12:23:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1093632. Throughput: 0: 225.5. Samples: 275042. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:23:06,613][01669] Avg episode reward: [(0, '7.461')] [2025-01-07 12:23:08,554][05745] Saving new best policy, reward=7.461! [2025-01-07 12:23:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1097728. Throughput: 0: 245.0. Samples: 276930. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:23:11,609][01669] Avg episode reward: [(0, '7.544')] [2025-01-07 12:23:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1101824. Throughput: 0: 222.2. Samples: 277902. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:23:16,612][01669] Avg episode reward: [(0, '7.855')] [2025-01-07 12:23:18,982][05745] Saving new best policy, reward=7.544! [2025-01-07 12:23:18,988][05758] Updated weights for policy 0, policy_version 270 (0.0543) [2025-01-07 12:23:19,118][05745] Saving new best policy, reward=7.855! [2025-01-07 12:23:21,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1105920. Throughput: 0: 214.8. Samples: 278340. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:23:21,609][01669] Avg episode reward: [(0, '7.771')] [2025-01-07 12:23:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1110016. Throughput: 0: 234.2. Samples: 280158. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:23:26,609][01669] Avg episode reward: [(0, '7.935')] [2025-01-07 12:23:31,448][05745] Saving new best policy, reward=7.935! [2025-01-07 12:23:31,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). 
Total num frames: 1118208. Throughput: 0: 226.8. Samples: 281192. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:23:31,611][01669] Avg episode reward: [(0, '7.942')] [2025-01-07 12:23:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1118208. Throughput: 0: 219.2. Samples: 281856. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:23:36,622][01669] Avg episode reward: [(0, '8.088')] [2025-01-07 12:23:36,998][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000274_1122304.pth... [2025-01-07 12:23:37,111][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000222_909312.pth [2025-01-07 12:23:37,135][05745] Saving new best policy, reward=7.942! [2025-01-07 12:23:40,960][05745] Saving new best policy, reward=8.088! [2025-01-07 12:23:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1126400. Throughput: 0: 223.7. Samples: 283202. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:23:41,609][01669] Avg episode reward: [(0, '8.137')] [2025-01-07 12:23:44,736][05745] Saving new best policy, reward=8.137! [2025-01-07 12:23:46,611][01669] Fps is (10 sec: 1228.3, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1130496. Throughput: 0: 234.6. Samples: 284880. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:23:46,618][01669] Avg episode reward: [(0, '8.115')] [2025-01-07 12:23:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 888.6). Total num frames: 1134592. Throughput: 0: 224.7. Samples: 285154. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:23:51,613][01669] Avg episode reward: [(0, '8.030')] [2025-01-07 12:23:56,606][01669] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1138688. Throughput: 0: 206.3. Samples: 286212. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:23:56,614][01669] Avg episode reward: [(0, '7.935')] [2025-01-07 12:24:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1142784. Throughput: 0: 224.3. Samples: 287994. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:24:01,609][01669] Avg episode reward: [(0, '7.915')] [2025-01-07 12:24:03,296][05758] Updated weights for policy 0, policy_version 280 (0.0970) [2025-01-07 12:24:06,612][01669] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1146880. Throughput: 0: 230.7. Samples: 288724. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:24:06,615][01669] Avg episode reward: [(0, '7.904')] [2025-01-07 12:24:11,611][01669] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1150976. Throughput: 0: 211.0. Samples: 289652. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:24:11,618][01669] Avg episode reward: [(0, '7.961')] [2025-01-07 12:24:16,606][01669] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1155072. Throughput: 0: 224.1. Samples: 291278. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:24:16,615][01669] Avg episode reward: [(0, '7.761')] [2025-01-07 12:24:21,606][01669] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1159168. Throughput: 0: 226.7. Samples: 292058. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:24:21,615][01669] Avg episode reward: [(0, '7.727')] [2025-01-07 12:24:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1163264. 
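
Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" report is the frame counter differentiated over three trailing time windows. A minimal sliding-window meter in the same spirit (an assumption-laden sketch, not Sample Factory's implementation):

    import time
    from collections import deque

    class FpsMeter:
        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.history = deque()  # (timestamp, total_frames), oldest first

        def record(self, total_frames, now=None):
            now = time.monotonic() if now is None else now
            self.history.append((now, total_frames))
            while now - self.history[0][0] > max(self.windows):
                self.history.popleft()  # drop samples older than the widest window
            rates = {}
            for w in self.windows:
                # earliest sample still inside this window
                t0, f0 = next((t, f) for t, f in self.history if now - t <= w)
                if now > t0:
                    rates[w] = (total_frames - f0) / (now - t0)
            return rates  # e.g. {10: 819.2, 60: 887.5, 300: 888.6}

Fed once per ~5-second report with the running frame total, such a meter reproduces the quantized 10-second readings and the smoother 60/300-second averages seen throughout this log.
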
Throughput: 0: 223.2. Samples: 293244. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:24:26,613][01669] Avg episode reward: [(0, '7.668')] [2025-01-07 12:24:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1167360. Throughput: 0: 213.8. Samples: 294498. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:24:31,615][01669] Avg episode reward: [(0, '7.497')] [2025-01-07 12:24:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1171456. Throughput: 0: 222.0. Samples: 295146. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:24:36,609][01669] Avg episode reward: [(0, '7.587')] [2025-01-07 12:24:41,607][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1179648. Throughput: 0: 228.6. Samples: 296500. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:24:41,610][01669] Avg episode reward: [(0, '7.424')] [2025-01-07 12:24:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 874.7). Total num frames: 1179648. Throughput: 0: 212.6. Samples: 297562. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:24:46,612][01669] Avg episode reward: [(0, '7.304')] [2025-01-07 12:24:51,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1187840. Throughput: 0: 218.2. Samples: 298544. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:24:51,617][01669] Avg episode reward: [(0, '7.412')] [2025-01-07 12:24:51,625][05758] Updated weights for policy 0, policy_version 290 (0.1941) [2025-01-07 12:24:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1191936. Throughput: 0: 222.3. Samples: 299654. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:24:56,612][01669] Avg episode reward: [(0, '7.784')] [2025-01-07 12:25:01,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1196032. Throughput: 0: 214.4. Samples: 300928. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:25:01,612][01669] Avg episode reward: [(0, '7.807')] [2025-01-07 12:25:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 888.6). Total num frames: 1200128. Throughput: 0: 211.9. Samples: 301592. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:25:06,608][01669] Avg episode reward: [(0, '8.015')] [2025-01-07 12:25:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1204224. Throughput: 0: 214.1. Samples: 302880. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:25:11,609][01669] Avg episode reward: [(0, '8.115')] [2025-01-07 12:25:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1208320. Throughput: 0: 226.1. Samples: 304674. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:25:16,612][01669] Avg episode reward: [(0, '8.188')] [2025-01-07 12:25:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1212416. Throughput: 0: 217.4. Samples: 304930. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:25:21,617][01669] Avg episode reward: [(0, '8.156')] [2025-01-07 12:25:23,775][05745] Saving new best policy, reward=8.188! [2025-01-07 12:25:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1216512. Throughput: 0: 220.3. Samples: 306412. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:25:26,614][01669] Avg episode reward: [(0, '8.018')] [2025-01-07 12:25:31,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1224704. Throughput: 0: 225.6. Samples: 307714. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:25:31,609][01669] Avg episode reward: [(0, '8.195')] [2025-01-07 12:25:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1224704. Throughput: 0: 225.5. Samples: 308692. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:25:36,610][01669] Avg episode reward: [(0, '7.959')] [2025-01-07 12:25:36,639][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000300_1228800.pth... [2025-01-07 12:25:36,651][05758] Updated weights for policy 0, policy_version 300 (0.1331) [2025-01-07 12:25:36,752][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000247_1011712.pth [2025-01-07 12:25:36,774][05745] Saving new best policy, reward=8.195! [2025-01-07 12:25:40,820][05745] Signal inference workers to stop experience collection... (300 times) [2025-01-07 12:25:40,863][05758] InferenceWorker_p0-w0: stopping experience collection (300 times) [2025-01-07 12:25:41,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1228800. Throughput: 0: 221.2. Samples: 309610. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:25:41,616][01669] Avg episode reward: [(0, '8.037')] [2025-01-07 12:25:42,005][05745] Signal inference workers to resume experience collection... (300 times) [2025-01-07 12:25:42,006][05758] InferenceWorker_p0-w0: resuming experience collection (300 times) [2025-01-07 12:25:46,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1236992. Throughput: 0: 226.2. Samples: 311106. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:25:46,609][01669] Avg episode reward: [(0, '7.919')] [2025-01-07 12:25:51,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1241088. Throughput: 0: 228.9. Samples: 311894. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:25:51,610][01669] Avg episode reward: [(0, '7.920')] [2025-01-07 12:25:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1245184. Throughput: 0: 224.3. Samples: 312974. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:25:56,609][01669] Avg episode reward: [(0, '8.132')] [2025-01-07 12:26:01,607][01669] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1249280. Throughput: 0: 219.5. Samples: 314550. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:26:01,613][01669] Avg episode reward: [(0, '8.008')] [2025-01-07 12:26:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1253376. Throughput: 0: 229.3. Samples: 315250. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:26:06,616][01669] Avg episode reward: [(0, '8.377')] [2025-01-07 12:26:11,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1257472. Throughput: 0: 231.1. Samples: 316810. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:26:11,609][01669] Avg episode reward: [(0, '7.975')] [2025-01-07 12:26:12,962][05745] Saving new best policy, reward=8.377! [2025-01-07 12:26:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1261568. 
[2025-01-07 12:26:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1261568. Throughput: 0: 225.4. Samples: 317858. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:26:16,615][01669] Avg episode reward: [(0, '8.264')]
[2025-01-07 12:26:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1265664. Throughput: 0: 221.5. Samples: 318658. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:26:21,615][01669] Avg episode reward: [(0, '8.393')]
[2025-01-07 12:26:21,686][05758] Updated weights for policy 0, policy_version 310 (0.0999)
[2025-01-07 12:26:25,507][05745] Saving new best policy, reward=8.393!
[2025-01-07 12:26:26,607][01669] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1273856. Throughput: 0: 232.3. Samples: 320064. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:26:26,613][01669] Avg episode reward: [(0, '8.469')]
[2025-01-07 12:26:31,338][05745] Saving new best policy, reward=8.469!
[2025-01-07 12:26:31,614][01669] Fps is (10 sec: 1227.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 1277952. Throughput: 0: 222.9. Samples: 321140. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:26:31,616][01669] Avg episode reward: [(0, '8.335')]
[2025-01-07 12:26:36,606][01669] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1282048. Throughput: 0: 225.0. Samples: 322018. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:26:36,609][01669] Avg episode reward: [(0, '8.230')]
[2025-01-07 12:26:41,607][01669] Fps is (10 sec: 819.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1286144. Throughput: 0: 230.4. Samples: 323342. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:26:41,612][01669] Avg episode reward: [(0, '8.780')]
[2025-01-07 12:26:43,995][05745] Saving new best policy, reward=8.780!
[2025-01-07 12:26:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1290240. Throughput: 0: 226.1. Samples: 324726. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:26:46,610][01669] Avg episode reward: [(0, '9.054')]
[2025-01-07 12:26:50,313][05745] Saving new best policy, reward=9.054!
[2025-01-07 12:26:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1294336. Throughput: 0: 219.8. Samples: 325142. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:26:51,609][01669] Avg episode reward: [(0, '9.688')]
[2025-01-07 12:26:54,681][05745] Saving new best policy, reward=9.688!
[2025-01-07 12:26:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1298432. Throughput: 0: 216.4. Samples: 326548. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:26:56,614][01669] Avg episode reward: [(0, '10.187')]
[2025-01-07 12:26:58,672][05745] Saving new best policy, reward=10.187!
[2025-01-07 12:27:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1302528. Throughput: 0: 230.2. Samples: 328216. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:27:01,609][01669] Avg episode reward: [(0, '10.126')]
[2025-01-07 12:27:06,610][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1306624. Throughput: 0: 219.2. Samples: 328522. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:27:06,613][01669] Avg episode reward: [(0, '9.896')]
[2025-01-07 12:27:09,713][05758] Updated weights for policy 0, policy_version 320 (0.0523)
[2025-01-07 12:27:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1310720. Throughput: 0: 211.5. Samples: 329580. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:27:11,609][01669] Avg episode reward: [(0, '10.072')]
[2025-01-07 12:27:16,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1314816. Throughput: 0: 225.5. Samples: 331284. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:27:16,609][01669] Avg episode reward: [(0, '10.280')]
[2025-01-07 12:27:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1318912. Throughput: 0: 226.5. Samples: 332210. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:27:21,611][01669] Avg episode reward: [(0, '10.343')]
[2025-01-07 12:27:22,073][05745] Saving new best policy, reward=10.280!
[2025-01-07 12:27:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1323008. Throughput: 0: 217.2. Samples: 333114. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:27:26,612][01669] Avg episode reward: [(0, '10.663')]
[2025-01-07 12:27:27,561][05745] Saving new best policy, reward=10.343!
[2025-01-07 12:27:31,328][05745] Saving new best policy, reward=10.663!
[2025-01-07 12:27:31,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 1331200. Throughput: 0: 215.1. Samples: 334406. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:27:31,609][01669] Avg episode reward: [(0, '10.639')]
[2025-01-07 12:27:35,268][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000326_1335296.pth...
[2025-01-07 12:27:35,387][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000274_1122304.pth
[2025-01-07 12:27:36,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1335296. Throughput: 0: 227.2. Samples: 335368. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:27:36,611][01669] Avg episode reward: [(0, '10.610')]
[2025-01-07 12:27:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1339392. Throughput: 0: 219.1. Samples: 336406. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:27:41,609][01669] Avg episode reward: [(0, '10.776')]
[2025-01-07 12:27:45,682][05745] Saving new best policy, reward=10.776!
[2025-01-07 12:27:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1343488. Throughput: 0: 211.3. Samples: 337726. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:27:46,615][01669] Avg episode reward: [(0, '11.212')]
[2025-01-07 12:27:49,558][05745] Saving new best policy, reward=11.212!
[2025-01-07 12:27:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1347584. Throughput: 0: 218.9. Samples: 338372. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:27:51,609][01669] Avg episode reward: [(0, '11.455')]
[2025-01-07 12:27:53,413][05745] Saving new best policy, reward=11.455!
[2025-01-07 12:27:53,419][05758] Updated weights for policy 0, policy_version 330 (0.0054)
[2025-01-07 12:27:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1351680. Throughput: 0: 234.8. Samples: 340144. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:27:56,614][01669] Avg episode reward: [(0, '11.815')]
[2025-01-07 12:28:01,607][01669] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1355776. Throughput: 0: 217.4. Samples: 341066. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:28:01,610][01669] Avg episode reward: [(0, '12.529')]
[2025-01-07 12:28:03,783][05745] Saving new best policy, reward=11.815!
[2025-01-07 12:28:03,922][05745] Saving new best policy, reward=12.529!
[2025-01-07 12:28:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1359872. Throughput: 0: 211.6. Samples: 341730. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:28:06,613][01669] Avg episode reward: [(0, '12.974')]
[2025-01-07 12:28:11,441][05745] Saving new best policy, reward=12.974!
[2025-01-07 12:28:11,607][01669] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1368064. Throughput: 0: 229.6. Samples: 343446. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:28:11,609][01669] Avg episode reward: [(0, '13.123')]
[2025-01-07 12:28:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1368064. Throughput: 0: 222.1. Samples: 344402. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:28:16,608][01669] Avg episode reward: [(0, '12.813')]
[2025-01-07 12:28:16,923][05745] Saving new best policy, reward=13.123!
[2025-01-07 12:28:21,606][01669] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1372160. Throughput: 0: 214.7. Samples: 345028. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:28:21,613][01669] Avg episode reward: [(0, '13.365')]
[2025-01-07 12:28:25,830][05745] Saving new best policy, reward=13.365!
[2025-01-07 12:28:26,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1380352. Throughput: 0: 224.7. Samples: 346518. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:28:26,616][01669] Avg episode reward: [(0, '13.527')]
[2025-01-07 12:28:29,777][05745] Saving new best policy, reward=13.527!
[2025-01-07 12:28:31,610][01669] Fps is (10 sec: 1228.4, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 1384448. Throughput: 0: 228.7. Samples: 348016. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:28:31,617][01669] Avg episode reward: [(0, '13.374')]
[2025-01-07 12:28:36,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1388544. Throughput: 0: 226.8. Samples: 348578. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:28:36,614][01669] Avg episode reward: [(0, '13.263')]
[2025-01-07 12:28:40,400][05758] Updated weights for policy 0, policy_version 340 (0.0051)
[2025-01-07 12:28:41,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1392640. Throughput: 0: 210.5. Samples: 349618. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:28:41,613][01669] Avg episode reward: [(0, '13.331')]
[2025-01-07 12:28:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1396736. Throughput: 0: 232.8. Samples: 351540. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:28:46,610][01669] Avg episode reward: [(0, '13.264')]
[2025-01-07 12:28:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1400832. Throughput: 0: 224.6. Samples: 351836. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:28:51,612][01669] Avg episode reward: [(0, '13.568')]
[2025-01-07 12:28:54,459][05745] Saving new best policy, reward=13.568!
[2025-01-07 12:28:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1404928. Throughput: 0: 208.4. Samples: 352826. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:28:56,612][01669] Avg episode reward: [(0, '13.481')]
[2025-01-07 12:29:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1409024. Throughput: 0: 229.1. Samples: 354710. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:29:01,615][01669] Avg episode reward: [(0, '13.846')]
[2025-01-07 12:29:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1413120. Throughput: 0: 233.9. Samples: 355554. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:29:06,614][01669] Avg episode reward: [(0, '13.645')]
[2025-01-07 12:29:07,253][05745] Saving new best policy, reward=13.846!
[2025-01-07 12:29:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1417216. Throughput: 0: 224.6. Samples: 356626. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:29:11,609][01669] Avg episode reward: [(0, '13.576')]
[2025-01-07 12:29:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1421312. Throughput: 0: 218.6. Samples: 357852. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:29:16,611][01669] Avg episode reward: [(0, '13.157')]
[2025-01-07 12:29:21,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1429504. Throughput: 0: 225.9. Samples: 358744. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:29:21,615][01669] Avg episode reward: [(0, '12.664')]
[2025-01-07 12:29:26,520][05758] Updated weights for policy 0, policy_version 350 (0.0502)
[2025-01-07 12:29:26,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1433600. Throughput: 0: 228.4. Samples: 359896. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:29:26,614][01669] Avg episode reward: [(0, '12.525')]
[2025-01-07 12:29:30,523][05745] Signal inference workers to stop experience collection... (350 times)
[2025-01-07 12:29:30,662][05758] InferenceWorker_p0-w0: stopping experience collection (350 times)
[2025-01-07 12:29:31,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1433600. Throughput: 0: 209.0. Samples: 360944. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:29:31,614][01669] Avg episode reward: [(0, '12.920')]
[2025-01-07 12:29:32,054][05745] Signal inference workers to resume experience collection... (350 times)
[2025-01-07 12:29:32,057][05758] InferenceWorker_p0-w0: resuming experience collection (350 times)
[2025-01-07 12:29:35,978][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000352_1441792.pth...
[2025-01-07 12:29:36,102][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000300_1228800.pth
[2025-01-07 12:29:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1441792. Throughput: 0: 222.1. Samples: 361832. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:29:36,615][01669] Avg episode reward: [(0, '12.529')]
[2025-01-07 12:29:41,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1445888. Throughput: 0: 229.5. Samples: 363154. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:29:41,609][01669] Avg episode reward: [(0, '12.211')]
[2025-01-07 12:29:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1449984. Throughput: 0: 209.2. Samples: 364122. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:29:46,614][01669] Avg episode reward: [(0, '12.224')]
[2025-01-07 12:29:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1454080. Throughput: 0: 206.8. Samples: 364858. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:29:51,608][01669] Avg episode reward: [(0, '12.284')]
[2025-01-07 12:29:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1458176. Throughput: 0: 216.1. Samples: 366352. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:29:56,610][01669] Avg episode reward: [(0, '12.011')]
[2025-01-07 12:30:01,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1462272. Throughput: 0: 218.8. Samples: 367696. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:30:01,609][01669] Avg episode reward: [(0, '11.996')]
[2025-01-07 12:30:06,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1466368. Throughput: 0: 205.3. Samples: 367984. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:30:06,616][01669] Avg episode reward: [(0, '11.922')]
[2025-01-07 12:30:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1470464. Throughput: 0: 216.8. Samples: 369654. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:30:11,612][01669] Avg episode reward: [(0, '11.656')]
[2025-01-07 12:30:13,202][05758] Updated weights for policy 0, policy_version 360 (0.0490)
[2025-01-07 12:30:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1474560. Throughput: 0: 226.7. Samples: 371146. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:30:16,613][01669] Avg episode reward: [(0, '11.822')]
[2025-01-07 12:30:21,611][01669] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 888.6). Total num frames: 1478656. Throughput: 0: 217.1. Samples: 371604. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:30:21,620][01669] Avg episode reward: [(0, '11.726')]
[2025-01-07 12:30:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1482752. Throughput: 0: 215.7. Samples: 372862. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:30:26,615][01669] Avg episode reward: [(0, '11.764')]
[2025-01-07 12:30:31,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1486848. Throughput: 0: 223.5. Samples: 374178. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:30:31,619][01669] Avg episode reward: [(0, '12.066')]
[2025-01-07 12:30:36,607][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1495040. Throughput: 0: 230.5. Samples: 375232. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:30:36,610][01669] Avg episode reward: [(0, '12.235')]
[2025-01-07 12:30:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1495040. Throughput: 0: 218.6. Samples: 376190. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:30:41,609][01669] Avg episode reward: [(0, '12.390')]
[2025-01-07 12:30:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1503232. Throughput: 0: 215.1. Samples: 377376. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:30:46,616][01669] Avg episode reward: [(0, '12.498')]
[2025-01-07 12:30:51,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1507328. Throughput: 0: 229.2. Samples: 378300. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:30:51,609][01669] Avg episode reward: [(0, '11.950')]
[2025-01-07 12:30:56,610][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1511424. Throughput: 0: 214.6. Samples: 379310. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:30:56,612][01669] Avg episode reward: [(0, '11.911')]
[2025-01-07 12:31:00,896][05758] Updated weights for policy 0, policy_version 370 (0.0497)
[2025-01-07 12:31:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1515520. Throughput: 0: 210.8. Samples: 380630. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:31:01,611][01669] Avg episode reward: [(0, '11.830')]
[2025-01-07 12:31:06,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1519616. Throughput: 0: 217.2. Samples: 381376. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:31:06,614][01669] Avg episode reward: [(0, '12.782')]
[2025-01-07 12:31:11,608][01669] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1523712. Throughput: 0: 224.3. Samples: 382956. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:31:11,613][01669] Avg episode reward: [(0, '12.716')]
[2025-01-07 12:31:16,611][01669] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1527808. Throughput: 0: 216.8. Samples: 383934. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:31:16,613][01669] Avg episode reward: [(0, '12.886')]
[2025-01-07 12:31:21,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1531904. Throughput: 0: 205.6. Samples: 384482. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:31:21,609][01669] Avg episode reward: [(0, '13.243')]
[2025-01-07 12:31:26,606][01669] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 1536000. Throughput: 0: 228.1. Samples: 386456. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:31:26,614][01669] Avg episode reward: [(0, '12.928')]
[2025-01-07 12:31:31,608][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1540096. Throughput: 0: 224.2. Samples: 387464. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:31:31,612][01669] Avg episode reward: [(0, '12.941')]
[2025-01-07 12:31:32,895][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000377_1544192.pth...
[2025-01-07 12:31:33,048][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000326_1335296.pth
[2025-01-07 12:31:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1544192. Throughput: 0: 213.7. Samples: 387916. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:31:36,616][01669] Avg episode reward: [(0, '12.799')]
[2025-01-07 12:31:41,606][01669] Fps is (10 sec: 1229.1, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1552384. Throughput: 0: 227.4. Samples: 389542. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:31:41,613][01669] Avg episode reward: [(0, '12.779')]
[2025-01-07 12:31:45,671][05758] Updated weights for policy 0, policy_version 380 (0.0969)
[2025-01-07 12:31:46,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1556480. Throughput: 0: 225.8. Samples: 390790. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:31:46,612][01669] Avg episode reward: [(0, '12.644')]
[2025-01-07 12:31:51,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1556480. Throughput: 0: 225.3. Samples: 391514. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:31:51,609][01669] Avg episode reward: [(0, '12.502')]
[2025-01-07 12:31:56,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1564672. Throughput: 0: 213.9. Samples: 392580. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:31:56,609][01669] Avg episode reward: [(0, '12.486')]
[2025-01-07 12:32:01,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1568768. Throughput: 0: 230.3. Samples: 394298. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:32:01,616][01669] Avg episode reward: [(0, '12.764')]
[2025-01-07 12:32:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1572864. Throughput: 0: 226.5. Samples: 394676. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:32:06,610][01669] Avg episode reward: [(0, '12.957')]
[2025-01-07 12:32:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1576960. Throughput: 0: 205.8. Samples: 395718. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:32:11,609][01669] Avg episode reward: [(0, '12.939')]
[2025-01-07 12:32:16,609][01669] Fps is (10 sec: 819.0, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1581056. Throughput: 0: 225.2. Samples: 397600. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:32:16,616][01669] Avg episode reward: [(0, '13.919')]
[2025-01-07 12:32:18,457][05745] Saving new best policy, reward=13.919!
[2025-01-07 12:32:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1585152. Throughput: 0: 227.7. Samples: 398162. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:32:21,610][01669] Avg episode reward: [(0, '14.392')]
[2025-01-07 12:32:26,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1589248. Throughput: 0: 213.1. Samples: 399132. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:32:26,609][01669] Avg episode reward: [(0, '14.696')]
[2025-01-07 12:32:29,334][05745] Saving new best policy, reward=14.392!
[2025-01-07 12:32:29,488][05745] Saving new best policy, reward=14.696!
[2025-01-07 12:32:31,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1593344. Throughput: 0: 220.4. Samples: 400706. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:32:31,616][01669] Avg episode reward: [(0, '14.884')]
[2025-01-07 12:32:33,380][05758] Updated weights for policy 0, policy_version 390 (0.0925)
[2025-01-07 12:32:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1597440. Throughput: 0: 216.7. Samples: 401264. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:32:36,608][01669] Avg episode reward: [(0, '14.795')]
[2025-01-07 12:32:37,350][05745] Saving new best policy, reward=14.884!
[2025-01-07 12:32:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1601536. Throughput: 0: 225.0. Samples: 402704. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:32:41,609][01669] Avg episode reward: [(0, '14.960')]
[2025-01-07 12:32:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1605632. Throughput: 0: 209.8. Samples: 403738. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:32:46,616][01669] Avg episode reward: [(0, '14.607')]
[2025-01-07 12:32:47,918][05745] Saving new best policy, reward=14.960!
[2025-01-07 12:32:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1609728. Throughput: 0: 218.1. Samples: 404490. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:32:51,613][01669] Avg episode reward: [(0, '15.155')]
[2025-01-07 12:32:56,111][05745] Saving new best policy, reward=15.155!
[2025-01-07 12:32:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1617920. Throughput: 0: 225.3. Samples: 405856. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:32:56,616][01669] Avg episode reward: [(0, '15.340')]
[2025-01-07 12:33:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1617920. Throughput: 0: 206.4. Samples: 406886. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:33:01,613][01669] Avg episode reward: [(0, '15.590')]
[2025-01-07 12:33:02,408][05745] Saving new best policy, reward=15.340!
[2025-01-07 12:33:06,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1622016. Throughput: 0: 209.6. Samples: 407592. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:33:06,613][01669] Avg episode reward: [(0, '15.750')]
[2025-01-07 12:33:06,733][05745] Saving new best policy, reward=15.590!
[2025-01-07 12:33:10,796][05745] Saving new best policy, reward=15.750!
[2025-01-07 12:33:11,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1630208. Throughput: 0: 217.9. Samples: 408936. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:33:11,615][01669] Avg episode reward: [(0, '15.929')]
[2025-01-07 12:33:15,597][05745] Saving new best policy, reward=15.929!
[2025-01-07 12:33:16,608][01669] Fps is (10 sec: 1228.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1634304. Throughput: 0: 211.2. Samples: 410210. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:33:16,615][01669] Avg episode reward: [(0, '15.901')]
[2025-01-07 12:33:21,557][05758] Updated weights for policy 0, policy_version 400 (0.0067)
[2025-01-07 12:33:21,609][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1638400. Throughput: 0: 215.3. Samples: 410954. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:33:21,615][01669] Avg episode reward: [(0, '15.843')]
[2025-01-07 12:33:23,937][05745] Signal inference workers to stop experience collection... (400 times)
[2025-01-07 12:33:23,987][05758] InferenceWorker_p0-w0: stopping experience collection (400 times)
[2025-01-07 12:33:25,466][05745] Signal inference workers to resume experience collection... (400 times)
[2025-01-07 12:33:25,467][05758] InferenceWorker_p0-w0: resuming experience collection (400 times)
[2025-01-07 12:33:26,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1642496. Throughput: 0: 207.1. Samples: 412024. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:33:26,614][01669] Avg episode reward: [(0, '16.507')]
[2025-01-07 12:33:29,317][05745] Saving new best policy, reward=16.507!
[2025-01-07 12:33:31,607][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1646592. Throughput: 0: 225.7. Samples: 413894. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 12:33:31,611][01669] Avg episode reward: [(0, '16.580')]
[2025-01-07 12:33:34,646][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000403_1650688.pth...
[2025-01-07 12:33:34,799][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000352_1441792.pth
[2025-01-07 12:33:34,823][05745] Saving new best policy, reward=16.580!
[2025-01-07 12:33:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1650688. Throughput: 0: 213.5. Samples: 414096. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:33:36,617][01669] Avg episode reward: [(0, '16.623')]
[2025-01-07 12:33:40,095][05745] Saving new best policy, reward=16.623!
[2025-01-07 12:33:41,607][01669] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1654784. Throughput: 0: 210.1. Samples: 415312. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:33:41,616][01669] Avg episode reward: [(0, '16.280')]
[2025-01-07 12:33:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1658880. Throughput: 0: 228.9. Samples: 417186. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:33:46,614][01669] Avg episode reward: [(0, '15.584')]
[2025-01-07 12:33:51,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1662976. Throughput: 0: 223.2. Samples: 417636. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:33:51,615][01669] Avg episode reward: [(0, '14.739')]
[2025-01-07 12:33:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1667072. Throughput: 0: 213.8. Samples: 418558. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:33:56,613][01669] Avg episode reward: [(0, '14.811')]
[2025-01-07 12:34:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1671168. Throughput: 0: 223.9. Samples: 420286. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:34:01,609][01669] Avg episode reward: [(0, '14.698')]
[2025-01-07 12:34:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1675264. Throughput: 0: 220.3. Samples: 420866. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:34:06,609][01669] Avg episode reward: [(0, '15.025')]
[2025-01-07 12:34:07,437][05758] Updated weights for policy 0, policy_version 410 (0.1511)
[2025-01-07 12:34:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1679360. Throughput: 0: 222.5. Samples: 422038. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:34:11,609][01669] Avg episode reward: [(0, '15.132')]
[2025-01-07 12:34:16,607][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1683456. Throughput: 0: 210.8. Samples: 423378. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:34:16,615][01669] Avg episode reward: [(0, '15.184')]
[2025-01-07 12:34:21,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1691648. Throughput: 0: 228.7. Samples: 424388. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:34:21,608][01669] Avg episode reward: [(0, '15.927')]
[2025-01-07 12:34:26,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1695744. Throughput: 0: 225.4. Samples: 425456. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:34:26,613][01669] Avg episode reward: [(0, '15.982')]
[2025-01-07 12:34:31,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1695744. Throughput: 0: 206.1. Samples: 426460. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:34:31,609][01669] Avg episode reward: [(0, '15.891')]
[2025-01-07 12:34:36,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1703936. Throughput: 0: 218.1. Samples: 427452. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:34:36,610][01669] Avg episode reward: [(0, '15.608')]
[2025-01-07 12:34:41,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1708032. Throughput: 0: 228.3. Samples: 428832. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:34:41,609][01669] Avg episode reward: [(0, '15.907')]
[2025-01-07 12:34:46,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1712128. Throughput: 0: 216.3. Samples: 430018. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:34:46,609][01669] Avg episode reward: [(0, '15.874')]
[2025-01-07 12:34:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1716224. Throughput: 0: 214.8. Samples: 430534. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:34:51,609][01669] Avg episode reward: [(0, '16.387')]
[2025-01-07 12:34:54,301][05758] Updated weights for policy 0, policy_version 420 (0.1037)
[2025-01-07 12:34:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1720320. Throughput: 0: 222.7. Samples: 432058. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:34:56,609][01669] Avg episode reward: [(0, '16.790')]
[2025-01-07 12:34:58,298][05745] Saving new best policy, reward=16.790!
[2025-01-07 12:35:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1724416. Throughput: 0: 227.1. Samples: 433598. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:35:01,609][01669] Avg episode reward: [(0, '17.117')]
[2025-01-07 12:35:06,614][01669] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1728512. Throughput: 0: 208.6. Samples: 433776. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:35:06,624][01669] Avg episode reward: [(0, '17.282')]
[2025-01-07 12:35:09,140][05745] Saving new best policy, reward=17.117!
[2025-01-07 12:35:09,271][05745] Saving new best policy, reward=17.282!
[2025-01-07 12:35:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1732608. Throughput: 0: 218.9. Samples: 435306. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:35:11,614][01669] Avg episode reward: [(0, '17.832')]
[2025-01-07 12:35:16,606][01669] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 1736704. Throughput: 0: 230.0. Samples: 436808. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:35:16,616][01669] Avg episode reward: [(0, '17.881')]
[2025-01-07 12:35:17,168][05745] Saving new best policy, reward=17.832!
[2025-01-07 12:35:21,608][01669] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1740800. Throughput: 0: 220.1. Samples: 437358. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:35:21,610][01669] Avg episode reward: [(0, '18.088')]
[2025-01-07 12:35:23,451][05745] Saving new best policy, reward=17.881!
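Editor's note on the throughput lines: "Total num frames" only ever advances in 4096-frame increments, and each "Updated weights ... policy_version N" event lines up with N * 4096 total frames (policy_version 420 just above corresponds to 1720320 = 420 * 4096). That is also why every 10-second FPS reading is 409.6, 819.2, or 1228.8: exactly one, two, or three 4096-frame increments landed in the window. "Samples" stays close to frames / 4 throughout (e.g. 430534 vs. 1716224 / 4 = 429056), consistent with a 4-step frame-skip. A small sketch of the windowed-FPS bookkeeping under those assumptions (illustrative only, not the library's implementation):

FRAMES_PER_UPDATE = 4096  # observed ratio of total frames to policy_version in this log

def windowed_fps(history: list[tuple[float, int]], window_sec: float) -> float:
    """Trailing-average FPS from (timestamp, total_frames) pairs,
    like the 10/60/300-second columns in the "Fps is" lines."""
    t_now, f_now = history[-1]
    older = [(t, f) for t, f in history if t_now - t >= window_sec]
    t_ref, f_ref = older[-1] if older else history[0]
    return (f_now - f_ref) / max(t_now - t_ref, 1e-9)

# Two 4096-frame increments within a 10-second window -> the familiar 819.2:
assert windowed_fps([(0.0, 1712128), (10.0, 1720320)], 10.0) == 819.2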
[2025-01-07 12:35:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1744896. Throughput: 0: 212.1. Samples: 438378. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:35:26,615][01669] Avg episode reward: [(0, '18.078')]
[2025-01-07 12:35:28,068][05745] Saving new best policy, reward=18.088!
[2025-01-07 12:35:31,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1748992. Throughput: 0: 218.4. Samples: 439848. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:35:31,617][01669] Avg episode reward: [(0, '18.579')]
[2025-01-07 12:35:36,081][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000429_1757184.pth...
[2025-01-07 12:35:36,172][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000377_1544192.pth
[2025-01-07 12:35:36,187][05745] Saving new best policy, reward=18.579!
[2025-01-07 12:35:36,607][01669] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1757184. Throughput: 0: 229.8. Samples: 440876. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:35:36,614][01669] Avg episode reward: [(0, '18.817')]
[2025-01-07 12:35:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1757184. Throughput: 0: 218.7. Samples: 441898. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:35:41,609][01669] Avg episode reward: [(0, '19.620')]
[2025-01-07 12:35:42,407][05758] Updated weights for policy 0, policy_version 430 (0.0073)
[2025-01-07 12:35:42,409][05745] Saving new best policy, reward=18.817!
[2025-01-07 12:35:46,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1761280. Throughput: 0: 206.8. Samples: 442906. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:35:46,614][01669] Avg episode reward: [(0, '19.515')]
[2025-01-07 12:35:46,627][05745] Saving new best policy, reward=19.620!
[2025-01-07 12:35:51,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1769472. Throughput: 0: 226.0. Samples: 443944. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:35:51,611][01669] Avg episode reward: [(0, '19.631')]
[2025-01-07 12:35:55,508][05745] Saving new best policy, reward=19.631!
[2025-01-07 12:35:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1773568. Throughput: 0: 214.5. Samples: 444958. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:35:56,612][01669] Avg episode reward: [(0, '19.430')]
[2025-01-07 12:36:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1777664. Throughput: 0: 205.1. Samples: 446038. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:36:01,614][01669] Avg episode reward: [(0, '19.304')]
[2025-01-07 12:36:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 874.7). Total num frames: 1781760. Throughput: 0: 214.4. Samples: 447008. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:36:06,609][01669] Avg episode reward: [(0, '20.406')]
[2025-01-07 12:36:09,296][05745] Saving new best policy, reward=20.406!
[2025-01-07 12:36:11,614][01669] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1785856. Throughput: 0: 226.0. Samples: 448548. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:36:11,616][01669] Avg episode reward: [(0, '19.981')]
[2025-01-07 12:36:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1789952. Throughput: 0: 213.8. Samples: 449470. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:36:16,617][01669] Avg episode reward: [(0, '20.332')]
[2025-01-07 12:36:21,606][01669] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1794048. Throughput: 0: 203.9. Samples: 450052. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:36:21,609][01669] Avg episode reward: [(0, '20.177')]
[2025-01-07 12:36:26,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1798144. Throughput: 0: 216.4. Samples: 451638. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:36:26,609][01669] Avg episode reward: [(0, '20.076')]
[2025-01-07 12:36:28,074][05758] Updated weights for policy 0, policy_version 440 (0.0946)
[2025-01-07 12:36:31,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1802240. Throughput: 0: 224.2. Samples: 452994. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:36:31,613][01669] Avg episode reward: [(0, '20.768')]
[2025-01-07 12:36:34,115][05745] Saving new best policy, reward=20.768!
[2025-01-07 12:36:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1806336. Throughput: 0: 206.0. Samples: 453216. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:36:36,608][01669] Avg episode reward: [(0, '20.989')]
[2025-01-07 12:36:38,794][05745] Saving new best policy, reward=20.989!
[2025-01-07 12:36:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1810432. Throughput: 0: 220.4. Samples: 454874. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:36:41,617][01669] Avg episode reward: [(0, '21.103')]
[2025-01-07 12:36:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1814528. Throughput: 0: 225.8. Samples: 456200. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:36:46,608][01669] Avg episode reward: [(0, '21.818')]
[2025-01-07 12:36:47,197][05745] Saving new best policy, reward=21.103!
[2025-01-07 12:36:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1818624. Throughput: 0: 217.4. Samples: 456790. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:36:51,613][01669] Avg episode reward: [(0, '21.966')]
[2025-01-07 12:36:53,340][05745] Saving new best policy, reward=21.818!
[2025-01-07 12:36:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1822720. Throughput: 0: 211.5. Samples: 458062. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:36:56,608][01669] Avg episode reward: [(0, '21.411')]
[2025-01-07 12:36:57,497][05745] Saving new best policy, reward=21.966!
[2025-01-07 12:37:01,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1830912. Throughput: 0: 219.2. Samples: 459336. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:37:01,609][01669] Avg episode reward: [(0, '21.259')]
[2025-01-07 12:37:06,607][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1835008. Throughput: 0: 228.8. Samples: 460348. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:37:06,618][01669] Avg episode reward: [(0, '21.956')]
[2025-01-07 12:37:11,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.3, 300 sec: 860.9). Total num frames: 1835008. Throughput: 0: 215.4. Samples: 461332. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:37:11,608][01669] Avg episode reward: [(0, '22.000')]
[2025-01-07 12:37:16,059][05745] Saving new best policy, reward=22.000!
[2025-01-07 12:37:16,065][05758] Updated weights for policy 0, policy_version 450 (0.0504)
[2025-01-07 12:37:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1843200. Throughput: 0: 212.8. Samples: 462570. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:37:16,611][01669] Avg episode reward: [(0, '21.615')]
[2025-01-07 12:37:18,272][05745] Signal inference workers to stop experience collection... (450 times)
[2025-01-07 12:37:18,318][05758] InferenceWorker_p0-w0: stopping experience collection (450 times)
[2025-01-07 12:37:20,042][05745] Signal inference workers to resume experience collection... (450 times)
[2025-01-07 12:37:20,045][05758] InferenceWorker_p0-w0: resuming experience collection (450 times)
[2025-01-07 12:37:21,610][01669] Fps is (10 sec: 1228.4, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1847296. Throughput: 0: 225.8. Samples: 463378. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:37:21,614][01669] Avg episode reward: [(0, '21.236')]
[2025-01-07 12:37:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1851392. Throughput: 0: 210.2. Samples: 464332. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:37:26,612][01669] Avg episode reward: [(0, '20.984')]
[2025-01-07 12:37:31,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1855488. Throughput: 0: 210.8. Samples: 465686. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:37:31,609][01669] Avg episode reward: [(0, '21.297')]
[2025-01-07 12:37:34,684][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000454_1859584.pth...
[2025-01-07 12:37:34,804][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000403_1650688.pth
[2025-01-07 12:37:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1859584. Throughput: 0: 213.2. Samples: 466386. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:37:36,610][01669] Avg episode reward: [(0, '20.869')]
[2025-01-07 12:37:41,610][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1863680. Throughput: 0: 219.8. Samples: 467952. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:37:41,618][01669] Avg episode reward: [(0, '21.076')]
[2025-01-07 12:37:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1867776. Throughput: 0: 214.1. Samples: 468972. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:37:46,611][01669] Avg episode reward: [(0, '21.201')]
[2025-01-07 12:37:51,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1871872. Throughput: 0: 204.3. Samples: 469540. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:37:51,613][01669] Avg episode reward: [(0, '20.833')]
[2025-01-07 12:37:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1875968. Throughput: 0: 224.0. Samples: 471412. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:37:56,619][01669] Avg episode reward: [(0, '20.361')]
[2025-01-07 12:38:01,613][01669] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 874.7). Total num frames: 1880064. Throughput: 0: 220.4. Samples: 472488. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:38:01,615][01669] Avg episode reward: [(0, '20.109')]
[2025-01-07 12:38:03,519][05758] Updated weights for policy 0, policy_version 460 (0.0949)
[2025-01-07 12:38:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1884160. Throughput: 0: 209.2. Samples: 472790. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:38:06,612][01669] Avg episode reward: [(0, '20.358')]
[2025-01-07 12:38:11,606][01669] Fps is (10 sec: 1229.6, 60 sec: 955.7, 300 sec: 874.7). Total num frames: 1892352. Throughput: 0: 227.6. Samples: 474574. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:38:11,610][01669] Avg episode reward: [(0, '20.300')]
[2025-01-07 12:38:16,610][01669] Fps is (10 sec: 1228.3, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1896448. Throughput: 0: 224.2. Samples: 475778. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:38:16,615][01669] Avg episode reward: [(0, '20.909')]
[2025-01-07 12:38:21,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1896448. Throughput: 0: 224.5. Samples: 476488. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:38:21,614][01669] Avg episode reward: [(0, '20.460')]
[2025-01-07 12:38:26,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1904640. Throughput: 0: 217.4. Samples: 477736. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:38:26,615][01669] Avg episode reward: [(0, '20.318')]
[2025-01-07 12:38:31,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1908736. Throughput: 0: 230.4. Samples: 479338. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:38:31,609][01669] Avg episode reward: [(0, '19.700')]
[2025-01-07 12:38:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1912832. Throughput: 0: 226.1. Samples: 479716. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:38:36,615][01669] Avg episode reward: [(0, '20.149')]
[2025-01-07 12:38:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1916928. Throughput: 0: 207.9. Samples: 480766. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:38:41,616][01669] Avg episode reward: [(0, '20.123')]
[2025-01-07 12:38:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1921024. Throughput: 0: 228.3. Samples: 482758. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:38:46,614][01669] Avg episode reward: [(0, '20.116')]
[2025-01-07 12:38:47,957][05758] Updated weights for policy 0, policy_version 470 (0.0946)
[2025-01-07 12:38:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1925120. Throughput: 0: 236.8. Samples: 483444. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:38:51,609][01669] Avg episode reward: [(0, '19.933')]
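Editor's note on the "Policy #0 lag" triples: in an asynchronous run the rollouts in a training batch are generated with a slightly older policy than the one the learner is about to update, and lag is that version gap. Readings like (min: 1.0, avg: 1.5, max: 2.0) therefore mean each batch mixes experience that is one or two updates stale, comfortably below any lag cutoff that would force data to be discarded. A toy illustration of the statistic (an assumed definition for exposition, not Sample Factory's actual accounting):

def lag_stats(learner_version: int, rollout_versions: list[int]) -> tuple[int, float, int]:
    """Min/avg/max staleness of a batch, mirroring the log's lag triple."""
    lags = [learner_version - v for v in rollout_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

print(lag_stats(470, [469, 468, 469, 468]))  # -> (1, 1.5, 2)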
[2025-01-07 12:38:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1929216. Throughput: 0: 217.5. Samples: 484362. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:38:56,615][01669] Avg episode reward: [(0, '19.488')]
[2025-01-07 12:39:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 874.7). Total num frames: 1933312. Throughput: 0: 223.6. Samples: 485840. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:39:01,609][01669] Avg episode reward: [(0, '19.536')]
[2025-01-07 12:39:06,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1941504. Throughput: 0: 226.9. Samples: 486698. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:39:06,609][01669] Avg episode reward: [(0, '19.285')]
[2025-01-07 12:39:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1941504. Throughput: 0: 224.7. Samples: 487848. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:39:11,609][01669] Avg episode reward: [(0, '19.645')]
[2025-01-07 12:39:16,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.3, 300 sec: 860.9). Total num frames: 1945600. Throughput: 0: 212.7. Samples: 488908. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:39:16,617][01669] Avg episode reward: [(0, '20.149')]
[2025-01-07 12:39:21,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 874.7). Total num frames: 1953792. Throughput: 0: 227.2. Samples: 489938. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:39:21,608][01669] Avg episode reward: [(0, '20.226')]
[2025-01-07 12:39:26,607][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1957888. Throughput: 0: 232.6. Samples: 491232. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:39:26,617][01669] Avg episode reward: [(0, '20.887')]
[2025-01-07 12:39:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1961984. Throughput: 0: 210.4. Samples: 492224. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:39:31,619][01669] Avg episode reward: [(0, '20.842')]
[2025-01-07 12:39:35,014][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000480_1966080.pth...
[2025-01-07 12:39:35,019][05758] Updated weights for policy 0, policy_version 480 (0.0493)
[2025-01-07 12:39:35,136][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000429_1757184.pth
[2025-01-07 12:39:36,608][01669] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1966080. Throughput: 0: 212.1. Samples: 492990. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:39:36,618][01669] Avg episode reward: [(0, '20.741')]
[2025-01-07 12:39:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1970176. Throughput: 0: 230.1. Samples: 494716. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:39:41,616][01669] Avg episode reward: [(0, '20.809')]
[2025-01-07 12:39:46,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1974272. Throughput: 0: 226.3. Samples: 496022. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:39:46,614][01669] Avg episode reward: [(0, '20.589')]
[2025-01-07 12:39:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1978368. Throughput: 0: 212.2. Samples: 496248. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:39:51,609][01669] Avg episode reward: [(0, '20.523')]
[2025-01-07 12:39:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1982464. Throughput: 0: 228.3. Samples: 498122. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:39:56,616][01669] Avg episode reward: [(0, '19.938')]
[2025-01-07 12:40:01,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1990656. Throughput: 0: 229.3. Samples: 499228. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:40:01,613][01669] Avg episode reward: [(0, '20.032')]
[2025-01-07 12:40:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1990656. Throughput: 0: 223.2. Samples: 499982. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:40:06,609][01669] Avg episode reward: [(0, '19.979')]
[2025-01-07 12:40:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1998848. Throughput: 0: 220.8. Samples: 501168. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:40:11,610][01669] Avg episode reward: [(0, '20.028')]
[2025-01-07 12:40:16,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2002944. Throughput: 0: 232.6. Samples: 502690. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:40:16,609][01669] Avg episode reward: [(0, '19.840')]
[2025-01-07 12:40:20,496][05758] Updated weights for policy 0, policy_version 490 (0.0597)
[2025-01-07 12:40:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2007040. Throughput: 0: 227.7. Samples: 503238. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:40:21,611][01669] Avg episode reward: [(0, '19.814')]
[2025-01-07 12:40:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2011136. Throughput: 0: 212.4. Samples: 504272. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:40:26,611][01669] Avg episode reward: [(0, '19.990')]
[2025-01-07 12:40:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2015232. Throughput: 0: 221.3. Samples: 505982. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:40:31,617][01669] Avg episode reward: [(0, '20.859')]
[2025-01-07 12:40:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2019328. Throughput: 0: 230.2. Samples: 506608. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:40:36,609][01669] Avg episode reward: [(0, '20.750')]
[2025-01-07 12:40:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2023424. Throughput: 0: 211.5. Samples: 507638. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:40:41,610][01669] Avg episode reward: [(0, '21.004')]
[2025-01-07 12:40:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2027520. Throughput: 0: 223.2. Samples: 509270. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:40:46,616][01669] Avg episode reward: [(0, '20.680')]
[2025-01-07 12:40:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2031616. Throughput: 0: 219.1. Samples: 509840. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:40:51,616][01669] Avg episode reward: [(0, '21.531')]
[2025-01-07 12:40:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2035712. Throughput: 0: 226.3. Samples: 511352. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:40:56,608][01669] Avg episode reward: [(0, '21.611')]
[2025-01-07 12:41:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2039808. Throughput: 0: 214.7. Samples: 512350. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:41:01,609][01669] Avg episode reward: [(0, '21.692')]
[2025-01-07 12:41:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 2043904. Throughput: 0: 218.5. Samples: 513070. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:41:06,615][01669] Avg episode reward: [(0, '21.277')]
[2025-01-07 12:41:07,047][05758] Updated weights for policy 0, policy_version 500 (0.0623)
[2025-01-07 12:41:09,336][05745] Signal inference workers to stop experience collection... (500 times)
[2025-01-07 12:41:09,382][05758] InferenceWorker_p0-w0: stopping experience collection (500 times)
[2025-01-07 12:41:10,965][05745] Signal inference workers to resume experience collection... (500 times)
[2025-01-07 12:41:10,966][05758] InferenceWorker_p0-w0: resuming experience collection (500 times)
[2025-01-07 12:41:11,611][01669] Fps is (10 sec: 1228.2, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2052096. Throughput: 0: 228.0. Samples: 514532. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:41:11,618][01669] Avg episode reward: [(0, '20.615')]
[2025-01-07 12:41:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2052096. Throughput: 0: 214.3. Samples: 515626. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:41:16,608][01669] Avg episode reward: [(0, '19.762')]
[2025-01-07 12:41:21,606][01669] Fps is (10 sec: 409.8, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2056192. Throughput: 0: 214.5. Samples: 516260. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 12:41:21,616][01669] Avg episode reward: [(0, '20.026')]
[2025-01-07 12:41:26,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2064384. Throughput: 0: 223.5. Samples: 517694. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:41:26,614][01669] Avg episode reward: [(0, '19.902')]
[2025-01-07 12:41:31,607][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2068480. Throughput: 0: 215.9. Samples: 518984. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:41:31,614][01669] Avg episode reward: [(0, '20.665')]
[2025-01-07 12:41:35,791][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000506_2072576.pth...
[2025-01-07 12:41:35,922][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000454_1859584.pth
[2025-01-07 12:41:36,608][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2072576. Throughput: 0: 219.5. Samples: 519720. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:41:36,611][01669] Avg episode reward: [(0, '20.785')]
[2025-01-07 12:41:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2076672. Throughput: 0: 213.7. Samples: 520968. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:41:41,608][01669] Avg episode reward: [(0, '20.774')]
[2025-01-07 12:41:46,607][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2080768. Throughput: 0: 232.3. Samples: 522802. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:41:46,616][01669] Avg episode reward: [(0, '21.044')]
[2025-01-07 12:41:51,609][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2084864. Throughput: 0: 221.0. Samples: 523016. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:41:51,612][01669] Avg episode reward: [(0, '21.044')]
[2025-01-07 12:41:54,594][05758] Updated weights for policy 0, policy_version 510 (0.0527)
[2025-01-07 12:41:56,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2088960. Throughput: 0: 215.1. Samples: 524210. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:41:56,614][01669] Avg episode reward: [(0, '20.989')]
[2025-01-07 12:42:01,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2093056. Throughput: 0: 227.7. Samples: 525874. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:42:01,614][01669] Avg episode reward: [(0, '20.747')]
[2025-01-07 12:42:06,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2097152. Throughput: 0: 227.9. Samples: 526516. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:42:06,613][01669] Avg episode reward: [(0, '21.211')]
[2025-01-07 12:42:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 874.7). Total num frames: 2101248. Throughput: 0: 216.2. Samples: 527424. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:42:11,614][01669] Avg episode reward: [(0, '21.445')]
[2025-01-07 12:42:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2105344. Throughput: 0: 221.4. Samples: 528948. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:42:16,618][01669] Avg episode reward: [(0, '22.435')]
[2025-01-07 12:42:21,095][05745] Saving new best policy, reward=22.435!
[2025-01-07 12:42:21,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2113536. Throughput: 0: 227.3. Samples: 529946. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:42:21,615][01669] Avg episode reward: [(0, '20.968')]
[2025-01-07 12:42:26,607][01669] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2113536. Throughput: 0: 223.3. Samples: 531018. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:42:26,609][01669] Avg episode reward: [(0, '21.238')]
[2025-01-07 12:42:31,607][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2117632. Throughput: 0: 205.2. Samples: 532036. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:42:31,617][01669] Avg episode reward: [(0, '20.846')]
[2025-01-07 12:42:36,606][01669] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2125824. Throughput: 0: 223.4. Samples: 533070. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:42:36,614][01669] Avg episode reward: [(0, '20.859')]
[2025-01-07 12:42:39,341][05758] Updated weights for policy 0, policy_version 520 (0.1896)
[2025-01-07 12:42:41,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2129920. Throughput: 0: 228.0. Samples: 534470. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:42:41,611][01669] Avg episode reward: [(0, '20.181')]
[2025-01-07 12:42:46,611][01669] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2134016. Throughput: 0: 212.1. Samples: 535420. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:42:46,616][01669] Avg episode reward: [(0, '19.869')]
[2025-01-07 12:42:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2138112. Throughput: 0: 214.2. Samples: 536156. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:42:51,609][01669] Avg episode reward: [(0, '19.765')]
[2025-01-07 12:42:56,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2142208. Throughput: 0: 231.5. Samples: 537840. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:42:56,616][01669] Avg episode reward: [(0, '20.308')]
[2025-01-07 12:43:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2146304. Throughput: 0: 223.1. Samples: 538986. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:43:01,608][01669] Avg episode reward: [(0, '20.735')]
[2025-01-07 12:43:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2150400. Throughput: 0: 207.2. Samples: 539270. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:43:06,608][01669] Avg episode reward: [(0, '21.102')]
[2025-01-07 12:43:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 2154496. Throughput: 0: 223.7. Samples: 541084. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:43:11,616][01669] Avg episode reward: [(0, '21.079')]
[2025-01-07 12:43:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2158592. Throughput: 0: 227.5. Samples: 542274. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:43:16,609][01669] Avg episode reward: [(0, '20.985')]
[2025-01-07 12:43:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2162688. Throughput: 0: 217.6. Samples: 542860. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:43:21,614][01669] Avg episode reward: [(0, '20.658')]
[2025-01-07 12:43:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2166784. Throughput: 0: 218.6. Samples: 544306. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:43:26,613][01669] Avg episode reward: [(0, '20.287')]
[2025-01-07 12:43:27,081][05758] Updated weights for policy 0, policy_version 530 (0.0491)
[2025-01-07 12:43:31,607][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2174976. Throughput: 0: 223.7. Samples: 545484. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:43:31,609][01669] Avg episode reward: [(0, '20.439')]
[2025-01-07 12:43:36,310][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000532_2179072.pth...
[2025-01-07 12:43:36,457][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000480_1966080.pth
[2025-01-07 12:43:36,609][01669] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2179072. Throughput: 0: 225.4. Samples: 546298. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:43:36,615][01669] Avg episode reward: [(0, '20.749')]
[2025-01-07 12:43:41,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2179072. Throughput: 0: 210.4. Samples: 547308. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:43:41,609][01669] Avg episode reward: [(0, '20.447')]
[2025-01-07 12:43:46,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2187264.
Throughput: 0: 211.9. Samples: 548522. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:43:46,611][01669] Avg episode reward: [(0, '20.447')] [2025-01-07 12:43:51,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2191360. Throughput: 0: 224.6. Samples: 549378. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:43:51,613][01669] Avg episode reward: [(0, '20.082')] [2025-01-07 12:43:56,611][01669] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2195456. Throughput: 0: 208.9. Samples: 550484. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:43:56,613][01669] Avg episode reward: [(0, '20.166')] [2025-01-07 12:44:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2199552. Throughput: 0: 217.2. Samples: 552046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:44:01,615][01669] Avg episode reward: [(0, '19.060')] [2025-01-07 12:44:06,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2203648. Throughput: 0: 219.9. Samples: 552756. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:44:06,617][01669] Avg episode reward: [(0, '19.159')] [2025-01-07 12:44:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2207744. Throughput: 0: 222.1. Samples: 554300. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:44:11,609][01669] Avg episode reward: [(0, '19.615')] [2025-01-07 12:44:13,139][05758] Updated weights for policy 0, policy_version 540 (0.0941) [2025-01-07 12:44:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2211840. Throughput: 0: 219.3. Samples: 555354. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:44:16,609][01669] Avg episode reward: [(0, '19.576')] [2025-01-07 12:44:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2215936. Throughput: 0: 219.0. Samples: 556154. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:44:21,611][01669] Avg episode reward: [(0, '19.566')] [2025-01-07 12:44:26,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2224128. Throughput: 0: 226.2. Samples: 557488. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:44:26,611][01669] Avg episode reward: [(0, '21.044')] [2025-01-07 12:44:31,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2228224. Throughput: 0: 224.4. Samples: 558622. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:44:31,614][01669] Avg episode reward: [(0, '20.919')] [2025-01-07 12:44:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2232320. Throughput: 0: 223.1. Samples: 559416. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:44:36,611][01669] Avg episode reward: [(0, '20.735')] [2025-01-07 12:44:41,608][01669] Fps is (10 sec: 819.1, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2236416. Throughput: 0: 231.2. Samples: 560888. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:44:41,611][01669] Avg episode reward: [(0, '21.260')] [2025-01-07 12:44:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2240512. Throughput: 0: 228.5. Samples: 562330. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:44:46,615][01669] Avg episode reward: [(0, '21.381')] [2025-01-07 12:44:51,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2244608. Throughput: 0: 220.9. Samples: 562698. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:44:51,609][01669] Avg episode reward: [(0, '21.228')] [2025-01-07 12:44:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2248704. Throughput: 0: 222.4. Samples: 564306. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:44:56,609][01669] Avg episode reward: [(0, '21.667')] [2025-01-07 12:44:57,836][05758] Updated weights for policy 0, policy_version 550 (0.0891) [2025-01-07 12:45:00,302][05745] Signal inference workers to stop experience collection... (550 times) [2025-01-07 12:45:00,376][05758] InferenceWorker_p0-w0: stopping experience collection (550 times) [2025-01-07 12:45:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2252800. Throughput: 0: 232.1. Samples: 565800. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:45:01,608][01669] Avg episode reward: [(0, '21.488')] [2025-01-07 12:45:01,825][05745] Signal inference workers to resume experience collection... (550 times) [2025-01-07 12:45:01,826][05758] InferenceWorker_p0-w0: resuming experience collection (550 times) [2025-01-07 12:45:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2256896. Throughput: 0: 228.5. Samples: 566438. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:45:06,609][01669] Avg episode reward: [(0, '21.875')] [2025-01-07 12:45:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2260992. Throughput: 0: 223.4. Samples: 567542. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:45:11,612][01669] Avg episode reward: [(0, '22.299')] [2025-01-07 12:45:16,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2269184. Throughput: 0: 229.7. Samples: 568960. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:45:16,613][01669] Avg episode reward: [(0, '22.051')] [2025-01-07 12:45:21,607][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2273280. Throughput: 0: 231.4. Samples: 569830. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:45:21,609][01669] Avg episode reward: [(0, '22.898')] [2025-01-07 12:45:25,438][05745] Saving new best policy, reward=22.898! [2025-01-07 12:45:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2277376. Throughput: 0: 221.4. Samples: 570850. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:45:26,609][01669] Avg episode reward: [(0, '22.846')] [2025-01-07 12:45:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2281472. Throughput: 0: 225.6. Samples: 572482. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:45:31,611][01669] Avg episode reward: [(0, '23.599')] [2025-01-07 12:45:33,743][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000558_2285568.pth... [2025-01-07 12:45:33,862][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000506_2072576.pth [2025-01-07 12:45:33,883][05745] Saving new best policy, reward=23.599! [2025-01-07 12:45:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2285568. 
Throughput: 0: 232.9. Samples: 573178. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:45:36,618][01669] Avg episode reward: [(0, '23.424')] [2025-01-07 12:45:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2289664. Throughput: 0: 227.2. Samples: 574528. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:45:41,613][01669] Avg episode reward: [(0, '23.820')] [2025-01-07 12:45:43,696][05745] Saving new best policy, reward=23.820! [2025-01-07 12:45:43,710][05758] Updated weights for policy 0, policy_version 560 (0.1050) [2025-01-07 12:45:46,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2293760. Throughput: 0: 223.7. Samples: 575866. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:45:46,609][01669] Avg episode reward: [(0, '24.761')] [2025-01-07 12:45:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2297856. Throughput: 0: 221.4. Samples: 576400. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:45:51,609][01669] Avg episode reward: [(0, '24.360')] [2025-01-07 12:45:51,865][05745] Saving new best policy, reward=24.761! [2025-01-07 12:45:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2306048. Throughput: 0: 232.0. Samples: 577984. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:45:56,613][01669] Avg episode reward: [(0, '24.432')] [2025-01-07 12:46:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2306048. Throughput: 0: 224.0. Samples: 579038. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:46:01,608][01669] Avg episode reward: [(0, '24.974')] [2025-01-07 12:46:06,076][05745] Saving new best policy, reward=24.974! [2025-01-07 12:46:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2314240. Throughput: 0: 223.8. Samples: 579900. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:46:06,615][01669] Avg episode reward: [(0, '25.151')] [2025-01-07 12:46:09,877][05745] Saving new best policy, reward=25.151! [2025-01-07 12:46:11,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2318336. Throughput: 0: 231.8. Samples: 581280. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:46:11,611][01669] Avg episode reward: [(0, '25.461')] [2025-01-07 12:46:14,010][05745] Saving new best policy, reward=25.461! [2025-01-07 12:46:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2322432. Throughput: 0: 225.5. Samples: 582628. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:46:16,618][01669] Avg episode reward: [(0, '25.163')] [2025-01-07 12:46:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2326528. Throughput: 0: 219.1. Samples: 583038. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:46:21,609][01669] Avg episode reward: [(0, '25.299')] [2025-01-07 12:46:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2330624. Throughput: 0: 225.9. Samples: 584694. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:46:26,616][01669] Avg episode reward: [(0, '24.354')] [2025-01-07 12:46:27,903][05758] Updated weights for policy 0, policy_version 570 (0.0490) [2025-01-07 12:46:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2334720. Throughput: 0: 229.2. Samples: 586178. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:46:31,613][01669] Avg episode reward: [(0, '24.385')] [2025-01-07 12:46:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2338816. Throughput: 0: 228.5. Samples: 586682. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:46:36,609][01669] Avg episode reward: [(0, '24.375')] [2025-01-07 12:46:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2342912. Throughput: 0: 226.1. Samples: 588158. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:46:41,615][01669] Avg episode reward: [(0, '24.513')] [2025-01-07 12:46:46,607][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2351104. Throughput: 0: 232.9. Samples: 589518. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:46:46,609][01669] Avg episode reward: [(0, '24.760')] [2025-01-07 12:46:51,607][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2355200. Throughput: 0: 229.2. Samples: 590214. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:46:51,610][01669] Avg episode reward: [(0, '24.557')] [2025-01-07 12:46:56,613][01669] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2359296. Throughput: 0: 220.7. Samples: 591214. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:46:56,618][01669] Avg episode reward: [(0, '23.690')] [2025-01-07 12:47:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2363392. Throughput: 0: 229.9. Samples: 592974. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:47:01,609][01669] Avg episode reward: [(0, '23.749')] [2025-01-07 12:47:06,607][01669] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2367488. Throughput: 0: 232.5. Samples: 593502. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:47:06,609][01669] Avg episode reward: [(0, '24.045')] [2025-01-07 12:47:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2371584. Throughput: 0: 225.6. Samples: 594846. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:47:11,612][01669] Avg episode reward: [(0, '23.820')] [2025-01-07 12:47:13,852][05758] Updated weights for policy 0, policy_version 580 (0.1023) [2025-01-07 12:47:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2375680. Throughput: 0: 225.7. Samples: 596334. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:47:16,609][01669] Avg episode reward: [(0, '23.868')] [2025-01-07 12:47:21,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2383872. Throughput: 0: 232.4. Samples: 597142. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:47:21,608][01669] Avg episode reward: [(0, '23.125')] [2025-01-07 12:47:26,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2387968. Throughput: 0: 226.4. Samples: 598344. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:47:26,621][01669] Avg episode reward: [(0, '23.225')] [2025-01-07 12:47:31,606][01669] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2387968. Throughput: 0: 218.3. Samples: 599342. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:47:31,615][01669] Avg episode reward: [(0, '23.131')] [2025-01-07 12:47:35,500][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000585_2396160.pth... [2025-01-07 12:47:35,609][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000532_2179072.pth [2025-01-07 12:47:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2396160. Throughput: 0: 226.1. Samples: 600390. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:47:36,608][01669] Avg episode reward: [(0, '23.018')] [2025-01-07 12:47:41,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2400256. Throughput: 0: 233.4. Samples: 601716. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:47:41,609][01669] Avg episode reward: [(0, '23.274')] [2025-01-07 12:47:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2404352. Throughput: 0: 223.5. Samples: 603032. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:47:46,616][01669] Avg episode reward: [(0, '22.601')] [2025-01-07 12:47:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2408448. Throughput: 0: 222.7. Samples: 603524. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:47:51,609][01669] Avg episode reward: [(0, '22.601')] [2025-01-07 12:47:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 2412544. Throughput: 0: 229.4. Samples: 605170. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:47:56,615][01669] Avg episode reward: [(0, '22.487')] [2025-01-07 12:47:57,496][05758] Updated weights for policy 0, policy_version 590 (0.1300) [2025-01-07 12:48:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2416640. Throughput: 0: 228.6. Samples: 606620. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2025-01-07 12:48:01,609][01669] Avg episode reward: [(0, '22.692')] [2025-01-07 12:48:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2420736. Throughput: 0: 222.3. Samples: 607144. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:48:06,611][01669] Avg episode reward: [(0, '22.647')] [2025-01-07 12:48:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2424832. Throughput: 0: 226.6. Samples: 608542. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:48:11,617][01669] Avg episode reward: [(0, '22.350')] [2025-01-07 12:48:16,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2433024. Throughput: 0: 236.1. Samples: 609968. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:48:16,615][01669] Avg episode reward: [(0, '22.398')] [2025-01-07 12:48:21,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2437120. Throughput: 0: 230.0. Samples: 610742. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:48:21,608][01669] Avg episode reward: [(0, '22.157')] [2025-01-07 12:48:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2441216. Throughput: 0: 222.0. Samples: 611708. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:48:26,609][01669] Avg episode reward: [(0, '22.515')] [2025-01-07 12:48:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2445312. 
Throughput: 0: 224.3. Samples: 613124. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:48:31,609][01669] Avg episode reward: [(0, '22.868')] [2025-01-07 12:48:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2449408. Throughput: 0: 227.6. Samples: 613768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:48:36,615][01669] Avg episode reward: [(0, '22.601')] [2025-01-07 12:48:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2453504. Throughput: 0: 213.6. Samples: 614784. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:48:41,612][01669] Avg episode reward: [(0, '22.398')] [2025-01-07 12:48:45,217][05758] Updated weights for policy 0, policy_version 600 (0.1083) [2025-01-07 12:48:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2457600. Throughput: 0: 217.7. Samples: 616416. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:48:46,608][01669] Avg episode reward: [(0, '23.156')] [2025-01-07 12:48:47,544][05745] Signal inference workers to stop experience collection... (600 times) [2025-01-07 12:48:47,589][05758] InferenceWorker_p0-w0: stopping experience collection (600 times) [2025-01-07 12:48:49,202][05745] Signal inference workers to resume experience collection... (600 times) [2025-01-07 12:48:49,205][05758] InferenceWorker_p0-w0: resuming experience collection (600 times) [2025-01-07 12:48:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2461696. Throughput: 0: 218.3. Samples: 616966. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:48:51,612][01669] Avg episode reward: [(0, '23.550')] [2025-01-07 12:48:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2465792. Throughput: 0: 218.7. Samples: 618382. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:48:56,614][01669] Avg episode reward: [(0, '23.630')] [2025-01-07 12:49:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2469888. Throughput: 0: 214.8. Samples: 619634. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:49:01,614][01669] Avg episode reward: [(0, '23.354')] [2025-01-07 12:49:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2473984. Throughput: 0: 211.6. Samples: 620264. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:49:06,614][01669] Avg episode reward: [(0, '23.300')] [2025-01-07 12:49:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2478080. Throughput: 0: 226.4. Samples: 621894. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:49:11,612][01669] Avg episode reward: [(0, '22.847')] [2025-01-07 12:49:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 2482176. Throughput: 0: 217.1. Samples: 622892. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:49:16,613][01669] Avg episode reward: [(0, '23.012')] [2025-01-07 12:49:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2486272. Throughput: 0: 214.8. Samples: 623434. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:49:21,616][01669] Avg episode reward: [(0, '23.486')] [2025-01-07 12:49:26,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2494464. Throughput: 0: 231.3. Samples: 625194. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:49:26,617][01669] Avg episode reward: [(0, '23.188')] [2025-01-07 12:49:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2494464. Throughput: 0: 218.8. Samples: 626260. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:49:31,614][01669] Avg episode reward: [(0, '23.060')] [2025-01-07 12:49:33,098][05758] Updated weights for policy 0, policy_version 610 (0.1930) [2025-01-07 12:49:36,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2498560. Throughput: 0: 214.5. Samples: 626618. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:49:36,618][01669] Avg episode reward: [(0, '23.060')] [2025-01-07 12:49:37,393][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000611_2502656.pth... [2025-01-07 12:49:37,502][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000558_2285568.pth [2025-01-07 12:49:41,607][01669] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2506752. Throughput: 0: 220.5. Samples: 628306. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:49:41,610][01669] Avg episode reward: [(0, '22.549')] [2025-01-07 12:49:46,610][01669] Fps is (10 sec: 1228.4, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2510848. Throughput: 0: 219.9. Samples: 629530. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:49:46,612][01669] Avg episode reward: [(0, '23.328')] [2025-01-07 12:49:51,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2514944. Throughput: 0: 221.6. Samples: 630234. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:49:51,614][01669] Avg episode reward: [(0, '23.917')] [2025-01-07 12:49:56,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2519040. Throughput: 0: 209.0. Samples: 631300. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:49:56,609][01669] Avg episode reward: [(0, '23.910')] [2025-01-07 12:50:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2523136. Throughput: 0: 209.5. Samples: 632318. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:50:01,609][01669] Avg episode reward: [(0, '23.976')] [2025-01-07 12:50:06,608][01669] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2527232. Throughput: 0: 221.1. Samples: 633382. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:50:06,616][01669] Avg episode reward: [(0, '23.916')] [2025-01-07 12:50:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2531328. Throughput: 0: 205.4. Samples: 634438. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:50:11,609][01669] Avg episode reward: [(0, '24.031')] [2025-01-07 12:50:16,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2535424. Throughput: 0: 217.5. Samples: 636048. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:50:16,611][01669] Avg episode reward: [(0, '24.079')] [2025-01-07 12:50:18,678][05758] Updated weights for policy 0, policy_version 620 (0.0499) [2025-01-07 12:50:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2539520. Throughput: 0: 226.2. Samples: 636796. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:50:21,610][01669] Avg episode reward: [(0, '23.643')] [2025-01-07 12:50:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2543616. Throughput: 0: 212.8. Samples: 637884. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:50:26,614][01669] Avg episode reward: [(0, '23.524')] [2025-01-07 12:50:31,608][01669] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2547712. Throughput: 0: 219.5. Samples: 639408. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:50:31,618][01669] Avg episode reward: [(0, '23.653')] [2025-01-07 12:50:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2551808. Throughput: 0: 216.9. Samples: 639994. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:50:36,611][01669] Avg episode reward: [(0, '23.435')] [2025-01-07 12:50:41,606][01669] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2555904. Throughput: 0: 227.0. Samples: 641514. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:50:41,609][01669] Avg episode reward: [(0, '22.997')] [2025-01-07 12:50:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2560000. Throughput: 0: 226.9. Samples: 642528. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:50:46,617][01669] Avg episode reward: [(0, '22.870')] [2025-01-07 12:50:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2564096. Throughput: 0: 220.1. Samples: 643284. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:50:51,609][01669] Avg episode reward: [(0, '23.211')] [2025-01-07 12:50:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2572288. Throughput: 0: 228.8. Samples: 644736. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:50:56,609][01669] Avg episode reward: [(0, '23.732')] [2025-01-07 12:51:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2572288. Throughput: 0: 214.9. Samples: 645720. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:51:01,614][01669] Avg episode reward: [(0, '23.516')] [2025-01-07 12:51:06,607][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2576384. Throughput: 0: 212.9. Samples: 646378. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 12:51:06,618][01669] Avg episode reward: [(0, '23.259')] [2025-01-07 12:51:06,632][05758] Updated weights for policy 0, policy_version 630 (0.0038) [2025-01-07 12:51:11,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2584576. Throughput: 0: 219.6. Samples: 647768. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:51:11,614][01669] Avg episode reward: [(0, '22.773')] [2025-01-07 12:51:16,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2588672. Throughput: 0: 216.6. Samples: 649156. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:51:16,617][01669] Avg episode reward: [(0, '22.673')] [2025-01-07 12:51:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2592768. Throughput: 0: 216.1. Samples: 649720. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:51:21,609][01669] Avg episode reward: [(0, '23.119')] [2025-01-07 12:51:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2596864. 
Throughput: 0: 209.6. Samples: 650946. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:51:26,619][01669] Avg episode reward: [(0, '23.389')] [2025-01-07 12:51:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2600960. Throughput: 0: 226.9. Samples: 652738. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:51:31,608][01669] Avg episode reward: [(0, '23.416')] [2025-01-07 12:51:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2605056. Throughput: 0: 214.8. Samples: 652948. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:51:36,618][01669] Avg episode reward: [(0, '23.407')] [2025-01-07 12:51:39,523][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000637_2609152.pth... [2025-01-07 12:51:39,636][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000585_2396160.pth [2025-01-07 12:51:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2609152. Throughput: 0: 209.9. Samples: 654182. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:51:41,608][01669] Avg episode reward: [(0, '22.378')] [2025-01-07 12:51:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2613248. Throughput: 0: 223.6. Samples: 655780. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:51:46,615][01669] Avg episode reward: [(0, '22.157')] [2025-01-07 12:51:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 2617344. Throughput: 0: 227.5. Samples: 656616. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:51:51,609][01669] Avg episode reward: [(0, '21.818')] [2025-01-07 12:51:53,445][05758] Updated weights for policy 0, policy_version 640 (0.0628) [2025-01-07 12:51:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2621440. Throughput: 0: 213.0. Samples: 657354. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:51:56,615][01669] Avg episode reward: [(0, '22.185')] [2025-01-07 12:52:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2625536. Throughput: 0: 215.5. Samples: 658852. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:52:01,615][01669] Avg episode reward: [(0, '21.583')] [2025-01-07 12:52:06,607][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2633728. Throughput: 0: 222.0. Samples: 659710. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:52:06,614][01669] Avg episode reward: [(0, '22.530')] [2025-01-07 12:52:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2633728. Throughput: 0: 220.5. Samples: 660868. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:52:11,613][01669] Avg episode reward: [(0, '22.530')] [2025-01-07 12:52:16,608][01669] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 2641920. Throughput: 0: 204.7. Samples: 661952. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:52:16,619][01669] Avg episode reward: [(0, '23.038')] [2025-01-07 12:52:21,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2646016. Throughput: 0: 221.6. Samples: 662920. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:52:21,610][01669] Avg episode reward: [(0, '22.813')] [2025-01-07 12:52:26,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). 
Total num frames: 2650112. Throughput: 0: 223.1. Samples: 664222. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:52:26,616][01669] Avg episode reward: [(0, '22.813')] [2025-01-07 12:52:31,610][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 2654208. Throughput: 0: 210.1. Samples: 665234. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:52:31,616][01669] Avg episode reward: [(0, '22.419')] [2025-01-07 12:52:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2658304. Throughput: 0: 209.2. Samples: 666030. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:52:36,614][01669] Avg episode reward: [(0, '22.438')] [2025-01-07 12:52:39,066][05758] Updated weights for policy 0, policy_version 650 (0.0491) [2025-01-07 12:52:41,360][05745] Signal inference workers to stop experience collection... (650 times) [2025-01-07 12:52:41,401][05758] InferenceWorker_p0-w0: stopping experience collection (650 times) [2025-01-07 12:52:41,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2662400. Throughput: 0: 229.6. Samples: 667688. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:52:41,609][01669] Avg episode reward: [(0, '23.276')] [2025-01-07 12:52:43,510][05745] Signal inference workers to resume experience collection... (650 times) [2025-01-07 12:52:43,517][05758] InferenceWorker_p0-w0: resuming experience collection (650 times) [2025-01-07 12:52:46,607][01669] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2666496. Throughput: 0: 219.1. Samples: 668712. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:52:46,610][01669] Avg episode reward: [(0, '23.400')] [2025-01-07 12:52:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2670592. Throughput: 0: 209.8. Samples: 669150. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:52:51,618][01669] Avg episode reward: [(0, '22.565')] [2025-01-07 12:52:56,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2674688. Throughput: 0: 226.5. Samples: 671060. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:52:56,618][01669] Avg episode reward: [(0, '22.518')] [2025-01-07 12:53:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2678784. Throughput: 0: 230.1. Samples: 672308. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2025-01-07 12:53:01,608][01669] Avg episode reward: [(0, '22.581')] [2025-01-07 12:53:06,614][01669] Fps is (10 sec: 818.6, 60 sec: 819.1, 300 sec: 874.7). Total num frames: 2682880. Throughput: 0: 217.1. Samples: 672690. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:53:06,617][01669] Avg episode reward: [(0, '22.529')] [2025-01-07 12:53:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 2686976. Throughput: 0: 222.2. Samples: 674220. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:53:11,618][01669] Avg episode reward: [(0, '23.329')] [2025-01-07 12:53:16,606][01669] Fps is (10 sec: 1229.7, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2695168. Throughput: 0: 230.0. Samples: 675582. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:53:16,608][01669] Avg episode reward: [(0, '23.438')] [2025-01-07 12:53:21,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2699264. Throughput: 0: 230.2. Samples: 676390. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:53:21,612][01669] Avg episode reward: [(0, '23.040')] [2025-01-07 12:53:26,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 2699264. Throughput: 0: 216.7. Samples: 677440. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:53:26,614][01669] Avg episode reward: [(0, '23.284')] [2025-01-07 12:53:26,657][05758] Updated weights for policy 0, policy_version 660 (0.1075) [2025-01-07 12:53:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2707456. Throughput: 0: 225.2. Samples: 678844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:53:31,609][01669] Avg episode reward: [(0, '23.507')] [2025-01-07 12:53:34,541][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000662_2711552.pth... [2025-01-07 12:53:34,658][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000611_2502656.pth [2025-01-07 12:53:36,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2711552. Throughput: 0: 229.9. Samples: 679496. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 12:53:36,609][01669] Avg episode reward: [(0, '23.794')] [2025-01-07 12:53:41,614][01669] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 2715648. Throughput: 0: 210.7. Samples: 680544. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:53:41,617][01669] Avg episode reward: [(0, '24.449')] [2025-01-07 12:53:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2719744. Throughput: 0: 219.5. Samples: 682184. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:53:46,608][01669] Avg episode reward: [(0, '24.201')] [2025-01-07 12:53:51,606][01669] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2723840. Throughput: 0: 225.1. Samples: 682818. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:53:51,618][01669] Avg episode reward: [(0, '24.518')] [2025-01-07 12:53:56,612][01669] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 2727936. Throughput: 0: 219.2. Samples: 684086. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:53:56,615][01669] Avg episode reward: [(0, '24.049')] [2025-01-07 12:54:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2732032. Throughput: 0: 219.6. Samples: 685466. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:54:01,609][01669] Avg episode reward: [(0, '24.799')] [2025-01-07 12:54:06,606][01669] Fps is (10 sec: 819.7, 60 sec: 887.6, 300 sec: 874.7). Total num frames: 2736128. Throughput: 0: 215.4. Samples: 686082. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:54:06,608][01669] Avg episode reward: [(0, '25.358')] [2025-01-07 12:54:11,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2740224. Throughput: 0: 229.6. Samples: 687770. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:54:11,609][01669] Avg episode reward: [(0, '25.201')] [2025-01-07 12:54:12,903][05758] Updated weights for policy 0, policy_version 670 (0.1160) [2025-01-07 12:54:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2744320. Throughput: 0: 217.5. Samples: 688632. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:54:16,613][01669] Avg episode reward: [(0, '24.875')] [2025-01-07 12:54:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 2748416. Throughput: 0: 219.5. Samples: 689374. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:54:21,614][01669] Avg episode reward: [(0, '24.599')] [2025-01-07 12:54:26,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2756608. Throughput: 0: 230.0. Samples: 690894. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:54:26,612][01669] Avg episode reward: [(0, '25.837')] [2025-01-07 12:54:31,480][05745] Saving new best policy, reward=25.837! [2025-01-07 12:54:31,610][01669] Fps is (10 sec: 1228.4, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2760704. Throughput: 0: 215.9. Samples: 691900. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:54:31,613][01669] Avg episode reward: [(0, '25.829')] [2025-01-07 12:54:36,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 2760704. Throughput: 0: 218.0. Samples: 692628. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 12:54:36,609][01669] Avg episode reward: [(0, '25.589')] [2025-01-07 12:54:41,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.6, 300 sec: 874.7). Total num frames: 2768896. Throughput: 0: 219.1. Samples: 693944. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 12:54:41,616][01669] Avg episode reward: [(0, '26.262')] [2025-01-07 12:54:44,994][05745] Saving new best policy, reward=26.262! [2025-01-07 12:54:46,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2772992. Throughput: 0: 222.5. Samples: 695480. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2025-01-07 12:54:46,612][01669] Avg episode reward: [(0, '26.319')] [2025-01-07 12:54:50,906][05745] Saving new best policy, reward=26.319! [2025-01-07 12:54:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2777088. Throughput: 0: 220.1. Samples: 695986. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2025-01-07 12:54:51,613][01669] Avg episode reward: [(0, '26.318')] [2025-01-07 12:54:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 874.7). Total num frames: 2781184. Throughput: 0: 205.3. Samples: 697008. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 12:54:56,611][01669] Avg episode reward: [(0, '26.368')] [2025-01-07 12:54:59,791][05745] Saving new best policy, reward=26.368! [2025-01-07 12:54:59,796][05758] Updated weights for policy 0, policy_version 680 (0.0505) [2025-01-07 12:55:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2785280. Throughput: 0: 225.5. Samples: 698778. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 12:55:01,621][01669] Avg episode reward: [(0, '25.974')] [2025-01-07 12:55:06,607][01669] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2789376. Throughput: 0: 217.2. Samples: 699148. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 12:55:06,613][01669] Avg episode reward: [(0, '27.121')] [2025-01-07 12:55:10,216][05745] Saving new best policy, reward=27.121! [2025-01-07 12:55:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2793472. Throughput: 0: 204.2. Samples: 700084. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 12:55:11,609][01669] Avg episode reward: [(0, '27.427')] [2025-01-07 12:55:14,690][05745] Saving new best policy, reward=27.427! [2025-01-07 12:55:16,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2797568. Throughput: 0: 222.1. Samples: 701894. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 12:55:16,612][01669] Avg episode reward: [(0, '26.781')] [2025-01-07 12:55:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2801664. Throughput: 0: 217.8. Samples: 702430. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 12:55:21,610][01669] Avg episode reward: [(0, '26.152')] [2025-01-07 12:55:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2805760. Throughput: 0: 214.7. Samples: 703606. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2025-01-07 12:55:26,616][01669] Avg episode reward: [(0, '26.216')] [2025-01-07 12:55:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2809856. Throughput: 0: 213.2. Samples: 705076. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2025-01-07 12:55:31,609][01669] Avg episode reward: [(0, '26.383')] [2025-01-07 12:55:33,378][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000687_2813952.pth... [2025-01-07 12:55:33,488][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000637_2609152.pth [2025-01-07 12:55:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2813952. Throughput: 0: 214.9. Samples: 705658. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2025-01-07 12:55:36,616][01669] Avg episode reward: [(0, '26.391')] [2025-01-07 12:55:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2818048. Throughput: 0: 225.6. Samples: 707162. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2025-01-07 12:55:41,609][01669] Avg episode reward: [(0, '26.466')] [2025-01-07 12:55:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2822144. Throughput: 0: 207.1. Samples: 708098. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:55:46,611][01669] Avg episode reward: [(0, '26.255')] [2025-01-07 12:55:48,038][05758] Updated weights for policy 0, policy_version 690 (0.0059) [2025-01-07 12:55:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 2826240. Throughput: 0: 215.3. Samples: 708836. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:55:51,609][01669] Avg episode reward: [(0, '26.022')] [2025-01-07 12:55:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2834432. Throughput: 0: 225.9. Samples: 710248. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:55:56,616][01669] Avg episode reward: [(0, '26.102')] [2025-01-07 12:56:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2834432. Throughput: 0: 208.5. Samples: 711278. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 12:56:01,609][01669] Avg episode reward: [(0, '25.784')] [2025-01-07 12:56:06,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 2838528. Throughput: 0: 210.9. Samples: 711922. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:56:06,615][01669] Avg episode reward: [(0, '26.043')]
[2025-01-07 12:56:11,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2846720. Throughput: 0: 216.4. Samples: 713346. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:56:11,609][01669] Avg episode reward: [(0, '24.743')]
[2025-01-07 12:56:16,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2850816. Throughput: 0: 214.4. Samples: 714726. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:56:16,609][01669] Avg episode reward: [(0, '25.030')]
[2025-01-07 12:56:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2854912. Throughput: 0: 215.7. Samples: 715366. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:56:21,615][01669] Avg episode reward: [(0, '25.178')]
[2025-01-07 12:56:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2859008. Throughput: 0: 207.2. Samples: 716486. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:56:26,615][01669] Avg episode reward: [(0, '25.631')]
[2025-01-07 12:56:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2863104. Throughput: 0: 226.0. Samples: 718270. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:56:31,609][01669] Avg episode reward: [(0, '25.526')]
[2025-01-07 12:56:34,118][05758] Updated weights for policy 0, policy_version 700 (0.1447)
[2025-01-07 12:56:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2867200. Throughput: 0: 216.0. Samples: 718556. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:56:36,610][01669] Avg episode reward: [(0, '25.165')]
[2025-01-07 12:56:38,473][05745] Signal inference workers to stop experience collection... (700 times)
[2025-01-07 12:56:38,539][05758] InferenceWorker_p0-w0: stopping experience collection (700 times)
[2025-01-07 12:56:39,986][05745] Signal inference workers to resume experience collection... (700 times)
[2025-01-07 12:56:39,987][05758] InferenceWorker_p0-w0: resuming experience collection (700 times)
[2025-01-07 12:56:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2871296. Throughput: 0: 205.7. Samples: 719506. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:56:41,609][01669] Avg episode reward: [(0, '24.847')]
[2025-01-07 12:56:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2875392. Throughput: 0: 227.5. Samples: 721516. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:56:46,616][01669] Avg episode reward: [(0, '24.468')]
[2025-01-07 12:56:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2879488. Throughput: 0: 224.3. Samples: 722016. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:56:51,611][01669] Avg episode reward: [(0, '24.139')]
[2025-01-07 12:56:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2883584. Throughput: 0: 216.0. Samples: 723068. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:56:56,618][01669] Avg episode reward: [(0, '24.074')]
[2025-01-07 12:57:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 2887680. Throughput: 0: 218.9. Samples: 724576. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:57:01,615][01669] Avg episode reward: [(0, '24.066')]
[2025-01-07 12:57:06,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2895872. Throughput: 0: 222.6. Samples: 725384. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:57:06,613][01669] Avg episode reward: [(0, '24.586')]
[2025-01-07 12:57:11,609][01669] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 2895872. Throughput: 0: 223.7. Samples: 726552. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:57:11,616][01669] Avg episode reward: [(0, '23.934')]
[2025-01-07 12:57:16,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 2899968. Throughput: 0: 206.2. Samples: 727548. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:57:16,609][01669] Avg episode reward: [(0, '23.771')]
[2025-01-07 12:57:20,529][05758] Updated weights for policy 0, policy_version 710 (0.1392)
[2025-01-07 12:57:21,606][01669] Fps is (10 sec: 1229.1, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2908160. Throughput: 0: 223.0. Samples: 728590. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:57:21,612][01669] Avg episode reward: [(0, '24.112')]
[2025-01-07 12:57:26,615][01669] Fps is (10 sec: 1227.7, 60 sec: 887.3, 300 sec: 874.7). Total num frames: 2912256. Throughput: 0: 233.2. Samples: 730000. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:57:26,618][01669] Avg episode reward: [(0, '24.328')]
[2025-01-07 12:57:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2916352. Throughput: 0: 209.3. Samples: 730934. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:57:31,610][01669] Avg episode reward: [(0, '24.445')]
[2025-01-07 12:57:34,919][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000713_2920448.pth...
[2025-01-07 12:57:35,050][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000662_2711552.pth
[2025-01-07 12:57:36,606][01669] Fps is (10 sec: 819.9, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2920448. Throughput: 0: 214.4. Samples: 731662. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:57:36,619][01669] Avg episode reward: [(0, '24.831')]
[2025-01-07 12:57:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2924544. Throughput: 0: 226.9. Samples: 733280. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:57:41,614][01669] Avg episode reward: [(0, '24.551')]
[2025-01-07 12:57:46,607][01669] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2928640. Throughput: 0: 220.0. Samples: 734478. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:57:46,612][01669] Avg episode reward: [(0, '24.366')]
[2025-01-07 12:57:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2932736. Throughput: 0: 207.7. Samples: 734730. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:57:51,609][01669] Avg episode reward: [(0, '24.954')]
[2025-01-07 12:57:56,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2936832. Throughput: 0: 222.5. Samples: 736566. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:57:56,618][01669] Avg episode reward: [(0, '25.035')]
[2025-01-07 12:58:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 2940928. Throughput: 0: 227.3. Samples: 737776. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:58:01,609][01669] Avg episode reward: [(0, '24.860')]
[2025-01-07 12:58:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 2945024. Throughput: 0: 216.2. Samples: 738318. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:58:06,613][01669] Avg episode reward: [(0, '25.178')]
[2025-01-07 12:58:08,196][05758] Updated weights for policy 0, policy_version 720 (0.0502)
[2025-01-07 12:58:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 2949120. Throughput: 0: 219.4. Samples: 739870. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:58:11,615][01669] Avg episode reward: [(0, '25.322')]
[2025-01-07 12:58:16,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 874.7). Total num frames: 2957312. Throughput: 0: 225.0. Samples: 741058. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 12:58:16,616][01669] Avg episode reward: [(0, '25.574')]
[2025-01-07 12:58:21,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2961408. Throughput: 0: 229.1. Samples: 741970. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:58:21,609][01669] Avg episode reward: [(0, '25.869')]
[2025-01-07 12:58:26,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.3, 300 sec: 860.9). Total num frames: 2961408. Throughput: 0: 213.9. Samples: 742906. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:58:26,609][01669] Avg episode reward: [(0, '26.432')]
[2025-01-07 12:58:31,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2969600. Throughput: 0: 219.0. Samples: 744334. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:58:31,615][01669] Avg episode reward: [(0, '25.876')]
[2025-01-07 12:58:36,611][01669] Fps is (10 sec: 1228.3, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 2973696. Throughput: 0: 228.3. Samples: 745006. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:58:36,618][01669] Avg episode reward: [(0, '25.872')]
[2025-01-07 12:58:41,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2977792. Throughput: 0: 212.0. Samples: 746104. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:58:41,612][01669] Avg episode reward: [(0, '26.377')]
[2025-01-07 12:58:46,606][01669] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2981888. Throughput: 0: 219.1. Samples: 747636. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:58:46,615][01669] Avg episode reward: [(0, '26.657')]
[2025-01-07 12:58:51,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 2985984. Throughput: 0: 220.6. Samples: 748244. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:58:51,614][01669] Avg episode reward: [(0, '26.831')]
[2025-01-07 12:58:53,109][05758] Updated weights for policy 0, policy_version 730 (0.1027)
[2025-01-07 12:58:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2990080. Throughput: 0: 218.6. Samples: 749706. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:58:56,609][01669] Avg episode reward: [(0, '27.148')]
[2025-01-07 12:59:01,608][01669] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 2994176. Throughput: 0: 216.3. Samples: 750790. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:59:01,618][01669] Avg episode reward: [(0, '27.083')]
[2025-01-07 12:59:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2998272. Throughput: 0: 210.5. Samples: 751442. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:59:06,617][01669] Avg episode reward: [(0, '27.272')]
[2025-01-07 12:59:11,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3002368. Throughput: 0: 228.4. Samples: 753186. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:59:11,613][01669] Avg episode reward: [(0, '27.466')]
[2025-01-07 12:59:16,610][01669] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3006464. Throughput: 0: 221.6. Samples: 754308. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:59:16,619][01669] Avg episode reward: [(0, '27.080')]
[2025-01-07 12:59:18,357][05745] Saving new best policy, reward=27.466!
[2025-01-07 12:59:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3010560. Throughput: 0: 216.2. Samples: 754734. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:59:21,609][01669] Avg episode reward: [(0, '27.091')]
[2025-01-07 12:59:26,607][01669] Fps is (10 sec: 1229.2, 60 sec: 955.7, 300 sec: 874.7). Total num frames: 3018752. Throughput: 0: 227.6. Samples: 756344. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:59:26,618][01669] Avg episode reward: [(0, '27.345')]
[2025-01-07 12:59:31,608][01669] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 3022848. Throughput: 0: 218.6. Samples: 757474. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:59:31,614][01669] Avg episode reward: [(0, '27.279')]
[2025-01-07 12:59:36,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.3, 300 sec: 860.9). Total num frames: 3022848. Throughput: 0: 220.8. Samples: 758182. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:59:36,616][01669] Avg episode reward: [(0, '27.612')]
[2025-01-07 12:59:36,811][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000739_3026944.pth...
[2025-01-07 12:59:36,921][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000687_2813952.pth
[2025-01-07 12:59:40,837][05745] Saving new best policy, reward=27.612!
[2025-01-07 12:59:40,843][05758] Updated weights for policy 0, policy_version 740 (0.0956)
[2025-01-07 12:59:41,610][01669] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3031040. Throughput: 0: 215.2. Samples: 759390. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 12:59:41,615][01669] Avg episode reward: [(0, '27.367')]
[2025-01-07 12:59:46,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3035136. Throughput: 0: 230.0. Samples: 761140. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:59:46,611][01669] Avg episode reward: [(0, '28.009')]
[2025-01-07 12:59:49,823][05745] Saving new best policy, reward=28.009!
[2025-01-07 12:59:51,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3039232. Throughput: 0: 220.8. Samples: 761380. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:59:51,610][01669] Avg episode reward: [(0, '28.153')]
[2025-01-07 12:59:55,213][05745] Saving new best policy, reward=28.153!
[2025-01-07 12:59:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3043328. Throughput: 0: 205.9. Samples: 762452. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 12:59:56,615][01669] Avg episode reward: [(0, '28.178')]
[2025-01-07 12:59:59,214][05745] Saving new best policy, reward=28.178!
[2025-01-07 13:00:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3047424. Throughput: 0: 223.2. Samples: 764352. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:00:01,617][01669] Avg episode reward: [(0, '28.277')]
[2025-01-07 13:00:06,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3051520. Throughput: 0: 223.4. Samples: 764786. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:00:06,620][01669] Avg episode reward: [(0, '27.542')]
[2025-01-07 13:00:09,205][05745] Saving new best policy, reward=28.277!
[2025-01-07 13:00:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3055616. Throughput: 0: 208.3. Samples: 765716. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:00:11,613][01669] Avg episode reward: [(0, '27.624')]
[2025-01-07 13:00:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3059712. Throughput: 0: 218.7. Samples: 767314. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:00:16,611][01669] Avg episode reward: [(0, '28.323')]
[2025-01-07 13:00:18,175][05745] Saving new best policy, reward=28.323!
[2025-01-07 13:00:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3063808. Throughput: 0: 219.8. Samples: 768072. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:00:21,617][01669] Avg episode reward: [(0, '28.487')]
[2025-01-07 13:00:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3067904. Throughput: 0: 218.9. Samples: 769240. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:00:26,611][01669] Avg episode reward: [(0, '28.092')]
[2025-01-07 13:00:28,638][05745] Saving new best policy, reward=28.487!
[2025-01-07 13:00:28,636][05758] Updated weights for policy 0, policy_version 750 (0.0511)
[2025-01-07 13:00:31,282][05745] Signal inference workers to stop experience collection... (750 times)
[2025-01-07 13:00:31,326][05758] InferenceWorker_p0-w0: stopping experience collection (750 times)
[2025-01-07 13:00:31,607][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3072000. Throughput: 0: 209.3. Samples: 770558. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:00:31,610][01669] Avg episode reward: [(0, '27.632')]
[2025-01-07 13:00:32,969][05745] Signal inference workers to resume experience collection... (750 times)
[2025-01-07 13:00:32,971][05758] InferenceWorker_p0-w0: resuming experience collection (750 times)
[2025-01-07 13:00:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3076096. Throughput: 0: 214.8. Samples: 771044. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:00:36,609][01669] Avg episode reward: [(0, '27.792')]
[2025-01-07 13:00:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3080192. Throughput: 0: 226.4. Samples: 772638. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:00:41,608][01669] Avg episode reward: [(0, '27.426')]
[2025-01-07 13:00:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3084288. Throughput: 0: 207.2. Samples: 773676. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:00:46,610][01669] Avg episode reward: [(0, '27.326')]
[2025-01-07 13:00:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3088384. Throughput: 0: 213.9. Samples: 774412. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:00:51,609][01669] Avg episode reward: [(0, '27.360')]
[2025-01-07 13:00:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 3096576. Throughput: 0: 221.6. Samples: 775690. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:00:56,609][01669] Avg episode reward: [(0, '26.763')]
[2025-01-07 13:01:01,607][01669] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 3100672. Throughput: 0: 210.8. Samples: 776802. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:01:01,612][01669] Avg episode reward: [(0, '26.530')]
[2025-01-07 13:01:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3104768. Throughput: 0: 209.6. Samples: 777502. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:01:06,618][01669] Avg episode reward: [(0, '26.177')]
[2025-01-07 13:01:11,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3108864. Throughput: 0: 210.6. Samples: 778718. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:01:11,613][01669] Avg episode reward: [(0, '26.028')]
[2025-01-07 13:01:14,407][05758] Updated weights for policy 0, policy_version 760 (0.0483)
[2025-01-07 13:01:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3112960. Throughput: 0: 217.4. Samples: 780342. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:01:16,608][01669] Avg episode reward: [(0, '25.979')]
[2025-01-07 13:01:21,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3117056. Throughput: 0: 215.1. Samples: 780722. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:01:21,610][01669] Avg episode reward: [(0, '26.064')]
[2025-01-07 13:01:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3121152. Throughput: 0: 207.8. Samples: 781988. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:01:26,608][01669] Avg episode reward: [(0, '26.912')]
[2025-01-07 13:01:31,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3125248. Throughput: 0: 226.2. Samples: 783856. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:01:31,609][01669] Avg episode reward: [(0, '26.757')]
[2025-01-07 13:01:36,609][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3129344. Throughput: 0: 216.7. Samples: 784164. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:01:36,613][01669] Avg episode reward: [(0, '26.811')]
[2025-01-07 13:01:38,660][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000765_3133440.pth...
[2025-01-07 13:01:38,821][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000713_2920448.pth
[2025-01-07 13:01:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3133440. Throughput: 0: 215.3. Samples: 785380. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:01:41,608][01669] Avg episode reward: [(0, '26.601')]
[2025-01-07 13:01:46,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3137536. Throughput: 0: 226.0. Samples: 786972. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:01:46,615][01669] Avg episode reward: [(0, '26.146')]
[2025-01-07 13:01:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3141632. Throughput: 0: 230.8. Samples: 787886. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:01:51,614][01669] Avg episode reward: [(0, '25.594')]
[2025-01-07 13:01:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3145728. Throughput: 0: 226.8. Samples: 788922. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:01:56,609][01669] Avg episode reward: [(0, '25.922')]
[2025-01-07 13:02:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3149824. Throughput: 0: 215.6. Samples: 790046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:02:01,616][01669] Avg episode reward: [(0, '25.532')]
[2025-01-07 13:02:02,010][05758] Updated weights for policy 0, policy_version 770 (0.0540)
[2025-01-07 13:02:06,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 3158016. Throughput: 0: 230.4. Samples: 791090. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:02:06,615][01669] Avg episode reward: [(0, '25.854')]
[2025-01-07 13:02:11,610][01669] Fps is (10 sec: 1228.4, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 3162112. Throughput: 0: 225.4. Samples: 792130. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:02:11,617][01669] Avg episode reward: [(0, '25.574')]
[2025-01-07 13:02:16,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3162112. Throughput: 0: 206.6. Samples: 793152. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:02:16,617][01669] Avg episode reward: [(0, '25.917')]
[2025-01-07 13:02:21,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 3170304. Throughput: 0: 222.5. Samples: 794174. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:02:21,609][01669] Avg episode reward: [(0, '26.135')]
[2025-01-07 13:02:26,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3174400. Throughput: 0: 225.9. Samples: 795544. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:02:26,611][01669] Avg episode reward: [(0, '25.959')]
[2025-01-07 13:02:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3178496. Throughput: 0: 213.5. Samples: 796580. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:02:31,613][01669] Avg episode reward: [(0, '25.603')]
[2025-01-07 13:02:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3182592. Throughput: 0: 207.0. Samples: 797200. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:02:36,609][01669] Avg episode reward: [(0, '25.106')]
[2025-01-07 13:02:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3186688. Throughput: 0: 217.2. Samples: 798698. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:02:41,608][01669] Avg episode reward: [(0, '26.050')]
[2025-01-07 13:02:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3190784. Throughput: 0: 223.8. Samples: 800118. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:02:46,613][01669] Avg episode reward: [(0, '26.172')]
[2025-01-07 13:02:49,465][05758] Updated weights for policy 0, policy_version 780 (0.0933)
[2025-01-07 13:02:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3194880. Throughput: 0: 205.4. Samples: 800332. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:02:51,615][01669] Avg episode reward: [(0, '25.929')]
[2025-01-07 13:02:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3198976. Throughput: 0: 218.5. Samples: 801962. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:02:56,618][01669] Avg episode reward: [(0, '25.866')]
[2025-01-07 13:03:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3203072. Throughput: 0: 229.2. Samples: 803464. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:03:01,609][01669] Avg episode reward: [(0, '25.886')]
[2025-01-07 13:03:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3207168. Throughput: 0: 217.1. Samples: 803944. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:03:06,613][01669] Avg episode reward: [(0, '25.986')]
[2025-01-07 13:03:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3211264. Throughput: 0: 214.2. Samples: 805182. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:03:11,609][01669] Avg episode reward: [(0, '25.996')]
[2025-01-07 13:03:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3215360. Throughput: 0: 221.4. Samples: 806544. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:03:16,616][01669] Avg episode reward: [(0, '26.540')]
[2025-01-07 13:03:21,609][01669] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 3223552. Throughput: 0: 229.1. Samples: 807512. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:03:21,613][01669] Avg episode reward: [(0, '26.998')]
[2025-01-07 13:03:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3223552. Throughput: 0: 217.0. Samples: 808462. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:03:26,613][01669] Avg episode reward: [(0, '26.717')]
[2025-01-07 13:03:31,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 3231744. Throughput: 0: 212.8. Samples: 809694. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:03:31,609][01669] Avg episode reward: [(0, '26.945')]
[2025-01-07 13:03:35,127][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000790_3235840.pth...
[2025-01-07 13:03:35,131][05758] Updated weights for policy 0, policy_version 790 (0.0047)
[2025-01-07 13:03:35,244][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000739_3026944.pth
[2025-01-07 13:03:36,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3235840. Throughput: 0: 227.8. Samples: 810582. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:03:36,609][01669] Avg episode reward: [(0, '26.988')]
[2025-01-07 13:03:41,610][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3239936. Throughput: 0: 214.7. Samples: 811626. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:03:41,613][01669] Avg episode reward: [(0, '26.510')]
[2025-01-07 13:03:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3244032. Throughput: 0: 205.7. Samples: 812720. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:03:46,613][01669] Avg episode reward: [(0, '26.066')]
[2025-01-07 13:03:51,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3248128. Throughput: 0: 216.9. Samples: 813704. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:03:51,615][01669] Avg episode reward: [(0, '27.145')]
[2025-01-07 13:03:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3252224. Throughput: 0: 220.6. Samples: 815110. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:03:56,617][01669] Avg episode reward: [(0, '27.427')]
[2025-01-07 13:04:01,610][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3256320. Throughput: 0: 210.8. Samples: 816032. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:04:01,612][01669] Avg episode reward: [(0, '27.337')]
[2025-01-07 13:04:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3260416. Throughput: 0: 203.0. Samples: 816646. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:04:06,609][01669] Avg episode reward: [(0, '27.676')]
[2025-01-07 13:04:11,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3264512. Throughput: 0: 222.4. Samples: 818470. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:04:11,616][01669] Avg episode reward: [(0, '27.849')]
[2025-01-07 13:04:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3268608. Throughput: 0: 219.4. Samples: 819566. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:04:16,609][01669] Avg episode reward: [(0, '28.211')]
[2025-01-07 13:04:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3272704. Throughput: 0: 205.2. Samples: 819816. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:04:21,609][01669] Avg episode reward: [(0, '28.281')]
[2025-01-07 13:04:23,719][05758] Updated weights for policy 0, policy_version 800 (0.1406)
[2025-01-07 13:04:26,012][05745] Signal inference workers to stop experience collection... (800 times)
[2025-01-07 13:04:26,063][05758] InferenceWorker_p0-w0: stopping experience collection (800 times)
[2025-01-07 13:04:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3276800. Throughput: 0: 220.8. Samples: 821560. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:04:26,614][01669] Avg episode reward: [(0, '27.906')]
[2025-01-07 13:04:27,354][05745] Signal inference workers to resume experience collection... (800 times)
[2025-01-07 13:04:27,355][05758] InferenceWorker_p0-w0: resuming experience collection (800 times)
[2025-01-07 13:04:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3280896. Throughput: 0: 224.7. Samples: 822832. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:04:31,612][01669] Avg episode reward: [(0, '27.622')]
[2025-01-07 13:04:36,612][01669] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 860.8). Total num frames: 3284992. Throughput: 0: 211.4. Samples: 823218. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:04:36,618][01669] Avg episode reward: [(0, '27.411')]
[2025-01-07 13:04:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 860.9). Total num frames: 3289088. Throughput: 0: 216.2. Samples: 824840. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:04:41,616][01669] Avg episode reward: [(0, '28.098')]
[2025-01-07 13:04:46,606][01669] Fps is (10 sec: 1229.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3297280. Throughput: 0: 219.7. Samples: 825920. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:04:46,609][01669] Avg episode reward: [(0, '27.972')]
[2025-01-07 13:04:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3297280. Throughput: 0: 223.8. Samples: 826718. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:04:51,609][01669] Avg episode reward: [(0, '28.514')]
[2025-01-07 13:04:56,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3301376. Throughput: 0: 209.5. Samples: 827896. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:04:56,609][01669] Avg episode reward: [(0, '29.023')]
[2025-01-07 13:04:56,851][05745] Saving new best policy, reward=28.514!
[2025-01-07 13:05:00,889][05745] Saving new best policy, reward=29.023!
[2025-01-07 13:05:01,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3309568. Throughput: 0: 214.8. Samples: 829230. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:05:01,611][01669] Avg episode reward: [(0, '28.291')]
[2025-01-07 13:05:06,608][01669] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3313664. Throughput: 0: 225.8. Samples: 829978. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:05:06,618][01669] Avg episode reward: [(0, '28.284')]
[2025-01-07 13:05:11,318][05758] Updated weights for policy 0, policy_version 810 (0.1074)
[2025-01-07 13:05:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3317760. Throughput: 0: 209.7. Samples: 830998. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:05:11,611][01669] Avg episode reward: [(0, '28.415')]
[2025-01-07 13:05:16,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3321856. Throughput: 0: 211.5. Samples: 832348. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:05:16,610][01669] Avg episode reward: [(0, '28.770')]
[2025-01-07 13:05:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3325952. Throughput: 0: 218.2. Samples: 833036. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:05:21,614][01669] Avg episode reward: [(0, '28.511')]
[2025-01-07 13:05:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3330048. Throughput: 0: 212.3. Samples: 834392. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:05:26,612][01669] Avg episode reward: [(0, '28.452')]
[2025-01-07 13:05:31,608][01669] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3334144. Throughput: 0: 217.2. Samples: 835696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:05:31,611][01669] Avg episode reward: [(0, '28.352')]
[2025-01-07 13:05:33,836][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000815_3338240.pth...
[2025-01-07 13:05:33,944][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000765_3133440.pth
[2025-01-07 13:05:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 874.7). Total num frames: 3338240. Throughput: 0: 214.8. Samples: 836384. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:05:36,617][01669] Avg episode reward: [(0, '28.287')]
[2025-01-07 13:05:41,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3342336. Throughput: 0: 228.0. Samples: 838156. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:05:41,609][01669] Avg episode reward: [(0, '28.169')]
[2025-01-07 13:05:46,607][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3346432. Throughput: 0: 220.8. Samples: 839164. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:05:46,609][01669] Avg episode reward: [(0, '27.640')]
[2025-01-07 13:05:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3350528. Throughput: 0: 214.9. Samples: 839650. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:05:51,617][01669] Avg episode reward: [(0, '26.992')]
[2025-01-07 13:05:56,225][05758] Updated weights for policy 0, policy_version 820 (0.0050)
[2025-01-07 13:05:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 874.7). Total num frames: 3358720. Throughput: 0: 226.3. Samples: 841180. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:05:56,610][01669] Avg episode reward: [(0, '27.500')]
[2025-01-07 13:06:01,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3362816. Throughput: 0: 220.1. Samples: 842252. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:06:01,609][01669] Avg episode reward: [(0, '27.531')]
[2025-01-07 13:06:06,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3362816. Throughput: 0: 221.2. Samples: 842988. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:06:06,616][01669] Avg episode reward: [(0, '26.850')]
[2025-01-07 13:06:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3371008. Throughput: 0: 219.5. Samples: 844270. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:06:11,617][01669] Avg episode reward: [(0, '26.185')]
[2025-01-07 13:06:16,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3375104. Throughput: 0: 225.1. Samples: 845826. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:06:16,613][01669] Avg episode reward: [(0, '26.679')]
[2025-01-07 13:06:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3379200. Throughput: 0: 220.8. Samples: 846318. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:06:21,609][01669] Avg episode reward: [(0, '25.570')]
[2025-01-07 13:06:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3383296. Throughput: 0: 204.0. Samples: 847336. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:06:26,609][01669] Avg episode reward: [(0, '25.778')]
[2025-01-07 13:06:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3387392. Throughput: 0: 218.7. Samples: 849006. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:06:31,609][01669] Avg episode reward: [(0, '25.570')]
[2025-01-07 13:06:36,609][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3391488. Throughput: 0: 220.5. Samples: 849574. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:06:36,619][01669] Avg episode reward: [(0, '25.729')]
[2025-01-07 13:06:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3395584. Throughput: 0: 208.7. Samples: 850572. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:06:41,615][01669] Avg episode reward: [(0, '25.641')]
[2025-01-07 13:06:44,527][05758] Updated weights for policy 0, policy_version 830 (0.0498)
[2025-01-07 13:06:46,607][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3399680. Throughput: 0: 224.5. Samples: 852356. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:06:46,609][01669] Avg episode reward: [(0, '25.882')]
[2025-01-07 13:06:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3403776. Throughput: 0: 221.3. Samples: 852948. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:06:51,613][01669] Avg episode reward: [(0, '25.495')]
[2025-01-07 13:06:56,609][01669] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3407872. Throughput: 0: 217.7. Samples: 854066. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:06:56,611][01669] Avg episode reward: [(0, '25.444')]
[2025-01-07 13:07:01,607][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3411968. Throughput: 0: 216.4. Samples: 855562. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:07:01,614][01669] Avg episode reward: [(0, '25.304')]
[2025-01-07 13:07:06,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3416064. Throughput: 0: 220.0. Samples: 856218. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:07:06,618][01669] Avg episode reward: [(0, '25.234')]
[2025-01-07 13:07:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3420160. Throughput: 0: 229.0. Samples: 857640. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:07:11,609][01669] Avg episode reward: [(0, '25.675')]
[2025-01-07 13:07:16,610][01669] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 860.8). Total num frames: 3424256. Throughput: 0: 214.3. Samples: 858650. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:07:16,613][01669] Avg episode reward: [(0, '25.733')]
[2025-01-07 13:07:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3428352. Throughput: 0: 218.1. Samples: 859388. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:07:21,617][01669] Avg episode reward: [(0, '25.385')]
[2025-01-07 13:07:26,606][01669] Fps is (10 sec: 1229.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3436544. Throughput: 0: 224.8. Samples: 860686. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:07:26,610][01669] Avg episode reward: [(0, '26.149')]
[2025-01-07 13:07:31,177][05758] Updated weights for policy 0, policy_version 840 (0.0489)
[2025-01-07 13:07:31,607][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3440640. Throughput: 0: 210.0. Samples: 861804. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:07:31,616][01669] Avg episode reward: [(0, '26.159')]
[2025-01-07 13:07:36,292][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000841_3444736.pth...
[2025-01-07 13:07:36,405][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000790_3235840.pth
[2025-01-07 13:07:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3444736. Throughput: 0: 214.1. Samples: 862582. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:07:36,615][01669] Avg episode reward: [(0, '26.577')]
[2025-01-07 13:07:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3448832. Throughput: 0: 218.0. Samples: 863876. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:07:41,609][01669] Avg episode reward: [(0, '27.701')]
[2025-01-07 13:07:46,610][01669] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3452928. Throughput: 0: 218.9. Samples: 865414. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:07:46,613][01669] Avg episode reward: [(0, '27.795')]
[2025-01-07 13:07:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3457024. Throughput: 0: 213.6. Samples: 865832. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:07:51,609][01669] Avg episode reward: [(0, '27.482')]
[2025-01-07 13:07:56,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3461120. Throughput: 0: 206.5. Samples: 866932. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:07:56,609][01669] Avg episode reward: [(0, '27.338')]
[2025-01-07 13:08:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3465216. Throughput: 0: 227.3. Samples: 868876. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:08:01,609][01669] Avg episode reward: [(0, '28.526')]
[2025-01-07 13:08:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3469312. Throughput: 0: 219.2. Samples: 869252. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:08:06,611][01669] Avg episode reward: [(0, '28.736')]
[2025-01-07 13:08:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3473408. Throughput: 0: 213.8. Samples: 870306. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:08:11,609][01669] Avg episode reward: [(0, '28.587')]
[2025-01-07 13:08:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3477504. Throughput: 0: 226.1. Samples: 871980. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:08:16,619][01669] Avg episode reward: [(0, '28.673')]
[2025-01-07 13:08:17,539][05758] Updated weights for policy 0, policy_version 850 (0.0510)
[2025-01-07 13:08:19,798][05745] Signal inference workers to stop experience collection... (850 times)
[2025-01-07 13:08:19,869][05758] InferenceWorker_p0-w0: stopping experience collection (850 times)
[2025-01-07 13:08:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3481600. Throughput: 0: 227.9. Samples: 872836. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:08:21,614][01669] Avg episode reward: [(0, '27.735')]
[2025-01-07 13:08:22,166][05745] Signal inference workers to resume experience collection... (850 times)
[2025-01-07 13:08:22,167][05758] InferenceWorker_p0-w0: resuming experience collection (850 times)
[2025-01-07 13:08:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3485696. Throughput: 0: 220.5. Samples: 873800. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:08:26,614][01669] Avg episode reward: [(0, '28.245')]
[2025-01-07 13:08:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3489792. Throughput: 0: 214.4. Samples: 875062. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:08:31,617][01669] Avg episode reward: [(0, '27.990')]
[2025-01-07 13:08:36,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 3497984. Throughput: 0: 224.1. Samples: 875918. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:08:36,611][01669] Avg episode reward: [(0, '28.262')]
[2025-01-07 13:08:41,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3502080. Throughput: 0: 225.4. Samples: 877076. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:08:41,617][01669] Avg episode reward: [(0, '28.208')]
[2025-01-07 13:08:46,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3502080. Throughput: 0: 205.8. Samples: 878138. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:08:46,614][01669] Avg episode reward: [(0, '28.190')]
[2025-01-07 13:08:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3510272. Throughput: 0: 220.4. Samples: 879172. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:08:51,612][01669] Avg episode reward: [(0, '27.799')]
[2025-01-07 13:08:56,608][01669] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3514368. Throughput: 0: 220.7. Samples: 880236. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:08:56,611][01669] Avg episode reward: [(0, '27.767')]
[2025-01-07 13:09:01,608][01669] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3518464. Throughput: 0: 208.4. Samples: 881360. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:09:01,610][01669] Avg episode reward: [(0, '27.660')]
[2025-01-07 13:09:05,647][05758] Updated weights for policy 0, policy_version 860 (0.1395)
[2025-01-07 13:09:06,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3522560. Throughput: 0: 206.6. Samples: 882132. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:09:06,613][01669] Avg episode reward: [(0, '26.898')]
[2025-01-07 13:09:11,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3526656. Throughput: 0: 212.4. Samples: 883358. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:09:11,611][01669] Avg episode reward: [(0, '27.248')]
[2025-01-07 13:09:16,608][01669] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3530752. Throughput: 0: 213.9. Samples: 884686. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:09:16,615][01669] Avg episode reward: [(0, '26.865')]
[2025-01-07 13:09:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3534848. Throughput: 0: 206.1. Samples: 885194. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:09:21,609][01669] Avg episode reward: [(0, '26.553')]
[2025-01-07 13:09:26,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3538944. Throughput: 0: 214.0. Samples: 886704. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:09:26,609][01669] Avg episode reward: [(0, '26.413')]
[2025-01-07 13:09:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 3543040. Throughput: 0: 225.0. Samples: 888264. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:09:31,614][01669] Avg episode reward: [(0, '25.959')]
[2025-01-07 13:09:36,613][01669] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 874.7). Total num frames: 3547136. Throughput: 0: 212.4. Samples: 888732. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:09:36,616][01669] Avg episode reward: [(0, '25.959')]
[2025-01-07 13:09:38,673][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000867_3551232.pth...
[2025-01-07 13:09:38,794][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000815_3338240.pth
[2025-01-07 13:09:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3551232. Throughput: 0: 217.9. Samples: 890042. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:09:41,614][01669] Avg episode reward: [(0, '26.099')]
[2025-01-07 13:09:46,606][01669] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3555328. Throughput: 0: 222.6. Samples: 891378. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:09:46,609][01669] Avg episode reward: [(0, '25.800')]
[2025-01-07 13:09:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3559424. Throughput: 0: 224.0. Samples: 892210. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:09:51,609][01669] Avg episode reward: [(0, '26.108')]
[2025-01-07 13:09:51,961][05758] Updated weights for policy 0, policy_version 870 (0.1493)
[2025-01-07 13:09:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3563520. Throughput: 0: 218.6. Samples: 893194. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:09:56,614][01669] Avg episode reward: [(0, '25.908')]
[2025-01-07 13:10:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3567616. Throughput: 0: 216.5. Samples: 894430. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:10:01,617][01669] Avg episode reward: [(0, '25.280')]
[2025-01-07 13:10:06,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3575808. Throughput: 0: 227.1. Samples: 895412. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:10:06,614][01669] Avg episode reward: [(0, '24.752')]
[2025-01-07 13:10:11,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3579904. Throughput: 0: 217.5. Samples: 896490. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:10:11,609][01669] Avg episode reward: [(0, '24.910')]
[2025-01-07 13:10:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3584000. Throughput: 0: 205.2. Samples: 897498. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:10:16,616][01669] Avg episode reward: [(0, '24.868')]
[2025-01-07 13:10:21,609][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3588096. Throughput: 0: 216.3. Samples: 898464. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:10:21,612][01669] Avg episode reward: [(0, '24.989')]
[2025-01-07 13:10:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3592192. Throughput: 0: 218.0. Samples: 899854. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:10:26,610][01669] Avg episode reward: [(0, '25.545')]
[2025-01-07 13:10:31,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3596288. Throughput: 0: 209.3. Samples: 900796. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:10:31,610][01669] Avg episode reward: [(0, '25.692')]
[2025-01-07 13:10:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 874.7). Total num frames: 3600384. Throughput: 0: 207.0. Samples: 901524. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:10:36,612][01669] Avg episode reward: [(0, '25.789')]
[2025-01-07 13:10:39,332][05758] Updated weights for policy 0, policy_version 880 (0.1386)
[2025-01-07 13:10:41,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3604480. Throughput: 0: 220.1. Samples: 903100. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:10:41,609][01669] Avg episode reward: [(0, '25.754')]
[2025-01-07 13:10:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3608576. Throughput: 0: 222.3. Samples: 904434. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:10:46,615][01669] Avg episode reward: [(0, '26.085')]
[2025-01-07 13:10:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3612672. Throughput: 0: 206.2. Samples: 904690. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:10:51,614][01669] Avg episode reward: [(0, '25.667')]
[2025-01-07 13:10:56,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3616768. Throughput: 0: 222.2. Samples: 906488. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:10:56,617][01669] Avg episode reward: [(0, '26.377')]
[2025-01-07 13:11:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3620864. Throughput: 0: 227.9. Samples: 907752. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:11:01,614][01669] Avg episode reward: [(0, '26.269')]
[2025-01-07 13:11:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3624960. Throughput: 0: 218.1. Samples: 908276. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:11:06,611][01669] Avg episode reward: [(0, '26.594')]
[2025-01-07 13:11:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3629056. Throughput: 0: 218.8. Samples: 909700. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:11:11,608][01669] Avg episode reward: [(0, '27.136')]
[2025-01-07 13:11:16,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3637248. Throughput: 0: 207.4. Samples: 910130. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:11:16,608][01669] Avg episode reward: [(0, '27.136')]
[2025-01-07 13:11:21,609][01669] Fps is (10 sec: 1228.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3641344. Throughput: 0: 225.7. Samples: 911682. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:11:21,617][01669] Avg episode reward: [(0, '27.008')]
[2025-01-07 13:11:26,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3641344. Throughput: 0: 212.8. Samples: 912674. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:11:26,617][01669] Avg episode reward: [(0, '26.095')]
[2025-01-07 13:11:27,058][05758] Updated weights for policy 0, policy_version 890 (0.0945)
[2025-01-07 13:11:31,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3649536. Throughput: 0: 215.4. Samples: 914128. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:11:31,612][01669] Avg episode reward: [(0, '25.918')]
[2025-01-07 13:11:34,851][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000892_3653632.pth...
[2025-01-07 13:11:34,974][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000841_3444736.pth
[2025-01-07 13:11:36,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3653632. Throughput: 0: 226.8. Samples: 914896. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:11:36,613][01669] Avg episode reward: [(0, '25.544')]
[2025-01-07 13:11:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3657728. Throughput: 0: 210.6. Samples: 915966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:11:41,609][01669] Avg episode reward: [(0, '25.250')]
[2025-01-07 13:11:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3661824. Throughput: 0: 213.8. Samples: 917372. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:11:46,613][01669] Avg episode reward: [(0, '25.063')]
[2025-01-07 13:11:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3665920. Throughput: 0: 216.7. Samples: 918026. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:11:51,609][01669] Avg episode reward: [(0, '24.691')]
[2025-01-07 13:11:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3670016. Throughput: 0: 217.4. Samples: 919484. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:11:56,609][01669] Avg episode reward: [(0, '24.988')]
[2025-01-07 13:12:01,613][01669] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3674112. Throughput: 0: 231.8. Samples: 920564. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:12:01,616][01669] Avg episode reward: [(0, '24.900')]
[2025-01-07 13:12:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3678208. Throughput: 0: 212.8. Samples: 921256. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:12:06,617][01669] Avg episode reward: [(0, '24.196')]
[2025-01-07 13:12:11,606][01669] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 3682304. Throughput: 0: 233.3. Samples: 923174. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:12:11,609][01669] Avg episode reward: [(0, '24.029')]
[2025-01-07 13:12:12,559][05758] Updated weights for policy 0, policy_version 900 (0.0502)
[2025-01-07 13:12:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3686400. Throughput: 0: 222.5. Samples: 924142. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:12:16,617][01669] Avg episode reward: [(0, '24.241')]
[2025-01-07 13:12:16,747][05745] Signal inference workers to stop experience collection... (900 times)
[2025-01-07 13:12:16,815][05758] InferenceWorker_p0-w0: stopping experience collection (900 times)
[2025-01-07 13:12:18,735][05745] Signal inference workers to resume experience collection... (900 times)
[2025-01-07 13:12:18,737][05758] InferenceWorker_p0-w0: resuming experience collection (900 times)
[2025-01-07 13:12:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3690496. Throughput: 0: 210.0. Samples: 924348. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:12:21,616][01669] Avg episode reward: [(0, '23.449')]
[2025-01-07 13:12:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3694592. Throughput: 0: 229.5. Samples: 926292. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:12:26,617][01669] Avg episode reward: [(0, '24.172')]
[2025-01-07 13:12:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3698688. Throughput: 0: 220.9. Samples: 927312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:12:31,609][01669] Avg episode reward: [(0, '23.967')]
[2025-01-07 13:12:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3702784. Throughput: 0: 221.8. Samples: 928008. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2025-01-07 13:12:36,617][01669] Avg episode reward: [(0, '24.279')]
[2025-01-07 13:12:41,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3710976. Throughput: 0: 219.3. Samples: 929354. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:12:41,609][01669] Avg episode reward: [(0, '23.968')]
[2025-01-07 13:12:46,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3715072. Throughput: 0: 230.0. Samples: 930912. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:12:46,615][01669] Avg episode reward: [(0, '23.984')]
[2025-01-07 13:12:51,608][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3719168. Throughput: 0: 224.3. Samples: 931352. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:12:51,615][01669] Avg episode reward: [(0, '23.956')]
[2025-01-07 13:12:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3723264. Throughput: 0: 205.0. Samples: 932398. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:12:56,612][01669] Avg episode reward: [(0, '23.978')]
[2025-01-07 13:12:59,630][05758] Updated weights for policy 0, policy_version 910 (0.0501)
[2025-01-07 13:13:01,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.6, 300 sec: 874.7). Total num frames: 3727360. Throughput: 0: 223.2. Samples: 934184. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:13:01,609][01669] Avg episode reward: [(0, '24.005')]
[2025-01-07 13:13:06,608][01669] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3731456. Throughput: 0: 231.9. Samples: 934784. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:13:06,616][01669] Avg episode reward: [(0, '23.948')]
[2025-01-07 13:13:11,608][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3735552. Throughput: 0: 209.1. Samples: 935700. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:13:11,616][01669] Avg episode reward: [(0, '23.644')]
[2025-01-07 13:13:16,606][01669] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3739648. Throughput: 0: 221.0. Samples: 937258. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:13:16,609][01669] Avg episode reward: [(0, '22.952')]
[2025-01-07 13:13:21,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3743744. Throughput: 0: 222.7. Samples: 938028. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:13:21,617][01669] Avg episode reward: [(0, '23.579')]
[2025-01-07 13:13:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3747840. Throughput: 0: 217.5. Samples: 939140. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:13:26,609][01669] Avg episode reward: [(0, '23.777')]
[2025-01-07 13:13:31,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3751936. Throughput: 0: 216.1. Samples: 940636. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:13:31,615][01669] Avg episode reward: [(0, '23.960')]
[2025-01-07 13:13:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3756032. Throughput: 0: 217.6. Samples: 941142. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:13:36,617][01669] Avg episode reward: [(0, '23.960')]
[2025-01-07 13:13:36,894][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000918_3760128.pth...
[2025-01-07 13:13:37,012][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000867_3551232.pth
[2025-01-07 13:13:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3760128. Throughput: 0: 228.1. Samples: 942664. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:13:41,611][01669] Avg episode reward: [(0, '24.335')]
[2025-01-07 13:13:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3764224. Throughput: 0: 211.9. Samples: 943720. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:13:46,609][01669] Avg episode reward: [(0, '24.658')]
[2025-01-07 13:13:47,750][05758] Updated weights for policy 0, policy_version 920 (0.1469)
[2025-01-07 13:13:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3768320. Throughput: 0: 215.6. Samples: 944484. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-01-07 13:13:51,612][01669] Avg episode reward: [(0, '24.858')]
[2025-01-07 13:13:56,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3776512. Throughput: 0: 223.7. Samples: 945766. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:13:56,611][01669] Avg episode reward: [(0, '25.128')]
[2025-01-07 13:14:01,610][01669] Fps is (10 sec: 1228.4, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3780608. Throughput: 0: 214.9. Samples: 946928. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:14:01,617][01669] Avg episode reward: [(0, '25.218')]
[2025-01-07 13:14:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3784704. Throughput: 0: 213.7. Samples: 947646. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:14:06,608][01669] Avg episode reward: [(0, '25.007')]
[2025-01-07 13:14:11,606][01669] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3788800. Throughput: 0: 215.7. Samples: 948846. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:14:11,618][01669] Avg episode reward: [(0, '24.841')]
[2025-01-07 13:14:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3792896. Throughput: 0: 220.1. Samples: 950540. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2025-01-07 13:14:16,615][01669] Avg episode reward: [(0, '24.167')]
[2025-01-07 13:14:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3796992. Throughput: 0: 214.5. Samples: 950796. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:14:21,614][01669] Avg episode reward: [(0, '24.093')]
[2025-01-07 13:14:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3801088. Throughput: 0: 207.7. Samples: 952010. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:14:26,611][01669] Avg episode reward: [(0, '25.336')]
[2025-01-07 13:14:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 3805184. Throughput: 0: 223.6. Samples: 953784. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:14:31,609][01669] Avg episode reward: [(0, '25.298')]
[2025-01-07 13:14:32,674][05758] Updated weights for policy 0, policy_version 930 (0.0491)
[2025-01-07 13:14:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3809280. Throughput: 0: 219.0. Samples: 954338. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:14:36,609][01669] Avg episode reward: [(0, '25.129')]
[2025-01-07 13:14:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3813376. Throughput: 0: 210.9. Samples: 955258. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:14:41,613][01669] Avg episode reward: [(0, '25.169')]
[2025-01-07 13:14:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3817472. Throughput: 0: 218.7. Samples: 956770. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:14:46,618][01669] Avg episode reward: [(0, '24.729')]
[2025-01-07 13:14:51,611][01669] Fps is (10 sec: 1228.3, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 3825664. Throughput: 0: 223.4. Samples: 957702. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:14:51,614][01669] Avg episode reward: [(0, '24.724')]
[2025-01-07 13:14:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3825664. Throughput: 0: 219.9. Samples: 958740. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:14:56,609][01669] Avg episode reward: [(0, '24.724')]
[2025-01-07 13:15:01,606][01669] Fps is (10 sec: 409.8, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3829760. Throughput: 0: 206.8. Samples: 959846. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 13:15:01,616][01669] Avg episode reward: [(0, '25.536')]
[2025-01-07 13:15:06,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3837952. Throughput: 0: 224.1. Samples: 960880. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 13:15:06,612][01669] Avg episode reward: [(0, '25.901')]
[2025-01-07 13:15:11,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3842048. Throughput: 0: 220.7. Samples: 961942. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 13:15:11,612][01669] Avg episode reward: [(0, '25.868')]
[2025-01-07 13:15:16,606][01669] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3842048. Throughput: 0: 204.3. Samples: 962978. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-01-07 13:15:16,609][01669] Avg episode reward: [(0, '26.422')]
[2025-01-07 13:15:21,167][05758] Updated weights for policy 0, policy_version 940 (0.0477)
[2025-01-07 13:15:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3850240. Throughput: 0: 214.9. Samples: 964010. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:15:21,616][01669] Avg episode reward: [(0, '26.344')]
[2025-01-07 13:15:26,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3854336. Throughput: 0: 220.0. Samples: 965156. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:15:26,611][01669] Avg episode reward: [(0, '26.710')]
[2025-01-07 13:15:31,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3858432. Throughput: 0: 209.6. Samples: 966200. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:15:31,611][01669] Avg episode reward: [(0, '26.306')]
[2025-01-07 13:15:35,620][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000943_3862528.pth...
[2025-01-07 13:15:35,745][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000892_3653632.pth
[2025-01-07 13:15:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3862528. Throughput: 0: 208.3. Samples: 967074. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:15:36,610][01669] Avg episode reward: [(0, '26.529')]
[2025-01-07 13:15:41,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3866624. Throughput: 0: 218.0. Samples: 968548. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-01-07 13:15:41,618][01669] Avg episode reward: [(0, '26.489')]
[2025-01-07 13:15:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3870720. Throughput: 0: 223.2. Samples: 969890. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:15:46,613][01669] Avg episode reward: [(0, '26.356')]
[2025-01-07 13:15:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 874.7). Total num frames: 3874816. Throughput: 0: 208.7. Samples: 970272. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:15:51,621][01669] Avg episode reward: [(0, '26.167')]
[2025-01-07 13:15:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3878912. Throughput: 0: 213.4. Samples: 971544. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:15:56,609][01669] Avg episode reward: [(0, '27.019')]
[2025-01-07 13:16:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3883008. Throughput: 0: 229.9. Samples: 973322. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:16:01,609][01669] Avg episode reward: [(0, '26.548')]
[2025-01-07 13:16:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3887104. Throughput: 0: 216.3. Samples: 973742. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:16:06,611][01669] Avg episode reward: [(0, '26.491')]
[2025-01-07 13:16:08,877][05758] Updated weights for policy 0, policy_version 950 (0.1500)
[2025-01-07 13:16:11,474][05745] Signal inference workers to stop experience collection... (950 times)
[2025-01-07 13:16:11,524][05758] InferenceWorker_p0-w0: stopping experience collection (950 times)
[2025-01-07 13:16:11,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3891200. Throughput: 0: 217.7. Samples: 974954. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-01-07 13:16:11,617][01669] Avg episode reward: [(0, '26.242')]
[2025-01-07 13:16:13,045][05745] Signal inference workers to resume experience collection... (950 times)
[2025-01-07 13:16:13,049][05758] InferenceWorker_p0-w0: resuming experience collection (950 times)
[2025-01-07 13:16:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 3895296. Throughput: 0: 227.3. Samples: 976430.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 13:16:16,614][01669] Avg episode reward: [(0, '26.296')] [2025-01-07 13:16:21,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 3899392. Throughput: 0: 231.5. Samples: 977490. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-01-07 13:16:21,609][01669] Avg episode reward: [(0, '26.325')] [2025-01-07 13:16:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3903488. Throughput: 0: 218.3. Samples: 978372. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 13:16:26,614][01669] Avg episode reward: [(0, '27.774')] [2025-01-07 13:16:31,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3911680. Throughput: 0: 218.4. Samples: 979718. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 13:16:31,608][01669] Avg episode reward: [(0, '28.392')] [2025-01-07 13:16:36,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3915776. Throughput: 0: 229.1. Samples: 980580. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 13:16:36,612][01669] Avg episode reward: [(0, '28.315')] [2025-01-07 13:16:41,609][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3919872. Throughput: 0: 228.4. Samples: 981824. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 13:16:41,620][01669] Avg episode reward: [(0, '27.731')] [2025-01-07 13:16:46,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3923968. Throughput: 0: 217.9. Samples: 983126. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 13:16:46,613][01669] Avg episode reward: [(0, '27.920')] [2025-01-07 13:16:51,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3928064. Throughput: 0: 221.2. Samples: 983696. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 13:16:51,609][01669] Avg episode reward: [(0, '28.044')] [2025-01-07 13:16:53,447][05758] Updated weights for policy 0, policy_version 960 (0.0492) [2025-01-07 13:16:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 3932160. Throughput: 0: 233.6. Samples: 985468. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 13:16:56,616][01669] Avg episode reward: [(0, '27.661')] [2025-01-07 13:17:01,614][01669] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3936256. Throughput: 0: 220.1. Samples: 986336. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 13:17:01,617][01669] Avg episode reward: [(0, '27.830')] [2025-01-07 13:17:06,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3940352. Throughput: 0: 209.7. Samples: 986928. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 13:17:06,611][01669] Avg episode reward: [(0, '28.146')] [2025-01-07 13:17:11,606][01669] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3944448. Throughput: 0: 232.2. Samples: 988822. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 13:17:11,612][01669] Avg episode reward: [(0, '27.318')] [2025-01-07 13:17:16,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3948544. Throughput: 0: 225.3. Samples: 989858. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 13:17:16,613][01669] Avg episode reward: [(0, '27.249')] [2025-01-07 13:17:21,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3952640. 
Throughput: 0: 214.4. Samples: 990226. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 13:17:21,614][01669] Avg episode reward: [(0, '26.897')] [2025-01-07 13:17:26,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3956736. Throughput: 0: 224.4. Samples: 991920. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 13:17:26,619][01669] Avg episode reward: [(0, '26.263')] [2025-01-07 13:17:31,613][01669] Fps is (10 sec: 1228.0, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 3964928. Throughput: 0: 221.8. Samples: 993110. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 13:17:31,615][01669] Avg episode reward: [(0, '25.803')] [2025-01-07 13:17:36,606][01669] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 3964928. Throughput: 0: 224.7. Samples: 993808. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 13:17:36,609][01669] Avg episode reward: [(0, '25.963')] [2025-01-07 13:17:36,792][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000969_3969024.pth... [2025-01-07 13:17:36,950][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000918_3760128.pth [2025-01-07 13:17:41,305][05758] Updated weights for policy 0, policy_version 970 (0.0040) [2025-01-07 13:17:41,606][01669] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3973120. Throughput: 0: 210.3. Samples: 994930. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2025-01-07 13:17:41,613][01669] Avg episode reward: [(0, '25.910')] [2025-01-07 13:17:46,606][01669] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3977216. Throughput: 0: 225.7. Samples: 996490. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 13:17:46,608][01669] Avg episode reward: [(0, '26.030')] [2025-01-07 13:17:51,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3981312. Throughput: 0: 223.1. Samples: 996968. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 13:17:51,611][01669] Avg episode reward: [(0, '26.421')] [2025-01-07 13:17:56,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3985408. Throughput: 0: 204.9. Samples: 998044. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 13:17:56,618][01669] Avg episode reward: [(0, '25.609')] [2025-01-07 13:18:01,606][01669] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 874.7). Total num frames: 3989504. Throughput: 0: 218.8. Samples: 999706. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-01-07 13:18:01,609][01669] Avg episode reward: [(0, '25.513')] [2025-01-07 13:18:06,608][01669] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 3993600. Throughput: 0: 224.1. Samples: 1000310. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2025-01-07 13:18:06,616][01669] Avg episode reward: [(0, '26.025')] [2025-01-07 13:18:11,607][01669] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 3997696. Throughput: 0: 211.6. Samples: 1001444. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2025-01-07 13:18:11,617][01669] Avg episode reward: [(0, '25.877')] [2025-01-07 13:18:16,606][01669] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4001792. Throughput: 0: 218.0. Samples: 1002918. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2025-01-07 13:18:16,609][01669] Avg episode reward: [(0, '26.048')] [2025-01-07 13:18:18,792][05745] Stopping Batcher_0... 
[2025-01-07 13:18:18,793][05745] Loop batcher_evt_loop terminating... [2025-01-07 13:18:18,794][01669] Component Batcher_0 stopped! [2025-01-07 13:18:18,837][05758] Weights refcount: 2 0 [2025-01-07 13:18:18,848][01669] Component InferenceWorker_p0-w0 stopped! [2025-01-07 13:18:18,851][05758] Stopping InferenceWorker_p0-w0... [2025-01-07 13:18:18,852][05758] Loop inference_proc0-0_evt_loop terminating... [2025-01-07 13:18:19,174][01669] Component RolloutWorker_w5 stopped! [2025-01-07 13:18:19,184][05764] Stopping RolloutWorker_w5... [2025-01-07 13:18:19,185][05764] Loop rollout_proc5_evt_loop terminating... [2025-01-07 13:18:19,232][01669] Component RolloutWorker_w7 stopped! [2025-01-07 13:18:19,237][05766] Stopping RolloutWorker_w7... [2025-01-07 13:18:19,242][05766] Loop rollout_proc7_evt_loop terminating... [2025-01-07 13:18:19,252][01669] Component RolloutWorker_w3 stopped! [2025-01-07 13:18:19,259][05761] Stopping RolloutWorker_w3... [2025-01-07 13:18:19,259][05761] Loop rollout_proc3_evt_loop terminating... [2025-01-07 13:18:19,278][01669] Component RolloutWorker_w1 stopped! [2025-01-07 13:18:19,286][05760] Stopping RolloutWorker_w1... [2025-01-07 13:18:19,287][05760] Loop rollout_proc1_evt_loop terminating... [2025-01-07 13:18:19,369][05762] Stopping RolloutWorker_w2... [2025-01-07 13:18:19,369][01669] Component RolloutWorker_w2 stopped! [2025-01-07 13:18:19,378][05762] Loop rollout_proc2_evt_loop terminating... [2025-01-07 13:18:19,388][01669] Component RolloutWorker_w0 stopped! [2025-01-07 13:18:19,396][05759] Stopping RolloutWorker_w0... [2025-01-07 13:18:19,397][05759] Loop rollout_proc0_evt_loop terminating... [2025-01-07 13:18:19,453][01669] Component RolloutWorker_w6 stopped! [2025-01-07 13:18:19,456][05765] Stopping RolloutWorker_w6... [2025-01-07 13:18:19,470][01669] Component RolloutWorker_w4 stopped! [2025-01-07 13:18:19,482][05763] Stopping RolloutWorker_w4... [2025-01-07 13:18:19,483][05763] Loop rollout_proc4_evt_loop terminating... [2025-01-07 13:18:19,462][05765] Loop rollout_proc6_evt_loop terminating... [2025-01-07 13:18:24,419][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth... [2025-01-07 13:18:24,529][05745] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000943_3862528.pth [2025-01-07 13:18:24,551][05745] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth... [2025-01-07 13:18:24,732][05745] Stopping LearnerWorker_p0... [2025-01-07 13:18:24,733][05745] Loop learner_proc0_evt_loop terminating... [2025-01-07 13:18:24,732][01669] Component LearnerWorker_p0 stopped! [2025-01-07 13:18:24,739][01669] Waiting for process learner_proc0 to stop... [2025-01-07 13:18:26,444][01669] Waiting for process inference_proc0-0 to join... [2025-01-07 13:18:26,453][01669] Waiting for process rollout_proc0 to join... [2025-01-07 13:18:26,456][01669] Waiting for process rollout_proc1 to join... [2025-01-07 13:18:26,466][01669] Waiting for process rollout_proc2 to join... [2025-01-07 13:18:26,470][01669] Waiting for process rollout_proc3 to join... [2025-01-07 13:18:26,476][01669] Waiting for process rollout_proc4 to join... [2025-01-07 13:18:26,484][01669] Waiting for process rollout_proc5 to join... [2025-01-07 13:18:26,492][01669] Waiting for process rollout_proc6 to join... [2025-01-07 13:18:26,497][01669] Waiting for process rollout_proc7 to join... 
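Note on the throughput records above: each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entry reports frames per second over three trailing windows, "Total num frames" advances in multiples of 4096 (one training batch of environment steps, counted with frameskip), and the occasional "Signal inference workers to stop/resume experience collection" pair is the learner applying backpressure while it consumes a batch. A minimal sketch of how the windowed figures can be derived from (timestamp, total_frames) samples; this is an illustration, not Sample Factory's actual reporting code:

import time
from collections import deque

class WindowedFps:
    """Trailing-window FPS, mirroring the 10/60/300-second figures in the log."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames), oldest first

    def record(self, total_frames):
        now = time.monotonic()
        self.samples.append((now, total_frames))
        # keep only what the largest window can ever need
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def report(self):
        if not self.samples:
            return {w: 0.0 for w in self.windows}
        now, frames = self.samples[-1]
        figures = {}
        for w in self.windows:
            # oldest retained sample that still falls inside this window
            t0, f0 = next((s for s in self.samples if now - s[0] <= w), self.samples[0])
            figures[w] = (frames - f0) / (now - t0) if now > t0 else 0.0
        return figures  # e.g. {10: 819.2, 60: 887.5, 300: 874.7}

With a report every 5 seconds, a 10-second window that saw one 4096-frame batch lands on the 409.6 readings above, and one that saw two batches lands on 819.2.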
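The paired "Saving .../checkpoint_....pth" / "Removing .../checkpoint_....pth" records above are keep-latest-N checkpoint rotation; this run retains two checkpoints at a time. A sketch under that assumption; the filename builder below is hypothetical and merely mirrors the <policy_version>_<env_steps> pattern in the log:

import glob
import os

import torch

def save_checkpoint(state, ckpt_dir, policy_version, env_steps, keep=2):
    # e.g. checkpoint_000000979_4009984.pth
    name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
    torch.save(state, os.path.join(ckpt_dir, name))
    # zero-padded versions sort lexicographically in chronological order,
    # so everything but the newest `keep` files can be dropped
    existing = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for stale in existing[:-keep]:
        os.remove(stale)

Zero-padding the policy version is what keeps the sorted()-then-slice idiom safe here.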
[2025-01-07 13:18:26,502][01669] Batcher 0 profile tree view:
batching: 13.3617, releasing_batches: 0.1838
[2025-01-07 13:18:26,504][01669] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 47.5968
update_model: 93.6288
  weight_update: 0.0046
one_step: 0.0255
  handle_policy_step: 2764.4241
    deserialize: 80.2216, stack: 13.6797, obs_to_device_normalize: 453.3641, forward: 2047.4291, send_messages: 66.6792
    prepare_outputs: 32.1791
      to_cpu: 3.7189
[2025-01-07 13:18:26,506][01669] Learner 0 profile tree view:
misc: 0.0063, prepare_batch: 1155.2573
train: 3366.8204
  epoch_init: 0.0211, minibatch_init: 0.0208, losses_postprocess: 0.1573, kl_divergence: 0.5946, after_optimizer: 2.4594
  calculate_losses: 1646.5975
    losses_init: 0.0045, forward_head: 1493.9967, bptt_initial: 4.2037, tail: 3.0355, advantages_returns: 0.1833, losses: 1.3708
    bptt: 143.2939
      bptt_forward_core: 142.6606
  update: 1716.1415
    clip: 3.1763
[2025-01-07 13:18:26,509][01669] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.6735, enqueue_policy_requests: 44.5266, env_step: 1444.9729, overhead: 30.8013, complete_rollouts: 16.1929
save_policy_outputs: 28.9053
  split_output_tensors: 11.6193
[2025-01-07 13:18:26,510][01669] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.7202, enqueue_policy_requests: 46.0631, env_step: 1412.7885, overhead: 32.9936, complete_rollouts: 13.2416
save_policy_outputs: 27.4940
  split_output_tensors: 10.6488
[2025-01-07 13:18:26,513][01669] Loop Runner_EvtLoop terminating...
[2025-01-07 13:18:26,515][01669] Runner profile tree view:
main_loop: 4607.6466
[2025-01-07 13:18:26,517][01669] Collected {0: 4009984}, FPS: 870.3
[2025-01-07 13:18:26,580][01669] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-01-07 13:18:26,582][01669] Overriding arg 'num_workers' with value 1 passed from command line
[2025-01-07 13:18:26,584][01669] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-01-07 13:18:26,585][01669] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-01-07 13:18:26,587][01669] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-01-07 13:18:26,589][01669] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-01-07 13:18:26,592][01669] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-01-07 13:18:26,594][01669] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-01-07 13:18:26,595][01669] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-01-07 13:18:26,596][01669] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-01-07 13:18:26,597][01669] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-01-07 13:18:26,598][01669] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-01-07 13:18:26,599][01669] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-01-07 13:18:26,600][01669] Adding new argument 'enjoy_script'=None that is not in the saved config file!
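The block above, together with the records that follow, is the evaluation ("enjoy") entry point: the saved config is reloaded, evaluation-only arguments are layered on top, the encoder and policy head are rebuilt (conv encoder output 512 feeding a 512-wide policy head), the latest checkpoint is loaded, and ten episodes are rolled out to produce replay.mp4. A sketch of launching the same evaluation programmatically, in the style of the sample-factory 2.x VizDoom examples; the sf_examples helper names and exact signatures are assumptions:

from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.enjoy import enjoy
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components

register_vizdoom_components()  # registers the doom_* environments

argv = [
    "--env=doom_health_gathering_supreme",
    "--train_dir=/content/train_dir",
    "--experiment=default_experiment",
    "--num_workers=1",
    "--no_render",
    "--save_video",
    "--max_num_episodes=10",
]
parser, _ = parse_sf_args(argv=argv, evaluation=True)
add_doom_env_args(parser)       # Doom-specific options (resolution, frameskip, ...)
doom_override_defaults(parser)  # Doom-tuned defaults for algo parameters
cfg = parse_full_cfg(parser, argv)

status = enjoy(cfg)  # loads the latest checkpoint, rolls out episodes, writes replay.mp4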
[2025-01-07 13:18:26,601][01669] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-07 13:18:26,662][01669] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-07 13:18:26,670][01669] RunningMeanStd input shape: (3, 72, 128) [2025-01-07 13:18:26,673][01669] RunningMeanStd input shape: (1,) [2025-01-07 13:18:26,704][01669] ConvEncoder: input_channels=3 [2025-01-07 13:18:26,957][01669] Conv encoder output size: 512 [2025-01-07 13:18:26,960][01669] Policy head output size: 512 [2025-01-07 13:18:26,999][01669] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth... [2025-01-07 13:18:27,794][01669] Num frames 100... [2025-01-07 13:18:28,004][01669] Num frames 200... [2025-01-07 13:18:28,236][01669] Num frames 300... [2025-01-07 13:18:28,434][01669] Num frames 400... [2025-01-07 13:18:28,629][01669] Num frames 500... [2025-01-07 13:18:28,820][01669] Num frames 600... [2025-01-07 13:18:29,009][01669] Num frames 700... [2025-01-07 13:18:29,216][01669] Num frames 800... [2025-01-07 13:18:29,407][01669] Num frames 900... [2025-01-07 13:18:29,605][01669] Num frames 1000... [2025-01-07 13:18:29,802][01669] Num frames 1100... [2025-01-07 13:18:29,977][01669] Num frames 1200... [2025-01-07 13:18:30,154][01669] Num frames 1300... [2025-01-07 13:18:30,333][01669] Num frames 1400... [2025-01-07 13:18:30,516][01669] Avg episode rewards: #0: 33.660, true rewards: #0: 14.660 [2025-01-07 13:18:30,519][01669] Avg episode reward: 33.660, avg true_objective: 14.660 [2025-01-07 13:18:30,579][01669] Num frames 1500... [2025-01-07 13:18:30,767][01669] Num frames 1600... [2025-01-07 13:18:30,951][01669] Num frames 1700... [2025-01-07 13:18:31,140][01669] Num frames 1800... [2025-01-07 13:18:31,343][01669] Num frames 1900... [2025-01-07 13:18:31,536][01669] Num frames 2000... [2025-01-07 13:18:31,726][01669] Num frames 2100... [2025-01-07 13:18:31,946][01669] Num frames 2200... [2025-01-07 13:18:32,224][01669] Num frames 2300... [2025-01-07 13:18:32,472][01669] Num frames 2400... [2025-01-07 13:18:32,684][01669] Num frames 2500... [2025-01-07 13:18:32,963][01669] Num frames 2600... [2025-01-07 13:18:33,170][01669] Num frames 2700... [2025-01-07 13:18:33,466][01669] Num frames 2800... [2025-01-07 13:18:33,804][01669] Num frames 2900... [2025-01-07 13:18:34,113][01669] Num frames 3000... [2025-01-07 13:18:34,619][01669] Num frames 3100... [2025-01-07 13:18:34,940][01669] Num frames 3200... [2025-01-07 13:18:35,344][01669] Num frames 3300... [2025-01-07 13:18:35,890][01669] Num frames 3400... [2025-01-07 13:18:36,301][01669] Num frames 3500... [2025-01-07 13:18:36,546][01669] Avg episode rewards: #0: 45.829, true rewards: #0: 17.830 [2025-01-07 13:18:36,550][01669] Avg episode reward: 45.829, avg true_objective: 17.830 [2025-01-07 13:18:36,715][01669] Num frames 3600... [2025-01-07 13:18:37,021][01669] Num frames 3700... [2025-01-07 13:18:37,318][01669] Num frames 3800... [2025-01-07 13:18:37,630][01669] Num frames 3900... [2025-01-07 13:18:38,423][01669] Num frames 4000... [2025-01-07 13:18:39,134][01669] Num frames 4100... [2025-01-07 13:18:39,810][01669] Num frames 4200... [2025-01-07 13:18:40,393][01669] Num frames 4300... [2025-01-07 13:18:41,080][01669] Num frames 4400... [2025-01-07 13:18:41,712][01669] Num frames 4500... [2025-01-07 13:18:42,148][01669] Num frames 4600... [2025-01-07 13:18:42,616][01669] Num frames 4700... [2025-01-07 13:18:42,992][01669] Num frames 4800... 
[2025-01-07 13:18:43,372][01669] Num frames 4900... [2025-01-07 13:18:43,804][01669] Avg episode rewards: #0: 41.580, true rewards: #0: 16.580 [2025-01-07 13:18:43,810][01669] Avg episode reward: 41.580, avg true_objective: 16.580 [2025-01-07 13:18:43,863][01669] Num frames 5000... [2025-01-07 13:18:44,066][01669] Num frames 5100... [2025-01-07 13:18:44,259][01669] Num frames 5200... [2025-01-07 13:18:44,462][01669] Num frames 5300... [2025-01-07 13:18:44,663][01669] Num frames 5400... [2025-01-07 13:18:44,860][01669] Num frames 5500... [2025-01-07 13:18:45,050][01669] Num frames 5600... [2025-01-07 13:18:45,248][01669] Num frames 5700... [2025-01-07 13:18:45,430][01669] Num frames 5800... [2025-01-07 13:18:45,615][01669] Num frames 5900... [2025-01-07 13:18:45,803][01669] Num frames 6000... [2025-01-07 13:18:45,922][01669] Avg episode rewards: #0: 37.825, true rewards: #0: 15.075 [2025-01-07 13:18:45,924][01669] Avg episode reward: 37.825, avg true_objective: 15.075 [2025-01-07 13:18:46,059][01669] Num frames 6100... [2025-01-07 13:18:46,251][01669] Num frames 6200... [2025-01-07 13:18:46,448][01669] Num frames 6300... [2025-01-07 13:18:46,640][01669] Num frames 6400... [2025-01-07 13:18:46,841][01669] Num frames 6500... [2025-01-07 13:18:47,039][01669] Num frames 6600... [2025-01-07 13:18:47,256][01669] Num frames 6700... [2025-01-07 13:18:47,462][01669] Num frames 6800... [2025-01-07 13:18:47,663][01669] Num frames 6900... [2025-01-07 13:18:47,869][01669] Num frames 7000... [2025-01-07 13:18:48,057][01669] Num frames 7100... [2025-01-07 13:18:48,277][01669] Num frames 7200... [2025-01-07 13:18:48,329][01669] Avg episode rewards: #0: 35.600, true rewards: #0: 14.400 [2025-01-07 13:18:48,331][01669] Avg episode reward: 35.600, avg true_objective: 14.400 [2025-01-07 13:18:48,532][01669] Num frames 7300... [2025-01-07 13:18:48,803][01669] Num frames 7400... [2025-01-07 13:18:49,072][01669] Num frames 7500... [2025-01-07 13:18:49,349][01669] Num frames 7600... [2025-01-07 13:18:49,573][01669] Num frames 7700... [2025-01-07 13:18:49,768][01669] Num frames 7800... [2025-01-07 13:18:49,963][01669] Num frames 7900... [2025-01-07 13:18:50,155][01669] Num frames 8000... [2025-01-07 13:18:50,353][01669] Num frames 8100... [2025-01-07 13:18:50,555][01669] Num frames 8200... [2025-01-07 13:18:50,760][01669] Num frames 8300... [2025-01-07 13:18:50,949][01669] Num frames 8400... [2025-01-07 13:18:51,140][01669] Num frames 8500... [2025-01-07 13:18:51,339][01669] Num frames 8600... [2025-01-07 13:18:51,531][01669] Num frames 8700... [2025-01-07 13:18:51,782][01669] Num frames 8800... [2025-01-07 13:18:51,924][01669] Avg episode rewards: #0: 36.885, true rewards: #0: 14.718 [2025-01-07 13:18:51,927][01669] Avg episode reward: 36.885, avg true_objective: 14.718 [2025-01-07 13:18:52,111][01669] Num frames 8900... [2025-01-07 13:18:52,378][01669] Num frames 9000... [2025-01-07 13:18:52,631][01669] Num frames 9100... [2025-01-07 13:18:52,886][01669] Num frames 9200... [2025-01-07 13:18:53,136][01669] Num frames 9300... [2025-01-07 13:18:53,419][01669] Num frames 9400... [2025-01-07 13:18:53,707][01669] Num frames 9500... [2025-01-07 13:18:53,988][01669] Num frames 9600... [2025-01-07 13:18:54,135][01669] Avg episode rewards: #0: 34.187, true rewards: #0: 13.759 [2025-01-07 13:18:54,139][01669] Avg episode reward: 34.187, avg true_objective: 13.759 [2025-01-07 13:18:54,330][01669] Num frames 9700... [2025-01-07 13:18:54,620][01669] Num frames 9800... [2025-01-07 13:18:54,844][01669] Num frames 9900... 
[2025-01-07 13:18:55,030][01669] Num frames 10000... [2025-01-07 13:18:55,223][01669] Num frames 10100... [2025-01-07 13:18:55,366][01669] Avg episode rewards: #0: 30.929, true rewards: #0: 12.679 [2025-01-07 13:18:55,369][01669] Avg episode reward: 30.929, avg true_objective: 12.679 [2025-01-07 13:18:55,494][01669] Num frames 10200... [2025-01-07 13:18:55,688][01669] Num frames 10300... [2025-01-07 13:18:55,876][01669] Num frames 10400... [2025-01-07 13:18:56,071][01669] Num frames 10500... [2025-01-07 13:18:56,265][01669] Num frames 10600... [2025-01-07 13:18:56,454][01669] Num frames 10700... [2025-01-07 13:18:56,662][01669] Num frames 10800... [2025-01-07 13:18:56,875][01669] Num frames 10900... [2025-01-07 13:18:57,066][01669] Num frames 11000... [2025-01-07 13:18:57,252][01669] Num frames 11100... [2025-01-07 13:18:57,439][01669] Avg episode rewards: #0: 29.735, true rewards: #0: 12.402 [2025-01-07 13:18:57,441][01669] Avg episode reward: 29.735, avg true_objective: 12.402 [2025-01-07 13:18:57,522][01669] Num frames 11200... [2025-01-07 13:18:57,715][01669] Num frames 11300... [2025-01-07 13:18:57,901][01669] Num frames 11400... [2025-01-07 13:18:58,090][01669] Num frames 11500... [2025-01-07 13:18:58,275][01669] Num frames 11600... [2025-01-07 13:18:58,466][01669] Num frames 11700... [2025-01-07 13:18:58,668][01669] Num frames 11800... [2025-01-07 13:18:58,854][01669] Num frames 11900... [2025-01-07 13:18:59,048][01669] Num frames 12000... [2025-01-07 13:18:59,230][01669] Num frames 12100... [2025-01-07 13:18:59,432][01669] Num frames 12200... [2025-01-07 13:18:59,642][01669] Num frames 12300... [2025-01-07 13:18:59,830][01669] Num frames 12400... [2025-01-07 13:19:00,033][01669] Num frames 12500... [2025-01-07 13:19:00,234][01669] Num frames 12600... [2025-01-07 13:19:00,433][01669] Num frames 12700... [2025-01-07 13:19:00,630][01669] Num frames 12800... [2025-01-07 13:19:00,823][01669] Num frames 12900... [2025-01-07 13:19:01,014][01669] Num frames 13000... [2025-01-07 13:19:01,203][01669] Num frames 13100... [2025-01-07 13:19:01,390][01669] Num frames 13200... [2025-01-07 13:19:01,565][01669] Avg episode rewards: #0: 32.462, true rewards: #0: 13.262 [2025-01-07 13:19:01,567][01669] Avg episode reward: 32.462, avg true_objective: 13.262 [2025-01-07 13:20:37,274][01669] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-01-07 13:20:37,971][01669] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-01-07 13:20:37,973][01669] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-07 13:20:37,975][01669] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-07 13:20:37,978][01669] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-07 13:20:37,986][01669] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-07 13:20:37,989][01669] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-07 13:20:37,991][01669] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-01-07 13:20:37,992][01669] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-07 13:20:37,993][01669] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
[2025-01-07 13:20:37,994][01669] Adding new argument 'hf_repository'='DisposableTmep/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-01-07 13:20:37,995][01669] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-07 13:20:37,997][01669] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-07 13:20:37,998][01669] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-07 13:20:37,999][01669] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-07 13:20:38,000][01669] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-07 13:20:38,049][01669] RunningMeanStd input shape: (3, 72, 128) [2025-01-07 13:20:38,053][01669] RunningMeanStd input shape: (1,) [2025-01-07 13:20:38,074][01669] ConvEncoder: input_channels=3 [2025-01-07 13:20:38,146][01669] Conv encoder output size: 512 [2025-01-07 13:20:38,148][01669] Policy head output size: 512 [2025-01-07 13:20:38,175][01669] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth... [2025-01-07 13:20:39,023][01669] Num frames 100... [2025-01-07 13:20:39,304][01669] Num frames 200... [2025-01-07 13:20:39,565][01669] Num frames 300... [2025-01-07 13:20:39,867][01669] Num frames 400... [2025-01-07 13:20:40,136][01669] Num frames 500... [2025-01-07 13:20:40,411][01669] Num frames 600... [2025-01-07 13:20:40,711][01669] Num frames 700... [2025-01-07 13:20:40,991][01669] Num frames 800... [2025-01-07 13:20:41,282][01669] Num frames 900... [2025-01-07 13:20:41,541][01669] Num frames 1000... [2025-01-07 13:20:41,797][01669] Num frames 1100... [2025-01-07 13:20:42,084][01669] Num frames 1200... [2025-01-07 13:20:42,338][01669] Num frames 1300... [2025-01-07 13:20:42,629][01669] Num frames 1400... [2025-01-07 13:20:42,776][01669] Avg episode rewards: #0: 31.350, true rewards: #0: 14.350 [2025-01-07 13:20:42,779][01669] Avg episode reward: 31.350, avg true_objective: 14.350 [2025-01-07 13:20:42,977][01669] Num frames 1500... [2025-01-07 13:20:43,307][01669] Num frames 1600... [2025-01-07 13:20:43,634][01669] Num frames 1700... [2025-01-07 13:20:43,947][01669] Num frames 1800... [2025-01-07 13:20:44,265][01669] Num frames 1900... [2025-01-07 13:20:44,570][01669] Num frames 2000... [2025-01-07 13:20:44,678][01669] Avg episode rewards: #0: 20.555, true rewards: #0: 10.055 [2025-01-07 13:20:44,680][01669] Avg episode reward: 20.555, avg true_objective: 10.055 [2025-01-07 13:20:44,944][01669] Num frames 2100... [2025-01-07 13:20:45,272][01669] Num frames 2200... [2025-01-07 13:20:45,610][01669] Num frames 2300... [2025-01-07 13:20:45,935][01669] Num frames 2400... [2025-01-07 13:20:46,095][01669] Avg episode rewards: #0: 16.793, true rewards: #0: 8.127 [2025-01-07 13:20:46,098][01669] Avg episode reward: 16.793, avg true_objective: 8.127 [2025-01-07 13:20:46,265][01669] Num frames 2500... [2025-01-07 13:20:46,552][01669] Num frames 2600... [2025-01-07 13:20:46,871][01669] Num frames 2700... [2025-01-07 13:20:47,198][01669] Num frames 2800... [2025-01-07 13:20:47,529][01669] Num frames 2900... [2025-01-07 13:20:47,846][01669] Num frames 3000... [2025-01-07 13:20:48,131][01669] Num frames 3100... [2025-01-07 13:20:48,405][01669] Num frames 3200... [2025-01-07 13:20:48,703][01669] Num frames 3300... [2025-01-07 13:20:48,983][01669] Num frames 3400... [2025-01-07 13:20:49,269][01669] Num frames 3500... 
[2025-01-07 13:20:49,537][01669] Num frames 3600... [2025-01-07 13:20:49,809][01669] Num frames 3700... [2025-01-07 13:20:50,104][01669] Num frames 3800... [2025-01-07 13:20:50,379][01669] Num frames 3900... [2025-01-07 13:20:50,604][01669] Num frames 4000... [2025-01-07 13:20:50,796][01669] Num frames 4100... [2025-01-07 13:20:50,974][01669] Avg episode rewards: #0: 24.645, true rewards: #0: 10.395 [2025-01-07 13:20:50,975][01669] Avg episode reward: 24.645, avg true_objective: 10.395 [2025-01-07 13:20:51,056][01669] Num frames 4200... [2025-01-07 13:20:51,240][01669] Num frames 4300... [2025-01-07 13:20:51,429][01669] Num frames 4400... [2025-01-07 13:20:51,614][01669] Num frames 4500... [2025-01-07 13:20:51,819][01669] Avg episode rewards: #0: 21.148, true rewards: #0: 9.148 [2025-01-07 13:20:51,820][01669] Avg episode reward: 21.148, avg true_objective: 9.148 [2025-01-07 13:20:51,872][01669] Num frames 4600... [2025-01-07 13:20:52,062][01669] Num frames 4700... [2025-01-07 13:20:52,262][01669] Num frames 4800... [2025-01-07 13:20:52,456][01669] Num frames 4900... [2025-01-07 13:20:52,643][01669] Num frames 5000... [2025-01-07 13:20:52,833][01669] Num frames 5100... [2025-01-07 13:20:53,024][01669] Num frames 5200... [2025-01-07 13:20:53,222][01669] Num frames 5300... [2025-01-07 13:20:53,412][01669] Num frames 5400... [2025-01-07 13:20:53,600][01669] Num frames 5500... [2025-01-07 13:20:53,803][01669] Num frames 5600... [2025-01-07 13:20:53,990][01669] Num frames 5700... [2025-01-07 13:20:54,182][01669] Num frames 5800... [2025-01-07 13:20:54,381][01669] Num frames 5900... [2025-01-07 13:20:54,569][01669] Num frames 6000... [2025-01-07 13:20:54,767][01669] Num frames 6100... [2025-01-07 13:20:54,959][01669] Num frames 6200... [2025-01-07 13:20:55,157][01669] Num frames 6300... [2025-01-07 13:20:55,355][01669] Num frames 6400... [2025-01-07 13:20:55,546][01669] Num frames 6500... [2025-01-07 13:20:55,729][01669] Num frames 6600... [2025-01-07 13:20:55,953][01669] Avg episode rewards: #0: 26.956, true rewards: #0: 11.123 [2025-01-07 13:20:55,955][01669] Avg episode reward: 26.956, avg true_objective: 11.123 [2025-01-07 13:20:56,024][01669] Num frames 6700... [2025-01-07 13:20:56,281][01669] Num frames 6800... [2025-01-07 13:20:56,543][01669] Num frames 6900... [2025-01-07 13:20:56,798][01669] Num frames 7000... [2025-01-07 13:20:57,055][01669] Num frames 7100... [2025-01-07 13:20:57,295][01669] Num frames 7200... [2025-01-07 13:20:57,569][01669] Num frames 7300... [2025-01-07 13:20:57,841][01669] Num frames 7400... [2025-01-07 13:20:58,113][01669] Num frames 7500... [2025-01-07 13:20:58,399][01669] Num frames 7600... [2025-01-07 13:20:58,700][01669] Num frames 7700... [2025-01-07 13:20:58,973][01669] Num frames 7800... [2025-01-07 13:20:59,172][01669] Num frames 7900... [2025-01-07 13:20:59,377][01669] Num frames 8000... [2025-01-07 13:20:59,464][01669] Avg episode rewards: #0: 27.737, true rewards: #0: 11.451 [2025-01-07 13:20:59,466][01669] Avg episode reward: 27.737, avg true_objective: 11.451 [2025-01-07 13:20:59,624][01669] Num frames 8100... [2025-01-07 13:20:59,807][01669] Num frames 8200... [2025-01-07 13:21:00,002][01669] Num frames 8300... [2025-01-07 13:21:00,185][01669] Num frames 8400... [2025-01-07 13:21:00,382][01669] Num frames 8500... [2025-01-07 13:21:00,566][01669] Num frames 8600... [2025-01-07 13:21:00,747][01669] Num frames 8700... [2025-01-07 13:21:00,935][01669] Num frames 8800... [2025-01-07 13:21:01,131][01669] Num frames 8900... 
[2025-01-07 13:21:01,313][01669] Num frames 9000... [2025-01-07 13:21:01,507][01669] Num frames 9100... [2025-01-07 13:21:01,697][01669] Num frames 9200... [2025-01-07 13:21:01,885][01669] Num frames 9300... [2025-01-07 13:21:01,939][01669] Avg episode rewards: #0: 28.375, true rewards: #0: 11.625 [2025-01-07 13:21:01,941][01669] Avg episode reward: 28.375, avg true_objective: 11.625 [2025-01-07 13:21:02,133][01669] Num frames 9400... [2025-01-07 13:21:02,315][01669] Num frames 9500... [2025-01-07 13:21:02,517][01669] Num frames 9600... [2025-01-07 13:21:02,706][01669] Num frames 9700... [2025-01-07 13:21:02,899][01669] Num frames 9800... [2025-01-07 13:21:03,097][01669] Num frames 9900... [2025-01-07 13:21:03,285][01669] Num frames 10000... [2025-01-07 13:21:03,414][01669] Avg episode rewards: #0: 27.040, true rewards: #0: 11.151 [2025-01-07 13:21:03,416][01669] Avg episode reward: 27.040, avg true_objective: 11.151 [2025-01-07 13:21:03,539][01669] Num frames 10100... [2025-01-07 13:21:03,739][01669] Num frames 10200... [2025-01-07 13:21:03,929][01669] Num frames 10300... [2025-01-07 13:21:04,133][01669] Num frames 10400... [2025-01-07 13:21:04,331][01669] Num frames 10500... [2025-01-07 13:21:04,534][01669] Num frames 10600... [2025-01-07 13:21:04,724][01669] Num frames 10700... [2025-01-07 13:21:04,920][01669] Num frames 10800... [2025-01-07 13:21:05,115][01669] Num frames 10900... [2025-01-07 13:21:05,309][01669] Num frames 11000... [2025-01-07 13:21:05,503][01669] Num frames 11100... [2025-01-07 13:21:05,695][01669] Num frames 11200... [2025-01-07 13:21:05,886][01669] Num frames 11300... [2025-01-07 13:21:06,078][01669] Num frames 11400... [2025-01-07 13:21:06,282][01669] Num frames 11500... [2025-01-07 13:21:06,478][01669] Num frames 11600... [2025-01-07 13:21:06,670][01669] Num frames 11700... [2025-01-07 13:21:06,863][01669] Num frames 11800... [2025-01-07 13:21:07,060][01669] Num frames 11900... [2025-01-07 13:21:07,265][01669] Num frames 12000... [2025-01-07 13:21:07,465][01669] Num frames 12100... [2025-01-07 13:21:07,591][01669] Avg episode rewards: #0: 31.236, true rewards: #0: 12.136 [2025-01-07 13:21:07,592][01669] Avg episode reward: 31.236, avg true_objective: 12.136 [2025-01-07 13:22:36,256][01669] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
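This second enjoy run was launched with push_to_hub=True and hf_repository='DisposableTmep/rl_course_vizdoom_health_gathering_supreme', so after replay.mp4 is written the experiment directory (config.json, the latest checkpoint, the replay video) is uploaded to the Hugging Face Hub. A manual equivalent with huggingface_hub, assuming you are already authenticated (e.g. via huggingface-cli login):

from huggingface_hub import HfApi

repo_id = "DisposableTmep/rl_course_vizdoom_health_gathering_supreme"

api = HfApi()
api.create_repo(repo_id=repo_id, exist_ok=True)  # no-op if the repo already exists
api.upload_folder(
    folder_path="/content/train_dir/default_experiment",
    repo_id=repo_id,
    commit_message="Upload sample-factory agent and replay video",
)

Recent Sample Factory versions also ship a push-to-hub helper (under sample_factory.huggingface) that the enjoy script invokes when push_to_hub is set; the upload_folder call above is just the hand-rolled equivalent.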