ppo-Pixelcopter-PLE-v0 / results.json
SB3 PPO, trained with 16 vectorized environments for ~9,000,000 timesteps. mean_reward=163 +/- 103. Training for an additional 50,000,000 timesteps resulted in a worse reward at evaluation.
28a0b97
{
  "mean_reward": 162.9,
  "std_reward": 102.90038872618508,
  "is_deterministic": true,
  "n_eval_episodes": 10,
  "eval_datetime": "2023-01-13T08:25:01.744757"
}
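A minimal sketch of how a `results.json` with these fields could be produced from per-episode evaluation returns. The `summarize_eval` helper and the episode rewards below are hypothetical illustrations, not the actual evaluation code or data from this run:

```python
import json
import statistics
from datetime import datetime, timezone

def summarize_eval(episode_rewards, deterministic=True):
    # Hypothetical helper: aggregate per-episode returns into the
    # same fields this results.json stores.
    mean_r = statistics.fmean(episode_rewards)
    # Population standard deviation (ddof=0), matching np.std's default.
    std_r = statistics.pstdev(episode_rewards)
    return {
        "mean_reward": mean_r,
        "std_reward": std_r,
        "is_deterministic": deterministic,
        "n_eval_episodes": len(episode_rewards),
        "eval_datetime": datetime.now(timezone.utc).isoformat(),
    }

# Made-up per-episode returns for illustration only:
results = summarize_eval([120.0, 260.0, 90.0, 180.0, 50.0,
                          310.0, 140.0, 200.0, 75.0, 204.0])
with open("results.json", "w") as f:
    json.dump(results, f)
```

The large std relative to the mean in the stored results suggests highly variable episode returns, which is common for Pixelcopter policies.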