Commit History

SB3 PPO, 16 vectorized envs, ~9_000_000 training timesteps. mean_reward=163 +/- 103. Training for an additional 50_000_000 timesteps resulted in a worse evaluation reward.
28a0b97

CoreyMorris committed on

initial commit
0152a8d

CoreyMorris committed on