SB3 PPO, vectorized over 16 environments, trained for roughly 9,000,000 timesteps: mean_reward = 163 +/- 103. Training for an additional 50,000,000 timesteps resulted in a lower reward at evaluation.
28a0b97
---
library_name: stable-baselines3
tags:
- Pixelcopter-PLE-v0
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: ppo
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pixelcopter-PLE-v0
      type: Pixelcopter-PLE-v0
    metrics:
    - type: mean_reward
      value: 162.90 +/- 102.90
      name: mean_reward
      verified: false
---
# **ppo** Agent playing **Pixelcopter-PLE-v0**
This is a trained model of a **ppo** agent playing **Pixelcopter-PLE-v0**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
## Usage (with Stable-baselines3)
TODO: Add your code
```python
from stable_baselines3 import PPO
from huggingface_sb3 import load_from_hub
...
```
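
Until the usage code above is filled in, the following is a minimal sketch of how a checkpoint like this one could be loaded and scored. It rests on several assumptions: `repo_id` and `filename` are placeholders that this card does not specify, `env` is a Pixelcopter-PLE-v0 environment you have already created, and `format_mean_reward` and `load_and_evaluate` are hypothetical helper names. The `mean +/- std` format is assumed to match the `mean_reward` entry in the metadata; the card does not say how that value was computed.

```python
from statistics import mean, pstdev


def format_mean_reward(episode_rewards):
    """Format per-episode returns as 'mean +/- std' (population std, two decimals)."""
    return f"{mean(episode_rewards):.2f} +/- {pstdev(episode_rewards):.2f}"


def load_and_evaluate(repo_id, filename, env, n_eval_episodes=10):
    """Download a PPO checkpoint from the Hub and evaluate it on `env`.

    `repo_id` and `filename` are placeholders supplied by the caller;
    this card does not state them.
    """
    # Imported here so the formatting helper works without SB3 installed.
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    # load_from_hub returns the local path of the downloaded checkpoint.
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)
    model = PPO.load(checkpoint_path)

    # return_episode_rewards=True yields per-episode returns and lengths.
    episode_rewards, _episode_lengths = evaluate_policy(
        model,
        env,
        n_eval_episodes=n_eval_episodes,
        deterministic=True,
        return_episode_rewards=True,
    )
    return format_mean_reward(episode_rewards)
```

Calling `load_and_evaluate("<user>/<repo>", "<checkpoint>.zip", env)` with your own values returns a string in the same shape as the reported metric. Given the large spread reported above, a small number of evaluation episodes may give noticeably different results from run to run.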