jostyposty committed
Commit 3261e0d
Parent(s): 03a70ee
feat: add four models
Browse files
- .gitignore +1 -0
- README.md +77 -0
- config-ppo-LunarLander-v2_001_000_000_hf_defaults.json +125 -0
- config-ppo-LunarLander-v2_010_000_000_sb3_defaults.json +125 -0
- config-ppo-LunarLander-v2_123_456_789_hf_defaults.json +125 -0
- config.json +125 -0
- evaluate.py +69 -0
- evaluation_results.csv +0 -0
- hf_helpers/__init__.py +0 -0
- hf_helpers/gym_video.py +82 -0
- hf_helpers/hf_sb3.py +25 -0
- hf_helpers/readme.md +1 -0
- hf_helpers/sb3_eval.py +89 -0
- main.py +57 -0
- ppo-LunarLander-v2_001_000_000_hf_defaults.zip +3 -0
- ppo-LunarLander-v2_010_000_000_hf_defaults.zip +3 -0
- ppo-LunarLander-v2_010_000_000_sb3_defaults.zip +3 -0
- ppo-LunarLander-v2_123_456_789_hf_defaults.zip +3 -0
- results.json +7 -0
- video.mp4 +0 -0
.gitignore ADDED
@@ -0,0 +1 @@
__pycache__
README.md CHANGED
@@ -1,3 +1,80 @@

---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: ppo-LunarLander-v2_010_000_000_hf_defaults
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: 311.61 +/- 6.23
      name: mean_reward
      verified: false
license: mit
---

# **PPO** Agent playing **LunarLander-v2**
This is a trained model of a **PPO** agent playing **LunarLander-v2**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

## Training
When I first started training, I experimented with different parameter values to see if I could find something that gave better results than others. I ended up just using the defaults provided by Hugging Face (HF); in my experiments, the differences in results between those defaults and the defaults from Stable Baselines3 (SB3) were not that large.

| Defaults name                       | n_steps | batch_size | n_epochs | gamma | gae_lambda | ent_coef |
|-------------------------------------|--------:|-----------:|---------:|------:|-----------:|---------:|
| Hugging Face Defaults (hf_defaults) |   1,024 |         64 |        8 | 0.999 |       0.98 |     0.01 |
| SB3 Defaults (sb3_defaults)         |   2,048 |         64 |       10 |  0.99 |       0.95 |      0.0 |
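As a rough illustration of the hf_defaults row (a sketch, not the actual training script in main.py), training one of these models with SB3 could look like this:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# 32 parallel environments, matching n_envs in the saved configs.
env = make_vec_env("LunarLander-v2", n_envs=32)

# hf_defaults hyperparameters from the table above.
model = PPO(
    "MlpPolicy",
    env,
    n_steps=1024,
    batch_size=64,
    n_epochs=8,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
)
model.learn(total_timesteps=1_000_000)
model.save("ppo-LunarLander-v2_001_000_000_hf_defaults")
```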
## Models
I decided to train and upload four models to test the following: I thought 1,000,000 (1M) timesteps was insufficient, and 123,456,789 (123M) timesteps was excessively time-consuming without a significant improvement in results, while 10,000,000 (10M) timesteps would offer a reasonable balance between training duration and outcomes. I used the defaults from both Hugging Face and Stable Baselines3 when training with 10M timesteps.

| Number | Model name                                  |   timesteps |     Defaults |
|:------:|---------------------------------------------|------------:|-------------:|
| 1      | ppo-LunarLander-v2_001_000_000_hf_defaults  |   1,000,000 |  hf_defaults |
| 2      | ppo-LunarLander-v2_010_000_000_hf_defaults  |  10,000,000 |  hf_defaults |
| 3      | ppo-LunarLander-v2_010_000_000_sb3_defaults |  10,000,000 | sb3_defaults |
| 4      | ppo-LunarLander-v2_123_456_789_hf_defaults  | 123,456,789 |  hf_defaults |

## Evaluation
I evaluated the four models using two approaches:
- Search: search through a lot of different random environments for a good seed
- Average: average over a lot of different random environments

The code in evaluate.py shows how the models are evaluated and how the results are stored; all the results are included in the evaluation_results.csv file. The reported result is mean_reward - std_reward, but I also store mean_reward, std_reward, seed, and n_envs.
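The gist of a single evaluation run (a sketch approximating evaluate.py, not the file verbatim; evaluate_once is a hypothetical helper name) is:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

def evaluate_once(model_path: str, seed: int, n_envs: int) -> dict:
    """Evaluate one model on one seeded set of environments,
    roughly one row of evaluation_results.csv."""
    model = PPO.load(model_path)
    eval_env = make_vec_env("LunarLander-v2", n_envs=n_envs, seed=seed)
    mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
    return {
        "model": model_path,
        "seed": seed,
        "n_envs": n_envs,
        "mean_reward": mean_reward,
        "std_reward": std_reward,
        "result": mean_reward - std_reward,  # the score reported above
    }
```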
### Results
| Model name                                  | Number of results |     Min |     Max | Average |
|:--------------------------------------------|------------------:|--------:|--------:|--------:|
| ppo-LunarLander-v2_001_000_000_hf_defaults  |              4136 | 144.712 | 269.721 | 240.895 |
| ppo-LunarLander-v2_010_000_000_hf_defaults  |              4136 | 130.43  | 305.384 | 270.451 |
| ppo-LunarLander-v2_010_000_000_sb3_defaults |              4136 | 87.9966 | 298.898 | 269.568 |
| ppo-LunarLander-v2_123_456_789_hf_defaults  |              4136 | 141.814 | 302.567 | 268.735 |

### Conclusion
As suspected, the 1M model performed the worst. I really do not think there are significant differences between the two 10M models and the 123M model.
## Disclaimer regarding the evaluation result
I am not fond of the randomness introduced by the current method for evaluating a model. As you can see, I tested the same model with different seeds and different numbers of parallel environments, and I got quite varied results. I have not manually inflated the score, nor used a lower n_eval_episodes (which would tend to give a better result, since there is less to average over). But, as can be seen in evaluation_results.csv, I did "mine" for a good seed before sharing.

### A better way to evaluate the models?
Perhaps we should average over more environments? Wouldn't this give a result less prone to the randomness of the environments? When averaging over environments, we get a much more stable result, so this could be a better way of evaluating models for use in a leaderboard. In short: n_eval_episodes=10, averaged over at least 10 different random environments.
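A sketch of that proposed protocol (hypothetical, not part of evaluate.py) might be:

```python
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

model = PPO.load("ppo-LunarLander-v2_010_000_000_hf_defaults")

# n_eval_episodes=10 per seed, averaged over at least 10 random seeds.
means = []
for seed in range(10):
    eval_env = make_vec_env("LunarLander-v2", n_envs=1, seed=seed)
    mean_reward, _ = evaluate_policy(model, eval_env, n_eval_episodes=10)
    means.append(mean_reward)
print(f"score: {np.mean(means):.2f} +/- {np.std(means):.2f}")
```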
## Usage (with Stable-baselines3)

```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO

checkpoint = load_from_hub("jostyposty/drl-course-unit-01-lunar-lander-v2", "ppo-LunarLander-v2_010_000_000_hf_defaults.zip")
model = PPO.load(checkpoint)  # TODO: test this
```
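To watch the loaded agent (a sketch reusing `model` from the snippet above, assuming gymnasium with the box2d extra is installed):

```python
import gymnasium as gym

# Render one episode with the trained policy.
env = gym.make("LunarLander-v2", render_mode="human")
obs, info = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```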
config-ppo-LunarLander-v2_001_000_000_hf_defaults.json ADDED
@@ -0,0 +1,125 @@
{
"policy_class": {
":type:": "<class 'abc.ABCMeta'>",
":serialized:": "gAWVOwAAAAAAAACMIXN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5wb2xpY2llc5SMEUFjdG9yQ3JpdGljUG9saWN5lJOULg==",
"__module__": "stable_baselines3.common.policies",
"__doc__": "\n Policy class for actor-critic algorithms (has both policy and value prediction).\n Used by A2C, PPO and the likes.\n\n :param observation_space: Observation space\n :param action_space: Action space\n :param lr_schedule: Learning rate schedule (could be constant)\n :param net_arch: The specification of the policy and value networks.\n :param activation_fn: Activation function\n :param ortho_init: Whether to use or not orthogonal initialization\n :param use_sde: Whether to use State Dependent Exploration or not\n :param log_std_init: Initial value for the log standard deviation\n :param full_std: Whether to use (n_features x n_actions) parameters\n for the std instead of only (n_features,) when using gSDE\n :param use_expln: Use ``expln()`` function instead of ``exp()`` to ensure\n a positive standard deviation (cf paper). It allows to keep variance\n above zero and prevent it from growing too fast. In practice, ``exp()`` is usually enough.\n :param squash_output: Whether to squash the output using a tanh function,\n this allows to ensure boundaries when using gSDE.\n :param features_extractor_class: Features extractor to use.\n :param features_extractor_kwargs: Keyword arguments\n to pass to the features extractor.\n :param share_features_extractor: If True, the features extractor is shared between the policy and value networks.\n :param normalize_images: Whether to normalize images or not,\n dividing by 255.0 (True by default)\n :param optimizer_class: The optimizer to use,\n ``th.optim.Adam`` by default\n :param optimizer_kwargs: Additional keyword arguments,\n excluding the learning rate, to pass to the optimizer\n ",
"__init__": "<function ActorCriticPolicy.__init__ at 0x7f2d8fd011b0>",
"_get_constructor_parameters": "<function ActorCriticPolicy._get_constructor_parameters at 0x7f2d8fd01240>",
"reset_noise": "<function ActorCriticPolicy.reset_noise at 0x7f2d8fd012d0>",
"_build_mlp_extractor": "<function ActorCriticPolicy._build_mlp_extractor at 0x7f2d8fd01360>",
"_build": "<function ActorCriticPolicy._build at 0x7f2d8fd013f0>",
"forward": "<function ActorCriticPolicy.forward at 0x7f2d8fd01480>",
"extract_features": "<function ActorCriticPolicy.extract_features at 0x7f2d8fd01510>",
"_get_action_dist_from_latent": "<function ActorCriticPolicy._get_action_dist_from_latent at 0x7f2d8fd015a0>",
"_predict": "<function ActorCriticPolicy._predict at 0x7f2d8fd01630>",
"evaluate_actions": "<function ActorCriticPolicy.evaluate_actions at 0x7f2d8fd016c0>",
"get_distribution": "<function ActorCriticPolicy.get_distribution at 0x7f2d8fd01750>",
"predict_values": "<function ActorCriticPolicy.predict_values at 0x7f2d8fd017e0>",
"__abstractmethods__": "frozenset()",
"_abc_impl": "<_abc._abc_data object at 0x7f2d8fcf67c0>"
},
"verbose": 0,
"policy_kwargs": {},
"num_timesteps": 1015808,
"_total_timesteps": 1000000,
"_num_timesteps_at_start": 0,
"seed": null,
"action_noise": null,
"start_time": 1708464106672864008,
"learning_rate": 0.0003,
"tensorboard_log": null,
"_last_obs": {
":type:": "<class 'numpy.ndarray'>",
":serialized:": "gAWVdQQAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYABAAAAAAAAGbmnbrhCt26snE1u6CkODi+HSs74BrWOQAAgD8AAIA/M/N8PVx7WrozR5o6ANUqNvGUijt+L7G5AACAPwAAgD/AqpE99O6VvHsMrr1BMoW9KxxJPVi9YD4AAIA/AACAPw0Ys732wHi6feBtOSAcKzWrhje7iFGItwAAgD8AAAAAZmJ5vFwsqT/jPeG9x82zvta98DsycOG8AAAAAAAAAAAzdoY89iQwut1Qvjsyogs4VbhZO2NdCrYAAIA/AACAP03cRD2Phmi6Lh6vOhdXJTbwm2W4WZwaNQAAgD8AAIA/2je8vZthcz/wm4C8MfbTvlvMprxZsSi9AAAAAAAAAADNsGi9uHbzuU0CQLm8MjyzDmjktubyXjgAAIA/AACAPwCg0roUDLO6u6/+uxCsmDtqddE6KkiIPAAAgD8AAIA/gJ5qvRQQmrrzgKq79/ZfNvs+S7svOME6AACAPwAAgD8Ad6e82NGYP+RLSb2LzZO+0IIHOlrUpzwAAAAAAAAAAM0+QzyuYYi6shlfu8t9KTi8KkI79zQAOgAAgD8AAIA/jXsAPvj0Kz9F/J281kqsviTmvj0X1ii7AAAAAAAAAAAAbvc89gQdutBdA7o300u1fvmfudnmGjkAAIA/AACAP5pd0rvDlVS6z425Ow/lMzjWyDK7q1iUtwAAgD8AAIA/ZslCvey5ybkCVzw8I+6GNdDaSjsGpYg0AACAPwAAgD/NWhS8KUgCuh7pAbnrJTAzf81luxhRGDgAAIA/AACAP83jwbwfLei5m6xftvfYYzCFQKW6V0CFNQAAgD8AAIA/miUavBMgoz9gPe+8y2G1vh1vxbsmMbk8AAAAAAAAAACt90G+k0ejPrT5rz7bJZe+/AyaPcNOdDwAAAAAAAAAAAA7EL0pSCm64jPfuuTuwrVOHQI5a7/7OQAAgD8AAIA/xqgmvptL3z4VbfY9pqpPviw4lL1IGnU9AAAAAAAAAACAmwi9FBCCumieOjlAFog0IiFcO1DzV7gAAIA/AACAP/PjSr7UyxA/PtFoPnKjjL7mPaK72ywKvQAAAAAAAAAATTksvVzrcboU+iC6sacBtfJtnjprtzs5AACAPwAAgD+a+eS61STXPjbZ5b03Cqi+fH6jveqCHj0AAAAAAAAAADNZJbwpuGe6ImFBuiMOC72QDEI7UtjzPQAAgD8AAAAAZlf6vCloUbo6G8K6IuShM71FiLlLAN85AACAPwAAgD+A+Wc94daAupFJDbi9tk6zgPtuO46JIzcAAIA/AACAP80EsDxc+1u6TiuLuryCiLPSf865bwGgOQAAgD8AAIA/zVZkPcMBLrrwP2w7KpqzNkETmTo2U7E1AACAPwAAgD+UjAVudW1weZSMBWR0eXBllJOUjAJmNJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRiSyBLCIaUjAFDlHSUUpQu"
},
"_last_episode_starts": {
":type:": "<class 'numpy.ndarray'>",
":serialized:": "gAWVkwAAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlIwFbnVtcHmUjAVkdHlwZZSTlIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksghZSMAUOUdJRSlC4="
},
"_last_original_obs": null,
"_episode_num": 0,
"use_sde": false,
"sde_sample_freq": -1,
"_current_progress_remaining": -0.015808000000000044,
"_stats_window_size": 100,
"ep_info_buffer": {
":type:": "<class 'collections.deque'>",
":serialized:": "gAWVQgwAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKUKH2UKIwBcpRHQGGWG2TgVGmMAWyUTegDjAF0lEdAe/VybhFVk3V9lChoBkdAZmz+NLlFMWgHTegDaAhHQHv40YXO4Xp1fZQoaAZHQGbdXRgJC0FoB03oA2gIR0B7+FgF5fMOdX2UKGgGR0BkU6eI2wV1aAdN6ANoCEdAe/4Qgs9SuXV9lChoBkdAXVd9JBgNPWgHTegDaAhHQHv5yhSLqD91fZQoaAZHQGZKDMeOn2toB03oA2gIR0B7+z4yoGY8dX2UKGgGR0BiZ7xZuAI6aAdN6ANoCEdAe/3fzz3AVXV9lChoBkdAZdq+/QBxP2gHTegDaAhHQHwAcWO6unx1fZQoaAZHQGOElBppN9JoB03oA2gIR0B7/eyxA0KrdX2UKGgGR0Bi95ZntfG/aAdN6ANoCEdAe/4We6I3znV9lChoBkdAYubRaX8fm2gHTegDaAhHQHwDoqLCN0h1fZQoaAZHQGEkgJb+tKZoB03oA2gIR0B8BBy6tknUdX2UKGgGR0AiLVe8f3evaAdL4mgIR0B8CXpaA4GVdX2UKGgGR0BmjouAZsKtaAdN6ANoCEdAfAUGTLW7OHV9lChoBkdAXb8AWBSUDGgHTegDaAhHQHwI3tv4ubt1fZQoaAZHQGRXjyOJcgRoB03oA2gIR0B8C3EqDsdDdX2UKGgGR0BgZeW+oLofaAdN6ANoCEdAfAzvX9R77nV9lChoBkdAXu93Ux20RmgHTegDaAhHQHwLqBNEgGN1fZQoaAZHQGgEb5VOsT5oB03oA2gIR0B8DbXFtKqXdX2UKGgGR0BkiCYLLIPtaAdN6ANoCEdAfA5X2/SH/XV9lChoBkdAciNQdCE6DGgHTf4BaAhHQHwPD/VAiV11fZQoaAZHQGHkxubZvk1oB03oA2gIR0B8Ds/5ckdFdX2UKGgGR0Bj8NN8E3bVaAdN6ANoCEdAfBL4ACGN73V9lChoBkdARh0MmWt2cWgHS+BoCEdAfBk5EMLF43V9lChoBkdAZfS2F36hx2gHTegDaAhHQHwdWSpzcRF1fZQoaAZHQGaLGHpKSPloB03oA2gIR0B8GikEcKgJdX2UKGgGR0BjLbakAPupaAdN6ANoCEdAfB2ybhFVk3V9lChoBkdAYxRVwxWT5mgHTegDaAhHQHwgFmOEM9d1fZQoaAZHQGWAS6tknTloB03oA2gIR0B8IAgLZzxPdX2UKGgGR0BkEbKNhmXgaAdN6ANoCEdAfCG8GLUCrHV9lChoBkdAYk6IFeOXFGgHTegDaAhHQHwhlIuoP091fZQoaAZHQGLuoRRMvh9oB03oA2gIR0B8I/FOwgTzdX2UKGgGR0BlOdqcmShbaAdN6ANoCEdAfCfOUt7KJXV9lChoBkdAZ2XlvqC6H2gHTegDaAhHQHzjtL6DXe51fZQoaAZHQF9P5ggHNX5oB03oA2gIR0B85UUg0TDgdX2UKGgGR0Bfa5hjOLR8aAdN6ANoCEdAfOibADaGpXV9lChoBkdAX/CxHG0eEWgHTegDaAhHQHzn8rAgxJx1fZQoaAZHQGD5GJm/WUdoB03oA2gIR0B86UjLSuyNdX2UKGgGR0BkEo7T2FnJaAdN6ANoCEdAfOqxj8UEgXV9lChoBkdAXEUK5TZQHmgHTegDaAhHQHztRBE8aGZ1fZQoaAZHQGRDYqXnhbZoB03oA2gIR0B8797b+Lm7dX2UKGgGR0BkQTQw9JSSaAdN6ANoCEdAfO1fIjnmrHV9lChoBkdAYIzSF49ovmgHTegDaAhHQHztiOzY2891fZQoaAZHQGVm3yy2QXBoB03oA2gIR0B88x4rz5GjdX2UKGgGR0BjknMKTjebaAdN6ANoCEdAfPOuE25xznV9lChoBkdAY0RkuHvc8GgHTegDaAhHQHz5G8Zk0791fZQoaAZHQGdPTLfUF0RoB03oA2gIR0B89KrKeTV2dX2UKGgGR0Bjo42jwhGIaAdN6ANoCEdAfPhp8WsRx3V9lChoBkdAZ0xP4VRDTmgHTegDaAhHQHz6+GXXyy51fZQoaAZHQG9v2KuSwGJoB02NAmgIR0B899DPWxyGdX2UKGgGR0BlzvwG4ZuRaAdN6ANoCEdAfPx/fO2RaHV9lChoBkdAZh5o/zJ6p2gHTegDaAhHQHz7KO5rgwZ1fZQoaAZHQGe9lBY3eepoB03oA2gIR0B8/dmBe5WjdX2UKGgGR0BmxpH/cWTHaAdN6ANoCEdAfP6a3Zwn6XV9lChoBkdAZ8o6reZXuGgHTegDaAhHQHz+XE/B3zN1fZQoaAZHQGEXZ4wAU+NoB03oA2gIR0B9Aqwt8NQTdX2UKGgGR0Bngr6vaDf4aAdN6ANoCEdAfQj0FKTSs3V9lChoBkdAZV5p0OmR/2gHTegDaAhHQH0NKlgtvn91fZQoaAZHQGa8aJQ+EAZoB03oA2gIR0B9DcJhOP/8dX2UKGgGR0BkBTSgGr0baAdN6ANoCEdAfRBbADaGpXV9lChoBkdAZcthDPWxyGgHTegDaAhHQH0QWMS9M9N1fZQoaAZHQGajUQK8cuJoB03oA2gIR0B9EjO4XoC/dX2UKGgGR0Bhe6VD8cdYaAdN6ANoCEdAfRIIWP91l3V9lChoBkdAZa95UtI07GgHTegDaAhHQH0Unl8w5/91fZQoaAZHQGS3Wom5UcZoB03oA2gIR0B9GJlQMx46dX2UKGgGR0Bl498qnWJ8aAdN6ANoCEdAfRg1loUSI3V9lChoBkdAXC5GG21D0GgHTegDaAhHQH3bnQ+lj3F1fZQoaAZHQF7xb/ffoA5oB03oA2gIR0B93wmeDnNgdX2UKGgGR0BhNfTVlPJraAdN6ANoCEdAfd7qEeyRjnV9lChoBkdAYrCJb+tKZmgHTegDaAhHQH3gwKKHfuV1fZQoaAZHQGYye5vtMPBoB03oA2gIR0B94kTGo73gdX2UKGgGR0Bjvsm4RVZLaAdN6ANoCEdAfeUZ5iVjZ3V9lChoBkdAYBKPBi1Aq2gHTegDaAhHQH3nrrcCYC11fZQoaAZHQGg1uEug6EJoB03oA2gIR0B95UU21lXjdX2UKGgGR0BgSjpNbkfcaAdN6ANoCEdAfeVqbz9S/HV9lChoBkdAYF0xTsIE82gHTegDaAhHQH3r1eOXE611fZQoaAZHQGH8z0g8r7RoB03oA2gIR0B97JWxQizLdX2UKGgGR0Bg6iXKKYReaAdN6ANoCEdAffKgCOmzjXV9lChoBkdAYpijYZl4DGgHTegDaAhHQH3uL8R+SbJ1fZQoaAZHQGJtpkXk5p9oB03oA2gIR0B98lthuwX7dX2UKGgGR0BmR9mcvugIaAdN6ANoCEdAffTxEv0yxnV9lChoBkdAZ4R7cfvF32gHTegDaAhHQH3x/crRSgp1fZQoaAZHQGOa4Dklu3toB03oA2gIR0B99w9zOopAdX2UKGgGR0Blu30EovzwaAdN6ANoCEdAffZu7pV0cXV9lChoBkdAY3yCz1K5CmgHTegDaAhHQH36BYJVsDZ1fZQoaAZHQGaVKtYB/7VoB03oA2gIR0B9+uLehwl0dX2UKGgGR0BkEVgnc+JQaAdN6ANoCEdAffqkUsWfsnV9lChoBkdAYlj7mdRR/GgHTegDaAhHQH3/6Kcd5pt1fZQoaAZHQGEEEmhM8HRoB03oA2gIR0B+CLBYV6/qdX2UKGgGR0Bkr5LZi/fwaAdN6ANoCEdAfgzhddE9dXV9lChoBkdAYlXZ1V5rxmgHTegDaAhHQH4OffTCtRx1fZQoaAZHQGZOb5M10kpoB03oA2gIR0B+ElqCYkVvdX2UKGgGR0BlUOPBBRhuaAdN6ANoCEdAfhJaTfR/mXV9lChoBkdAXLbSXt0FKWgHTegDaAhHQH4UgmeDnNh1fZQoaAZHQGYdemWMS9NoB03oA2gIR0B+FF79hqj8dX2UKGgGR0BgEutGNJe3aAdN6ANoCEdAfhgWS2Yv4HV9lChoBkdAXaps1sLv1GgHTegDaAhHQH4cP5ULlV91fZQoaAZHQGW1tC7btZ5oB03oA2gIR0B+HV2Pkq+bdX2UKGgGR0Bg/a3G4qgAaAdN6ANoCEdAfh8QXQ+lj3V9lChoBkdAZW4bqhUR4GgHTegDaAhHQH4izRc/t6Z1ZS4="
},
"ep_success_buffer": {
":type:": "<class 'collections.deque'>",
":serialized:": "gAWVIAAAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKULg=="
},
"_n_updates": 248,
"observation_space": {
":type:": "<class 'gymnasium.spaces.box.Box'>",
":serialized:": "gAWVdgIAAAAAAACMFGd5bW5hc2l1bS5zcGFjZXMuYm94lIwDQm94lJOUKYGUfZQojAVkdHlwZZSMBW51bXB5lIwFZHR5cGWUk5SMAmY0lImIh5RSlChLA4wBPJROTk5K/////0r/////SwB0lGKMDWJvdW5kZWRfYmVsb3eUjBJudW1weS5jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWCAAAAAAAAAABAQEBAQEBAZRoCIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksIhZSMAUOUdJRSlIwNYm91bmRlZF9hYm92ZZRoESiWCAAAAAAAAAABAQEBAQEBAZRoFUsIhZRoGXSUUpSMBl9zaGFwZZRLCIWUjANsb3eUaBEoliAAAAAAAAAAAAC0wgAAtMIAAKDAAACgwNsPScAAAKDAAAAAgAAAAICUaAtLCIWUaBl0lFKUjARoaWdolGgRKJYgAAAAAAAAAAAAtEIAALRCAACgQAAAoEDbD0lAAACgQAAAgD8AAIA/lGgLSwiFlGgZdJRSlIwIbG93X3JlcHKUjFtbLTkwLiAgICAgICAgLTkwLiAgICAgICAgIC01LiAgICAgICAgIC01LiAgICAgICAgIC0zLjE0MTU5MjcgIC01LgogIC0wLiAgICAgICAgIC0wLiAgICAgICBdlIwJaGlnaF9yZXBylIxTWzkwLiAgICAgICAgOTAuICAgICAgICAgNS4gICAgICAgICA1LiAgICAgICAgIDMuMTQxNTkyNyAgNS4KICAxLiAgICAgICAgIDEuICAgICAgIF2UjApfbnBfcmFuZG9tlE51Yi4=",
"dtype": "float32",
"bounded_below": "[ True True True True True True True True]",
"bounded_above": "[ True True True True True True True True]",
"_shape": [
8
],
"low": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
"high": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
"low_repr": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
"high_repr": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
"_np_random": null
},
"action_space": {
":type:": "<class 'gymnasium.spaces.discrete.Discrete'>",
":serialized:": "gAWV/QAAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwBbpSMFW51bXB5LmNvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlIwFbnVtcHmUjAVkdHlwZZSTlIwCaTiUiYiHlFKUKEsDjAE8lE5OTkr/////Sv////9LAHSUYkMIBAAAAAAAAACUhpRSlIwFc3RhcnSUaAhoDkMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMBWR0eXBllGgLjAJpOJSJiIeUUpQoSwNoD05OTkr/////Sv////9LAHSUYowKX25wX3JhbmRvbZROdWIu",
"n": "4",
"start": "0",
"_shape": [],
"dtype": "int64",
"_np_random": null
},
"n_envs": 32,
"n_steps": 1024,
"gamma": 0.999,
"gae_lambda": 0.98,
"ent_coef": 0.01,
"vf_coef": 0.5,
"max_grad_norm": 0.5,
"rollout_buffer_class": {
":type:": "<class 'abc.ABCMeta'>",
":serialized:": "gAWVNgAAAAAAAACMIHN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5idWZmZXJzlIwNUm9sbG91dEJ1ZmZlcpSTlC4=",
"__module__": "stable_baselines3.common.buffers",
"__annotations__": "{'observations': <class 'numpy.ndarray'>, 'actions': <class 'numpy.ndarray'>, 'rewards': <class 'numpy.ndarray'>, 'advantages': <class 'numpy.ndarray'>, 'returns': <class 'numpy.ndarray'>, 'episode_starts': <class 'numpy.ndarray'>, 'log_probs': <class 'numpy.ndarray'>, 'values': <class 'numpy.ndarray'>}",
"__doc__": "\n Rollout buffer used in on-policy algorithms like A2C/PPO.\n It corresponds to ``buffer_size`` transitions collected\n using the current policy.\n This experience will be discarded after the policy update.\n In order to use PPO objective, we also store the current value of each state\n and the log probability of each taken action.\n\n The term rollout here refers to the model-free notion and should not\n be used with the concept of rollout used in model-based RL or planning.\n Hence, it is only involved in policy and value function training but not action selection.\n\n :param buffer_size: Max number of element in the buffer\n :param observation_space: Observation space\n :param action_space: Action space\n :param device: PyTorch device\n :param gae_lambda: Factor for trade-off of bias vs variance for Generalized Advantage Estimator\n Equivalent to classic advantage when set to 1.\n :param gamma: Discount factor\n :param n_envs: Number of parallel environments\n ",
"__init__": "<function RolloutBuffer.__init__ at 0x7f2d8fe97520>",
"reset": "<function RolloutBuffer.reset at 0x7f2d8fe975b0>",
"compute_returns_and_advantage": "<function RolloutBuffer.compute_returns_and_advantage at 0x7f2d8fe97640>",
"add": "<function RolloutBuffer.add at 0x7f2d8fe976d0>",
"get": "<function RolloutBuffer.get at 0x7f2d8fe97760>",
"_get_samples": "<function RolloutBuffer._get_samples at 0x7f2d8fe977f0>",
"__abstractmethods__": "frozenset()",
"_abc_impl": "<_abc._abc_data object at 0x7f2d8fe8c900>"
},
"rollout_buffer_kwargs": {},
"batch_size": 64,
"n_epochs": 8,
"clip_range": {
":type:": "<class 'function'>",
":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHP8mZmZmZmZqFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
},
"clip_range_vf": null,
"normalize_advantage": true,
"target_kl": null,
"lr_schedule": {
":type:": "<class 'function'>",
":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHPzOpKjBVMmGFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
},
"system_info": {
"OS": "Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35 # 1 SMP Thu Oct 5 21:02:42 UTC 2023",
"Python": "3.10.13",
"Stable-Baselines3": "2.2.1",
"PyTorch": "2.2.0+cu121",
"GPU Enabled": "True",
"Numpy": "1.26.4",
"Cloudpickle": "3.0.0",
"Gymnasium": "0.28.1"
}
}
config-ppo-LunarLander-v2_010_000_000_sb3_defaults.json ADDED
@@ -0,0 +1,125 @@
{
"policy_class": {
":type:": "<class 'abc.ABCMeta'>",
":serialized:": "gAWVOwAAAAAAAACMIXN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5wb2xpY2llc5SMEUFjdG9yQ3JpdGljUG9saWN5lJOULg==",
"__module__": "stable_baselines3.common.policies",
"__doc__": "\n Policy class for actor-critic algorithms (has both policy and value prediction).\n Used by A2C, PPO and the likes.\n\n :param observation_space: Observation space\n :param action_space: Action space\n :param lr_schedule: Learning rate schedule (could be constant)\n :param net_arch: The specification of the policy and value networks.\n :param activation_fn: Activation function\n :param ortho_init: Whether to use or not orthogonal initialization\n :param use_sde: Whether to use State Dependent Exploration or not\n :param log_std_init: Initial value for the log standard deviation\n :param full_std: Whether to use (n_features x n_actions) parameters\n for the std instead of only (n_features,) when using gSDE\n :param use_expln: Use ``expln()`` function instead of ``exp()`` to ensure\n a positive standard deviation (cf paper). It allows to keep variance\n above zero and prevent it from growing too fast. In practice, ``exp()`` is usually enough.\n :param squash_output: Whether to squash the output using a tanh function,\n this allows to ensure boundaries when using gSDE.\n :param features_extractor_class: Features extractor to use.\n :param features_extractor_kwargs: Keyword arguments\n to pass to the features extractor.\n :param share_features_extractor: If True, the features extractor is shared between the policy and value networks.\n :param normalize_images: Whether to normalize images or not,\n dividing by 255.0 (True by default)\n :param optimizer_class: The optimizer to use,\n ``th.optim.Adam`` by default\n :param optimizer_kwargs: Additional keyword arguments,\n excluding the learning rate, to pass to the optimizer\n ",
"__init__": "<function ActorCriticPolicy.__init__ at 0x7f0c58a111b0>",
"_get_constructor_parameters": "<function ActorCriticPolicy._get_constructor_parameters at 0x7f0c58a11240>",
"reset_noise": "<function ActorCriticPolicy.reset_noise at 0x7f0c58a112d0>",
"_build_mlp_extractor": "<function ActorCriticPolicy._build_mlp_extractor at 0x7f0c58a11360>",
"_build": "<function ActorCriticPolicy._build at 0x7f0c58a113f0>",
"forward": "<function ActorCriticPolicy.forward at 0x7f0c58a11480>",
"extract_features": "<function ActorCriticPolicy.extract_features at 0x7f0c58a11510>",
"_get_action_dist_from_latent": "<function ActorCriticPolicy._get_action_dist_from_latent at 0x7f0c58a115a0>",
"_predict": "<function ActorCriticPolicy._predict at 0x7f0c58a11630>",
"evaluate_actions": "<function ActorCriticPolicy.evaluate_actions at 0x7f0c58a116c0>",
"get_distribution": "<function ActorCriticPolicy.get_distribution at 0x7f0c58a11750>",
"predict_values": "<function ActorCriticPolicy.predict_values at 0x7f0c58a117e0>",
"__abstractmethods__": "frozenset()",
"_abc_impl": "<_abc._abc_data object at 0x7f0c58a05e80>"
},
"verbose": 0,
"policy_kwargs": {},
"num_timesteps": 10027008,
"_total_timesteps": 10000000,
"_num_timesteps_at_start": 0,
"seed": null,
"action_noise": null,
"start_time": 1708166080608340309,
"learning_rate": 0.0003,
"tensorboard_log": null,
"_last_obs": {
":type:": "<class 'numpy.ndarray'>",
":serialized:": "gAWVdQQAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYABAAAAAAAAEATm7327HC6qn9ms3Nx8q6jbNC6fsPEMwAAgD8AAIA/jdwGPoPAFD/tt7s9qwhlv95f7j4xSKC9AAAAAAAAAABTFko+bgDLPsZLgb7mpke/8nyEPvWPib4AAAAAAAAAAI0bz708Loo/fKAvvn+GUL/VVIa+NCkwvgAAAAAAAAAAM9QGPbZVLbzcKbK9ztc/PSvV7TzfEpy8AACAPwAAgD8aqn89DmB7Pw9JFT6Ldmu/yptDPnjJJz4AAAAAAAAAALPVNT6TNDA/eniqvfFKOb8ir6k+fL2MvgAAAAAAAAAApiYtviKhfT9rSa6+Vu0Xv+Qqwr4BmZu+AAAAAAAAAADNvns8aQSdPxNEsD1u7zG/lO4DO0rPnDwAAAAAAAAAAA0rkj24TN86FdJPvlDWp76UcQe9IoutvQAAAAAAAAAABgYJvvuTej+w7U6+8bE7v6Wbzb54ZF6+AAAAAAAAAAAzYJg9DOuzPwLTYz4Ngd6+MHDyPNNY1jwAAAAAAAAAAIZbP75kplw/bz2ivWbHJb+Fjwe/lsJ5vQAAAAAAAAAAM31+vAozDrsWdWk95QaJPM5bFbxCRm09AACAPwAAgD9mWoy8H+2mufB7WztVdhU9SzYEOqVz/r0AAIA/AACAP2YuwrzsosK7kpZ9vP2E+jyKUbe7uNKQNwAAgD8AAIA/zaSHPXy4qT/zt/s+KZjsvmnKLj0EvZg+AAAAAAAAAADAVpq9Hv+gPySqTL6fFxy/SASVvQKIj74AAAAAAAAAAM0ZK73DkWG6eFV0tzZAmrL4E4o7jt2ONgAAgD8AAIA/MyfJu3bqtj+fTQW9BihKvhNVYLprEG45AAAAAAAAAACz+pQ97H6mPibmpL16uTO/VJanPb63B74AAAAAAAAAAKDIYz5zo6w+krDOvm4VRr8zdyM+NWeWvgAAAAAAAAAAhkhNPq34OD/dWLU9xy8ivwTg/z6x3Bi9AAAAAAAAAAAA9aE9PQREu2chw748aHG9sqyMPI0xbj8AAAAAAACAP6PjgL6yfPs+JmWRPsqxU7+EDsO+g1zdPgAAAAAAAAAAZo27PPaYc7pBORw3A4ybMqSKCjvikDG2AACAPwAAgD9NsS09cUcju1YwCb3/CSs9bHcbvC5QDT0AAIA/AACAPzMDhzzsf9C7ciyjvS15YjxxzIo8ATEHvgAAgD8AAIA/pvCHvXHXoj++x/++CfQov14jM73qlIu+AAAAAAAAAADAwJ69Pdhyu2UoHz1RqVk9vM0cPI5d+jwAAIA/AACAPwBJKr1cGyG6dn7hNafDXDFxMkc76skHtQAAgD8AAIA/puiOPd4rlT1m+4m+9AfMvhckZ71VrCO+AAAAAAAAAACUjAVudW1weZSMBWR0eXBllJOUjAJmNJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRiSyBLCIaUjAFDlHSUUpQu"
},
"_last_episode_starts": {
":type:": "<class 'numpy.ndarray'>",
":serialized:": "gAWVkwAAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlIwFbnVtcHmUjAVkdHlwZZSTlIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksghZSMAUOUdJRSlC4="
},
"_last_original_obs": null,
"_episode_num": 0,
"use_sde": false,
"sde_sample_freq": -1,
"_current_progress_remaining": -0.0027007999999999477,
"_stats_window_size": 100,
"ep_info_buffer": {
":type:": "<class 'collections.deque'>",
":serialized:": "gAWV4AsAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKUKH2UKIwBcpRHQHJX+0TlDF+MAWyUS3iMAXSUR0C2AIrah6BzdX2UKGgGR0Bz9euxKQJYaAdLr2gIR0C2AHG5tm+TdX2UKGgGR0Bx54j0L+glaAdLo2gIR0C2AM1efI0ZdX2UKGgGR0Bz7T5dnkDIaAdL0mgIR0C2AISZ4Oc2dX2UKGgGR0BzpYVk+X7caAdLkmgIR0C2ASABYFJQdX2UKGgGR0Bwfyp0fYBeaAdLmmgIR0C2AJJiZv1ldX2UKGgGR0ByWtnscABDaAdLrmgIR0C2ALwiu+yrdX2UKGgGR0ByKEF/x2B8aAdLh2gIR0C2ARHeBQN1dX2UKGgGR0BzQvSWqtHQaAdLmmgIR0C2AS+biIcjdX2UKGgGR0ByPiPEKmbcaAdLpGgIR0C2AI0WZZ0TdX2UKGgGR0Bx9GPIXCTEaAdLrGgIR0C2ASjZlFtsdX2UKGgGR0BxBSABkqc3aAdLhGgIR0C2ASd9x6v8dX2UKGgGR0Bv84X40uUVaAdLjWgIR0C2AMvR/mT1dX2UKGgGR0BMemgzxgAqaAdLhmgIR0C2ANSu2Zy/dX2UKGgGR0Bz3vu4PPLQaAdLvGgIR0C2AMsbWEsbdX2UKGgGR0Byoakj5bhWaAdLo2gIR0C2AMgPqcEvdX2UKGgGR0BwJuJWNm16aAdLimgIR0C2AP4gV45cdX2UKGgGR0BxtIZLqUu+aAdLmGgIR0C2ANX9R77bdX2UKGgGR0Bz7u+7Dl5oaAdLuWgIR0C2ALQX/HYIdX2UKGgGR0ByWC8g6ltTaAdLo2gIR0C2AU0DZDiPdX2UKGgGR0Bx8Cqn3ta7aAdLsWgIR0C2ARdj5KvndX2UKGgGR0BzLWxRl6JJaAdLk2gIR0C2AQc1Gb1AdX2UKGgGR0BynjzmOlwcaAdLjGgIR0C2AVmvfTCtdX2UKGgGR0BxuuMUAT7EaAdLlmgIR0C2AVxEv0yydX2UKGgGR0BzWY4JeE7GaAdLpmgIR0C2APfIn0CjdX2UKGgGR0BxjKrdWQwLaAdLfWgIR0C2AMVC1JDmdX2UKGgGR0Bx8t6rvLHNaAdLh2gIR0C2APW8dxQ0dX2UKGgGR0ByGDiIcinpaAdLqGgIR0C2AWg1rIo3dX2UKGgGR0ByTGg6EJ0GaAdLoWgIR0C2AMFWKdhBdX2UKGgGR0BydAF+uvECaAdLtGgIR0C2ANUTHsC1dX2UKGgGR0Bz/wzuWrwOaAdLlmgIR0C2ANzZtelbdX2UKGgGR0BxePNgSeyzaAdLjGgIR0C2ASqagElmdX2UKGgGR0By/jRKHwgDaAdLs2gIR0C2ANhP420idX2UKGgGR0Bz3Nf9gnc+aAdLpWgIR0C2APnqu8sddX2UKGgGR0BwloyzollcaAdLk2gIR0C2APR7eEZjdX2UKGgGR0BxlJKbrkbQaAdLtWgIR0C2ATUE5hjOdX2UKGgGR0BwnT56+nIiaAdLkGgIR0C2AXGNvOyFdX2UKGgGR0Bwppc3VCokaAdLkGgIR0C2AY9bX6IndX2UKGgGR0BzgHmHP/rCaAdLmmgIR0C2ASLtqpLmdX2UKGgGR0Bxr4prk8zRaAdLemgIR0C2AR4uscQzdX2UKGgGR0Bwpy9FnZkDaAdLhWgIR0C2AYKfOD8MdX2UKGgGR0B0TG4vvjOtaAdLzmgIR0C2ARCF49owdX2UKGgGR0BzFF+EytV8aAdLhmgIR0C2AVl3Y+SsdX2UKGgGR0By4r+JgsshaAdLp2gIR0C2AP4YJmdzdX2UKGgGR0BzDsC5mRNiaAdLg2gIR0C2AQ0mD15CdX2UKGgGR0BzzfpiZv1laAdL02gIR0C2Aa+R5kbxdX2UKGgGR0BxtgxIre67aAdLlGgIR0C2ATrhrFfidX2UKGgGR0Bx/6yB06o3aAdLimgIR0C2AXWrKeTWdX2UKGgGR0Bxv4Sh8IAwaAdLomgIR0C2AUMH8jzJdX2UKGgGR0BwgW2Yv38GaAdLnWgIR0C2ATZhnanKdX2UKGgGR0BxrFvybx3FaAdLrGgIR0C2AZ0SqU/wdX2UKGgGR0BwEWwqy4WlaAdLl2gIR0C2AbQ9FF2FdX2UKGgGR0BzDg2vStvGaAdLp2gIR0C2ATnjU/fPdX2UKGgGR0ByPAQtjCpFaAdLoGgIR0C2AXUqc3ERdX2UKGgGR0BxGPayrxRVaAdLj2gIR0C2Act4NZvDdX2UKGgGR0BvX6rilzltaAdLmGgIR0C2AS5PAO8TdX2UKGgGR0BzBQEfT1CgaAdLlGgIR0C2ASfwiJO4dX2UKGgGR0Bwe7AIppevaAdLomgIR0C2AWduHerNdX2UKGgGR0BxC+ZnctXgaAdLlmgIR0C2AT0Q04zadX2UKGgGR0BzfdpXZGrkaAdLrmgIR0C2AdPv8ZUDdX2UKGgGR0Bxu7MA3kxRaAdLiGgIR0C2AZQq7ROUdX2UKGgGR0ByJ2S9ugpSaAdLsWgIR0C2AXDcRDkVdX2UKGgGR0BzOOxt52QoaAdLjWgIR0C2AVcAaNuMdX2UKGgGR0BwNFW/8EV4aAdLjWgIR0C2AdP2kBS2dX2UKGgGR0ByR5IsiB5HaAdLrWgIR0C2AaNITXardX2UKGgGR0BxzFY9xIataAdLimgIR0C2AYNKNAC5dX2UKGgGR0BzjGtcOby6aAdLz2gIR0C2Aeh0yP+5dX2UKGgGR0BzctW+49X+aAdLo2gIR0C2AWrbQC0XdX2UKGgGR0BzJwU+LWI5aAdLt2gIR0C2AVearmyPdX2UKGgGR0BzthmmLtNSaAdLvWgIR0C2AWCoCMgmdX2UKGgGR0BzhGax5cC6aAdLrWgIR0C2Agc1TBIndX2UKGgGR0BzJk9V3ljmaAdLqGgIR0C2AfZML4N7dX2UKGgGR0Bx4T8BMi8naAdLl2gIR0C2AXe1F6RhdX2UKGgGR0BwWf0cwQDnaAdLl2gIR0C2AXRgVoHtdX2UKGgGR0By+NzEJjUeaAdLj2gIR0C2AZxGlQ/HdX2UKGgGR0Bw2BzPrv9caAdLhmgIR0C2Ag+LiuMddX2UKGgGR0ByR0ySFGoaaAdLoWgIR0C2Acd38n/ldX2UKGgGR0BxiTel9BrvaAdLomgIR0C2AWzAvcrRdX2UKGgGR0BwHaWt2cJ/aAdLjGgIR0C2AfxwVCXydX2UKGgGR0BzFVr9ETg3aAdLmWgIR0C2Ad4J/oaDdX2UKGgGR0B0AghMajveaAdLwWgIR0C2AaJ2Qnx8dX2UKGgGR0BxT6wNb1RMaAdLlGgIR0C2AZsERraedX2UKGgGR0ByEJ5MURFraAdLmWgIR0C2AasU21lYdX2UKGgGR0BwQAflp48maAdLkGgIR0C2AZvXwsoVdX2UKGgGR0By/PN/vv0AaAdLtGgIR0C2AimsJY1YdX2UKGgGR0BxUj9qDbrUaAdLk2gIR0C2AdkFbFCLdX2UKGgGR0Bxlr6qKgqWaAdLjGgIR0C2AYZyZKFqdX2UKGgGR0Bxk3i83++/aAdLnmgIR0C2AjZjc2zfdX2UKGgGR0BwtpoTPBznaAdLjmgIR0C2AZ3IZIhAdX2UKGgGR0By7iPNmlImaAdLoGgIR0C2AZsSoOx0dX2UKGgGR0Bxf15v99+gaAdLmWgIR0C2Ac97a7EpdX2UKGgGR0Bw4peAuqWDaAdLk2gIR0C2AjgwsXizdX2UKGgGR0BwB9kpZwGXaAdLj2gIR0C2AdGYF7ladX2UKGgGR0Bwk1J9RaX8aAdLh2gIR0C2Af6+i8FqdX2UKGgGR0BwXwdYGMXKaAdLk2gIR0C2Afekxh2GdX2UKGgGR0By5NikO7QLaAdLj2gIR0C2Abe7cwg1dX2UKGgGR0BwQr4k/r0KaAdLkWgIR0C2AeXfdhy9dX2UKGgGR0Byp9W6shgWaAdLjWgIR0C2AcrLpzLfdX2UKGgGR0BIo7BfrrxBaAdLWWgIR0C2Aht8JD3NdX2UKGgGR0BxIS8xsVL0aAdLmmgIR0C2AjzWCmMwdWUu"
},
"ep_success_buffer": {
":type:": "<class 'collections.deque'>",
":serialized:": "gAWVIAAAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKULg=="
},
"_n_updates": 1530,
"observation_space": {
":type:": "<class 'gymnasium.spaces.box.Box'>",
":serialized:": "gAWVdgIAAAAAAACMFGd5bW5hc2l1bS5zcGFjZXMuYm94lIwDQm94lJOUKYGUfZQojAVkdHlwZZSMBW51bXB5lIwFZHR5cGWUk5SMAmY0lImIh5RSlChLA4wBPJROTk5K/////0r/////SwB0lGKMDWJvdW5kZWRfYmVsb3eUjBJudW1weS5jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWCAAAAAAAAAABAQEBAQEBAZRoCIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksIhZSMAUOUdJRSlIwNYm91bmRlZF9hYm92ZZRoESiWCAAAAAAAAAABAQEBAQEBAZRoFUsIhZRoGXSUUpSMBl9zaGFwZZRLCIWUjANsb3eUaBEoliAAAAAAAAAAAAC0wgAAtMIAAKDAAACgwNsPScAAAKDAAAAAgAAAAICUaAtLCIWUaBl0lFKUjARoaWdolGgRKJYgAAAAAAAAAAAAtEIAALRCAACgQAAAoEDbD0lAAACgQAAAgD8AAIA/lGgLSwiFlGgZdJRSlIwIbG93X3JlcHKUjFtbLTkwLiAgICAgICAgLTkwLiAgICAgICAgIC01LiAgICAgICAgIC01LiAgICAgICAgIC0zLjE0MTU5MjcgIC01LgogIC0wLiAgICAgICAgIC0wLiAgICAgICBdlIwJaGlnaF9yZXBylIxTWzkwLiAgICAgICAgOTAuICAgICAgICAgNS4gICAgICAgICA1LiAgICAgICAgIDMuMTQxNTkyNyAgNS4KICAxLiAgICAgICAgIDEuICAgICAgIF2UjApfbnBfcmFuZG9tlE51Yi4=",
"dtype": "float32",
"bounded_below": "[ True True True True True True True True]",
"bounded_above": "[ True True True True True True True True]",
"_shape": [
8
],
"low": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
"high": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
"low_repr": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
"high_repr": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
"_np_random": null
},
"action_space": {
":type:": "<class 'gymnasium.spaces.discrete.Discrete'>",
":serialized:": "gAWV/QAAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwBbpSMFW51bXB5LmNvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlIwFbnVtcHmUjAVkdHlwZZSTlIwCaTiUiYiHlFKUKEsDjAE8lE5OTkr/////Sv////9LAHSUYkMIBAAAAAAAAACUhpRSlIwFc3RhcnSUaAhoDkMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMBWR0eXBllGgLjAJpOJSJiIeUUpQoSwNoD05OTkr/////Sv////9LAHSUYowKX25wX3JhbmRvbZROdWIu",
"n": "4",
"start": "0",
"_shape": [],
"dtype": "int64",
"_np_random": null
},
"n_envs": 32,
"n_steps": 2048,
"gamma": 0.99,
"gae_lambda": 0.95,
"ent_coef": 0.0,
"vf_coef": 0.5,
"max_grad_norm": 0.5,
"rollout_buffer_class": {
":type:": "<class 'abc.ABCMeta'>",
":serialized:": "gAWVNgAAAAAAAACMIHN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5idWZmZXJzlIwNUm9sbG91dEJ1ZmZlcpSTlC4=",
"__module__": "stable_baselines3.common.buffers",
"__annotations__": "{'observations': <class 'numpy.ndarray'>, 'actions': <class 'numpy.ndarray'>, 'rewards': <class 'numpy.ndarray'>, 'advantages': <class 'numpy.ndarray'>, 'returns': <class 'numpy.ndarray'>, 'episode_starts': <class 'numpy.ndarray'>, 'log_probs': <class 'numpy.ndarray'>, 'values': <class 'numpy.ndarray'>}",
"__doc__": "\n Rollout buffer used in on-policy algorithms like A2C/PPO.\n It corresponds to ``buffer_size`` transitions collected\n using the current policy.\n This experience will be discarded after the policy update.\n In order to use PPO objective, we also store the current value of each state\n and the log probability of each taken action.\n\n The term rollout here refers to the model-free notion and should not\n be used with the concept of rollout used in model-based RL or planning.\n Hence, it is only involved in policy and value function training but not action selection.\n\n :param buffer_size: Max number of element in the buffer\n :param observation_space: Observation space\n :param action_space: Action space\n :param device: PyTorch device\n :param gae_lambda: Factor for trade-off of bias vs variance for Generalized Advantage Estimator\n Equivalent to classic advantage when set to 1.\n :param gamma: Discount factor\n :param n_envs: Number of parallel environments\n ",
"__init__": "<function RolloutBuffer.__init__ at 0x7f0c58ba3520>",
"reset": "<function RolloutBuffer.reset at 0x7f0c58ba35b0>",
"compute_returns_and_advantage": "<function RolloutBuffer.compute_returns_and_advantage at 0x7f0c58ba3640>",
"add": "<function RolloutBuffer.add at 0x7f0c58ba36d0>",
"get": "<function RolloutBuffer.get at 0x7f0c58ba3760>",
"_get_samples": "<function RolloutBuffer._get_samples at 0x7f0c58ba37f0>",
"__abstractmethods__": "frozenset()",
"_abc_impl": "<_abc._abc_data object at 0x7f0c58d68540>"
},
"rollout_buffer_kwargs": {},
"batch_size": 64,
"n_epochs": 10,
"clip_range": {
":type:": "<class 'function'>",
":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHP8mZmZmZmZqFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
},
"clip_range_vf": null,
"normalize_advantage": true,
"target_kl": null,
"lr_schedule": {
":type:": "<class 'function'>",
":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHPzOpKjBVMmGFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
},
"system_info": {
"OS": "Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35 # 1 SMP Thu Oct 5 21:02:42 UTC 2023",
"Python": "3.10.13",
"Stable-Baselines3": "2.2.1",
"PyTorch": "2.2.0+cu121",
"GPU Enabled": "True",
"Numpy": "1.26.4",
"Cloudpickle": "3.0.0",
"Gymnasium": "0.28.1"
}
}
config-ppo-LunarLander-v2_123_456_789_hf_defaults.json
ADDED
@@ -0,0 +1,125 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"policy_class": {
|
3 |
+
":type:": "<class 'abc.ABCMeta'>",
|
4 |
+
":serialized:": "gAWVOwAAAAAAAACMIXN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5wb2xpY2llc5SMEUFjdG9yQ3JpdGljUG9saWN5lJOULg==",
|
5 |
+
"__module__": "stable_baselines3.common.policies",
|
6 |
+
"__doc__": "\n Policy class for actor-critic algorithms (has both policy and value prediction).\n Used by A2C, PPO and the likes.\n\n :param observation_space: Observation space\n :param action_space: Action space\n :param lr_schedule: Learning rate schedule (could be constant)\n :param net_arch: The specification of the policy and value networks.\n :param activation_fn: Activation function\n :param ortho_init: Whether to use or not orthogonal initialization\n :param use_sde: Whether to use State Dependent Exploration or not\n :param log_std_init: Initial value for the log standard deviation\n :param full_std: Whether to use (n_features x n_actions) parameters\n for the std instead of only (n_features,) when using gSDE\n :param use_expln: Use ``expln()`` function instead of ``exp()`` to ensure\n a positive standard deviation (cf paper). It allows to keep variance\n above zero and prevent it from growing too fast. In practice, ``exp()`` is usually enough.\n :param squash_output: Whether to squash the output using a tanh function,\n this allows to ensure boundaries when using gSDE.\n :param features_extractor_class: Features extractor to use.\n :param features_extractor_kwargs: Keyword arguments\n to pass to the features extractor.\n :param share_features_extractor: If True, the features extractor is shared between the policy and value networks.\n :param normalize_images: Whether to normalize images or not,\n dividing by 255.0 (True by default)\n :param optimizer_class: The optimizer to use,\n ``th.optim.Adam`` by default\n :param optimizer_kwargs: Additional keyword arguments,\n excluding the learning rate, to pass to the optimizer\n ",
|
7 |
+
"__init__": "<function ActorCriticPolicy.__init__ at 0x7f3bc45911b0>",
|
8 |
+
"_get_constructor_parameters": "<function ActorCriticPolicy._get_constructor_parameters at 0x7f3bc4591240>",
|
9 |
+
"reset_noise": "<function ActorCriticPolicy.reset_noise at 0x7f3bc45912d0>",
|
10 |
+
"_build_mlp_extractor": "<function ActorCriticPolicy._build_mlp_extractor at 0x7f3bc4591360>",
|
11 |
+
"_build": "<function ActorCriticPolicy._build at 0x7f3bc45913f0>",
|
12 |
+
"forward": "<function ActorCriticPolicy.forward at 0x7f3bc4591480>",
|
13 |
+
"extract_features": "<function ActorCriticPolicy.extract_features at 0x7f3bc4591510>",
|
14 |
+
"_get_action_dist_from_latent": "<function ActorCriticPolicy._get_action_dist_from_latent at 0x7f3bc45915a0>",
|
15 |
+
"_predict": "<function ActorCriticPolicy._predict at 0x7f3bc4591630>",
|
16 |
+
"evaluate_actions": "<function ActorCriticPolicy.evaluate_actions at 0x7f3bc45916c0>",
|
17 |
+
"get_distribution": "<function ActorCriticPolicy.get_distribution at 0x7f3bc4591750>",
|
18 |
+
"predict_values": "<function ActorCriticPolicy.predict_values at 0x7f3bc45917e0>",
|
19 |
+
"__abstractmethods__": "frozenset()",
|
20 |
+
"_abc_impl": "<_abc._abc_data object at 0x7f3bc4585d40>"
|
21 |
+
},
|
22 |
+
"verbose": 0,
|
23 |
+
"policy_kwargs": {},
|
24 |
+
"num_timesteps": 123469824,
|
25 |
+
"_total_timesteps": 123456789,
|
26 |
+
"_num_timesteps_at_start": 0,
|
27 |
+
"seed": null,
|
28 |
+
"action_noise": null,
|
29 |
+
"start_time": 1708381100659934127,
|
30 |
+
"learning_rate": 0.0003,
|
31 |
+
"tensorboard_log": null,
|
32 |
+
"_last_obs": {
|
33 |
+
":type:": "<class 'numpy.ndarray'>",
|
34 |
+
":serialized:": "gAWVdQQAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYABAAAAAAAAC0vXz6vPes+m5mbvi/bMr8kxKU+Lz2svgAAAAAAAAAAZjb+uqQhSruhmsy7IFTWPGNIITxROLC8AACAPwAAgD8AVOk8iFu3P5v1vz51w2k9I3g3PAqIHT4AAAAAAAAAAIAyKD6HQqc/M6/5PrylDb9l7wc/TvGyPgAAAAAAAAAAmpd0PSrT/D4GbTu9gkJev5cO6j0ckaO9AAAAAAAAAAAzy4C9QLdIP1Q9t70fS3W/Sb1SvkFLD7wAAAAAAAAAAJrqrDzbOKG8SpZ4vvDzLb4tGZM89VFpPwAAgD8AAIA/5u3BPTxFkj7Q1TK+9LIsv7te7z1axUW+AAAAAAAAAAAztQg9w506um+yLLz2TpQ1hmKHO0qODLUAAIA/AACAP5qL+j3eLdI9YBoJvzZR9b6UGa29vxXKvgAAAAAAAAAAmhkzPEw8tT8TWww/BqPIPeAjNbx5ScC9AAAAAAAAAADT8iC+fxp/P4DH975gcjS/gTZ1vmqHpr4AAAAAAAAAACqrjL63k3w/W30hvj9VU7/NNRy/mHD0PQAAAAAAAAAAJjHXPWIAgD/2l5Q+Hus+v49iVD4erYQ+AAAAAAAAAAAzU1S6w5lKusgo/DXTxXAxewrluQqAIrUAAIA/AACAPwAM4ruuS4a6ghNRPRolIjlw1Ta7EwAbOAAAgD8AAIA/pkA5vuQchD+dInq+h6BWv3zg3b7Q9GO9AAAAAAAAAADNeeo8e4W3P/1bpz5hBgk9Pbd+PNCoHj4AAAAAAAAAAM3f0bz2zHi6HE0guBp5zbIdXcY5tDQ3NwAAgD8AAIA/Wu5evqvYKz/iUYc9JfdYv1gHxr5rvjo+AAAAAAAAAADN4OW7w5Unuv6lHLj91IuznbWiO0iENzcAAIA/AACAPzMTETtIG426X6CUvfVHPbSsAzm6mlmrMwAAgD8AAIA/ZjasOp/Y9bsU1J09z+SzPDaOsjwu9868AACAPwAAgD8zAa68ZNGuP935bb5HSrC+rK5xvGYdJr4AAAAAAAAAAKAhMr4n150++K6bPn/pM7+xCHa+lR5qPgAAAAAAAAAATY4IPj5kcT/mXA8+/yJgv85C2D5cWKu8AAAAAAAAAABmSAi8FJyhus2vJ7ipx2uzbehmOp+EPzcAAIA/AACAP43Lij0840Q99uXYvpOewr5+KUS+LY/DvgAAAAAAAAAAM4+Wu+Ggrbqezgg1s5faL6LxYDrlCG20AACAPwAAgD+aiYC6SJeAuvXw4rNB/aMvxE2EugWxpzMAAIA/AACAP5rFLT1FshE+43hKvqYADb/JAiA7uNi2vQAAAAAAAAAAmhkEunvClbrS1Hq6BAdstb1QlrjtJZE5AACAPwAAgD+UjAVudW1weZSMBWR0eXBllJOUjAJmNJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRiSyBLCIaUjAFDlHSUUpQu"
|
35 |
+
},
|
36 |
+
"_last_episode_starts": {
|
37 |
+
":type:": "<class 'numpy.ndarray'>",
|
38 |
+
":serialized:": "gAWVkwAAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlIwFbnVtcHmUjAVkdHlwZZSTlIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksghZSMAUOUdJRSlC4="
|
39 |
+
},
|
40 |
+
"_last_original_obs": null,
|
41 |
+
"_episode_num": 0,
|
42 |
+
"use_sde": false,
|
43 |
+
"sde_sample_freq": -1,
|
44 |
+
"_current_progress_remaining": -0.00010558350096090408,
|
45 |
+
"_stats_window_size": 100,
|
46 |
+
"ep_info_buffer": {
|
47 |
+
":type:": "<class 'collections.deque'>",
|
48 |
+
":serialized:": "gAWV4QsAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKUKH2UKIwBcpRHQHAKqVyFPBWMAWyUS5eMAXSUR0DqXdl/qgRLdX2UKGgGR0BzCrcUM5OraAdLt2gIR0DqXdmdfb9IdX2UKGgGR0Byaka86FM7aAdLsGgIR0DqXdgx7AtWdX2UKGgGR0Bv57cM3IdVaAdLnWgIR0DqXdp+b3GodX2UKGgGR0ByuqsfaHsUaAdLrGgIR0DqXdscFyJbdX2UKGgGR0BynZ4IKMNuaAdLr2gIR0DqXdpKwIMSdX2UKGgGR0BzHHQrtmcwaAdLtWgIR0DqXdulQdjodX2UKGgGR0BvLBPl+3H8aAdLjmgIR0DqXd0qwQlKdX2UKGgGR0BzAw8yN4qxaAdLmWgIR0DqXeL84PwvdX2UKGgGR0Bz+L8R+SbIaAdLtmgIR0DqXd9VDrqudX2UKGgGR0Bw1MX1rZanaAdLkGgIR0DqXd6gQHzIdX2UKGgGR0By3LmNipeeaAdLtWgIR0DqXd3/jKgadX2UKGgGR0BzZDtdAxBWaAdLsmgIR0DqXeDat9x7dX2UKGgGR0Bv3on+hoM8aAdLn2gIR0DqXd83eenRdX2UKGgGR0BwLFuHerMlaAdLlWgIR0DqXdzvRZ2ZdX2UKGgGR0Bw99JTVDrraAdLjmgIR0DqXd5ZeRgadX2UKGgGR0BxSv30wrUcaAdLsWgIR0DqXeBiG34LdX2UKGgGR0BzRD0NBnjAaAdLrWgIR0DqXeBTRYzSdX2UKGgGR0Bwrzklu3tsaAdLjmgIR0DqXd6g1WKedX2UKGgGR0BzNekN4JNTaAdLuGgIR0DqXeIpkPMCdX2UKGgGR0Byelszl90BaAdLumgIR0DqXeA0xdpqdX2UKGgGR0ByyjGT9sJqaAdLvWgIR0DqXeOVbiZOdX2UKGgGR0BxodqDbrTqaAdLsGgIR0DqXeM1F6RhdX2UKGgGR0By+gy/KyOaaAdLlGgIR0DqXeUqp97XdX2UKGgGR0B0CipcX3xnaAdLwmgIR0DqXeME384xdX2UKGgGR0Bxl6f29L6DaAdLkGgIR0DqXeTXzUZvdX2UKGgGR0BytMSf16E8aAdLuWgIR0DqXeOUmD15dX2UKGgGR0B0H0lhPTG6aAdLv2gIR0DqXeVO1v2odX2UKGgGR0Bx389zOopAaAdLr2gIR0DqXepf779AdX2UKGgGR0BxWy0ngHeKaAdLl2gIR0DqXeS83++/dX2UKGgGR0By5TA31jAjaAdLlWgIR0DqXebZeRgadX2UKGgGR0B0aaNhmXgMaAdLvWgIR0DqXep+5OJtdX2UKGgGR0BwTYZhrnDBaAdLrWgIR0DqXeffXwsodX2UKGgGR0Bw6iEtdzGQaAdLr2gIR0DqXegmMOwxdX2UKGgGR0BySRARkEs8aAdLlmgIR0DqXe9VoYeldX2UKGgGR0Bx6+Jzkp7UaAdLkWgIR0DqXeqMJhOQdX2UKGgGR0BxPkxVQyh0aAdLjmgIR0DqXeyZ2pyZdX2UKGgGR0BvA0dilSCOaAdLoWgIR0DqXeynG828dX2UKGgGR0BzIPi4rjHXaAdLuWgIR0DqXep5uZTidX2UKGgGR0Bzg/eDWbw0aAdLl2gIR0DqXel6cAindX2UKGgGR0ByXK5lOGj9aAdLt2gIR0DqXemCjk+5dX2UKGgGR0Bxg9FXq7iAaAdLhmgIR0DqXenjsD4hdX2UKGgGR0By3+RYA80UaAdLkWgIR0DqXex/vv0AdX2UKGgGR0BzEVU0elsQaAdLu2gIR0DqXey6r/83dX2UKGgGR0Bz67qQiiZfaAdLw2gIR0DqXevUXpGGdX2UKGgGR0BzRit9x6v8aAdLrGgIR0DqXe1xtHhCdX2UKGgGR0B0c4l9jPOZaAdLvGgIR0DqXe2C/47BdX2UKGgGR0ByRVrDZUT+aAdLqGgIR0DqXe5KYiPidX2UKGgGR0Bz3ZL7GecyaAdLwWgIR0DqXe5F72L6dX2UKGgGR0BmCKTMaCL/aAdN6ANoCEdA6l3u+9Jz1nV9lChoBkdAc+B0IToMa2gHS6JoCEdA6l3w99lVcXV9lChoBkdAcyEUNayKN2gHS7loCEdA6l3vfhVENXV9lChoBkdAcTLaisXBQGgHS6poCEdA6l3xRrzoU3V9lChoBkdAc54Trmhdt2gHS8FoCEdA6l3yHjIaLnV9lChoBkdAccOuAI6bOWgHS6ZoCEdA6l3y59mYjXV9lChoBkdAcZ5ER8MNMGgHS6VoCEdA6l3wrDZUUHV9lChoBkdAcVgKqGUOeGgHS6VoCEdA6l3yd2ovSXV9lChoBkdAcjdNATqSo2gHS45oCEdA6l32EAo5P3V9lChoBkdAcW40Ltu1nmgHS6RoCEdA6l3xE0BOpXV9lChoBkdAcZYySFGoaWgHS5ZoCEdA6l32w/HHWHV9lChoBkdAc0Gkc0cfeWgHS6loCEdA6l3zLFn7HnV9lChoBkdAcq8Tjebd8GgHS6loCEdA6l30yOq//XV9lChoBkdAcjwP+4smOWgHS7JoCEdA6l3zhAWznnV9lChoBkdAcl8RYzSCv2gHS7BoCEdA6l32fD+BH3V9lChoBkdAclmScLBsRGgHS69oCEdA6l32r4nF53V9lChoBkdAcscjHGS6lWgHS6loCEdA6l36kZR8+nV9lChoBkdAcu6toSL612gHS7doCEdA6l350vf0mXV9lChoBkdAclpflp48l2gHS7BoCEdA6l37RJ/XoXV9lChoBkdAckhIdU83dmgHS6loCEdA6l36io86m3V9lChoBkdAchYtfXwsoWgHS6toCEdA6l34GFJxvXV9lChoBkdAck0AhB7eEmgHS51oCEdA6l36j15B1XV9lChoBkdAcz8EcsDnvGgHS6RoCEdA6l35gdfb9XV9lChoBkdAcusVkc0cfmgHS6xoCEdA6l37CzC1qnV9lChoBkdAcYPub7TDwmgHS4loCEdA6l35xwyZa3V9lChoBkdAcqpE12q1gGgHS7loCEdA6l357kXDWXV9lChoBkdAcXb4T9KmK2gHS8toCEdA6l4AUkfLcXV9lChoBkdAclOaGYa5w2gHS7xoCEdA6l35KCHymXV9lChoBkdAcIyJ4jbBXWgHS6ZoCEdA6l37X5nDi3V9lChoBkdAcyTCUX531WgHS4hoCEdA6l38cAzYVnV9lChoBkdAb+K065oXbmgHS9hoCEdA6l37fA0sOHV9lChoBkdAcHf2KEWZZ2gHS5loCEdA6l3+OGCZnnV9lChoBkdAcZeMcZLqU2gHS7VoCEdA6l3+L5qM33V9lChoBkdAcvNvXsgMdGgHS5loCEdA6l3/cgyM1nV9lChoBkdAdM7Zof0VamgHS61oCEdA6l3/VyWAw3V9lChoBkdAczfHhS9/SmgHS7loCEdA6l3/KFh5PnV9lChoBkdAcw8VM23rlmgHS7VoCEdA6l4BbcoH9nV9lChoBkdAcpQm8ujASGgHS7NoCEdA6l
4CDW9UTHV9lChoBkdAcc7FTefqYGgHS69oCEdA6l4E5dv863V9lChoBkdAcADyZ8a4t2gHS5doCEdA6l4AINmUW3V9lChoBkdAcnbfOUt7KWgHS5JoCEdA6l3/4NiH7HV9lChoBkdAcxP2hIvrW2gHS7doCEdA6l4AmxMWXXV9lChoBkdAdBUNMGorF2gHS/NoCEdA6l4CsfzSTnV9lChoBkdAcbLf/FR51WgHS4loCEdA6l4GGa6ST3V9lChoBkdAc7ZSeiBXjmgHS8BoCEdA6l4HBWo3rHV9lChoBkdAceRij+Jgs2gHS6poCEdA6l4EyX+l03V9lChoBkdAdBy9RrJr+GgHS8FoCEdA6l4FFaB7NXV9lChoBkdAc7Tb/wRXfmgHS7poCEdA6l4GRlHz6XV9lChoBkdAcZHMqjJuEWgHS5poCEdA6l4GqXv6THV9lChoBkdAcRxvIfbKzWgHS5BoCEdA6l4HCV8kU3V9lChoBkdActiMVUModGgHS5doCEdA6l4HHObAlHVlLg=="
|
49 |
+
},
|
50 |
+
"ep_success_buffer": {
|
51 |
+
":type:": "<class 'collections.deque'>",
|
52 |
+
":serialized:": "gAWVIAAAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKULg=="
|
53 |
+
},
|
54 |
+
"_n_updates": 30144,
|
55 |
+
"observation_space": {
|
56 |
+
":type:": "<class 'gymnasium.spaces.box.Box'>",
|
57 |
+
":serialized:": "gAWVdgIAAAAAAACMFGd5bW5hc2l1bS5zcGFjZXMuYm94lIwDQm94lJOUKYGUfZQojAVkdHlwZZSMBW51bXB5lIwFZHR5cGWUk5SMAmY0lImIh5RSlChLA4wBPJROTk5K/////0r/////SwB0lGKMDWJvdW5kZWRfYmVsb3eUjBJudW1weS5jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWCAAAAAAAAAABAQEBAQEBAZRoCIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksIhZSMAUOUdJRSlIwNYm91bmRlZF9hYm92ZZRoESiWCAAAAAAAAAABAQEBAQEBAZRoFUsIhZRoGXSUUpSMBl9zaGFwZZRLCIWUjANsb3eUaBEoliAAAAAAAAAAAAC0wgAAtMIAAKDAAACgwNsPScAAAKDAAAAAgAAAAICUaAtLCIWUaBl0lFKUjARoaWdolGgRKJYgAAAAAAAAAAAAtEIAALRCAACgQAAAoEDbD0lAAACgQAAAgD8AAIA/lGgLSwiFlGgZdJRSlIwIbG93X3JlcHKUjFtbLTkwLiAgICAgICAgLTkwLiAgICAgICAgIC01LiAgICAgICAgIC01LiAgICAgICAgIC0zLjE0MTU5MjcgIC01LgogIC0wLiAgICAgICAgIC0wLiAgICAgICBdlIwJaGlnaF9yZXBylIxTWzkwLiAgICAgICAgOTAuICAgICAgICAgNS4gICAgICAgICA1LiAgICAgICAgIDMuMTQxNTkyNyAgNS4KICAxLiAgICAgICAgIDEuICAgICAgIF2UjApfbnBfcmFuZG9tlE51Yi4=",
|
58 |
+
"dtype": "float32",
|
59 |
+
"bounded_below": "[ True True True True True True True True]",
|
60 |
+
"bounded_above": "[ True True True True True True True True]",
|
61 |
+
"_shape": [
|
62 |
+
8
|
63 |
+
],
|
64 |
+
"low": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
|
65 |
+
"high": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
|
66 |
+
"low_repr": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
|
67 |
+
"high_repr": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
|
68 |
+
"_np_random": null
|
69 |
+
},
|
70 |
+
"action_space": {
|
71 |
+
":type:": "<class 'gymnasium.spaces.discrete.Discrete'>",
|
72 |
+
":serialized:": "gAWV/QAAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwBbpSMFW51bXB5LmNvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlIwFbnVtcHmUjAVkdHlwZZSTlIwCaTiUiYiHlFKUKEsDjAE8lE5OTkr/////Sv////9LAHSUYkMIBAAAAAAAAACUhpRSlIwFc3RhcnSUaAhoDkMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMBWR0eXBllGgLjAJpOJSJiIeUUpQoSwNoD05OTkr/////Sv////9LAHSUYowKX25wX3JhbmRvbZROdWIu",
|
73 |
+
"n": "4",
|
74 |
+
"start": "0",
|
75 |
+
"_shape": [],
|
76 |
+
"dtype": "int64",
|
77 |
+
"_np_random": null
|
78 |
+
},
|
79 |
+
"n_envs": 32,
|
80 |
+
"n_steps": 1024,
|
81 |
+
"gamma": 0.999,
|
82 |
+
"gae_lambda": 0.98,
|
83 |
+
"ent_coef": 0.01,
|
84 |
+
"vf_coef": 0.5,
|
85 |
+
"max_grad_norm": 0.5,
|
86 |
+
"rollout_buffer_class": {
|
87 |
+
":type:": "<class 'abc.ABCMeta'>",
|
88 |
+
":serialized:": "gAWVNgAAAAAAAACMIHN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5idWZmZXJzlIwNUm9sbG91dEJ1ZmZlcpSTlC4=",
|
89 |
+
"__module__": "stable_baselines3.common.buffers",
|
90 |
+
"__annotations__": "{'observations': <class 'numpy.ndarray'>, 'actions': <class 'numpy.ndarray'>, 'rewards': <class 'numpy.ndarray'>, 'advantages': <class 'numpy.ndarray'>, 'returns': <class 'numpy.ndarray'>, 'episode_starts': <class 'numpy.ndarray'>, 'log_probs': <class 'numpy.ndarray'>, 'values': <class 'numpy.ndarray'>}",
|
91 |
+
"__doc__": "\n Rollout buffer used in on-policy algorithms like A2C/PPO.\n It corresponds to ``buffer_size`` transitions collected\n using the current policy.\n This experience will be discarded after the policy update.\n In order to use PPO objective, we also store the current value of each state\n and the log probability of each taken action.\n\n The term rollout here refers to the model-free notion and should not\n be used with the concept of rollout used in model-based RL or planning.\n Hence, it is only involved in policy and value function training but not action selection.\n\n :param buffer_size: Max number of element in the buffer\n :param observation_space: Observation space\n :param action_space: Action space\n :param device: PyTorch device\n :param gae_lambda: Factor for trade-off of bias vs variance for Generalized Advantage Estimator\n Equivalent to classic advantage when set to 1.\n :param gamma: Discount factor\n :param n_envs: Number of parallel environments\n ",
        "__init__": "<function RolloutBuffer.__init__ at 0x7f3bc4723520>",
        "reset": "<function RolloutBuffer.reset at 0x7f3bc47235b0>",
        "compute_returns_and_advantage": "<function RolloutBuffer.compute_returns_and_advantage at 0x7f3bc4723640>",
        "add": "<function RolloutBuffer.add at 0x7f3bc47236d0>",
        "get": "<function RolloutBuffer.get at 0x7f3bc4723760>",
        "_get_samples": "<function RolloutBuffer._get_samples at 0x7f3bc47237f0>",
        "__abstractmethods__": "frozenset()",
        "_abc_impl": "<_abc._abc_data object at 0x7f3bc48ea480>"
    },
    "rollout_buffer_kwargs": {},
    "batch_size": 64,
    "n_epochs": 8,
    "clip_range": {
        ":type:": "<class 'function'>",
":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHP8mZmZmZmZqFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
    },
    "clip_range_vf": null,
    "normalize_advantage": true,
    "target_kl": null,
    "lr_schedule": {
        ":type:": "<class 'function'>",
":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHPzOpKjBVMmGFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
    },
    "system_info": {
        "OS": "Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35 # 1 SMP Thu Oct 5 21:02:42 UTC 2023",
        "Python": "3.10.13",
        "Stable-Baselines3": "2.2.1",
        "PyTorch": "2.2.0+cu121",
        "GPU Enabled": "True",
        "Numpy": "1.26.4",
        "Cloudpickle": "3.0.0",
        "Gymnasium": "0.28.1"
    }
}
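The two function-valued entries above (`clip_range` and `lr_schedule`) are cloudpickled closures; their decoded payloads reference `constant_fn.<locals>.func` in `stable_baselines3.common.utils`, i.e. constant schedules. A minimal sketch of what they evaluate to (the 0.2 clip value is an assumption based on the pickled float and PPO's default; 0.0003 matches `"learning_rate"` in this config):

```python
# Sketch of SB3's constant-schedule wrapping, not the repo's own code.
def constant_fn(val):
    def func(_progress_remaining):
        # Ignores training progress and always returns the same value.
        return val
    return func

clip_range = constant_fn(0.2)
lr_schedule = constant_fn(0.0003)

print(clip_range(1.0))   # 0.2, regardless of progress
print(lr_schedule(0.0))  # 0.0003
```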
config.json
ADDED
@@ -0,0 +1,125 @@
{
    "policy_class": {
        ":type:": "<class 'abc.ABCMeta'>",
":serialized:": "gAWVOwAAAAAAAACMIXN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5wb2xpY2llc5SMEUFjdG9yQ3JpdGljUG9saWN5lJOULg==",
        "__module__": "stable_baselines3.common.policies",
"__doc__": "\n Policy class for actor-critic algorithms (has both policy and value prediction).\n Used by A2C, PPO and the likes.\n\n :param observation_space: Observation space\n :param action_space: Action space\n :param lr_schedule: Learning rate schedule (could be constant)\n :param net_arch: The specification of the policy and value networks.\n :param activation_fn: Activation function\n :param ortho_init: Whether to use or not orthogonal initialization\n :param use_sde: Whether to use State Dependent Exploration or not\n :param log_std_init: Initial value for the log standard deviation\n :param full_std: Whether to use (n_features x n_actions) parameters\n for the std instead of only (n_features,) when using gSDE\n :param use_expln: Use ``expln()`` function instead of ``exp()`` to ensure\n a positive standard deviation (cf paper). It allows to keep variance\n above zero and prevent it from growing too fast. In practice, ``exp()`` is usually enough.\n :param squash_output: Whether to squash the output using a tanh function,\n this allows to ensure boundaries when using gSDE.\n :param features_extractor_class: Features extractor to use.\n :param features_extractor_kwargs: Keyword arguments\n to pass to the features extractor.\n :param share_features_extractor: If True, the features extractor is shared between the policy and value networks.\n :param normalize_images: Whether to normalize images or not,\n dividing by 255.0 (True by default)\n :param optimizer_class: The optimizer to use,\n ``th.optim.Adam`` by default\n :param optimizer_kwargs: Additional keyword arguments,\n excluding the learning rate, to pass to the optimizer\n ",
        "__init__": "<function ActorCriticPolicy.__init__ at 0x7fdb80b8cf70>",
        "_get_constructor_parameters": "<function ActorCriticPolicy._get_constructor_parameters at 0x7fdb80b8d000>",
        "reset_noise": "<function ActorCriticPolicy.reset_noise at 0x7fdb80b8d090>",
        "_build_mlp_extractor": "<function ActorCriticPolicy._build_mlp_extractor at 0x7fdb80b8d120>",
        "_build": "<function ActorCriticPolicy._build at 0x7fdb80b8d1b0>",
        "forward": "<function ActorCriticPolicy.forward at 0x7fdb80b8d240>",
        "extract_features": "<function ActorCriticPolicy.extract_features at 0x7fdb80b8d2d0>",
        "_get_action_dist_from_latent": "<function ActorCriticPolicy._get_action_dist_from_latent at 0x7fdb80b8d360>",
        "_predict": "<function ActorCriticPolicy._predict at 0x7fdb80b8d3f0>",
        "evaluate_actions": "<function ActorCriticPolicy.evaluate_actions at 0x7fdb80b8d480>",
        "get_distribution": "<function ActorCriticPolicy.get_distribution at 0x7fdb80b8d510>",
        "predict_values": "<function ActorCriticPolicy.predict_values at 0x7fdb80b8d5a0>",
        "__abstractmethods__": "frozenset()",
        "_abc_impl": "<_abc._abc_data object at 0x7fdb80b85d80>"
    },
    "verbose": 0,
    "policy_kwargs": {},
    "num_timesteps": 10027008,
    "_total_timesteps": 10000000,
    "_num_timesteps_at_start": 0,
    "seed": null,
    "action_noise": null,
    "start_time": 1707903050276585938,
    "learning_rate": 0.0003,
    "tensorboard_log": null,
    "_last_obs": {
        ":type:": "<class 'numpy.ndarray'>",
":serialized:": "gAWVdQQAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYABAAAAAAAAM1y4jxSmKu5jrqwvjaiUb76g169mq9lPwAAgD8AAAAAAE0APQOsfLwaIKm+J1kLvYRYEz1u+BY9AACAPwAAgD8zxWk8KRwHurtLTrm1DzS0G1Yhuz0+dzgAAIA/AACAPzMWo7w7rIk9rsfDvWnZxb5hQgW9A8XBvQAAAAAAAAAA5sgEvXtUgrp9Yiuz3AEhsI2qKzt1fcgzAACAPwAAgD9miG+8j54guruRm7qeQJa1w8LiOqBYuTkAAIA/AACAP81eR7yfvZ670mcrPU8JHj1xw/u8NA0DPgAAgD8AAIA/zQYhvNLilbsYXeS8PKHRPEhW5jzjS7C9AACAPwAAgD8AVdw8JPIfPHAvbr6PhJK++ZsKvuWeOb4AAIA/AAAAALPNNL28fbM/XP0svptBpb5mH/y8H1mkvQAAAAAAAAAAQGchPlSyGz9hJpI8fHlWv5lwsj4+Q7W9AAAAAAAAAAAzQ6W9kaZ5P8MmeL6k6me/tYj2vRbFGL4AAAAAAAAAAFpPub05bVg+GjI/PqeBHL/0rR++bg8ePgAAAAAAAAAAzRcBPeEls7yiP4G+wwKMPbKR4D2v2qG4AACAPwAAgD9zQ/s95A7EPkbhO77LhDC/yRN7PiNJPb4AAAAAAAAAACDsJL7GB4w/Y6Tvvmu3FL+ns4e+Dp27vgAAAAAAAAAAuhY3PtAO2j7gPtC+lWQ8v8TCRT46R5O+AAAAAAAAAABmt+88cQIZPI18l771oCK+dmkRvr43bz8AAIA/AAAAAEApIr6a6hk/0MKWPLIoMb9wvKm+T5nCPQAAAAAAAAAATasCPQVveT5GWPu9lakvv7UqAbzieOO9AAAAAAAAAAAzs6G6VdSAP67SsbtdlIm/MfG3OpbGnjoAAAAAAAAAAGYzxjxIqZk5fAOgOgAaPrVtxCa89pXDuQAAgD8AAIA/szxfPj0pVD/2Uz09QwJCv8EXAj/DXxe9AAAAAAAAAACAdF89KAejPtqeir4k2C+/1P+aPQX7LL4AAAAAAAAAAE0lw73dnk0/pLsMvuuaTb9D2Wq+5lb2vQAAAAAAAAAAM46pPJknVz8AUZ67DSt3v9d8fT3Ibx49AAAAAAAAAACaPeo7FMDXuu4Z4b0cnyg8adb4O1M+Fr0AAIA/AACAPwCNorwg7KY/Gn9fvZ1gGb89UUm9mmfEvAAAAAAAAAAAmsfRPMMhMrrtc9w67eWdNUCAMzm+6gK6AACAPwAAgD+auty8XFcSumDj9ztcjZI5ezHrO0XGjLkAAIA/AACAPyad7T3Jpw8/unaZO9/mSr9lz4w+7Io0vQAAAAAAAAAAZjXNvCedpT87LhC+9QMEv4iT6Dyw3gA9AAAAAAAAAACUjAVudW1weZSMBWR0eXBllJOUjAJmNJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRiSyBLCIaUjAFDlHSUUpQu"
    },
    "_last_episode_starts": {
        ":type:": "<class 'numpy.ndarray'>",
":serialized:": "gAWVkwAAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlIwFbnVtcHmUjAVkdHlwZZSTlIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksghZSMAUOUdJRSlC4="
    },
    "_last_original_obs": null,
    "_episode_num": 0,
    "use_sde": false,
    "sde_sample_freq": -1,
    "_current_progress_remaining": -0.0027007999999999477,
    "_stats_window_size": 100,
    "ep_info_buffer": {
        ":type:": "<class 'collections.deque'>",
":serialized:": "gAWV5AsAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKUKH2UKIwBcpRHQHKp67dznzSMAWyUS9OMAXSUR0Cx/e1TisGQdX2UKGgGR0BzP7ohY/3WaAdLxmgIR0Cx/ZucDr7gdX2UKGgGR0BxsKgM+eOGaAdLyGgIR0Cx/YlfzBhydX2UKGgGR0B0Zjw9aEBbaAdLrWgIR0Cx/ZM1KoQ4dX2UKGgGR0BxfwA93bEhaAdLrGgIR0Cx/cu3lS0jdX2UKGgGR0BymyhQFcIJaAdLiWgIR0Cx/chjOLR8dX2UKGgGR0By7u4mTkhiaAdLwWgIR0Cx/a7haTwEdX2UKGgGR0BzOsaef7JoaAdLvWgIR0Cx/d/c8DB/dX2UKGgGR0BzoZHe7+UAaAdLx2gIR0Cx/ZoMSbpedX2UKGgGR0ByQgBKcurZaAdLo2gIR0Cx/agY51eTdX2UKGgGR0ByKBHd43WGaAdLr2gIR0Cx/ag+t8u0dX2UKGgGR0Bx7/cAR02caAdLlWgIR0Cx/e6eXiR5dX2UKGgGR0ByBGoxYaHcaAdL1WgIR0Cx/cNbkfcOdX2UKGgGR0BzZHzErGzbaAdLtmgIR0Cx/hZvgm7bdX2UKGgGR0BysVP2wmmcaAdLyWgIR0Cx/hIhEBsAdX2UKGgGR0ByjIleF+NMaAdLtWgIR0Cx/hpMYdhidX2UKGgGR0BvKKDujRD1aAdLt2gIR0Cx/ctuk1uSdX2UKGgGR0BxmVPpIMBqaAdLrmgIR0Cx/fEq6OHWdX2UKGgGR0By1u8kD6nBaAdLsGgIR0Cx/ikwN9YwdX2UKGgGR0BxidW6shgWaAdLnGgIR0Cx/ep4jbBXdX2UKGgGR0BylEsBhhH9aAdLsWgIR0Cx/fUgOjIrdX2UKGgGR0ByGO7btZ3caAdLnGgIR0Cx/iWNJe3QdX2UKGgGR0BxfTKifxtpaAdLp2gIR0Cx/eKqbSZ0dX2UKGgGR0BzB+uxKQJYaAdN6AFoCEdAsf4mKyfL93V9lChoBkdAcy1nx8UmD2gHS5RoCEdAsf43B+F10XV9lChoBkdAcx0+SKWLP2gHS79oCEdAsf38omXw9nV9lChoBkdARRVWbPQfIWgHS3ZoCEdAsf4x9RaX8nV9lChoBkdAcjclZ5iVjmgHS7JoCEdAsf38oE0SAnV9lChoBkdAcsJkyDZlF2gHS9xoCEdAsf45schkiHV9lChoBkdAc46bLU1AJWgHS8ZoCEdAsf4LOjZcs3V9lChoBkdAcz1OoHcDbWgHS85oCEdAsf3/U4JeFHV9lChoBkdAcAvFcY64lWgHS51oCEdAsf43vttygnV9lChoBkdAcFDvZh8YymgHS5ZoCEdAsf4PsPatcXV9lChoBkdAcvQhew9q12gHS7toCEdAsf4bwWnCO3V9lChoBkdAcwO69kBjnWgHS7VoCEdAsf4GIBRyfnV9lChoBkdAcjohl18stmgHS9loCEdAsf41MDfWMHV9lChoBkdAcLz59Vmz0GgHS6VoCEdAsf4hZU1hs3V9lChoBkdAcYcIiTt9hWgHS6FoCEdAsf4KF10T13V9lChoBkdAcxczmOlwcmgHS7poCEdAsf5ImLLpzXV9lChoBkdAcLg6Skj5bmgHS6BoCEdAsf5dew9q13V9lChoBkdAcuXOJ+DvmmgHS6NoCEdAsf4zOlfqo3V9lChoBkdAczlxcE/0NGgHS6poCEdAsf6LPIGQjnV9lChoBkdAOnLzf779AGgHS1toCEdAsf46wqy4WnV9lChoBkdAdDI2jO9nLGgHS6toCEdAsf6Pghr303V9lChoBkdAchmSncclxGgHS4xoCEdAsf6IHQhOg3V9lChoBkdAcnbY0l7dBWgHS5FoCEdAsf5Tlq8DjnV9lChoBkdAc57wvg3tKWgHS+toCEdAsf40580DU3V9lChoBkdAcdALE1l5GGgHS4poCEdAsf5BHOKO1nV9lChoBkdAdBGh/y5I6WgHS9JoCEdAsf6irp7kXHV9lChoBkdAcnAYixFAmmgHS+VoCEdAsf5HHGS6lXV9lChoBkdAcjSUsnRb8mgHS5poCEdAsf5eG+K0lnV9lChoBkdAcO0snAqNImgHS51oCEdAsf6iIInjQ3V9lChoBkdAdBZ4LkS26WgHS6ZoCEdAsf6WosI3SHV9lChoBkdAcrmQp4KQaWgHS9JoCEdAsf5bFvQ4THV9lChoBkdAcSIRXOnl4mgHS4hoCEdAsf5pHOKO1nV9lChoBkdAc/28Yht+C2gHS79oCEdAsf6oxM36ynV9lChoBkdAdC6XXAdn02gHS9BoCEdAsf54M1CPZXV9lChoBkdAcKSIyCWeH2gHS5NoCEdAsf50/lhgE3V9lChoBkdAcm9tYSxqwmgHS6toCEdAsf6uh/RVqHV9lChoBkdAcbEYrJ8v3GgHS5VoCEdAsf6Bu0kWynV9lChoBkdAcp6pSJj2BmgHS8RoCEdAsf64MrmQsHV9lChoBkdAcTHhxYJVsGgHS5toCEdAsf6fGOuJUHV9lChoBkdAbmvq59Vmz2gHS7NoCEdAsf551A7gbnV9lChoBkdAcmDbzK9wm2gHS9BoCEdAsf6Kt6ol2XV9lChoBkdAcyLrvsqrimgHS61oCEdAsf6XSiM5wXV9lChoBkdAcng2qkuYhWgHS7VoCEdAsf6Bd8iOenV9lChoBkdAcg4o99tuUGgHS6RoCEdAsf7NmPHT7XV9lChoBkdAc0CcEvCdjGgHTXwBaAhHQLH+8hV2icp1fZQoaAZHQHHpJJf6XSloB0uKaAhHQLH+scJMQEp1fZQoaAZHQHOuihnJ1aJoB0vCaAhHQLH+jlenhsJ1fZQoaAZHQHPTOx4Y77toB0vBaAhHQLH+zR6Ww/x1fZQoaAZHQG/PS7GvOhVoB0uraAhHQLH+sFnIyTJ1fZQoaAZHQHCXEHt4RmNoB0uWaAhHQLH+xo2n8891fZQoaAZHQHF663y7PIJoB0u3aAhHQLH+tAy2x6h1fZQoaAZHQHFF5KBd2PloB0ugaAhHQLH/EaN+9al1fZQoaAZHQHP7+9vjwQVoB0vDaAhHQLH/D37UG3Z1fZQoaAZHQHLt/MKTjedoB0vGaAhHQLH/GN5t3wF1fZQoaAZHQGlARHoX9BNoB03oA2gIR0Cx/rjN+so2dX2UKGgGR0ByzEnPVurIaAdLsWgIR0Cx/rtn9NvgdX2UKGgGR0Bws/JYDDCQaAdLkWgIR0Cx/s0n1FpgdX2UKGgGR0BzIfw4KhL5aAdLsWgIR0Cx/sF6Z6UrdX2UKGgGR0BxymV0Lc9GaAdLoWgIR0Cx/xHEhq0udX2UKGgGR0BzV8PbwjMWaAdL92gIR0Cx/t3vDxb0dX2UKGgGR0Bw/vIxQBPsaAdLjWgIR0Cx/tmNFSbZdX2UKGgGR0Bz3NHFxXGPaAdLq2gIR0Cx/tC+De0pdX2UKGgGR0ByFJTo+wC9aAdLsGgIR0Cx/w/f8/D+dX2UKGgGR0Bz6Y8zQ/oraAdNAAFoCE
dAsf89BPbfxnV9lChoBkdASjUwSJ0nxGgHS31oCEdAsf7viEQGwHV9lChoBkdAcgDngYP5HmgHS6FoCEdAsf8ov7FbV3V9lChoBkdAchx36yjYZmgHS6poCEdAsf8laGHpKXV9lChoBkdAc3OJg9eQdWgHS7NoCEdAsf7x2cJ+lXV9lChoBkdAcY9r08NhE2gHS4loCEdAsf8t+jM3ZXV9lChoBkdAc9En4fwI+mgHS71oCEdAsf8sGX5WR3V9lChoBkdAb9DVH4Glh2gHS5FoCEdAsf8WzC1qnHV9lChoBkdAcvq9SuQp4WgHS79oCEdAsf8Go0hvBXV9lChoBkdAcl9mXgLqlmgHS61oCEdAsf8DWH1vl3V9lChoBkdAcBYEL6UJOWgHS59oCEdAsf9g5n13+3V9lChoBkdAcuPINmUW22gHS75oCEdAsf79+RYA83V9lChoBkdAcGIrsSkCWGgHS5hoCEdAsf74gdOqN3V9lChoBkdAcXiaVD8cdmgHS5NoCEdAsf8zcGkeqHVlLg=="
    },
    "ep_success_buffer": {
        ":type:": "<class 'collections.deque'>",
":serialized:": "gAWVIAAAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKULg=="
    },
    "_n_updates": 2448,
    "observation_space": {
        ":type:": "<class 'gymnasium.spaces.box.Box'>",
":serialized:": "gAWVdgIAAAAAAACMFGd5bW5hc2l1bS5zcGFjZXMuYm94lIwDQm94lJOUKYGUfZQojAVkdHlwZZSMBW51bXB5lIwFZHR5cGWUk5SMAmY0lImIh5RSlChLA4wBPJROTk5K/////0r/////SwB0lGKMDWJvdW5kZWRfYmVsb3eUjBJudW1weS5jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWCAAAAAAAAAABAQEBAQEBAZRoCIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksIhZSMAUOUdJRSlIwNYm91bmRlZF9hYm92ZZRoESiWCAAAAAAAAAABAQEBAQEBAZRoFUsIhZRoGXSUUpSMBl9zaGFwZZRLCIWUjANsb3eUaBEoliAAAAAAAAAAAAC0wgAAtMIAAKDAAACgwNsPScAAAKDAAAAAgAAAAICUaAtLCIWUaBl0lFKUjARoaWdolGgRKJYgAAAAAAAAAAAAtEIAALRCAACgQAAAoEDbD0lAAACgQAAAgD8AAIA/lGgLSwiFlGgZdJRSlIwIbG93X3JlcHKUjFtbLTkwLiAgICAgICAgLTkwLiAgICAgICAgIC01LiAgICAgICAgIC01LiAgICAgICAgIC0zLjE0MTU5MjcgIC01LgogIC0wLiAgICAgICAgIC0wLiAgICAgICBdlIwJaGlnaF9yZXBylIxTWzkwLiAgICAgICAgOTAuICAgICAgICAgNS4gICAgICAgICA1LiAgICAgICAgIDMuMTQxNTkyNyAgNS4KICAxLiAgICAgICAgIDEuICAgICAgIF2UjApfbnBfcmFuZG9tlE51Yi4=",
        "dtype": "float32",
        "bounded_below": "[ True True True True True True True True]",
        "bounded_above": "[ True True True True True True True True]",
        "_shape": [
            8
        ],
        "low": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
        "high": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
        "low_repr": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
        "high_repr": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
        "_np_random": null
    },
    "action_space": {
        ":type:": "<class 'gymnasium.spaces.discrete.Discrete'>",
":serialized:": "gAWV/QAAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwBbpSMFW51bXB5LmNvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlIwFbnVtcHmUjAVkdHlwZZSTlIwCaTiUiYiHlFKUKEsDjAE8lE5OTkr/////Sv////9LAHSUYkMIBAAAAAAAAACUhpRSlIwFc3RhcnSUaAhoDkMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMBWR0eXBllGgLjAJpOJSJiIeUUpQoSwNoD05OTkr/////Sv////9LAHSUYowKX25wX3JhbmRvbZROdWIu",
        "n": "4",
        "start": "0",
        "_shape": [],
        "dtype": "int64",
        "_np_random": null
    },
    "n_envs": 32,
    "n_steps": 1024,
    "gamma": 0.999,
    "gae_lambda": 0.98,
    "ent_coef": 0.01,
    "vf_coef": 0.5,
    "max_grad_norm": 0.5,
    "rollout_buffer_class": {
        ":type:": "<class 'abc.ABCMeta'>",
":serialized:": "gAWVNgAAAAAAAACMIHN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5idWZmZXJzlIwNUm9sbG91dEJ1ZmZlcpSTlC4=",
        "__module__": "stable_baselines3.common.buffers",
"__annotations__": "{'observations': <class 'numpy.ndarray'>, 'actions': <class 'numpy.ndarray'>, 'rewards': <class 'numpy.ndarray'>, 'advantages': <class 'numpy.ndarray'>, 'returns': <class 'numpy.ndarray'>, 'episode_starts': <class 'numpy.ndarray'>, 'log_probs': <class 'numpy.ndarray'>, 'values': <class 'numpy.ndarray'>}",
"__doc__": "\n Rollout buffer used in on-policy algorithms like A2C/PPO.\n It corresponds to ``buffer_size`` transitions collected\n using the current policy.\n This experience will be discarded after the policy update.\n In order to use PPO objective, we also store the current value of each state\n and the log probability of each taken action.\n\n The term rollout here refers to the model-free notion and should not\n be used with the concept of rollout used in model-based RL or planning.\n Hence, it is only involved in policy and value function training but not action selection.\n\n :param buffer_size: Max number of element in the buffer\n :param observation_space: Observation space\n :param action_space: Action space\n :param device: PyTorch device\n :param gae_lambda: Factor for trade-off of bias vs variance for Generalized Advantage Estimator\n Equivalent to classic advantage when set to 1.\n :param gamma: Discount factor\n :param n_envs: Number of parallel environments\n ",
        "__init__": "<function RolloutBuffer.__init__ at 0x7fdb80d232e0>",
        "reset": "<function RolloutBuffer.reset at 0x7fdb80d23370>",
        "compute_returns_and_advantage": "<function RolloutBuffer.compute_returns_and_advantage at 0x7fdb80d23400>",
        "add": "<function RolloutBuffer.add at 0x7fdb80d23490>",
        "get": "<function RolloutBuffer.get at 0x7fdb80d23520>",
        "_get_samples": "<function RolloutBuffer._get_samples at 0x7fdb80d235b0>",
        "__abstractmethods__": "frozenset()",
        "_abc_impl": "<_abc._abc_data object at 0x7fdb80f0f8c0>"
    },
    "rollout_buffer_kwargs": {},
    "batch_size": 64,
    "n_epochs": 8,
    "clip_range": {
        ":type:": "<class 'function'>",
":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHP8mZmZmZmZqFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
    },
    "clip_range_vf": null,
    "normalize_advantage": true,
    "target_kl": null,
    "lr_schedule": {
        ":type:": "<class 'function'>",
":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHPzOpKjBVMmGFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
    },
    "system_info": {
        "OS": "Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35 # 1 SMP Thu Oct 5 21:02:42 UTC 2023",
        "Python": "3.10.13",
        "Stable-Baselines3": "2.2.1",
        "PyTorch": "2.2.0+cu121",
        "GPU Enabled": "True",
        "Numpy": "1.26.4",
        "Cloudpickle": "3.0.0",
        "Gymnasium": "0.28.1"
    }
}
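`config.json` is the same kind of dump for the main 10M-timestep model: the `data` entry stored inside the SB3 model zip, plus a `system_info` block. A short sketch of pulling those hyperparameters straight out of the archive, which is essentially what `generate_config_json` in `hf_helpers/hf_sb3.py` below does:

```python
import json
import zipfile

# An SB3 .zip stores the model's constructor arguments as JSON in a "data" entry.
with zipfile.ZipFile("ppo-LunarLander-v2_010_000_000_hf_defaults.zip") as zf:
    with zf.open("data") as f:
        data = json.load(f)

print(data["n_steps"], data["gamma"], data["n_epochs"])  # 1024 0.999 8
```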
evaluate.py
ADDED
@@ -0,0 +1,69 @@
import os
import random

import pandas as pd

from hf_helpers.sb3_eval import eval_model_with_seed

env_id = "LunarLander-v2"
models_to_evaluate = [
    "ppo-LunarLander-v2_001_000_000_hf_defaults.zip",
    "ppo-LunarLander-v2_010_000_000_hf_defaults.zip",
    "ppo-LunarLander-v2_010_000_000_sb3_defaults.zip",
    "ppo-LunarLander-v2_123_456_789_hf_defaults.zip",
]
evaluation_results_fp = "evaluation_results.csv"


def store_results(results):
    # Append to the CSV; only write the header when the file does not exist yet.
    results_df = pd.DataFrame(results)
    header = not os.path.exists(evaluation_results_fp)
    results_df.to_csv(evaluation_results_fp, mode="a", index=False, header=header)


def evaluate_and_store_all_results():
    results = []
    n_evaluations = 1000
    for i in range(n_evaluations):
        if i > 0 and i % 10 == 0:
            print(f"Progress: {i}/{n_evaluations}")
            store_results(results)
            results = []

        # seed = random.randint(0, 1000000000000)  # Why this interval?
        seed = random.randint(0, 10000)  # Also try some smaller numbers for seed
        n_envs = random.randint(1, 16)
        for model_fp in models_to_evaluate:
            result, mean_reward, std_reward = eval_model_with_seed(
                model_fp, env_id, seed, n_eval_episodes=10, n_envs=n_envs
            )
            result_data = {
                "model_fp": model_fp,
                "seed": seed,
                "n_envs": n_envs,
                "result": result,
                "mean_reward": mean_reward,
                "std_reward": std_reward,
            }
            results.append(result_data)
    # Flush the final batch; without this the last few iterations were lost.
    store_results(results)


def analyze_results():
    results_df = pd.read_csv(evaluation_results_fp)
    results_df["model_fp"] = results_df["model_fp"].str.replace(".zip", "", regex=False)
    aggregated_results = (
        results_df.groupby("model_fp")["result"]
        .agg(["count", "min", "max", "mean"])
        .reset_index()
    )
    aggregated_results.columns = [
        "Model name",
        "Number of results",
        "Min",
        "Max",
        "Average",
    ]
    aggregated_results = aggregated_results.sort_values(by="Model name")
    print(aggregated_results.to_markdown(index=False, tablefmt="pipe"))


# evaluate_and_store_all_results()
analyze_results()
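To regenerate the CSV, uncomment `evaluate_and_store_all_results()` at the bottom. Each of the 1,000 iterations draws one shared `seed` and `n_envs` and evaluates all four models with them, so the rows within an iteration are directly comparable; the appended columns follow the `result_data` keys above: `model_fp,seed,n_envs,result,mean_reward,std_reward`.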
evaluation_results.csv
ADDED
The diff for this file is too large to render.
hf_helpers/__init__.py
ADDED
File without changes
hf_helpers/gym_video.py
ADDED
@@ -0,0 +1,82 @@
import os
import tempfile

import gymnasium as gym
import imageio
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv, VecVideoRecorder


def generate_video(model, video_fp, video_length_in_episodes=5):
    eval_env = model.get_env()

    # Upper bound on frames: episodes * max steps per episode for this env spec.
    max_video_length_in_steps = (
        video_length_in_episodes * eval_env.get_attr("spec")[0].max_episode_steps
    )

    with tempfile.TemporaryDirectory() as temp_dp:
        vec_env = VecVideoRecorder(
            eval_env,
            temp_dp,
            record_video_trigger=lambda x: x == 0,
            video_length=max_video_length_in_steps,
        )

        frame_count = 0
        episode_count = 0
        obs = vec_env.reset()
        for _ in range(max_video_length_in_steps):
            action, _ = model.predict(obs, deterministic=True)
            obs, _, dones, _ = vec_env.step(action)
            frame_count += 1
            if dones:
                episode_count += 1
                if episode_count >= video_length_in_episodes:
                    break

        vec_env.close()

        temp_fp = vec_env.video_recorder.path

        # TODO: Fix this.
        # Use ffmpeg to remove the last frame (it is the first frame in a new episode).
        os.system(
            f"""ffmpeg -y -i {temp_fp} -vf "select='not(eq(n,{frame_count}))'" {video_fp} > /dev/null 2>&1"""
        )
        # os.rename(temp_fp, file_path)


def generate_gif(model, file_path, video_length_in_episodes=5):
    eval_env = model.get_env()

    max_video_length_in_steps = (
        video_length_in_episodes * eval_env.get_attr("spec")[0].max_episode_steps
    )

    render_image = lambda: eval_env.render(mode="rgb_array")

    images = []
    episode_count = 0
    obs = eval_env.reset()
    images.append(render_image())
    for _ in range(max_video_length_in_steps):
        action, _ = model.predict(obs)
        obs, _, dones, _ = eval_env.step(action)
        if dones:
            episode_count += 1
            if episode_count >= video_length_in_episodes:
                break
        images.append(render_image())

    # Keep every other frame to halve the file size; play back at 25 fps.
    imageio.mimsave(
        file_path, [np.array(img) for i, img in enumerate(images) if i % 2 == 0], fps=25
    )


def load_ppo_model_for_video(model_fp, env_id):
    # render_mode="rgb_array" is required so frames can be captured off-screen.
    env = DummyVecEnv([lambda: Monitor(gym.make(env_id, render_mode="rgb_array"))])
    model = PPO.load(model_fp, env=env)
    return model
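A usage sketch, mirroring step 4 of `main.py` below, for producing this repo's `video.mp4`:

```python
from hf_helpers.gym_video import generate_video, load_ppo_model_for_video

model = load_ppo_model_for_video(
    "ppo-LunarLander-v2_010_000_000_hf_defaults.zip", "LunarLander-v2"
)
generate_video(model, "video.mp4", video_length_in_episodes=5)
```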
hf_helpers/hf_sb3.py
ADDED
@@ -0,0 +1,25 @@
import datetime
import json
import zipfile

import stable_baselines3


def generate_config_json(model_fp, config_fp):
    # An SB3 .zip archive stores its hyperparameters as JSON in a "data" entry.
    with zipfile.ZipFile(model_fp, "r") as zip_ref:
        with zip_ref.open("data") as file:
            data = json.load(file)
    data["system_info"] = stable_baselines3.get_system_info(print_info=False)[0]
    with open(config_fp, "w") as f:
        json.dump(data, f, indent=4)


def generate_results_json(results_fp, mean_reward, std_reward, n_eval_episodes, is_deterministic=True):
    eval_form_datetime = datetime.datetime.now().isoformat()
    data = {
        "mean_reward": mean_reward,
        "std_reward": std_reward,
        "is_deterministic": is_deterministic,
        "n_eval_episodes": n_eval_episodes,
        "eval_datetime": eval_form_datetime,
    }
    with open(results_fp, "w") as f:
        json.dump(data, f, indent=4)
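Illustrative calls (the reward numbers are the ones recorded in `results.json` further down, shown here only as example arguments):

```python
from hf_helpers.hf_sb3 import generate_config_json, generate_results_json

generate_config_json("ppo-LunarLander-v2_010_000_000_hf_defaults.zip", "config.json")
generate_results_json("results.json", 311.6129648, 6.22892335529413, n_eval_episodes=10)
```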
hf_helpers/readme.md
ADDED
@@ -0,0 +1 @@
TODO: Put on GitHub?
hf_helpers/sb3_eval.py
ADDED
@@ -0,0 +1,89 @@
import random

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor


def eval_model_with_seed(model_fp, env_id, seed, n_eval_episodes=10, n_envs=1):
    eval_env = make_vec_env(env_id, seed=seed, n_envs=n_envs)
    return eval_model(model_fp, eval_env, n_eval_episodes)


def eval_model_random(model_fp, env_id, n_eval_episodes=10):
    eval_env = Monitor(gym.make(env_id))
    return eval_model(model_fp, eval_env, n_eval_episodes)


def eval_model_random_with_average(
    model_fp, env_id, n_eval_episodes=10, n_average=10, verbose=False
):
    result_sum = 0
    mean_reward_sum = 0
    std_reward_sum = 0
    for i in range(n_average):
        if verbose and i % 100 == 0:
            print(f"Progress: {i}/{n_average}")
        result, mean_reward, std_reward = eval_model_random(
            model_fp, env_id, n_eval_episodes
        )
        result_sum += result
        mean_reward_sum += mean_reward
        std_reward_sum += std_reward
    return (
        result_sum / n_average,
        mean_reward_sum / n_average,
        std_reward_sum / n_average,
    )


def eval_model(model_fp, eval_env, n_eval_episodes=10):
    model = PPO.load(model_fp, env=eval_env)
    mean_reward, std_reward = evaluate_policy(
        model, eval_env, n_eval_episodes=n_eval_episodes, deterministic=True
    )
    # The single "result" score penalizes high variance across episodes.
    result = mean_reward - std_reward
    return result, mean_reward, std_reward


def search_for_best_seed(
    model_fp,
    env_id,
    n_eval_episodes=10,
    n_total_envs_to_search=1000,
    max_n_envs=16,
    verbose=False,
):
    best_result = 0
    best_seed = 0
    best_n_envs = 0
    for i in range(n_total_envs_to_search):
        if verbose and i % 100 == 0:
            print(f"Progress: {i}/{n_total_envs_to_search}")
        seed = random.randint(0, 1000000000000)
        n_envs = random.randint(1, max_n_envs)
        result, _, _ = eval_model_with_seed(
            model_fp, env_id, seed, n_eval_episodes, n_envs
        )
        if result > best_result:
            best_result = result
            best_seed = seed
            best_n_envs = n_envs
    return best_result, best_seed, best_n_envs


def search_for_best_seed_in_range(model_fp, env_id, seed_range=range(0, 1000)):
    # The original parameter was named `range`, shadowing the builtin; renamed.
    best_result = 0
    best_seed = 0
    best_n_envs = 0
    for seed in seed_range:
        for n_envs in [1, 2, 4, 8, 16, 32]:
            result, _, _ = eval_model_with_seed(model_fp, env_id, seed, 10, n_envs)
            if result > best_result:
                best_result = result
                best_seed = seed
                best_n_envs = n_envs
                print(best_result, seed, n_envs)
    print(best_result, best_seed, best_n_envs)
    return best_result, best_seed, best_n_envs
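The single `result` score is `mean_reward - std_reward`, so a model scores well only when it is both good and consistent, and the seed-search helpers optimize that score over random evaluation seeds. A quick illustrative check with the seed that `main.py` below ends up pinning:

```python
from hf_helpers.sb3_eval import eval_model_with_seed

result, mean_reward, std_reward = eval_model_with_seed(
    "ppo-LunarLander-v2_010_000_000_hf_defaults.zip",
    "LunarLander-v2",
    seed=902,  # best_seed from main.py
    n_eval_episodes=10,
    n_envs=8,  # best_n_envs from main.py
)
print(result, mean_reward, std_reward)
```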
main.py
ADDED
@@ -0,0 +1,57 @@
from huggingface_sb3.push_to_hub import generate_metadata
from huggingface_hub.repocard import metadata_save

from hf_helpers.gym_video import generate_video, load_ppo_model_for_video
from hf_helpers.hf_sb3 import generate_config_json, generate_results_json
from hf_helpers.sb3_eval import eval_model_with_seed


readme_path = "README.md"

env_id = "LunarLander-v2"

main_model_fp = "ppo-LunarLander-v2_010_000_000_hf_defaults.zip"
other_models = [
    "ppo-LunarLander-v2_001_000_000_hf_defaults.zip",
    "ppo-LunarLander-v2_010_000_000_sb3_defaults.zip",
    "ppo-LunarLander-v2_123_456_789_hf_defaults.zip",
]


# 1. Evaluate model
best_seed = 902
best_n_envs = 8
n_eval_episodes = 10
result, mean_reward, std_reward = eval_model_with_seed(
    main_model_fp,
    env_id,
    seed=best_seed,
    n_eval_episodes=n_eval_episodes,
    n_envs=best_n_envs,
)


# 2. Create config.json
generate_config_json(main_model_fp, "config.json")
# Also create config files for the other models
for model_fp in other_models:
    generate_config_json(model_fp, f"config-{model_fp.replace('.zip', '')}.json")


# 3. Create results.json
generate_results_json("results.json", mean_reward, std_reward, n_eval_episodes, True)


# 4. Generate video
model_for_video = load_ppo_model_for_video(main_model_fp, env_id)
generate_video(model_for_video, "video.mp4", video_length_in_episodes=5)


# 5. Generate model card
metadata = generate_metadata(
    model_name=main_model_fp.replace(".zip", ""),
    env_id=env_id,
    mean_reward=mean_reward,
    std_reward=std_reward,
)
metadata["license"] = "mit"
metadata_save(readme_path, metadata)
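Running `python main.py` regenerates every derived artifact in this commit: the evaluation at the pinned seed, `config.json` plus the per-model `config-*.json` files, `results.json`, `video.mp4`, and the metadata block at the top of `README.md`.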
ppo-LunarLander-v2_001_000_000_hf_defaults.zip
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:95c76941c556b88a7269767145b8772aa92a7d3394cedfe21ca73f0fd71c4ca2
size 151144
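This and the three entries below are Git LFS pointer files: the repository tracks only the `oid` (SHA-256) and byte `size` of each model zip (about 151 kB apiece), while the binaries themselves live in LFS storage.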
ppo-LunarLander-v2_010_000_000_hf_defaults.zip
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:64d6cf67b2df2ebfe514a2f33e01346dc00a992789b1ca88eb42f094a747066c
size 151024
ppo-LunarLander-v2_010_000_000_sb3_defaults.zip
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c291a24f4798636a4b21e9cf6e8a5eb216f2f681382926136a7e658740b2e042
size 151015
ppo-LunarLander-v2_123_456_789_hf_defaults.zip
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:93c100c96c3002c69c5a87cb5c37ec01fc6e0f66dbd9bd46a315f761613f9c1e
size 151024
results.json
ADDED
@@ -0,0 +1,7 @@
{
    "mean_reward": 311.6129648,
    "std_reward": 6.22892335529413,
    "is_deterministic": true,
    "n_eval_episodes": 10,
    "eval_datetime": "2024-03-26T11:30:30.555994"
}
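These values come from step 3 of `main.py`: the deterministic 10-episode evaluation of the main model at the pinned seed and environment count, written out by `generate_results_json`.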
video.mp4
ADDED
Binary file (143 kB).