jostyposty committed
Commit
3261e0d
1 Parent(s): 03a70ee

feat: add four models

.gitignore ADDED
@@ -0,0 +1 @@
+ __pycache__
README.md CHANGED
@@ -1,3 +1,80 @@
  ---
+ library_name: stable-baselines3
+ tags:
+ - LunarLander-v2
+ - deep-reinforcement-learning
+ - reinforcement-learning
+ - stable-baselines3
+ model-index:
+ - name: ppo-LunarLander-v2_010_000_000_hf_defaults
+   results:
+   - task:
+       type: reinforcement-learning
+       name: reinforcement-learning
+     dataset:
+       name: LunarLander-v2
+       type: LunarLander-v2
+     metrics:
+     - type: mean_reward
+       value: 311.61 +/- 6.23
+       name: mean_reward
+       verified: false
  license: mit
  ---
+
+ # **PPO** Agent playing **LunarLander-v2**
+ This is a trained model of a **PPO** agent playing **LunarLander-v2**
+ using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
+
+ ## Training
+ When I first started training, I experimented with different parameter values to see if I could find something that gave better results than others. I ended up just using the defaults provided by Hugging Face (HF); in my runs, the differences in results between those defaults and the defaults from Stable-Baselines3 (SB3) were not that large. The two sets of defaults are listed below, followed by a sketch of a training run.
+
+ | Defaults name                       | n_steps | batch_size | n_epochs | gamma | gae_lambda | ent_coef |
+ |-------------------------------------|--------:|-----------:|---------:|------:|-----------:|---------:|
+ | Hugging Face Defaults (hf_defaults) |   1,024 |         64 |        8 | 0.999 |       0.98 |     0.01 |
+ | SB3 Defaults (sb3_defaults)         |   2,048 |         64 |       10 |  0.99 |       0.95 |      0.0 |
+
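+ For reference, a 10M-timestep run with the hf_defaults row above could look roughly like this. This is a minimal sketch, not the actual training script: the "MlpPolicy" choice is an assumption, while n_envs=32 and the hyperparameters are taken from the uploaded config files.
+
+ ```python
+ from stable_baselines3 import PPO
+ from stable_baselines3.common.env_util import make_vec_env
+
+ # 32 parallel environments, matching n_envs in the uploaded configs
+ env = make_vec_env("LunarLander-v2", n_envs=32)
+
+ # hf_defaults from the table above; remaining arguments keep SB3's defaults
+ model = PPO(
+     "MlpPolicy",
+     env,
+     n_steps=1024,
+     batch_size=64,
+     n_epochs=8,
+     gamma=0.999,
+     gae_lambda=0.98,
+     ent_coef=0.01,
+ )
+ model.learn(total_timesteps=10_000_000)
+ model.save("ppo-LunarLander-v2_010_000_000_hf_defaults")
+ ```
+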
+ ## Models
+ I decided to train and upload four models to test the following hypothesis: 1,000,000 (1M) timesteps would be insufficient, 123,456,789 (123M) timesteps would be excessively time-consuming without significant improvement in results, and 10,000,000 (10M) timesteps would offer a reasonable balance between training duration and outcomes. For the 10M runs I used the defaults from both Hugging Face and Stable-Baselines3.
+
+ | Number | Model name                                  |   Timesteps | Defaults     |
+ |:------:|---------------------------------------------|------------:|-------------:|
+ | 1      | ppo-LunarLander-v2_001_000_000_hf_defaults  |   1,000,000 | hf_defaults  |
+ | 2      | ppo-LunarLander-v2_010_000_000_hf_defaults  |  10,000,000 | hf_defaults  |
+ | 3      | ppo-LunarLander-v2_010_000_000_sb3_defaults |  10,000,000 | sb3_defaults |
+ | 4      | ppo-LunarLander-v2_123_456_789_hf_defaults  | 123,456,789 | hf_defaults  |
+
+ ## Evaluation
+ I evaluated the four models using two approaches (sketched in the code below):
+ - Search: searching through a lot of different random environments for a good seed
+ - Average: averaging over a lot of different random environments
+
+ The code in evaluate.py shows how the models are evaluated and how the results are stored. All the results are included in the evaluation_results.csv file. The reported result is mean_reward - std_reward, but I also store mean_reward, std_reward, seed, and n_envs.
+
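+ For illustration, the two approaches might look like the sketch below. This is not the actual evaluate.py; the seed range and n_envs value are placeholders.
+
+ ```python
+ from stable_baselines3 import PPO
+ from stable_baselines3.common.env_util import make_vec_env
+ from stable_baselines3.common.evaluation import evaluate_policy
+
+ model = PPO.load("ppo-LunarLander-v2_010_000_000_hf_defaults.zip")
+
+ scores = []
+ for seed in range(100):  # placeholder: try 100 random seeds
+     env = make_vec_env("LunarLander-v2", n_envs=16, seed=seed)
+     mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
+     scores.append((mean_reward - std_reward, seed))
+
+ print("Search:", max(scores))                                # best seed found
+ print("Average:", sum(s for s, _ in scores) / len(scores))   # seed-independent
+ ```
+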
+ ### Results
+ | Model name                                  | Number of results |      Min |     Max | Average |
+ |:--------------------------------------------|------------------:|---------:|--------:|--------:|
+ | ppo-LunarLander-v2_001_000_000_hf_defaults  |              4136 | 144.712  | 269.721 | 240.895 |
+ | ppo-LunarLander-v2_010_000_000_hf_defaults  |              4136 | 130.43   | 305.384 | 270.451 |
+ | ppo-LunarLander-v2_010_000_000_sb3_defaults |              4136 |  87.9966 | 298.898 | 269.568 |
+ | ppo-LunarLander-v2_123_456_789_hf_defaults  |              4136 | 141.814  | 302.567 | 268.735 |
+
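+ The summary statistics above can be recomputed from the CSV with something like the following sketch. The column names here are assumed for illustration; check evaluation_results.csv for the actual schema.
+
+ ```python
+ import pandas as pd
+
+ df = pd.read_csv("evaluation_results.csv")
+ # result = mean_reward - std_reward, as described above (assumed column names)
+ df["result"] = df["mean_reward"] - df["std_reward"]
+ print(df.groupby("model_name")["result"].agg(["count", "min", "max", "mean"]))
+ ```
+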
+ ### Conclusion
+ As suspected, the 1M model performed the worst. I don't see significant differences between the two 10M models and the 123M model.
+
+ ## Disclaimer regarding the evaluation result
+ I don't really like the randomness introduced by the current method for evaluating the model. As you can see, I tested the same model with different seeds and numbers of parallel environments, and I got quite varying results. I have not manually edited the score upwards, nor have I used a lower number for n_eval_episodes (which would give a better result, since there would be less to average over). But, as can be seen in evaluation_results.csv, I did "mine" for a good seed to share.
+
+ ### A better way to evaluate the models?
+ Perhaps we should average over more environments? Wouldn't that give a result less prone to the randomness of individual environments? When averaging over many environments, the result is much more stable, so this could be a better way of evaluating results for use in a leaderboard. In short: n_eval_episodes=10, averaged over at least 10 different random environments.
+
+ ## Usage (with Stable-Baselines3)
+
+ ```python
+ from huggingface_sb3 import load_from_hub
+ from stable_baselines3 import PPO
+
+ # Download the checkpoint from the Hub (returns a local path), then load it
+ checkpoint = load_from_hub("jostyposty/drl-course-unit-01-lunar-lander-v2", "ppo-LunarLander-v2_010_000_000_hf_defaults.zip")
+ model = PPO.load(checkpoint)
+ # TODO: test this
+ ```
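+
+ Once loaded, the agent can be watched in an environment, e.g. with the sketch below (assuming gymnasium with the Box2D extras installed):
+
+ ```python
+ import gymnasium as gym
+
+ env = gym.make("LunarLander-v2", render_mode="human")
+ obs, info = env.reset()
+ done = False
+ while not done:
+     action, _states = model.predict(obs, deterministic=True)
+     obs, reward, terminated, truncated, info = env.step(action)
+     done = terminated or truncated
+ env.close()
+ ```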
+
config-ppo-LunarLander-v2_001_000_000_hf_defaults.json ADDED
@@ -0,0 +1,125 @@
1
+ {
2
+ "policy_class": {
3
+ ":type:": "<class 'abc.ABCMeta'>",
4
+ ":serialized:": "gAWVOwAAAAAAAACMIXN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5wb2xpY2llc5SMEUFjdG9yQ3JpdGljUG9saWN5lJOULg==",
5
+ "__module__": "stable_baselines3.common.policies",
6
+ "__doc__": "\n Policy class for actor-critic algorithms (has both policy and value prediction).\n Used by A2C, PPO and the likes.\n\n :param observation_space: Observation space\n :param action_space: Action space\n :param lr_schedule: Learning rate schedule (could be constant)\n :param net_arch: The specification of the policy and value networks.\n :param activation_fn: Activation function\n :param ortho_init: Whether to use or not orthogonal initialization\n :param use_sde: Whether to use State Dependent Exploration or not\n :param log_std_init: Initial value for the log standard deviation\n :param full_std: Whether to use (n_features x n_actions) parameters\n for the std instead of only (n_features,) when using gSDE\n :param use_expln: Use ``expln()`` function instead of ``exp()`` to ensure\n a positive standard deviation (cf paper). It allows to keep variance\n above zero and prevent it from growing too fast. In practice, ``exp()`` is usually enough.\n :param squash_output: Whether to squash the output using a tanh function,\n this allows to ensure boundaries when using gSDE.\n :param features_extractor_class: Features extractor to use.\n :param features_extractor_kwargs: Keyword arguments\n to pass to the features extractor.\n :param share_features_extractor: If True, the features extractor is shared between the policy and value networks.\n :param normalize_images: Whether to normalize images or not,\n dividing by 255.0 (True by default)\n :param optimizer_class: The optimizer to use,\n ``th.optim.Adam`` by default\n :param optimizer_kwargs: Additional keyword arguments,\n excluding the learning rate, to pass to the optimizer\n ",
7
+ "__init__": "<function ActorCriticPolicy.__init__ at 0x7f2d8fd011b0>",
8
+ "_get_constructor_parameters": "<function ActorCriticPolicy._get_constructor_parameters at 0x7f2d8fd01240>",
9
+ "reset_noise": "<function ActorCriticPolicy.reset_noise at 0x7f2d8fd012d0>",
10
+ "_build_mlp_extractor": "<function ActorCriticPolicy._build_mlp_extractor at 0x7f2d8fd01360>",
11
+ "_build": "<function ActorCriticPolicy._build at 0x7f2d8fd013f0>",
12
+ "forward": "<function ActorCriticPolicy.forward at 0x7f2d8fd01480>",
13
+ "extract_features": "<function ActorCriticPolicy.extract_features at 0x7f2d8fd01510>",
14
+ "_get_action_dist_from_latent": "<function ActorCriticPolicy._get_action_dist_from_latent at 0x7f2d8fd015a0>",
15
+ "_predict": "<function ActorCriticPolicy._predict at 0x7f2d8fd01630>",
16
+ "evaluate_actions": "<function ActorCriticPolicy.evaluate_actions at 0x7f2d8fd016c0>",
17
+ "get_distribution": "<function ActorCriticPolicy.get_distribution at 0x7f2d8fd01750>",
18
+ "predict_values": "<function ActorCriticPolicy.predict_values at 0x7f2d8fd017e0>",
19
+ "__abstractmethods__": "frozenset()",
20
+ "_abc_impl": "<_abc._abc_data object at 0x7f2d8fcf67c0>"
21
+ },
22
+ "verbose": 0,
23
+ "policy_kwargs": {},
24
+ "num_timesteps": 1015808,
25
+ "_total_timesteps": 1000000,
26
+ "_num_timesteps_at_start": 0,
27
+ "seed": null,
28
+ "action_noise": null,
29
+ "start_time": 1708464106672864008,
30
+ "learning_rate": 0.0003,
31
+ "tensorboard_log": null,
32
+ "_last_obs": {
33
+ ":type:": "<class 'numpy.ndarray'>",
34
+ ":serialized:": "gAWVdQQAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYABAAAAAAAAGbmnbrhCt26snE1u6CkODi+HSs74BrWOQAAgD8AAIA/M/N8PVx7WrozR5o6ANUqNvGUijt+L7G5AACAPwAAgD/AqpE99O6VvHsMrr1BMoW9KxxJPVi9YD4AAIA/AACAPw0Ys732wHi6feBtOSAcKzWrhje7iFGItwAAgD8AAAAAZmJ5vFwsqT/jPeG9x82zvta98DsycOG8AAAAAAAAAAAzdoY89iQwut1Qvjsyogs4VbhZO2NdCrYAAIA/AACAP03cRD2Phmi6Lh6vOhdXJTbwm2W4WZwaNQAAgD8AAIA/2je8vZthcz/wm4C8MfbTvlvMprxZsSi9AAAAAAAAAADNsGi9uHbzuU0CQLm8MjyzDmjktubyXjgAAIA/AACAPwCg0roUDLO6u6/+uxCsmDtqddE6KkiIPAAAgD8AAIA/gJ5qvRQQmrrzgKq79/ZfNvs+S7svOME6AACAPwAAgD8Ad6e82NGYP+RLSb2LzZO+0IIHOlrUpzwAAAAAAAAAAM0+QzyuYYi6shlfu8t9KTi8KkI79zQAOgAAgD8AAIA/jXsAPvj0Kz9F/J281kqsviTmvj0X1ii7AAAAAAAAAAAAbvc89gQdutBdA7o300u1fvmfudnmGjkAAIA/AACAP5pd0rvDlVS6z425Ow/lMzjWyDK7q1iUtwAAgD8AAIA/ZslCvey5ybkCVzw8I+6GNdDaSjsGpYg0AACAPwAAgD/NWhS8KUgCuh7pAbnrJTAzf81luxhRGDgAAIA/AACAP83jwbwfLei5m6xftvfYYzCFQKW6V0CFNQAAgD8AAIA/miUavBMgoz9gPe+8y2G1vh1vxbsmMbk8AAAAAAAAAACt90G+k0ejPrT5rz7bJZe+/AyaPcNOdDwAAAAAAAAAAAA7EL0pSCm64jPfuuTuwrVOHQI5a7/7OQAAgD8AAIA/xqgmvptL3z4VbfY9pqpPviw4lL1IGnU9AAAAAAAAAACAmwi9FBCCumieOjlAFog0IiFcO1DzV7gAAIA/AACAP/PjSr7UyxA/PtFoPnKjjL7mPaK72ywKvQAAAAAAAAAATTksvVzrcboU+iC6sacBtfJtnjprtzs5AACAPwAAgD+a+eS61STXPjbZ5b03Cqi+fH6jveqCHj0AAAAAAAAAADNZJbwpuGe6ImFBuiMOC72QDEI7UtjzPQAAgD8AAAAAZlf6vCloUbo6G8K6IuShM71FiLlLAN85AACAPwAAgD+A+Wc94daAupFJDbi9tk6zgPtuO46JIzcAAIA/AACAP80EsDxc+1u6TiuLuryCiLPSf865bwGgOQAAgD8AAIA/zVZkPcMBLrrwP2w7KpqzNkETmTo2U7E1AACAPwAAgD+UjAVudW1weZSMBWR0eXBllJOUjAJmNJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRiSyBLCIaUjAFDlHSUUpQu"
35
+ },
36
+ "_last_episode_starts": {
37
+ ":type:": "<class 'numpy.ndarray'>",
38
+ ":serialized:": "gAWVkwAAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlIwFbnVtcHmUjAVkdHlwZZSTlIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksghZSMAUOUdJRSlC4="
39
+ },
40
+ "_last_original_obs": null,
41
+ "_episode_num": 0,
42
+ "use_sde": false,
43
+ "sde_sample_freq": -1,
44
+ "_current_progress_remaining": -0.015808000000000044,
45
+ "_stats_window_size": 100,
46
+ "ep_info_buffer": {
47
+ ":type:": "<class 'collections.deque'>",
48
+ ":serialized:": "gAWVQgwAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKUKH2UKIwBcpRHQGGWG2TgVGmMAWyUTegDjAF0lEdAe/VybhFVk3V9lChoBkdAZmz+NLlFMWgHTegDaAhHQHv40YXO4Xp1fZQoaAZHQGbdXRgJC0FoB03oA2gIR0B7+FgF5fMOdX2UKGgGR0BkU6eI2wV1aAdN6ANoCEdAe/4Qgs9SuXV9lChoBkdAXVd9JBgNPWgHTegDaAhHQHv5yhSLqD91fZQoaAZHQGZKDMeOn2toB03oA2gIR0B7+z4yoGY8dX2UKGgGR0BiZ7xZuAI6aAdN6ANoCEdAe/3fzz3AVXV9lChoBkdAZdq+/QBxP2gHTegDaAhHQHwAcWO6unx1fZQoaAZHQGOElBppN9JoB03oA2gIR0B7/eyxA0KrdX2UKGgGR0Bi95ZntfG/aAdN6ANoCEdAe/4We6I3znV9lChoBkdAYubRaX8fm2gHTegDaAhHQHwDoqLCN0h1fZQoaAZHQGEkgJb+tKZoB03oA2gIR0B8BBy6tknUdX2UKGgGR0AiLVe8f3evaAdL4mgIR0B8CXpaA4GVdX2UKGgGR0BmjouAZsKtaAdN6ANoCEdAfAUGTLW7OHV9lChoBkdAXb8AWBSUDGgHTegDaAhHQHwI3tv4ubt1fZQoaAZHQGRXjyOJcgRoB03oA2gIR0B8C3EqDsdDdX2UKGgGR0BgZeW+oLofaAdN6ANoCEdAfAzvX9R77nV9lChoBkdAXu93Ux20RmgHTegDaAhHQHwLqBNEgGN1fZQoaAZHQGgEb5VOsT5oB03oA2gIR0B8DbXFtKqXdX2UKGgGR0BkiCYLLIPtaAdN6ANoCEdAfA5X2/SH/XV9lChoBkdAciNQdCE6DGgHTf4BaAhHQHwPD/VAiV11fZQoaAZHQGHkxubZvk1oB03oA2gIR0B8Ds/5ckdFdX2UKGgGR0Bj8NN8E3bVaAdN6ANoCEdAfBL4ACGN73V9lChoBkdARh0MmWt2cWgHS+BoCEdAfBk5EMLF43V9lChoBkdAZfS2F36hx2gHTegDaAhHQHwdWSpzcRF1fZQoaAZHQGaLGHpKSPloB03oA2gIR0B8GikEcKgJdX2UKGgGR0BjLbakAPupaAdN6ANoCEdAfB2ybhFVk3V9lChoBkdAYxRVwxWT5mgHTegDaAhHQHwgFmOEM9d1fZQoaAZHQGWAS6tknTloB03oA2gIR0B8IAgLZzxPdX2UKGgGR0BkEbKNhmXgaAdN6ANoCEdAfCG8GLUCrHV9lChoBkdAYk6IFeOXFGgHTegDaAhHQHwhlIuoP091fZQoaAZHQGLuoRRMvh9oB03oA2gIR0B8I/FOwgTzdX2UKGgGR0BlOdqcmShbaAdN6ANoCEdAfCfOUt7KJXV9lChoBkdAZ2XlvqC6H2gHTegDaAhHQHzjtL6DXe51fZQoaAZHQF9P5ggHNX5oB03oA2gIR0B85UUg0TDgdX2UKGgGR0Bfa5hjOLR8aAdN6ANoCEdAfOibADaGpXV9lChoBkdAX/CxHG0eEWgHTegDaAhHQHzn8rAgxJx1fZQoaAZHQGD5GJm/WUdoB03oA2gIR0B86UjLSuyNdX2UKGgGR0BkEo7T2FnJaAdN6ANoCEdAfOqxj8UEgXV9lChoBkdAXEUK5TZQHmgHTegDaAhHQHztRBE8aGZ1fZQoaAZHQGRDYqXnhbZoB03oA2gIR0B8797b+Lm7dX2UKGgGR0BkQTQw9JSSaAdN6ANoCEdAfO1fIjnmrHV9lChoBkdAYIzSF49ovmgHTegDaAhHQHztiOzY2891fZQoaAZHQGVm3yy2QXBoB03oA2gIR0B88x4rz5GjdX2UKGgGR0BjknMKTjebaAdN6ANoCEdAfPOuE25xznV9lChoBkdAY0RkuHvc8GgHTegDaAhHQHz5G8Zk0791fZQoaAZHQGdPTLfUF0RoB03oA2gIR0B89KrKeTV2dX2UKGgGR0Bjo42jwhGIaAdN6ANoCEdAfPhp8WsRx3V9lChoBkdAZ0xP4VRDTmgHTegDaAhHQHz6+GXXyy51fZQoaAZHQG9v2KuSwGJoB02NAmgIR0B899DPWxyGdX2UKGgGR0BlzvwG4ZuRaAdN6ANoCEdAfPx/fO2RaHV9lChoBkdAZh5o/zJ6p2gHTegDaAhHQHz7KO5rgwZ1fZQoaAZHQGe9lBY3eepoB03oA2gIR0B8/dmBe5WjdX2UKGgGR0BmxpH/cWTHaAdN6ANoCEdAfP6a3Zwn6XV9lChoBkdAZ8o6reZXuGgHTegDaAhHQHz+XE/B3zN1fZQoaAZHQGEXZ4wAU+NoB03oA2gIR0B9Aqwt8NQTdX2UKGgGR0Bngr6vaDf4aAdN6ANoCEdAfQj0FKTSs3V9lChoBkdAZV5p0OmR/2gHTegDaAhHQH0NKlgtvn91fZQoaAZHQGa8aJQ+EAZoB03oA2gIR0B9DcJhOP/8dX2UKGgGR0BkBTSgGr0baAdN6ANoCEdAfRBbADaGpXV9lChoBkdAZcthDPWxyGgHTegDaAhHQH0QWMS9M9N1fZQoaAZHQGajUQK8cuJoB03oA2gIR0B9EjO4XoC/dX2UKGgGR0Bhe6VD8cdYaAdN6ANoCEdAfRIIWP91l3V9lChoBkdAZa95UtI07GgHTegDaAhHQH0Unl8w5/91fZQoaAZHQGS3Wom5UcZoB03oA2gIR0B9GJlQMx46dX2UKGgGR0Bl498qnWJ8aAdN6ANoCEdAfRg1loUSI3V9lChoBkdAXC5GG21D0GgHTegDaAhHQH3bnQ+lj3F1fZQoaAZHQF7xb/ffoA5oB03oA2gIR0B93wmeDnNgdX2UKGgGR0BhNfTVlPJraAdN6ANoCEdAfd7qEeyRjnV9lChoBkdAYrCJb+tKZmgHTegDaAhHQH3gwKKHfuV1fZQoaAZHQGYye5vtMPBoB03oA2gIR0B94kTGo73gdX2UKGgGR0Bjvsm4RVZLaAdN6ANoCEdAfeUZ5iVjZ3V9lChoBkdAYBKPBi1Aq2gHTegDaAhHQH3nrrcCYC11fZQoaAZHQGg1uEug6EJoB03oA2gIR0B95UU21lXjdX2UKGgGR0BgSjpNbkfcaAdN6ANoCEdAfeVqbz9S/HV9lChoBkdAYF0xTsIE82gHTegDaAhHQH3r1eOXE611fZQoaAZHQGH8z0g8r7RoB03oA2gIR0B97JWxQizLdX2UKGgGR0Bg6iXKKYReaAdN6ANoCEdAffKgCOmzjXV9lChoBkdAYpijYZl4DGgHTegDaAhHQH3uL8R+SbJ1fZQoaAZHQGJtpkXk5p9oB03oA2gIR0B98lthuwX7dX2UKGgGR0BmR9mcvugIaAdN6ANoCEdAffTxEv0yxnV9lChoBkdAZ4R7cfvF32gHTegDaAhHQH3x/crRSgp1fZQoaAZHQGOa4Dklu3toB03oA2gIR0B99w9zOopAdX2UKGgGR0Blu30EovzwaAdN6ANoCEdAffZu7pV0cXV9
lChoBkdAY3yCz1K5CmgHTegDaAhHQH36BYJVsDZ1fZQoaAZHQGaVKtYB/7VoB03oA2gIR0B9+uLehwl0dX2UKGgGR0BkEVgnc+JQaAdN6ANoCEdAffqkUsWfsnV9lChoBkdAYlj7mdRR/GgHTegDaAhHQH3/6Kcd5pt1fZQoaAZHQGEEEmhM8HRoB03oA2gIR0B+CLBYV6/qdX2UKGgGR0Bkr5LZi/fwaAdN6ANoCEdAfgzhddE9dXV9lChoBkdAYlXZ1V5rxmgHTegDaAhHQH4OffTCtRx1fZQoaAZHQGZOb5M10kpoB03oA2gIR0B+ElqCYkVvdX2UKGgGR0BlUOPBBRhuaAdN6ANoCEdAfhJaTfR/mXV9lChoBkdAXLbSXt0FKWgHTegDaAhHQH4UgmeDnNh1fZQoaAZHQGYdemWMS9NoB03oA2gIR0B+FF79hqj8dX2UKGgGR0BgEutGNJe3aAdN6ANoCEdAfhgWS2Yv4HV9lChoBkdAXaps1sLv1GgHTegDaAhHQH4cP5ULlV91fZQoaAZHQGW1tC7btZ5oB03oA2gIR0B+HV2Pkq+bdX2UKGgGR0Bg/a3G4qgAaAdN6ANoCEdAfh8QXQ+lj3V9lChoBkdAZW4bqhUR4GgHTegDaAhHQH4izRc/t6Z1ZS4="
49
+ },
50
+ "ep_success_buffer": {
51
+ ":type:": "<class 'collections.deque'>",
52
+ ":serialized:": "gAWVIAAAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKULg=="
53
+ },
54
+ "_n_updates": 248,
55
+ "observation_space": {
56
+ ":type:": "<class 'gymnasium.spaces.box.Box'>",
57
+ ":serialized:": "gAWVdgIAAAAAAACMFGd5bW5hc2l1bS5zcGFjZXMuYm94lIwDQm94lJOUKYGUfZQojAVkdHlwZZSMBW51bXB5lIwFZHR5cGWUk5SMAmY0lImIh5RSlChLA4wBPJROTk5K/////0r/////SwB0lGKMDWJvdW5kZWRfYmVsb3eUjBJudW1weS5jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWCAAAAAAAAAABAQEBAQEBAZRoCIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksIhZSMAUOUdJRSlIwNYm91bmRlZF9hYm92ZZRoESiWCAAAAAAAAAABAQEBAQEBAZRoFUsIhZRoGXSUUpSMBl9zaGFwZZRLCIWUjANsb3eUaBEoliAAAAAAAAAAAAC0wgAAtMIAAKDAAACgwNsPScAAAKDAAAAAgAAAAICUaAtLCIWUaBl0lFKUjARoaWdolGgRKJYgAAAAAAAAAAAAtEIAALRCAACgQAAAoEDbD0lAAACgQAAAgD8AAIA/lGgLSwiFlGgZdJRSlIwIbG93X3JlcHKUjFtbLTkwLiAgICAgICAgLTkwLiAgICAgICAgIC01LiAgICAgICAgIC01LiAgICAgICAgIC0zLjE0MTU5MjcgIC01LgogIC0wLiAgICAgICAgIC0wLiAgICAgICBdlIwJaGlnaF9yZXBylIxTWzkwLiAgICAgICAgOTAuICAgICAgICAgNS4gICAgICAgICA1LiAgICAgICAgIDMuMTQxNTkyNyAgNS4KICAxLiAgICAgICAgIDEuICAgICAgIF2UjApfbnBfcmFuZG9tlE51Yi4=",
58
+ "dtype": "float32",
59
+ "bounded_below": "[ True True True True True True True True]",
60
+ "bounded_above": "[ True True True True True True True True]",
61
+ "_shape": [
62
+ 8
63
+ ],
64
+ "low": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
65
+ "high": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
66
+ "low_repr": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
67
+ "high_repr": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
68
+ "_np_random": null
69
+ },
70
+ "action_space": {
71
+ ":type:": "<class 'gymnasium.spaces.discrete.Discrete'>",
72
+ ":serialized:": "gAWV/QAAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwBbpSMFW51bXB5LmNvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlIwFbnVtcHmUjAVkdHlwZZSTlIwCaTiUiYiHlFKUKEsDjAE8lE5OTkr/////Sv////9LAHSUYkMIBAAAAAAAAACUhpRSlIwFc3RhcnSUaAhoDkMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMBWR0eXBllGgLjAJpOJSJiIeUUpQoSwNoD05OTkr/////Sv////9LAHSUYowKX25wX3JhbmRvbZROdWIu",
73
+ "n": "4",
74
+ "start": "0",
75
+ "_shape": [],
76
+ "dtype": "int64",
77
+ "_np_random": null
78
+ },
79
+ "n_envs": 32,
80
+ "n_steps": 1024,
81
+ "gamma": 0.999,
82
+ "gae_lambda": 0.98,
83
+ "ent_coef": 0.01,
84
+ "vf_coef": 0.5,
85
+ "max_grad_norm": 0.5,
86
+ "rollout_buffer_class": {
87
+ ":type:": "<class 'abc.ABCMeta'>",
88
+ ":serialized:": "gAWVNgAAAAAAAACMIHN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5idWZmZXJzlIwNUm9sbG91dEJ1ZmZlcpSTlC4=",
89
+ "__module__": "stable_baselines3.common.buffers",
90
+ "__annotations__": "{'observations': <class 'numpy.ndarray'>, 'actions': <class 'numpy.ndarray'>, 'rewards': <class 'numpy.ndarray'>, 'advantages': <class 'numpy.ndarray'>, 'returns': <class 'numpy.ndarray'>, 'episode_starts': <class 'numpy.ndarray'>, 'log_probs': <class 'numpy.ndarray'>, 'values': <class 'numpy.ndarray'>}",
91
+ "__doc__": "\n Rollout buffer used in on-policy algorithms like A2C/PPO.\n It corresponds to ``buffer_size`` transitions collected\n using the current policy.\n This experience will be discarded after the policy update.\n In order to use PPO objective, we also store the current value of each state\n and the log probability of each taken action.\n\n The term rollout here refers to the model-free notion and should not\n be used with the concept of rollout used in model-based RL or planning.\n Hence, it is only involved in policy and value function training but not action selection.\n\n :param buffer_size: Max number of element in the buffer\n :param observation_space: Observation space\n :param action_space: Action space\n :param device: PyTorch device\n :param gae_lambda: Factor for trade-off of bias vs variance for Generalized Advantage Estimator\n Equivalent to classic advantage when set to 1.\n :param gamma: Discount factor\n :param n_envs: Number of parallel environments\n ",
92
+ "__init__": "<function RolloutBuffer.__init__ at 0x7f2d8fe97520>",
93
+ "reset": "<function RolloutBuffer.reset at 0x7f2d8fe975b0>",
94
+ "compute_returns_and_advantage": "<function RolloutBuffer.compute_returns_and_advantage at 0x7f2d8fe97640>",
95
+ "add": "<function RolloutBuffer.add at 0x7f2d8fe976d0>",
96
+ "get": "<function RolloutBuffer.get at 0x7f2d8fe97760>",
97
+ "_get_samples": "<function RolloutBuffer._get_samples at 0x7f2d8fe977f0>",
98
+ "__abstractmethods__": "frozenset()",
99
+ "_abc_impl": "<_abc._abc_data object at 0x7f2d8fe8c900>"
100
+ },
101
+ "rollout_buffer_kwargs": {},
102
+ "batch_size": 64,
103
+ "n_epochs": 8,
104
+ "clip_range": {
105
+ ":type:": "<class 'function'>",
106
+ ":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHP8mZmZmZmZqFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
107
+ },
108
+ "clip_range_vf": null,
109
+ "normalize_advantage": true,
110
+ "target_kl": null,
111
+ "lr_schedule": {
112
+ ":type:": "<class 'function'>",
113
+ ":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHPzOpKjBVMmGFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
114
+ },
115
+ "system_info": {
116
+ "OS": "Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35 # 1 SMP Thu Oct 5 21:02:42 UTC 2023",
117
+ "Python": "3.10.13",
118
+ "Stable-Baselines3": "2.2.1",
119
+ "PyTorch": "2.2.0+cu121",
120
+ "GPU Enabled": "True",
121
+ "Numpy": "1.26.4",
122
+ "Cloudpickle": "3.0.0",
123
+ "Gymnasium": "0.28.1"
124
+ }
125
+ }
config-ppo-LunarLander-v2_010_000_000_sb3_defaults.json ADDED
@@ -0,0 +1,125 @@
1
+ {
2
+ "policy_class": {
3
+ ":type:": "<class 'abc.ABCMeta'>",
4
+ ":serialized:": "gAWVOwAAAAAAAACMIXN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5wb2xpY2llc5SMEUFjdG9yQ3JpdGljUG9saWN5lJOULg==",
5
+ "__module__": "stable_baselines3.common.policies",
6
+ "__doc__": "\n Policy class for actor-critic algorithms (has both policy and value prediction).\n Used by A2C, PPO and the likes.\n\n :param observation_space: Observation space\n :param action_space: Action space\n :param lr_schedule: Learning rate schedule (could be constant)\n :param net_arch: The specification of the policy and value networks.\n :param activation_fn: Activation function\n :param ortho_init: Whether to use or not orthogonal initialization\n :param use_sde: Whether to use State Dependent Exploration or not\n :param log_std_init: Initial value for the log standard deviation\n :param full_std: Whether to use (n_features x n_actions) parameters\n for the std instead of only (n_features,) when using gSDE\n :param use_expln: Use ``expln()`` function instead of ``exp()`` to ensure\n a positive standard deviation (cf paper). It allows to keep variance\n above zero and prevent it from growing too fast. In practice, ``exp()`` is usually enough.\n :param squash_output: Whether to squash the output using a tanh function,\n this allows to ensure boundaries when using gSDE.\n :param features_extractor_class: Features extractor to use.\n :param features_extractor_kwargs: Keyword arguments\n to pass to the features extractor.\n :param share_features_extractor: If True, the features extractor is shared between the policy and value networks.\n :param normalize_images: Whether to normalize images or not,\n dividing by 255.0 (True by default)\n :param optimizer_class: The optimizer to use,\n ``th.optim.Adam`` by default\n :param optimizer_kwargs: Additional keyword arguments,\n excluding the learning rate, to pass to the optimizer\n ",
7
+ "__init__": "<function ActorCriticPolicy.__init__ at 0x7f0c58a111b0>",
8
+ "_get_constructor_parameters": "<function ActorCriticPolicy._get_constructor_parameters at 0x7f0c58a11240>",
9
+ "reset_noise": "<function ActorCriticPolicy.reset_noise at 0x7f0c58a112d0>",
10
+ "_build_mlp_extractor": "<function ActorCriticPolicy._build_mlp_extractor at 0x7f0c58a11360>",
11
+ "_build": "<function ActorCriticPolicy._build at 0x7f0c58a113f0>",
12
+ "forward": "<function ActorCriticPolicy.forward at 0x7f0c58a11480>",
13
+ "extract_features": "<function ActorCriticPolicy.extract_features at 0x7f0c58a11510>",
14
+ "_get_action_dist_from_latent": "<function ActorCriticPolicy._get_action_dist_from_latent at 0x7f0c58a115a0>",
15
+ "_predict": "<function ActorCriticPolicy._predict at 0x7f0c58a11630>",
16
+ "evaluate_actions": "<function ActorCriticPolicy.evaluate_actions at 0x7f0c58a116c0>",
17
+ "get_distribution": "<function ActorCriticPolicy.get_distribution at 0x7f0c58a11750>",
18
+ "predict_values": "<function ActorCriticPolicy.predict_values at 0x7f0c58a117e0>",
19
+ "__abstractmethods__": "frozenset()",
20
+ "_abc_impl": "<_abc._abc_data object at 0x7f0c58a05e80>"
21
+ },
22
+ "verbose": 0,
23
+ "policy_kwargs": {},
24
+ "num_timesteps": 10027008,
25
+ "_total_timesteps": 10000000,
26
+ "_num_timesteps_at_start": 0,
27
+ "seed": null,
28
+ "action_noise": null,
29
+ "start_time": 1708166080608340309,
30
+ "learning_rate": 0.0003,
31
+ "tensorboard_log": null,
32
+ "_last_obs": {
33
+ ":type:": "<class 'numpy.ndarray'>",
34
+ ":serialized:": "gAWVdQQAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYABAAAAAAAAEATm7327HC6qn9ms3Nx8q6jbNC6fsPEMwAAgD8AAIA/jdwGPoPAFD/tt7s9qwhlv95f7j4xSKC9AAAAAAAAAABTFko+bgDLPsZLgb7mpke/8nyEPvWPib4AAAAAAAAAAI0bz708Loo/fKAvvn+GUL/VVIa+NCkwvgAAAAAAAAAAM9QGPbZVLbzcKbK9ztc/PSvV7TzfEpy8AACAPwAAgD8aqn89DmB7Pw9JFT6Ldmu/yptDPnjJJz4AAAAAAAAAALPVNT6TNDA/eniqvfFKOb8ir6k+fL2MvgAAAAAAAAAApiYtviKhfT9rSa6+Vu0Xv+Qqwr4BmZu+AAAAAAAAAADNvns8aQSdPxNEsD1u7zG/lO4DO0rPnDwAAAAAAAAAAA0rkj24TN86FdJPvlDWp76UcQe9IoutvQAAAAAAAAAABgYJvvuTej+w7U6+8bE7v6Wbzb54ZF6+AAAAAAAAAAAzYJg9DOuzPwLTYz4Ngd6+MHDyPNNY1jwAAAAAAAAAAIZbP75kplw/bz2ivWbHJb+Fjwe/lsJ5vQAAAAAAAAAAM31+vAozDrsWdWk95QaJPM5bFbxCRm09AACAPwAAgD9mWoy8H+2mufB7WztVdhU9SzYEOqVz/r0AAIA/AACAP2YuwrzsosK7kpZ9vP2E+jyKUbe7uNKQNwAAgD8AAIA/zaSHPXy4qT/zt/s+KZjsvmnKLj0EvZg+AAAAAAAAAADAVpq9Hv+gPySqTL6fFxy/SASVvQKIj74AAAAAAAAAAM0ZK73DkWG6eFV0tzZAmrL4E4o7jt2ONgAAgD8AAIA/MyfJu3bqtj+fTQW9BihKvhNVYLprEG45AAAAAAAAAACz+pQ97H6mPibmpL16uTO/VJanPb63B74AAAAAAAAAAKDIYz5zo6w+krDOvm4VRr8zdyM+NWeWvgAAAAAAAAAAhkhNPq34OD/dWLU9xy8ivwTg/z6x3Bi9AAAAAAAAAAAA9aE9PQREu2chw748aHG9sqyMPI0xbj8AAAAAAACAP6PjgL6yfPs+JmWRPsqxU7+EDsO+g1zdPgAAAAAAAAAAZo27PPaYc7pBORw3A4ybMqSKCjvikDG2AACAPwAAgD9NsS09cUcju1YwCb3/CSs9bHcbvC5QDT0AAIA/AACAPzMDhzzsf9C7ciyjvS15YjxxzIo8ATEHvgAAgD8AAIA/pvCHvXHXoj++x/++CfQov14jM73qlIu+AAAAAAAAAADAwJ69Pdhyu2UoHz1RqVk9vM0cPI5d+jwAAIA/AACAPwBJKr1cGyG6dn7hNafDXDFxMkc76skHtQAAgD8AAIA/puiOPd4rlT1m+4m+9AfMvhckZ71VrCO+AAAAAAAAAACUjAVudW1weZSMBWR0eXBllJOUjAJmNJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRiSyBLCIaUjAFDlHSUUpQu"
35
+ },
36
+ "_last_episode_starts": {
37
+ ":type:": "<class 'numpy.ndarray'>",
38
+ ":serialized:": "gAWVkwAAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlIwFbnVtcHmUjAVkdHlwZZSTlIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksghZSMAUOUdJRSlC4="
39
+ },
40
+ "_last_original_obs": null,
41
+ "_episode_num": 0,
42
+ "use_sde": false,
43
+ "sde_sample_freq": -1,
44
+ "_current_progress_remaining": -0.0027007999999999477,
45
+ "_stats_window_size": 100,
46
+ "ep_info_buffer": {
47
+ ":type:": "<class 'collections.deque'>",
48
+ ":serialized:": "gAWV4AsAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKUKH2UKIwBcpRHQHJX+0TlDF+MAWyUS3iMAXSUR0C2AIrah6BzdX2UKGgGR0Bz9euxKQJYaAdLr2gIR0C2AHG5tm+TdX2UKGgGR0Bx54j0L+glaAdLo2gIR0C2AM1efI0ZdX2UKGgGR0Bz7T5dnkDIaAdL0mgIR0C2AISZ4Oc2dX2UKGgGR0BzpYVk+X7caAdLkmgIR0C2ASABYFJQdX2UKGgGR0Bwfyp0fYBeaAdLmmgIR0C2AJJiZv1ldX2UKGgGR0ByWtnscABDaAdLrmgIR0C2ALwiu+yrdX2UKGgGR0ByKEF/x2B8aAdLh2gIR0C2ARHeBQN1dX2UKGgGR0BzQvSWqtHQaAdLmmgIR0C2AS+biIcjdX2UKGgGR0ByPiPEKmbcaAdLpGgIR0C2AI0WZZ0TdX2UKGgGR0Bx9GPIXCTEaAdLrGgIR0C2ASjZlFtsdX2UKGgGR0BxBSABkqc3aAdLhGgIR0C2ASd9x6v8dX2UKGgGR0Bv84X40uUVaAdLjWgIR0C2AMvR/mT1dX2UKGgGR0BMemgzxgAqaAdLhmgIR0C2ANSu2Zy/dX2UKGgGR0Bz3vu4PPLQaAdLvGgIR0C2AMsbWEsbdX2UKGgGR0Byoakj5bhWaAdLo2gIR0C2AMgPqcEvdX2UKGgGR0BwJuJWNm16aAdLimgIR0C2AP4gV45cdX2UKGgGR0BxtIZLqUu+aAdLmGgIR0C2ANX9R77bdX2UKGgGR0Bz7u+7Dl5oaAdLuWgIR0C2ALQX/HYIdX2UKGgGR0ByWC8g6ltTaAdLo2gIR0C2AU0DZDiPdX2UKGgGR0Bx8Cqn3ta7aAdLsWgIR0C2ARdj5KvndX2UKGgGR0BzLWxRl6JJaAdLk2gIR0C2AQc1Gb1AdX2UKGgGR0BynjzmOlwcaAdLjGgIR0C2AVmvfTCtdX2UKGgGR0BxuuMUAT7EaAdLlmgIR0C2AVxEv0yydX2UKGgGR0BzWY4JeE7GaAdLpmgIR0C2APfIn0CjdX2UKGgGR0BxjKrdWQwLaAdLfWgIR0C2AMVC1JDmdX2UKGgGR0Bx8t6rvLHNaAdLh2gIR0C2APW8dxQ0dX2UKGgGR0ByGDiIcinpaAdLqGgIR0C2AWg1rIo3dX2UKGgGR0ByTGg6EJ0GaAdLoWgIR0C2AMFWKdhBdX2UKGgGR0BydAF+uvECaAdLtGgIR0C2ANUTHsC1dX2UKGgGR0Bz/wzuWrwOaAdLlmgIR0C2ANzZtelbdX2UKGgGR0BxePNgSeyzaAdLjGgIR0C2ASqagElmdX2UKGgGR0By/jRKHwgDaAdLs2gIR0C2ANhP420idX2UKGgGR0Bz3Nf9gnc+aAdLpWgIR0C2APnqu8sddX2UKGgGR0BwloyzollcaAdLk2gIR0C2APR7eEZjdX2UKGgGR0BxlJKbrkbQaAdLtWgIR0C2ATUE5hjOdX2UKGgGR0BwnT56+nIiaAdLkGgIR0C2AXGNvOyFdX2UKGgGR0Bwppc3VCokaAdLkGgIR0C2AY9bX6IndX2UKGgGR0BzgHmHP/rCaAdLmmgIR0C2ASLtqpLmdX2UKGgGR0Bxr4prk8zRaAdLemgIR0C2AR4uscQzdX2UKGgGR0Bwpy9FnZkDaAdLhWgIR0C2AYKfOD8MdX2UKGgGR0B0TG4vvjOtaAdLzmgIR0C2ARCF49owdX2UKGgGR0BzFF+EytV8aAdLhmgIR0C2AVl3Y+SsdX2UKGgGR0By4r+JgsshaAdLp2gIR0C2AP4YJmdzdX2UKGgGR0BzDsC5mRNiaAdLg2gIR0C2AQ0mD15CdX2UKGgGR0BzzfpiZv1laAdL02gIR0C2Aa+R5kbxdX2UKGgGR0BxtgxIre67aAdLlGgIR0C2ATrhrFfidX2UKGgGR0Bx/6yB06o3aAdLimgIR0C2AXWrKeTWdX2UKGgGR0Bxv4Sh8IAwaAdLomgIR0C2AUMH8jzJdX2UKGgGR0BwgW2Yv38GaAdLnWgIR0C2ATZhnanKdX2UKGgGR0BxrFvybx3FaAdLrGgIR0C2AZ0SqU/wdX2UKGgGR0BwEWwqy4WlaAdLl2gIR0C2AbQ9FF2FdX2UKGgGR0BzDg2vStvGaAdLp2gIR0C2ATnjU/fPdX2UKGgGR0ByPAQtjCpFaAdLoGgIR0C2AXUqc3ERdX2UKGgGR0BxGPayrxRVaAdLj2gIR0C2Act4NZvDdX2UKGgGR0BvX6rilzltaAdLmGgIR0C2AS5PAO8TdX2UKGgGR0BzBQEfT1CgaAdLlGgIR0C2ASfwiJO4dX2UKGgGR0Bwe7AIppevaAdLomgIR0C2AWduHerNdX2UKGgGR0BxC+ZnctXgaAdLlmgIR0C2AT0Q04zadX2UKGgGR0BzfdpXZGrkaAdLrmgIR0C2AdPv8ZUDdX2UKGgGR0Bxu7MA3kxRaAdLiGgIR0C2AZQq7ROUdX2UKGgGR0ByJ2S9ugpSaAdLsWgIR0C2AXDcRDkVdX2UKGgGR0BzOOxt52QoaAdLjWgIR0C2AVcAaNuMdX2UKGgGR0BwNFW/8EV4aAdLjWgIR0C2AdP2kBS2dX2UKGgGR0ByR5IsiB5HaAdLrWgIR0C2AaNITXardX2UKGgGR0BxzFY9xIataAdLimgIR0C2AYNKNAC5dX2UKGgGR0BzjGtcOby6aAdLz2gIR0C2Aeh0yP+5dX2UKGgGR0BzctW+49X+aAdLo2gIR0C2AWrbQC0XdX2UKGgGR0BzJwU+LWI5aAdLt2gIR0C2AVearmyPdX2UKGgGR0BzthmmLtNSaAdLvWgIR0C2AWCoCMgmdX2UKGgGR0BzhGax5cC6aAdLrWgIR0C2Agc1TBIndX2UKGgGR0BzJk9V3ljmaAdLqGgIR0C2AfZML4N7dX2UKGgGR0Bx4T8BMi8naAdLl2gIR0C2AXe1F6RhdX2UKGgGR0BwWf0cwQDnaAdLl2gIR0C2AXRgVoHtdX2UKGgGR0By+NzEJjUeaAdLj2gIR0C2AZxGlQ/HdX2UKGgGR0Bw2BzPrv9caAdLhmgIR0C2Ag+LiuMddX2UKGgGR0ByR0ySFGoaaAdLoWgIR0C2Acd38n/ldX2UKGgGR0BxiTel9BrvaAdLomgIR0C2AWzAvcrRdX2UKGgGR0BwHaWt2cJ/aAdLjGgIR0C2AfxwVCXydX2UKGgGR0BzFVr9ETg3aAdLmWgIR0C2Ad4J/oaDdX2UKGgGR0B0AghMajveaAdLwWgIR0C2AaJ2Qnx8dX2UKGgGR0BxT6wNb1RMaAdLlGgIR0C2AZsERraedX2UKGgGR0ByEJ5MURFraAdLmWgIR0C2AasU21lYdX2UKGgGR0BwQAflp48maAdLkGgIR0C2AZvXwsoVdX2UKGgGR0By/PN/vv0AaAdLtGgIR0C2AimsJY1YdX2UKGgGR0BxUj9qDbrUaAdLk2gIR0C2AdkFbFCLdX2UKGgGR0Bxlr6qKgqWaAdLjGgIR0C2
AYZyZKFqdX2UKGgGR0Bxk3i83++/aAdLnmgIR0C2AjZjc2zfdX2UKGgGR0BwtpoTPBznaAdLjmgIR0C2AZ3IZIhAdX2UKGgGR0By7iPNmlImaAdLoGgIR0C2AZsSoOx0dX2UKGgGR0Bxf15v99+gaAdLmWgIR0C2Ac97a7EpdX2UKGgGR0Bw4peAuqWDaAdLk2gIR0C2AjgwsXizdX2UKGgGR0BwB9kpZwGXaAdLj2gIR0C2AdGYF7ladX2UKGgGR0Bwk1J9RaX8aAdLh2gIR0C2Af6+i8FqdX2UKGgGR0BwXwdYGMXKaAdLk2gIR0C2Afekxh2GdX2UKGgGR0By5NikO7QLaAdLj2gIR0C2Abe7cwg1dX2UKGgGR0BwQr4k/r0KaAdLkWgIR0C2AeXfdhy9dX2UKGgGR0Byp9W6shgWaAdLjWgIR0C2AcrLpzLfdX2UKGgGR0BIo7BfrrxBaAdLWWgIR0C2Aht8JD3NdX2UKGgGR0BxIS8xsVL0aAdLmmgIR0C2AjzWCmMwdWUu"
49
+ },
50
+ "ep_success_buffer": {
51
+ ":type:": "<class 'collections.deque'>",
52
+ ":serialized:": "gAWVIAAAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKULg=="
53
+ },
54
+ "_n_updates": 1530,
55
+ "observation_space": {
56
+ ":type:": "<class 'gymnasium.spaces.box.Box'>",
57
+ ":serialized:": "gAWVdgIAAAAAAACMFGd5bW5hc2l1bS5zcGFjZXMuYm94lIwDQm94lJOUKYGUfZQojAVkdHlwZZSMBW51bXB5lIwFZHR5cGWUk5SMAmY0lImIh5RSlChLA4wBPJROTk5K/////0r/////SwB0lGKMDWJvdW5kZWRfYmVsb3eUjBJudW1weS5jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWCAAAAAAAAAABAQEBAQEBAZRoCIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksIhZSMAUOUdJRSlIwNYm91bmRlZF9hYm92ZZRoESiWCAAAAAAAAAABAQEBAQEBAZRoFUsIhZRoGXSUUpSMBl9zaGFwZZRLCIWUjANsb3eUaBEoliAAAAAAAAAAAAC0wgAAtMIAAKDAAACgwNsPScAAAKDAAAAAgAAAAICUaAtLCIWUaBl0lFKUjARoaWdolGgRKJYgAAAAAAAAAAAAtEIAALRCAACgQAAAoEDbD0lAAACgQAAAgD8AAIA/lGgLSwiFlGgZdJRSlIwIbG93X3JlcHKUjFtbLTkwLiAgICAgICAgLTkwLiAgICAgICAgIC01LiAgICAgICAgIC01LiAgICAgICAgIC0zLjE0MTU5MjcgIC01LgogIC0wLiAgICAgICAgIC0wLiAgICAgICBdlIwJaGlnaF9yZXBylIxTWzkwLiAgICAgICAgOTAuICAgICAgICAgNS4gICAgICAgICA1LiAgICAgICAgIDMuMTQxNTkyNyAgNS4KICAxLiAgICAgICAgIDEuICAgICAgIF2UjApfbnBfcmFuZG9tlE51Yi4=",
58
+ "dtype": "float32",
59
+ "bounded_below": "[ True True True True True True True True]",
60
+ "bounded_above": "[ True True True True True True True True]",
61
+ "_shape": [
62
+ 8
63
+ ],
64
+ "low": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
65
+ "high": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
66
+ "low_repr": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
67
+ "high_repr": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
68
+ "_np_random": null
69
+ },
70
+ "action_space": {
71
+ ":type:": "<class 'gymnasium.spaces.discrete.Discrete'>",
72
+ ":serialized:": "gAWV/QAAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwBbpSMFW51bXB5LmNvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlIwFbnVtcHmUjAVkdHlwZZSTlIwCaTiUiYiHlFKUKEsDjAE8lE5OTkr/////Sv////9LAHSUYkMIBAAAAAAAAACUhpRSlIwFc3RhcnSUaAhoDkMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMBWR0eXBllGgLjAJpOJSJiIeUUpQoSwNoD05OTkr/////Sv////9LAHSUYowKX25wX3JhbmRvbZROdWIu",
73
+ "n": "4",
74
+ "start": "0",
75
+ "_shape": [],
76
+ "dtype": "int64",
77
+ "_np_random": null
78
+ },
79
+ "n_envs": 32,
80
+ "n_steps": 2048,
81
+ "gamma": 0.99,
82
+ "gae_lambda": 0.95,
83
+ "ent_coef": 0.0,
84
+ "vf_coef": 0.5,
85
+ "max_grad_norm": 0.5,
86
+ "rollout_buffer_class": {
87
+ ":type:": "<class 'abc.ABCMeta'>",
88
+ ":serialized:": "gAWVNgAAAAAAAACMIHN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5idWZmZXJzlIwNUm9sbG91dEJ1ZmZlcpSTlC4=",
89
+ "__module__": "stable_baselines3.common.buffers",
90
+ "__annotations__": "{'observations': <class 'numpy.ndarray'>, 'actions': <class 'numpy.ndarray'>, 'rewards': <class 'numpy.ndarray'>, 'advantages': <class 'numpy.ndarray'>, 'returns': <class 'numpy.ndarray'>, 'episode_starts': <class 'numpy.ndarray'>, 'log_probs': <class 'numpy.ndarray'>, 'values': <class 'numpy.ndarray'>}",
91
+ "__doc__": "\n Rollout buffer used in on-policy algorithms like A2C/PPO.\n It corresponds to ``buffer_size`` transitions collected\n using the current policy.\n This experience will be discarded after the policy update.\n In order to use PPO objective, we also store the current value of each state\n and the log probability of each taken action.\n\n The term rollout here refers to the model-free notion and should not\n be used with the concept of rollout used in model-based RL or planning.\n Hence, it is only involved in policy and value function training but not action selection.\n\n :param buffer_size: Max number of element in the buffer\n :param observation_space: Observation space\n :param action_space: Action space\n :param device: PyTorch device\n :param gae_lambda: Factor for trade-off of bias vs variance for Generalized Advantage Estimator\n Equivalent to classic advantage when set to 1.\n :param gamma: Discount factor\n :param n_envs: Number of parallel environments\n ",
92
+ "__init__": "<function RolloutBuffer.__init__ at 0x7f0c58ba3520>",
93
+ "reset": "<function RolloutBuffer.reset at 0x7f0c58ba35b0>",
94
+ "compute_returns_and_advantage": "<function RolloutBuffer.compute_returns_and_advantage at 0x7f0c58ba3640>",
95
+ "add": "<function RolloutBuffer.add at 0x7f0c58ba36d0>",
96
+ "get": "<function RolloutBuffer.get at 0x7f0c58ba3760>",
97
+ "_get_samples": "<function RolloutBuffer._get_samples at 0x7f0c58ba37f0>",
98
+ "__abstractmethods__": "frozenset()",
99
+ "_abc_impl": "<_abc._abc_data object at 0x7f0c58d68540>"
100
+ },
101
+ "rollout_buffer_kwargs": {},
102
+ "batch_size": 64,
103
+ "n_epochs": 10,
104
+ "clip_range": {
105
+ ":type:": "<class 'function'>",
106
+ ":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHP8mZmZmZmZqFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
107
+ },
108
+ "clip_range_vf": null,
109
+ "normalize_advantage": true,
110
+ "target_kl": null,
111
+ "lr_schedule": {
112
+ ":type:": "<class 'function'>",
113
+ ":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHPzOpKjBVMmGFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
114
+ },
115
+ "system_info": {
116
+ "OS": "Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35 # 1 SMP Thu Oct 5 21:02:42 UTC 2023",
117
+ "Python": "3.10.13",
118
+ "Stable-Baselines3": "2.2.1",
119
+ "PyTorch": "2.2.0+cu121",
120
+ "GPU Enabled": "True",
121
+ "Numpy": "1.26.4",
122
+ "Cloudpickle": "3.0.0",
123
+ "Gymnasium": "0.28.1"
124
+ }
125
+ }
config-ppo-LunarLander-v2_123_456_789_hf_defaults.json ADDED
@@ -0,0 +1,125 @@
1
+ {
2
+ "policy_class": {
3
+ ":type:": "<class 'abc.ABCMeta'>",
4
+ ":serialized:": "gAWVOwAAAAAAAACMIXN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5wb2xpY2llc5SMEUFjdG9yQ3JpdGljUG9saWN5lJOULg==",
5
+ "__module__": "stable_baselines3.common.policies",
6
+ "__doc__": "\n Policy class for actor-critic algorithms (has both policy and value prediction).\n Used by A2C, PPO and the likes.\n\n :param observation_space: Observation space\n :param action_space: Action space\n :param lr_schedule: Learning rate schedule (could be constant)\n :param net_arch: The specification of the policy and value networks.\n :param activation_fn: Activation function\n :param ortho_init: Whether to use or not orthogonal initialization\n :param use_sde: Whether to use State Dependent Exploration or not\n :param log_std_init: Initial value for the log standard deviation\n :param full_std: Whether to use (n_features x n_actions) parameters\n for the std instead of only (n_features,) when using gSDE\n :param use_expln: Use ``expln()`` function instead of ``exp()`` to ensure\n a positive standard deviation (cf paper). It allows to keep variance\n above zero and prevent it from growing too fast. In practice, ``exp()`` is usually enough.\n :param squash_output: Whether to squash the output using a tanh function,\n this allows to ensure boundaries when using gSDE.\n :param features_extractor_class: Features extractor to use.\n :param features_extractor_kwargs: Keyword arguments\n to pass to the features extractor.\n :param share_features_extractor: If True, the features extractor is shared between the policy and value networks.\n :param normalize_images: Whether to normalize images or not,\n dividing by 255.0 (True by default)\n :param optimizer_class: The optimizer to use,\n ``th.optim.Adam`` by default\n :param optimizer_kwargs: Additional keyword arguments,\n excluding the learning rate, to pass to the optimizer\n ",
7
+ "__init__": "<function ActorCriticPolicy.__init__ at 0x7f3bc45911b0>",
8
+ "_get_constructor_parameters": "<function ActorCriticPolicy._get_constructor_parameters at 0x7f3bc4591240>",
9
+ "reset_noise": "<function ActorCriticPolicy.reset_noise at 0x7f3bc45912d0>",
10
+ "_build_mlp_extractor": "<function ActorCriticPolicy._build_mlp_extractor at 0x7f3bc4591360>",
11
+ "_build": "<function ActorCriticPolicy._build at 0x7f3bc45913f0>",
12
+ "forward": "<function ActorCriticPolicy.forward at 0x7f3bc4591480>",
13
+ "extract_features": "<function ActorCriticPolicy.extract_features at 0x7f3bc4591510>",
14
+ "_get_action_dist_from_latent": "<function ActorCriticPolicy._get_action_dist_from_latent at 0x7f3bc45915a0>",
15
+ "_predict": "<function ActorCriticPolicy._predict at 0x7f3bc4591630>",
16
+ "evaluate_actions": "<function ActorCriticPolicy.evaluate_actions at 0x7f3bc45916c0>",
17
+ "get_distribution": "<function ActorCriticPolicy.get_distribution at 0x7f3bc4591750>",
18
+ "predict_values": "<function ActorCriticPolicy.predict_values at 0x7f3bc45917e0>",
19
+ "__abstractmethods__": "frozenset()",
20
+ "_abc_impl": "<_abc._abc_data object at 0x7f3bc4585d40>"
21
+ },
22
+ "verbose": 0,
23
+ "policy_kwargs": {},
24
+ "num_timesteps": 123469824,
25
+ "_total_timesteps": 123456789,
26
+ "_num_timesteps_at_start": 0,
27
+ "seed": null,
28
+ "action_noise": null,
29
+ "start_time": 1708381100659934127,
30
+ "learning_rate": 0.0003,
31
+ "tensorboard_log": null,
32
+ "_last_obs": {
33
+ ":type:": "<class 'numpy.ndarray'>",
34
+ ":serialized:": "gAWVdQQAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYABAAAAAAAAC0vXz6vPes+m5mbvi/bMr8kxKU+Lz2svgAAAAAAAAAAZjb+uqQhSruhmsy7IFTWPGNIITxROLC8AACAPwAAgD8AVOk8iFu3P5v1vz51w2k9I3g3PAqIHT4AAAAAAAAAAIAyKD6HQqc/M6/5PrylDb9l7wc/TvGyPgAAAAAAAAAAmpd0PSrT/D4GbTu9gkJev5cO6j0ckaO9AAAAAAAAAAAzy4C9QLdIP1Q9t70fS3W/Sb1SvkFLD7wAAAAAAAAAAJrqrDzbOKG8SpZ4vvDzLb4tGZM89VFpPwAAgD8AAIA/5u3BPTxFkj7Q1TK+9LIsv7te7z1axUW+AAAAAAAAAAAztQg9w506um+yLLz2TpQ1hmKHO0qODLUAAIA/AACAP5qL+j3eLdI9YBoJvzZR9b6UGa29vxXKvgAAAAAAAAAAmhkzPEw8tT8TWww/BqPIPeAjNbx5ScC9AAAAAAAAAADT8iC+fxp/P4DH975gcjS/gTZ1vmqHpr4AAAAAAAAAACqrjL63k3w/W30hvj9VU7/NNRy/mHD0PQAAAAAAAAAAJjHXPWIAgD/2l5Q+Hus+v49iVD4erYQ+AAAAAAAAAAAzU1S6w5lKusgo/DXTxXAxewrluQqAIrUAAIA/AACAPwAM4ruuS4a6ghNRPRolIjlw1Ta7EwAbOAAAgD8AAIA/pkA5vuQchD+dInq+h6BWv3zg3b7Q9GO9AAAAAAAAAADNeeo8e4W3P/1bpz5hBgk9Pbd+PNCoHj4AAAAAAAAAAM3f0bz2zHi6HE0guBp5zbIdXcY5tDQ3NwAAgD8AAIA/Wu5evqvYKz/iUYc9JfdYv1gHxr5rvjo+AAAAAAAAAADN4OW7w5Unuv6lHLj91IuznbWiO0iENzcAAIA/AACAPzMTETtIG426X6CUvfVHPbSsAzm6mlmrMwAAgD8AAIA/ZjasOp/Y9bsU1J09z+SzPDaOsjwu9868AACAPwAAgD8zAa68ZNGuP935bb5HSrC+rK5xvGYdJr4AAAAAAAAAAKAhMr4n150++K6bPn/pM7+xCHa+lR5qPgAAAAAAAAAATY4IPj5kcT/mXA8+/yJgv85C2D5cWKu8AAAAAAAAAABmSAi8FJyhus2vJ7ipx2uzbehmOp+EPzcAAIA/AACAP43Lij0840Q99uXYvpOewr5+KUS+LY/DvgAAAAAAAAAAM4+Wu+Ggrbqezgg1s5faL6LxYDrlCG20AACAPwAAgD+aiYC6SJeAuvXw4rNB/aMvxE2EugWxpzMAAIA/AACAP5rFLT1FshE+43hKvqYADb/JAiA7uNi2vQAAAAAAAAAAmhkEunvClbrS1Hq6BAdstb1QlrjtJZE5AACAPwAAgD+UjAVudW1weZSMBWR0eXBllJOUjAJmNJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRiSyBLCIaUjAFDlHSUUpQu"
35
+ },
36
+ "_last_episode_starts": {
37
+ ":type:": "<class 'numpy.ndarray'>",
38
+ ":serialized:": "gAWVkwAAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlIwFbnVtcHmUjAVkdHlwZZSTlIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksghZSMAUOUdJRSlC4="
39
+ },
40
+ "_last_original_obs": null,
41
+ "_episode_num": 0,
42
+ "use_sde": false,
43
+ "sde_sample_freq": -1,
44
+ "_current_progress_remaining": -0.00010558350096090408,
45
+ "_stats_window_size": 100,
46
+ "ep_info_buffer": {
47
+ ":type:": "<class 'collections.deque'>",
48
+ ":serialized:": "gAWV4QsAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKUKH2UKIwBcpRHQHAKqVyFPBWMAWyUS5eMAXSUR0DqXdl/qgRLdX2UKGgGR0BzCrcUM5OraAdLt2gIR0DqXdmdfb9IdX2UKGgGR0Byaka86FM7aAdLsGgIR0DqXdgx7AtWdX2UKGgGR0Bv57cM3IdVaAdLnWgIR0DqXdp+b3GodX2UKGgGR0ByuqsfaHsUaAdLrGgIR0DqXdscFyJbdX2UKGgGR0BynZ4IKMNuaAdLr2gIR0DqXdpKwIMSdX2UKGgGR0BzHHQrtmcwaAdLtWgIR0DqXdulQdjodX2UKGgGR0BvLBPl+3H8aAdLjmgIR0DqXd0qwQlKdX2UKGgGR0BzAw8yN4qxaAdLmWgIR0DqXeL84PwvdX2UKGgGR0Bz+L8R+SbIaAdLtmgIR0DqXd9VDrqudX2UKGgGR0Bw1MX1rZanaAdLkGgIR0DqXd6gQHzIdX2UKGgGR0By3LmNipeeaAdLtWgIR0DqXd3/jKgadX2UKGgGR0BzZDtdAxBWaAdLsmgIR0DqXeDat9x7dX2UKGgGR0Bv3on+hoM8aAdLn2gIR0DqXd83eenRdX2UKGgGR0BwLFuHerMlaAdLlWgIR0DqXdzvRZ2ZdX2UKGgGR0Bw99JTVDrraAdLjmgIR0DqXd5ZeRgadX2UKGgGR0BxSv30wrUcaAdLsWgIR0DqXeBiG34LdX2UKGgGR0BzRD0NBnjAaAdLrWgIR0DqXeBTRYzSdX2UKGgGR0Bwrzklu3tsaAdLjmgIR0DqXd6g1WKedX2UKGgGR0BzNekN4JNTaAdLuGgIR0DqXeIpkPMCdX2UKGgGR0Byelszl90BaAdLumgIR0DqXeA0xdpqdX2UKGgGR0ByyjGT9sJqaAdLvWgIR0DqXeOVbiZOdX2UKGgGR0BxodqDbrTqaAdLsGgIR0DqXeM1F6RhdX2UKGgGR0By+gy/KyOaaAdLlGgIR0DqXeUqp97XdX2UKGgGR0B0CipcX3xnaAdLwmgIR0DqXeME384xdX2UKGgGR0Bxl6f29L6DaAdLkGgIR0DqXeTXzUZvdX2UKGgGR0BytMSf16E8aAdLuWgIR0DqXeOUmD15dX2UKGgGR0B0H0lhPTG6aAdLv2gIR0DqXeVO1v2odX2UKGgGR0Bx389zOopAaAdLr2gIR0DqXepf779AdX2UKGgGR0BxWy0ngHeKaAdLl2gIR0DqXeS83++/dX2UKGgGR0By5TA31jAjaAdLlWgIR0DqXebZeRgadX2UKGgGR0B0aaNhmXgMaAdLvWgIR0DqXep+5OJtdX2UKGgGR0BwTYZhrnDBaAdLrWgIR0DqXeffXwsodX2UKGgGR0Bw6iEtdzGQaAdLr2gIR0DqXegmMOwxdX2UKGgGR0BySRARkEs8aAdLlmgIR0DqXe9VoYeldX2UKGgGR0Bx6+Jzkp7UaAdLkWgIR0DqXeqMJhOQdX2UKGgGR0BxPkxVQyh0aAdLjmgIR0DqXeyZ2pyZdX2UKGgGR0BvA0dilSCOaAdLoWgIR0DqXeynG828dX2UKGgGR0BzIPi4rjHXaAdLuWgIR0DqXep5uZTidX2UKGgGR0Bzg/eDWbw0aAdLl2gIR0DqXel6cAindX2UKGgGR0ByXK5lOGj9aAdLt2gIR0DqXemCjk+5dX2UKGgGR0Bxg9FXq7iAaAdLhmgIR0DqXenjsD4hdX2UKGgGR0By3+RYA80UaAdLkWgIR0DqXex/vv0AdX2UKGgGR0BzEVU0elsQaAdLu2gIR0DqXey6r/83dX2UKGgGR0Bz67qQiiZfaAdLw2gIR0DqXevUXpGGdX2UKGgGR0BzRit9x6v8aAdLrGgIR0DqXe1xtHhCdX2UKGgGR0B0c4l9jPOZaAdLvGgIR0DqXe2C/47BdX2UKGgGR0ByRVrDZUT+aAdLqGgIR0DqXe5KYiPidX2UKGgGR0Bz3ZL7GecyaAdLwWgIR0DqXe5F72L6dX2UKGgGR0BmCKTMaCL/aAdN6ANoCEdA6l3u+9Jz1nV9lChoBkdAc+B0IToMa2gHS6JoCEdA6l3w99lVcXV9lChoBkdAcyEUNayKN2gHS7loCEdA6l3vfhVENXV9lChoBkdAcTLaisXBQGgHS6poCEdA6l3xRrzoU3V9lChoBkdAc54Trmhdt2gHS8FoCEdA6l3yHjIaLnV9lChoBkdAccOuAI6bOWgHS6ZoCEdA6l3y59mYjXV9lChoBkdAcZ5ER8MNMGgHS6VoCEdA6l3wrDZUUHV9lChoBkdAcVgKqGUOeGgHS6VoCEdA6l3yd2ovSXV9lChoBkdAcjdNATqSo2gHS45oCEdA6l32EAo5P3V9lChoBkdAcW40Ltu1nmgHS6RoCEdA6l3xE0BOpXV9lChoBkdAcZYySFGoaWgHS5ZoCEdA6l32w/HHWHV9lChoBkdAc0Gkc0cfeWgHS6loCEdA6l3zLFn7HnV9lChoBkdAcq8Tjebd8GgHS6loCEdA6l30yOq//XV9lChoBkdAcjwP+4smOWgHS7JoCEdA6l3zhAWznnV9lChoBkdAcl8RYzSCv2gHS7BoCEdA6l32fD+BH3V9lChoBkdAclmScLBsRGgHS69oCEdA6l32r4nF53V9lChoBkdAcscjHGS6lWgHS6loCEdA6l36kZR8+nV9lChoBkdAcu6toSL612gHS7doCEdA6l350vf0mXV9lChoBkdAclpflp48l2gHS7BoCEdA6l37RJ/XoXV9lChoBkdAckhIdU83dmgHS6loCEdA6l36io86m3V9lChoBkdAchYtfXwsoWgHS6toCEdA6l34GFJxvXV9lChoBkdAck0AhB7eEmgHS51oCEdA6l36j15B1XV9lChoBkdAcz8EcsDnvGgHS6RoCEdA6l35gdfb9XV9lChoBkdAcusVkc0cfmgHS6xoCEdA6l37CzC1qnV9lChoBkdAcYPub7TDwmgHS4loCEdA6l35xwyZa3V9lChoBkdAcqpE12q1gGgHS7loCEdA6l357kXDWXV9lChoBkdAcXb4T9KmK2gHS8toCEdA6l4AUkfLcXV9lChoBkdAclOaGYa5w2gHS7xoCEdA6l35KCHymXV9lChoBkdAcIyJ4jbBXWgHS6ZoCEdA6l37X5nDi3V9lChoBkdAcyTCUX531WgHS4hoCEdA6l38cAzYVnV9lChoBkdAb+K065oXbmgHS9hoCEdA6l37fA0sOHV9lChoBkdAcHf2KEWZZ2gHS5loCEdA6l3+OGCZnnV9lChoBkdAcZeMcZLqU2gHS7VoCEdA6l3+L5qM33V9lChoBkdAcvNvXsgMdGgHS5loCEdA6l3/cgyM1nV9lChoBkdAdM7Zof0VamgHS61oCEdA6l3/VyWAw3V9lChoBkdAczfHhS9/SmgHS7loCEdA6l3/KFh5PnV9lChoBkdAcw8VM23rlmgHS7VoCEdA6l4BbcoH9nV9lChoBkdAcpQm8ujASGgHS7NoCEdA
6l4CDW9UTHV9lChoBkdAcc7FTefqYGgHS69oCEdA6l4E5dv863V9lChoBkdAcADyZ8a4t2gHS5doCEdA6l4AINmUW3V9lChoBkdAcnbfOUt7KWgHS5JoCEdA6l3/4NiH7HV9lChoBkdAcxP2hIvrW2gHS7doCEdA6l4AmxMWXXV9lChoBkdAdBUNMGorF2gHS/NoCEdA6l4CsfzSTnV9lChoBkdAcbLf/FR51WgHS4loCEdA6l4GGa6ST3V9lChoBkdAc7ZSeiBXjmgHS8BoCEdA6l4HBWo3rHV9lChoBkdAceRij+Jgs2gHS6poCEdA6l4EyX+l03V9lChoBkdAdBy9RrJr+GgHS8FoCEdA6l4FFaB7NXV9lChoBkdAc7Tb/wRXfmgHS7poCEdA6l4GRlHz6XV9lChoBkdAcZHMqjJuEWgHS5poCEdA6l4GqXv6THV9lChoBkdAcRxvIfbKzWgHS5BoCEdA6l4HCV8kU3V9lChoBkdActiMVUModGgHS5doCEdA6l4HHObAlHVlLg=="
49
+ },
50
+ "ep_success_buffer": {
51
+ ":type:": "<class 'collections.deque'>",
52
+ ":serialized:": "gAWVIAAAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKULg=="
53
+ },
54
+ "_n_updates": 30144,
55
+ "observation_space": {
56
+ ":type:": "<class 'gymnasium.spaces.box.Box'>",
57
+ ":serialized:": "gAWVdgIAAAAAAACMFGd5bW5hc2l1bS5zcGFjZXMuYm94lIwDQm94lJOUKYGUfZQojAVkdHlwZZSMBW51bXB5lIwFZHR5cGWUk5SMAmY0lImIh5RSlChLA4wBPJROTk5K/////0r/////SwB0lGKMDWJvdW5kZWRfYmVsb3eUjBJudW1weS5jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWCAAAAAAAAAABAQEBAQEBAZRoCIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksIhZSMAUOUdJRSlIwNYm91bmRlZF9hYm92ZZRoESiWCAAAAAAAAAABAQEBAQEBAZRoFUsIhZRoGXSUUpSMBl9zaGFwZZRLCIWUjANsb3eUaBEoliAAAAAAAAAAAAC0wgAAtMIAAKDAAACgwNsPScAAAKDAAAAAgAAAAICUaAtLCIWUaBl0lFKUjARoaWdolGgRKJYgAAAAAAAAAAAAtEIAALRCAACgQAAAoEDbD0lAAACgQAAAgD8AAIA/lGgLSwiFlGgZdJRSlIwIbG93X3JlcHKUjFtbLTkwLiAgICAgICAgLTkwLiAgICAgICAgIC01LiAgICAgICAgIC01LiAgICAgICAgIC0zLjE0MTU5MjcgIC01LgogIC0wLiAgICAgICAgIC0wLiAgICAgICBdlIwJaGlnaF9yZXBylIxTWzkwLiAgICAgICAgOTAuICAgICAgICAgNS4gICAgICAgICA1LiAgICAgICAgIDMuMTQxNTkyNyAgNS4KICAxLiAgICAgICAgIDEuICAgICAgIF2UjApfbnBfcmFuZG9tlE51Yi4=",
58
+ "dtype": "float32",
59
+ "bounded_below": "[ True True True True True True True True]",
60
+ "bounded_above": "[ True True True True True True True True]",
61
+ "_shape": [
62
+ 8
63
+ ],
64
+ "low": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
65
+ "high": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
66
+ "low_repr": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
67
+ "high_repr": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
68
+ "_np_random": null
69
+ },
70
+ "action_space": {
71
+ ":type:": "<class 'gymnasium.spaces.discrete.Discrete'>",
72
+ ":serialized:": "gAWV/QAAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwBbpSMFW51bXB5LmNvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlIwFbnVtcHmUjAVkdHlwZZSTlIwCaTiUiYiHlFKUKEsDjAE8lE5OTkr/////Sv////9LAHSUYkMIBAAAAAAAAACUhpRSlIwFc3RhcnSUaAhoDkMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMBWR0eXBllGgLjAJpOJSJiIeUUpQoSwNoD05OTkr/////Sv////9LAHSUYowKX25wX3JhbmRvbZROdWIu",
73
+ "n": "4",
74
+ "start": "0",
75
+ "_shape": [],
76
+ "dtype": "int64",
77
+ "_np_random": null
78
+ },
79
+ "n_envs": 32,
80
+ "n_steps": 1024,
81
+ "gamma": 0.999,
82
+ "gae_lambda": 0.98,
83
+ "ent_coef": 0.01,
84
+ "vf_coef": 0.5,
85
+ "max_grad_norm": 0.5,
86
+ "rollout_buffer_class": {
87
+ ":type:": "<class 'abc.ABCMeta'>",
88
+ ":serialized:": "gAWVNgAAAAAAAACMIHN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5idWZmZXJzlIwNUm9sbG91dEJ1ZmZlcpSTlC4=",
89
+ "__module__": "stable_baselines3.common.buffers",
90
+ "__annotations__": "{'observations': <class 'numpy.ndarray'>, 'actions': <class 'numpy.ndarray'>, 'rewards': <class 'numpy.ndarray'>, 'advantages': <class 'numpy.ndarray'>, 'returns': <class 'numpy.ndarray'>, 'episode_starts': <class 'numpy.ndarray'>, 'log_probs': <class 'numpy.ndarray'>, 'values': <class 'numpy.ndarray'>}",
91
+ "__doc__": "\n Rollout buffer used in on-policy algorithms like A2C/PPO.\n It corresponds to ``buffer_size`` transitions collected\n using the current policy.\n This experience will be discarded after the policy update.\n In order to use PPO objective, we also store the current value of each state\n and the log probability of each taken action.\n\n The term rollout here refers to the model-free notion and should not\n be used with the concept of rollout used in model-based RL or planning.\n Hence, it is only involved in policy and value function training but not action selection.\n\n :param buffer_size: Max number of element in the buffer\n :param observation_space: Observation space\n :param action_space: Action space\n :param device: PyTorch device\n :param gae_lambda: Factor for trade-off of bias vs variance for Generalized Advantage Estimator\n Equivalent to classic advantage when set to 1.\n :param gamma: Discount factor\n :param n_envs: Number of parallel environments\n ",
92
+ "__init__": "<function RolloutBuffer.__init__ at 0x7f3bc4723520>",
93
+ "reset": "<function RolloutBuffer.reset at 0x7f3bc47235b0>",
94
+ "compute_returns_and_advantage": "<function RolloutBuffer.compute_returns_and_advantage at 0x7f3bc4723640>",
95
+ "add": "<function RolloutBuffer.add at 0x7f3bc47236d0>",
96
+ "get": "<function RolloutBuffer.get at 0x7f3bc4723760>",
97
+ "_get_samples": "<function RolloutBuffer._get_samples at 0x7f3bc47237f0>",
98
+ "__abstractmethods__": "frozenset()",
99
+ "_abc_impl": "<_abc._abc_data object at 0x7f3bc48ea480>"
100
+ },
101
+ "rollout_buffer_kwargs": {},
102
+ "batch_size": 64,
103
+ "n_epochs": 8,
104
+ "clip_range": {
105
+ ":type:": "<class 'function'>",
106
+ ":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHP8mZmZmZmZqFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
107
+ },
108
+ "clip_range_vf": null,
109
+ "normalize_advantage": true,
110
+ "target_kl": null,
111
+ "lr_schedule": {
112
+ ":type:": "<class 'function'>",
113
+ ":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHPzOpKjBVMmGFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
+ },
+ "system_info": {
+ "OS": "Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35 # 1 SMP Thu Oct 5 21:02:42 UTC 2023",
+ "Python": "3.10.13",
+ "Stable-Baselines3": "2.2.1",
+ "PyTorch": "2.2.0+cu121",
+ "GPU Enabled": "True",
+ "Numpy": "1.26.4",
+ "Cloudpickle": "3.0.0",
+ "Gymnasium": "0.28.1"
+ }
+ }
config.json ADDED
@@ -0,0 +1,125 @@
+ {
+ "policy_class": {
+ ":type:": "<class 'abc.ABCMeta'>",
+ ":serialized:": "gAWVOwAAAAAAAACMIXN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5wb2xpY2llc5SMEUFjdG9yQ3JpdGljUG9saWN5lJOULg==",
+ "__module__": "stable_baselines3.common.policies",
+ "__doc__": "\n Policy class for actor-critic algorithms (has both policy and value prediction).\n Used by A2C, PPO and the likes.\n\n :param observation_space: Observation space\n :param action_space: Action space\n :param lr_schedule: Learning rate schedule (could be constant)\n :param net_arch: The specification of the policy and value networks.\n :param activation_fn: Activation function\n :param ortho_init: Whether to use or not orthogonal initialization\n :param use_sde: Whether to use State Dependent Exploration or not\n :param log_std_init: Initial value for the log standard deviation\n :param full_std: Whether to use (n_features x n_actions) parameters\n for the std instead of only (n_features,) when using gSDE\n :param use_expln: Use ``expln()`` function instead of ``exp()`` to ensure\n a positive standard deviation (cf paper). It allows to keep variance\n above zero and prevent it from growing too fast. In practice, ``exp()`` is usually enough.\n :param squash_output: Whether to squash the output using a tanh function,\n this allows to ensure boundaries when using gSDE.\n :param features_extractor_class: Features extractor to use.\n :param features_extractor_kwargs: Keyword arguments\n to pass to the features extractor.\n :param share_features_extractor: If True, the features extractor is shared between the policy and value networks.\n :param normalize_images: Whether to normalize images or not,\n dividing by 255.0 (True by default)\n :param optimizer_class: The optimizer to use,\n ``th.optim.Adam`` by default\n :param optimizer_kwargs: Additional keyword arguments,\n excluding the learning rate, to pass to the optimizer\n ",
+ "__init__": "<function ActorCriticPolicy.__init__ at 0x7fdb80b8cf70>",
+ "_get_constructor_parameters": "<function ActorCriticPolicy._get_constructor_parameters at 0x7fdb80b8d000>",
+ "reset_noise": "<function ActorCriticPolicy.reset_noise at 0x7fdb80b8d090>",
+ "_build_mlp_extractor": "<function ActorCriticPolicy._build_mlp_extractor at 0x7fdb80b8d120>",
+ "_build": "<function ActorCriticPolicy._build at 0x7fdb80b8d1b0>",
+ "forward": "<function ActorCriticPolicy.forward at 0x7fdb80b8d240>",
+ "extract_features": "<function ActorCriticPolicy.extract_features at 0x7fdb80b8d2d0>",
+ "_get_action_dist_from_latent": "<function ActorCriticPolicy._get_action_dist_from_latent at 0x7fdb80b8d360>",
+ "_predict": "<function ActorCriticPolicy._predict at 0x7fdb80b8d3f0>",
+ "evaluate_actions": "<function ActorCriticPolicy.evaluate_actions at 0x7fdb80b8d480>",
+ "get_distribution": "<function ActorCriticPolicy.get_distribution at 0x7fdb80b8d510>",
+ "predict_values": "<function ActorCriticPolicy.predict_values at 0x7fdb80b8d5a0>",
+ "__abstractmethods__": "frozenset()",
+ "_abc_impl": "<_abc._abc_data object at 0x7fdb80b85d80>"
+ },
+ "verbose": 0,
+ "policy_kwargs": {},
+ "num_timesteps": 10027008,
+ "_total_timesteps": 10000000,
+ "_num_timesteps_at_start": 0,
+ "seed": null,
+ "action_noise": null,
+ "start_time": 1707903050276585938,
+ "learning_rate": 0.0003,
+ "tensorboard_log": null,
+ "_last_obs": {
+ ":type:": "<class 'numpy.ndarray'>",
+ ":serialized:": "gAWVdQQAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYABAAAAAAAAM1y4jxSmKu5jrqwvjaiUb76g169mq9lPwAAgD8AAAAAAE0APQOsfLwaIKm+J1kLvYRYEz1u+BY9AACAPwAAgD8zxWk8KRwHurtLTrm1DzS0G1Yhuz0+dzgAAIA/AACAPzMWo7w7rIk9rsfDvWnZxb5hQgW9A8XBvQAAAAAAAAAA5sgEvXtUgrp9Yiuz3AEhsI2qKzt1fcgzAACAPwAAgD9miG+8j54guruRm7qeQJa1w8LiOqBYuTkAAIA/AACAP81eR7yfvZ670mcrPU8JHj1xw/u8NA0DPgAAgD8AAIA/zQYhvNLilbsYXeS8PKHRPEhW5jzjS7C9AACAPwAAgD8AVdw8JPIfPHAvbr6PhJK++ZsKvuWeOb4AAIA/AAAAALPNNL28fbM/XP0svptBpb5mH/y8H1mkvQAAAAAAAAAAQGchPlSyGz9hJpI8fHlWv5lwsj4+Q7W9AAAAAAAAAAAzQ6W9kaZ5P8MmeL6k6me/tYj2vRbFGL4AAAAAAAAAAFpPub05bVg+GjI/PqeBHL/0rR++bg8ePgAAAAAAAAAAzRcBPeEls7yiP4G+wwKMPbKR4D2v2qG4AACAPwAAgD9zQ/s95A7EPkbhO77LhDC/yRN7PiNJPb4AAAAAAAAAACDsJL7GB4w/Y6Tvvmu3FL+ns4e+Dp27vgAAAAAAAAAAuhY3PtAO2j7gPtC+lWQ8v8TCRT46R5O+AAAAAAAAAABmt+88cQIZPI18l771oCK+dmkRvr43bz8AAIA/AAAAAEApIr6a6hk/0MKWPLIoMb9wvKm+T5nCPQAAAAAAAAAATasCPQVveT5GWPu9lakvv7UqAbzieOO9AAAAAAAAAAAzs6G6VdSAP67SsbtdlIm/MfG3OpbGnjoAAAAAAAAAAGYzxjxIqZk5fAOgOgAaPrVtxCa89pXDuQAAgD8AAIA/szxfPj0pVD/2Uz09QwJCv8EXAj/DXxe9AAAAAAAAAACAdF89KAejPtqeir4k2C+/1P+aPQX7LL4AAAAAAAAAAE0lw73dnk0/pLsMvuuaTb9D2Wq+5lb2vQAAAAAAAAAAM46pPJknVz8AUZ67DSt3v9d8fT3Ibx49AAAAAAAAAACaPeo7FMDXuu4Z4b0cnyg8adb4O1M+Fr0AAIA/AACAPwCNorwg7KY/Gn9fvZ1gGb89UUm9mmfEvAAAAAAAAAAAmsfRPMMhMrrtc9w67eWdNUCAMzm+6gK6AACAPwAAgD+auty8XFcSumDj9ztcjZI5ezHrO0XGjLkAAIA/AACAPyad7T3Jpw8/unaZO9/mSr9lz4w+7Io0vQAAAAAAAAAAZjXNvCedpT87LhC+9QMEv4iT6Dyw3gA9AAAAAAAAAACUjAVudW1weZSMBWR0eXBllJOUjAJmNJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRiSyBLCIaUjAFDlHSUUpQu"
+ },
+ "_last_episode_starts": {
+ ":type:": "<class 'numpy.ndarray'>",
+ ":serialized:": "gAWVkwAAAAAAAACMEm51bXB5LmNvcmUubnVtZXJpY5SMC19mcm9tYnVmZmVylJOUKJYgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlIwFbnVtcHmUjAVkdHlwZZSTlIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksghZSMAUOUdJRSlC4="
+ },
+ "_last_original_obs": null,
+ "_episode_num": 0,
+ "use_sde": false,
+ "sde_sample_freq": -1,
+ "_current_progress_remaining": -0.0027007999999999477,
+ "_stats_window_size": 100,
+ "ep_info_buffer": {
+ ":type:": "<class 'collections.deque'>",
+ ":serialized:": "gAWV5AsAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKUKH2UKIwBcpRHQHKp67dznzSMAWyUS9OMAXSUR0Cx/e1TisGQdX2UKGgGR0BzP7ohY/3WaAdLxmgIR0Cx/ZucDr7gdX2UKGgGR0BxsKgM+eOGaAdLyGgIR0Cx/YlfzBhydX2UKGgGR0B0Zjw9aEBbaAdLrWgIR0Cx/ZM1KoQ4dX2UKGgGR0BxfwA93bEhaAdLrGgIR0Cx/cu3lS0jdX2UKGgGR0BymyhQFcIJaAdLiWgIR0Cx/chjOLR8dX2UKGgGR0By7u4mTkhiaAdLwWgIR0Cx/a7haTwEdX2UKGgGR0BzOsaef7JoaAdLvWgIR0Cx/d/c8DB/dX2UKGgGR0BzoZHe7+UAaAdLx2gIR0Cx/ZoMSbpedX2UKGgGR0ByQgBKcurZaAdLo2gIR0Cx/agY51eTdX2UKGgGR0ByKBHd43WGaAdLr2gIR0Cx/ag+t8u0dX2UKGgGR0Bx7/cAR02caAdLlWgIR0Cx/e6eXiR5dX2UKGgGR0ByBGoxYaHcaAdL1WgIR0Cx/cNbkfcOdX2UKGgGR0BzZHzErGzbaAdLtmgIR0Cx/hZvgm7bdX2UKGgGR0BysVP2wmmcaAdLyWgIR0Cx/hIhEBsAdX2UKGgGR0ByjIleF+NMaAdLtWgIR0Cx/hpMYdhidX2UKGgGR0BvKKDujRD1aAdLt2gIR0Cx/ctuk1uSdX2UKGgGR0BxmVPpIMBqaAdLrmgIR0Cx/fEq6OHWdX2UKGgGR0By1u8kD6nBaAdLsGgIR0Cx/ikwN9YwdX2UKGgGR0BxidW6shgWaAdLnGgIR0Cx/ep4jbBXdX2UKGgGR0BylEsBhhH9aAdLsWgIR0Cx/fUgOjIrdX2UKGgGR0ByGO7btZ3caAdLnGgIR0Cx/iWNJe3QdX2UKGgGR0BxfTKifxtpaAdLp2gIR0Cx/eKqbSZ0dX2UKGgGR0BzB+uxKQJYaAdN6AFoCEdAsf4mKyfL93V9lChoBkdAcy1nx8UmD2gHS5RoCEdAsf43B+F10XV9lChoBkdAcx0+SKWLP2gHS79oCEdAsf38omXw9nV9lChoBkdARRVWbPQfIWgHS3ZoCEdAsf4x9RaX8nV9lChoBkdAcjclZ5iVjmgHS7JoCEdAsf38oE0SAnV9lChoBkdAcsJkyDZlF2gHS9xoCEdAsf45schkiHV9lChoBkdAc46bLU1AJWgHS8ZoCEdAsf4LOjZcs3V9lChoBkdAcz1OoHcDbWgHS85oCEdAsf3/U4JeFHV9lChoBkdAcAvFcY64lWgHS51oCEdAsf43vttygnV9lChoBkdAcFDvZh8YymgHS5ZoCEdAsf4PsPatcXV9lChoBkdAcvQhew9q12gHS7toCEdAsf4bwWnCO3V9lChoBkdAcwO69kBjnWgHS7VoCEdAsf4GIBRyfnV9lChoBkdAcjohl18stmgHS9loCEdAsf41MDfWMHV9lChoBkdAcLz59Vmz0GgHS6VoCEdAsf4hZU1hs3V9lChoBkdAcYcIiTt9hWgHS6FoCEdAsf4KF10T13V9lChoBkdAcxczmOlwcmgHS7poCEdAsf5ImLLpzXV9lChoBkdAcLg6Skj5bmgHS6BoCEdAsf5dew9q13V9lChoBkdAcuXOJ+DvmmgHS6NoCEdAsf4zOlfqo3V9lChoBkdAczlxcE/0NGgHS6poCEdAsf6LPIGQjnV9lChoBkdAOnLzf779AGgHS1toCEdAsf46wqy4WnV9lChoBkdAdDI2jO9nLGgHS6toCEdAsf6Pghr303V9lChoBkdAchmSncclxGgHS4xoCEdAsf6IHQhOg3V9lChoBkdAcnbY0l7dBWgHS5FoCEdAsf5Tlq8DjnV9lChoBkdAc57wvg3tKWgHS+toCEdAsf40580DU3V9lChoBkdAcdALE1l5GGgHS4poCEdAsf5BHOKO1nV9lChoBkdAdBGh/y5I6WgHS9JoCEdAsf6irp7kXHV9lChoBkdAcnAYixFAmmgHS+VoCEdAsf5HHGS6lXV9lChoBkdAcjSUsnRb8mgHS5poCEdAsf5eG+K0lnV9lChoBkdAcO0snAqNImgHS51oCEdAsf6iIInjQ3V9lChoBkdAdBZ4LkS26WgHS6ZoCEdAsf6WosI3SHV9lChoBkdAcrmQp4KQaWgHS9JoCEdAsf5bFvQ4THV9lChoBkdAcSIRXOnl4mgHS4hoCEdAsf5pHOKO1nV9lChoBkdAc/28Yht+C2gHS79oCEdAsf6oxM36ynV9lChoBkdAdC6XXAdn02gHS9BoCEdAsf54M1CPZXV9lChoBkdAcKSIyCWeH2gHS5NoCEdAsf50/lhgE3V9lChoBkdAcm9tYSxqwmgHS6toCEdAsf6uh/RVqHV9lChoBkdAcbEYrJ8v3GgHS5VoCEdAsf6Bu0kWynV9lChoBkdAcp6pSJj2BmgHS8RoCEdAsf64MrmQsHV9lChoBkdAcTHhxYJVsGgHS5toCEdAsf6fGOuJUHV9lChoBkdAbmvq59Vmz2gHS7NoCEdAsf551A7gbnV9lChoBkdAcmDbzK9wm2gHS9BoCEdAsf6Kt6ol2XV9lChoBkdAcyLrvsqrimgHS61oCEdAsf6XSiM5wXV9lChoBkdAcng2qkuYhWgHS7VoCEdAsf6Bd8iOenV9lChoBkdAcg4o99tuUGgHS6RoCEdAsf7NmPHT7XV9lChoBkdAc0CcEvCdjGgHTXwBaAhHQLH+8hV2icp1fZQoaAZHQHHpJJf6XSloB0uKaAhHQLH+scJMQEp1fZQoaAZHQHOuihnJ1aJoB0vCaAhHQLH+jlenhsJ1fZQoaAZHQHPTOx4Y77toB0vBaAhHQLH+zR6Ww/x1fZQoaAZHQG/PS7GvOhVoB0uraAhHQLH+sFnIyTJ1fZQoaAZHQHCXEHt4RmNoB0uWaAhHQLH+xo2n8891fZQoaAZHQHF663y7PIJoB0u3aAhHQLH+tAy2x6h1fZQoaAZHQHFF5KBd2PloB0ugaAhHQLH/EaN+9al1fZQoaAZHQHP7+9vjwQVoB0vDaAhHQLH/D37UG3Z1fZQoaAZHQHLt/MKTjedoB0vGaAhHQLH/GN5t3wF1fZQoaAZHQGlARHoX9BNoB03oA2gIR0Cx/rjN+so2dX2UKGgGR0ByzEnPVurIaAdLsWgIR0Cx/rtn9NvgdX2UKGgGR0Bws/JYDDCQaAdLkWgIR0Cx/s0n1FpgdX2UKGgGR0BzIfw4KhL5aAdLsWgIR0Cx/sF6Z6UrdX2UKGgGR0BxymV0Lc9GaAdLoWgIR0Cx/xHEhq0udX2UKGgGR0BzV8PbwjMWaAdL92gIR0Cx/t3vDxb0dX2UKGgGR0Bw/vIxQBPsaAdLjWgIR0Cx/tmNFSbZdX2UKGgGR0Bz3NHFxXGPaAdLq2gIR0Cx/tC+De0pdX2UKGgGR0ByFJTo+wC9aAdLsGgIR0Cx/w/f8/D+dX2UKGgGR0Bz6Y8zQ/oraAdNAAFo
CEdAsf89BPbfxnV9lChoBkdASjUwSJ0nxGgHS31oCEdAsf7viEQGwHV9lChoBkdAcgDngYP5HmgHS6FoCEdAsf8ov7FbV3V9lChoBkdAchx36yjYZmgHS6poCEdAsf8laGHpKXV9lChoBkdAc3OJg9eQdWgHS7NoCEdAsf7x2cJ+lXV9lChoBkdAcY9r08NhE2gHS4loCEdAsf8t+jM3ZXV9lChoBkdAc9En4fwI+mgHS71oCEdAsf8sGX5WR3V9lChoBkdAb9DVH4Glh2gHS5FoCEdAsf8WzC1qnHV9lChoBkdAcvq9SuQp4WgHS79oCEdAsf8Go0hvBXV9lChoBkdAcl9mXgLqlmgHS61oCEdAsf8DWH1vl3V9lChoBkdAcBYEL6UJOWgHS59oCEdAsf9g5n13+3V9lChoBkdAcuPINmUW22gHS75oCEdAsf79+RYA83V9lChoBkdAcGIrsSkCWGgHS5hoCEdAsf74gdOqN3V9lChoBkdAcXiaVD8cdmgHS5NoCEdAsf8zcGkeqHVlLg=="
49
+ },
+ "ep_success_buffer": {
+ ":type:": "<class 'collections.deque'>",
+ ":serialized:": "gAWVIAAAAAAAAACMC2NvbGxlY3Rpb25zlIwFZGVxdWWUk5QpS2SGlFKULg=="
+ },
+ "_n_updates": 2448,
+ "observation_space": {
+ ":type:": "<class 'gymnasium.spaces.box.Box'>",
+ ":serialized:": "gAWVdgIAAAAAAACMFGd5bW5hc2l1bS5zcGFjZXMuYm94lIwDQm94lJOUKYGUfZQojAVkdHlwZZSMBW51bXB5lIwFZHR5cGWUk5SMAmY0lImIh5RSlChLA4wBPJROTk5K/////0r/////SwB0lGKMDWJvdW5kZWRfYmVsb3eUjBJudW1weS5jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWCAAAAAAAAAABAQEBAQEBAZRoCIwCYjGUiYiHlFKUKEsDjAF8lE5OTkr/////Sv////9LAHSUYksIhZSMAUOUdJRSlIwNYm91bmRlZF9hYm92ZZRoESiWCAAAAAAAAAABAQEBAQEBAZRoFUsIhZRoGXSUUpSMBl9zaGFwZZRLCIWUjANsb3eUaBEoliAAAAAAAAAAAAC0wgAAtMIAAKDAAACgwNsPScAAAKDAAAAAgAAAAICUaAtLCIWUaBl0lFKUjARoaWdolGgRKJYgAAAAAAAAAAAAtEIAALRCAACgQAAAoEDbD0lAAACgQAAAgD8AAIA/lGgLSwiFlGgZdJRSlIwIbG93X3JlcHKUjFtbLTkwLiAgICAgICAgLTkwLiAgICAgICAgIC01LiAgICAgICAgIC01LiAgICAgICAgIC0zLjE0MTU5MjcgIC01LgogIC0wLiAgICAgICAgIC0wLiAgICAgICBdlIwJaGlnaF9yZXBylIxTWzkwLiAgICAgICAgOTAuICAgICAgICAgNS4gICAgICAgICA1LiAgICAgICAgIDMuMTQxNTkyNyAgNS4KICAxLiAgICAgICAgIDEuICAgICAgIF2UjApfbnBfcmFuZG9tlE51Yi4=",
+ "dtype": "float32",
+ "bounded_below": "[ True True True True True True True True]",
+ "bounded_above": "[ True True True True True True True True]",
+ "_shape": [
+ 8
+ ],
+ "low": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
+ "high": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
+ "low_repr": "[-90. -90. -5. -5. -3.1415927 -5.\n -0. -0. ]",
+ "high_repr": "[90. 90. 5. 5. 3.1415927 5.\n 1. 1. ]",
+ "_np_random": null
+ },
+ "action_space": {
+ ":type:": "<class 'gymnasium.spaces.discrete.Discrete'>",
+ ":serialized:": "gAWV/QAAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwBbpSMFW51bXB5LmNvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlIwFbnVtcHmUjAVkdHlwZZSTlIwCaTiUiYiHlFKUKEsDjAE8lE5OTkr/////Sv////9LAHSUYkMIBAAAAAAAAACUhpRSlIwFc3RhcnSUaAhoDkMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMBWR0eXBllGgLjAJpOJSJiIeUUpQoSwNoD05OTkr/////Sv////9LAHSUYowKX25wX3JhbmRvbZROdWIu",
+ "n": "4",
+ "start": "0",
+ "_shape": [],
+ "dtype": "int64",
+ "_np_random": null
+ },
+ "n_envs": 32,
+ "n_steps": 1024,
+ "gamma": 0.999,
+ "gae_lambda": 0.98,
+ "ent_coef": 0.01,
+ "vf_coef": 0.5,
+ "max_grad_norm": 0.5,
+ "rollout_buffer_class": {
+ ":type:": "<class 'abc.ABCMeta'>",
+ ":serialized:": "gAWVNgAAAAAAAACMIHN0YWJsZV9iYXNlbGluZXMzLmNvbW1vbi5idWZmZXJzlIwNUm9sbG91dEJ1ZmZlcpSTlC4=",
+ "__module__": "stable_baselines3.common.buffers",
+ "__annotations__": "{'observations': <class 'numpy.ndarray'>, 'actions': <class 'numpy.ndarray'>, 'rewards': <class 'numpy.ndarray'>, 'advantages': <class 'numpy.ndarray'>, 'returns': <class 'numpy.ndarray'>, 'episode_starts': <class 'numpy.ndarray'>, 'log_probs': <class 'numpy.ndarray'>, 'values': <class 'numpy.ndarray'>}",
+ "__doc__": "\n Rollout buffer used in on-policy algorithms like A2C/PPO.\n It corresponds to ``buffer_size`` transitions collected\n using the current policy.\n This experience will be discarded after the policy update.\n In order to use PPO objective, we also store the current value of each state\n and the log probability of each taken action.\n\n The term rollout here refers to the model-free notion and should not\n be used with the concept of rollout used in model-based RL or planning.\n Hence, it is only involved in policy and value function training but not action selection.\n\n :param buffer_size: Max number of element in the buffer\n :param observation_space: Observation space\n :param action_space: Action space\n :param device: PyTorch device\n :param gae_lambda: Factor for trade-off of bias vs variance for Generalized Advantage Estimator\n Equivalent to classic advantage when set to 1.\n :param gamma: Discount factor\n :param n_envs: Number of parallel environments\n ",
+ "__init__": "<function RolloutBuffer.__init__ at 0x7fdb80d232e0>",
+ "reset": "<function RolloutBuffer.reset at 0x7fdb80d23370>",
+ "compute_returns_and_advantage": "<function RolloutBuffer.compute_returns_and_advantage at 0x7fdb80d23400>",
+ "add": "<function RolloutBuffer.add at 0x7fdb80d23490>",
+ "get": "<function RolloutBuffer.get at 0x7fdb80d23520>",
+ "_get_samples": "<function RolloutBuffer._get_samples at 0x7fdb80d235b0>",
+ "__abstractmethods__": "frozenset()",
+ "_abc_impl": "<_abc._abc_data object at 0x7fdb80f0f8c0>"
+ },
+ "rollout_buffer_kwargs": {},
+ "batch_size": 64,
+ "n_epochs": 8,
+ "clip_range": {
+ ":type:": "<class 'function'>",
+ ":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHP8mZmZmZmZqFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
+ },
+ "clip_range_vf": null,
+ "normalize_advantage": true,
+ "target_kl": null,
+ "lr_schedule": {
+ ":type:": "<class 'function'>",
+ ":serialized:": "gAWV3AIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLAUsTQwSIAFMAlE6FlCmMAV+UhZSMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZSMBGZ1bmOUS4NDAgQBlIwDdmFslIWUKXSUUpR9lCiMC19fcGFja2FnZV9flIwYc3RhYmxlX2Jhc2VsaW5lczMuY29tbW9ulIwIX19uYW1lX1+UjB5zdGFibGVfYmFzZWxpbmVzMy5jb21tb24udXRpbHOUjAhfX2ZpbGVfX5SMYy9ob21lL2pvc3NrYS9taW5pY29uZGEzL2VudnMvanNtbDMxMC9saWIvcHl0aG9uMy4xMC9zaXRlLXBhY2thZ2VzL3N0YWJsZV9iYXNlbGluZXMzL2NvbW1vbi91dGlscy5weZR1Tk5oAIwQX21ha2VfZW1wdHlfY2VsbJSTlClSlIWUdJRSlGgAjBJfZnVuY3Rpb25fc2V0c3RhdGWUk5RoH32UfZQoaBZoDYwMX19xdWFsbmFtZV9flIwZY29uc3RhbnRfZm4uPGxvY2Fscz4uZnVuY5SMD19fYW5ub3RhdGlvbnNfX5R9lIwOX19rd2RlZmF1bHRzX1+UTowMX19kZWZhdWx0c19flE6MCl9fbW9kdWxlX1+UaBeMB19fZG9jX1+UTowLX19jbG9zdXJlX1+UaACMCl9tYWtlX2NlbGyUk5RHPzOpKjBVMmGFlFKUhZSMF19jbG91ZHBpY2tsZV9zdWJtb2R1bGVzlF2UjAtfX2dsb2JhbHNfX5R9lHWGlIZSMC4="
+ },
+ "system_info": {
+ "OS": "Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35 # 1 SMP Thu Oct 5 21:02:42 UTC 2023",
+ "Python": "3.10.13",
+ "Stable-Baselines3": "2.2.1",
+ "PyTorch": "2.2.0+cu121",
+ "GPU Enabled": "True",
+ "Numpy": "1.26.4",
+ "Cloudpickle": "3.0.0",
+ "Gymnasium": "0.28.1"
+ }
+ }
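
The config above records the exact training hyperparameters (`n_envs`: 32, `n_steps`: 1024, `batch_size`: 64, `n_epochs`: 8, `gamma`: 0.999, `gae_lambda`: 0.98, `ent_coef`: 0.01, `learning_rate`: 0.0003). A minimal sketch, not the actual training script from this commit, of how a model with these settings could be trained:

```python
# Sketch only: mirrors the hyperparameters recorded in config.json above.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("LunarLander-v2", n_envs=32)
model = PPO(
    "MlpPolicy",  # resolves to the ActorCriticPolicy shown in the config
    vec_env,
    learning_rate=0.0003,
    n_steps=1024,
    batch_size=64,
    n_epochs=8,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
)
model.learn(total_timesteps=10_000_000)
model.save("ppo-LunarLander-v2_010_000_000_hf_defaults")
```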
evaluate.py ADDED
@@ -0,0 +1,69 @@
+ import os
+ import random
+
+ import pandas as pd
+
+ from hf_helpers.sb3_eval import eval_model_with_seed
+
+ env_id = "LunarLander-v2"
+ models_to_evaluate = [
+     "ppo-LunarLander-v2_001_000_000_hf_defaults.zip",
+     "ppo-LunarLander-v2_010_000_000_hf_defaults.zip",
+     "ppo-LunarLander-v2_010_000_000_sb3_defaults.zip",
+     "ppo-LunarLander-v2_123_456_789_hf_defaults.zip",
+ ]
+ evaluation_results_fp = "evaluation_results.csv"
+
+
+ def store_results(results):
+     # Append to the CSV, writing the header only if the file does not exist yet.
+     results_df = pd.DataFrame(results)
+     header = not os.path.exists(evaluation_results_fp)
+     results_df.to_csv(evaluation_results_fp, mode="a", index=False, header=header)
+
+
+ def evaluate_and_store_all_results():
+     results = []
+     n_evaluations = 1000
+     for i in range(n_evaluations):
+         if i > 0 and i % 10 == 0:
+             print(f"Progress: {i}/{n_evaluations}")
+             store_results(results)
+             results = []
+
+         # seed = random.randint(0, 1000000000000)  # Why this interval?
+         seed = random.randint(0, 10000)  # Also try some smaller numbers for seed
+         n_envs = random.randint(1, 16)
+         for model_fp in models_to_evaluate:
+             result, mean_reward, std_reward = eval_model_with_seed(
+                 model_fp, env_id, seed, n_eval_episodes=10, n_envs=n_envs
+             )
+             result_data = {
+                 "model_fp": model_fp,
+                 "seed": seed,
+                 "n_envs": n_envs,
+                 "result": result,
+                 "mean_reward": mean_reward,
+                 "std_reward": std_reward,
+             }
+             results.append(result_data)
+
+     # Flush the final partial batch so the last iterations are not lost.
+     if results:
+         store_results(results)
+
+
+ def analyze_results():
+     results_df = pd.read_csv(evaluation_results_fp)
+     results_df["model_fp"] = results_df["model_fp"].str.replace(".zip", "", regex=False)
+     aggregated_results = (
+         results_df.groupby("model_fp")["result"]
+         .agg(["count", "min", "max", "mean"])
+         .reset_index()
+     )
+     aggregated_results.columns = [
+         "Model name",
+         "Number of results",
+         "Min",
+         "Max",
+         "Average",
+     ]
+     aggregated_results = aggregated_results.sort_values(by="Model name")
+     print(aggregated_results.to_markdown(index=False, tablefmt="pipe"))
+
+
+ # evaluate_and_store_all_results()
+ analyze_results()
evaluation_results.csv ADDED
The diff for this file is too large to render.
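
Since the diff cannot be rendered, here is a minimal sketch for inspecting the file; the columns (`model_fp`, `seed`, `n_envs`, `result`, `mean_reward`, `std_reward`) follow the `result_data` dict written by evaluate.py above:

```python
import pandas as pd

# Columns: model_fp, seed, n_envs, result, mean_reward, std_reward
df = pd.read_csv("evaluation_results.csv")
print(df.nlargest(5, "result"))                     # best seed/n_envs combinations
print(df.groupby("model_fp")["result"].describe())  # per-model summary
```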
 
hf_helpers/__init__.py ADDED
File without changes
hf_helpers/gym_video.py ADDED
@@ -0,0 +1,82 @@
+ import os
+ import tempfile
+
+ import gymnasium as gym
+ import imageio
+ import numpy as np
+ from stable_baselines3 import PPO
+ from stable_baselines3.common.monitor import Monitor
+ from stable_baselines3.common.vec_env import DummyVecEnv, VecVideoRecorder
+
+
+ def generate_video(model, video_fp, video_length_in_episodes=5):
+     eval_env = model.get_env()
+
+     max_video_length_in_steps = (
+         video_length_in_episodes * eval_env.get_attr("spec")[0].max_episode_steps
+     )
+
+     with tempfile.TemporaryDirectory() as temp_dp:
+         vec_env = VecVideoRecorder(
+             eval_env,
+             temp_dp,
+             record_video_trigger=lambda x: x == 0,
+             video_length=max_video_length_in_steps,
+         )
+
+         frame_count = 0
+         episode_count = 0
+         obs = vec_env.reset()
+         for _ in range(max_video_length_in_steps):
+             action, _ = model.predict(obs, deterministic=True)
+             obs, _, dones, _ = vec_env.step(action)
+             frame_count += 1
+             if dones[0]:  # assumes a single evaluation environment
+                 episode_count += 1
+                 if episode_count >= video_length_in_episodes:
+                     break
+
+         vec_env.close()
+
+         temp_fp = vec_env.video_recorder.path
+
+         # TODO: Fix this.
+         # Use ffmpeg to remove the last frame (it is the first frame of a new episode).
+         os.system(
+             f"""ffmpeg -y -i {temp_fp} -vf "select='not(eq(n,{frame_count}))'" {video_fp} > /dev/null 2>&1"""
+         )
+         # os.rename(temp_fp, video_fp)
+
+
+ def generate_gif(model, file_path, video_length_in_episodes=5):
+     eval_env = model.get_env()
+
+     max_video_length_in_steps = (
+         video_length_in_episodes * eval_env.get_attr("spec")[0].max_episode_steps
+     )
+
+     def render_image():
+         return eval_env.render(mode="rgb_array")
+
+     images = []
+     episode_count = 0
+     obs = eval_env.reset()
+     images.append(render_image())
+     for _ in range(max_video_length_in_steps):
+         action, _ = model.predict(obs)
+         obs, _, dones, _ = eval_env.step(action)
+         if dones[0]:  # assumes a single evaluation environment
+             episode_count += 1
+             if episode_count >= video_length_in_episodes:
+                 break
+         images.append(render_image())
+
+     # Keep every second frame to halve the file size; play back at 25 fps.
+     imageio.mimsave(
+         file_path, [np.array(img) for i, img in enumerate(images) if i % 2 == 0], fps=25
+     )
+
+
+ def load_ppo_model_for_video(model_fp, env_id):
+     env = DummyVecEnv([lambda: Monitor(gym.make(env_id, render_mode="rgb_array"))])
+     model = PPO.load(model_fp, env=env)
+     return model
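
main.py below only exercises `generate_video`; for completeness, a usage sketch for the GIF variant (the output filename `replay.gif` is just an example, not a file in this commit):

```python
from hf_helpers.gym_video import generate_gif, load_ppo_model_for_video

model = load_ppo_model_for_video(
    "ppo-LunarLander-v2_010_000_000_hf_defaults.zip", "LunarLander-v2"
)
generate_gif(model, "replay.gif", video_length_in_episodes=5)
```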
hf_helpers/hf_sb3.py ADDED
@@ -0,0 +1,25 @@
+ import datetime
+ import json
+ import zipfile
+
+ import stable_baselines3
+
+
+ def generate_config_json(model_fp, config_fp):
+     # The SB3 .zip archive stores the model hyperparameters as JSON in its "data" entry.
+     with zipfile.ZipFile(model_fp, "r") as zip_ref:
+         with zip_ref.open("data") as file:
+             data = json.load(file)
+     data["system_info"] = stable_baselines3.get_system_info(print_info=False)[0]
+     with open(config_fp, "w") as f:
+         json.dump(data, f, indent=4)
+
+
+ def generate_results_json(results_fp, mean_reward, std_reward, n_eval_episodes, is_deterministic=True):
+     eval_form_datetime = datetime.datetime.now().isoformat()
+     data = {
+         "mean_reward": mean_reward,
+         "std_reward": std_reward,
+         "is_deterministic": is_deterministic,
+         "n_eval_episodes": n_eval_episodes,
+         "eval_datetime": eval_form_datetime,
+     }
+     with open(results_fp, "w") as f:
+         json.dump(data, f, indent=4)
hf_helpers/readme.md ADDED
@@ -0,0 +1 @@
+ TODO: Put on GitHub?
hf_helpers/sb3_eval.py ADDED
@@ -0,0 +1,89 @@
+ import random
+
+ import gymnasium as gym
+ from stable_baselines3 import PPO
+ from stable_baselines3.common.env_util import make_vec_env
+ from stable_baselines3.common.evaluation import evaluate_policy
+ from stable_baselines3.common.monitor import Monitor
+
+
+ def eval_model_with_seed(model_fp, env_id, seed, n_eval_episodes=10, n_envs=1):
+     eval_env = make_vec_env(env_id, seed=seed, n_envs=n_envs)
+     return eval_model(model_fp, eval_env, n_eval_episodes)
+
+
+ def eval_model_random(model_fp, env_id, n_eval_episodes=10):
+     eval_env = Monitor(gym.make(env_id))
+     return eval_model(model_fp, eval_env, n_eval_episodes)
+
+
+ def eval_model_random_with_average(
+     model_fp, env_id, n_eval_episodes=10, n_average=10, verbose=False
+ ):
+     result_sum = 0
+     mean_reward_sum = 0
+     std_reward_sum = 0
+     for i in range(n_average):
+         if verbose and i % 100 == 0:
+             print(f"Progress: {i}/{n_average}")
+         result, mean_reward, std_reward = eval_model_random(
+             model_fp, env_id, n_eval_episodes
+         )
+         result_sum += result
+         mean_reward_sum += mean_reward
+         std_reward_sum += std_reward
+     return (
+         result_sum / n_average,
+         mean_reward_sum / n_average,
+         std_reward_sum / n_average,
+     )
+
+
+ def eval_model(model_fp, eval_env, n_eval_episodes=10):
+     # result = mean_reward - std_reward is the score reported in the model card.
+     model = PPO.load(model_fp, env=eval_env)
+     mean_reward, std_reward = evaluate_policy(
+         model, eval_env, n_eval_episodes=n_eval_episodes, deterministic=True
+     )
+     result = mean_reward - std_reward
+     return result, mean_reward, std_reward
+
+
+ def search_for_best_seed(
+     model_fp,
+     env_id,
+     n_eval_episodes=10,
+     n_total_envs_to_search=1000,
+     max_n_envs=16,
+     verbose=False,
+ ):
+     best_result = 0
+     best_seed = 0
+     best_n_envs = 0
+     for i in range(n_total_envs_to_search):
+         if verbose and i % 100 == 0:
+             print(f"Progress: {i}/{n_total_envs_to_search}")
+         seed = random.randint(0, 1000000000000)
+         n_envs = random.randint(1, max_n_envs)
+         result, _, _ = eval_model_with_seed(
+             model_fp, env_id, seed, n_eval_episodes, n_envs
+         )
+         if result > best_result:
+             best_result = result
+             best_seed = seed
+             best_n_envs = n_envs
+     return best_result, best_seed, best_n_envs
+
+
+ def search_for_best_seed_in_range(model_fp, env_id, seed_range=range(0, 1000)):
+     best_result = 0
+     best_seed = 0
+     best_n_envs = 0
+     for seed in seed_range:
+         for n_envs in [1, 2, 4, 8, 16, 32]:
+             result, _, _ = eval_model_with_seed(model_fp, env_id, seed, 10, n_envs)
+             if result > best_result:
+                 best_result = result
+                 best_seed = seed
+                 best_n_envs = n_envs
+                 print(best_result, seed, n_envs)
+     print(best_result, best_seed, best_n_envs)
+     return best_result, best_seed, best_n_envs
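
A usage sketch for the seed search; plausibly how values like `best_seed = 902` in main.py were found, although the exact call is not recorded in this commit:

```python
from hf_helpers.sb3_eval import search_for_best_seed

best_result, best_seed, best_n_envs = search_for_best_seed(
    "ppo-LunarLander-v2_010_000_000_hf_defaults.zip",
    "LunarLander-v2",
    n_total_envs_to_search=1000,
    verbose=True,
)
print(best_result, best_seed, best_n_envs)
```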
main.py ADDED
@@ -0,0 +1,57 @@
+ from huggingface_hub.repocard import metadata_save
+ from huggingface_sb3.push_to_hub import generate_metadata
+
+ from hf_helpers.gym_video import generate_video, load_ppo_model_for_video
+ from hf_helpers.hf_sb3 import generate_config_json, generate_results_json
+ from hf_helpers.sb3_eval import eval_model_with_seed
+
+
+ readme_path = "README.md"
+
+ env_id = "LunarLander-v2"
+
+ main_model_fp = "ppo-LunarLander-v2_010_000_000_hf_defaults.zip"
+ other_models = [
+     "ppo-LunarLander-v2_001_000_000_hf_defaults.zip",
+     "ppo-LunarLander-v2_010_000_000_sb3_defaults.zip",
+     "ppo-LunarLander-v2_123_456_789_hf_defaults.zip",
+ ]
+
+
+ # 1. Evaluate the main model
+ best_seed = 902
+ best_n_envs = 8
+ n_eval_episodes = 10
+ result, mean_reward, std_reward = eval_model_with_seed(
+     main_model_fp,
+     env_id,
+     seed=best_seed,
+     n_eval_episodes=n_eval_episodes,
+     n_envs=best_n_envs,
+ )
+
+
+ # 2. Create config.json
+ generate_config_json(main_model_fp, "config.json")
+ # Also create config files for the other models
+ for model_fp in other_models:
+     generate_config_json(model_fp, f"config-{model_fp.replace('.zip', '')}.json")
+
+
+ # 3. Create results.json
+ generate_results_json("results.json", mean_reward, std_reward, n_eval_episodes, True)
+
+
+ # 4. Generate video
+ model_for_video = load_ppo_model_for_video(main_model_fp, env_id)
+ generate_video(model_for_video, "video.mp4", video_length_in_episodes=5)
+
+
+ # 5. Generate model card metadata
+ metadata = generate_metadata(
+     model_name=main_model_fp.replace(".zip", ""),
+     env_id=env_id,
+     mean_reward=mean_reward,
+     std_reward=std_reward,
+ )
+ metadata["license"] = "mit"
+ metadata_save(readme_path, metadata)
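
main.py prepares the artifacts but does not upload them; a sketch of one way to push everything afterwards (the repo id below is a placeholder, not taken from this commit):

```python
from huggingface_hub import HfApi

HfApi().upload_folder(
    repo_id="<user>/<repo>",  # placeholder: substitute the actual model repo id
    folder_path=".",
    repo_type="model",
    commit_message="feat: add four models",
)
```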
ppo-LunarLander-v2_001_000_000_hf_defaults.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:95c76941c556b88a7269767145b8772aa92a7d3394cedfe21ca73f0fd71c4ca2
+ size 151144
ppo-LunarLander-v2_010_000_000_hf_defaults.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64d6cf67b2df2ebfe514a2f33e01346dc00a992789b1ca88eb42f094a747066c
+ size 151024
ppo-LunarLander-v2_010_000_000_sb3_defaults.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c291a24f4798636a4b21e9cf6e8a5eb216f2f681382926136a7e658740b2e042
+ size 151015
ppo-LunarLander-v2_123_456_789_hf_defaults.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93c100c96c3002c69c5a87cb5c37ec01fc6e0f66dbd9bd46a315f761613f9c1e
+ size 151024
results.json ADDED
@@ -0,0 +1,7 @@
+ {
+     "mean_reward": 311.6129648,
+     "std_reward": 6.22892335529413,
+     "is_deterministic": true,
+     "n_eval_episodes": 10,
+     "eval_datetime": "2024-03-26T11:30:30.555994"
+ }
video.mp4 ADDED
Binary file (143 kB).