Antonio Serrano Muñoz committed
Commit 1b5a051
1 Parent(s): e2dc694

Update README

Files changed (1):
  1. README.md +55 -37
README.md CHANGED
@@ -9,7 +9,7 @@ model-index:
  results:
  - metrics:
  - type: mean_reward
- value: 11336.27 +/- 182.79
+ value: 13273.86 +/- 3550.43
  name: Total reward (mean)
  task:
  type: reinforcement-learning
@@ -19,51 +19,69 @@ model-index:
  type: Isaac-Ant-v0
  ---

+ <!-- ---
+ torch: 13273.86 +/- 3550.43
+ jax: 20690.53 +/- 0.0
+ numpy:
+ --- -->
+
  # IsaacOrbit-Isaac-Ant-v0-PPO

- Trained agent model for [NVIDIA Isaac Orbit](https://github.com/NVIDIA-Omniverse/Orbit) environment
+ Trained agent for [NVIDIA Isaac Orbit](https://github.com/NVIDIA-Omniverse/Orbit) environments.

  - **Task:** Isaac-Ant-v0
- - **Agent:** [PPO](https://skrl.readthedocs.io/en/latest/modules/skrl.agents.ppo.html)
+ - **Agent:** [PPO](https://skrl.readthedocs.io/en/latest/api/agents/ppo.html)

- # Usage (with skrl)
+ # Usage (with skrl)

- ```python
- from skrl.utils.huggingface import download_model_from_huggingface
+ Note: Visit the skrl [Examples](https://skrl.readthedocs.io/en/latest/intro/examples.html) section to access the scripts.

- # assuming that there is an agent named `agent`
- path = download_model_from_huggingface("skrl/IsaacOrbit-Isaac-Ant-v0-PPO")
- agent.load(path)
- ```
+ * PyTorch
+
+ ```python
+ from skrl.utils.huggingface import download_model_from_huggingface
+
+ # assuming that there is an agent named `agent`
+ path = download_model_from_huggingface("skrl/IsaacOrbit-Isaac-Ant-v0-PPO", filename="agent.pt")
+ agent.load(path)
+ ```
+
+ * JAX
+
+ ```python
+ from skrl.utils.huggingface import download_model_from_huggingface
+
+ # assuming that there is an agent named `agent`
+ path = download_model_from_huggingface("skrl/IsaacOrbit-Isaac-Ant-v0-PPO", filename="agent.pickle")
+ agent.load(path)
+ ```

  # Hyperparameters

  ```python
- # https://skrl.readthedocs.io/en/latest/modules/skrl.agents.ppo.html#configuration-and-hyperparameters
- cfg_ppo = PPO_DEFAULT_CONFIG.copy()
- cfg_ppo["rollouts"] = 16 # memory_size
- cfg_ppo["learning_epochs"] = 8
- cfg_ppo["mini_batches"] = 4 # 16 * 1024 / 4096
- cfg_ppo["discount_factor"] = 0.99
- cfg_ppo["lambda"] = 0.95
- cfg_ppo["learning_rate"] = 3e-4
- cfg_ppo["learning_rate_scheduler"] = KLAdaptiveRL
- cfg_ppo["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.008}
- cfg_ppo["random_timesteps"] = 0
- cfg_ppo["learning_starts"] = 0
- cfg_ppo["grad_norm_clip"] = 1.0
- cfg_ppo["ratio_clip"] = 0.2
- cfg_ppo["value_clip"] = 0.2
- cfg_ppo["clip_predicted_values"] = True
- cfg_ppo["entropy_loss_scale"] = 0.0
- cfg_ppo["value_loss_scale"] = 1.0
- cfg_ppo["kl_threshold"] = 0
- cfg_ppo["rewards_shaper"] = lambda rewards, timestep, timesteps: rewards * 0.01
- cfg_ppo["state_preprocessor"] = RunningStandardScaler
- cfg_ppo["state_preprocessor_kwargs"] = {"size": env.observation_space, "device": device}
- cfg_ppo["value_preprocessor"] = RunningStandardScaler
- cfg_ppo["value_preprocessor_kwargs"] = {"size": 1, "device": device}
- # logging to TensorBoard and writing checkpoints
- cfg_ppo["experiment"]["write_interval"] = 40
- cfg_ppo["experiment"]["checkpoint_interval"] = 400
+ # https://skrl.readthedocs.io/en/latest/api/agents/ppo.html#configuration-and-hyperparameters
+ cfg = PPO_DEFAULT_CONFIG.copy()
+ cfg["rollouts"] = 16 # memory_size
+ cfg["learning_epochs"] = 8
+ cfg["mini_batches"] = 4 # 16 * 1024 / 4096
+ cfg["discount_factor"] = 0.99
+ cfg["lambda"] = 0.95
+ cfg["learning_rate"] = 3e-4
+ cfg["learning_rate_scheduler"] = KLAdaptiveRL
+ cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.008}
+ cfg["random_timesteps"] = 0
+ cfg["learning_starts"] = 0
+ cfg["grad_norm_clip"] = 1.0
+ cfg["ratio_clip"] = 0.2
+ cfg["value_clip"] = 0.2
+ cfg["clip_predicted_values"] = True
+ cfg["entropy_loss_scale"] = 0.0
+ cfg["value_loss_scale"] = 1.0
+ cfg["kl_threshold"] = 0
+ cfg["rewards_shaper"] = lambda rewards, *args, **kwargs: rewards * 0.1
+ cfg["time_limit_bootstrap"] = True
+ cfg["state_preprocessor"] = RunningStandardScaler
+ cfg["state_preprocessor_kwargs"] = {"size": env.observation_space, "device": device}
+ cfg["value_preprocessor"] = RunningStandardScaler
+ cfg["value_preprocessor_kwargs"] = {"size": 1, "device": device}
  ```
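
The usage snippets in the updated README load a checkpoint into an already constructed `agent`. The sketch below shows one way such a PPO agent could be assembled with skrl's PyTorch API before loading; the policy/value network sizes, the pre-existing wrapped `env` handle, and the abbreviated configuration are illustrative assumptions, not values read from this repository.

```python
import torch
import torch.nn as nn

from skrl.agents.torch.ppo import PPO, PPO_DEFAULT_CONFIG
from skrl.memories.torch import RandomMemory
from skrl.models.torch import DeterministicMixin, GaussianMixin, Model
from skrl.resources.preprocessors.torch import RunningStandardScaler
from skrl.utils.huggingface import download_model_from_huggingface


# illustrative policy/value networks: the layer sizes are assumptions,
# not read from the checkpoint
class Policy(GaussianMixin, Model):
    def __init__(self, observation_space, action_space, device):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions=False)
        self.net = nn.Sequential(nn.Linear(self.num_observations, 256), nn.ELU(),
                                 nn.Linear(256, 128), nn.ELU(),
                                 nn.Linear(128, 64), nn.ELU(),
                                 nn.Linear(64, self.num_actions))
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def compute(self, inputs, role):
        return self.net(inputs["states"]), self.log_std_parameter, {}


class Value(DeterministicMixin, Model):
    def __init__(self, observation_space, action_space, device):
        Model.__init__(self, observation_space, action_space, device)
        DeterministicMixin.__init__(self)
        self.net = nn.Sequential(nn.Linear(self.num_observations, 256), nn.ELU(),
                                 nn.Linear(256, 128), nn.ELU(),
                                 nn.Linear(128, 64), nn.ELU(),
                                 nn.Linear(64, 1))

    def compute(self, inputs, role):
        return self.net(inputs["states"]), {}


# `env` is assumed to be an Isaac-Ant-v0 instance already wrapped for skrl;
# it is not created here
device = env.device

models = {"policy": Policy(env.observation_space, env.action_space, device),
          "value": Value(env.observation_space, env.action_space, device)}

memory = RandomMemory(memory_size=16, num_envs=env.num_envs, device=device)

cfg = PPO_DEFAULT_CONFIG.copy()
# ... set the remaining values listed in the "Hyperparameters" section above ...
cfg["state_preprocessor"] = RunningStandardScaler
cfg["state_preprocessor_kwargs"] = {"size": env.observation_space, "device": device}
cfg["value_preprocessor"] = RunningStandardScaler
cfg["value_preprocessor_kwargs"] = {"size": 1, "device": device}

agent = PPO(models=models,
            memory=memory,
            cfg=cfg,
            observation_space=env.observation_space,
            action_space=env.action_space,
            device=device)

# download and load the trained checkpoint from this repository
path = download_model_from_huggingface("skrl/IsaacOrbit-Isaac-Ant-v0-PPO", filename="agent.pt")
agent.load(path)
```

A JAX setup would mirror this using the corresponding `skrl.*.jax` modules and the `agent.pickle` checkpoint.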
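
Once the checkpoint is loaded, the agent would typically be run through one of skrl's trainers for evaluation. A minimal sketch, assuming the same `env` and `agent` as above and an illustrative timestep count:

```python
from skrl.trainers.torch import SequentialTrainer

# evaluate the loaded agent; the number of timesteps here is illustrative
cfg_trainer = {"timesteps": 1600, "headless": True}
trainer = SequentialTrainer(cfg=cfg_trainer, env=env, agents=agent)
trainer.eval()
```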