JaiSurya committed on
Commit
8aae42a
·
1 Parent(s): a087db9

Update README.md

Files changed (1)
  1. README.md +43 -7
README.md CHANGED
@@ -5,6 +5,7 @@ tags:
  - deep-reinforcement-learning
  - reinforcement-learning
  - stable-baselines3
  model-index:
  - name: PPO
  results:
@@ -19,19 +20,54 @@ model-index:
  value: 240.31 +/- 69.19
  name: mean_reward
  verified: false
  ---

  # **PPO** Agent playing **LunarLander-v2**
  This is a trained model of a **PPO** agent playing **LunarLander-v2**
- using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

  ## Usage (with Stable-baselines3)
- TODO: Add your code


- ```python
- from stable_baselines3 import ...
- from huggingface_sb3 import load_from_hub

- ...
- ```
 
  - deep-reinforcement-learning
  - reinforcement-learning
  - stable-baselines3
+ - gymnasium
  model-index:
  - name: PPO
  results:

  value: 240.31 +/- 69.19
  name: mean_reward
  verified: false
+ language:
+ - en
+ pipeline_tag: reinforcement-learning
  ---

  # **PPO** Agent playing **LunarLander-v2**
  This is a trained model of a **PPO** agent playing **LunarLander-v2**
+ using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
+
+ This model was trained with the help of the [Deep RL Course by Hugging Face](https://huggingface.co/learn/deep-rl-course/unit0/introduction).

  ## Usage (with Stable-baselines3)
+ ```python
+ # necessary libraries
+ import gymnasium as gym

+ from huggingface_sb3 import load_from_hub, package_to_hub
+ from huggingface_hub import (
+     notebook_login,
+ )

+ from stable_baselines3 import PPO
+ from stable_baselines3.common.env_util import make_vec_env
+ from stable_baselines3.common.evaluation import evaluate_policy
+ from stable_baselines3.common.monitor import Monitor
+
+ # Step 1: Create the environment
+ env = gym.make("LunarLander-v2")
+ observation, info = env.reset()  # initialize the environment
+
+ # Step 2: Create the model
+ model = PPO(
+     policy="MlpPolicy",  # multilayer perceptron policy
+     env=env,
+     n_steps=1024,  # environment steps collected per rollout
+     batch_size=64,  # minibatch size for each gradient update
+     n_epochs=5,  # optimization passes over each rollout
+     gamma=0.995,  # discount factor
+     gae_lambda=0.98,  # GAE lambda: closer to 1 means less bias, more variance
+     ent_coef=0.01,  # entropy bonus coefficient (encourages exploration)
+     verbose=1,
+ )
+
+ # Step 3: Train the model
+ model.learn(total_timesteps=2_000_000, progress_bar=True)

+ # Step 4: Evaluate the trained policy
+ eval_env = Monitor(gym.make("LunarLander-v2"))
+ mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
+ print(f"Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
+ ```
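
The `gamma` and `gae_lambda` arguments in the added usage code control how PPO estimates advantages. The following is a minimal pure-Python sketch of generalized advantage estimation, assuming a single rollout segment with no episode terminations inside it. `gae_advantages` is a hypothetical helper written for illustration; it is not part of the stable-baselines3 API, and the sample numbers are made up.

```python
def gae_advantages(rewards, values, last_value, gamma=0.995, gae_lambda=0.98):
    """Hypothetical helper: GAE over one rollout segment without episode ends."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the step after it.
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        # One-step TD error: how much better the step was than the value estimate.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of TD errors, decayed by gamma * gae_lambda.
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
    return advantages


# Illustrative numbers only (not taken from the trained model).
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5]
td_only = gae_advantages(rewards, values, last_value=0.0, gamma=0.5, gae_lambda=0.0)
monte_carlo = gae_advantages(rewards, values, last_value=0.0, gamma=0.5, gae_lambda=1.0)
```

With `gae_lambda=0.0` the estimate reduces to the one-step TD error (lower variance, more bias); with `gae_lambda=1.0` it becomes the discounted return minus the value estimate (less bias, more variance). A value such as 0.98 sits close to the second case.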