thien1892 committed on
Commit
b00d034
1 Parent(s): 9af6a14

Update README.md

Files changed (1)
  1. README.md +64 -4
README.md CHANGED
@@ -21,13 +21,60 @@ model-index:
21
  verified: false
22
  ---
23
 
24
- # **PPO** Agent playing **LunarLander-v2**
25
  This is a trained model of a **PPO** agent playing **LunarLander-v2**
26
  using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
27
 
28
- ## Re-train model (with Stable-baselines3)
29
- TODO: Add your code
30
 
 
31
 
32
  ```python
33
  # Load a saved LunarLander model from the Hub and retrain
@@ -62,7 +109,7 @@ mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, d
62
  print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
63
  ```
64
 
65
- ## Push to HF hub
66
 
67
  ```python
68
  notebook_login()
@@ -94,4 +141,17 @@ package_to_hub(model=model, # Our trained model
94
  eval_env=eval_env, # Evaluation Environment
95
  repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name}, for instance ThomasSimonini/ppo-LunarLander-v2)
96
  commit_message=commit_message)
97
  ```
 
21
  verified: false
22
  ---
23
 
24
+ # Train your first Deep Reinforcement Learning Agent 🤖
25
+ ![Cover](https://github.com/huggingface/deep-rl-class/blob/main/unit1/assets/img/thumbnail.png?raw=true)
26
+
27
  This is a trained model of a **PPO** agent playing **LunarLander-v2**
28
  using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
29
 
30
+ ## 1. Install packages
31
+ ```python
32
+ from IPython.display import clear_output
33
+ !apt install swig cmake
34
+ !pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt
35
+ !sudo apt-get update
36
+ !apt install python-opengl
37
+ !apt install ffmpeg
38
+ !apt install xvfb
39
+ !pip3 install pyvirtualdisplay
40
+ clear_output()
41
+ ```
42
+
43
+ Restart the notebook runtime so the newly installed packages are loaded:
44
+ ```python
45
+ import os
46
+ os.kill(os.getpid(), 9)
47
+ ```
48
+
49
+ ## 2. Use model
50
+
51
+ ```python
52
+ from huggingface_sb3 import load_from_hub
53
+ repo_id = "thien1892/LunarLander-v2-ppo-5m"
54
+ filename = "ppo-LunarLander-v2-5m.zip" # The model filename.zip
55
+
56
+ # When the model was trained on Python 3.8, the pickle protocol is 5,
57
+ # but Python 3.6 and 3.7 use protocol 4.
58
+ # To stay compatible we need to:
59
+ # 1. Install pickle5 (we did it at the beginning of the colab)
60
+ # 2. Create a custom empty object that we pass as a parameter to PPO.load()
61
+ custom_objects = {
62
+ "learning_rate": 0.0,
63
+ "lr_schedule": lambda _: 0.0,
64
+ "clip_range": lambda _: 0.0,
65
+ }
66
+
67
+ checkpoint = load_from_hub(repo_id, filename)
68
+ model = PPO.load(checkpoint, custom_objects=custom_objects, print_system_info=True)
69
+ ```
70
+ Evaluate
71
+ ```python
72
+ eval_env = gym.make("LunarLander-v2")
73
+ mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
74
+ print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
75
+ ```
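The comments in the loading code above tie the `custom_objects` workaround to interpreters that cannot read pickle protocol 5. As a minimal sketch, assuming only the standard library, one way to check this at runtime is shown below; `needs_pickle5_backport` is a hypothetical helper name and is not part of stable-baselines3 or huggingface_sb3.

```python
import pickle
import sys


def needs_pickle5_backport(required_protocol: int = 5) -> bool:
    """Return True when this interpreter cannot read the given pickle protocol natively.

    Hypothetical helper for illustration; the default of 5 follows the README comment.
    """
    return pickle.HIGHEST_PROTOCOL < required_protocol


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version}, highest pickle protocol: {pickle.HIGHEST_PROTOCOL}")
    if needs_pickle5_backport():
        print("Protocol 5 is unavailable here; the pickle5 workaround applies.")
    else:
        print("Protocol 5 is supported natively.")
```

On an interpreter that already supports protocol 5, the check returns `False` and the backport is likely unnecessary.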
76
 
77
+ ## 3. Re-train model (choice 1)
78
 
79
  ```python
80
  # Load a saved LunarLander model from the Hub and retrain
 
109
  print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
110
  ```
111
 
112
+ Push to HF hub
113
 
114
  ```python
115
  notebook_login()
 
141
  eval_env=eval_env, # Evaluation Environment
142
  repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name}, for instance ThomasSimonini/ppo-LunarLander-v2)
143
  commit_message=commit_message)
144
+ ```
145
+
146
+ ## 4. Re-train model (choice 2)
147
+ - Change `--repo_id` to your own repo id :)
148
+ - `--id_retrain` and `--filename_retrain` are set to load my trained model; you can change them to point to your own trained model
149
+ ```python
150
+ !python train_and_push.py --repo_id "thien1892/LunarLander-v2-ppo-v3" \
151
+ --commit_message "retrain model from hub 5m" \
152
+ --id_retrain "thien1892/LunarLander-v2-ppo-v5" \
153
+ --filename_retrain "ppo-LunarLander-v2.zip" \
154
+ --total_timesteps 5000000 \
155
+ --learning_rate 1e-6 \
156
+ --n_envs 64
157
  ```
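The source of `train_and_push.py` is not shown in this README, so its internals are unknown. As a rough sketch only, a script accepting the flags used above could parse them with `argparse` as follows. The flag names come from the command; `build_parser`, the defaults, and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags seen in the command above (defaults are assumed)."""
    parser = argparse.ArgumentParser(
        description="Retrain a PPO agent and push it to the Hub (illustrative sketch)"
    )
    parser.add_argument("--repo_id", required=True, help="Hub repo to push the result to")
    parser.add_argument("--commit_message", default="retrain model")
    # Optional checkpoint to resume from; None would mean training from scratch.
    parser.add_argument("--id_retrain", default=None, help="Hub repo holding the checkpoint")
    parser.add_argument("--filename_retrain", default=None, help="Checkpoint file name (.zip)")
    parser.add_argument("--total_timesteps", type=int, default=1_000_000)
    parser.add_argument("--learning_rate", type=float, default=3e-4)
    parser.add_argument("--n_envs", type=int, default=16)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Parsing `--learning_rate 1e-6` yields a float and `--n_envs 64` an int, so the values can be handed to the training code without manual conversion.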