thien1892/LunarLander-v2-ppo-v5

Reinforcement Learning

stable-baselines3

LunarLander-v2

deep-reinforcement-learning

Eval Results


thien1892 committed on Jan 11, 2023

Commit b00d034

1 Parent(s): 9af6a14

Update README.md


Files changed (1)

README.md +64 -4


@@ -21,13 +21,60 @@ model-index:
       verified: false
 ---
-# **PPO** Agent playing **LunarLander-v2**
 This is a trained model of a **PPO** agent playing **LunarLander-v2**
 using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
-## Re-train model (with Stable-baselines3)
-TODO: Add your code
 ```python
 # Load a saved LunarLander model from the Hub and retrain
@@ -62,7 +109,7 @@ mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, d
 print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
 ```
-## Push to HF hub
 ```python
 notebook_login()
@@ -94,4 +141,17 @@ package_to_hub(model=model, # Our trained model
                eval_env=eval_env, # Evaluation Environment
                repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2
                commit_message=commit_message)
 ```

       verified: false
 ---
+# Train your first Deep Reinforcement Learning Agent 🤖
+![Cover](https://github.com/huggingface/deep-rl-class/blob/main/unit1/assets/img/thumbnail.png?raw=true)
 This is a trained model of a **PPO** agent playing **LunarLander-v2**
 using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
+## 1. Install packages
+```python
+from IPython.display import clear_output
+# Refresh the package index before installing system dependencies
+!sudo apt-get update
+!apt install swig cmake
+!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt
+!apt install python-opengl
+!apt install ffmpeg
+!apt install xvfb
+!pip3 install pyvirtualdisplay
+# Hide the installation logs
+clear_output()
+```
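+Restart the notebook runtime so the newly installed packages are picked up:
+```python
+import os
+# Kill the current process; in Colab this forces a fresh runtime
+os.kill(os.getpid(), 9)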
+```
+## 2. Use model
+```python
+from huggingface_sb3 import load_from_hub
+from stable_baselines3 import PPO
+repo_id = "thien1892/LunarLander-v2-ppo-5m"
+filename = "ppo-LunarLander-v2-5m.zip" # The model filename.zip
+# The model was trained on Python 3.8, which uses pickle protocol 5,
+# while Python 3.6 and 3.7 use protocol 4.
+# To stay compatible we need to:
+# 1. Install pickle5 (done at the beginning of the Colab notebook)
+# 2. Create a custom_objects dict and pass it as a parameter to PPO.load()
+custom_objects = {
+            "learning_rate": 0.0,
+            "lr_schedule": lambda _: 0.0,
+            "clip_range": lambda _: 0.0,
+}
+checkpoint = load_from_hub(repo_id, filename)
+model = PPO.load(checkpoint, custom_objects=custom_objects, print_system_info=True)
+```
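The compatibility note in the comments above comes down to pickle protocol versions. The following is a minimal standard-library sketch, not part of the README, showing that a pickled payload records its protocol version in its header. Protocol 5 only exists on Python 3.8 and later, which is why a checkpoint written with it cannot be read directly on 3.6 or 3.7. The `payload` dict is purely illustrative; a real checkpoint holds Stable-Baselines3 objects.

```python
import pickle
import sys

# Illustrative payload only (an assumption); not the real checkpoint contents
payload = {"learning_rate": 0.0}

# Protocol 4 is the highest protocol available on Python 3.6 and 3.7
blob_v4 = pickle.dumps(payload, protocol=4)

# For protocol >= 2 the stream starts with the PROTO opcode (0x80),
# followed by one byte holding the protocol version
print("protocol 4 header byte:", blob_v4[1])
print("highest protocol here:", pickle.HIGHEST_PROTOCOL,
      "on Python", sys.version_info[:2])

# Protocol 5 can only be written (and read) on Python 3.8+
if pickle.HIGHEST_PROTOCOL >= 5:
    blob_v5 = pickle.dumps(payload, protocol=5)
    print("protocol 5 header byte:", blob_v5[1])

# A protocol 4 stream loads on any interpreter that supports protocol 4
restored = pickle.loads(blob_v4)
```

As far as the Stable-Baselines3 documentation describes it, entries in `custom_objects` are used in place of the stored values instead of being deserialized, which sidesteps the pickled schedule functions.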
+Evaluate the loaded model:
+```python
+import gym
+from stable_baselines3.common.evaluation import evaluate_policy
+eval_env = gym.make("LunarLander-v2")
+mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
+print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
+```
+## 3. Re-train model (option 1)
 ```python
 # Load a saved LunarLander model from the Hub and retrain
 print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
 ```
+Push to HF hub
 ```python
 notebook_login()
                eval_env=eval_env, # Evaluation Environment
                repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2
                commit_message=commit_message)
+```
+## 4. Re-train model (option 2)
+- Change `--repo_id` to your own repo id.
+- `--id_retrain` and `--filename_retrain` select the model to load from the Hub. Here they point to my trained model; you can change them to your own trained model.
+```python
+!python train_and_push.py --repo_id "thien1892/LunarLander-v2-ppo-v3" \
+--commit_message "retrain model from hub 5m" \
+--id_retrain "thien1892/LunarLander-v2-ppo-v5" \
+--filename_retrain "ppo-LunarLander-v2.zip" \
+--total_timesteps 5000000 \
+--learning_rate 1e-6 \
+--n_envs 64
 ```
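`train_and_push.py` itself is not shown in this README. As a rough sketch of how a script could accept the flags used in the command above, assuming a plain `argparse` interface: the `build_parser` helper, the help strings, and the choice to make every flag required are hypothetical and not taken from the actual script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the command above."""
    parser = argparse.ArgumentParser(
        description="Retrain a PPO model loaded from the Hub and push it back"
    )
    # Destination repository for the retrained model
    parser.add_argument("--repo_id", required=True)
    parser.add_argument("--commit_message", required=True)
    # Source repository and file of the model to continue training from
    parser.add_argument("--id_retrain", required=True)
    parser.add_argument("--filename_retrain", required=True)
    # Training settings, converted to numeric types by argparse
    parser.add_argument("--total_timesteps", type=int, required=True)
    parser.add_argument("--learning_rate", type=float, required=True)
    parser.add_argument("--n_envs", type=int, required=True)
    return parser


# Parse the same values as the notebook command, passed as a list for illustration
args = build_parser().parse_args([
    "--repo_id", "thien1892/LunarLander-v2-ppo-v3",
    "--commit_message", "retrain model from hub 5m",
    "--id_retrain", "thien1892/LunarLander-v2-ppo-v5",
    "--filename_retrain", "ppo-LunarLander-v2.zip",
    "--total_timesteps", "5000000",
    "--learning_rate", "1e-6",
    "--n_envs", "64",
])
print(args.repo_id, args.total_timesteps, args.learning_rate, args.n_envs)
```

Declaring `type=int` and `type=float` means values such as `1e-6` arrive as numbers rather than strings, so they could be handed to the training code without further conversion.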