mrm8488 committed on
Commit 824185e · 1 parent: 72528ab

Update README.md

Files changed (1): README.md +50 -1
README.md CHANGED
@@ -1,7 +1,56 @@
Removed: `# TODO: Fill this model card`
New README.md content:

---
tags:
- bipedal
- walker
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
---
# PPO BipedalWalker v3

This is a pre-trained model of a PPO agent playing BipedalWalker-v3 using the [stable-baselines3](https://github.com/DLR-RM/stable-baselines3) library.

<video loop="" autoplay="" controls="" src="https://huggingface.co/mrm8488/ppo-BipedalWalker-v3/resolve/main/output.mp4"></video>

### Usage (with Stable-Baselines3)
Using this model is easy once you have stable-baselines3 and huggingface_sb3 installed:

```
pip install stable-baselines3
pip install huggingface_sb3
# BipedalWalker-v3 also needs the Box2D physics engine
pip install "gym[box2d]"
```
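Before running the example, it can help to confirm that the required packages import correctly. The following is a minimal sketch using only the standard library; `check_modules` is a hypothetical helper, and the module names it checks (`stable_baselines3`, `huggingface_sb3`, `gym`, and `Box2D`, the module that provides the physics engine BipedalWalker-v3 relies on) are assumptions about what your environment should provide.

```python
import importlib.util


def check_modules(names):
    """Return a mapping of module name -> True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Module names assumed for this model card; Box2D provides the physics
# engine that BipedalWalker-v3 depends on.
REQUIRED = ["stable_baselines3", "huggingface_sb3", "gym", "Box2D"]

if __name__ == "__main__":
    for name, found in check_modules(REQUIRED).items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

Any module reported as `missing` points to a package that still needs to be installed.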

Then, you can use the model like this:

```python
import gym

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Retrieve the model from the Hub
# repo_id = id of the model repository on the Hugging Face Hub ({organization}/{repo_name})
# filename = name of the model zip file in that repository
checkpoint = load_from_hub(repo_id="mrm8488/ppo-BipedalWalker-v3", filename="bipedalwalker-v3.zip")
model = PPO.load(checkpoint)

# Evaluate the agent over 10 episodes
eval_env = gym.make("BipedalWalker-v3")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")

# Watch the agent play
# (uses the classic Gym API, gym < 0.26: reset() returns obs, step() returns 4 values)
env = gym.make("BipedalWalker-v3")
obs = env.reset()
for _ in range(1000):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()
env.close()
eval_env.close()
```
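`load_from_hub` takes a `repo_id` of the form `{organization}/{repo_name}` plus a `filename`, and downloads that file from the repository. To illustrate how those two arguments identify a single file, the sketch below builds the corresponding Hub download URL, assuming the same `resolve/main` pattern used by the video link above. `hub_file_url` is a hypothetical helper for illustration only, not part of `huggingface_sb3`.

```python
from urllib.parse import quote


def hub_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a Hugging Face Hub download URL (hypothetical helper).

    Assumes the https://huggingface.co/{repo_id}/resolve/{revision}/{filename}
    pattern seen in this model card's video link.
    """
    parts = repo_id.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"repo_id must look like 'organization/repo_name', got {repo_id!r}")
    # Percent-encode the revision fully and the filename except for "/".
    return f"https://huggingface.co/{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"


print(hub_file_url("mrm8488/ppo-BipedalWalker-v3", "bipedalwalker-v3.zip"))
```

Passing `"output.mp4"` as the filename reproduces the video URL used earlier in this card.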

### Evaluation Results
Mean reward: 213.55 +/- 113.82
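The figure above is a mean episode return plus or minus its standard deviation. As a rough sketch of how `evaluate_policy` arrives at such numbers (it runs `n_eval_episodes` full episodes, sums each episode's rewards, and reports the mean and population standard deviation of those sums), the code below substitutes a hypothetical `ToyEnv` for BipedalWalker-v3. `ToyEnv` follows the same classic Gym interface as the usage example (`reset()` returns an observation, `step()` returns a 4-tuple). The numbers it prints are illustrative only, not this model's results.

```python
import random
import statistics


class ToyEnv:
    """Hypothetical stand-in environment using the classic Gym API (gym < 0.26)."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        reward = action + self.rng.uniform(-1.0, 1.0)
        done = self.t >= self.episode_length
        return float(self.t), reward, done, {}


def evaluate(policy, env, n_eval_episodes=10):
    """Run full episodes and return (mean, std) of the per-episode returns."""
    episode_returns = []
    for _ in range(n_eval_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _info = env.step(policy(obs))
            total += reward
        episode_returns.append(total)
    # Population standard deviation, matching np.std's default (ddof=0).
    return statistics.mean(episode_returns), statistics.pstdev(episode_returns)


mean_reward, std_reward = evaluate(lambda obs: 1.0, ToyEnv())
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

A large spread relative to the mean, as in the reported 213.55 +/- 113.82, indicates that returns vary considerably from episode to episode.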