MY11111111 committed · Commit 7c9d462 · Parent(s): 87e61f3
Update README.md

README.md CHANGED
@@ -15,13 +15,20 @@ pipeline_tag: reinforcement-learning
# **PPO AI Agents Playing Pyramids**

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/pyramids.gif" alt="Pyramids"/>

This is a trained model of a **ppo** agent playing the **Unity game Pyramids**, trained with reinforcement learning.
I used the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).
Throughout this notebook you will learn how to train an AI agent with reinforcement learning in a 3D Unity game environment, using different curiosity and exploitation values and tuning the various hyperparameters to get the best training results.
It is an easy notebook to follow, with excellent instructions, so if you want to learn more about the process used to train AI agents in 3D environments I highly recommend this project.
## **Learning components of this model:**
@@ -31,8 +38,17 @@ pipeline_tag: reinforcement-learning
1. Agent component: agents are trained by optimizing their policy (a policy-based method; unlike value-based methods, it optimizes the policy itself rather than a value function). The policy, which tells the agent what action to take at each step, is called the brain.
2. For this model we will be using a Proximal Policy Optimization (PPO) trainer, as seen in the title of the model card. PPO is well suited to training AI agents in Unity games because it is sample-efficient, stable during training, compatible with neural networks, able to handle both continuous and discrete action spaces, and robust to complex game dynamics and mechanics.
## **Improving model training by adjusting hyperparameters**

These hyperparameters can be adjusted within the Pyramids RND file (in the panel on the side), and below is a detailed list of what changing each individual parameter will do to the training. Just be mindful that after making changes you need to rerun the code responsible for copying the file into the Linux executables environment, as well as the unzipping step, and then retrain to implement the new parameters in your model.

1. Trainer type: the type of trainer being used; here we use Proximal Policy Optimization (PPO).
@@ -69,15 +85,19 @@ pipeline_tag: reinforcement-learning
Here are some problems I encountered, the solutions I used, and things I wish I had known in hindsight.

**GPU not connecting**

Sometimes the GPU can get overwhelmed and the code will not load, for example if you have run it too many times and commands have piled up.
You can check at the top right whether the GPU is being used. If it shows "connecting", or gives you the error asking "GPU is not connected, would you like to continue anyway?", one option is under the "Manage sessions" tab: you can terminate previous sessions and start again. From my own experience this has rebooted the session and the GPU was able to connect.
**Unzipping files won't load**

I have struggled with the line of code that unzips the Pyramids files refusing to load. One fix could be reconnecting the GPU as mentioned earlier, but if that still doesn't work you can download the file from the link
https://colab.research.google.com/corgiredirector?site=https%3A%2F%2Fhuggingface.co%2Fspaces%2Funity%2FML-Agents-Pyramids%2Fresolve%2Fmain%2FPyramids.zip
then unzip it on your computer and reupload it to the corresponding folder location in training-envs-executables/linux/.
**File does not exist error code**
@@ -86,17 +106,22 @@ pipeline_tag: reinforcement-learning
**Connecting to Google Drive**

In order for the code to run, the notebook needs to be mounted to your Google Drive. So if you are running this through an organization's Google account, for example a school's, it may need to be approved by IT before it can be mounted to your Google Drive. Make sure that is cleared before continuing the notebook.
### Resume the training

```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```
### Watch your Agent play

You can watch your agent **playing directly in your browser**

1. If the environment is part of ML-Agents official environments, go to https://huggingface.co/unity
2. Step 1: Find your model_id: MY11111111/ppo-Pyramids123
3. Step 2: Select your *.nn /*.onnx file
4. Click on Watch the agent play 👀
# **PPO AI Agents Playing Pyramids**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/pyramids.gif" alt="Pyramids"/>

**DISCLAIMER: This notebook has two environments you can train agents to play in, SnowballTarget and Pyramids. In this model card I am only covering the Pyramids game model, but a lot of these troubleshooting guides will be applicable to the SnowballTarget environment as well.**

This is a trained model of a **ppo** agent playing the **Unity game Pyramids**, using reinforcement learning to train the agent to navigate a simple maze environment, where it needs to press a button to spawn a pyramid, then locate the pyramid and knock the stacked blocks over so the green block on top falls to the ground.
I used the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).
Throughout this notebook you will learn how to train an AI agent with reinforcement learning in a 3D Unity game environment, using different curiosity and exploitation values and tuning the various hyperparameters to get the best training results.
It is an easy notebook to follow, with excellent instructions, so if you want to learn more about the process used to train AI agents in 3D environments I highly recommend this project. It is best if you have some experience learning about or working with deep learning and machine learning, because the reinforcement learning process may otherwise be difficult to understand. If you don't have that experience, or are interested in learning more, you can find more introductory resources at this link:
https://huggingface.co/learn/deep-rl-course/unit1/introduction
So if you are interested in training AI agents to play the Unity Pyramids game, that's great! Below are a few different resources I have gathered: troubleshooting guides for problems I faced, basic info about how the model works, and ways you can improve the model, all things I wish I had known before completing this notebook. Hopefully they will make it easier for you on your journey.

Also, here is the link to my working notebook:
https://colab.research.google.com/drive/1W3omht-9b_ybPlmpaisEek9Mgy5LV875?usp=sharing

Also, here's a video demoing what the aim of the AI agents in this Pyramids game is: https://www.youtube.com/watch?v=Ab2fHTMGf50
## **Learning components of this model:**

1. Agent component: agents are trained by optimizing their policy (a policy-based method; unlike value-based methods, it optimizes the policy itself rather than a value function). The policy, which tells the agent what action to take at each step, is called the brain.
2. For this model we will be using a Proximal Policy Optimization (PPO) trainer, as seen in the title of the model card. PPO is well suited to training AI agents in Unity games because it is sample-efficient, stable during training, compatible with neural networks, able to handle both continuous and discrete action spaces, and robust to complex game dynamics and mechanics.
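For reference, this is the standard clipped surrogate objective that PPO maximizes (the general textbook formulation, not something taken from the notebook itself); the clipping is what keeps each policy update close to the previous policy and is a big part of why training stays stable:

$$L^{CLIP}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \text{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\right)\right], \qquad r_t(\theta)=\frac{\pi_\theta(a_t\mid s_t)}{\pi_{\theta_{\text{old}}}(a_t\mid s_t)}$$

Here r_t(θ) is the ratio between the new and old policy, Â_t is the advantage estimate, and ε is the clipping range (the epsilon value in the hyperparameter file covered below).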
**Curiosity in training ML agents:**

In this reinforcement learning project you will need to understand how curiosity plays a part in the training. In short, traditionally a reward system is used to train ML agents, but for more complicated games with obscure objectives it is hard to manually place rewards for the agent.
Curiosity rewards the model for taking new trajectories, for example exploring new rooms.
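Concretely, the Pyramids RND configuration uses a Random Network Distillation (RND) curiosity signal. As a rough sketch of the idea: a small predictor network is trained to match a fixed, randomly initialized network, and the intrinsic "curiosity" reward is the prediction error, which is naturally higher for states the agent has rarely visited:

$$r^{\text{curiosity}}_t = \left\lVert \hat{f}_\phi(s_{t+1}) - f(s_{t+1}) \right\rVert^2$$

This intrinsic reward gets added to the normal game reward with a small weight (the strength value under the rnd reward signal in the configuration file).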
Here are some YouTube videos that helped me understand the concept:

https://www.youtube.com/watch?v=eLq6yI2No (this one talks specifically about the game environment)

https://www.youtube.com/watch?v=nIgIv4IfJ6s (and this one illustrates reinforcement learning more generally but also covers curiosity; it is great and easy to understand for beginners, and if you are interested in learning more about AI and machine learning, the rest of this crash course series is great as well)
## **Improving model training by adjusting hyperparameters**

So, say you have a working model and you want to improve the training outcomes.
These hyperparameters can be adjusted within the Pyramids RND file (in the panel on the side), and below is a detailed list of what changing each individual parameter will do to the training. Just be mindful that after making changes you need to rerun the code responsible for copying the file into the Linux executables environment, as well as the unzipping step, and then retrain to implement the new parameters in your model.
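For orientation before that list, here is a sketch of roughly what the Pyramids RND configuration file contains, written as a shell snippet you could run from a notebook cell to recreate it. The path and the values are assumptions based on the Deep RL course defaults, so check them against your own copy of the file:

```bash
# Sketch only: the path and the values below are assumed course defaults and may differ in your notebook
cat > ./ml-agents/config/ppo/PyramidsRND.yaml << 'EOF'
behaviors:
  Pyramids:
    trainer_type: ppo          # 1. Trainer type (PPO, as discussed above)
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.01
      epsilon: 0.2             # PPO clipping range
      lambd: 0.95
      num_epoch: 3
    network_settings:
      hidden_units: 512
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      rnd:                     # the curiosity signal discussed earlier
        gamma: 0.99
        strength: 0.01
    max_steps: 1000000
    time_horizon: 128
    summary_freq: 30000
EOF
```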
1. Trainer type: the type of trainer being used; here we use Proximal Policy Optimization (PPO).

Here are some problems I encountered, the solutions I used, and things I wish I had known in hindsight.
**GPU not connecting**
Sometimes the GPU can get overwhelmed and the code will not load, for example if you have run it too many times and commands have piled up.
You can check at the top right whether the GPU is being used. If it shows "connecting", or gives you the error asking "GPU is not connected, would you like to continue anyway?", one option is under the "Manage sessions" tab: you can terminate previous sessions and start again. From my own experience this has rebooted the session and the GPU was able to connect.

**Restarting session for numpy**

When you are running the second block of code, which downloads all the packages you will need for this notebook, it is important to accept the popup telling you to restart the notebook for numpy. The session needs to reboot in order for the packages to work correctly, and after restarting you can continue running the notebook from the next code block.
**Unzipping files won't load**

I have struggled with the line of code that unzips the Pyramids files refusing to load. One fix could be reconnecting the GPU as mentioned earlier, but if that still doesn't work you can download the file from the link below, unzip it on your computer, and reupload it to the corresponding folder location in training-envs-executables/linux/ to bypass that line of code.
https://colab.research.google.com/corgiredirector?site=https%3A%2F%2Fhuggingface.co%2Fspaces%2Funity%2FML-Agents-Pyramids%2Fresolve%2Fmain%2FPyramids.zip
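If you would rather do this inside the notebook, a shell sketch of the same download-and-unzip step looks roughly like the following; it assumes the training-envs-executables/linux/ folder already exists, as it does in the course notebook:

```bash
# Download the Pyramids executable zip straight from the Hugging Face Space
wget "https://huggingface.co/spaces/unity/ML-Agents-Pyramids/resolve/main/Pyramids.zip" -O ./training-envs-executables/linux/Pyramids.zip
# Unzip it in place and make the contents executable
unzip -d ./training-envs-executables/linux/ ./training-envs-executables/linux/Pyramids.zip
chmod -R 755 ./training-envs-executables/linux/
```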
**File does not exist error code**
**Connecting to Google Drive**
1. In order for the code to run, the notebook needs to be mounted to your Google Drive. So if you are running this through an organization's Google account, for example a school's, it may need to be approved by IT before it can be mounted to your Google Drive, so make sure that is cleared before continuing the notebook.
2. Another cause of the drive not connecting is that you may have popups blocked, so you will need to allow popups for Google Colab, or else the option to connect to your Google Drive will not appear.
**Saving progress**
While run info is saved to your Google Drive, this is an edited notebook, so changes you make to the hyperparameters are not saved, and every time you rerun the notebook it will reset to the pre-existing values. Training progress also cannot be called back if you reopen the notebook at a later time; you will need to rerun the whole notebook and retrain, which is quite time consuming, so I recommend using this resource:
https://learningmaterialcomputations.medium.com/save-any-file-from-google-colab-to-your-google-drive-caf8291ba59b#:~:text=Step%201%3A%20Mount%20your%20google,that%20you're%20working%20with.&text=Step%202%3A%20Authorise%20Google%20Colab,the%20%E2%80%9CCopy%20Path%E2%80%9D%20option.
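As a minimal sketch of the alternative, once your Drive is mounted (assuming it is mounted at /content/gdrive) you can simply copy the run results and your edited configuration file over by hand. The paths below assume the course layout and use <run_id> as a placeholder, so adjust them to your own notebook:

```bash
# Back up the training results and the edited config to Google Drive
mkdir -p /content/gdrive/MyDrive/pyramids-backup
cp -r ./results/<run_id> /content/gdrive/MyDrive/pyramids-backup/
cp ./ml-agents/config/ppo/PyramidsRND.yaml /content/gdrive/MyDrive/pyramids-backup/
```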
### Watch your Agent play

You can watch your agent **playing directly in your browser**

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/snowballtarget_load.png" alt="Snowballtarget load"/>

After correctly training the agent and uploading it to the hub, it should produce a link that leads you to this page where you can see your agent playing.
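For completeness, the upload itself is done with the mlagents-push-to-hf command in the notebook; a sketch of the call, with placeholders you would swap for your own run id and repo id, looks roughly like this:

```bash
mlagents-push-to-hf --run-id=<run_id> --local-dir=./results/<run_id> --repo-id=MY11111111/ppo-Pyramids123 --commit-message="Push trained Pyramids agent"
```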
1. If the environment is part of ML-Agents official environments, go to https://huggingface.co/unity
2. Step 1: Find your model_id: MY11111111/ppo-Pyramids123 (this is my model_id, so you will need to insert the one that is produced from your own notebook)
3. Step 2: Select your *.nn /*.onnx file
4. Click on Watch the agent play 👀