MY11111111 committed
Commit 87e61f3 • Parent(s): dca9046
Update README.md
README.md
CHANGED
@@ -5,21 +5,89 @@ tags:
-This is a trained model of a **ppo** agent playing **Pyramids**
-using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).
-https://huggingface.co/learn/deep-rl-course/unit5/introduction
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-Pyramids
language:
- en
pipeline_tag: reinforcement-learning
---

---
# **PPO AI Agents Playing Pyramids**

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/pyramids.gif" alt="Pyramids"/>
This is a trained model of a **PPO** agent playing the **Unity game Pyramids**.
I trained it using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

Throughout the accompanying notebook you will learn how to train an AI agent with PPO (plus a curiosity reward) in a 3D Unity game environment, adjusting the curiosity and exploitation settings and tuning the various hyperparameters to get the best training results.
It is an easy notebook to follow, with excellent instructions, so if you want to learn more about the process used to train AI agents in 3D environments I highly recommend this project.

Below are a few resources I have gathered: troubleshooting tips for problems I faced, basic information about how the model works, and ways you can improve it.
## **Learning components of this model:**
<img src="https://cdn-lfs.huggingface.co/repos/48/e0/48e06489d875e3d8a62c53306ab6e114abc24ab8fb4cba7652e808785a6bdc24/f0ff122f71f964288bf4fc216472f5c105f24c8b3107c007707ae1c8fecdb653?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27RL_process_game.jpg%3B+filename%3D%22RL_process_game.jpg%22%3B&response-content-type=image%2Fjpeg&Expires=1714697644&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxNDY5NzY0NH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy80OC9lMC80OGUwNjQ4OWQ4NzVlM2Q4YTYyYzUzMzA2YWI2ZTExNGFiYzI0YWI4ZmI0Y2JhNzY1MmU4MDg3ODVhNmJkYzI0L2YwZmYxMjJmNzFmOTY0Mjg4YmY0ZmMyMTY0NzJmNWMxMDVmMjRjOGIzMTA3YzAwNzcwN2FlMWM4ZmVjZGI2NTM%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=xv5GLSYAUY%7E0cL0kgCR4aQ6rMqcG-BYc5g1HzmybPb33X3Yk0fefddSwVqbErJ%7Eq4Olh6aS0-xv6KHBOtI8Xv1DDzJo6h2yvHFLkE%7EbqFpeVjig2VgGCrSxzjtRuzY3xhgL0nmBYLKersb%7E7fSZ-2JNDyqwTIfFSPhJJLwH6SqzjCLPUQxBoxAvrGBx2I0z%7Es0Zrz9RancvDKGDLmSh1vcRKnpNoeMNyTbdZIYKgZ18bg4gQwpCl6%7EN9mblNrdGlO-Z9O6RKzR7RJWHtZkfk5MBL-5t6AwflaR%7EMqIy4rEPOWBb38gEi4B-xuskiImg8e6dKwxduhckRiOBTokWXug__&Key-Pair-Id=KVTP0A1DKRTAX"/>
1. Agent component: agents are trained by optimizing their policy (a policy-based method; unlike value-based methods, it optimizes the policy itself rather than a value function). The policy, often called the brain, tells the agent which action to take at each step.
2. For this model we use a proximal policy optimizer (PPO), as the title of the model card indicates. PPO is well suited to training AI agents in Unity games because it is sample-efficient, stable during training, compatible with neural networks, handles both continuous and discrete action spaces, and is robust to complex game dynamics and mechanics.
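For reference, here is a minimal sketch of how PPO training is typically launched with ML-Agents; the config path, environment path, and run ID are assumptions, so adjust them to your own setup:

```bash
# Launch PPO training for Pyramids (paths and run-id are assumed examples)
mlagents-learn ./config/ppo/PyramidsRND.yaml \
  --env=./training-envs-executables/linux/Pyramids/Pyramids \
  --run-id="Pyramids Training" \
  --no-graphics
```

Results and checkpoints are written under the results folder for that run ID, which is what the resume command further down builds on.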
## **Improving model training through hyperparameter adjustments**

These hyperparameters can be adjusted in the Pyramids RND configuration file shown in the side panel, and below is a detailed list of how changing each parameter impacts training. Just be mindful that after making changes you need to re-run the code responsible for copying the file into the training-envs-executables/linux folder, along with the unzipping step, and then retrain for the new parameters to take effect. A sample configuration file is sketched after the list.
1. Trainer_type: the type of trainer being used; here we use Proximal Policy Optimization (PPO).
2. Summary_freq: how often training summaries and statistics (rewards, losses, episode lengths, time, etc.) are recorded.
3. Keep_checkpoints: the number of recent checkpoints to keep; checkpoints are snapshots of the model used for resuming training or for evaluation.
4. Checkpoint_interval: how often (in steps) checkpoints are saved.
5. Max_steps: the maximum number of steps or interactions during training.
6. Time_horizon: the number of steps the agent considers when making decisions.
7. Threaded: enables multi-threading during training (may allow faster processing, since parts of the code run simultaneously).
8. Hyperparameters:
9. Learning_rate: how quickly the agents adjust their behavior based on feedback.
10. Learning_rate_schedule: the rule used to adjust or modify the learning rate during training.
11. Batch_size: the number of samples used in each training update.
12. Buffer_size: the size of the experience buffer, which stores past experiences for training updates.
13. Beta: the level of exploration (entropy regularization).
14. Epsilon: limits the size of policy changes to prevent overly large policy updates.
15. Lambd: helps estimate the advantage of taking a particular action in a given state.
16. Num_epoch: the number of times the entire dataset is used for training updates; each epoch consists of multiple iterations over the dataset.

**Network settings (architecture of the neural network)**

17. Normalize: determines whether input observations are normalized.
18. Hidden_units: the number of units in each hidden layer.
19. Num_layers: the number of hidden layers in the model.
20. Vis_encode_type: the way visual observations are encoded.

**Reward signals**

21. Gamma: determines the importance of future rewards compared to immediate rewards.
22. Strength: controls the weight of the primary reward signal relative to other reward signals, if present.
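As a concrete reference, here is a sketch of what a PyramidsRND configuration file with these fields might look like. The file path and every value below are assumptions based on common course defaults, not the exact settings used for this model, so tweak them before retraining:

```bash
# Write an example PyramidsRND.yaml (path and values are assumed; adjust before retraining)
mkdir -p ./config/ppo
cat > ./config/ppo/PyramidsRND.yaml << 'EOF'
behaviors:
  Pyramids:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.01
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 512
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      rnd:
        gamma: 0.99
        strength: 0.01
    keep_checkpoints: 5
    checkpoint_interval: 500000
    max_steps: 1000000
    time_horizon: 128
    summary_freq: 30000
    threaded: false
EOF
```

After editing, re-run the copy and unzip cells mentioned above and start a fresh run (or resume) so the new values are picked up.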
## **Troubleshooting**

Here are some problems I encountered, the solutions I used, and things I wish I had known in hindsight.
**GPU not connecting**

Sometimes the GPU can get overwhelmed and the code will not load, for example if you have pressed run too many times and commands have piled up.
You can check at the top right whether the GPU is being used. If it shows "connecting", or gives you the error that the GPU is not connected and asks whether you would like to continue anyway, one option is to open the Manage sessions tab, terminate previous sessions, and start again. In my experience this reboots the session and the GPU is able to connect.
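If you want to confirm from inside the notebook that a GPU is actually attached, rather than relying on the status indicator, a quick check like the one below is usually enough (this assumes an NVIDIA runtime with the standard driver tools available):

```bash
# Quick sanity check that the runtime actually has a GPU attached
nvidia-smi
```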
**Unzipping files won't load**

I have struggled with the line of code that unzips the Pyramids environment files failing to load. One fix is reconnecting the GPU as mentioned earlier,
but if that still doesn't work you can download the zip yourself from the link
https://colab.research.google.com/corgiredirector?site=https%3A%2F%2Fhuggingface.co%2Fspaces%2Funity%2FML-Agents-Pyramids%2Fresolve%2Fmain%2FPyramids.zip, unzip it on your computer, and then re-upload it to the corresponding folder location in training-envs-executables/linux/.
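If you would rather retry the download and unzip from inside the notebook, something along these lines usually works; the Pyramids.zip URL is the one embedded in the link above, while the target folder names are assumptions based on the course layout:

```bash
# Re-download and unzip the Pyramids environment (folder names assumed from the course layout)
mkdir -p ./training-envs-executables/linux/
wget -O ./training-envs-executables/linux/Pyramids.zip \
  "https://huggingface.co/spaces/unity/ML-Agents-Pyramids/resolve/main/Pyramids.zip"
unzip -d ./training-envs-executables/linux/ ./training-envs-executables/linux/Pyramids.zip
chmod -R 755 ./training-envs-executables/linux/Pyramids
```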
**File does not exist error code**

When running a cell results in a "this file does not exist" or "this folder does not exist" error, it is usually because earlier code blocks were not run correctly or the runtime was lost when you closed the program. You can check whether this is the case by opening the file directory in the sidebar and looking under the corresponding folders to see whether the files are actually there. If not, just re-run the blocks of code that create them.
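A quick way to check from a notebook cell whether the expected files actually exist (the folder name here is an assumption based on the course layout):

```bash
# List the training environment folder to confirm the files exist
ls -lR ./training-envs-executables/linux/
# Recreate the folder if it is missing, then re-run the download/unzip cells
mkdir -p ./training-envs-executables/linux/
```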
**Connecting to Google Drive**

In order for the code to run, the notebook needs to be mounted to your Google Drive. If you are running this through an organization's Google account, for example a school account, the mount may need to be approved by IT before it is allowed. So make sure that is cleared before continuing with the notebook.
### Resume the training
```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```
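For example, resuming the hypothetical run from the training sketch earlier (the config path and run ID are assumptions; use the ones from your own run):

```bash
# Resume a previously started Pyramids run (run-id must match the original training run)
mlagents-learn ./config/ppo/PyramidsRND.yaml --run-id="Pyramids Training" --resume
```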
2. Step 1: Find your model_id: MY11111111/ppo-Pyramids123
3. Step 2: Select your *.nn /*.onnx file
4. Click on Watch the agent play 👀