PPO AI Agents Playing Pyramids
DISCLAIMER: This notebook has two environments you can train agents to play in: the Snowball one and the Pyramids one. In this model card I am only covering the Pyramids game model, but a lot of these troubleshooting guides will be applicable to the Snowball environment as well.
This is a trained model of a PPO agent playing the Unity game Pyramids, using reinforcement learning to train the agent to navigate a simple maze environment where it needs to press a button that spawns the pyramid, then locate the pyramid and knock over its stacked blocks so the block on top falls to the ground. I used the Unity ML-Agents library. Throughout this notebook you will learn how to train an AI agent with reinforcement learning in a 3D Unity game environment, using different curiosity and exploitation values and adjusting the various hyperparameters to get the best training results. It is an easy notebook to follow, with excellent instructions, so if you want to learn more about the process used to train these AI agents in 3D environments I highly recommend this project. It is best if you have some experience learning about or working with deep learning and machine learning, because the reinforcement learning process may otherwise be difficult to understand. If you don't have that experience, or are interested in learning more, you can find more introductory material at this link: https://huggingface.co/learn/deep-rl-course/unit1/introduction
If you are interested in continuing on to train AI agents to play Unity Pyramids, that's great! Below are a few different resources I have gathered: troubleshooting for problems I faced, basic info about how the model works, and ways you can improve it. These are things I wish I had known before completing this notebook, and they will hopefully make your journey easier. Here is the link to my working model: https://colab.research.google.com/drive/1W3omht-9b_ybPlmpaisEek9Mgy5LV875?usp=sharing And here is a video demoing what the AI agent is aiming to do in this pyramid game: https://www.youtube.com/watch?v=Ab2fHTMGf50
Learning components of this model:
- Agent component: the agent is trained by optimizing its policy (a policy-based method; unlike value-based methods, it optimizes the policy itself instead of value estimates). The policy, called the brain, tells the model what action to take at each step.
- For this model we will be using Proximal Policy Optimization (PPO), as seen in the title of the model card. PPO is well suited to training AI agents in Unity games because it is sample-efficient, stable during training, compatible with neural networks, handles both continuous and discrete action spaces, and is robust to complex game dynamics and mechanics.
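For reference, the clipped objective that PPO maximizes (this is the standard, general PPO formula rather than anything specific to this notebook) is:

$$L^{CLIP}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\right)\right]$$

where r_t(θ) is the ratio between the new and old policy probabilities for the action taken, and Â_t is the advantage estimate. The clip range here is the Epsilon hyperparameter listed further down, and the advantage is estimated with GAE, which is where the Lambd hyperparameter comes in.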
Curiosity in training ML agents: in this reinforcement learning project you will need to understand how curiosity plays a part in the training. In short, a reward system is traditionally used to train ML agents, but for more complicated games with obscure objectives it is hard to manually place rewards for the agent. Curiosity instead rewards the model for taking new trajectories, for example exploring new rooms.
Here are two YouTube videos that helped me understand the concept: https://www.youtube.com/watch?v=eLq6yI2No (this one talks specifically about the game environment) and https://www.youtube.com/watch?v=nIgIv4IfJ6s (this one covers reinforcement learning more generally, including curiosity; it is great and easy to understand for beginners, and if you are interested in learning more about AI and machine learning the rest of this crash course series is great as well).
Improving model training by adjusting hyperparameters
Once you have a working model and want to improve the training outcomes, these hyperparameters can be adjusted within the Pyramids RND config file in the file browser on the side. Below is a detailed list of what changing each individual parameter affects, followed by a short sketch of editing the file in code. Just be mindful that after making changes you need to re-run the code responsible for copying the file into the envs executables linux folder, as well as the unzipping, and then retrain to apply these new parameters to your model.
- Trainer_type: the type of trainer being used; here we use Proximal Policy Optimization (ppo)
- Summary_freq: how often training summaries and statistics are recorded (rewards, losses, episode lengths, time, etc.)
- Keep_checkpoints: the number of recent checkpoints to keep; checkpoints are snapshots of the model during training, used for resuming training or for evaluation
- Checkpoint_interval: how often (how many steps) checkpoints are saved
- Max_steps: maximum number of steps or interactions before training ends
- Time_horizon: how many steps of experience the agent collects before adding them to the experience buffer
- Threaded: enables multi-threading during training (may allow for faster processing, since parts of the code run simultaneously)
- Hyperparameters:
- Learning_rate: how quickly the agents adjust their behavior based on feedback
- Learning_rate_schedule: the rule used to adjust or modify the learning rate during the training process
- Batch_size: number of samples used in each training update
- Buffer_size: size of the experience buffer, which stores past experiences for training updates
- Beta: strength of the entropy regularization, which controls how much the agent explores
- Epsilon: limits the size of policy changes to prevent overly large policy updates
- Lambd: the lambda used for advantage estimation; it helps estimate the advantage of taking a particular action in a given state
- Num_epoch: the number of times the entire dataset is used for training updates; each epoch consists of multiple iterations over the dataset
Network Settings (architecture of the neural network):
- Normalize: determines whether input observations are normalized
- Hidden_units: number of units in each hidden layer
- Num_layers: number of hidden layers the model has
- Vis_encode_type: the way visual observations are encoded
Reward Signals
- Gamma: determines the importance of future rewards compared to immediate rewards
- Strength: controls the weight of this reward signal relative to other rewards, if present
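If you prefer to change these values in code rather than by hand in the file browser, here is a minimal sketch using PyYAML. The config path, the behavior name, and the example values are assumptions based on where the ML-Agents PyramidsRND config usually lives; check the file browser on the side for the actual location in your copy of the notebook.

```python
import yaml

# Assumed location of the Pyramids RND config -- adjust to match your notebook's file browser.
config_path = "./ml-agents/config/ppo/PyramidsRND.yaml"

with open(config_path) as f:
    config = yaml.safe_load(f)

# The behavior name for this environment is assumed to be "Pyramids"; the values are just examples.
pyramids = config["behaviors"]["Pyramids"]
pyramids["hyperparameters"]["learning_rate"] = 3.0e-4   # smaller = slower but steadier updates
pyramids["hyperparameters"]["beta"] = 0.01              # more entropy regularization = more exploration
pyramids["max_steps"] = 1_000_000                       # train for longer

with open(config_path, "w") as f:
    yaml.safe_dump(config, f)

# Remember to re-run the training cell afterwards so the new values actually take effect.
```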
Troubleshooting
Here are some problems I encountered, the solutions I used, and things I wish I had known in hindsight.
GPU not connecting
Sometimes the GPU can get overwhelmed and the code will not load, for example if you have pressed run too many times and commands have piled up. You can check at the top right whether the GPU is being used. If it shows "connecting", or gives you the error that the GPU is not connected and asks whether you would like to continue anyway, one fix is to go to the Manage sessions tab, terminate previous sessions, and start again. In my own experience this has rebooted the session and the GPU was able to connect.
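If you want a quick in-notebook check rather than relying on the indicator in the corner, a small snippet like this (assuming PyTorch, which the notebook's setup cell installs) will tell you whether a GPU is actually attached:

```python
import torch

# Prints the GPU name if Colab actually gave you one, otherwise a hint about what to try next.
if torch.cuda.is_available():
    print("GPU connected:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected -- try Manage sessions, terminate old sessions, then reconnect.")
```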
Restarting session for numpy
When you are running the second block of code, which downloads all the packages you will need for this notebook, it is important to accept the popup telling you to restart the notebook for numpy; the session needs to reboot for the packages to work correctly. After restarting you can continue running the notebook from the next code block.
Unzipping files won't load
I have struggled with the line of code that unzips the Pyramids files failing to load. One fix is reconnecting the GPU as mentioned earlier, but if that still doesn't work you can download the zip from the link below, unzip it on your computer, and re-upload it to the corresponding folder location in training-envs-executables/linux/ to bypass that line of code. https://huggingface.co/spaces/unity/ML-Agents-Pyramids/resolve/main/Pyramids.zip
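Alternatively, you can do the download and unzip directly in Python inside the notebook. This is a sketch using only the standard library, with the same URL and target folder as above:

```python
import os
import urllib.request
import zipfile

url = "https://huggingface.co/spaces/unity/ML-Agents-Pyramids/resolve/main/Pyramids.zip"
dest_dir = "./training-envs-executables/linux/"
os.makedirs(dest_dir, exist_ok=True)

# Download the zipped Pyramids environment, then extract it next to the other executables.
zip_path = os.path.join(dest_dir, "Pyramids.zip")
urllib.request.urlretrieve(url, zip_path)
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(dest_dir)
```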
File does not exist error
When running a code block results in a "this file does not exist" or "this folder does not exist" error, it could be because previous code blocks did not load correctly, or because the runtime was lost when you closed the program. You can check whether this is the case by opening the file directory on the side and looking under the corresponding folders to see if the files are indeed there. If they are not, just re-run the blocks of code that create them.
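A quick way to check this from code instead of clicking through the sidebar (the folder below is just an example; swap in whichever path the error message names):

```python
import os

folder = "./training-envs-executables/linux/"  # example path -- use the one from the error message
if os.path.isdir(folder):
    print(os.listdir(folder))   # see what actually got created
else:
    print(folder, "does not exist yet -- re-run the cell that creates it.")
```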
Connecting to Google Drive
- In order for the code to run, the notebook needs to be mounted to your Google Drive. If you are running this through an organization's Google account, for example a school's, the mount may need to be approved by IT before it is allowed. Make sure that is cleared before continuing the notebook.
- Another cause of the drive not connecting is that you may have popups blocked: you will need to allow popups for Google Colab, or the option to connect to your Google Drive will not appear. The standard mounting call is shown below.
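For reference, the mounting itself is the standard Colab call; the authorization popup (or blocked popup) comes from this step:

```python
from google.colab import drive

# Opens the authorization prompt and mounts your Drive under /content/gdrive.
drive.mount("/content/gdrive")
```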
Saving progress
While run info is saved to your Google Drive, this is an edited notebook, so changes you make to the hyperparameters are not saved; every time you rerun the notebook they will reset to the pre-existing values. Training progress also cannot be called back if you reopen the notebook at a later time; you would need to rerun the whole notebook and retrain, which is quite time consuming, so I recommend using this resource: https://learningmaterialcomputations.medium.com/save-any-file-from-google-colab-to-your-google-drive-caf8291ba59b#:~:text=Step%201%3A%20Mount%20your%20google,that%20you're%20working%20with.&text=Step%202%3A%20Authorise%20Google%20Colab,the%20%E2%80%9CCopy%20Path%E2%80%9D%20option.
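As a minimal sketch of the idea in that article: once Drive is mounted, you can copy the results folder across so it survives the session. The run id and destination folder here are placeholders, so use your own.

```python
import shutil

run_id = "Pyramids Training"  # placeholder -- use the run id from your own training cell
src = f"./results/{run_id}"
dst = f"/content/gdrive/MyDrive/{run_id}"

# Copy the whole results folder for this run into your mounted Google Drive.
shutil.copytree(src, dst, dirs_exist_ok=True)
```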
Watch your Agent play
You can watch your agent playing directly in your browser. After correctly training the agent and uploading it to the Hub, the notebook should produce a link that leads you to a page where you can see your agent playing.
- If the environment is part of ML-Agents official environments, go to https://huggingface.co/unity
- Step 1: Find your model_id: MY11111111/ppo-Pyramids123 (this is my model_id, so you will need to insert the one produced by your own notebook)
- Step 2: Select your .nn /.onnx file
- Click on Watch the agent play 👀