// Internationalization dictionary window.lang_dict = { // English 'EN': { 'introTour': { 'nextLabel': "Next", 'prevLabel': "Back", 'doneLabel': "Done", 'welcomeTitle': "Welcome!", 'welcomeText': "Here you can play with a simulation where autonomously trained agents are trying to navigate through a 2D environment.", 'viewportTitle': "Viewport simulation", 'viewportText': "Here is the viewport where the simulation is rendered in real time. It allows you to see the environment and visualize live how the agents are dealing with it.

You can also interact with the simulation using the mouse in order to scroll, zoom or even drag and drop the agents.

Try it now! ", 'runTitle': "Run the simulation", 'runText': 'Click the button to run the simulation.
Then, click the button to pause it or the button to reset it.

Try it now!', 'baseEnvsTitle': "Some environments", 'baseEnvsText': "Here are some basic environments that will let you become more familiar with the different morphologies of agents.
You will be able to load them into the simulation to visualize the behaviour of the different agents.", 'morphologiesTitle': "Agent morphologies", 'morphologiesText': "Here are all the morphologies available for the agents. You can select one of several agents for each morphology and add it to the simulation.

Each agent has been trained to learn an optimal behaviour to navigate through the environment according to its morphology. Try to compare them!", 'agentsListTitle': "List of running agents", 'agentsListText': "In this section you can find all the agents that are currently running in the simulation.", 'customEnvsTitle': "Custom environments", 'customEnvsText': "Here you can save and download your custom environments but also upload them from previously downloaded files. Try to share them with your friends!", 'furtherTitle': "Going further...", 'furtherText': "If you want to customize the environment, access more advanced options or learn more about the context of this demo, open these tabs.

Enjoy!", }, 'introHints': { 'buttonLabel': "Got it", 'tips': `Tips

You can scroll horizontally and vertically in the environment by dragging the mouse.
You can zoom in or out using the mouse wheel.
You can select an agent or an asset by clicking on it, and then delete it by pressing the delete key on your keyboard.
You can drag and drop an agent or an asset using the mouse.
You can change the eraser and assets radius using the mouse wheel.

` }, 'agentsList': { 'title': "List of running agents", 'follow': "Follow", 'followTooltip' : "Center the viewport on the agent", 'savePosTooltip': "Save the agent's position", 'resetPosTooltip': "Reset the agent's position", 'deleteAgentTooltip': "Delete the agent", }, 'mainButtons':{ 'runBtnTooltip': "Run the simulation", 'pauseBtnTooltip': "Pause the simulation", 'resetBtnTooltip': "Reset the simulation", 'saveBtnTooltip': "Save the current environment" }, 'drawingMode': { 'intro': 'Here you can draw your own parkour!', 'text': `Select the Ground or Ceiling button to start drawing the corresponding terrain shape with the mouse.
Be careful not to draw more than one line at different heights if you want the result to be optimal.
You can use the Erase button if you need to correct your drawing, or the Clear button to erase the whole drawing.
When you are satisfied with the result, just click the Generate Terrain button.`, 'ground': "Ground", 'ceiling': "Ceiling", 'erase': "Erase", 'clear': "Clear", 'generateTerrain': "Generate Terrain", 'draw': "Draw", }, 'parkourConfig': { 'terrainGeneration': "Terrain Generation", 'generalParameters': "General Parameters", 'creepers': "Creepers", 'drawTabBtn': "Draw Yourself!", 'procGenTabBtn': "Procedural Generation", 'procGenText': `You can also use these three sliders to generate the terrain shapes automatically.`, 'smoothing': "Smoothing", 'waterLevel': "Water level", 'creepersWidth': "Width", 'creepersHeight': "Height", 'creepersSpacing': "Spacing", //'creepersType': "Type", 'rigid': "Rigid", 'swingable': "Swingable", }, 'morphologies': { 'title': "Add an agent", 'text': "Here you can add an agent to the simulation with the morphology of your choice.", 'policySelectTooltip': "Select an agent", 'addBtnTooltip': "Add the agent to the simulation", 'baby': "baby", 'teenager': "teenager", 'adult': "adult", 'bipedal': { 'title': "Bipedal", 'description': "This morphology is composed of a head and two legs which allow it to walk on the floor." }, 'spider': { 'title': "Spider", 'description': "This morphology is composed of a body and four legs which allow it to walk on the floor." }, 'chimpanzee': { 'title': "Chimpanzee", 'description': "This morphology is composed of a head, a torso and two arms and legs. 
It can only move by climbing the ceiling and grasping the creepers.", }, 'fish': { 'title': "Fish", 'description': "This morphology is composed of a head, a tail and a fin, allowing it to swim in the water.", }, }, 'envsSets': { 'baseSetText': `To begin you can select one of the following environments to load it into the simulation, and then run it by clicking the button.`, 'customSetText': `In this section you can store your own custom environments by saving them thanks to the button above or by uploading them from a JSON file.`, 'uploadCard': { 'title': "Upload an environment", 'text': `Choose a JSON file then click the button below to save the corresponding environment in your collection.`, 'uploadBtnTooltip': "Upload the environment from the selected file", }, 'downloadBtnTooltip': "Download the environment", 'deleteBtnTooltip': "Delete the environment", }, 'advancedOptions': { 'renderingOptions': ` Rendering Options `, 'drawJoints': "Draw joints", 'drawJointsTooltip': "Joints are what bind the different bodies together. 
They also act as motors for the agents' morphologies.", 'drawLidars': "Draw lidars", 'drawLidarsTooltip': "Lidars are sensors which enable agents to detect nearby obstacles and water.", 'drawNames': "Draw names", 'drawNamesTooltip': "Names of the agents.", 'drawObservation': "Draw observations", 'drawObservationTooltip': "The observation is a partial view of the environment state at each step on which the agents rely to take an action.", 'drawReward': "Draw rewards", 'drawRewardTooltip': "The reward is a value that indicates how good or bad the current state is, according to the objective of the agent.", 'stepReward': "Step reward", 'totalReward': "Cumulative reward", 'assetsTitle': ` Assets `, 'assetsText': "Here you can find several types of assets, which are objects that you can add to the simulation using the mouse.", 'circle': ` Ball`, 'comingSoon': "More assets coming soon...", }, 'globalElements': { 'demoTitle': "Interactive Deep Reinforcement Learning Demo", 'gettingStarted': "Getting Started", 'parkourCustomization': "Parkour Customization", 'advancedOptions': "Advanced Options", 'about': "About...", 'saveEnvModal': { 'title': `Please enter a name and a description for the current environment.`, 'text': "This environment will be saved in your collection of custom environments so that you can reload it later or download it to share it.", 'nameLabel': "Name", 'descriptionLabel': "Description", 'cancelBtn': "Cancel", 'confirmBtn': "Save", }, }, 'aboutTab': { 'purposeTitle': "Purpose of the demo", 'purposeText': `

The goal of this demo is to showcase the challenge of generalization to unknown tasks for Deep Reinforcement Learning (DRL) agents.

DRL is a Machine Learning approach for teaching virtual agents how to solve tasks by combining Reinforcement Learning and Deep Learning methods. This approach has been used for a diverse set of applications including robotics (e.g. Solving Rubik's Cube [1]), video games and boardgames (e.g. AlphaGo [2]).

In this demo, all the agents have been autonomously trained to learn an efficient behaviour to navigate through a 2D environment, combining a DRL algorithm and a teacher algorithm (see below) so that they can generalize their behaviour to never-seen-before situations.

The demo provides different tools to customize the environment in order to test and challenge the robustness of the agents in different situations.

`, 'rlTitle': "Reinforcement Learning", 'rlText': `

Reinforcement Learning (RL) is the study of agents and how they learn by trial and error. The main idea is to reward or punish an agent according to the actions it takes in order to make it learn an efficient behavior to reach an objective.
RL approaches generally feature an agent which evolves in and interacts with a world. At each interaction step, the agent sees a partial observation of the current state of the environment and decides on an action to take. Each action taken by the agent changes the state of the world. The agent also receives a reward signal at each step, which indicates how good or bad the current state is according to the objective the agent has to reach.

The diagram on the right presents this interaction process between the agent and the environment, with the different information they exchange at each step.
Maximizing the reward over steps is a way for the agent to learn a behaviour, also called policy, to achieve its objective.

`, 'drlTitle': "Deep RL", 'drlText': `

In order to remember and improve the actions taken by the agent, DRL algorithms utilize artificial neural networks. With training, these neural networks are able to learn to predict an optimal action to take at each step from the observation received, thanks to all the observations and rewards previously received after each action during training. Thanks to this, DRL algorithms are able to produce behaviours that are very effective in situations similar to those they were trained on.

However, in real-world applications, the environment rarely remains still and frequently evolves. Therefore one would want DRL agents to be able to generalize their behaviour to previously unseen changes of the environment so that they can adapt to a large range of situations.

`, 'aclTitle': "Automatic Curriculum Learning", 'aclText': `

One solution to handle this challenge is to train DRL agents on procedurally generated environments.
Procedural generation is a method of automatically creating environments according to some parameters. Using this method, DRL agents can be trained on a very wide range of environments, hence allowing them to generalize their behaviour to a wider variety of situations.

However, randomly generating environments during training carries the risk of generating environments that are too difficult or too easy for the agents to solve, preventing them from learning continuously in an efficient way.
Therefore, one would need smarter training strategies that propose relevant environments tailored to the current learning progress of the student (DRL agent). This method is called Automatic Curriculum Learning (ACL) [3] and is embodied by a teacher algorithm which is trained to generate the most relevant environments throughout the entire training process according to the student's performance.
This way, the teacher proposes easy environments to the student at the beginning and gradually increases the difficulty and the diversity of the tasks in order to guarantee that the student is progressing while not always facing the same situation or forgetting what it has already learned.

`, 'aboutDemoTitle': "About the demo", 'aboutDemoText': `

In this demo, all the available agents were trained using Soft Actor Critic [4] as the DRL student algorithm alongside different ACL teacher algorithms such as ALP-GMM [5].
They successfully learned efficient behaviours to move through the environment and to generalize to never-seen-before situations.

The physics of the simulation are handled by the box2d.js physics engine which is a direct port to JavaScript of the Box2D physics engine.
The pre-trained policies (agents behaviours) are loaded in the browser thanks to TensorFlow.js.

`, 'creditsTitle': "Credits", 'creditsText': `

This demo was designed by Paul Germon as part of an internship within the Flowers research team at Inria. This internship was monitored by Rémy Portelas and Clément Romac, and supervised by Pierre-Yves Oudeyer. Special thanks to Nikita Melkozerov for his very helpful contribution. Recommended citation format:

                                @misc{germon2021demo,
                                  title={Interactive Deep Reinforcement Learning Demo},
                                  author={Germon, Paul and Romac, Clément and Portelas, Rémy and Oudeyer, Pierre-Yves},
                                  url={https://developmentalsystems.org/Interactive_DeepRL_Demo/},
                                  year={2021}
                                }

The code of this demo is open-source and can be found on this GitHub repository.
The code of the environment and agents is adapted from the TeachMyAgent [6] benchmark's Python code to JavaScript.

`, 'referencesTitle': "References", 'referencesText': `

[1] OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang: Solving Rubik's Cube with a Robot Hand (2019). https://arxiv.org/abs/1910.07113
[2] Silver, D., Huang, A., Maddison, C. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016). https://doi.org/10.1038/nature16961
[3] Portelas, R., Colas, C., Weng, L., Hofmann, K., & Oudeyer, P. Y. (2020). Automatic curriculum learning for deep RL: A short survey. https://arxiv.org/abs/2003.04664
[4] Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018, July). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning (pp. 1861-1870). PMLR. https://arxiv.org/abs/1801.01290
[5] Portelas, R., Colas, C., Hofmann, K., & Oudeyer, P. Y. (2020, May). Teacher algorithms for curriculum learning of deep RL in continuously parameterized environments. In Conference on Robot Learning (pp. 835-853). PMLR. https://arxiv.org/abs/1910.07224
[6] Romac, C., Portelas, R., Hofmann, K., & Oudeyer, P.-Y. (2021). TeachMyAgent: A Benchmark for Automatic Curriculum Learning in Deep RL. International Conference on Machine Learning, 9052–9063. https://arxiv.org/abs/2103.09815

`, } }, // French 'FR': { 'introTour': { 'nextLabel': "Suivant", 'prevLabel': "Précédent", 'doneLabel': "Fermer", 'welcomeTitle': "Bienvenue !", 'welcomeText': "Ici tu peux jouer avec une simulation dans laquelle des agents entraînés de manière autonome essayent de se déplacer au travers d'un environnement 2D.", 'viewportTitle': "Fenêtre d'affichage", 'viewportText': "C'est dans cet espace que la simulation est affichée en temps réel. Cela va te permettre de visualiser en direct comment les agents essayent de s'adapter à leur environnement.

Tu peux aussi interagir avec la simulation à l'aide de la souris pour faire défiler l'environnement, zoomer ou encore déplacer les agents.

Tu peux essayer dès maintenant !", 'runTitle': "Lancer la simulation", 'runText': 'Clique sur le bouton pour lancer la simulation.
Tu peux ensuite cliquer sur le bouton pour la mettre en pause ou sur le bouton pour la réinitialiser.

Tu peux essayer dès maintenant !', 'baseEnvsTitle': "Quelques environnements", 'baseEnvsText': "Voici quelques environnements de base qui t'aideront à te familiariser avec les différentes morphologies d'agents.
Tu pourras les charger dans la simulation pour voir les comportements des différents agents.", 'morphologiesTitle': "Les différentes morphologies d'agents", 'morphologiesText': "Voici toutes les morphologies disponibles pour les agents. Tu peux choisir parmi plusieurs agents différents pour chaque morphologie et l'ajouter à la simulation.

Chaque agent a été entraîné pour apprendre un comportement efficace pour se déplacer à travers l'environnement en fonction de sa morphologie. Essaye de les comparer !", 'agentsListTitle': "Liste des agents actifs", 'agentsListText': "Dans cette section sont affichés tous les agents actuellement présents dans la simulation.", 'customEnvsTitle': "Environnements personnalisés", 'customEnvsText': "Ici tu peux sauvegarder et télécharger tes environnements personnalisés mais aussi en importer depuis des fichiers précédemment téléchargés. Essaye de les échanger avec tes amis !", 'furtherTitle': "Pour aller plus loin...", 'furtherText': "Si tu veux personnaliser ton propre environnement, accéder à des options avancées ou en apprendre davantage à propos du contexte autour de cette démo, ouvre ces onglets.

Amuse-toi bien !", }, 'introHints': { 'buttonLabel': "Ok", 'tips': `Astuces

Tu peux faire défiler l'environnement horizontalement et verticalement à l'aide de la souris.
Tu peux zoomer ou dézoomer avec la molette de la souris.
Tu peux sélectionner un agent ou un objet en cliquant dessus, puis le supprimer en appuyant sur la touche suppr de ton clavier.
Tu peux déplacer un agent ou un objet en le faisant glisser avec la souris.
Tu peux changer le rayon de la gomme ou la taille des objets avec la molette de la souris.

` }, 'agentsList': { 'title': "Liste des agents actifs", 'follow': "Suivre", 'followTooltip' : "Centrer la fenêtre d'affichage sur l'agent", 'savePosTooltip': "Sauvegarder la position de l'agent", 'resetPosTooltip': "Réinitialiser la position de l'agent", 'deleteAgentTooltip': "Supprimer l'agent", }, 'mainButtons':{ 'runBtnTooltip': "Lancer la simulation", 'pauseBtnTooltip': "Mettre la simulation en pause", 'resetBtnTooltip': "Réinitialiser la simulation", 'saveBtnTooltip': "Sauvegarder l'environnement actuel" }, 'drawingMode': { 'intro': 'Ici tu peux dessiner ton propre parkour !', 'text': `Sélectionne les boutons Sol ou Plafond pour commencer à dessiner l'élément du terrain correspondant avec la souris.
Fais attention à ne pas superposer plusieurs traits à différentes hauteurs pour obtenir un résultat optimal.
Tu peux utiliser le bouton Gommer pour corriger ton dessin ou le bouton Effacer pour tout effacer.
Une fois que tu es satisfait du résultat, clique sur le bouton Générer le terrain.`, 'ground': "Sol", 'ceiling': "Plafond", 'erase': "Gommer", 'clear': "Effacer", 'generateTerrain': "Générer le terrain", 'draw': "Dessiner", }, 'parkourConfig': { 'terrainGeneration': "Génération du terrain", 'generalParameters': "Paramètres Généraux", 'creepers': "Lianes", 'drawTabBtn': "Dessine par toi-même !", 'procGenTabBtn': "Génération procédurale", 'procGenText': `Tu peux aussi utiliser ces trois curseurs pour générer les formes du terrain de manière automatique.`, 'smoothing': "Lissage", 'waterLevel': "Niveau d'eau", 'creepersWidth': "Largeur", 'creepersHeight': "Hauteur", 'creepersSpacing': "Espacement", //'creepersType': "Type", 'rigid': "Rigides", 'swingable': "Flexibles", }, 'morphologies': { 'title': "Ajouter un agent", 'text': "Ici tu peux ajouter un agent à la simulation avec la morphologie de ton choix.", 'policySelectTooltip': "Sélectionner un agent", 'addBtnTooltip': "Ajouter l'agent à la simulation", 'baby': "bébé", 'teenager': "adolescent", 'adult': "adulte", 'bipedal': { 'title': "Bipède", 'description': "Cette morphologie est composée d'une tête et de deux jambes qui lui permettent de marcher sur le sol." }, 'spider': { 'title': "Araignée", 'description': "Cette morphologie est composée d'un corps et de quatre pattes qui lui permettent de marcher sur le sol." }, 'chimpanzee': { 'title': "Chimpanzé", 'description': "Cette morphologie est composée d'une tête, d'un corps ainsi que de deux bras et deux jambes. 
Elle peut uniquement se déplacer en s'accrochant au plafond ou en se balançant de liane en liane.", }, 'fish': { 'title': "Poisson", 'description': "Cette morphologie est composée d'une tête, d'une queue et d'une nageoire, ce qui lui permet de nager dans l'eau.", }, }, 'envsSets': { 'baseSetText': `Pour commencer tu peux sélectionner un des environnements suivants en cliquant dessus pour le charger dans la simulation, puis lancer celle-ci en cliquant sur le bouton.`, 'customSetText': `Dans cette section tu peux stocker tes propres environnements personnalisés en les sauvegardant grâce au bouton ci-dessus ou en les important depuis un fichier JSON.`, 'uploadCard': { 'title': "Importer un environnement", 'text': `Choisis un fichier JSON puis clique sur le bouton ci-dessous pour importer l'environnement correspondant dans ta collection.`, 'uploadBtnTooltip': "Importer l'environnement depuis le fichier sélectionné", }, 'downloadBtnTooltip': "Télécharger l'environnement", 'deleteBtnTooltip': "Supprimer l'environnement", }, 'advancedOptions': { 'renderingOptions': ` Options d'affichage `, 'drawJoints': "Afficher les joints", 'drawJointsTooltip': "Les joints sont les liens qui attachent les différents objets physiques ensemble. 
Ils servent aussi de moteurs pour les morphologies des agents.", 'drawLidars': "Afficher les lidars", 'drawLidarsTooltip': "Les lidars sont des capteurs qui permettent aux agents de détecter les obstacles et l'eau à proximité.", 'drawNames': "Afficher les noms", 'drawNamesTooltip': "Nom des agents.", 'drawObservation': "Afficher les observations", 'drawObservationTooltip': "L'observation d'un agent est une vue partielle de l'état actuel de l'environnement à chaque pas de temps, sur laquelle les agents s'appuient pour effectuer une action.", 'drawReward': "Afficher les récompenses", 'drawRewardTooltip': "La récompense est une valeur qui indique la qualité de l'état actuel de l'environnement selon l'objectif que l'agent doit atteindre.", 'stepReward': "Récompense courante", 'totalReward': "Récompense cumulée", 'assetsTitle': ` Objets `, 'assetsText': "Ici tu peux trouver plusieurs types d'objets que tu peux ajouter à la simulation avec la souris.", 'circle': ` Boule`, 'comingSoon': "Plus d'objets à venir...", }, 'globalElements': { 'demoTitle': "Démo Interactive de Deep Reinforcement Learning", 'gettingStarted': "Commencer", 'parkourCustomization': "Personnalisation du parkour", 'advancedOptions': "Options avancées", 'about': "À propos...", 'saveEnvModal': { 'title': `Saisis un nom et une description pour l'environnement actuel`, 'text': "Cet environnement sera sauvegardé dans ta collection d'environnements personnalisés pour que tu puisses le recharger plus tard ou le télécharger pour le partager.", 'nameLabel': "Nom", 'descriptionLabel': "Description", 'cancelBtn': "Annuler", 'confirmBtn': "Sauvegarder", }, }, 'aboutTab': { 'purposeTitle': "Objectif de la démo", 'purposeText': `

Le but de cette démonstration est de mettre en évidence le challenge de généralisation à des nouvelles tâches pour des agents entraînés par Apprentissage par Renforcement Profond (ARP).

L'ARP est une approche d'Apprentissage automatique (Machine Learning) qui permet d'apprendre à des agents virtuels à résoudre des tâches en combinant des méthodes d'Apprentissage par renforcement (Reinforcement Learning) et d'Apprentissage profond (Deep Learning). Cette approche a été utilisée pour diverses applications dont notamment la robotique (e.g. Solving Rubik's Cube) [1], les jeux vidéo et les jeux de plateau (e.g. AlphaGo) [2].

Dans cette démo, tous les agents ont été entraînés de manière autonome dans le but d'apprendre un comportement efficace pour traverser un environnement 2D. Ils ont été entraînés en combinant un algorithme d'ARP et un algorithme enseignant (voir ci-dessous) pour qu'ils soient capables de généraliser leur comportement à des situations inédites.

La démo fournit différents outils pour personnaliser l'environnement dans le but de tester et mettre la robustesse des agents à l'épreuve lorsqu'ils font face à différentes situations.

`, 'rlTitle': "Apprentissage par Renforcement", 'rlText': `

L'Apprentissage par renforcement (AR) est l'étude des agents virtuels et de leur apprentissage par essais et erreurs. L'idée principale est de récompenser ou punir un agent en fonction des actions qu'il effectue pour qu'il apprenne un comportement efficace pour atteindre un objectif.
Les approches d'AR sont généralement composées d'un agent qui évolue dans un monde ou environnement et interagit avec. À chaque pas de temps, l'agent voit une observation partielle de l'état actuel de l'environnement et décide d'effectuer une action. Chaque action effectuée par l'agent change l'état de l'environnement. L'agent reçoit également une récompense à chaque pas de temps, qui indique à quel point l'état actuel du monde est bon ou mauvais selon l'objectif que l'agent doit atteindre.

Le schéma ci-contre présente ce processus d'interaction entre l'agent et l'environnement, avec les différentes informations qu'ils échangent à chaque pas de temps.
Maximiser la récompense au cours du temps est un moyen pour l'agent d'apprendre un comportement optimal (ou politique) pour atteindre son objectif.

`, 'drlTitle': "Apprentissage par renforcement profond", 'drlText': `

Pour améliorer continuellement les actions effectuées par les agents, les algorithmes d'ARP utilisent des réseaux de neurones artificiels. Avec de l'entraînement, ces réseaux de neurones sont capables d'apprendre à prédire une action optimale à effectuer à chaque étape à partir de l'observation reçue grâce à toutes les observations et les récompenses préalablement reçues après chaque action effectuée au cours de l'entraînement. Grâce à cela, les algorithmes d'ARP sont capables de produire des comportements très efficaces dans des situations similaires à celles sur lesquelles ils ont été entraînés.

Cependant, dans des applications réelles, l'environnement reste rarement identique en permanence mais évolue. Par conséquent, l'idéal serait que les agents d'ARP soient capables de généraliser leur comportement à des changements inédits du monde pour qu'ils puissent s'adapter à un large éventail de situations.

`, 'aclTitle': "Programme d'apprentissage automatique", 'aclText': `

Une solution pour résoudre en partie ce problème est d'entraîner les agents d'ARP sur des environnements générés procéduralement.
La génération procédurale est une méthode permettant de créer automatiquement des environnements en fonction de quelques paramètres. En utilisant cette méthode, les agents peuvent être entraînés sur un très large éventail d'environnements, leur permettant ainsi de généraliser leur comportement à plus de situations différentes.

Néanmoins, le fait de générer des environnements aléatoires au cours de l'entraînement implique le risque de générer des environnements trop difficiles ou trop faciles à résoudre pour les agents, les empêchant par conséquent d'apprendre en continu de manière efficace.
L'idéal serait donc de disposer de stratégies d'entraînement plus intelligentes qui proposeraient des environnements pertinents adaptés à l'avancement de l'apprentissage de l'élève (agent). Cette méthode est appelée programme d'apprentissage automatique (Automatic Curriculum Learning) [3] et est incarnée par un algorithme enseignant qui est entraîné dans le but d'apprendre à générer des environnements les plus pertinents possibles tout au long du processus d'entraînement de l'élève en fonction de l'évolution de ses performances.
De cette manière, l'enseignant commence par proposer des environnements faciles à l'élève puis augmente leur difficulté et leur diversité petit à petit pour garantir que l'élève progresse continuellement tout en ne rencontrant pas toujours la même situation et en n'oubliant pas ce qu'il a déjà appris.

`, 'aboutDemoTitle': "À propos de la démo", 'aboutDemoText': `

Dans cette démo, tous les agents ont été entraînés avec Soft Actor Critic [4] comme algorithme élève d'ARP aux côtés de différents algorithmes enseignants comme par exemple ALP-GMM [5].
Les agents ont ainsi appris avec succès des comportements efficaces pour se déplacer à travers les environnements 2D en généralisant leur comportement à des situations inédites.

La physique de la simulation est gérée par le moteur physique box2d.js qui est un portage direct en JavaScript du moteur physique Box2D.
Les politiques pré-entraînées (comportements des agents) sont chargées dans le navigateur grâce à TensorFlow.js.

`, 'creditsTitle': "Crédits", 'creditsText': `

Cette démo a été conçue par Paul Germon dans le cadre d'un stage au sein de l'équipe de recherche Flowers au laboratoire Inria. Ce stage était encadré par Rémy Portelas et Clément Romac, et supervisé par Pierre-Yves Oudeyer. Remerciements particuliers à Nikita Melkozerov pour sa contribution. Citation recommandée :

                                @misc{germon2021demo,
                                  title={Interactive Deep Reinforcement Learning Demo},
                                  author={Germon, Paul and Romac, Clément and Portelas, Rémy and Oudeyer, Pierre-Yves},
                                  url={https://developmentalsystems.org/Interactive_DeepRL_Demo/},
                                  year={2021}
                                }

Le code source de cette démo est libre d'accès sur ce dépôt GitHub.
Le code de l'environnement et des agents a été adapté en JavaScript à partir du code Python de TeachMyAgent [6].

`, 'referencesTitle': "Références", 'referencesText': `

[1] OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang: Solving Rubik's Cube with a Robot Hand (2019). https://arxiv.org/abs/1910.07113
[2] Silver, D., Huang, A., Maddison, C. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016). https://doi.org/10.1038/nature16961
[3] Portelas, R., Colas, C., Weng, L., Hofmann, K., & Oudeyer, P. Y. (2020). Automatic curriculum learning for deep RL: A short survey. https://arxiv.org/abs/2003.04664
[4] Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018, July). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning (pp. 1861-1870). PMLR. https://arxiv.org/abs/1801.01290
[5] Portelas, R., Colas, C., Hofmann, K., & Oudeyer, P. Y. (2020, May). Teacher algorithms for curriculum learning of deep RL in continuously parameterized environments. In Conference on Robot Learning (pp. 835-853). PMLR. https://arxiv.org/abs/1910.07224
[6] Romac, C., Portelas, R., Hofmann, K., & Oudeyer, P.-Y. (2021). TeachMyAgent: A Benchmark for Automatic Curriculum Learning in Deep RL. International Conference on Machine Learning, 9052–9063. https://arxiv.org/abs/2103.09815

`, } } }
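
// Usage sketch (not part of the original file): the dictionary above is keyed
// first by language code ('EN', 'FR'), then by section, then by string name.
// One possible way to read it is a small lookup helper that resolves a dotted
// key and falls back to English when a translation is missing. The names
// `translate` and `sampleDict` are hypothetical, and the sample is a trimmed
// stand-in for window.lang_dict with 'doneLabel' deliberately left out of 'FR'
// to illustrate the fallback.

```javascript
// Hypothetical helper: resolve a dotted key such as "introTour.nextLabel"
// in the requested language, falling back to another language, then to the key.
function translate(dict, lang, key, fallbackLang = 'EN') {
  // Walk the nested object one path segment at a time.
  const lookup = (code) =>
    key.split('.').reduce(
      (node, part) =>
        node != null && typeof node === 'object' ? node[part] : undefined,
      dict[code]
    );

  const value = lookup(lang);
  if (typeof value === 'string') return value;

  const fallback = lookup(fallbackLang);
  // Returning the key itself makes missing strings easy to spot in the UI.
  return typeof fallback === 'string' ? fallback : key;
}

// Trimmed sample with the same shape as the dictionary above (assumption:
// in the page itself one would pass window.lang_dict instead).
const sampleDict = {
  EN: { introTour: { nextLabel: 'Next', doneLabel: 'Done' } },
  FR: { introTour: { nextLabel: 'Suivant' } },
};

console.log(translate(sampleDict, 'FR', 'introTour.nextLabel'));
console.log(translate(sampleDict, 'FR', 'introTour.doneLabel'));
```

// The first call finds the French string; the second is absent from the
// trimmed 'FR' sample, so the helper returns the English one instead.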