---
title: Negatively Correlated Ensemble RL
emoji: 🌹
colorFrom: red
colorTo: yellow
sdk: gradio
python_version: 3.9
app_file: app.py
pinned: false
---
Negatively Correlated Ensemble RL

Installation

Create the conda environment

conda create -n ncerl python=3.9

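Activate the conda environment

conda activate ncerl

Install the dependencies (inside the activated environment)

pip install -r requirements.txt

Note: the program does not require a GPU, but PyTorch must be installed. If your GPU supports CUDA, install the CUDA build of PyTorch; otherwise install the CPU build. The CUDA build speeds up inference.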
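Because PyTorch can be installed as either a CPU-only or a CUDA build, it may help to check which one ended up in the environment. The sketch below is a hypothetical helper: the name `describe_torch` is ours and is not part of the repository. It relies only on `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch() -> str:
    """Summarize the installed PyTorch build (hypothetical helper, not in the repo)."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    # torch.version.cuda is a version string for CUDA builds and None for CPU-only builds.
    build = f"CUDA {torch.version.cuda}" if torch.version.cuda else "CPU-only"
    # A CUDA build still needs a visible, supported GPU before it can be used.
    gpu = "GPU available" if torch.cuda.is_available() else "no GPU visible"
    return f"torch {torch.__version__} ({build} build, {gpu})"


if __name__ == "__main__":
    print(describe_torch())
```

Run it with the environment's python. Per the note above, a CPU-only result is fine for running the program; it is only slower at inference.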

Quick start

To try the interactive demo, run

python app.py

and then open the link printed in the terminal.

Alternatively, run

python generate_and_play.py

and then open models/example_policy/samples.png to view the generated levels.

Training

All training runs are launched through train.py with an option and its arguments. For example, python train.py ncesac --lbd 0.3 --m 5 trains NCERL with hyperparameters $\lambda = 0.3$ and $m = 5$. The plotting script is plots.py.

* python train.py gan: train a decoder that maps a continuous action to a game level segment.
* python train.py sac: train a standard SAC as the policy for online game level generation.
* python train.py asyncsac: train a SAC with an asynchronous evaluation environment as the policy for online game level generation.
* python train.py ncesac: train an NCERL based on SAC as the policy for online game level generation.
* python train.py egsac: train an episodic generative SAC (see the paper "The fun facets of Mario: Multifaceted experience-driven PCG via reinforcement learning") as the policy for online game level generation.
* python train.py pmoe: train a probabilistic mixture-of-experts (PMOE) SAC (see the paper "Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning") as the policy for online game level generation.
* python train.py sunrise: train a SUNRISE agent (see the paper "SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning") as the policy for online game level generation.
* python train.py dvd: train a DvD-SAC (see the paper "Effective Diversity in Population Based Reinforcement Learning") as the policy for online game level generation.
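Each run is configured entirely on the command line, so a small hyperparameter sweep can be scripted by generating one train.py invocation per setting. This is a minimal sketch. Only the ncesac option and the --lbd and --m flags come from the example above. The grid values and the helper name `build_ncesac_commands` are illustrative assumptions.

```python
import itertools
import shlex
import sys

# Illustrative grid; these values are assumptions, not recommended settings.
LAMBDAS = [0.1, 0.3, 0.5]
M_VALUES = [2, 5]


def build_ncesac_commands(lambdas, m_values, python=sys.executable):
    """Return one argv list per (lambda, m) pair for `train.py ncesac` (hypothetical helper)."""
    commands = []
    for lbd, m in itertools.product(lambdas, m_values):
        commands.append([python, "train.py", "ncesac", "--lbd", str(lbd), "--m", str(m)])
    return commands


if __name__ == "__main__":
    for argv in build_ncesac_commands(LAMBDAS, M_VALUES):
        # Only prints each command; swap in subprocess.run(argv, check=True)
        # to launch the runs one after another from the repository root.
        print(shlex.join(argv))
```

Passing argv lists rather than shell strings avoids quoting problems if the interpreter path contains spaces.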

For the training arguments available to each option, run python train.py [option] --help.
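The option-plus-arguments interface can be pictured as an argparse parser with one subcommand per option. With subparsers, each option gets its own --help, matching the usage above. The sketch below is hypothetical and is not taken from train.py. Only the option names and the ncesac flags --lbd and --m come from this README, and the defaults are placeholders.

```python
import argparse

# Option names listed in this README; every option except ncesac is
# registered without arguments in this sketch.
OPTIONS = ("gan", "sac", "asyncsac", "egsac", "pmoe", "sunrise", "dvd")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `python train.py <option> [arguments]`."""
    parser = argparse.ArgumentParser(prog="train.py")
    sub = parser.add_subparsers(dest="option", required=True)

    ncesac = sub.add_parser("ncesac", help="NCERL based on SAC")
    # Placeholder defaults; the real ones, if any, are defined in train.py.
    ncesac.add_argument("--lbd", type=float, default=0.5, help="hyperparameter lambda")
    ncesac.add_argument("--m", type=int, default=5, help="hyperparameter m")

    for name in OPTIONS:
        sub.add_parser(name)
    return parser


if __name__ == "__main__":
    # Parse a fixed example instead of sys.argv so the sketch runs as-is.
    print(build_parser().parse_args(["ncesac", "--lbd", "0.3", "--m", "5"]))
```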

Directory structure

NCERL-DIVERSE-PCG/
* analysis/
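  * generate.py: unused
  * tests.py: used for evaluation
* media/: Markdown asset files
* models/
  * example_policy/: used for the generation demo
* smb/: Mario simulator and image resource data
* src/
  * ddpm/: DDPM model code
  * drl/: DRL models and training
  * env/: Mario gym environment and reward functions
  * gan/: GAN model and training
  * olgen/: online generation environment and policies
  * rlkit/: reinforcement learning components
  * smb/: components for interacting with the Mario simulator, plus a multi-process asynchronous pool
  * utils/: utility modules
* training_data/: training data
* README.md: this file
* app.py: Gradio demo app
* generate_and_play.py: demo script that does not use Gradio
* train.py: training entry point
* test_ddpm.py: script for testing DDPM training
* requirements.txt: dependency list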