sgoodfriend committed
Commit 90325dc · 1 Parent(s): d61f7b9

PPO playing MicrortsDefeatCoacAIShaped-v3 from https://github.com/sgoodfriend/rl-algo-impls/tree/9bd13b7f398a66b2a9cc00bc552f55d9d665a570

README.md CHANGED
@@ -10,7 +10,7 @@ model-index:
   results:
   - metrics:
     - type: mean_reward
-      value: 189.67 +/- 25.51
+      value: 201.7 +/- 15.28
       name: mean_reward
   task:
     type: reinforcement-learning
@@ -23,17 +23,17 @@ model-index:
 
 This is a trained model of a **PPO** agent playing **MicrortsDefeatCoacAIShaped-v3** using the [/sgoodfriend/rl-algo-impls](https://github.com/sgoodfriend/rl-algo-impls) repo.
 
-All models trained at this commit can be found at https://api.wandb.ai/links/sgoodfriend/pa4zu5l1.
+All models trained at this commit can be found at https://api.wandb.ai/links/sgoodfriend/83rqdpmp.
 
 ## Training Results
 
-This model was trained from 3 trainings of **PPO** agents using different initial seeds. These agents were trained by checking out [29807ca](https://github.com/sgoodfriend/rl-algo-impls/tree/29807ca31848a767b26c6d16e06a414b3321ffb6). The best and last models were kept from each training. This submission has loaded the best models from each training, reevaluates them, and selects the best model from these latest evaluations (mean - std).
+This model was trained from 3 trainings of **PPO** agents using different initial seeds. These agents were trained by checking out [9bd13b7](https://github.com/sgoodfriend/rl-algo-impls/tree/9bd13b7f398a66b2a9cc00bc552f55d9d665a570). The best and last models were kept from each training. This submission has loaded the best models from each training, reevaluates them, and selects the best model from these latest evaluations (mean - std).
 
 | algo | env | seed | reward_mean | reward_std | eval_episodes | best | wandb_url |
 |:-------|:------------------------------|-------:|--------------:|-------------:|----------------:|:-------|:-----------------------------------------------------------------------------|
-| ppo | MicrortsDefeatCoacAIShaped-v3 | 1 | 181.292 | 22.2249 | 24 | | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/5cl8neoo) |
-| ppo | MicrortsDefeatCoacAIShaped-v3 | 2 | 189.675 | 25.5148 | 24 | * | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/i3tzp4np) |
-| ppo | MicrortsDefeatCoacAIShaped-v3 | 3 | 168.558 | 33.7266 | 24 | | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/deb8july) |
+| ppo | MicrortsDefeatCoacAIShaped-v3 | 1 | 157.8 | 23.608 | 24 | | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/2a90owyc) |
+| ppo | MicrortsDefeatCoacAIShaped-v3 | 2 | 180.233 | 25.6048 | 24 | | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/p1vlxuno) |
+| ppo | MicrortsDefeatCoacAIShaped-v3 | 3 | 201.7 | 15.2797 | 24 | * | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/o8o6klqw) |
 
 
 ### Prerequisites: Weights & Biases (WandB)
@@ -53,10 +53,10 @@ login`.
 Note: While the model state dictionary and hyperaparameters are saved, the latest
 implementation could be sufficiently different to not be able to reproduce similar
 results. You might need to checkout the commit the agent was trained on:
-[29807ca](https://github.com/sgoodfriend/rl-algo-impls/tree/29807ca31848a767b26c6d16e06a414b3321ffb6).
+[9bd13b7](https://github.com/sgoodfriend/rl-algo-impls/tree/9bd13b7f398a66b2a9cc00bc552f55d9d665a570).
 ```
 # Downloads the model, sets hyperparameters, and runs agent for 3 episodes
-python enjoy.py --wandb-run-path=sgoodfriend/rl-algo-impls-benchmarks/i3tzp4np
+python enjoy.py --wandb-run-path=sgoodfriend/rl-algo-impls-benchmarks/o8o6klqw
 ```
 
 Setup hasn't been completely worked out yet, so you might be best served by using Google
@@ -68,11 +68,11 @@ notebook.
 
 ## Training
 If you want the highest chance to reproduce these results, you'll want to checkout the
-commit the agent was trained on: [29807ca](https://github.com/sgoodfriend/rl-algo-impls/tree/29807ca31848a767b26c6d16e06a414b3321ffb6). While
+commit the agent was trained on: [9bd13b7](https://github.com/sgoodfriend/rl-algo-impls/tree/9bd13b7f398a66b2a9cc00bc552f55d9d665a570). While
 training is deterministic, different hardware will give different results.
 
 ```
-python train.py --algo ppo --env MicrortsDefeatCoacAIShaped-v3 --seed 2
+python train.py --algo ppo --env MicrortsDefeatCoacAIShaped-v3 --seed 3
 ```
 
 Setup hasn't been completely worked out yet, so you might be best served by using Google
@@ -83,7 +83,7 @@ notebook.
 
 
 ## Benchmarking (with Lambda Labs instance)
-This and other models from https://api.wandb.ai/links/sgoodfriend/pa4zu5l1 were generated by running a script on a Lambda
+This and other models from https://api.wandb.ai/links/sgoodfriend/83rqdpmp were generated by running a script on a Lambda
 Labs instance. In a Lambda Labs instance terminal:
 ```
 git clone git@github.com:sgoodfriend/rl-algo-impls.git
@@ -154,13 +154,15 @@ policy_hyperparams:
   cnn_style: gridnet_encoder
   v_hidden_sizes:
   - 128
-seed: 2
+seed: 3
 use_deterministic_algorithms: true
 wandb_entity: null
 wandb_group: null
 wandb_project_name: rl-algo-impls-benchmarks
 wandb_tags:
-- benchmark_29807ca
-- host_138-2-235-180
+- benchmark_9bd13b7
+- host_138-2-238-188
+- branch_main
+- v0.0.8
 
 ```
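As a side note, the "(mean - std)" criterion described in the Training Results section above can be checked directly against the reevaluation table. A minimal Python sketch (not part of the repo; values copied from the table):

```python
# Hypothetical sketch of the "(mean - std)" selection: pick the run whose
# reevaluated mean reward minus its standard deviation is highest.
evals = [
    {"seed": 1, "reward_mean": 157.8, "reward_std": 23.608},
    {"seed": 2, "reward_mean": 180.233, "reward_std": 25.6048},
    {"seed": 3, "reward_mean": 201.7, "reward_std": 15.2797},
]
best = max(evals, key=lambda e: e["reward_mean"] - e["reward_std"])
print(best["seed"])  # -> 3, the starred row in the table
```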
pyproject.toml CHANGED
@@ -1,6 +1,6 @@
 [project]
 name = "rl_algo_impls"
-version = "0.0.7"
+version = "0.0.8"
 description = "Implementations of reinforcement learning algorithms"
 authors = [
     {name = "Scott Goodfriend", email = "goodfriend.scott@gmail.com"},
@@ -68,6 +68,11 @@ jupyter = [
     "jupyter",
     "notebook"
 ]
+all = [
+    "rl-algo-impls[test]",
+    "rl-algo-impls[procgen]",
+    "rl-algo-impls[microrts]",
+]
 
 [project.urls]
 "Homepage" = "https://github.com/sgoodfriend/rl-algo-impls"
replay.meta.json CHANGED
@@ -1 +1 @@
- {"content_type": "video/mp4", "encoder_version": {"backend": "ffmpeg", "version": "b'ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers\\nbuilt with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)\\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\\nlibavutil 56. 31.100 / 56. 31.100\\nlibavcodec 58. 54.100 / 58. 54.100\\nlibavformat 58. 29.100 / 58. 29.100\\nlibavdevice 58. 8.100 / 58. 8.100\\nlibavfilter 7. 57.100 / 7. 57.100\\nlibavresample 4. 0. 0 / 4. 0. 0\\nlibswscale 5. 5.100 / 5. 5.100\\nlibswresample 3. 5.100 / 3. 5.100\\nlibpostproc 55. 5.100 / 55. 5.100\\n'", "cmdline": ["ffmpeg", "-nostats", "-loglevel", "error", "-y", "-f", "rawvideo", "-s:v", "640x640", "-pix_fmt", "rgb24", "-framerate", "150", "-i", "-", "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2", "-vcodec", "libx264", "-pix_fmt", "yuv420p", "-r", "150", "/tmp/tmpccssnwt9/ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots/replay.mp4"]}, "episode": {"r": 193.1999969482422, "l": 1160, "t": 12.301441}}
 
+ {"content_type": "video/mp4", "encoder_version": {"backend": "ffmpeg", "version": "b'ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers\\nbuilt with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)\\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\\nlibavutil 56. 31.100 / 56. 31.100\\nlibavcodec 58. 54.100 / 58. 54.100\\nlibavformat 58. 29.100 / 58. 29.100\\nlibavdevice 58. 8.100 / 58. 8.100\\nlibavfilter 7. 57.100 / 7. 57.100\\nlibavresample 4. 0. 0 / 4. 0. 0\\nlibswscale 5. 5.100 / 5. 5.100\\nlibswresample 3. 5.100 / 3. 5.100\\nlibpostproc 55. 5.100 / 55. 5.100\\n'", "cmdline": ["ffmpeg", "-nostats", "-loglevel", "error", "-y", "-f", "rawvideo", "-s:v", "640x640", "-pix_fmt", "rgb24", "-framerate", "150", "-i", "-", "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2", "-vcodec", "libx264", "-pix_fmt", "yuv420p", "-r", "150", "/tmp/tmp7lfej84r/ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots/replay.mp4"]}, "episode": {"r": 205.39999389648438, "l": 1228, "t": 13.29542}}
replay.mp4 CHANGED
Binary files a/replay.mp4 and b/replay.mp4 differ
 
rl_algo_impls/hyperparams/a2c.yml CHANGED
@@ -97,31 +97,35 @@ Walker2DBulletEnv-v0:
 HopperBulletEnv-v0:
   <<: *pybullet-defaults
 
+# Tuned
 CarRacing-v0:
   n_timesteps: !!float 4e6
   env_hyperparams:
-    n_envs: 8
+    n_envs: 16
     frame_stack: 4
     normalize: true
     normalize_kwargs:
       norm_obs: false
       norm_reward: true
   policy_hyperparams:
-    use_sde: true
-    log_std_init: -2
-    init_layers_orthogonal: false
-    activation_fn: relu
+    use_sde: false
+    log_std_init: -1.3502584927786276
+    init_layers_orthogonal: true
+    activation_fn: tanh
     share_features_extractor: false
     cnn_flatten_dim: 256
     hidden_sizes: [256]
   algo_hyperparams:
-    n_steps: 512
-    learning_rate: !!float 1.62e-5
-    gamma: 0.997
-    gae_lambda: 0.975
-    ent_coef: 0
-    sde_sample_freq: 128
-    vf_coef: 0.64
+    n_steps: 16
+    learning_rate: 0.000025630993245026736
+    learning_rate_decay: linear
+    gamma: 0.99957617037542
+    gae_lambda: 0.949455676599436
+    ent_coef: !!float 1.707983205298309e-7
+    vf_coef: 0.10428178193833336
+    max_grad_norm: 0.5406643389792273
+    normalize_advantage: true
+    use_rms_prop: false
 
 _atari: &atari-defaults
   n_timesteps: !!float 1e7
rl_algo_impls/hyperparams/ppo.yml CHANGED
@@ -266,7 +266,7 @@ _microrts_ai: &microrts-ai-defaults
     <<: *microrts-policy-defaults
     cnn_flatten_dim: 256
     actor_head_style: gridnet
-  algo_hyperparams:
+  algo_hyperparams: &microrts-ai-algo-defaults
     <<: *microrts-algo-defaults
     learning_rate: !!float 2.5e-4
     learning_rate_decay: linear
rl_algo_impls/shared/actor/gridnet_decoder.py CHANGED
@@ -57,6 +57,7 @@ class GridnetDecoder(Actor):
                 32, action_vec.sum(), 3, stride=2, padding=1, output_padding=1
             ),
             init_layers_orthogonal=init_layers_orthogonal,
+            std=0.01,
         ),
         Transpose((0, 2, 3, 1)),
     )
rl_algo_impls/shared/encoder/cnn.py CHANGED
@@ -49,7 +49,9 @@ class FlattenedCnnEncoder(CnnEncoder):
         self.cnn = cnn
         self.flattened_dim = cnn_flatten_dim
         with torch.no_grad():
-            cnn_out = cnn(self.preprocess(torch.as_tensor(obs_space.sample())))  # type: ignore
+            cnn_out = torch.flatten(
+                cnn(self.preprocess(torch.as_tensor(obs_space.sample()))), start_dim=1
+            )
         self.fc = nn.Sequential(
             nn.Flatten(),
             layer_init(
@@ -60,7 +62,10 @@ class FlattenedCnnEncoder(CnnEncoder):
         )
 
     def forward(self, obs: torch.Tensor) -> torch.Tensor:
-        return self.fc(self.cnn(super().forward(obs)))
+        x = super().forward(obs)
+        x = self.cnn(x)
+        x = self.fc(x)
+        return x
 
     @property
     def out_dim(self) -> EncoderOutDim:
rl_algo_impls/shared/module/module.py CHANGED
@@ -10,12 +10,15 @@ def mlp(
     output_activation: Type[nn.Module] = nn.Identity,
     init_layers_orthogonal: bool = False,
     final_layer_gain: float = np.sqrt(2),
+    hidden_layer_gain: float = np.sqrt(2),
 ) -> nn.Module:
     layers = []
     for i in range(len(layer_sizes) - 2):
         layers.append(
             layer_init(
-                nn.Linear(layer_sizes[i], layer_sizes[i + 1]), init_layers_orthogonal
+                nn.Linear(layer_sizes[i], layer_sizes[i + 1]),
+                init_layers_orthogonal,
+                std=hidden_layer_gain,
             )
         )
         layers.append(activation())
rl_algo_impls/shared/policy/critic.py CHANGED
@@ -30,6 +30,7 @@ class CriticHead(nn.Module):
                 activation,
                 init_layers_orthogonal=init_layers_orthogonal,
                 final_layer_gain=1.0,
+                hidden_layer_gain=1.0,
             )
         )
         self._fc = nn.Sequential(*seq)
saved_models/{ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots-S2-best → ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots-S3-best}/model.pth RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2507ccb05ded8cefcd96f8bb32c145e6f443d8d2d811aa0a4b8810863d72b9ee
+oid sha256:18a7857bb764911a5696b4ec54840fdf8168823418edc78ded6603ade49bdeb1
 size 3359293
scripts/benchmark.sh CHANGED
@@ -19,13 +19,13 @@ n_jobs="${n_jobs:-6}"
 project_name="${project_name:-rl-algo-impls-benchmarks}"
 seeds="${seeds:-1 2 3}"
 
-DISCRETE_ENVS=(
-    # Basic
+BASIC_ENVS=(
     "CartPole-v1"
     "MountainCar-v0"
     "Acrobot-v1"
     "LunarLander-v2"
-    # Atari
+)
+ATARI_ENVS=(
     "PongNoFrameskip-v4"
     "BreakoutNoFrameskip-v4"
     "SpaceInvadersNoFrameskip-v4"
@@ -75,9 +75,11 @@ for algo in $(echo $algos); do
         algo_envs=${MICRORTS_AI_ENVS[*]}
     elif [ -z "$envs" ]; then
         if [ "$algo" = "dqn" ]; then
-            BENCHMARK_ENVS="${DISCRETE_ENVS[*]}"
+            BENCHMARK_ENVS="${BASIC_ENVS[*]} ${ATARI_ENVS[*]}"
+        elif [ "$algo" = "vpg" ]; then
+            BENCHMARK_ENVS="${BASIC_ENVS[*]} ${BOX_ENVS[*]}"
         else
-            BENCHMARK_ENVS="${DISCRETE_ENVS[*]} ${BOX_ENVS[*]}"
+            BENCHMARK_ENVS="${BASIC_ENVS[*]} ${BOX_ENVS[*]} ${ATARI_ENVS[*]}"
         fi
         algo_envs=${BENCHMARK_ENVS[*]}
     else
scripts/setup.sh CHANGED
@@ -8,4 +8,4 @@ sudo apt install -y default-jdk
 python3 -m pip install --upgrade pip
 pip install --upgrade torch torchvision torchaudio
 
-python -m pip install --upgrade '.[test,procgen,microrts]'
+python -m pip install --upgrade '.[all]'
scripts/tags_benchmark.sh CHANGED
@@ -1 +1,5 @@
-echo "benchmark_$(git rev-parse --short HEAD) host_$(hostname)"
+commit="benchmark_$(git rev-parse --short HEAD)"
+host="host_$(hostname)"
+branch="branch_$(git rev-parse --abbrev-ref HEAD)"
+version="v$(pip show rl_algo_impls | grep Version | sed -e 's#Version:\ ##')"
+echo "$commit $host $branch $version"
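For reference, a hypothetical invocation of the updated script on the benchmark host would echo the same four tags that appear as wandb_tags in the README diff above; the exact values depend on the checked-out commit, hostname, branch, and installed rl_algo_impls version:

```
bash scripts/tags_benchmark.sh
# benchmark_9bd13b7 host_138-2-238-188 branch_main v0.0.8
```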