sgoodfriend committed
Commit 90325dc · 1 Parent(s): d61f7b9

PPO playing MicrortsDefeatCoacAIShaped-v3 from https://github.com/sgoodfriend/rl-algo-impls/tree/9bd13b7f398a66b2a9cc00bc552f55d9d665a570

README.md CHANGED
@@ -10,7 +10,7 @@ model-index:
   results:
   - metrics:
     - type: mean_reward
-      value: 189.67 +/- 25.51
+      value: 201.7 +/- 15.28
       name: mean_reward
   task:
     type: reinforcement-learning
@@ -23,17 +23,17 @@ model-index:
 
 This is a trained model of a **PPO** agent playing **MicrortsDefeatCoacAIShaped-v3** using the [/sgoodfriend/rl-algo-impls](https://github.com/sgoodfriend/rl-algo-impls) repo.
 
-All models trained at this commit can be found at https://api.wandb.ai/links/sgoodfriend/pa4zu5l1.
+All models trained at this commit can be found at https://api.wandb.ai/links/sgoodfriend/83rqdpmp.
 
 ## Training Results
 
-This model was trained from 3 trainings of **PPO** agents using different initial seeds. These agents were trained by checking out [29807ca](https://github.com/sgoodfriend/rl-algo-impls/tree/29807ca31848a767b26c6d16e06a414b3321ffb6). The best and last models were kept from each training. This submission has loaded the best models from each training, reevaluates them, and selects the best model from these latest evaluations (mean - std).
+This model was trained from 3 trainings of **PPO** agents using different initial seeds. These agents were trained by checking out [9bd13b7](https://github.com/sgoodfriend/rl-algo-impls/tree/9bd13b7f398a66b2a9cc00bc552f55d9d665a570). The best and last models were kept from each training. This submission has loaded the best models from each training, reevaluates them, and selects the best model from these latest evaluations (mean - std).
 
 | algo | env | seed | reward_mean | reward_std | eval_episodes | best | wandb_url |
 |:-------|:------------------------------|-------:|--------------:|-------------:|----------------:|:-------|:-----------------------------------------------------------------------------|
-| ppo | MicrortsDefeatCoacAIShaped-v3 | 1 | 181.292 | 22.2249 | 24 | | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/5cl8neoo) |
-| ppo | MicrortsDefeatCoacAIShaped-v3 | 2 | 189.675 | 25.5148 | 24 | * | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/i3tzp4np) |
-| ppo | MicrortsDefeatCoacAIShaped-v3 | 3 | 168.558 | 33.7266 | 24 | | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/deb8july) |
+| ppo | MicrortsDefeatCoacAIShaped-v3 | 1 | 157.8 | 23.608 | 24 | | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/2a90owyc) |
+| ppo | MicrortsDefeatCoacAIShaped-v3 | 2 | 180.233 | 25.6048 | 24 | | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/p1vlxuno) |
+| ppo | MicrortsDefeatCoacAIShaped-v3 | 3 | 201.7 | 15.2797 | 24 | * | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/o8o6klqw) |
 
 
 ### Prerequisites: Weights & Biases (WandB)
@@ -53,10 +53,10 @@ login`.
 Note: While the model state dictionary and hyperaparameters are saved, the latest
 implementation could be sufficiently different to not be able to reproduce similar
 results. You might need to checkout the commit the agent was trained on:
-[29807ca](https://github.com/sgoodfriend/rl-algo-impls/tree/29807ca31848a767b26c6d16e06a414b3321ffb6).
+[9bd13b7](https://github.com/sgoodfriend/rl-algo-impls/tree/9bd13b7f398a66b2a9cc00bc552f55d9d665a570).
 ```
 # Downloads the model, sets hyperparameters, and runs agent for 3 episodes
-python enjoy.py --wandb-run-path=sgoodfriend/rl-algo-impls-benchmarks/i3tzp4np
+python enjoy.py --wandb-run-path=sgoodfriend/rl-algo-impls-benchmarks/o8o6klqw
 ```
 
 Setup hasn't been completely worked out yet, so you might be best served by using Google
@@ -68,11 +68,11 @@ notebook.
 
 ## Training
 If you want the highest chance to reproduce these results, you'll want to checkout the
-commit the agent was trained on: [29807ca](https://github.com/sgoodfriend/rl-algo-impls/tree/29807ca31848a767b26c6d16e06a414b3321ffb6). While
+commit the agent was trained on: [9bd13b7](https://github.com/sgoodfriend/rl-algo-impls/tree/9bd13b7f398a66b2a9cc00bc552f55d9d665a570). While
 training is deterministic, different hardware will give different results.
 
 ```
-python train.py --algo ppo --env MicrortsDefeatCoacAIShaped-v3 --seed 2
+python train.py --algo ppo --env MicrortsDefeatCoacAIShaped-v3 --seed 3
 ```
 
 Setup hasn't been completely worked out yet, so you might be best served by using Google
@@ -83,7 +83,7 @@ notebook.
 
 
 ## Benchmarking (with Lambda Labs instance)
-This and other models from https://api.wandb.ai/links/sgoodfriend/pa4zu5l1 were generated by running a script on a Lambda
+This and other models from https://api.wandb.ai/links/sgoodfriend/83rqdpmp were generated by running a script on a Lambda
 Labs instance. In a Lambda Labs instance terminal:
 ```
 git clone git@github.com:sgoodfriend/rl-algo-impls.git
@@ -154,13 +154,15 @@ policy_hyperparams:
   cnn_style: gridnet_encoder
   v_hidden_sizes:
   - 128
-seed: 2
+seed: 3
 use_deterministic_algorithms: true
 wandb_entity: null
 wandb_group: null
 wandb_project_name: rl-algo-impls-benchmarks
 wandb_tags:
-- benchmark_29807ca
-- host_138-2-235-180
+- benchmark_9bd13b7
+- host_138-2-238-188
+- branch_main
+- v0.0.8
 
 ```
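As a side note, the "(mean - std)" criterion described in the Training Results section above can be checked directly against the reevaluation table. A minimal Python sketch (not part of the repo; values copied from the table):

```python
# Hypothetical sketch of the "(mean - std)" selection: pick the run whose
# reevaluated mean reward minus its standard deviation is highest.
evals = [
    {"seed": 1, "reward_mean": 157.8, "reward_std": 23.608},
    {"seed": 2, "reward_mean": 180.233, "reward_std": 25.6048},
    {"seed": 3, "reward_mean": 201.7, "reward_std": 15.2797},
]
best = max(evals, key=lambda e: e["reward_mean"] - e["reward_std"])
print(best["seed"])  # -> 3, the starred row in the table
```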
pyproject.toml CHANGED
@@ -1,6 +1,6 @@
 [project]
 name = "rl_algo_impls"
-version = "0.0.7"
+version = "0.0.8"
 description = "Implementations of reinforcement learning algorithms"
 authors = [
     {name = "Scott Goodfriend", email = "goodfriend.scott@gmail.com"},
@@ -68,6 +68,11 @@ jupyter = [
     "jupyter",
     "notebook"
 ]
+all = [
+    "rl-algo-impls[test]",
+    "rl-algo-impls[procgen]",
+    "rl-algo-impls[microrts]",
+]
 
 [project.urls]
 "Homepage" = "https://github.com/sgoodfriend/rl-algo-impls"
replay.meta.json CHANGED
@@ -1 +1 @@
- {"content_type": "video/mp4", "encoder_version": {"backend": "ffmpeg", "version": "b'ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers\\nbuilt with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)\\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\\nlibavutil 56. 31.100 / 56. 31.100\\nlibavcodec 58. 54.100 / 58. 54.100\\nlibavformat 58. 29.100 / 58. 29.100\\nlibavdevice 58. 8.100 / 58. 8.100\\nlibavfilter 7. 57.100 / 7. 57.100\\nlibavresample 4. 0. 0 / 4. 0. 0\\nlibswscale 5. 5.100 / 5. 5.100\\nlibswresample 3. 5.100 / 3. 5.100\\nlibpostproc 55. 5.100 / 55. 5.100\\n'", "cmdline": ["ffmpeg", "-nostats", "-loglevel", "error", "-y", "-f", "rawvideo", "-s:v", "640x640", "-pix_fmt", "rgb24", "-framerate", "150", "-i", "-", "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2", "-vcodec", "libx264", "-pix_fmt", "yuv420p", "-r", "150", "/tmp/tmpccssnwt9/ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots/replay.mp4"]}, "episode": {"r": 193.1999969482422, "l": 1160, "t": 12.301441}}
 
+ {"content_type": "video/mp4", "encoder_version": {"backend": "ffmpeg", "version": "b'ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers\\nbuilt with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)\\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\\nlibavutil 56. 31.100 / 56. 31.100\\nlibavcodec 58. 54.100 / 58. 54.100\\nlibavformat 58. 29.100 / 58. 29.100\\nlibavdevice 58. 8.100 / 58. 8.100\\nlibavfilter 7. 57.100 / 7. 57.100\\nlibavresample 4. 0. 0 / 4. 0. 0\\nlibswscale 5. 5.100 / 5. 5.100\\nlibswresample 3. 5.100 / 3. 5.100\\nlibpostproc 55. 5.100 / 55. 5.100\\n'", "cmdline": ["ffmpeg", "-nostats", "-loglevel", "error", "-y", "-f", "rawvideo", "-s:v", "640x640", "-pix_fmt", "rgb24", "-framerate", "150", "-i", "-", "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2", "-vcodec", "libx264", "-pix_fmt", "yuv420p", "-r", "150", "/tmp/tmp7lfej84r/ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots/replay.mp4"]}, "episode": {"r": 205.39999389648438, "l": 1228, "t": 13.29542}}
replay.mp4 CHANGED
Binary files a/replay.mp4 and b/replay.mp4 differ
 
rl_algo_impls/hyperparams/a2c.yml CHANGED
@@ -97,31 +97,35 @@ Walker2DBulletEnv-v0:
 HopperBulletEnv-v0:
   <<: *pybullet-defaults
 
+# Tuned
 CarRacing-v0:
   n_timesteps: !!float 4e6
   env_hyperparams:
-    n_envs: 8
+    n_envs: 16
     frame_stack: 4
     normalize: true
     normalize_kwargs:
       norm_obs: false
       norm_reward: true
   policy_hyperparams:
-    use_sde: true
-    log_std_init: -2
-    init_layers_orthogonal: false
-    activation_fn: relu
+    use_sde: false
+    log_std_init: -1.3502584927786276
+    init_layers_orthogonal: true
+    activation_fn: tanh
     share_features_extractor: false
     cnn_flatten_dim: 256
     hidden_sizes: [256]
   algo_hyperparams:
-    n_steps: 512
-    learning_rate: !!float 1.62e-5
-    gamma: 0.997
-    gae_lambda: 0.975
-    ent_coef: 0
-    sde_sample_freq: 128
-    vf_coef: 0.64
+    n_steps: 16
+    learning_rate: 0.000025630993245026736
+    learning_rate_decay: linear
+    gamma: 0.99957617037542
+    gae_lambda: 0.949455676599436
+    ent_coef: !!float 1.707983205298309e-7
+    vf_coef: 0.10428178193833336
+    max_grad_norm: 0.5406643389792273
+    normalize_advantage: true
+    use_rms_prop: false
 
 _atari: &atari-defaults
   n_timesteps: !!float 1e7
rl_algo_impls/hyperparams/ppo.yml CHANGED
@@ -266,7 +266,7 @@ _microrts_ai: &microrts-ai-defaults
     <<: *microrts-policy-defaults
     cnn_flatten_dim: 256
     actor_head_style: gridnet
-  algo_hyperparams:
+  algo_hyperparams: &microrts-ai-algo-defaults
     <<: *microrts-algo-defaults
     learning_rate: !!float 2.5e-4
     learning_rate_decay: linear
rl_algo_impls/shared/actor/gridnet_decoder.py CHANGED
@@ -57,6 +57,7 @@ class GridnetDecoder(Actor):
                 32, action_vec.sum(), 3, stride=2, padding=1, output_padding=1
             ),
             init_layers_orthogonal=init_layers_orthogonal,
+            std=0.01,
         ),
         Transpose((0, 2, 3, 1)),
     )
rl_algo_impls/shared/encoder/cnn.py CHANGED
@@ -49,7 +49,9 @@ class FlattenedCnnEncoder(CnnEncoder):
         self.cnn = cnn
         self.flattened_dim = cnn_flatten_dim
         with torch.no_grad():
-            cnn_out = cnn(self.preprocess(torch.as_tensor(obs_space.sample())))  # type: ignore
+            cnn_out = torch.flatten(
+                cnn(self.preprocess(torch.as_tensor(obs_space.sample()))), start_dim=1
+            )
         self.fc = nn.Sequential(
             nn.Flatten(),
             layer_init(
@@ -60,7 +62,10 @@ class FlattenedCnnEncoder(CnnEncoder):
         )
 
     def forward(self, obs: torch.Tensor) -> torch.Tensor:
-        return self.fc(self.cnn(super().forward(obs)))
+        x = super().forward(obs)
+        x = self.cnn(x)
+        x = self.fc(x)
+        return x
 
     @property
     def out_dim(self) -> EncoderOutDim:
rl_algo_impls/shared/module/module.py CHANGED
@@ -10,12 +10,15 @@ def mlp(
     output_activation: Type[nn.Module] = nn.Identity,
     init_layers_orthogonal: bool = False,
     final_layer_gain: float = np.sqrt(2),
+    hidden_layer_gain: float = np.sqrt(2),
 ) -> nn.Module:
     layers = []
     for i in range(len(layer_sizes) - 2):
         layers.append(
             layer_init(
-                nn.Linear(layer_sizes[i], layer_sizes[i + 1]), init_layers_orthogonal
+                nn.Linear(layer_sizes[i], layer_sizes[i + 1]),
+                init_layers_orthogonal,
+                std=hidden_layer_gain,
             )
         )
         layers.append(activation())
rl_algo_impls/shared/policy/critic.py CHANGED
@@ -30,6 +30,7 @@ class CriticHead(nn.Module):
                 activation,
                 init_layers_orthogonal=init_layers_orthogonal,
                 final_layer_gain=1.0,
+                hidden_layer_gain=1.0,
             )
         )
         self._fc = nn.Sequential(*seq)
saved_models/{ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots-S2-best → ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots-S3-best}/model.pth RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2507ccb05ded8cefcd96f8bb32c145e6f443d8d2d811aa0a4b8810863d72b9ee
+oid sha256:18a7857bb764911a5696b4ec54840fdf8168823418edc78ded6603ade49bdeb1
 size 3359293
scripts/benchmark.sh CHANGED
@@ -19,13 +19,13 @@ n_jobs="${n_jobs:-6}"
 project_name="${project_name:-rl-algo-impls-benchmarks}"
 seeds="${seeds:-1 2 3}"
 
-DISCRETE_ENVS=(
-    # Basic
+BASIC_ENVS=(
     "CartPole-v1"
     "MountainCar-v0"
     "Acrobot-v1"
     "LunarLander-v2"
-    # Atari
+)
+ATARI_ENVS=(
     "PongNoFrameskip-v4"
     "BreakoutNoFrameskip-v4"
     "SpaceInvadersNoFrameskip-v4"
@@ -75,9 +75,11 @@ for algo in $(echo $algos); do
         algo_envs=${MICRORTS_AI_ENVS[*]}
     elif [ -z "$envs" ]; then
         if [ "$algo" = "dqn" ]; then
-            BENCHMARK_ENVS="${DISCRETE_ENVS[*]}"
+            BENCHMARK_ENVS="${BASIC_ENVS[*]} ${ATARI_ENVS[*]}"
+        elif [ "$algo" = "vpg" ]; then
+            BENCHMARK_ENVS="${BASIC_ENVS[*]} ${BOX_ENVS[*]}"
         else
-            BENCHMARK_ENVS="${DISCRETE_ENVS[*]} ${BOX_ENVS[*]}"
+            BENCHMARK_ENVS="${BASIC_ENVS[*]} ${BOX_ENVS[*]} ${ATARI_ENVS[*]}"
         fi
         algo_envs=${BENCHMARK_ENVS[*]}
     else
scripts/setup.sh CHANGED
@@ -8,4 +8,4 @@ sudo apt install -y default-jdk
 python3 -m pip install --upgrade pip
 pip install --upgrade torch torchvision torchaudio
 
-python -m pip install --upgrade '.[test,procgen,microrts]'
+python -m pip install --upgrade '.[all]'
scripts/tags_benchmark.sh CHANGED
@@ -1 +1,5 @@
-echo "benchmark_$(git rev-parse --short HEAD) host_$(hostname)"
+commit="benchmark_$(git rev-parse --short HEAD)"
+host="host_$(hostname)"
+branch="branch_$(git rev-parse --abbrev-ref HEAD)"
+version="v$(pip show rl_algo_impls | grep Version | sed -e 's#Version:\ ##')"
+echo "$commit $host $branch $version"
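For reference, a hypothetical invocation of the updated script on the benchmark host would echo the same four tags that appear as wandb_tags in the README diff above; the exact values depend on the checked-out commit, hostname, branch, and installed rl_algo_impls version:

```
bash scripts/tags_benchmark.sh
# benchmark_9bd13b7 host_138-2-238-188 branch_main v0.0.8
```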