Commit 90325dc
Parent(s): d61f7b9
PPO playing MicrortsDefeatCoacAIShaped-v3 from https://github.com/sgoodfriend/rl-algo-impls/tree/9bd13b7f398a66b2a9cc00bc552f55d9d665a570

Files changed:
- README.md +16 -14
- pyproject.toml +6 -1
- replay.meta.json +1 -1
- replay.mp4 +0 -0
- rl_algo_impls/hyperparams/a2c.yml +16 -12
- rl_algo_impls/hyperparams/ppo.yml +1 -1
- rl_algo_impls/shared/actor/gridnet_decoder.py +1 -0
- rl_algo_impls/shared/encoder/cnn.py +7 -2
- rl_algo_impls/shared/module/module.py +4 -1
- rl_algo_impls/shared/policy/critic.py +1 -0
- saved_models/{ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots-S2-best → ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots-S3-best}/model.pth +1 -1
- scripts/benchmark.sh +7 -5
- scripts/setup.sh +1 -1
- scripts/tags_benchmark.sh +5 -1
README.md CHANGED
@@ -10,7 +10,7 @@ model-index:
   results:
   - metrics:
     - type: mean_reward
-      value:
+      value: 201.7 +/- 15.28
       name: mean_reward
     task:
       type: reinforcement-learning
@@ -23,17 +23,17 @@ model-index:
 
 This is a trained model of a **PPO** agent playing **MicrortsDefeatCoacAIShaped-v3** using the [/sgoodfriend/rl-algo-impls](https://github.com/sgoodfriend/rl-algo-impls) repo.
 
-All models trained at this commit can be found at https://api.wandb.ai/links/sgoodfriend/
+All models trained at this commit can be found at https://api.wandb.ai/links/sgoodfriend/83rqdpmp.
 
 ## Training Results
 
-This model was trained from 3 trainings of **PPO** agents using different initial seeds. These agents were trained by checking out [
+This model was trained from 3 trainings of **PPO** agents using different initial seeds. These agents were trained by checking out [9bd13b7](https://github.com/sgoodfriend/rl-algo-impls/tree/9bd13b7f398a66b2a9cc00bc552f55d9d665a570). The best and last models were kept from each training. This submission has loaded the best models from each training, reevaluated them, and selected the best model from these latest evaluations (mean - std).
 
 | algo | env                           | seed | reward_mean | reward_std | eval_episodes | best | wandb_url |
 |:-----|:------------------------------|-----:|------------:|-----------:|--------------:|:-----|:----------|
-| ppo  | MicrortsDefeatCoacAIShaped-v3 |    1 |
-| ppo  | MicrortsDefeatCoacAIShaped-v3 |    2 |
-| ppo  | MicrortsDefeatCoacAIShaped-v3 |    3 |
+| ppo  | MicrortsDefeatCoacAIShaped-v3 |    1 |       157.8 |     23.608 |            24 |      | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/2a90owyc) |
+| ppo  | MicrortsDefeatCoacAIShaped-v3 |    2 |     180.233 |    25.6048 |            24 |      | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/p1vlxuno) |
+| ppo  | MicrortsDefeatCoacAIShaped-v3 |    3 |       201.7 |    15.2797 |            24 | *    | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/o8o6klqw) |
 
 
 ### Prerequisites: Weights & Biases (WandB)
@@ -53,10 +53,10 @@ login`.
 Note: While the model state dictionary and hyperparameters are saved, the latest
 implementation could be sufficiently different to not be able to reproduce similar
 results. You might need to checkout the commit the agent was trained on:
-[
+[9bd13b7](https://github.com/sgoodfriend/rl-algo-impls/tree/9bd13b7f398a66b2a9cc00bc552f55d9d665a570).
 ```
 # Downloads the model, sets hyperparameters, and runs agent for 3 episodes
-python enjoy.py --wandb-run-path=sgoodfriend/rl-algo-impls-benchmarks/
+python enjoy.py --wandb-run-path=sgoodfriend/rl-algo-impls-benchmarks/o8o6klqw
 ```
 
 Setup hasn't been completely worked out yet, so you might be best served by using Google
@@ -68,11 +68,11 @@ notebook.
 
 ## Training
 If you want the highest chance to reproduce these results, you'll want to checkout the
-commit the agent was trained on: [
+commit the agent was trained on: [9bd13b7](https://github.com/sgoodfriend/rl-algo-impls/tree/9bd13b7f398a66b2a9cc00bc552f55d9d665a570). While
 training is deterministic, different hardware will give different results.
 
 ```
-python train.py --algo ppo --env MicrortsDefeatCoacAIShaped-v3 --seed
+python train.py --algo ppo --env MicrortsDefeatCoacAIShaped-v3 --seed 3
 ```
 
 Setup hasn't been completely worked out yet, so you might be best served by using Google
@@ -83,7 +83,7 @@ notebook.
 
 
 ## Benchmarking (with Lambda Labs instance)
-This and other models from https://api.wandb.ai/links/sgoodfriend/
+This and other models from https://api.wandb.ai/links/sgoodfriend/83rqdpmp were generated by running a script on a Lambda
 Labs instance. In a Lambda Labs instance terminal:
 ```
 git clone git@github.com:sgoodfriend/rl-algo-impls.git
@@ -154,13 +154,15 @@ policy_hyperparams:
   cnn_style: gridnet_encoder
   v_hidden_sizes:
   - 128
-seed:
+seed: 3
 use_deterministic_algorithms: true
 wandb_entity: null
 wandb_group: null
 wandb_project_name: rl-algo-impls-benchmarks
 wandb_tags:
--
-- host_138-2-
+- benchmark_9bd13b7
+- host_138-2-238-188
+- branch_main
+- v0.0.8
 
 ```
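As a sanity check on the "(mean - std)" selection rule the updated README describes, here is a small sketch in Python using only the values from the results table above:

```
# Best-model selection by (mean - std), using the reward stats from the table.
runs = {
    1: (157.8, 23.608),    # seed: (reward_mean, reward_std)
    2: (180.233, 25.6048),
    3: (201.7, 15.2797),
}
scores = {seed: mean - std for seed, (mean, std) in runs.items()}
# {1: 134.192, 2: 154.6282, 3: 186.4203}
best_seed = max(scores, key=scores.get)
assert best_seed == 3  # matches the starred row and the S3-best model rename below
```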
pyproject.toml CHANGED
@@ -1,6 +1,6 @@
 [project]
 name = "rl_algo_impls"
-version = "0.0.
+version = "0.0.8"
 description = "Implementations of reinforcement learning algorithms"
 authors = [
     {name = "Scott Goodfriend", email = "goodfriend.scott@gmail.com"},
@@ -68,6 +68,11 @@ jupyter = [
     "jupyter",
     "notebook"
 ]
+all = [
+    "rl-algo-impls[test]",
+    "rl-algo-impls[procgen]",
+    "rl-algo-impls[microrts]",
+]
 
 [project.urls]
 "Homepage" = "https://github.com/sgoodfriend/rl-algo-impls"
replay.meta.json CHANGED
@@ -1 +1 @@
-{"content_type": "video/mp4", "encoder_version": {"backend": "ffmpeg", "version": "b'ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers\\nbuilt with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)\\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\\nlibavutil 56. 31.100 / 56. 31.100\\nlibavcodec 58. 54.100 / 58. 54.100\\nlibavformat 58. 29.100 / 58. 29.100\\nlibavdevice 58. 8.100 / 58. 8.100\\nlibavfilter 7. 57.100 / 7. 57.100\\nlibavresample 4. 0. 0 / 4. 0. 0\\nlibswscale 5. 5.100 / 5. 5.100\\nlibswresample 3. 5.100 / 3. 5.100\\nlibpostproc 55. 5.100 / 55. 5.100\\n'", "cmdline": ["ffmpeg", "-nostats", "-loglevel", "error", "-y", "-f", "rawvideo", "-s:v", "640x640", "-pix_fmt", "rgb24", "-framerate", "150", "-i", "-", "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2", "-vcodec", "libx264", "-pix_fmt", "yuv420p", "-r", "150", "/tmp/
+{"content_type": "video/mp4", "encoder_version": {"backend": "ffmpeg", "version": "b'ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers\\nbuilt with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)\\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\\nlibavutil 56. 31.100 / 56. 31.100\\nlibavcodec 58. 54.100 / 58. 54.100\\nlibavformat 58. 29.100 / 58. 29.100\\nlibavdevice 58. 8.100 / 58. 8.100\\nlibavfilter 7. 57.100 / 7. 57.100\\nlibavresample 4. 0. 0 / 4. 0. 0\\nlibswscale 5. 5.100 / 5. 5.100\\nlibswresample 3. 5.100 / 3. 5.100\\nlibpostproc 55. 5.100 / 55. 5.100\\n'", "cmdline": ["ffmpeg", "-nostats", "-loglevel", "error", "-y", "-f", "rawvideo", "-s:v", "640x640", "-pix_fmt", "rgb24", "-framerate", "150", "-i", "-", "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2", "-vcodec", "libx264", "-pix_fmt", "yuv420p", "-r", "150", "/tmp/tmp7lfej84r/ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots/replay.mp4"]}, "episode": {"r": 205.39999389648438, "l": 1228, "t": 13.29542}}
replay.mp4 CHANGED
Binary files a/replay.mp4 and b/replay.mp4 differ
rl_algo_impls/hyperparams/a2c.yml CHANGED
@@ -97,31 +97,35 @@ Walker2DBulletEnv-v0:
 HopperBulletEnv-v0:
   <<: *pybullet-defaults
 
+# Tuned
 CarRacing-v0:
   n_timesteps: !!float 4e6
   env_hyperparams:
-    n_envs:
+    n_envs: 16
     frame_stack: 4
    normalize: true
     normalize_kwargs:
       norm_obs: false
       norm_reward: true
   policy_hyperparams:
-    use_sde:
-    log_std_init: -
-    init_layers_orthogonal:
-    activation_fn:
+    use_sde: false
+    log_std_init: -1.3502584927786276
+    init_layers_orthogonal: true
+    activation_fn: tanh
     share_features_extractor: false
     cnn_flatten_dim: 256
     hidden_sizes: [256]
   algo_hyperparams:
-    n_steps:
-    learning_rate:
-
-
-
-
-    vf_coef: 0.
+    n_steps: 16
+    learning_rate: 0.000025630993245026736
+    learning_rate_decay: linear
+    gamma: 0.99957617037542
+    gae_lambda: 0.949455676599436
+    ent_coef: !!float 1.707983205298309e-7
+    vf_coef: 0.10428178193833336
+    max_grad_norm: 0.5406643389792273
+    normalize_advantage: true
+    use_rms_prop: false
 
 _atari: &atari-defaults
   n_timesteps: !!float 1e7
rl_algo_impls/hyperparams/ppo.yml CHANGED
@@ -266,7 +266,7 @@ _microrts_ai: &microrts-ai-defaults
     <<: *microrts-policy-defaults
     cnn_flatten_dim: 256
     actor_head_style: gridnet
-  algo_hyperparams:
+  algo_hyperparams: &microrts-ai-algo-defaults
     <<: *microrts-algo-defaults
     learning_rate: !!float 2.5e-4
     learning_rate_decay: linear
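The only change here is attaching a YAML anchor to `algo_hyperparams`, so other entries in the file can merge these values and override individual keys. A minimal PyYAML illustration of the mechanism (the `parent`/`child` entries are made up for this sketch, not taken from ppo.yml):

```
import yaml

# YAML anchors (&name) plus merge keys (<<: *name) let one mapping inherit
# another's keys while overriding selectively -- what the diff above enables
# for the microrts algo hyperparams.
doc = """
parent:
  algo_hyperparams: &microrts-ai-algo-defaults
    learning_rate: !!float 2.5e-4
    learning_rate_decay: linear
child:
  algo_hyperparams:
    <<: *microrts-ai-algo-defaults
    learning_rate: !!float 1.0e-4  # override; decay is inherited
"""
print(yaml.safe_load(doc)["child"]["algo_hyperparams"])
# {'learning_rate': 0.0001, 'learning_rate_decay': 'linear'}
```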
rl_algo_impls/shared/actor/gridnet_decoder.py CHANGED
@@ -57,6 +57,7 @@ class GridnetDecoder(Actor):
                 32, action_vec.sum(), 3, stride=2, padding=1, output_padding=1
             ),
             init_layers_orthogonal=init_layers_orthogonal,
+            std=0.01,
         ),
         Transpose((0, 2, 3, 1)),
     )
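The added `std=0.01` shrinks the initial weights of the decoder's output layer. Assuming `layer_init` applies orthogonal initialization scaled by `std` (the common PPO convention; the actual helper lives in rl_algo_impls/shared/module/module.py), the effect is near-zero initial logits and therefore a near-uniform starting policy:

```
import torch
import torch.nn as nn

# Hypothetical stand-in for the decoder's output layer: tiny-gain orthogonal
# init keeps initial logits near zero, so softmax over them is near-uniform.
head = nn.Linear(64, 7)  # 7 = illustrative number of per-cell action choices
nn.init.orthogonal_(head.weight, gain=0.01)
nn.init.constant_(head.bias, 0.0)
logits = head(torch.randn(8, 64))
probs = torch.softmax(logits, dim=-1)  # every entry close to 1/7
```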
rl_algo_impls/shared/encoder/cnn.py CHANGED
@@ -49,7 +49,9 @@ class FlattenedCnnEncoder(CnnEncoder):
         self.cnn = cnn
         self.flattened_dim = cnn_flatten_dim
         with torch.no_grad():
-            cnn_out =
+            cnn_out = torch.flatten(
+                cnn(self.preprocess(torch.as_tensor(obs_space.sample()))), start_dim=1
+            )
         self.fc = nn.Sequential(
             nn.Flatten(),
             layer_init(
@@ -60,7 +62,10 @@ class FlattenedCnnEncoder(CnnEncoder):
         )
 
     def forward(self, obs: torch.Tensor) -> torch.Tensor:
-
+        x = super().forward(obs)
+        x = self.cnn(x)
+        x = self.fc(x)
+        return x
 
     @property
     def out_dim(self) -> EncoderOutDim:
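The `cnn_out` computation uses the standard trick for sizing the fully connected layer after a CNN: run one sample observation through the network and measure the flattened width instead of deriving it by hand. A self-contained sketch of the pattern (the CNN and observation shapes here are illustrative, not MicroRTS's):

```
import numpy as np
import torch
import torch.nn as nn
from gym import spaces

cnn = nn.Sequential(
    nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
)
obs_space = spaces.Box(low=0, high=255, shape=(3, 84, 84), dtype=np.uint8)
with torch.no_grad():
    # One dummy forward pass tells us the flattened feature width.
    dummy = torch.as_tensor(obs_space.sample()).unsqueeze(0).float()
    cnn_out = torch.flatten(cnn(dummy), start_dim=1)
fc = nn.Linear(cnn_out.shape[1], 256)  # 256 mirrors cnn_flatten_dim above
```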
rl_algo_impls/shared/module/module.py CHANGED
@@ -10,12 +10,15 @@ def mlp(
     output_activation: Type[nn.Module] = nn.Identity,
     init_layers_orthogonal: bool = False,
     final_layer_gain: float = np.sqrt(2),
+    hidden_layer_gain: float = np.sqrt(2),
 ) -> nn.Module:
     layers = []
     for i in range(len(layer_sizes) - 2):
         layers.append(
             layer_init(
-                nn.Linear(layer_sizes[i], layer_sizes[i + 1]),
+                nn.Linear(layer_sizes[i], layer_sizes[i + 1]),
+                init_layers_orthogonal,
+                std=hidden_layer_gain,
             )
         )
         layers.append(activation())
rl_algo_impls/shared/policy/critic.py CHANGED
@@ -30,6 +30,7 @@ class CriticHead(nn.Module):
                 activation,
                 init_layers_orthogonal=init_layers_orthogonal,
                 final_layer_gain=1.0,
+                hidden_layer_gain=1.0,
             )
         )
         self._fc = nn.Sequential(*seq)
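The module.py and critic.py changes are one feature: `mlp` gains a `hidden_layer_gain` parameter that is threaded into `layer_init` for hidden layers, and `CriticHead` pins it to 1.0 so the value head's hidden layers are initialized with gain 1.0 rather than the default √2. A sketch of what `layer_init` likely does, assuming the common PPO-style orthogonal scheme (the real implementation is in rl_algo_impls/shared/module/module.py and is not shown in this diff):

```
import numpy as np
import torch.nn as nn

def layer_init(
    layer: nn.Linear,
    init_layers_orthogonal: bool = False,
    std: float = np.sqrt(2),
) -> nn.Linear:
    # Orthogonal weights scaled by `std`, zero biases -- the usual PPO init.
    # With std=1.0 (as CriticHead now passes), hidden activations start with
    # smaller magnitude than under the default sqrt(2) gain.
    if init_layers_orthogonal:
        nn.init.orthogonal_(layer.weight, gain=std)
        nn.init.constant_(layer.bias, 0.0)
    return layer
```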
saved_models/{ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots-S2-best → ppo-enc-dec-MicrortsDefeatCoacAIShaped-v3-diverseBots-S3-best}/model.pth RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:18a7857bb764911a5696b4ec54840fdf8168823418edc78ded6603ade49bdeb1
 size 3359293
scripts/benchmark.sh CHANGED
@@ -19,13 +19,13 @@ n_jobs="${n_jobs:-6}"
 project_name="${project_name:-rl-algo-impls-benchmarks}"
 seeds="${seeds:-1 2 3}"
 
-
-  # Basic
+BASIC_ENVS=(
   "CartPole-v1"
   "MountainCar-v0"
   "Acrobot-v1"
   "LunarLander-v2"
-
+)
+ATARI_ENVS=(
   "PongNoFrameskip-v4"
   "BreakoutNoFrameskip-v4"
   "SpaceInvadersNoFrameskip-v4"
@@ -75,9 +75,11 @@ for algo in $(echo $algos); do
     algo_envs=${MICRORTS_AI_ENVS[*]}
   elif [ -z "$envs" ]; then
     if [ "$algo" = "dqn" ]; then
-      BENCHMARK_ENVS="${
+      BENCHMARK_ENVS="${BASIC_ENVS[*]} ${ATARI_ENVS[*]}"
+    elif [ "$algo" = "vpg" ]; then
+      BENCHMARK_ENVS="${BASIC_ENVS[*]} ${BOX_ENVS[*]}"
     else
-      BENCHMARK_ENVS="${
+      BENCHMARK_ENVS="${BASIC_ENVS[*]} ${BOX_ENVS[*]} ${ATARI_ENVS[*]}"
     fi
     algo_envs=${BENCHMARK_ENVS[*]}
   else
scripts/setup.sh CHANGED
@@ -8,4 +8,4 @@ sudo apt install -y default-jdk
 python3 -m pip install --upgrade pip
 pip install --upgrade torch torchvision torchaudio
 
-python -m pip install --upgrade '.[
+python -m pip install --upgrade '.[all]'
scripts/tags_benchmark.sh CHANGED
@@ -1 +1,5 @@
-
+commit="benchmark_$(git rev-parse --short HEAD)"
+host="host_$(hostname)"
+branch="branch_$(git rev-parse --abbrev-ref HEAD)"
+version="v$(pip show rl_algo_impls | grep Version | sed -e 's#Version:\ ##')"
+echo "$commit $host $branch $version"
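Run from the repo root, this script prints the four space-separated tags that appear as `wandb_tags` in the saved run config above. For the run recorded in this README, the output would look like:

```
benchmark_9bd13b7 host_138-2-238-188 branch_main v0.0.8
```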