---
license: cdla-permissive-2.0
datasets:
- microsoft/mocapact-data
---
# MoCapAct Model Zoo
Control of simulated humanoid characters is a challenging benchmark for sequential decision-making methods, as it assesses a policy’s ability to drive an inherently unstable, discontinuous, and high-dimensional physical system. Motion capture (MoCap) data can be very helpful in learning sophisticated locomotion policies by teaching a humanoid agent low-level skills (e.g., standing, walking, and running) that can then be used to generate high-level behaviors. However, even with MoCap data, controlling simulated humanoids remains very hard, because this data offers only kinematic information. Finding physical control inputs to realize the MoCap-demonstrated motions has required methods like reinforcement learning that need large amounts of compute, which has effectively served as a barrier to entry for this exciting research direction.

In an effort to broaden participation and facilitate evaluation of ideas in humanoid locomotion research, we are releasing MoCapAct (Motion Capture with Actions): a library of high-quality pre-trained agents that can track over three hours of MoCap data for a simulated humanoid in the `dm_control` physics-based environment, along with rollouts from these experts that contain proprioceptive observations and actions. MoCapAct allows researchers to sidestep the computationally intensive task of training low-level control policies from MoCap data and instead use MoCapAct's expert agents and demonstrations to learn advanced locomotion behaviors. It also allows our low-level policies to be improved upon, using them and their demonstration data as a starting point.

In our work, we use MoCapAct to train a single hierarchical policy capable of tracking the entire MoCap dataset within `dm_control`.
We then re-use the learned low-level component to efficiently learn other high-level tasks.
Finally, we use MoCapAct to train an autoregressive GPT model and show that it can perform natural motion completion given a motion prompt.
We encourage the reader to visit our [project website](https://microsoft.github.io/MoCapAct/) to see videos of our results as well as get links to our paper and code.

## Model Zoo Structure

The file structure of the model zoo is:
```
β”œβ”€β”€ all
β”‚   └── experts
β”‚       β”œβ”€β”€ experts_1.tar.gz
β”‚       β”œβ”€β”€ experts_2.tar.gz
β”‚       ...
β”‚       └── experts_8.tar.gz
β”‚
β”œβ”€β”€ sample
β”‚   └── experts.tar.gz
β”‚
β”œβ”€β”€ multiclip_policy.tar.gz
β”‚   β”œβ”€β”€ full_dataset
β”‚   └── locomotion_dataset
β”‚
β”œβ”€β”€ transfer.tar.gz
β”‚   β”œβ”€β”€ go_to_target
β”‚   β”‚   β”œβ”€β”€ general_low_level
β”‚   β”‚   β”œβ”€β”€ locomotion_low_level
β”‚   β”‚   └── no_low_level
β”‚   β”‚
β”‚   └── velocity_control
β”‚       β”œβ”€β”€ general_low_level
β”‚       β”œβ”€β”€ locomotion_low_level
β”‚       └── no_low_level
β”‚
β”œβ”€β”€ gpt.ckpt
β”‚
└── videos
    β”œβ”€β”€ full_clip_videos.tar.gz
    └── snippet_videos.tar.gz
```
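The tarballs can be unpacked with standard tools. A minimal sketch, assuming the files have been downloaded to the current directory:
```bash
# Unpack the sample experts.
tar -xzvf sample/experts.tar.gz

# Unpack every shard of the full expert collection.
for f in all/experts/experts_*.tar.gz; do
  tar -xzvf "$f"
done

# Unpack the multi-clip and transfer policies.
tar -xzvf multiclip_policy.tar.gz
tar -xzvf transfer.tar.gz
```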

## Experts Tarball Files
The expert tarball files have the following structure:
- `all/experts/experts_*.tar.gz`: Contains all of the clip snippet experts. Due to file size limitations, we split the experts among multiple tarball files.
- `sample/experts.tar.gz`: Contains the clip snippet experts used to run the examples on the [dataset website](https://microsoft.github.io/MoCapAct/).

The expert structure is detailed in Appendix A.1 of the paper as well as https://github.com/microsoft/MoCapAct#description.

An expert can be loaded and rolled out in Python as in the following example:
```python
from mocapact import observables
from mocapact.sb3 import utils
from mocapact.envs import tracking
from dm_control.locomotion.tasks.reference_pose import types

# Load the expert for the CMU_083_33 snippet spanning steps 0-194.
expert_path = "/path/to/experts/CMU_083_33/CMU_083_33-0-194/eval_rsi/model"
expert = utils.load_policy(expert_path, observables.TIME_INDEX_OBSERVABLES)

# Build a tracking environment for the same clip snippet.
dataset = types.ClipCollection(ids=['CMU_083_33'], start_steps=[0], end_steps=[194])
env = tracking.MocapTrackingGymEnv(dataset)

# Roll out the expert deterministically, printing the per-step reward.
obs, done = env.reset(), False
while not done:
    action, _ = expert.predict(obs, deterministic=True)
    obs, rew, done, _ = env.step(action)
    print(rew)
```

Alternatively, an expert can be rolled out from the command line:
```bash
python -m mocapact.clip_expert.evaluate \
  --policy_root /path/to/experts/CMU_016_22/CMU_016_22-0-82/eval_rsi/model \
  --act_noise 0 \
  --ghost_offset 1 \
  --always_init_at_clip_start
```
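Here `--act_noise 0` removes the Gaussian action noise so the expert acts deterministically, and `--ghost_offset 1` renders a "ghost" character that plays back the reference clip at an offset of one unit for visual comparison.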

## GPT
The GPT policy is contained in `gpt.ckpt` and can be loaded using PyTorch Lightning:
```python
from mocapact.distillation import model
policy = model.GPTPolicy.load_from_checkpoint('/path/to/gpt.ckpt', map_location='cpu')
```
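Since the checkpoint is a standard PyTorch Lightning module, the usual PyTorch inference setup applies before rolling the policy out:
```python
import torch

policy.eval()  # put the module in inference mode (disables dropout)
policy.to("cuda" if torch.cuda.is_available() else "cpu")
```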
This policy can be used with `mocapact/distillation/motion_completion.py`, as in the following example:
```bash
python -m mocapact.distillation.motion_completion \
  --policy_path /path/to/gpt.ckpt \
  --nodeterministic \
  --ghost_offset 1 \
  --expert_root /path/to/experts/CMU_016_25 \
  --max_steps 500 \
  --always_init_at_clip_start \
  --prompt_length 32 \
  --min_steps 32 \
  --device cuda \
  --clip_snippet CMU_016_25
```

## Multi-Clip Policy
The `multiclip_policy.tar.gz` file contains two policies:
- `full_dataset`: Trained on the entire MoCapAct dataset
- `locomotion_dataset`: Trained on the `locomotion_small` portion of the MoCapAct dataset

Taking `full_dataset` as an example, a multi-clip policy can be loaded using PyTorch Lightning:
```python
from mocapact.distillation import model
policy = model.NpmpPolicy.load_from_checkpoint('/path/to/multiclip_policy/full_dataset/model/model.ckpt', map_location='cpu')
```
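The loaded policy can be rolled out in the tracking environment much like a clip expert. A sketch, assuming the distilled policy exposes the experts' SB3-style `predict` interface along with an `initial_state` method for its latent skill embedding (see the MoCapAct repository for the exact interface):
```python
from mocapact.envs import tracking
from dm_control.locomotion.tasks.reference_pose import types

# Track a clip with the multi-clip policy loaded above.
dataset = types.ClipCollection(ids=['CMU_016_22'])
env = tracking.MocapTrackingGymEnv(dataset)

# The policy is recurrent in its latent embedding, so we thread a state
# through predict (assumed interface).
obs, done = env.reset(), False
state = policy.initial_state(deterministic=False)
while not done:
    action, state = policy.predict(obs, state=state, deterministic=False)
    obs, rew, done, _ = env.step(action)
```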
The policy can be used with `mocapact/distillation/evaluate.py`, as in the following example:
```bash
python -m mocapact.distillation.evaluate \
  --policy_path /path/to/multiclip_policy/full_dataset/model/model.ckpt \
  --act_noise 0 \
  --ghost_offset 1 \
  --always_init_at_clip_start \
  --termination_error_threshold 10 \
  --clip_snippets CMU_016_22
```
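The large `--termination_error_threshold` effectively disables early termination on tracking error, so the policy is rolled out over the entire snippet even when it deviates from the reference.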

## Transfer
The `transfer.tar.gz` file contains policies for downstream tasks. The main difference between the contained folders is which low-level policy is used:
- `general_low_level`: Low-level policy comes from `multiclip_policy/full_dataset`
- `locomotion_low_level`: Low-level policy comes from `multiclip_policy/locomotion_dataset`
- `no_low_level`: No low-level policy used

The policy structure is as follows:
```
β”œβ”€β”€ best_model.zip
β”œβ”€β”€ low_level_policy.ckpt
└── vecnormalize.pkl
```
- `low_level_policy.ckpt` (only present in `general_low_level` and `locomotion_low_level`) contains the low-level policy and is loaded with PyTorch Lightning.
- `best_model.zip` contains the task policy parameters and is loaded with Stable-Baselines3.
- `vecnormalize.pkl` contains the observation normalizer and is also loaded with Stable-Baselines3.
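As a sketch of how the Stable-Baselines3 pieces fit together (the `PPO` class is an assumption here; check the paper for the algorithm actually used to train the task policies):
```python
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecNormalize

# Task-policy parameters (an SB3 zip archive).
model = PPO.load("/path/to/transfer/go_to_target/general_low_level/best_model.zip")

# The observation normalizer wraps a vectorized task environment `venv`,
# whose construction is task-specific and omitted here:
# venv = VecNormalize.load("/path/to/transfer/go_to_target/general_low_level/vecnormalize.pkl", venv)
# venv.training = False  # freeze running statistics at evaluation time
```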

The policy can be used with `mocapact/transfer/evaluate.py`, as in the following example:
```bash
python -m mocapact.transfer.evaluate \
  --model_root /path/to/transfer/go_to_target/general_low_level \
  --task /path/to/mocapact/transfer/config.py:go_to_target
```

## MoCap Videos
There are two tarball files containing videos of the MoCap clips in the dataset:
- `full_clip_videos.tar.gz` contains videos of the full MoCap clips.
- `snippet_videos.tar.gz` contains videos of the snippets that were used to train the experts.

Note that these videos are playbacks of the MoCap clips themselves, not rollouts of the corresponding experts.