Meta Motivo M

Meta Motivo is a behavioral foundation model pre-trained with a novel unsupervised reinforcement learning algorithm to control the movements of a complex virtual humanoid agent. At test time, our model can be prompted to solve unseen tasks such as motion tracking, pose reaching, and reward optimization without any additional learning or fine-tuning.

Meta Motivo M is our largest and most performant model. It is pre-trained as described in the paper "Zero-shot Whole-Body Humanoid Control via Behavioral Foundation Models", and it can be interactively tested in our demo.

Model Developer: Meta

Model Details

Meta Motivo is composed of multiple networks:

  • forward net F(s, a, z)
  • backward net B(s)
  • actor net π(s, z)
  • discriminator net D(s, z)
  • critic net Q(s, a, z)

Network architectures

Forward, actor, and critic. These networks are MLPs built from a sequence of residual blocks similar to those used in modern transformer architectures. Each residual block consists of a layernorm, a linear layer with 2048 hidden units, a Mish activation, and a residual connection. Each network has two initial "embedding layers": one processes (s, z) and the other processes s alone. The second embedding layer has half the hidden units of the first, and their outputs are concatenated and fed into the main MLP. We use 2 residual blocks for the embedding layers and 12 residual blocks for the main MLP. The actor network outputs the mean of a Gaussian distribution with fixed standard deviation, while the forward and critic networks output a d-dimensional vector and a scalar, respectively. The forward and critic networks each use an ensemble of two networks.
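
A minimal PyTorch sketch of this residual-block MLP, assuming the two embedding streams and the main trunk are wired as described above; the class names, the placement of the output head, and the omitted two-network ensemble are illustrative, not the released implementation:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # layernorm -> linear -> Mish, wrapped in a residual connection
    def __init__(self, dim: int):
        super().__init__()
        self.block = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.Mish())

    def forward(self, x):
        return x + self.block(x)

class ResidualMLP(nn.Module):
    # Two embedding streams (one for (s, z), one for s alone, at half width),
    # concatenated and fed into a deeper main trunk.
    def __init__(self, s_dim, z_dim, out_dim, hidden=2048, n_embed=2, n_main=12):
        super().__init__()
        self.embed_sz = nn.Sequential(
            nn.Linear(s_dim + z_dim, hidden),
            *[ResidualBlock(hidden) for _ in range(n_embed)],
        )
        self.embed_s = nn.Sequential(
            nn.Linear(s_dim, hidden // 2),
            *[ResidualBlock(hidden // 2) for _ in range(n_embed)],
        )
        self.main = nn.Sequential(
            nn.Linear(hidden + hidden // 2, hidden),
            *[ResidualBlock(hidden) for _ in range(n_main)],
            nn.Linear(hidden, out_dim),
        )

    def forward(self, s, z):
        h = torch.cat([self.embed_sz(torch.cat([s, z], dim=-1)), self.embed_s(s)], dim=-1)
        return self.main(h)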

Backward. The backward map is a simple MLP composed of a layernorm, a linear layer with 256 hidden units, a tanh activation, and a final linear layer that outputs a d-dimensional vector, which is then L2-normalized.

Discriminator. The discriminator is an MLP with 3 hidden layers of 1024 units and ReLU activations, except for the first hidden layer, which uses a layernorm followed by a tanh. It takes as input a state observation s and a latent variable z, and has a sigmoid unit at the output.
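
A corresponding sketch of the backward and discriminator networks under the same assumptions (d and the input dimensions are placeholders; layer layout follows the two paragraphs above):

import torch
import torch.nn as nn
import torch.nn.functional as F

class BackwardMap(nn.Module):
    # layernorm -> linear(256) -> tanh -> linear(d), followed by L2 normalization
    def __init__(self, s_dim, d):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(s_dim),
            nn.Linear(s_dim, 256),
            nn.Tanh(),
            nn.Linear(256, d),
        )

    def forward(self, s):
        return F.normalize(self.net(s), dim=-1)

class Discriminator(nn.Module):
    # 3 hidden layers of 1024 units; first hidden layer uses layernorm + tanh,
    # the others use ReLU; sigmoid output on (s, z)
    def __init__(self, s_dim, z_dim, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + z_dim, hidden),
            nn.LayerNorm(hidden),
            nn.Tanh(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))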

See the config.json file for more details.

How to use

> pip install "metamotivo[all] @ git+https://github.com/facebookresearch/metamotivo.git"

and then

from metamotivo.fb_cpr.huggingface import FBcprModel

model = FBcprModel.from_pretrained("facebook/metamotivo-M-1")
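
A minimal rollout sketch: sample a latent prompt z and query an action for a dummy observation. The sample_z and act helpers follow the repository README, but treat the exact signatures and the observation size as assumptions and check the repo and config.json for the authoritative example.

import torch
from metamotivo.fb_cpr.huggingface import FBcprModel

model = FBcprModel.from_pretrained("facebook/metamotivo-M-1")

# Prompt the model with a random latent z and act greedily on a dummy observation.
z = model.sample_z(1)
obs = torch.zeros(1, 358)  # 358 is an assumed observation size; see config.json / the humanoid env
action = model.act(obs, z, mean=True)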

Citation

If you find our code useful for your research, please consider citing:

@article{tirinzoni2024metamotivo,
    title={Zero-shot Whole-Body Humanoid Control via Behavioral Foundation Models},
    author={Tirinzoni, Andrea and Touati, Ahmed and Farebrother, Jesse and Guzek, Mateusz and Kanervisto, Anssi and Xu, Yingchen and Lazaric, Alessandro and Pirotta, Matteo},
    year={2024}
}

License

Meta Motivo is currently licensed under CC BY-NC 4.0.
