arxiv:2203.03580

The Unsurprising Effectiveness of Pre-Trained Vision Models for Control

Published on Mar 7, 2022

Authors:

Abstract

Recent years have seen the emergence of pre-trained representations as a powerful abstraction for AI applications in computer vision, natural language, and speech. However, policy learning for control is still dominated by a tabula-rasa learning paradigm, with visuo-motor policies often trained from scratch using data from deployment environments. In this context, we revisit and study the role of pre-trained visual representations for control, and in particular representations trained on large-scale computer vision datasets. Through extensive empirical evaluation in diverse control domains (Habitat, DeepMind Control, Adroit, Franka Kitchen), we isolate and study the importance of different representation training methods, data augmentations, and feature hierarchies. Overall, we find that pre-trained visual representations can be competitive or even better than ground-truth state representations to train control policies. This is in spite of using only out-of-domain data from standard vision datasets, without any in-domain data from the deployment environments. Source code and more at https://sites.google.com/view/pvr-control.
