arxiv:2206.02829

RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

Published on Jun 6, 2022
Abstract

Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when they encounter observation deviations under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness against conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL achieves state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
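
The conservative smoothing idea sketched in the abstract can be illustrated with a short, hypothetical code example. The snippet below is not the authors' implementation; it is a minimal sketch assuming a PyTorch actor-critic setup, where dataset states are perturbed inside a small L-infinity ball, the Q-function and policy are regularized to stay smooth over those perturbed states, and an extra conservative penalty discourages Q-value overestimation on them. All names (QNetwork, conservative_smoothing_losses, epsilon, n_samples) are illustrative assumptions.

```python
# Hypothetical sketch of RORL-style conservative smoothing (not the authors' code).
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Minimal Q(s, a) network, included only to make the sketch runnable."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))


def conservative_smoothing_losses(q_net, policy, states, actions,
                                  epsilon=0.01, n_samples=5):
    """Illustrative smoothing and conservatism terms for a batch of dataset states."""
    with torch.no_grad():
        q_clean = q_net(states, actions)   # Q-values at in-dataset states
        pi_clean = policy(states)          # policy output at in-dataset states

    smooth_q, smooth_pi, conservative = 0.0, 0.0, 0.0
    for _ in range(n_samples):
        # Sample perturbed states in an L-inf ball of radius epsilon around the data.
        noise = (torch.rand_like(states) * 2 - 1) * epsilon
        s_pert = states + noise

        # (1) Value smoothing: keep Q near its value at the clean state.
        q_pert = q_net(s_pert, actions)
        smooth_q = smooth_q + ((q_pert - q_clean) ** 2).mean()

        # (2) Policy smoothing: keep the action near the clean-state action.
        pi_pert = policy(s_pert)
        smooth_pi = smooth_pi + ((pi_pert - pi_clean) ** 2).mean()

        # (3) Extra conservatism: penalize Q-values on perturbed (near-dataset)
        #     states that exceed the clean-state value.
        conservative = conservative + torch.relu(q_pert - q_clean).mean()

    return smooth_q / n_samples, smooth_pi / n_samples, conservative / n_samples


if __name__ == "__main__":
    torch.manual_seed(0)
    state_dim, action_dim, batch = 17, 6, 32
    q_net = QNetwork(state_dim, action_dim)
    policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                           nn.Linear(64, action_dim), nn.Tanh())
    states = torch.randn(batch, state_dim)
    actions = torch.rand(batch, action_dim) * 2 - 1
    lq, lpi, lc = conservative_smoothing_losses(q_net, policy, states, actions)
    print(f"value smoothing: {lq.item():.4f}, "
          f"policy smoothing: {lpi.item():.4f}, "
          f"conservatism: {lc.item():.4f}")
```

In a full training loop, terms like these would be added with tuning weights to a standard offline actor-critic objective; the paper's actual formulation, including how perturbations are selected, differs in detail from this uniform-sampling sketch.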

