arxiv:2310.19805

Sample Efficient Reward Augmentation in offline-to-online Reinforcement Learning

Published on Oct 7, 2023

Authors:

Abstract

Offline-to-online RL can make full use of pre-collected offline datasets to initialize policies, resulting in higher sample efficiency and better performance compared to only using online algorithms alone for policy training. However, direct fine-tuning of the pre-trained policy tends to result in sub-optimal performance. A primary reason is that conservative offline RL methods diminish the agent's capability of exploration, thereby impacting online fine-tuning performance. To encourage agent's exploration during online fine-tuning and enhance the overall online fine-tuning performance, we propose a generalized reward augmentation method called Sample Efficient Reward Augmentation (SERA). Specifically, SERA encourages agent to explore by computing Q conditioned entropy as intrinsic reward. The advantage of SERA is that it can extensively utilize offline pre-trained Q to encourage agent uniformly coverage of state space while considering the imbalance between the distributions of high-value and low-value states. Additionally, SERA can be effortlessly plugged into various RL algorithms to improve online fine-tuning and ensure sustained asymptotic improvement. Moreover, extensive experimental results demonstrate that when conducting offline-to-online problems, SERA consistently and effectively enhances the performance of various offline algorithms.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2310.19805 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2310.19805 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2310.19805 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.