arxiv:2502.13144

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Published on Feb 18, 2025 · Submitted by Hao605 on Feb 20, 2025

Abstract

Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we establish a 3DGS-based closed-loop Reinforcement Learning (RL) training paradigm. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to effectively respond to safety-critical events and understand real-world causal relationships. For better alignment with human driving behavior, IL is incorporated into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance on most closed-loop metrics, most notably a 3x lower collision rate. Extensive closed-loop results are presented at https://hgao-cv.github.io/RAD.
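
To make the "specialized rewards" idea concrete, here is a minimal, hypothetical sketch of a safety-oriented per-step reward. The specific event flags, the progress term, and all weights below are illustrative assumptions for exposition, not the reward terms actually used in RAD.

```python
# Hypothetical safety-oriented reward sketch (not the authors' implementation).
from dataclasses import dataclass

@dataclass
class StepInfo:
    collided: bool      # ego vehicle hit another agent or obstacle this step
    off_road: bool      # ego vehicle left the drivable area this step
    progress_m: float   # forward progress along the route this step (meters)

def safety_reward(info: StepInfo) -> float:
    """Dense progress reward with large penalties for safety-critical events."""
    reward = 0.1 * info.progress_m   # encourage making progress
    if info.collided:
        reward -= 5.0                # strongly penalize collisions
    if info.off_road:
        reward -= 2.0                # penalize leaving the road
    return reward

# Example: a collision step yields a strongly negative reward despite progress.
print(safety_reward(StepInfo(collided=True, off_road=False, progress_m=1.2)))
```

Penalizing safety-critical events directly in the reward, rather than only imitating logged trajectories, is what lets closed-loop RL expose the causal link between an action and a downstream collision.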

Community

Paper author · Paper submitter

Project page: https://hgao-cv.github.io/RAD/
We propose a reinforcement learning-based post-training paradigm for end-to-end autonomous driving, called RAD. RAD leverages 3DGS technology to construct a highly realistic digital twin of the physical world, enabling the end-to-end model to control the vehicle like a human driver. RAD progressively improves its driving skills through continuous interaction with the environment, receiving feedback, and refining its policy via extensive exploration and trial-and-error.
After reinforcement learning training in a large-scale 3DGS environment, RAD learns a more effective driving strategy. In the same closed-loop evaluation benchmark, RAD reduces the collision rate by a factor of three compared to a policy trained solely with imitation learning (IL-only). Additionally, we provide a series of visual experiments to illustrate the key differences between RAD and the IL-only policy.
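
For readers curious how IL can act as a regularization term inside RL post-training (as the abstract describes), the following is a minimal PyTorch sketch under stated assumptions: the toy policy head, tensor shapes, the REINFORCE-style policy-gradient term, and the `il_weight` coefficient are all illustrative and are not taken from the RAD paper or code.

```python
# Minimal sketch (not the authors' code) of combining a policy-gradient RL loss
# with an imitation-learning regularizer. All shapes and weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DrivingPolicy(nn.Module):
    """Toy policy head: maps a scene feature vector to action logits."""
    def __init__(self, feat_dim=256, num_actions=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, features):
        return self.net(features)  # unnormalized action logits

def rl_plus_il_loss(policy, features, rollout_actions, advantages,
                    expert_actions, il_weight=0.1):
    """Policy-gradient loss plus an IL (behavior-cloning) regularization term.

    features:        [B, feat_dim] scene features from closed-loop rollouts
    rollout_actions: [B] actions taken by the policy during the rollouts
    advantages:      [B] advantage estimates from safety-oriented rewards
    expert_actions:  [B] actions from human driving demonstrations
    il_weight:       assumed trade-off coefficient (not from the paper)
    """
    logits = policy(features)
    log_probs = F.log_softmax(logits, dim=-1)

    # REINFORCE-style policy-gradient term on the closed-loop rollout data.
    taken_log_probs = log_probs.gather(1, rollout_actions[:, None]).squeeze(1)
    pg_loss = -(advantages * taken_log_probs).mean()

    # IL regularizer: cross-entropy toward the human expert actions.
    il_loss = F.cross_entropy(logits, expert_actions)

    return pg_loss + il_weight * il_loss

# Usage with random placeholder tensors.
policy = DrivingPolicy()
B = 8
loss = rl_plus_il_loss(
    policy,
    features=torch.randn(B, 256),
    rollout_actions=torch.randint(0, 64, (B,)),
    advantages=torch.randn(B),
    expert_actions=torch.randint(0, 64, (B,)),
)
loss.backward()
```

The IL term keeps the policy anchored to human-like behavior while the RL term, driven by safety-oriented advantages, pushes it away from actions that lead to collisions during closed-loop exploration.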

