arxiv:2502.13144

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Published on Feb 18, 2025 · Submitted by Hao605 on Feb 20, 2025

Abstract

Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we establish a 3DGS-based closed-loop Reinforcement Learning (RL) training paradigm. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to effectively respond to safety-critical events and understand real-world causal relationships. For better alignment with human driving behavior, IL is incorporated into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance on most closed-loop metrics, most notably a 3x lower collision rate. Extensive closed-loop results are presented at https://hgao-cv.github.io/RAD.
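
To make the "specialized rewards" idea concrete, here is a minimal, hypothetical sketch of a safety-oriented per-step reward. The specific event flags, the progress term, and all weights below are illustrative assumptions for exposition, not the reward terms actually used in RAD.

```python
# Hypothetical safety-oriented reward sketch (not the authors' implementation).
from dataclasses import dataclass

@dataclass
class StepInfo:
    collided: bool      # ego vehicle hit another agent or obstacle this step
    off_road: bool      # ego vehicle left the drivable area this step
    progress_m: float   # forward progress along the route this step (meters)

def safety_reward(info: StepInfo) -> float:
    """Dense progress reward with large penalties for safety-critical events."""
    reward = 0.1 * info.progress_m   # encourage making progress
    if info.collided:
        reward -= 5.0                # strongly penalize collisions
    if info.off_road:
        reward -= 2.0                # penalize leaving the road
    return reward

# Example: a collision step yields a strongly negative reward despite progress.
print(safety_reward(StepInfo(collided=True, off_road=False, progress_m=1.2)))
```

Penalizing safety-critical events directly in the reward, rather than only imitating logged trajectories, is what lets closed-loop RL expose the causal link between an action and a downstream collision.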

Community

Paper author · Paper submitter

Project page: https://hgao-cv.github.io/RAD/
We propose a reinforcement learning-based post-training paradigm for end-to-end autonomous driving, called RAD. RAD leverages 3DGS technology to construct a highly realistic digital twin of the physical world, enabling the end-to-end model to control the vehicle like a human driver. RAD progressively improves its driving skills through continuous interaction with the environment, receiving feedback, and refining its policy via extensive exploration and trial-and-error.
After reinforcement learning training in a large-scale 3DGS environment, RAD learns a more effective driving strategy. In the same closed-loop evaluation benchmark, RAD reduces the collision rate by a factor of three compared to a policy trained solely with imitation learning (IL-only). Additionally, we provide a series of visual experiments to illustrate the key differences between RAD and the IL-only policy.
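
For readers curious how IL can act as a regularization term inside RL post-training (as the abstract describes), the following is a minimal PyTorch sketch under stated assumptions: the toy policy head, tensor shapes, the REINFORCE-style policy-gradient term, and the `il_weight` coefficient are all illustrative and are not taken from the RAD paper or code.

```python
# Minimal sketch (not the authors' code) of combining a policy-gradient RL loss
# with an imitation-learning regularizer. All shapes and weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DrivingPolicy(nn.Module):
    """Toy policy head: maps a scene feature vector to action logits."""
    def __init__(self, feat_dim=256, num_actions=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, features):
        return self.net(features)  # unnormalized action logits

def rl_plus_il_loss(policy, features, rollout_actions, advantages,
                    expert_actions, il_weight=0.1):
    """Policy-gradient loss plus an IL (behavior-cloning) regularization term.

    features:        [B, feat_dim] scene features from closed-loop rollouts
    rollout_actions: [B] actions taken by the policy during the rollouts
    advantages:      [B] advantage estimates from safety-oriented rewards
    expert_actions:  [B] actions from human driving demonstrations
    il_weight:       assumed trade-off coefficient (not from the paper)
    """
    logits = policy(features)
    log_probs = F.log_softmax(logits, dim=-1)

    # REINFORCE-style policy-gradient term on the closed-loop rollout data.
    taken_log_probs = log_probs.gather(1, rollout_actions[:, None]).squeeze(1)
    pg_loss = -(advantages * taken_log_probs).mean()

    # IL regularizer: cross-entropy toward the human expert actions.
    il_loss = F.cross_entropy(logits, expert_actions)

    return pg_loss + il_weight * il_loss

# Usage with random placeholder tensors.
policy = DrivingPolicy()
B = 8
loss = rl_plus_il_loss(
    policy,
    features=torch.randn(B, 256),
    rollout_actions=torch.randint(0, 64, (B,)),
    advantages=torch.randn(B),
    expert_actions=torch.randint(0, 64, (B,)),
)
loss.backward()
```

The IL term keeps the policy anchored to human-like behavior while the RL term, driven by safety-oriented advantages, pushes it away from actions that lead to collisions during closed-loop exploration.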

