Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF • Paper • 2405.21046 • Published May 31, 2024