Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023 • 48
SimPO: Simple Preference Optimization with a Reference-Free Reward Paper • 2405.14734 • Published May 23, 2024 • 11
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment Paper • 2408.06266 • Published Aug 12, 2024 • 9
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22, 2024 • 11
Binary Classifier Optimization for Large Language Model Alignment Paper • 2404.04656 • Published Apr 6, 2024 • 2
Noise Contrastive Alignment of Language Models with Explicit Rewards Paper • 2402.05369 • Published Feb 8, 2024 • 1
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation Paper • 2401.08417 • Published Jan 16, 2024 • 33
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 62
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF Paper • 2405.21046 • Published May 31, 2024 • 3
From r to Q^*: Your Language Model is Secretly a Q-Function Paper • 2404.12358 • Published Apr 18, 2024 • 2
Offline Regularised Reinforcement Learning for Large Language Models Alignment Paper • 2405.19107 • Published May 29, 2024 • 13
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4, 2024 • 72
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data Paper • 2404.14367 • Published Apr 22, 2024 • 1
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels Paper • 2404.14313 • Published Apr 22, 2024
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback Paper • 2406.09279 • Published Jun 13, 2024 • 1
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study Paper • 2404.10719 • Published Apr 16, 2024 • 4
Building Math Agents with Multi-Turn Iterative Preference Learning Paper • 2409.02392 • Published Sep 4, 2024 • 14
Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning Paper • 2406.17312 • Published Jun 25, 2024
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback Paper • 2406.00888 • Published Jun 2, 2024 • 30
Contrastive Preference Learning: Learning from Human Feedback without RL Paper • 2310.13639 • Published Oct 20, 2023 • 24
Towards Efficient and Exact Optimization of Language Model Alignment Paper • 2402.00856 • Published Feb 1, 2024
HelpSteer2-Preference: Complementing Ratings with Preferences Paper • 2410.01257 • Published Oct 2, 2024 • 21
General Preference Modeling with Preference Representations for Aligning Language Models Paper • 2410.02197 • Published Oct 3, 2024 • 7
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult Paper • 2409.17545 • Published Sep 26, 2024 • 18
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF Paper • 2410.04612 • Published Oct 6, 2024
Understanding Likelihood Over-optimisation in Direct Alignment Algorithms Paper • 2410.11677 • Published Oct 15, 2024
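The collection above is organized around DPO-style direct alignment objectives. As a quick reference, below is a minimal sketch of the DPO loss (2305.18290) and the reference-free SimPO variant (2405.14734), assuming per-sequence (for DPO) and per-token-averaged (for SimPO) log-probabilities have already been computed; the function names, tensor shapes, and hyperparameter defaults are illustrative and are not taken from either paper's released code.

```python
# Minimal sketches of two direct-alignment losses from the papers above.
# Assumes per-example log-probabilities are precomputed; names, defaults,
# and the toy inputs are illustrative, not from the authors' implementations.
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO (2305.18290): negative log-sigmoid of the implicit reward margin,
    where the implicit reward is beta * (policy logp - reference logp)."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()


def simpo_loss(policy_chosen_avg_logps: torch.Tensor,
               policy_rejected_avg_logps: torch.Tensor,
               beta: float = 2.0,       # illustrative default
               gamma: float = 1.0) -> torch.Tensor:  # illustrative default
    """SimPO (2405.14734): reference-free, uses length-normalised (average
    per-token) log-probs and subtracts a target reward margin gamma."""
    margin = beta * (policy_chosen_avg_logps - policy_rejected_avg_logps) - gamma
    return -F.logsigmoid(margin).mean()


# Toy usage with random "log-probabilities" for a batch of 4 preference pairs.
b = torch.randn(4)
print(dpo_loss(b, b - 1.0, torch.zeros(4), torch.zeros(4)))
print(simpo_loss(b / 10, (b - 1.0) / 10))
```

The design difference visible here is that SimPO drops the frozen reference model entirely, relying instead on length-normalised policy log-probabilities plus an explicit target margin, whereas DPO regularises the policy through its log-ratio against the reference.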