u-brixton's Collections
alignment_24_best
KTO: Model Alignment as Prospect Theoretic Optimization
Paper • arXiv:2402.01306 • 16 upvotes
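A minimal sketch of the KTO objective, assuming the paper's notation: implicit reward r_θ, KL-based reference point z_0, and hyperparameters β, λ_D, λ_U.

$$ r_\theta(x,y)=\log\frac{\pi_\theta(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)},\qquad \mathcal{L}_{\mathrm{KTO}}=\mathbb{E}_{x,y}\big[\lambda_y - v(x,y)\big], $$
$$ v(x,y)=\begin{cases}\lambda_D\,\sigma\big(\beta\,(r_\theta(x,y)-z_0)\big) & \text{if } y \text{ is desirable}\\ \lambda_U\,\sigma\big(\beta\,(z_0-r_\theta(x,y))\big) & \text{if } y \text{ is undesirable}\end{cases} $$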
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • arXiv:2305.18290 • 50 upvotes
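A minimal sketch of the DPO loss, assuming a preference dataset of prompts x with chosen/rejected responses (y_w, y_l) and KL-penalty strength β:

$$ \mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,y_w,y_l)\sim\mathcal{D}}\Big[\log\sigma\Big(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\Big)\Big] $$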
SimPO: Simple Preference Optimization with a Reference-Free Reward
Paper • arXiv:2405.14734 • 11 upvotes
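A minimal sketch of the SimPO loss, where the implicit reward is the length-normalized log-likelihood (no reference model) and γ is a target reward margin:

$$ \mathcal{L}_{\mathrm{SimPO}}(\pi_\theta) = -\,\mathbb{E}_{(x,y_w,y_l)\sim\mathcal{D}}\Big[\log\sigma\Big(\frac{\beta}{|y_w|}\log\pi_\theta(y_w\mid x) - \frac{\beta}{|y_l|}\log\pi_\theta(y_l\mid x) - \gamma\Big)\Big] $$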
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
Paper • arXiv:2408.06266 • 9 upvotes
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Paper • arXiv:2402.14740 • 12 upvotes
Binary Classifier Optimization for Large Language Model Alignment
Paper • arXiv:2404.04656 • 2 upvotes
Noise Contrastive Alignment of Language Models with Explicit Rewards
Paper • arXiv:2402.05369 • 1 upvote
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Paper • arXiv:2401.08417 • 34 upvotes
Direct Language Model Alignment from Online AI Feedback
Paper • arXiv:2402.04792 • 29 upvotes
Nash Learning from Human Feedback
Paper • arXiv:2312.00886 • 14 upvotes
ORPO: Monolithic Preference Optimization without Reference Model
Paper • arXiv:2403.07691 • 64 upvotes
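A minimal sketch of the ORPO loss, which augments the standard SFT loss with an odds-ratio penalty (weighted by λ) and so needs no reference model:

$$ \mathrm{odds}_\theta(y\mid x)=\frac{P_\theta(y\mid x)}{1-P_\theta(y\mid x)},\qquad \mathcal{L}_{\mathrm{ORPO}}=\mathbb{E}_{(x,y_w,y_l)}\Big[\mathcal{L}_{\mathrm{SFT}} - \lambda\,\log\sigma\Big(\log\frac{\mathrm{odds}_\theta(y_w\mid x)}{\mathrm{odds}_\theta(y_l\mid x)}\Big)\Big] $$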
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Paper • arXiv:2405.21046 • 4 upvotes
From r to Q^*: Your Language Model is Secretly a Q-Function
Paper • arXiv:2404.12358 • 2 upvotes
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Paper • arXiv:2405.19107 • 14 upvotes
Towards Scalable Automated Alignment of LLMs: A Survey
Paper • arXiv:2406.01252 • 2 upvotes
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper • arXiv:2409.02795 • 71 upvotes
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Paper • arXiv:2404.14367 • 1 upvote
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels
Paper • arXiv:2404.14313
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Paper • arXiv:2406.09279 • 2 upvotes
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Paper • arXiv:2404.10719 • 4 upvotes
Advancing LLM Reasoning Generalists with Preference Trees
Paper • arXiv:2404.02078 • 44 upvotes
Building Math Agents with Multi-Turn Iterative Preference Learning
Paper • arXiv:2409.02392 • 14 upvotes
Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning
Paper • arXiv:2406.17312
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Paper • arXiv:2406.00888 • 30 upvotes
Contrastive Prefence Learning: Learning from Human Feedback without RL
Paper • arXiv:2310.13639 • 24 upvotes
Towards Efficient and Exact Optimization of Language Model Alignment
Paper • arXiv:2402.00856
HelpSteer2-Preference: Complementing Ratings with Preferences
Paper • arXiv:2410.01257 • 21 upvotes
General Preference Modeling with Preference Representations for Aligning Language Models
Paper • arXiv:2410.02197 • 8 upvotes
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult
Paper • arXiv:2409.17545 • 20 upvotes
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • arXiv:2405.07863 • 66 upvotes
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Paper • arXiv:2410.04612
Understanding Likelihood Over-optimisation in Direct Alignment Algorithms
Paper • arXiv:2410.11677