Collections
Collections including paper arxiv:2405.20304
- UltraFeedback: Boosting Language Models with High-quality Feedback
  Paper • 2310.01377 • Published • 5
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 83
- Natural Language Reinforcement Learning
  Paper • 2411.14251 • Published • 28
- Group Robust Preference Optimization in Reward-free RLHF
  Paper • 2405.20304 • Published • 1

- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 53
- ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
  Paper • 2402.09320 • Published • 6
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 41
- Dueling RL: Reinforcement Learning with Trajectory Preferences
  Paper • 2111.04850 • Published • 2

- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 18
- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 49
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44