Collections
Collections including paper arxiv:2405.20304
- UltraFeedback: Boosting Language Models with High-quality Feedback
  Paper • 2310.01377 • Published • 5
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 83
- Natural Language Reinforcement Learning
  Paper • 2411.14251 • Published • 28
- Group Robust Preference Optimization in Reward-free RLHF
  Paper • 2405.20304 • Published • 1

- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 53
- ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
  Paper • 2402.09320 • Published • 6
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 41
- Dueling RL: Reinforcement Learning with Trajectory Preferences
  Paper • 2111.04850 • Published • 2

- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 18
- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 49
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44