- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 40
- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 44
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 82
- mDPO: Conditional Preference Optimization for Multimodal Large Language Models
  Paper • 2406.11839 • Published • 37
zyyang
zy0yang
AI & ML interests
SFT & RLHF
Organizations
Collections: 5
models
None public yet
datasets
None public yet