Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model • Paper 2311.13231 • Published Nov 22, 2023
Secrets of RLHF in Large Language Models Part II: Reward Modeling • Paper 2401.06080 • Published Jan 11, 2024
MusicRL: Aligning Music Generation to Human Preferences • Paper 2402.04229 • Published Feb 6, 2024
OpenAssistant/reward-model-deberta-v3-large-v2 • Text Classification • Updated Feb 1, 2023
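The OpenAssistant reward model above is a DeBERTa-v3-large sequence classifier that assigns a scalar preference score to a (prompt, response) pair. Below is a minimal sketch of scoring one response with it through the standard Hugging Face transformers sequence-classification API; the question and answer strings are illustrative placeholders, not from the model card.

```python
# Minimal sketch: scoring a response with OpenAssistant/reward-model-deberta-v3-large-v2
# using the standard transformers sequence-classification API.
# The example question/answer below are illustrative placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

question = "Explain nuclear fusion like I am five."
answer = "Nuclear fusion is when two tiny bits of stuff squeeze together and release energy, like the Sun does."

# The model takes the prompt and response as a sentence pair; a higher logit
# indicates a response that human raters are more likely to prefer.
inputs = tokenizer(question, answer, return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits[0].item()
print(f"reward score: {score:.3f}")
```

Comparing the scores of two candidate answers to the same prompt gives a relative preference ranking, which is how such reward models are typically used in RLHF pipelines.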