Fine-Grained Human Feedback Gives Better Rewards for Language Model Training Paper • 2306.01693 • Published Jun 2, 2023 • 3
Generative Verifiers: Reward Modeling as Next-Token Prediction Paper • 2408.15240 • Published Aug 27 • 13