Secrets of RLHF in Large Language Models Part II: Reward Modeling • Paper • 2401.06080 • Published Jan 11
Offline Actor-Critic Reinforcement Learning Scales to Large Models • Paper • 2402.05546 • Published Feb 8