Asynchronous RLHF Collection Models and datasets for the Asynchronous RLHF paper; see code at https://github.com/mnoukhov/async_rlhf • 10 items • Updated Oct 28
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models Paper • 2410.18252 • Published Oct 23 • 5 • 2
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization Paper • 2403.17031 • Published Mar 24 • 3
Elastic Reset Collection Models and datasets for Elastic Reset (NeurIPS 2023), code at https://github.com/mnoukhov/elastic-reset • 5 items • Updated Oct 22