mnoukhov/summarize_from_feedback_oai_preprocessing_1706381144_relabel_pythia6.9b • Updated Jun 20 • 177k • 44
vwxyzjn/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144 • Updated Jan 27 • 130k • 724
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models Paper • 2410.18252 • Published Oct 23 • 5