lm-human-preference-details

AI & ML interests

None defined yet.

Recent Activity

vwxyzjn authored a paper 20 days ago

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

vwxyzjn authored a paper 20 days ago

A2C is a special case of PPO

vwxyzjn authored a paper 20 days ago

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Collections 1

Spaces 1

Rlhf Demo

Models 63

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed1

Text Generation • Updated Oct 6, 2023 • 87

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed5

Text Generation • Updated Oct 6, 2023 • 88

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed3

Text Generation • Updated Oct 6, 2023 • 88

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed4

Text Generation • Updated Oct 6, 2023 • 87

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed2

Text Generation • Updated Oct 6, 2023 • 88

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed3

Text Generation • Updated Oct 6, 2023 • 91

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed5

Text Generation • Updated Oct 6, 2023 • 88

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed2

Text Generation • Updated Oct 6, 2023 • 87

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed4

Text Generation • Updated Oct 6, 2023 • 86

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2__sentiment_offline_5k.json__seed5

Text Generation • Updated Oct 6, 2023 • 122

Datasets

None public yet