Asynchronous RLHF collection: models and datasets for the Asynchronous RLHF paper. Code: https://github.com/mnoukhov/async_rlhf
This model is a fine-tuned version of EleutherAI/pythia-1b-deduped on an unknown dataset. It achieves the following results on the evaluation set:
More information needed
The following hyperparameters were used during training:
Training results:
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.5278 | 0.2007 | 183 | 2.4199 |
2.4136 | 0.4013 | 366 | 2.4004 |
2.3978 | 0.6020 | 549 | 2.3887 |
2.3813 | 0.8026 | 732 | 2.3828 |
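The losses in the table can be easier to read as perplexities. The sketch below assumes the reported losses are mean per-token cross-entropy in nats (the usual convention for causal language model fine-tuning, but not stated in this card), so that perplexity is `exp(loss)`. It also estimates the number of optimizer steps per epoch from the Epoch and Step columns. The helper names `perplexity` and `steps_per_epoch` are illustrative and not part of any library.

```python
import math

# Rows copied from the table above:
# (training loss, epoch, step, validation loss)
rows = [
    (2.5278, 0.2007, 183, 2.4199),
    (2.4136, 0.4013, 366, 2.4004),
    (2.3978, 0.6020, 549, 2.3887),
    (2.3813, 0.8026, 732, 2.3828),
]


def perplexity(loss: float) -> float:
    """Convert a cross-entropy loss to perplexity.

    Assumption: the loss is a mean per-token cross-entropy in nats.
    """
    return math.exp(loss)


def steps_per_epoch(epoch: float, step: int) -> float:
    """Estimate steps in one full epoch from a (fractional epoch, step) pair."""
    return step / epoch


for train_loss, epoch, step, val_loss in rows:
    print(
        f"step {step:>4}: "
        f"validation perplexity ~ {perplexity(val_loss):.2f}, "
        f"estimated steps/epoch ~ {steps_per_epoch(epoch, step):.0f}"
    )
```

Under that assumption, the steady drop in validation loss across the four checkpoints corresponds to a steady drop in validation perplexity, and the Epoch and Step columns are mutually consistent about the length of one epoch.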