Asynchronous RLHF collection: models and datasets for the Asynchronous RLHF paper; see code at https://github.com/mnoukhov/async_rlhf
This model is a fine-tuned version of mnoukhov/pythia2.8b-sft-tldr on an unknown dataset. Its loss and accuracy on the evaluation set are reported in the table below.
More information needed
The hyperparameters used during training are not listed. Training results:
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 0.5048 | 0.2006 | 291 | 0.4736 | 0.7684 |
| 0.4188 | 0.4011 | 582 | 0.4287 | 0.7951 |
| 0.3628 | 0.6017 | 873 | 0.4141 | 0.8028 |
| 0.3203 | 0.8022 | 1164 | 0.3979 | 0.8129 |
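
The table above can be read more easily with a little arithmetic. The following is a minimal sketch, not part of the original training code: it copies the four rows by hand and derives the evaluation interval, an approximate number of optimizer steps per epoch, and the checkpoint with the lowest validation loss. It assumes the Epoch column is the fraction of one pass over the training data completed at that step; the variable names are illustrative.

```python
# Rows copied by hand from the training results table:
# (training loss, epoch, step, validation loss, accuracy)
rows = [
    (0.5048, 0.2006, 291, 0.4736, 0.7684),
    (0.4188, 0.4011, 582, 0.4287, 0.7951),
    (0.3628, 0.6017, 873, 0.4141, 0.8028),
    (0.3203, 0.8022, 1164, 0.3979, 0.8129),
]

# Distinct gaps between consecutive evaluation steps.
steps = [row[2] for row in rows]
eval_intervals = sorted({b - a for a, b in zip(steps, steps[1:])})

# Assumption: epoch = step / steps_per_epoch, so each row gives an estimate.
estimates = [step / epoch for _, epoch, step, _, _ in rows]
mean_steps_per_epoch = sum(estimates) / len(estimates)

# Row with the lowest validation loss.
best = min(rows, key=lambda row: row[3])

print("evaluation interval(s):", eval_intervals)
print("approx. steps per epoch:", round(mean_steps_per_epoch))
print("best step / val loss / accuracy:", best[2], best[3], best[4])
```

Under that assumption, evaluation ran every 291 steps and the last logged row (step 1164, epoch 0.8022) falls short of one full epoch. Validation loss and accuracy both improve at every logged evaluation.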
Base model: EleutherAI/pythia-2.8b-deduped