Reward model based on deberta-v3-large-tasksource-nli,
fine-tuned on Anthropic/hh-rlhf
for 1 epoch with a learning rate of 1e-5.
The data are described in the paper: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
Validation accuracy is currently the best publicly reported: 75.16% (vs. 69.25% for OpenAssistant/reward-model-deberta-v3-large-v2).
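The card does not say how validation accuracy is computed. A common convention for preference datasets such as Anthropic/hh-rlhf, where each example pairs a `chosen` and a `rejected` response, is pairwise accuracy: the fraction of pairs in which the reward model scores the chosen response higher. The sketch below assumes that convention. The function name `pairwise_accuracy` and the example scores are hypothetical and do not come from this model.

```python
from typing import Sequence


def pairwise_accuracy(
    chosen_scores: Sequence[float],
    rejected_scores: Sequence[float],
) -> float:
    """Fraction of pairs where the chosen response outscores the rejected one.

    Assumption: this is the metric behind the reported validation accuracy;
    the model card itself does not state the exact procedure.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("Each chosen score needs a matching rejected score.")
    if len(chosen_scores) == 0:
        raise ValueError("At least one pair is required.")
    # A pair counts as correct only when the chosen score is strictly higher.
    correct = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return correct / len(chosen_scores)


# Hypothetical reward scores for four (chosen, rejected) pairs.
chosen = [1.2, 0.3, -0.5, 2.0]
rejected = [0.4, 0.9, -1.1, 1.5]

accuracy = pairwise_accuracy(chosen, rejected)
print(f"Pairwise accuracy: {accuracy:.2%}")
```

In practice the two score lists would come from running the reward model on the chosen and rejected responses of each validation pair. Ties count as errors here, which is a design choice rather than something the card specifies.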
Dataset used to train sileod/deberta-v3-large-tasksource-rlhf-reward-model: Anthropic/hh-rlhf
Evaluation results
- Accuracy on the Anthropic/hh-rlhf validation set (self-reported): 0.7516