---
datasets:
- Anthropic/hh-rlhf
language:
- en
tags:
- rlhf
model-index:
- name: deberta-v3-large-tasksource-rlhf-reward-model
  results:
  - task:
      type: text-classification
      name: RLHF
    dataset:
      type: rlhf
      name: Anthropic/hh-rlhf
      split: validation
    metrics:
    - type: accuracy
      value: 0.7516
      verified: true
---
`deberta-v3-large-tasksource-nli` fine-tuned on Anthropic/hh-rlhf for 1 epoch with a learning rate of 1e-5.
Its validation accuracy of 75.16% is, at the time of writing, the best publicly reported result on this dataset (compared with 69.25% for `OpenAssistant/reward-model-deberta-v3-large-v2`).
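The card does not spell out how accuracy is computed. A common convention for preference datasets like Anthropic/hh-rlhf is pairwise accuracy: each example pairs a `chosen` and a `rejected` response, and a pair counts as correct when the reward model scores `chosen` higher. The sketch below assumes that convention; the function name `pairwise_accuracy` and the toy scores are hypothetical and are not outputs of this model.

```python
from typing import Sequence


def pairwise_accuracy(chosen_scores: Sequence[float], rejected_scores: Sequence[float]) -> float:
    """Fraction of preference pairs where the chosen response outscores the rejected one.

    Hypothetical helper: assumes accuracy is measured pairwise, which the card
    does not state explicitly.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("score lists must have the same length")
    if not chosen_scores:
        raise ValueError("need at least one pair")
    # A pair is correct only on a strict win; ties count as errors.
    wins = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return wins / len(chosen_scores)


if __name__ == "__main__":
    # Made-up scores for illustration only (not real model outputs).
    chosen = [2.1, 0.3, -0.5, 1.7]
    rejected = [1.0, 0.9, -1.2, 0.2]
    print(pairwise_accuracy(chosen, rejected))  # 0.75
```

Under this reading, a reported accuracy of 0.7516 would mean the model prefers the human-chosen response in about 75% of validation pairs. Treating ties as errors is a conservative choice; other evaluations may break ties differently.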