---
language:
- en
tags:
- webgpt
- regression
- reward-model
license: apache-2.0
datasets:
- openai/webgpt_comparisons
- openai/summarize_from_feedback
metrics:
- accuracy
---

Reward model trained on openai/webgpt_comparisons and openai/summarize_from_feedback (human feedback on summaries). Unlike the other electra-large model, this model is trained with a rank loss and one additional dataset.

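The card does not spell out the exact form of the rank loss, so the following is only a minimal sketch, assuming the common pairwise formulation `-log(sigmoid(chosen - rejected))` applied to the scalar scores the reward model assigns to a preferred and a rejected answer. The function names (`pairwise_rank_loss`, `batch_rank_loss`, `pairwise_accuracy`) are hypothetical and chosen for illustration; the accuracy helper assumes the `accuracy` metric means the fraction of pairs where the preferred answer scores higher.

```python
import math


def pairwise_rank_loss(chosen_score: float, rejected_score: float) -> float:
    """Assumed pairwise rank loss: -log(sigmoid(chosen_score - rejected_score))."""
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def batch_rank_loss(pairs):
    """Mean loss over (chosen_score, rejected_score) pairs."""
    return sum(pairwise_rank_loss(c, r) for c, r in pairs) / len(pairs)


def pairwise_accuracy(pairs):
    """Fraction of pairs where the chosen answer gets the strictly higher score."""
    return sum(1 for c, r in pairs if c > r) / len(pairs)


# Hypothetical scores for three comparison pairs (not real model outputs).
example_pairs = [(1.5, 0.2), (0.1, 0.4), (2.0, -1.0)]
print(batch_rank_loss(example_pairs))
print(pairwise_accuracy(example_pairs))
```

The loss shrinks as the preferred answer's score pulls ahead of the rejected one, so training only constrains score differences within a pair, not absolute score values.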
On the validation set, the results are much more stable than usual.

See this [wandb run](https://wandb.ai/theblackcat102/reward-model/runs/1d4e4oi2?workspace=) for more details.

This model performs slightly better than the previous WebGPT-only model, [electra-large](https://huggingface.co/theblackcat102/electra-large-webgpt-rm).