This is a vanilla Bradley-Terry (BT) reward model based on Gemma-2-9B. The training recipe follows RLHF Workflow.
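
For readers unfamiliar with the term: a Bradley-Terry reward model assigns each response a scalar score, and the probability that the chosen response is preferred over the rejected one is the sigmoid of the score difference. The block below is a minimal sketch of that objective in plain Python, not the training code from RLHF Workflow. The function names `bt_loss` and `bt_preference_prob` are hypothetical, and the rewards passed to them are illustrative numbers rather than outputs of this model.

```python
import math


def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of one preference pair under Bradley-Terry.

    Equals -log(sigmoid(reward_chosen - reward_rejected)).
    """
    diff = reward_chosen - reward_rejected
    # log(1 + exp(-diff)), rearranged so exp() never receives a large positive value
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def bt_preference_prob(reward_chosen: float, reward_rejected: float) -> float:
    """Probability that the chosen response is preferred: sigmoid of the reward gap."""
    return math.exp(-bt_loss(reward_chosen, reward_rejected))


if __name__ == "__main__":
    # Illustrative scalar rewards (assumed values, not produced by this model)
    print(bt_preference_prob(1.5, 0.2))
    print(bt_loss(1.5, 0.2))
```

Both functions depend only on the difference between the two rewards, so adding the same constant to every score leaves the loss unchanged.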

The model achieves the following RewardBench results:

- Chat: 98.04
- Chat Hard: 65.35
- Safety: 89.54
- Reasoning: 92.31

For details, please refer to the RLHF Workflow paper:

```bibtex
@misc{dong2024rlhf,
      title={RLHF Workflow: From Reward Modeling to Online RLHF},
      author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
      year={2024},
      eprint={2405.07863},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```