This is a vanilla Bradley-Terry (BT) reward model built on Gemma-2-9B. The training recipe follows RLHF Workflow.
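As a rough illustration of what "vanilla BT" refers to, the sketch below writes out the standard Bradley-Terry pairwise objective on scalar rewards. It assumes the usual formulation, in which the probability that the chosen response beats the rejected one is the sigmoid of the reward difference. The function names `bt_preference_prob` and `bt_loss` are hypothetical and are not taken from the RLHF Workflow code, which may differ in its details.

```python
import math


def bt_preference_prob(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response is preferred.

    Equals sigmoid(reward_chosen - reward_rejected), computed in a way
    that avoids overflow for large negative differences.
    """
    diff = reward_chosen - reward_rejected
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    exp_diff = math.exp(diff)
    return exp_diff / (1.0 + exp_diff)


def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of one preference pair under Bradley-Terry.

    -log(sigmoid(diff)) is softplus(-diff), evaluated here in its
    numerically stable form max(-diff, 0) + log1p(exp(-|diff|)).
    """
    diff = reward_chosen - reward_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


if __name__ == "__main__":
    # Hypothetical scalar rewards for one (chosen, rejected) pair.
    print(bt_preference_prob(1.5, 0.5))
    print(bt_loss(1.5, 0.5))
```

The loss shrinks as the chosen response's reward rises above the rejected one's, so minimizing it over a preference dataset pushes the model to score preferred responses higher.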
RewardBench results:

- Chat: 98.04
- Chat Hard: 65.35
- Safety: 89.54
- Reasoning: 92.31
For details, please refer to the RLHF Workflow paper:
```bibtex
@misc{dong2024rlhf,
title={RLHF Workflow: From Reward Modeling to Online RLHF},
author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
year={2024},
eprint={2405.07863},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```