sfairXC/FsfairX-LLaMA3-RM-v0.1


hendrydong committed on Apr 20

Commit f3760a7 • 1 Parent(s): 1c58e48

Update README.md

Files changed (1)

README.md +1 -1


@@ -56,7 +56,7 @@ This Reward model is the SOTA open-source RM (Apr 20, 2024) on Reward-Bench.
 You can also refer to our short blog for RM training details: https://www.notion.so/Reward-Modeling-for-RLHF-abe03f9afdac42b9a5bee746844518d0.
-## Reference
+## References
 The repo was part of the iterative rejection sampling fine-tuning and iterative DPO. If you find the content of this repo useful in your work, please consider cite it as follows:
 ```bibtex