hendrydong committed
Commit 1c58e48
Parent(s): 096b3be
Update README.md
README.md CHANGED
@@ -6,8 +6,6 @@ The base model is `meta-llama/Meta-Llama-3-8B-Instruct`.
 
 We use the training script at `https://github.com/WeiXiongUST/RLHF-Reward-Modeling`.
 
-You can also refer to a short blog for RM training details: https://www.notion.so/Reward-Modeling-for-RLHF-abe03f9afdac42b9a5bee746844518d0.
-
 
 ## Uses
 
@@ -54,6 +52,9 @@ This Reward model is the SOTA open-source RM (Apr 20, 2024) on Reward-Bench.
 | Safety | 88.76 |
 | Reasoning | 88.3 |
 
+## See also
+
+You can also refer to our short blog for RM training details: https://www.notion.so/Reward-Modeling-for-RLHF-abe03f9afdac42b9a5bee746844518d0.
 
 ## Reference
 The repo was part of the iterative rejection sampling fine-tuning and iterative DPO. If you find the content of this repo useful in your work, please consider cite it as follows:
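For context on the `## Uses` section this diff touches: below is a minimal, hedged sketch of how a Llama-3-based reward model with a sequence-classification head is typically queried via `transformers`. The model ID is a placeholder (the actual repo ID is not shown in this diff), and the chat-template and scoring details are assumptions rather than this repo's documented usage.

```python
# Sketch: score a prompt/response pair with a Bradley-Terry-style reward model
# (single-logit sequence-classification head). Model ID below is a placeholder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "path/to/llama3-reward-model"  # placeholder, not this repo's real ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=1, torch_dtype=torch.bfloat16
)

chat = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
# Format the conversation with the model's chat template and tokenize it.
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt")

with torch.no_grad():
    # The single logit is the reward score; higher means "preferred".
    reward = model(input_ids).logits[0][0].item()
print(reward)
```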