hendrydong/Mistral-RM-for-RAFT-GSHF-v0

hendrydong committed on Mar 23

Commit 739cb2d • 1 Parent(s): 368f9ed

Update README.md

Files changed (1)

README.md +2 -1

@@ -14,8 +14,9 @@ SAMPLE =[
 The template is the same as `mistralai/Mistral-7B-Instruct-v0.2`.
-The reward model can be used for iterative SFT/DPO
+The reward model can be used for iterative SFT/DPO.
+Please cite them if you found this RM helpful,
 ```
 @article{dong2023raft,
   title={Raft: Reward ranked finetuning for generative foundation model alignment},