hendrydong committed on
Commit 739cb2d
1 Parent(s): 368f9ed

Update README.md

  1. README.md +2 -1
README.md CHANGED
@@ -14,8 +14,9 @@ SAMPLE =[
 
  The template is the same as `mistralai/Mistral-7B-Instruct-v0.2`.
 
- The reward model can be used for iterative SFT/DPO
+ The reward model can be used for iterative SFT/DPO.
 
+ Please cite them if you found this RM helpful,
  ```
  @article{dong2023raft,
  title={Raft: Reward ranked finetuning for generative foundation model alignment},