hendrydong committed on
Commit
6b69100
1 Parent(s): 524cd28

Update README.md

  1. README.md +14 -0
README.md CHANGED
@@ -1,3 +1,17 @@
  The reward model can be used for iterative SFT/DPO
 
  ```
 
+ Ideally, please use the following format for the reward evaluations,
+ ```python
+ def format(messages):
+     format_text = "[INST] You must read the following conversation carefully and rate the assistant's response from score 0-100 in these aspects: helpfulness, correctness, coherence, honesty, complexity.\n"
+
+     for message in messages:
+         if message['role'] == "user":
+             format_text = format_text + "\nUser: " + message['content']
+         elif message['role'] == 'assistant':
+             format_text = format_text + "\nAssistant: " + message['content']
+     return format_text
+ ```
+
+
  The reward model can be used for iterative SFT/DPO
 
  ```
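For illustration, the formatting function added in this commit could be exercised as in the minimal sketch below. The helper name `build_reward_prompt` and the sample conversation are assumptions made here, not part of the README. The helper is meant to produce the same string as the README's `format`, but under a different name so it does not shadow Python's built-in `format`. How the resulting string is tokenized and passed to the reward model is not shown in this diff, so the sketch stops at building the text.

```python
from typing import Dict, List

# Instruction header copied from the README snippet in this commit.
RATING_PROMPT = (
    "[INST] You must read the following conversation carefully and rate the "
    "assistant's response from score 0-100 in these aspects: helpfulness, "
    "correctness, coherence, honesty, complexity.\n"
)


def build_reward_prompt(messages: List[Dict[str, str]]) -> str:
    """Hypothetical helper: builds the same text as the README's `format`."""
    parts = [RATING_PROMPT]
    for message in messages:
        if message["role"] == "user":
            parts.append("\nUser: " + message["content"])
        elif message["role"] == "assistant":
            parts.append("\nAssistant: " + message["content"])
        # Any other role (for example "system") is skipped, as in the README version.
    return "".join(parts)


# Made-up conversation, used only to show the shape of the output.
conversation = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
]
prompt = build_reward_prompt(conversation)
print(prompt)
```

Collecting the pieces in a list and joining them once avoids repeated string concatenation, which is a stylistic choice here rather than something the README requires.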