hendrydong committed on
Commit
e8596eb
1 Parent(s): 6b69100

Update README.md

Files changed (1)
  1. README.md +0 -11
README.md CHANGED
@@ -1,15 +1,4 @@
- Ideally, please use the following format for the reward evaluations,
- ```python
- def format(messages):
-     format_text = "[INST] You must read the following conversation carefully and rate the assistant's response from score 0-100 in these aspects: helpfulness, correctness, coherence, honesty, complexity.\n"
 
-     for message in messages:
-         if message['role'] == "user":
-             format_text = format_text + "\nUser: " + message['content']
-         elif message['role'] == 'assistant':
-             format_text = format_text + "\nAssistant: " + message['content']
-     return format_text
- ```
 
 
  The reward model can be used for iterative SFT/DPO
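
The README does not say how the reward scores feed into iterative SFT/DPO. One common reading is best-of-n selection for SFT and best-versus-worst pairing for DPO. The snippet below is a minimal sketch under that assumption, not the author's pipeline. The helper names `select_for_sft`, `build_dpo_pair`, and `toy_score` are hypothetical, and `toy_score` is a stand-in for a real call to the reward model.

```python
from typing import Callable, List, Tuple

# A scorer maps (prompt, response) to a scalar reward.
Scorer = Callable[[str, str], float]


def select_for_sft(prompt: str, candidates: List[str], score: Scorer) -> str:
    """Best-of-n selection: keep the highest-reward response as an SFT target."""
    return max(candidates, key=lambda response: score(prompt, response))


def build_dpo_pair(prompt: str, candidates: List[str], score: Scorer) -> Tuple[str, str]:
    """Return (chosen, rejected): the highest- and lowest-reward responses."""
    ranked = sorted(candidates, key=lambda response: score(prompt, response))
    return ranked[-1], ranked[0]


def toy_score(prompt: str, response: str) -> float:
    # Hypothetical placeholder: rewards longer responses.
    # Replace with an actual reward-model call in practice.
    return float(len(response))


candidates = ["a", "abc", "ab"]
best = select_for_sft("question", candidates, toy_score)
chosen, rejected = build_dpo_pair("question", candidates, toy_score)
```

In an iterative setup, the selected responses or pairs would be used to train the policy, and the updated policy would then generate the next round of candidates.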