hendrydong
committed on
Commit
•
e8596eb
1
Parent(s):
6b69100
Update README.md
README.md
CHANGED
@@ -1,15 +1,4 @@
-Ideally, please use the following format for the reward evaluations,
-```python
-def format(messages):
-    format_text = "[INST] You must read the following conversation carefully and rate the assistant's response from score 0-100 in these aspects: helpfulness, correctness, coherence, honesty, complexity.\n"
 
-    for message in messages:
-        if message['role'] == "user":
-            format_text = format_text + "\nUser: " + message['content']
-        elif message['role'] == 'assistant':
-            format_text = format_text + "\nAssistant: " + message['content']
-    return format_text
-```
 
 
 The reward model can be used for iterative SFT/DPO
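For reference, the prompt-formatting helper that this commit removes from the README can be exercised as in the sketch below. The name `format_conversation` and the sample conversation are hypothetical; the helper is renamed only to avoid shadowing Python's built-in `format`. The instruction header and the `User:` / `Assistant:` prefixes are copied from the removed snippet. Whether the reward model still expects this exact prompt after the change is not stated in the commit.

```python
def format_conversation(messages):
    """Build the rating prompt from a list of {'role', 'content'} dicts."""
    # Instruction header copied verbatim from the removed README snippet.
    parts = [
        "[INST] You must read the following conversation carefully and rate "
        "the assistant's response from score 0-100 in these aspects: "
        "helpfulness, correctness, coherence, honesty, complexity.\n"
    ]
    for message in messages:
        if message["role"] == "user":
            parts.append("\nUser: " + message["content"])
        elif message["role"] == "assistant":
            parts.append("\nAssistant: " + message["content"])
        # Any other role (for example "system") is skipped, as in the original.
    return "".join(parts)


# Hypothetical two-turn conversation, used only for illustration.
example = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
]
prompt = format_conversation(example)
print(prompt)
```

Collecting the pieces in a list and joining once produces the same string as the repeated concatenation in the removed code, while avoiding rebuilding the string on every turn.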