hendrydong/Mistral-RM-for-RAFT-GSHF-v0

hendrydong committed on Mar 22

Commit 6b69100 • 1 Parent(s): 524cd28

Update README.md

Files changed (1)

README.md +14 -0

@@ -1,3 +1,17 @@
 The reward model can be used for iterative SFT/DPO
 ```

+For reward evaluation, ideally format each conversation as follows:
+```python
+def format(messages):
+    """Build the reward-model prompt from a list of {'role', 'content'} messages."""
+    # Instruction prefix asking the model to rate the assistant's response.
+    format_text = "[INST] You must read the following conversation carefully and rate the assistant's response from score 0-100 in these aspects: helpfulness, correctness, coherence, honesty, complexity.\n"
+    for message in messages:
+        # Append each turn with a speaker label; other roles are skipped.
+        if message['role'] == "user":
+            format_text += "\nUser: " + message['content']
+        elif message['role'] == 'assistant':
+            format_text += "\nAssistant: " + message['content']
+    return format_text
+```
 The reward model can be used for iterative SFT/DPO
 ```
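
To make the expected input shape concrete, here is a minimal sketch that applies the formatting logic from this change to a two-turn conversation. The name `format_conversation` is hypothetical (chosen only to avoid shadowing Python's built-in `format`), and the sample messages are made up for illustration. How the resulting string is then scored by the reward model is not covered by this commit, so it is left out.

```python
# Instruction prefix copied from the README change above.
PROMPT = (
    "[INST] You must read the following conversation carefully and rate the "
    "assistant's response from score 0-100 in these aspects: helpfulness, "
    "correctness, coherence, honesty, complexity.\n"
)


def format_conversation(messages):
    """Same logic as the README's `format`; the name here is hypothetical."""
    text = PROMPT
    for message in messages:
        if message["role"] == "user":
            text += "\nUser: " + message["content"]
        elif message["role"] == "assistant":
            text += "\nAssistant: " + message["content"]
        # Any other role (e.g. "system") is silently skipped, as in the README.
    return text


# Made-up conversation, for illustration only.
messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
]

formatted = format_conversation(messages)
print(formatted)
```

Note that the README's function does not append a closing `[/INST]` tag, and the sketch keeps that behavior unchanged.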