hendrydong/Mistral-RM-for-RAFT-GSHF-v0


hendrydong committed on Mar 22

Commit e8596eb • 1 Parent(s): 6b69100

Update README.md

Files changed (1)

README.md +0 -11

README.md CHANGED

@@ -1,15 +1,4 @@
-Ideally, please use the following format for the reward evaluations,
-```python
-def format(messages):
-    format_text = "[INST] You must read the following conversation carefully and rate the assistant's response from score 0-100 in these aspects: helpfulness, correctness, coherence, honesty, complexity.\n"
-    for message in messages:
-        if message['role'] == "user":
-            format_text = format_text + "\nUser: " + message['content']
-        elif message['role'] == 'assistant':
-            format_text = format_text + "\nAssistant: " + message['content']
-    return format_text
-```
 The reward model can be used for iterative SFT/DPO
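
For reference, the prompt format that this commit removes from the README can be exercised as in the sketch below. The function name `format_reward_prompt` and the sample conversation are hypothetical and chosen for illustration (the original helper was named `format`, which shadows a Python builtin); the header string and the `User:` / `Assistant:` prefixes are copied from the removed lines. The sketch only builds the text and does not show how the reward model is loaded or called.

```python
def format_reward_prompt(messages):
    """Build the rating prompt from a list of {'role', 'content'} dicts.

    Mirrors the helper removed in this commit; the name is hypothetical.
    """
    # Header text copied verbatim from the removed README lines.
    header = (
        "[INST] You must read the following conversation carefully and rate the "
        "assistant's response from score 0-100 in these aspects: helpfulness, "
        "correctness, coherence, honesty, complexity.\n"
    )
    role_labels = {"user": "User", "assistant": "Assistant"}
    parts = [header]
    for message in messages:
        label = role_labels.get(message["role"])
        if label is None:
            # Roles other than user/assistant are skipped, as in the original.
            continue
        parts.append(f"\n{label}: {message['content']}")
    return "".join(parts)


# Hypothetical sample conversation, for illustration only.
example = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]
prompt = format_reward_prompt(example)
print(prompt)
```

As in the removed snippet, the header opens with `[INST]` and the function adds no closing tag; whether one is expected by the model is not stated in the source.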





