hendrydong committed on
Commit
368f9ed
1 Parent(s): e8596eb

Update README.md

Files changed (1): README.md (+14 -1)
README.md CHANGED
@@ -1,5 +1,18 @@
 
-
 
 The reward model can be used for iterative SFT/DPO
 
 
+ To use this model, load it with `AutoModelForSequenceClassification`:
+ ```python
+ import torch
+ from transformers import AutoModelForSequenceClassification
+
+ model = AutoModelForSequenceClassification.from_pretrained(
+     "hendrydong/Mistral-RM-for-RAFT-GSHF-v0", num_labels=1, torch_dtype=torch.bfloat16
+ )
+ ```
+ and prepare the dataset like
+ ```python
+ SAMPLE = [
+     {'role': 'user', 'content': 'Hi!'},
+     {'role': 'assistant', 'content': 'How are you?'},
+ ]
+ ```
 
+ The template is the same as `mistralai/Mistral-7B-Instruct-v0.2`.
 
 The reward model can be used for iterative SFT/DPO
 
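Since the README says the model reuses the `mistralai/Mistral-7B-Instruct-v0.2` chat template, here is a minimal sketch of how such a sample is rendered into a prompt string, assuming the published v0.2 template (`[INST] … [/INST]` around user turns, EOS after assistant turns):

```python
# Sketch of the Mistral-Instruct-v0.2 chat template; the real rendering is
# done by the tokenizer's chat template, this just illustrates the layout.
def format_mistral_chat(messages):
    """Render a list of {'role', 'content'} dicts as a Mistral-Instruct prompt."""
    text = "<s>"  # BOS token opens the conversation
    for msg in messages:
        if msg["role"] == "user":
            text += "[INST] " + msg["content"] + " [/INST]"
        elif msg["role"] == "assistant":
            text += msg["content"] + "</s>"  # assistant turns end with EOS
    return text

SAMPLE = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "How are you?"},
]
print(format_mistral_chat(SAMPLE))  # -> <s>[INST] Hi! [/INST]How are you?</s>
```

In practice you would let the tokenizer do this, e.g. `tokenizer.apply_chat_template(SAMPLE, tokenize=False)` with the tokenizer loaded from the same checkpoint.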