hendrydong committed on
Commit
368f9ed
1 Parent(s): e8596eb

Update README.md

Files changed (1): README.md (+14 -1)
README.md CHANGED
@@ -1,5 +1,18 @@
 
-
 
 The reward model can be used for iterative SFT/DPO
 
 
+ To use this model, load it with `AutoModelForSequenceClassification`:
+ ```python
+ import torch
+ from transformers import AutoModelForSequenceClassification
+
+ model = AutoModelForSequenceClassification.from_pretrained(
+     "hendrydong/Mistral-RM-for-RAFT-GSHF-v0", num_labels=1, torch_dtype=torch.bfloat16
+ )
+ ```
+ and prepare the dataset like
+ ```python
+ SAMPLE = [
+     {'role': 'user', 'content': 'Hi!'},
+     {'role': 'assistant', 'content': 'How are you?'},
+ ]
+ ```
 
+ The template is the same as `mistralai/Mistral-7B-Instruct-v0.2`.
 
 The reward model can be used for iterative SFT/DPO
 
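Since the README says the model reuses the `mistralai/Mistral-7B-Instruct-v0.2` chat template, here is a minimal sketch of how such a sample is rendered into a prompt string, assuming the published v0.2 template (`[INST] … [/INST]` around user turns, EOS after assistant turns):

```python
# Sketch of the Mistral-Instruct-v0.2 chat template; the real rendering is
# done by the tokenizer's chat template, this just illustrates the layout.
def format_mistral_chat(messages):
    """Render a list of {'role', 'content'} dicts as a Mistral-Instruct prompt."""
    text = "<s>"  # BOS token opens the conversation
    for msg in messages:
        if msg["role"] == "user":
            text += "[INST] " + msg["content"] + " [/INST]"
        elif msg["role"] == "assistant":
            text += msg["content"] + "</s>"  # assistant turns end with EOS
    return text

SAMPLE = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "How are you?"},
]
print(format_mistral_chat(SAMPLE))  # -> <s>[INST] Hi! [/INST]How are you?</s>
```

In practice you would let the tokenizer do this, e.g. `tokenizer.apply_chat_template(SAMPLE, tokenize=False)` with the tokenizer loaded from the same checkpoint.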