# 🚀 RLHF Step-2 Reward Model

This repository is home to an RLHF reward model. The model is trained on question-and-answer preference pairs from the [Stack Exchange Preferences dataset](https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences), using [`distilroberta-base`](https://huggingface.co/distilroberta-base) as the base model.

## Usage

You can use this model with a `pipeline` to score question-answer pairs: the reward model is a single-label sequence classifier whose raw logit serves as the reward:

```python
from accelerate import Accelerator
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

reward_model = AutoModelForSequenceClassification.from_pretrained(
    "cambioml/rlhf_reward_model",
    num_labels=1,
    # torch_dtype=torch.bfloat16,
    load_in_8bit=True,
    device_map={"": Accelerator().process_index},
)

reward_tokenizer = AutoTokenizer.from_pretrained("cambioml/rlhf_reward_model")
reward_tokenizer.pad_token = reward_tokenizer.eos_token

# Keyword arguments to pass when calling the pipeline on a batch of texts.
# function_to_apply="none" returns the raw logit as the reward score.
reward_kwargs = {
    "return_all_scores": True,
    "function_to_apply": "none",
    "batch_size": 32,
    "truncation": True,
    "max_length": 138,
}

reward_pipe = pipeline(
    "sentiment-analysis",
    model=reward_model,
    tokenizer=reward_tokenizer,
    return_token_type_ids=False,
)

# scores = reward_pipe(texts, **reward_kwargs)
```
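As a sketch of how you might prepare inputs and read scores back out: the two helpers below are hypothetical (not part of this repository), and the `Question:`/`Answer:` prompt template is an assumption about how the training pairs were formatted. With `return_all_scores=True` and `function_to_apply="none"`, each pipeline output is a one-element list containing a dict with the raw-logit `score`.

```python
# Hypothetical helpers for the reward pipeline above. The prompt template
# is an assumption; adjust it to match how the model was actually trained.

def format_pair(question: str, answer: str) -> str:
    """Join a question and a candidate answer into one pipeline input."""
    return f"Question: {question}\n\nAnswer: {answer}"


def extract_scores(pipe_outputs):
    """Pull the scalar reward out of each pipeline output.

    Each item looks like [{"label": "LABEL_0", "score": <raw logit>}]
    when the pipeline is called with return_all_scores=True and
    function_to_apply="none".
    """
    return [output[0]["score"] for output in pipe_outputs]


# Example (model calls omitted):
#   texts = [format_pair(q, a) for a in candidate_answers]
#   scores = extract_scores(reward_pipe(texts, **reward_kwargs))
```

Higher scores indicate answers the model prefers, so candidates can be ranked by sorting on the returned values.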