# 🚀 RLHF Step-2 Reward Model
This repository hosts an RLHF reward model. It is trained on questions and answers from the [Stack Exchange preferences dataset](https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences), using [`distilroberta-base`](https://huggingface.co/distilroberta-base) as the base model.

## Usage 
You can load this model as a sequence classifier and wrap it in a `sentiment-analysis` pipeline to assign a scalar reward score to question-answer pairs (e.g. for ranking candidate answers or for PPO training in an RLHF loop):

```python
import torch  # needed if the torch_dtype line below is uncommented
from accelerate import Accelerator
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

reward_model = AutoModelForSequenceClassification.from_pretrained(
    "cambioml/rlhf_reward_model",
    num_labels=1,
    # torch_dtype=torch.bfloat16,
    load_in_8bit=True,
    device_map={"": Accelerator().process_index},
)

reward_tokenizer = AutoTokenizer.from_pretrained("cambioml/rlhf_reward_model")
reward_tokenizer.pad_token = reward_tokenizer.eos_token

reward_pipe = pipeline(
    "sentiment-analysis",
    model=reward_model,
    tokenizer=reward_tokenizer,
    return_token_type_ids=False,
)

# Keyword arguments for the pipeline *call* (not construction):
# pass these when scoring, e.g. reward_pipe(texts, **reward_kwargs).
reward_kwargs = {
    "return_all_scores": True,
    "function_to_apply": "none",  # return raw logits as scores
    "batch_size": 32,
    "truncation": True,
    "max_length": 138,
}
```
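The reward itself is obtained by calling the pipeline on a formatted question-answer string and unpacking the scalar score. A minimal sketch of the call and the post-processing, using a mocked pipeline output so it runs without downloading the model; the `Question:`/`Answer:` prompt template and the example score are assumptions, not guaranteed to match the model's exact training format:

```python
# Hypothetical prompt template; adjust to the model's training format.
question = "How do I reverse a list in Python?"
answer = "Use slicing: my_list[::-1]."
text = f"Question: {question}\n\nAnswer: {answer}"

# reward_pipe([text], **reward_kwargs) returns, per input, one dict per
# label. With num_labels=1 and return_all_scores=True, the output looks
# like the mocked value below (the score value here is made up):
pipe_outputs = [[{"label": "LABEL_0", "score": 1.73}]]

# Extract the scalar reward from each input's output.
rewards = [out[0]["score"] for out in pipe_outputs]
print(rewards)  # [1.73]
```

Because `function_to_apply` is `"none"`, the score is the raw logit of the single regression head, so rewards are unbounded real numbers and only meaningful relative to each other.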