---
license: mit
datasets:
- openai/summarize_from_feedback
- openai/webgpt_comparisons
- Dahoas/instruct-synthetic-prompt-responses
- Anthropic/hh-rlhf
language:
- en
metrics:
- accuracy
tags:
- reward-model
- reward_model
- RLHF
---
# Reward model trained from human feedback

Reward model (RM) trained to predict which of two generated answers to a question a human would judge as better.

RMs are useful in these domains:

- QA model evaluation

- serving as the reward score in RLHF

- detecting potentially toxic responses via ranking

All models are trained on the following datasets, with the same split seed across datasets (if a validation split wasn't available); a minimal sketch of a typical pairwise training objective follows the list.

- [webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)

- [summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)

- [synthetic-instruct-gptj-pairwise](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise)

- [anthropic_hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
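
The card doesn't spell out the training objective, but reward models trained on chosen/rejected comparison pairs like the datasets above commonly use a pairwise (Bradley-Terry style) loss that pushes the score of the chosen response above the score of the rejected one. A minimal sketch, assuming this standard objective; the `pairwise_loss` helper and the example strings are illustrative, not the actual training code:

```python
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
model = AutoModelForSequenceClassification.from_pretrained(reward_name)
tokenizer = AutoTokenizer.from_pretrained(reward_name)

def pairwise_loss(question, chosen, rejected):
    # Score both responses to the same question with the reward model
    chosen_score = model(**tokenizer(question, chosen, return_tensors="pt")).logits[0]
    rejected_score = model(**tokenizer(question, rejected, return_tensors="pt")).logits[0]
    # Bradley-Terry style objective: -log(sigmoid(chosen_score - rejected_score))
    return -F.logsigmoid(chosen_score - rejected_score).mean()

# Illustrative pair, not taken from the training data
loss = pairwise_loss(
    "Explain nuclear fusion like I am five",
    "Fusion is when tiny atoms squish together and release energy, like inside the Sun.",
    "Fusion is a kind of dance move.",
)
loss.backward()  # during training, an optimizer step would follow
```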

# How to use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the reward model and its tokenizer
reward_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
rank_model, tokenizer = AutoModelForSequenceClassification.from_pretrained(reward_name), AutoTokenizer.from_pretrained(reward_name)

# Score a single (question, answer) pair; higher scores indicate a better answer
question, answer = "Explain nuclear fusion like I am five", "Nuclear fusion is the process by which two or more protons and neutrons combine to form a single nucleus. It is a very important process in the universe, as it is the source of energy for stars and galaxies. Nuclear fusion is also a key process in the production of energy for nuclear power plants."
inputs = tokenizer(question, answer, return_tensors='pt')
score = rank_model(**inputs).logits[0].cpu().detach()
print(score)
```
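
For QA model evaluation or for producing reward scores in RLHF, several candidate answers to the same question can be scored in one batch and ranked by their scores. A minimal sketch; the candidate strings below are made up for illustration:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
rank_model = AutoModelForSequenceClassification.from_pretrained(reward_name)
tokenizer = AutoTokenizer.from_pretrained(reward_name)

question = "Explain nuclear fusion like I am five"
# Hypothetical candidate answers; in practice these come from the model being evaluated
candidates = [
    "Nuclear fusion is when tiny atoms squeeze together and release a lot of energy, like the Sun does.",
    "Nuclear fusion is a type of sandwich.",
]

# Tokenize all (question, answer) pairs and score them in a single forward pass
inputs = tokenizer([question] * len(candidates), candidates, padding=True, return_tensors="pt")
with torch.no_grad():
    scores = rank_model(**inputs).logits.squeeze(-1)

# Higher score = preferred answer
best = scores.argmax().item()
print(scores, candidates[best])
```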

**Toxic response detection**

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
rank_model, tokenizer = AutoModelForSequenceClassification.from_pretrained(reward_name), AutoTokenizer.from_pretrained(reward_name)

question = "I just came out from jail, any suggestions for my future?"
helpful = "It's great to hear that you have been released from jail."
bad = "Go back to jail you scum"

# Score the helpful response
inputs = tokenizer(question, helpful, return_tensors='pt')
good_score = rank_model(**inputs).logits[0].cpu().detach()

# Score the toxic response
inputs = tokenizer(question, bad, return_tensors='pt')
bad_score = rank_model(**inputs).logits[0].cpu().detach()

# The helpful response should receive the higher score
print(good_score > bad_score) # tensor([True])
```

# Performance

Validation split accuracy (%)

| Model | [WebGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) | [Summary](https://huggingface.co/datasets/openai/summarize_from_feedback) | [SyntheticGPT](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise) | [Anthropic RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) |
|---|---|---|---|---|
| [electra-large-discriminator](https://huggingface.co/OpenAssistant/reward-model-electra-large-discriminator) | 59.30 | 68.66 | 99.85 | 54.33 |
| **[deberta-v3-large-v2](https://huggingface.co/OpenAssistant/reward-model-deberta-v3-large-v2)** | **61.57** | 71.47 | 99.88 | **69.25** |
| [deberta-v3-large](https://huggingface.co/OpenAssistant/reward-model-deberta-v3-large) | 61.13 | 72.23 | **99.94** | 55.62 |
| [deberta-v3-base](https://huggingface.co/OpenAssistant/reward-model-deberta-v3-base) | 59.07 | 66.84 | 99.85 | 54.51 |
| deberta-v2-xxlarge | 58.67 | 73.27 | 99.77 | 66.74 |

It's likely that SyntheticGPT has some kind of surface pattern in the chosen-rejected pairs that makes it trivial to tell the better answer apart.
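
The card doesn't state how the accuracy above is computed, but for pairwise comparison data it is presumably the fraction of validation pairs in which the chosen response outscores the rejected one. A minimal sketch under that assumption; the example pairs are hypothetical and this is not the actual evaluation harness:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
model = AutoModelForSequenceClassification.from_pretrained(reward_name)
tokenizer = AutoTokenizer.from_pretrained(reward_name)
model.eval()

# Hypothetical validation pairs; in practice these come from the datasets listed above
pairs = [
    {
        "question": "Explain nuclear fusion like I am five",
        "chosen": "Fusion is when tiny atoms squish together and release energy, like inside the Sun.",
        "rejected": "Fusion is a kind of dance move.",
    },
]

correct = 0
with torch.no_grad():
    for pair in pairs:
        chosen = model(**tokenizer(pair["question"], pair["chosen"], return_tensors="pt")).logits[0]
        rejected = model(**tokenizer(pair["question"], pair["rejected"], return_tensors="pt")).logits[0]
        # A pair counts as correct when the chosen response receives the higher score
        correct += int(chosen > rejected)

print(f"accuracy: {correct / len(pairs):.4f}")
```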