---
license: llama3
---

# Quantile Regression for Distributional Reward Models in RLHF
+ **Author:** Nicolai Dorka
+ **Tech Report:** https://arxiv.org/abs/2409.10164
+ **Code Repository:** https://github.com/Nicolinho/QRM
+ **Method Overview:** QRM generates a distribution over rewards by aggregating individual distributions over attribute scores such as helpfulness and harmlessness.

<p align="left">
<img width="800" alt="QRM method overview" src="https://github.com/Nicolinho/QRM/blob/main/assets/method_vis.png?raw=true">
</p>
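
At the core of the method is quantile regression with the pinball loss, whose minimizer at level τ is the τ-quantile of the target distribution. As a rough, self-contained illustration of that idea (a toy sketch on synthetic data, not the model's training code):

```python
import torch

def pinball_loss(pred, target, tau):
    # Asymmetric loss whose minimizer is the tau-quantile of `target`.
    diff = target - pred
    return torch.maximum(tau * diff, (tau - 1.0) * diff).mean()

# Fit three quantile estimates of a synthetic "reward" distribution.
torch.manual_seed(0)
rewards = torch.randn(10_000)              # stand-in reward samples
taus = torch.tensor([0.05, 0.5, 0.95])     # quantile levels to estimate
q = torch.zeros(3, requires_grad=True)     # one estimate per quantile level

opt = torch.optim.SGD([q], lr=0.5)
for _ in range(3000):
    opt.zero_grad()
    loss = pinball_loss(q.unsqueeze(1), rewards.unsqueeze(0), taus.unsqueeze(1))
    loss.backward()
    opt.step()
# q now approximates the 5%, 50%, and 95% sample quantiles of `rewards`.
```

Minimizing this loss for a grid of τ values yields a discrete approximation of the full reward distribution, which is exactly the form of output the demo code below returns.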

This model uses [Skywork/Skywork-Reward-Llama-3.1-8B](https://huggingface.co/Skywork/Skywork-Reward-Llama-3.1-8B) as its backbone and uses [Skywork/Skywork-Reward-Preference-80K-v0.1](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.1) for training the gating network. Apart from this, it was trained exactly as described in the tech report.
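
The gating network's job is to mix the per-attribute distributions into a single reward distribution. A toy sketch of that mixing idea — all shapes, names, and layer choices here are hypothetical illustrations, not the released QRM architecture:

```python
import torch
import torch.nn as nn

class ToyGatedQuantileHead(nn.Module):
    """Toy sketch: per-attribute quantile estimates are combined into one
    reward distribution by an input-conditioned softmax gate."""
    def __init__(self, hidden_dim, num_attributes=19, num_quantiles=19):
        super().__init__()
        # One quantile vector per attribute, predicted from the hidden state.
        self.attr_heads = nn.Linear(hidden_dim, num_attributes * num_quantiles)
        self.gate = nn.Linear(hidden_dim, num_attributes)
        self.num_attributes = num_attributes
        self.num_quantiles = num_quantiles

    def forward(self, h):  # h: (batch, hidden_dim) final hidden state
        q = self.attr_heads(h).view(-1, self.num_attributes, self.num_quantiles)
        w = torch.softmax(self.gate(h), dim=-1)      # (batch, num_attributes)
        # Weighted aggregation of attribute quantiles -> reward quantiles.
        reward_quantiles = (w.unsqueeze(-1) * q).sum(dim=1)
        score = reward_quantiles.mean(dim=-1)        # expectation estimate
        return score, reward_quantiles

head = ToyGatedQuantileHead(hidden_dim=64)
score, rq = head(torch.randn(2, 64))                 # rq: (2, 19)
```

The released model exposes exactly this kind of interface: a scalar `score` (the distribution's expectation) plus a vector of reward quantiles per input.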

## Demo Code
```python
# export ACCELERATE_MIXED_PRECISION=bf16
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda"
path = "nicolinho/QRM-Llama3.1-8B"
model = AutoModelForSequenceClassification.from_pretrained(path, device_map=device, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(path, use_fast=True)

# An example prompt/response pair to score
prompt = 'Does pineapple belong on a pizza?'
response = "There are different opinions on this. Some people like pineapple on a pizza while others condemn this."
messages = [{"role": "user", "content": prompt},
            {"role": "assistant", "content": response}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)
with torch.no_grad():
    output = model(input_ids)
    # Expectation of the reward distribution
    reward = output.score.cpu().float()
    # Estimates for the quantiles 0.05, 0.1, ..., 0.9, 0.95, representing the distribution over rewards
    reward_quantiles = output.reward_quantiles.cpu().float()

# The attributes of the 19 reward objectives
attributes = ['helpsteer-helpfulness', 'helpsteer-correctness', 'helpsteer-coherence',
              'helpsteer-complexity', 'helpsteer-verbosity', 'ultrafeedback-overall_score',
              'ultrafeedback-instruction_following', 'ultrafeedback-truthfulness',
              'ultrafeedback-honesty', 'ultrafeedback-helpfulness', 'beavertails-is_safe',
              'prometheus-score', 'argilla-overall_quality', 'argilla-judge_lm', 'code-complexity',
              'code-style', 'code-explanation', 'code-instruction-following', 'code-readability']
```
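
Beyond the expected reward, the quantile estimates support risk-aware scoring, e.g. preferring responses whose *lower* quantiles are high. A small illustration with made-up 5-quantile vectors (the model itself returns 19 quantiles):

```python
import torch

# Hypothetical quantile estimates for two candidate responses
# (shape: num_candidates x num_quantiles). Values are invented.
reward_quantiles = torch.tensor([
    [0.10, 0.20, 0.30, 0.40, 0.50],   # narrow distribution: low uncertainty
    [-0.50, 0.00, 0.40, 0.90, 1.40],  # wide distribution: high uncertainty
])

mean_reward = reward_quantiles.mean(dim=-1)
# Risk-averse score: average only the lowest quantiles (a CVaR-style proxy).
risk_averse = reward_quantiles[:, :2].mean(dim=-1)

best_mean = mean_reward.argmax().item()   # picks the risky wide candidate
best_safe = risk_averse.argmax().item()   # prefers the certain narrow one
```

Here the two criteria disagree: the uncertain candidate wins on the mean (0.44 vs. 0.30) but loses under the risk-averse score, which is one of the practical motivations for modeling the full reward distribution rather than a point estimate.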

## Citation

If you find this work useful for your research, please consider citing:
```bibtex
@article{dorka2024quantile,
  title={Quantile Regression for Distributional Reward Models in RLHF},
  author={Dorka, Nicolai},
  journal={arXiv preprint arXiv:2409.10164},
  year={2024}
}
```