nicolinho/QRM-Llama3.1-8B

nicolinho committed on Sep 25, 2024

Commit ee02c07 · verified · 1 Parent(s): 7d1484a

Upload README.md

Files changed (1)

README.md +65 -3

@@ -1,3 +1,65 @@
----
-license: llama3.1
----

+---
+license: llama3
+---
+# Quantile Regression for Distributional Reward Models in RLHF
++ **Author:** Nicolai Dorka
++ **Tech Report:** https://arxiv.org/abs/2409.10164
++ **Code Repository:** https://github.com/Nicolinho/QRM
++ **Method Overview:** QRM generates a distribution over rewards by aggregating individual distributions over attribute scores like helpfulness and harmlessness.
+    <p align="left">
+      <img width="800" alt="image" src="https://github.com/Nicolinho/QRM/blob/main/assets/method_vis.png?raw=true">
+    </p>
+This model uses [Skywork/Skywork-Reward-Llama-3.1-8B](https://huggingface.co/Skywork/Skywork-Reward-Llama-3.1-8B) as its backbone, and its gating network was trained on
+[Skywork/Skywork-Reward-Preference-80K-v0.1](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.1).
+Apart from this, it was trained exactly as described in the tech report.
+## Demo Code
+```python
+# export ACCELERATE_MIXED_PRECISION=bf16
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+device = "cuda"
+path = "nicolinho/QRM-Llama3.1-8B"
+model = AutoModelForSequenceClassification.from_pretrained(path, device_map=device, trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(path, use_fast=True)
+# Example prompt and response to score (hard-coded here rather than loaded from a dataset)
+prompt = 'Does pineapple belong on a Pizza?'
+response = "There are different opinions on this. Some people like pineapple on a Pizza while others condemn this."
+messages = [{"role": "user", "content": prompt},
+           {"role": "assistant", "content": response}]
+input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)
+with torch.no_grad():
+   output = model(input_ids)
+   # Expectation of the reward distribution
+   reward = output.score.cpu().float()
+   # Quantile estimates for the quantiles 0.05, 0.1, ..., 0.9, 0.95 representing the distribution over rewards
+   reward_quantiles = output.reward_quantiles.cpu().float()
+# The attributes of the 19 reward objectives
+attributes = ['helpsteer-helpfulness','helpsteer-correctness','helpsteer-coherence',
+   'helpsteer-complexity','helpsteer-verbosity','ultrafeedback-overall_score',
+   'ultrafeedback-instruction_following', 'ultrafeedback-truthfulness',
+   'ultrafeedback-honesty','ultrafeedback-helpfulness','beavertails-is_safe',
+   'prometheus-score','argilla-overall_quality','argilla-judge_lm','code-complexity',
+   'code-style','code-explanation','code-instruction-following','code-readability']
+```
+## Citation
+If you find this work useful for your research, please consider citing:
+```
+@article{dorka2024quantile,
+  title={Quantile Regression for Distributional Reward Models in RLHF},
+  author={Dorka, Nicolai},
+  journal={arXiv preprint arXiv:2409.10164},
+  year={2024}
+}
+```
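
The demo code in the README above returns `reward_quantiles` without showing how they might be used. The following is a minimal sketch of one way to summarize them, assuming the 19 estimates are ordered from the 0.05 quantile to the 0.95 quantile as the README comment suggests. The values are made up for illustration, and `summarize_quantiles` is a hypothetical helper, not part of the QRM repository. Averaging evenly spaced quantiles only approximates the expectation; whether it matches `output.score` exactly is not confirmed by the README.

```python
# Quantile levels 0.05, 0.10, ..., 0.95 (19 levels, as described in the README comment).
LEVELS = [i / 20 for i in range(1, 20)]


def summarize_quantiles(quantiles):
    """Summarize a list of 19 quantile estimates ordered by increasing level.

    Hypothetical helper for illustration; not part of the QRM code base.
    """
    if len(quantiles) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} quantiles, got {len(quantiles)}")
    return {
        # Mean of evenly spaced quantiles: a rough estimate of the expected reward.
        "approx_mean": sum(quantiles) / len(quantiles),
        # Lower tail, median, and upper tail of the reward distribution.
        "q05": quantiles[0],
        "median": quantiles[9],   # LEVELS[9] == 0.5
        "q95": quantiles[-1],
        # Width of the central 90% interval as a simple uncertainty measure.
        "spread": quantiles[-1] - quantiles[0],
    }


# Made-up values standing in for output.reward_quantiles[0].tolist()
example_quantiles = [-1.0 + 0.1 * i for i in range(19)]
summary = summarize_quantiles(example_quantiles)
print(summary)
```

A wide `spread` relative to the mean would indicate that the model is uncertain about the reward for that response, which is the kind of information a single scalar score cannot convey.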