Text Classification
Transformers
PyTorch
English
deberta-v2
reward-model
reward_model
RLHF
Inference Endpoints

Reward model trained from human feedback

A reward model (RM) trained to predict, given a question, which generated answer a human would judge to be better.

RMs are useful in these domains:

  • QA model evaluation

  • serving as the reward score in RLHF
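Both uses come down to scoring several candidate answers to the same question and comparing the scores. A minimal sketch of that ranking step follows. The names `rank_answers` and `toy_score` are hypothetical and not part of this model card, and `toy_score` is only a stand-in so the example runs without downloading a model; in practice `score_fn` would wrap a call to the reward model.

```python
from typing import Callable, List, Tuple


def rank_answers(
    question: str,
    answers: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Return (answer, score) pairs sorted from highest to lowest score."""
    scored = [(answer, score_fn(question, answer)) for answer in answers]
    # sorted() is stable, so answers with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(question: str, answer: str) -> float:
    """Hypothetical stand-in scorer: counts answer words that also occur in the question."""
    question_words = set(question.lower().split())
    return float(sum(word in question_words for word in answer.lower().split()))


ranked = rank_answers(
    "What is nuclear fusion?",
    ["I do not know.", "Nuclear fusion is when light nuclei merge."],
    toy_score,
)
best_answer, best_score = ranked[0]
print(best_answer, best_score)
```

For QA evaluation the full ranked list is the useful output; for a best-of-n style selection only `ranked[0]` is needed.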

All models are trained on these datasets, using the same split seed across datasets (where a validation split wasn't available).

How to use

from transformers import AutoModelForSequenceClassification, AutoTokenizer
# Load the reward model and its tokenizer from the Hugging Face Hub.
reward_name = "OpenAssistant/reward-model-deberta-v3-base"
rank_model = AutoModelForSequenceClassification.from_pretrained(reward_name)
tokenizer = AutoTokenizer.from_pretrained(reward_name)
question = "Explain nuclear fusion like I am five"
answer = "Nuclear fusion is the process by which two or more protons and neutrons combine to form a single nucleus. It is a very important process in the universe, as it is the source of energy for stars and galaxies. Nuclear fusion is also a key process in the production of energy for nuclear power plants."
# Encode the question and the answer together as one sequence pair.
inputs = tokenizer(question, answer, return_tensors='pt')
# The logit is the reward score for this question-answer pair.
score = rank_model(**inputs).logits[0].cpu().detach()
print(score)
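The printed score is a raw logit, so it is easiest to read relative to the score of another answer to the same question. One possible way to turn two scores into a preference probability is sketched below. It assumes a Bradley-Terry style reading, where the probability that answer A beats answer B is the sigmoid of the score difference; this card does not state the training loss, so treat that reading as an assumption. `preference_probability` is a hypothetical helper name.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Probability that answer A is preferred over answer B.

    Assumes a Bradley-Terry style model: sigmoid(score_a - score_b).
    """
    diff = score_a - score_b
    # Numerically stable sigmoid: only ever exponentiate a non-positive number.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    exp_diff = math.exp(diff)
    return exp_diff / (1.0 + exp_diff)


# Example with made-up scores; real values would come from the model's logits.
print(preference_probability(1.2, -0.3))
```

Under this assumption, equal scores give 0.5, and only the difference between the two scores matters, not their absolute values.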

Performance

Validation split accuracy (%)

Model                        WebGPT  Summary  SyntheticGPT
electra-large-discriminator  59.30   68.66    99.85
deberta-v3-large             61.13   72.23    99.94
deberta-v3-base              59.07   66.84    99.85

It's likely that SyntheticGPT has some kind of surface pattern in its chosen-rejected pairs, which makes it trivial to tell which answer is better.
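The card does not say exactly how the accuracy in the table above was computed. A common definition for chosen-rejected pairs, assumed here, is the percentage of pairs in which the chosen answer receives a strictly higher score than the rejected one. `pairwise_accuracy` is a hypothetical helper, and counting ties as errors is also an assumption.

```python
from typing import Sequence


def pairwise_accuracy(
    chosen_scores: Sequence[float],
    rejected_scores: Sequence[float],
) -> float:
    """Percentage of pairs where the chosen answer outscores the rejected one."""
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("chosen_scores and rejected_scores must have the same length")
    if len(chosen_scores) == 0:
        raise ValueError("at least one pair is required")
    # Assumption: a tie counts as a miss, since the model failed to prefer the chosen answer.
    correct = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return 100.0 * correct / len(chosen_scores)


# Made-up scores for four pairs: two are ranked correctly, one is wrong, one is a tie.
accuracy = pairwise_accuracy([2.0, 0.1, 1.5, 0.3], [1.0, 0.4, -0.2, 0.3])
print(accuracy)
```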

