Ray2333
/

gpt2-large-helpful-reward_model



	---
	license: mit
	datasets:
	- Anthropic/hh-rlhf
	metrics:
	- accuracy
	---


	A GPT-2 large reward model trained on the helpful subset of the Anthropic/hh-rlhf dataset. It is intended for detecting helpful responses and for use as the reward model in RLHF. It achieves an accuracy of 0.72621 on the test set, which nearly matches the accuracy of larger models.
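The card does not say how the accuracy is computed. A common convention for preference datasets such as Anthropic/hh-rlhf is pairwise accuracy: the fraction of (chosen, rejected) pairs in which the reward model scores the chosen response higher. The sketch below assumes that convention; `pairwise_accuracy` is a hypothetical helper, not part of this repository, and the numbers in the usage note are made up for illustration.

```python
from typing import Sequence


def pairwise_accuracy(chosen_rewards: Sequence[float],
                      rejected_rewards: Sequence[float]) -> float:
    """Fraction of pairs where the chosen response gets the higher reward.

    Assumption: this is the usual metric for preference data; the model
    card itself does not state how its accuracy was measured.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("reward lists must have the same length")
    if not chosen_rewards:
        raise ValueError("need at least one pair")
    # A pair counts as correct only if chosen strictly beats rejected.
    correct = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return correct / len(chosen_rewards)
```

For example, `pairwise_accuracy([1.0, 0.5, -0.2], [0.0, 0.7, -1.0])` counts two correct pairs out of three.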

	Note: 1. At inference time, format inputs the same way as the Anthropic/hh-rlhf dataset, with the prompt written as `\n\nHuman: ... \n\nAssistant:`. 2. Unlike other open-source reward models, which are trained on the full Anthropic/hh-rlhf dataset, this model is trained only on the helpful subset.
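As a minimal sketch of the first note, the helper below builds a prompt in the `\n\nHuman: ... \n\nAssistant:` layout from a list of dialogue turns. `format_hh_prompt` is a hypothetical name introduced here for illustration, and the exact whitespace is an assumption based on the dataset's usual layout; compare against real Anthropic/hh-rlhf samples before relying on it.

```python
from typing import List, Tuple


def format_hh_prompt(turns: List[Tuple[str, str]]) -> str:
    """Join (speaker, text) turns into an hh-rlhf style prompt.

    The returned string ends with an open "Assistant:" turn, so the
    response to be scored can be passed separately to the tokenizer.
    """
    parts = []
    for speaker, text in turns:
        if speaker not in ("Human", "Assistant"):
            raise ValueError(f"unknown speaker: {speaker!r}")
        # Each turn starts with a blank line and the speaker tag.
        parts.append(f"\n\n{speaker}: {text}")
    return "".join(parts) + "\n\nAssistant:"
```

Calling `format_hh_prompt([("Human", "How do I write a resume?")])` yields a string that can be used as the prompt `q` when scoring a response.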


	## Usage:
	```
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load the tokenizer and the reward model (one output label, i.e. a scalar score)
	rm_tokenizer = AutoTokenizer.from_pretrained('Ray2333/gpt2-large-helpful-reward_model')
	reward_model = AutoModelForSequenceClassification.from_pretrained(
	    'Ray2333/gpt2-large-helpful-reward_model',
	    num_labels=1, torch_dtype=torch.bfloat16,
	    device_map=0,  # place the model on GPU 0
	)

	# Prompt and response, with the prompt in the Anthropic/hh-rlhf format
	q, a = "\n\nHuman: I just came out of from jail, any suggestion of my future? \n\nAssistant:", "Sorry, I don't understand."
	inputs = rm_tokenizer(q, a, return_tensors='pt', truncation=True)

	# Inference only: no gradients needed. The single logit is the reward.
	with torch.no_grad():
	    reward = reward_model(**(inputs.to(0))).logits[0].cpu().detach().item()
	```
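One way to use the reward for helpful response detection is to score several candidate responses to the same prompt and keep the highest-scoring one. The sketch below is illustrative only: `pick_most_helpful` and `score_fn` are hypothetical names, and `score_fn` stands for any function that returns the reward for a (prompt, response) pair, for example a wrapper around the tokenizer and model calls in the Usage snippet.

```python
from typing import Callable, List, Sequence, Tuple


def pick_most_helpful(prompt: str,
                      candidates: Sequence[str],
                      score_fn: Callable[[str, str], float]) -> Tuple[str, List[float]]:
    """Score every candidate response and return the best one with all rewards.

    score_fn is an assumption of this sketch: it should map
    (prompt, response) to a scalar reward, higher meaning preferred.
    """
    if not candidates:
        raise ValueError("need at least one candidate response")
    rewards = [score_fn(prompt, c) for c in candidates]
    # Index of the highest reward; ties resolve to the earliest candidate.
    best_index = max(range(len(candidates)), key=rewards.__getitem__)
    return candidates[best_index], rewards
```

To try it without loading the model, pass a stand-in scorer such as `lambda p, r: float(len(r))`; with the real model, `score_fn` would run the tokenizer and `reward_model` on each pair.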