---
license: apache-2.0
language:
- en
---

# OpenLLama-13B for reward modeling

- Dataset: https://huggingface.co/datasets/pvduy/rm_oa_hh
- Logs: https://wandb.ai/sorry/autocrit/runs/j05t4e97?workspace=user-sorry
- Code: https://github.com/CarperAI/autocrit/blob/main/train_reward_model.py

Usage:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpt = "reciprocate/openllama-13b_rm_oasst-hh"
model = AutoModelForSequenceClassification.from_pretrained(ckpt, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

model(**tokenizer("ASSISTANT: This sentence is a lie.", return_tensors="pt"))[0].item()
```

Output:

```python
-1.626953125
```
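Reward models like this one output a single scalar per response, and pairwise-trained models are commonly interpreted through a Bradley-Terry preference probability: the sigmoid of the score difference between two candidates. A minimal sketch of that comparison step, assuming the pairwise interpretation applies here (the reward values below are toy numbers for illustration, not real model outputs):

```python
import math

def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over B,
    given the scalar rewards produced by a model like the one above."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))

# Toy rewards for two hypothetical responses (not real model outputs):
p = preference_probability(1.2, -2.5)
# Equal rewards give a 50/50 preference:
tie = preference_probability(0.0, 0.0)
```

In practice, `reward_a` and `reward_b` would each come from a call like the one in the usage example, with the probability used to rank candidates or to weight them during RLHF-style fine-tuning.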