---
license: apache-2.0
language:
- en
---
|
|
|
OpenLLaMA-13B for reward modeling
|
|
|
- Dataset: https://huggingface.co/datasets/pvduy/rm_oa_hh
- Logs: https://wandb.ai/sorry/autocrit/runs/j05t4e97?workspace=user-sorry
- Code: https://github.com/CarperAI/autocrit/blob/main/train_reward_model.py
|
|
|
Usage: |
|
|
|
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpt = "reciprocate/openllama-13b_rm_oasst-hh"
# 4-bit loading requires the bitsandbytes package to be installed
model = AutoModelForSequenceClassification.from_pretrained(ckpt, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

# score a single response; the model returns one scalar logit per input,
# where higher values indicate a more preferred response
model(**tokenizer("ASSISTANT: This sentence is a lie.", return_tensors="pt"))[0].item()
```
|
|
|
Output:

```python
-1.626953125
```
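A single scalar score like the one above is hard to interpret on its own; reward models trained on preference data are typically used to compare two candidate responses to the same prompt. A minimal sketch of the usual Bradley-Terry reading of score differences (the helper function below is illustrative and not part of this repository):

```python
import math

def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over B,
    given the reward model's scalar score for each response."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

# equal scores imply no preference either way
print(preference_probability(-1.63, -1.63))  # 0.5
```

Under this reading, only the difference between two scores matters, so the absolute magnitude of a single output (e.g. `-1.626953125`) carries no meaning by itself.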
|
|