Issue when fine-tuning the reward model on a custom dataset

#2
by yguooo - opened

Currently, I am benchmarking the performance of different reward models on a custom dataset. I encountered the following problem when using a standard pipeline from TRL, similar to https://huggingface.co/docs/trl/en/reward_trainer.
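For context, my setup roughly follows the docs example. This is a minimal sketch, not my exact script: the model and dataset names are placeholders, and recent TRL versions take `processing_class` where older ones took `tokenizer`.

```python
# Minimal RewardTrainer setup following the TRL docs; model name and
# dataset here are placeholders, not the custom dataset from this thread.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder base model
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Sequence classification needs a pad token configured on the model.
model.config.pad_token_id = tokenizer.pad_token_id or tokenizer.eos_token_id

# Preference dataset with "chosen"/"rejected" columns.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = RewardTrainer(
    model=model,
    args=RewardConfig(output_dir="reward-model", per_device_train_batch_size=2),
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
    train_dataset=train_dataset,
)
trainer.train()
```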
[Screenshot of the error: image.png]
I am wondering what I should do to fix the issue above.

Thank you!

The model expects inputs to be prepared in a specific format; see the demo code at https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1#demo-code
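For reference, the gist of the linked demo is roughly the following. This is a sketch, not the authoritative version: `output.score` (and the other output attributes) come from the model's custom `trust_remote_code` class, so check the model card for the exact API.

```python
# Sketch of ArmoRM scoring per the linked demo code; the custom model
# class returns an aggregated preference score for a full conversation.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"
model = AutoModelForSequenceClassification.from_pretrained(
    path, device_map="cuda", trust_remote_code=True, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(path, use_fast=True)

# Inputs must be a full chat (prompt + response) rendered with the
# model's chat template, not raw "chosen"/"rejected" strings.
messages = [
    {"role": "user", "content": 'What are some synonyms for the word "beautiful"?'},
    {"role": "assistant", "content": "Stunning, gorgeous, lovely, elegant."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model(input_ids)
    preference_score = output.score.cpu().float()  # scalar reward for the response
print(preference_score)
```

So before plugging a custom dataset into a training or evaluation pipeline, make sure each example is converted into this chat-template format rather than passed through as plain text pairs.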