Issue when fine-tuning the reward model on a custom dataset

#2
by yguooo - opened

Currently, I am benchmarking the performance of different reward models on a custom dataset. I encountered the following problem when using a standard pipeline from TRL, similar to https://huggingface.co/docs/trl/en/reward_trainer.
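For context, my setup roughly follows the docs example. This is a minimal sketch, not my exact script: the model and dataset names are placeholders, and recent TRL versions take `processing_class` where older ones took `tokenizer`.

```python
# Minimal RewardTrainer setup following the TRL docs; model name and
# dataset here are placeholders, not the custom dataset from this thread.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder base model
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Sequence classification needs a pad token configured on the model.
model.config.pad_token_id = tokenizer.pad_token_id or tokenizer.eos_token_id

# Preference dataset with "chosen"/"rejected" columns.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = RewardTrainer(
    model=model,
    args=RewardConfig(output_dir="reward-model", per_device_train_batch_size=2),
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
    train_dataset=train_dataset,
)
trainer.train()
```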
[Screenshot of the error: image.png]
I am wondering what I should do to fix the issue above.

Thank you!

The model expects inputs to be prepared in a specific format; see the demo code at https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1#demo-code
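For reference, the gist of the linked demo is roughly the following. This is a sketch, not the authoritative version: `output.score` (and the other output attributes) come from the model's custom `trust_remote_code` class, so check the model card for the exact API.

```python
# Sketch of ArmoRM scoring per the linked demo code; the custom model
# class returns an aggregated preference score for a full conversation.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"
model = AutoModelForSequenceClassification.from_pretrained(
    path, device_map="cuda", trust_remote_code=True, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(path, use_fast=True)

# Inputs must be a full chat (prompt + response) rendered with the
# model's chat template, not raw "chosen"/"rejected" strings.
messages = [
    {"role": "user", "content": 'What are some synonyms for the word "beautiful"?'},
    {"role": "assistant", "content": "Stunning, gorgeous, lovely, elegant."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model(input_ids)
    preference_score = output.score.cpu().float()  # scalar reward for the response
print(preference_score)
```

So before plugging a custom dataset into a training or evaluation pipeline, make sure each example is converted into this chat-template format rather than passed through as plain text pairs.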