Separate Scores: With & Without Prior Sets

#6
by Haoxiang-Wang - opened

Hi @natolambert ,

Currently, models with and without prior-set numbers are compared using a single score computed with different weightings. Models without prior-set numbers are favored in the benchmark, because prior-set scores are typically lower than the average of the other four categories. To make the comparison fair, I think we should add another score column to the benchmark that averages over the four primary categories (excluding the prior sets). What do you think?

For instance, if we exclude the prior sets, ArmoRM's score rises to 90.8, which is higher than Llama3-70B-SteerLM-RM's 89.0. In the current benchmark, however, ArmoRM ranks below the Llama3-70B RM because the prior sets are included in the score computation.
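A minimal sketch of the two averaging schemes being compared, assuming a plain unweighted mean in both cases (the category names follow RewardBench's primary categories, but the per-category numbers here are illustrative, not actual leaderboard values):

```python
# Illustrative comparison of scoring with vs. without the prior sets.
# Per-category scores below are made up for demonstration.
def mean(scores: dict) -> float:
    return sum(scores.values()) / len(scores)

primary = {
    "Chat": 96.9,
    "Chat Hard": 76.8,
    "Safety": 92.2,
    "Reasoning": 97.3,
}
prior_sets = 74.3  # hypothetical: prior-set scores tend to run lower

primary_only = mean(primary)
with_prior = mean({**primary, "Prior Sets": prior_sets})

print(f"primary-only: {primary_only:.1f}")   # higher
print(f"with prior sets: {with_prior:.1f}")  # dragged down by the prior sets
```

Because the prior-set number sits below the mean of the four primary categories, any model that reports it sees its overall score drop, which is the unfairness described above.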

Allen Institute for AI org

@Haoxiang-Wang this is why I added the toggle that excludes prior sets from the ranking? Do you think that isn't enough?
Trying to be minimal in additions, but yeah I've thought about this too.

Allen Institute for AI org

Prior sets is now off by default.

natolambert changed discussion status to closed
