Other benchmarks such as MT-Bench and/or AlpacaEval

#14
by alvarobartt HF staff - opened

Hi there! Are you also planning to run MT-Bench and/or AlpacaEval? Those benchmarks seem to be closer to real-world usage than lm-eval-harness, and I would be interested in the results too, if there are any. Thanks in advance!

(Maybe those already exist, but I couldn't find them in the model repository on the Hub)

hi, we will update the results soon~

Hi @lvkaokao, that's great to hear! Feel free to ping me when they're uploaded, I'm really looking forward to those!
