MMLU Lower Results Theory

#5
by fblgit - opened

A model that performs better on "everything" must also perform better, or at least comparably, on MMLU; otherwise something must be wrong.

Have you tried transforming MMLU into a turn-based, Trivial Pursuit-style Q&A session?
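As a minimal sketch of what that reformatting could look like (the helper name and example item are hypothetical, not from this thread): drop the multiple-choice options and keep the gold choice as a free-form trivia answer.

```python
# Hypothetical sketch: turn an MMLU-style multiple-choice item
# (question, choices, answer index) into an open-ended trivia prompt.

def to_trivia_prompt(question, choices, answer_idx):
    """Strip the choices; keep the gold choice as the free-form target."""
    prompt = f"Q: {question.strip()}\nA:"
    target = choices[answer_idx]
    return prompt, target

item = {
    "question": "What is the capital of France?",
    "choices": ["Berlin", "Madrid", "Paris", "Rome"],
    "answer": 2,
}

prompt, target = to_trivia_prompt(item["question"], item["choices"], item["answer"])
print(prompt)  # Q: What is the capital of France?\nA:
print(target)  # Paris
```

The model would then be scored on whether its free-form completion matches the target, rather than on picking a letter.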

OpenChat org

It's likely that the official Llama-3-8b-instruct results were obtained with special prompts. The measured results are quite low; see here for discussion.

The scores on a stale benchmark are irrelevant. I believe what we're really trying to do here is prove that this model and strategy are optimal, right? :D

fblgit changed discussion status to closed
