About MMLU evaluation
#12, opened by ldwang
Thank you for sharing.
Some models, like Qwen1.5B and Phi1.5, typically use a 5-shot setting to measure MMLU.
cosmo-1b also used the same setting (see https://huggingface.co/blog/cosmopedia#training-stack).
Can you explain why the MMLU evaluation here was changed to a zero-shot setup that includes the option content?
Thank you.
Hi, we now use the same evaluation setup for our internal projects (the same as the FineWeb and FineWeb-Edu ablations), where all benchmarks are evaluated zero-shot.
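
For readers comparing the two setups, here is a minimal sketch of the difference between a 5-shot letter-choice prompt and a zero-shot prompt that scores the option content. The exact templates used in the FineWeb ablations are not stated in this thread, so the prompt layouts, the function names, and the `score` callback (standing in for a model's log-likelihood of a continuation) are all illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

LETTERS = "ABCD"  # MMLU questions have four options


def format_mc(question: str, options: Sequence[str], answer: str = "") -> str:
    """Lettered multiple-choice block; `answer` is a letter for solved shots."""
    lines = [question]
    lines += [f"{letter}. {opt}" for letter, opt in zip(LETTERS, options)]
    lines.append("Answer:" + (f" {answer}" if answer else ""))
    return "\n".join(lines)


def few_shot_prompt(
    shots: Sequence[Tuple[str, Sequence[str], str]],
    question: str,
    options: Sequence[str],
) -> str:
    """k solved examples followed by the unsolved target question."""
    blocks = [format_mc(q, opts, ans) for q, opts, ans in shots]
    blocks.append(format_mc(question, options))
    return "\n\n".join(blocks)


def zero_shot_candidates(
    question: str, options: Sequence[str]
) -> List[Tuple[str, str]]:
    """No examples: each option's text is a candidate continuation."""
    context = f"Question: {question}\nAnswer:"  # hypothetical template
    return [(context, f" {opt}") for opt in options]


def pick_option(
    question: str,
    options: Sequence[str],
    score: Callable[[str, str], float],
) -> int:
    """Index of the option whose text gets the highest score.

    `score(context, continuation)` is a placeholder for a model call.
    """
    candidates = zero_shot_candidates(question, options)
    scores = [score(ctx, cont) for ctx, cont in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

In the few-shot form the model only has to emit a letter, and the solved examples teach it the format. In the zero-shot form sketched here, the model is never asked for a letter; the option texts themselves are scored as continuations.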