Can't replicate MMLU results for 27b...
#39 by cinjonr - opened
I've replicated the MMLU results with Gemma-9b, and when I use a random model I get 25%, as expected. However, I can't replicate them with 27b. Is anyone else running into this issue? Is the 27b model ... correct?
I messed with this a bunch and now have gemma-27b reproducing but not gemma-9b. I made a reproduction: https://gist.github.com/cinjon/de9a22f57cfa0dc9ccb2afc255a8093e.
The main problem is the set of results below, which show a rough reproduction on gemma-2-27b, slight degradation on gemma-2-27b-it, slight degradation on gemma-2-9b, and a terrible result on gemma-2-9b-it. What am I doing wrong?
1. python -m huggingface_test_gemma_base_mmlu --model_name="google/gemma-2-9b"
--> all 0.7057399230878793
2. python -m huggingface_test_gemma_base_mmlu --model_name="google/gemma-2-9b-it"
--> all 0.6387266771115225
3. python -m huggingface_test_gemma_base_mmlu --model_name="google/gemma-2-27b-it"
--> all 0.7518159806295399
4. python -m huggingface_test_gemma_base_mmlu --model_name="google/gemma-2-27b"
--> all 0.7517447657028913
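For reference, here is a minimal sketch of the kind of next-token MMLU scoring loop a script like this typically runs. The prompt format, dtype, and attention settings here are my assumptions, not the actual contents of the gist above:

```python
# Minimal sketch of an MMLU-style multiple-choice eval with transformers.
# The prompt layout and the score-by-first-token approach are assumptions,
# not taken from the linked gist.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "google/gemma-2-9b"  # swap in the other checkpoints to compare
CHOICES = ["A", "B", "C", "D"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,   # Gemma 2 is meant to be run in bfloat16
    attn_implementation="eager",  # eager is recommended for Gemma 2's logit soft-capping
    device_map="auto",
)
model.eval()

# Token ids for the single-letter answers " A", " B", " C", " D".
choice_ids = [tokenizer.encode(f" {c}", add_special_tokens=False)[-1] for c in CHOICES]

@torch.no_grad()
def predict(question: str, options: list[str]) -> str:
    """Score the next-token logits for A/B/C/D after a plain MMLU-style prompt."""
    prompt = question + "\n" + "\n".join(
        f"{label}. {text}" for label, text in zip(CHOICES, options)
    ) + "\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    logits = model(**inputs).logits[0, -1]  # logits for the next token
    scores = logits[choice_ids]             # restrict to the four answer letters
    return CHOICES[int(scores.argmax())]

# Example with a dummy item:
print(predict("What is 2 + 2?", ["3", "4", "5", "6"]))
```

Two settings worth double-checking in the actual harness: running in float16 instead of bfloat16, or using a non-eager attention implementation, can both shift Gemma 2 scores, and evaluating the -it checkpoints with a base-model few-shot prompt rather than their chat template could plausibly account for the 9b-it drop.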