Crazy Stuff
#1, opened by dillfrescott
Good job. Is there some sort of contamination, or have 7B models really caught up to last year's giant models?
In all honesty, I think this scores way higher on this benchmark than it deserves.
My hypothesis is that xDAN-AI/xDAN-L1-Chat-RL-v1 answers questions in a very specific way that MT-Bench likes, and that my merge managed to keep that trait while shoring up xDAN-L1-Chat-RL-v1's weak areas, such as STEM, Writing, and RP.
I'm combining it into an MoE here.