MoE?

#1
by WesPro - opened

Is this really a MoE model? Your mergekit yaml looks nothing like the one needed for creating a MoE.

I'm actually quite new to this tool, so I may well have made some mistakes 😅. Feel free to correct me!

It looks like you actually used the slerp merge method. When you slerp two different 8B models, the result is a blend of both, but it's still just one 8B-parameter model; it doesn't get any bigger after the merge. A MoE is more like a collection of all the models you put in: the experts stay separate, and only a subset of them is active for any given token, so only part of the model is doing work at any moment. If you want a more precise picture, read up on the different merge methods; it's not that easy to explain briefly, but I hope this is roughly understandable. MoE is really a way to run a collection of several models while saving compute: every expert you add is stacked alongside the others rather than actually merged into them.
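To make the difference concrete, here is a rough sketch of what the two kinds of mergekit configs look like. The model names and prompts are placeholders I made up for illustration; check the mergekit docs for the exact options.

```yaml
# Slerp config (mergekit-yaml): blends two 8B models into a single 8B model.
merge_method: slerp
base_model: org-a/model-a-8b
slices:
  - sources:
      - model: org-a/model-a-8b
        layer_range: [0, 32]
      - model: org-b/model-b-8b
        layer_range: [0, 32]
parameters:
  t: 0.5          # interpolation weight: 0 = all model-a, 1 = all model-b
dtype: bfloat16
```

```yaml
# MoE config (mergekit-moe): keeps each model as a separate expert behind a router,
# so the resulting model is larger than any single input.
base_model: org-a/model-a-8b
gate_mode: hidden          # initialize the router from hidden states of the prompts
dtype: bfloat16
experts:
  - source_model: org-a/model-a-8b
    positive_prompts:
      - "general chat and reasoning"
  - source_model: org-b/model-b-8b
    positive_prompts:
      - "creative writing and roleplay"
```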

Got it! 👌
