Loss function?

#10
by narvind2003 - opened

My understanding is that the MoE model uses the same LM loss as previous transformers. Are any auxiliary losses used as well?
Please clarify, or point me to the right file in the megablocks source. Thank you!
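
For context, here is a minimal sketch of the kind of auxiliary loss the question refers to: the load-balancing term described in the Switch Transformer paper, N * sum_i(f_i * P_i), added to the LM loss with a small weight. This is an illustration only. Nothing in this thread confirms whether this model or megablocks uses this term, and the function names, the top-1 routing simplification, and the `aux_weight` value are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits, num_experts):
    """Switch-Transformer-style auxiliary loss for top-1 routing.

    router_logits: one list of per-expert logits for each token.
    Returns num_experts * sum_i(f_i * P_i), which equals 1.0 when tokens
    are spread evenly and grows as routing collapses onto few experts.
    (Hypothetical helper, not an API of any library.)
    """
    num_tokens = len(router_logits)
    probs = [softmax(row) for row in router_logits]

    # f_i: fraction of tokens whose highest-probability expert is i.
    counts = [0] * num_experts
    for p in probs:
        counts[p.index(max(p))] += 1
    f = [c / num_tokens for c in counts]

    # P_i: mean router probability assigned to expert i over all tokens.
    P = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]

    return num_experts * sum(fi * pi for fi, pi in zip(f, P))


def total_loss(lm_loss, router_logits, num_experts, aux_weight=0.01):
    """LM loss plus the weighted auxiliary term (aux_weight is an assumed value)."""
    return lm_loss + aux_weight * load_balancing_loss(router_logits, num_experts)


# Two tokens, two experts: balanced routing vs. everything sent to expert 0.
balanced = [[5.0, 0.0], [0.0, 5.0]]
collapsed = [[5.0, 0.0], [5.0, 0.0]]
print(load_balancing_loss(balanced, 2))
print(load_balancing_loss(collapsed, 2))
```

The collapsed case yields a larger value than the balanced one, which is what lets the term push the router toward even expert usage during training.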

Disco Research org
bjoernp changed discussion status to closed
