When can we expect the training code described in the paper?
Amazing work! Would love to have the training code.
Please check our modified Megablocks: https://github.com/yikangshen/megablocks. To reproduce the training code, you only need to integrate it with the Megatron codebase.
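Conceptually, the integration point looks like this. This is only a sketch under the assumption that the fork keeps upstream Megablocks' `Arguments`/`MoE` interface; `TransformerBlock` is an illustrative name, not a class from either repo:

```python
# Sketch of the integration point: in a Megatron-style transformer
# layer, the dense FFN is swapped for a Megablocks MoE layer.
import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.moe import MoE

class TransformerBlock(torch.nn.Module):
    def __init__(self, hidden_size: int, num_heads: int, moe_args: Arguments):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm1 = torch.nn.LayerNorm(hidden_size)
        self.norm2 = torch.nn.LayerNorm(hidden_size)
        self.moe_ffn = MoE(moe_args)  # replaces Megatron's dense ParallelMLP

    def forward(self, x):
        h = self.norm1(x)
        h, _ = self.attn(h, h, h, need_weights=False)
        x = x + h  # residual around attention
        out = self.moe_ffn(self.norm2(x))
        if isinstance(out, tuple):  # some Megablocks versions return (output, bias)
            out = out[0]
        return x + out  # residual around the MoE FFN
```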
Thanks a lot, @YikangS.
Could you please explain the following statement?
"To reproduce the training code, you only need to integrate it with the Megatron codebase."
Is there a standard way to integrate the above-mentioned training code with Megatron?
Sorry, I am new to Megatron and Megablocks.
Yes, the original Megablocks repo provides an example of integrating Megablocks with Megatron.
You can also integrate this Megablocks repo into any pretraining framework you prefer.
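For a framework-agnostic starting point, a single MoE layer trains like any other `torch.nn.Module`. A minimal sketch, assuming the fork keeps upstream Megablocks' `Arguments`/`MoE` interface and its load-balancing-loss helpers; the sizes are toy values, not JetMoE's, and this is not JetMoE's actual training loop:

```python
import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.moe import MoE, batched_load_balancing_loss, clear_load_balancing_loss

args = Arguments(
    hidden_size=512,        # toy sizes for the sketch, not JetMoE's
    ffn_hidden_size=2048,
    moe_num_experts=8,
    moe_top_k=2,
)
moe = MoE(args).cuda()      # Megablocks kernels require a CUDA device
opt = torch.optim.AdamW(moe.parameters(), lr=1e-4)

for step in range(10):
    x = torch.randn(4, 128, 512, device="cuda")  # a toy batch of hidden states
    out = moe(x)
    if isinstance(out, tuple):  # some versions return (output, bias)
        out = out[0]
    loss = out.float().pow(2).mean()                 # stand-in for a real LM loss
    loss = loss + batched_load_balancing_loss(args)  # router auxiliary loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    clear_load_balancing_loss()  # reset the saved router stats each step
```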
Thanks a lot
@YikangS, just to clarify:
This is the example related to Megatron, right?
https://github.com/databricks/megablocks/tree/main/exp/moe
@YikangS I went through the https://github.com/yikangshen/megablocks repository. Could you please explain how we can add the https://github.com/myshell-ai/JetMoE model to Megablocks?
Is there a pre-training script in the modified Megablocks repository? If so, please share the link.
In the JetMoE technical report, there are several key settings related to model pre-training (Section 4.1). How can we configure these settings in the modified Megablocks repository?
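For instance, if the fork keeps upstream Megablocks' `Arguments` fields, is it roughly a matter of setting something like this? (Every value below is a placeholder, not a number from the report.)

```python
# Guessing at where Section 4.1's settings would go. All values are
# placeholders, NOT the JetMoE report's actual numbers.
from megablocks.layers.arguments import Arguments

moe_args = Arguments(
    hidden_size=2048,       # placeholder model width
    ffn_hidden_size=5632,   # placeholder per-expert FFN width
    moe_num_experts=8,      # placeholder expert count
    moe_top_k=2,            # placeholder top-k routing
    moe_loss_weight=0.01,   # placeholder load-balancing loss weight
)
# Optimizer and schedule settings (learning rate, warmup, batch size,
# total tokens) would live in the surrounding Megatron config instead.
```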
Do you think we can use the same method to fine-tune JetMoE as well?
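In case it helps others on this thread, fine-tuning through Hugging Face transformers also looks like a workable path. A minimal sketch, assuming the released checkpoint is the one published on the Hub as `jetmoe/jetmoe-8b`:

```python
# Sketch of fine-tuning via Hugging Face transformers rather than
# Megablocks/Megatron. Assumes the Hub checkpoint "jetmoe/jetmoe-8b".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("jetmoe/jetmoe-8b")
model = AutoModelForCausalLM.from_pretrained(
    "jetmoe/jetmoe-8b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # needed if the architecture ships as remote code
)

batch = tok("Hello, JetMoE!", return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])  # standard causal-LM loss
out.loss.backward()  # plug into your preferred Trainer/optimizer from here
```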
+1. Thanks for the amazing work! Can you please share the Megatron integration code to either pretrain or finetune JetMoE using MegaBlocks? This would be so helpful.
Sure, it will take some time to clean up the code. I will release the full training code within a week.