YikangS committed on
Commit 3dcf664
1 Parent(s): 7ad0442

update readme

Files changed (1)
  1. README.md +0 -2
README.md CHANGED
@@ -57,8 +57,6 @@ Each MoA and MoE layer has 8 experts, and 2 experts are activated for each input
 It has 8 billion parameters in total and 2.2B active parameters.
 JetMoE-8B is trained on 1.25T tokens from publicly available datasets, with a learning rate of 5.0 x 10<sup>-4</sup> and a global batch-size of 4M tokens.

-**Model Developers** JetMoE is developed by Yikang Shen and MyShell.
-
 **Input** Models input text only.

 **Output** Models generate text only.
 
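For context, the README this commit edits describes JetMoE-8B as a text-in, text-out model with 8B total and 2.2B active parameters. Below is a minimal loading sketch using the standard `transformers` AutoModel API; the Hub id `jetmoe/jetmoe-8b` and the `trust_remote_code=True` flag are assumptions (the MoA/MoE layers may ship as custom modeling code), not instructions taken from this repository.

```python
# Hedged sketch: load JetMoE-8B for text generation with transformers.
# Assumptions: model id "jetmoe/jetmoe-8b" and trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jetmoe/jetmoe-8b"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",  # place the 8B-parameter (2.2B active) weights automatically
)

# Input: text only; output: text only, as the README states.
inputs = tokenizer("JetMoE-8B is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```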