LocMoE: A Low-overhead MoE for Large Language Model Training • arXiv:2401.13920 • Published Jan 25, 2024
HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts • arXiv:2312.07035 • Published Dec 12, 2023
DEMix Layers: Disentangling Domains for Modular Language Modeling • arXiv:2108.05036 • Published Aug 11, 2021