Memory Augmented Language Models through Mixture of Word Experts • arXiv:2311.10768 • Published Nov 15, 2023
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts • arXiv:2401.04081 • Published Jan 8, 2024
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models • arXiv:2401.06066 • Published Jan 11, 2024
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models • arXiv:2402.01739 • Published Jan 29, 2024
Yuan 2.0-M32: Mixture of Experts with Attention Router • arXiv:2405.17976 • Published May 28, 2024
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models • arXiv:2406.06563 • Published Jun 3, 2024
A Closer Look into Mixture-of-Experts in Large Language Models • arXiv:2406.18219 • Published Jun 26, 2024
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression • arXiv:2406.14909 • Published Jun 21, 2024
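The papers above share a common building block: a learned router that sends each token to a small subset of expert feed-forward networks. As a rough orientation only, here is a minimal sketch of that top-k routing pattern, assuming PyTorch; the class name, layer sizes, and the dense "run every expert, then mask" loop are illustrative assumptions and do not reproduce the routing, load-balancing, or kernel details of any specific paper in the list.

```python
# Minimal top-k Mixture-of-Experts sketch (illustrative; not from any listed paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Experts: independent feed-forward networks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (batch, seq, d_model)
        scores = self.router(x)                # (batch, seq, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts only
        out = torch.zeros_like(x)
        # Dense reference implementation: every expert runs on every token,
        # and the mask keeps only the tokens routed to it. Real MoE layers
        # instead dispatch tokens sparsely for efficiency.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (indices[..., k] == e).unsqueeze(-1)
                out = out + mask * weights[..., k : k + 1] * expert(x)
        return out

tokens = torch.randn(2, 5, 64)
print(TopKMoE()(tokens).shape)  # torch.Size([2, 5, 64])
```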