matlok's Collections
Mixture of Experts Papers
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models (arXiv:2401.15947)
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (arXiv:2401.06066)
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention (arXiv:2312.07987)
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (arXiv:2101.03961)
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (arXiv:1701.06538)
Sparse Networks from Scratch: Faster Training without Losing Performance (arXiv:1907.04840)
A Mixture of h-1 Heads is Better than h Heads (arXiv:2005.06537)
FastMoE: A Fast Mixture-of-Expert Training System (arXiv:2103.13262)
SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts (arXiv:2105.03036)
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (arXiv:2006.16668)
A Review of Sparse Expert Models in Deep Learning (arXiv:2209.01667)
Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition (arXiv:2112.05820)
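The common thread across these papers is token-level sparse routing: a small gating network selects the top-k experts for each token, so only a fraction of the model's parameters is active per forward pass. Below is a minimal, illustrative PyTorch sketch of such a sparsely-gated MoE layer; the class name, hyperparameters, and expert architecture are assumptions for illustration, not the implementation from any of the listed papers. Production systems (e.g. GShard, FastMoE, Switch Transformers) add load-balancing losses, capacity limits, and expert parallelism, which are omitted here.

```python
# Minimal sketch of a sparsely-gated mixture-of-experts layer (illustrative only;
# names and hyperparameters are assumptions, not from any specific paper above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router / gating network
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_hidden),
                    nn.ReLU(),
                    nn.Linear(d_hidden, d_model),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens so each is routed independently
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.gate(tokens)                          # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep only top-k experts per token
        weights = F.softmax(weights, dim=-1)                # renormalize the selected gates

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # (token, slot) pairs that routed to expert e
            token_ids, slot_ids = (indices == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot_ids, None] * expert(tokens[token_ids])
        return out.reshape(x.shape)


if __name__ == "__main__":
    layer = SparseMoE(d_model=64, d_hidden=256)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```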