GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • Paper • 2403.03507 • Published Mar 6, 2024
Efficient Language Adaptive Pre-training: Extending State-of-the-Art Large Language Models for Polish • Paper • 2402.09759 • Published Feb 15, 2024
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts • Paper • 2401.04081 • Published Jan 8, 2024