- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases — arXiv:2402.14905, published Feb 22, 2024
- Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement — arXiv:2402.14160, published Feb 21, 2024