Self-Training Elicits Concise Reasoning in Large Language Models Paper • 2502.20122 • Published 18 days ago • 2
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs Paper • 2503.07067 • Published 7 days ago • 27
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters Paper • 2406.16758 • Published Jun 24, 2024 • 20
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published Jun 4, 2024 • 40