ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization Paper • 2406.05981 • Published Jun 10, 2024 • 12
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models Paper • 2406.07368 • Published Jun 11, 2024 • 2
Efficiently Serving LLM Reasoning Programs with Certaindex Paper • 2412.20993 • Published 8 days ago • 31
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13, 2024 • 66
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding Paper • 2402.02057 • Published Feb 3, 2024