- DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads (arXiv:2410.10819, published Oct 14, 2024)
- Data Engineering for Scaling Language Models to 128K Context (arXiv:2402.10171, published Feb 15, 2024)