Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity • arXiv:2412.02252 • Published Dec 3, 2024