Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity — Paper • 2412.02252 • Published Dec 3, 2024
Hibiki fr-en Collection — Hibiki is a model for streaming speech translation, which can run on device. See https://github.com/kyutai-labs/hibiki. • 5 items • Updated Feb 6