🔥🚀🌟 New Research Alert - YOCO! 🌟🚀🔥
📄 Title: You Only Cache Once: Decoder-Decoder Architectures for Language Models 🔝

📝 Description: YOCO is a decoder-decoder architecture for LLMs that caches key-value (KV) pairs only once, which reduces GPU memory requirements and speeds up prefilling while retaining global attention. It consists of two stacked components: a self-decoder that encodes the global KV cache, and a cross-decoder that reuses that cache via cross-attention.
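
💡 Sketch: to make the "cache once" idea concrete, here is a minimal NumPy toy, not the authors' implementation. It assumes single-head attention, no MLP or normalization layers, and a sliding-window mask as the self-decoder's attention. The names `ToyYOCO`, `masked_attention`, `n_self`, `n_cross`, and `window` are hypothetical and chosen for illustration only. The point it shows: the self-decoder output is projected to one K/V pair, and every cross-decoder layer attends to that same pair instead of keeping its own cache.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def masked_attention(q, k, v, window=None):
    """Single-head scaled dot-product attention with a causal mask.

    If `window` is given, each query also ignores keys that are
    `window` or more positions in the past (sliding-window attention).
    """
    t = q.shape[0]
    i = np.arange(t)[:, None]  # query positions
    j = np.arange(t)[None, :]  # key positions
    mask = j <= i              # causal: no access to future tokens
    if window is not None:
        mask &= (i - j) < window
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v

class ToyYOCO:
    """Toy decoder-decoder stack (illustrative assumption, not the paper's code)."""

    def __init__(self, d_model, n_self, n_cross, window=4, seed=0):
        rng = np.random.default_rng(seed)

        def w():
            return rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))

        self.window = window
        # Self-decoder layers: each has its own Q/K/V projections.
        self.self_layers = [(w(), w(), w()) for _ in range(n_self)]
        # One shared K/V projection whose output is the single global cache.
        self.w_k, self.w_v = w(), w()
        # Cross-decoder layers: only a query projection per layer.
        self.cross_q = [w() for _ in range(n_cross)]

    def forward(self, x):
        h = x
        # 1) Self-decoder: local (windowed) causal self-attention.
        for wq, wk, wv in self.self_layers:
            h = h + masked_attention(h @ wq, h @ wk, h @ wv, self.window)
        # 2) Cache once: a single K/V pair computed from the self-decoder output.
        k_cache, v_cache = h @ self.w_k, h @ self.w_v
        # 3) Cross-decoder: every layer reuses the same cache with global causal attention.
        for wq in self.cross_q:
            h = h + masked_attention(h @ wq, k_cache, v_cache)
        return h, (k_cache, v_cache)

model = ToyYOCO(d_model=8, n_self=2, n_cross=3)
x = np.random.default_rng(1).normal(size=(5, 8))  # 5 tokens, 8 dims
out, (k_cache, v_cache) = model.forward(x)
print(out.shape, k_cache.shape, v_cache.shape)
```

In this toy, the global cache has one K and one V matrix of shape `(tokens, d_model)` no matter how large `n_cross` is, whereas a standard decoder-only stack would keep a separate K/V pair for every layer.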

👥 Authors: Yutao Sun et al.

📄 Paper: You Only Cache Once: Decoder-Decoder Architectures for Language Models (arXiv:2405.05254)

📁 Repository: https://github.com/microsoft/unilm/tree/master/YOCO

📚 More Papers: find more cutting-edge research presented at other conferences in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin

🔍 Keywords: #YOCO #DecoderDecoder #LargeLanguageModels #EfficientArchitecture #GPUMemoryReduction #PrefillingSpeedup #GlobalAttention #DeepLearning #Innovation #AI