New Research Alert - YOCO!
Title: You Only Cache Once: Decoder-Decoder Architectures for Language Models
Description: YOCO is a novel decoder-decoder architecture for LLMs that caches key-value (KV) pairs only once. This reduces memory requirements, speeds up prefilling, and keeps global attention. It consists of a self-decoder, which efficiently encodes a single set of global KV caches, and a cross-decoder, which reuses these caches in all of its layers via cross-attention.
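To make the "cache once" idea concrete, here is a toy, pure-Python sketch built on heavy simplifying assumptions: a single attention head, identity Q/K/V projections, and no feed-forward or normalization layers. Causal sliding-window attention stands in for the self-decoder's efficient attention. The self-decoder output becomes one global KV cache, and every cross-decoder layer reads that same cache through causal cross-attention. All function names (`self_decoder`, `cross_decoder`, `yoco_forward`, `kv_entries_*`) are hypothetical and do not come from the YOCO code.

```python
import math
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def add(a: Vec, b: Vec) -> Vec:
    return [x + y for x, y in zip(a, b)]


def attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Single-head scaled dot-product softmax attention for one query."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def self_decoder(x: List[Vec], num_layers: int, window: int) -> List[Vec]:
    """Causal sliding-window self-attention (assumed stand-in for efficient attention).

    Each layer only needs the last `window` positions, so its cache stays bounded.
    """
    h = x
    for _ in range(num_layers):
        out = []
        for t in range(len(h)):
            ctx = h[max(0, t - window + 1): t + 1]  # identity K/V projections (assumption)
            out.append(add(h[t], attend(h[t], ctx, ctx)))  # residual connection
        h = out
    return h


def cross_decoder(h: List[Vec], keys: List[Vec], values: List[Vec], num_layers: int) -> List[Vec]:
    """Every layer cross-attends to the SAME global KV cache (causally masked)."""
    x = h
    for _ in range(num_layers):
        x = [add(x[t], attend(x[t], keys[: t + 1], values[: t + 1])) for t in range(len(x))]
    return x


def yoco_forward(x: List[Vec], num_self: int, num_cross: int, window: int) -> Tuple[List[Vec], List[Vec], List[Vec]]:
    h = self_decoder(x, num_self, window)
    keys, values = h, h  # global KV cache, computed once (identity projections assumed)
    out = cross_decoder(h, keys, values, num_cross)
    return out, keys, values


def kv_entries_transformer(n_tokens: int, n_layers: int) -> int:
    """Cached (key, value) positions summed over layers in a standard decoder-only model."""
    return n_layers * n_tokens


def kv_entries_yoco(n_tokens: int, n_self: int, window: int) -> int:
    """Bounded sliding-window caches in the self-decoder plus one shared global cache."""
    return n_self * min(window, n_tokens) + n_tokens
```

With 32 layers split 16/16 and a 512-token window, `kv_entries_transformer(4096, 32)` gives 131,072 cached positions while `kv_entries_yoco(4096, 16, 512)` gives 12,288. Even with `window=1`, the last token's cross-decoder output still depends on the first token, which illustrates how global attention is retained. Actual savings in the real model depend on its configuration and on the self-decoder's attention variant.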