DmitryRyumin posted an update May 9
🔥🚀🌟 New Research Alert - YOCO! 🌟🚀🔥
📄 Title: You Only Cache Once: Decoder-Decoder Architectures for Language Models 🔍

πŸ“ Description: YOCO is a novel decoder-decoder architecture for LLMs that reduces memory requirements, speeds up prefilling, and maintains global attention. It consists of a self-decoder for encoding KV caches and a cross-decoder for reusing these caches via cross-attention.

👥 Authors: Yutao Sun et al.

📄 Paper: You Only Cache Once: Decoder-Decoder Architectures for Language Models (2405.05254)

πŸ“ Repository: https://github.com/microsoft/unilm/tree/master/YOCO

📚 More Papers: more cutting-edge research presented at other conferences can be found in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

πŸ” Keywords: #YOCO #DecoderDecoder #LargeLanguageModels #EfficientArchitecture #GPUMemoryReduction #PrefillingSpeedup #GlobalAttention #DeepLearning #Innovation #AI

The wool comes from the sheep's own back (a Chinese idiom: whatever is saved has to come from somewhere).

The evaluation benchmarks use a zero-shot setting, whereas few-shot is the more common practice. This raises the question of whether the few-shot results were not as strong compared with similar-sized models.

Hopefully the authors will release few-shot results aligned with common practice (e.g. the HF Open LLM Leaderboard).


We report zero-shot numbers so that we can directly compare with StableLM and OpenLLaMA-v2, whose reported results follow the zero-shot protocol. The trend of the few-shot results is similar to the zero-shot ones.