MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression • Paper • arXiv:2406.14909 • Published Jun 2024
LLM Compiler • Collection • 4 items • Meta LLM Compiler is a state-of-the-art LLM that builds on Code Llama, with improved performance for code optimization and compiler reasoning.
A Closer Look into Mixture-of-Experts in Large Language Models • Paper • arXiv:2406.18219 • Published Jun 2024
Unlocking Continual Learning Abilities in Language Models • Paper • arXiv:2406.17245 • Published Jun 2024
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models • Paper • arXiv:2406.06563 • Published Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation • Paper • arXiv:2406.07550 • Published Jun 2024
Training-Free Long-Context Scaling of Large Language Models • Paper • arXiv:2402.17463 • Published Feb 27, 2024
Representation Engineering: A Top-Down Approach to AI Transparency • Paper • arXiv:2310.01405 • Published Oct 2, 2023
YaRN: Efficient Context Window Extension of Large Language Models • Paper • arXiv:2309.00071 • Published Aug 31, 2023
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention • Paper • arXiv:2405.12981 • Published May 21, 2024
A Unified Sequence Parallelism Approach for Long Context Generative AI • Paper • arXiv:2405.07719 • Published May 13, 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory • Paper • arXiv:2405.08707 • Published May 14, 2024
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention • Paper • arXiv:2405.04437 • Published May 7, 2024
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing • Paper • arXiv:2306.12929 • Published Jun 22, 2023
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework • Paper • arXiv:2404.14619 • Published Apr 22, 2024
Meta Llama 3 • Collection • 5 items • Updated Apr 18, 2024 • Hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases.
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • Paper • arXiv:2404.08801 • Published Apr 12, 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • Paper • arXiv:2404.07143 • Published Apr 10, 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies • Paper • arXiv:2404.06395 • Published Apr 9, 2024
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens • Paper • arXiv:2404.03413 • Published Apr 4, 2024
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • Paper • arXiv:2404.02258 • Published Apr 2, 2024
Rethinking Memory and Communication Cost for Efficient Large Language Model Training • Paper • arXiv:2310.06003 • Published Oct 9, 2023
Optimized Network Architectures for Large Language Model Training with Billions of Parameters • Paper • arXiv:2307.12169 • Published Jul 22, 2023
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference • Paper • arXiv:2403.14520 • Published Mar 21, 2024
Unicron: Economizing Self-Healing LLM Training at Scale • Paper • arXiv:2401.00134 • Published Dec 30, 2023
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens • Paper • arXiv:2402.13753 • Published Feb 21, 2024
Recurrent Drafter for Fast Speculative Decoding in Large Language Models • Paper • arXiv:2403.09919 • Published Mar 14, 2024
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning • Paper • arXiv:2401.01325 • Published Jan 2, 2024
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey • Paper • arXiv:2311.12351 • Published Nov 21, 2023
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache • Paper • arXiv:2401.02669 • Published Jan 5, 2024
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models • Paper • arXiv:2402.02244 • Published Feb 3, 2024
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences • Paper • arXiv:2403.09347 • Published Mar 14, 2024
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers • Paper • arXiv:2310.03294 • Published Oct 5, 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models • Paper • arXiv:2309.14509 • Published Sep 25, 2023
Striped Attention: Faster Ring Attention for Causal Transformers • Paper • arXiv:2311.09431 • Published Nov 15, 2023
Ring Attention with Blockwise Transformers for Near-Infinite Context • Paper • arXiv:2310.01889 • Published Oct 3, 2023
Sequence Parallelism: Long Sequence Training from System Perspective • Paper • arXiv:2105.13120 • Published May 26, 2021
DeepSeek-VL: Towards Real-World Vision-Language Understanding • Paper • arXiv:2403.05525 • Published Mar 8, 2024
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context • Paper • arXiv:2403.05530 • Published Mar 8, 2024