leonardlin's Collections
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon (arXiv:2401.03462)
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers (arXiv:2305.07185)
YaRN: Efficient Context Window Extension of Large Language Models (arXiv:2309.00071)
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache (arXiv:2401.02669)
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (arXiv:2401.01325)
Extending Context Window of Large Language Models via Semantic Compression (arXiv:2312.09571)
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention (arXiv:2312.08618)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models (arXiv:2401.06951)
Extending LLMs' Context Window with 100 Samples (arXiv:2401.07004)
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization (arXiv:2401.18079)
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache (arXiv:2402.02750)
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts (arXiv:2402.09727)
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss (arXiv:2402.10790)
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv:2402.13753)
Data Engineering for Scaling Language Models to 128K Context (arXiv:2402.10171)
Striped Attention: Faster Ring Attention for Causal Transformers (arXiv:2311.09431)
Ring Attention with Blockwise Transformers for Near-Infinite Context (arXiv:2310.01889)
LLoCO: Learning Long Contexts Offline (arXiv:2404.07979)
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (arXiv:2404.07143)
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory (arXiv:2402.04617)
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
TransformerFAM: Feedback attention is working memory (arXiv:2404.09173)
Extending Llama-3's Context Ten-Fold Overnight (arXiv:2404.19553)
Make Your LLM Fully Utilize the Context (arXiv:2404.16811)
Long-context LLMs Struggle with Long In-context Learning (arXiv:2404.02060)
HyperAttention: Long-context Attention in Near-Linear Time (arXiv:2310.05869)
World Model on Million-Length Video And Language With RingAttention (arXiv:2402.08268)
LongNet: Scaling Transformers to 1,000,000,000 Tokens (arXiv:2307.02486)
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention (arXiv:2407.02490)
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs (arXiv:2408.07055)