Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference Paper • 2403.09636 • Published Mar 14, 2024 • 2
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29, 2024 • 48
Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study Paper • 2401.17981 • Published Jan 31, 2024 • 1
What Algorithms can Transformers Learn? A Study in Length Generalization Paper • 2310.16028 • Published Oct 24, 2023 • 2
Empower Your Model with Longer and Better Context Comprehension Paper • 2307.13365 • Published Jul 25, 2023 • 1
Transformer Language Models without Positional Encodings Still Learn Positional Information Paper • 2203.16634 • Published Mar 30, 2022 • 5
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level Paper • 2403.04690 • Published Mar 7, 2024 • 1
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation Paper • 2310.05737 • Published Oct 9, 2023 • 4
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11, 2024 • 25
MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12, 2024 • 75
Unfamiliar Finetuning Examples Control How Language Models Hallucinate Paper • 2403.05612 • Published Mar 8, 2024 • 3
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 111
Zoology: Measuring and Improving Recall in Efficient Language Models Paper • 2312.04927 • Published Dec 8, 2023 • 2
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey Paper • 2311.12351 • Published Nov 21, 2023 • 3
Sequence Parallelism: Long Sequence Training from System Perspective Paper • 2105.13120 • Published May 26, 2021 • 5
Foundation AI Papers Collection • Curated List of Must-Reads on LLM reasoning at the Temus AI team • 135 items • Updated Jun 15 • 26
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models Paper • 2304.08818 • Published Apr 18, 2023 • 7
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6, 2024 • 182
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" Paper • 2309.12288 • Published Sep 21, 2023 • 3
AtP*: An efficient and scalable method for localizing LLM behaviour to components Paper • 2403.00745 • Published Mar 1, 2024 • 11
Functional Interpolation for Relative Positions Improves Long Context Transformers Paper • 2310.04418 • Published Oct 6, 2023 • 4
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention Paper • 2006.16236 • Published Jun 29, 2020 • 2
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry Paper • 2402.04347 • Published Feb 6, 2024 • 13
Resonance RoPE: Improving Context Length Generalization of Large Language Models Paper • 2403.00071 • Published Feb 29, 2024 • 22
Hydragen: High-Throughput LLM Inference with Shared Prefixes Paper • 2402.05099 • Published Feb 7, 2024 • 18
LongHeads: Multi-Head Attention is Secretly a Long Context Processor Paper • 2402.10685 • Published Feb 16, 2024 • 1
Do Large Language Models Latently Perform Multi-Hop Reasoning? Paper • 2402.16837 • Published Feb 26, 2024 • 24
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT Paper • 2402.16840 • Published Feb 26, 2024 • 23
Evaluating Very Long-Term Conversational Memory of LLM Agents Paper • 2402.17753 • Published Feb 27, 2024 • 18
Simple linear attention language models balance the recall-throughput tradeoff Paper • 2402.18668 • Published Feb 28, 2024 • 18
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29, 2024 • 49
Ring Attention with Blockwise Transformers for Near-Infinite Context Paper • 2310.01889 • Published Oct 3, 2023 • 10
World Model on Million-Length Video And Language With RingAttention Paper • 2402.08268 • Published Feb 13, 2024 • 36
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models Paper • 2309.14509 • Published Sep 25, 2023 • 17
Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer Paper • 2310.12442 • Published Oct 19, 2023 • 1
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention Paper • 2312.08618 • Published Dec 14, 2023 • 11
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition Paper • 2402.15220 • Published Feb 23, 2024 • 19
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27, 2024 • 88
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs Paper • 2402.15627 • Published Feb 23, 2024 • 34
In-Context Pretraining: Language Modeling Beyond Document Boundaries Paper • 2310.10638 • Published Oct 16, 2023 • 28
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15, 2024 • 21
Rethinking Interpretability in the Era of Large Language Models Paper • 2402.01761 • Published Jan 30, 2024 • 21
Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization Paper • 2401.07793 • Published Jan 15, 2024 • 3
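A recurring thread in this collection is linear attention (Transformers are RNNs • 2006.16236, The Hedgehog & the Porcupine • 2402.04347, Simple linear attention language models • 2402.18668). As a quick orientation, the sketch below implements causal linear attention in NumPy with the elu(x) + 1 feature map described in 2006.16236. It is a minimal illustration of the idea, not any of these papers' reference code, and the function names are this editor's own.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, the positive feature map used in
    # "Transformers are RNNs" (2006.16236).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def causal_linear_attention(Q, K, V, eps=1e-6):
    # One attention head. Q, K: (n, d_k); V: (n, d_v).
    # Softmax attention costs O(n^2); here each step updates a
    # fixed-size running state, so the whole pass is O(n).
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)
    S = np.zeros((K.shape[1], V.shape[1]))  # running sum of phi(k_t) v_t^T
    z = np.zeros(K.shape[1])                # running sum of phi(k_t)
    out = np.empty_like(V)
    for t in range(Q.shape[0]):
        S += np.outer(Kf[t], V[t])
        z += Kf[t]
        out[t] = (Qf[t] @ S) / (Qf[t] @ z + eps)
    return out
```

Because the prefix enters only through the fixed-size state (S, z), the same loop doubles as an RNN-style decoder with O(1) memory per generated token; the price in recall is the trade-off that 2312.04927 and 2402.18668 study.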
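Likewise, several long-context entries (LongRoPE • 2402.13753, Resonance RoPE • 2403.00071) are, in different ways, modifications of rotary position embeddings. For context, here is a minimal NumPy sketch of plain RoPE; broadly speaking, the knob those papers adjust is the per-dimension frequency table (inv_freq below), rescaled so a model trained at one context length extrapolates to a longer one. This is a hedged illustration under that framing, not either paper's method.

```python
import numpy as np

def rope(x, base=10000.0):
    # Rotary position embedding for x of shape (seq_len, d), d even.
    # The dimension pair (2i, 2i+1) at position m is rotated by the
    # angle m * base**(-2i/d).
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    inv_freq = base ** (-np.arange(0, d, 2) / d)  # (d/2,) per-pair frequencies
    angles = np.outer(np.arange(n), inv_freq)     # (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Applied to queries and keys before the attention scores are taken, the rotation makes q_m . k_n depend on positions only through the offset m - n, which is what the context-extension papers above exploit.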