Lee Gao

leegao19

AI & ML interests

None yet

Organizations

leegao19's activity

commented 15 papers 7 months ago

MoAI: Mixture of All Intelligence for Large Language and Vision Models

Paper • 2403.07508 • Published Mar 12 • 75

Simple linear attention language models balance the recall-throughput tradeoff

Paper • 2402.18668 • Published Feb 28 • 18

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6 • 182

Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7 • 61

Resonance RoPE: Improving Context Length Generalization of Large Language Models

Paper • 2403.00071 • Published Feb 29 • 22

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

Paper • 2402.04347 • Published Feb 6 • 13

Hydragen: High-Throughput LLM Inference with Shared Prefixes

Paper • 2402.05099 • Published Feb 7 • 18

World Model on Million-Length Video And Language With RingAttention

Paper • 2402.08268 • Published Feb 13 • 36

Scaling Laws of RoPE-based Extrapolation

Paper • 2310.05209 • Published Oct 8, 2023 • 6

Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer

Paper • 2310.12442 • Published Oct 19, 2023 • 1

ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23 • 19

Cure the headache of Transformers via Collinear Constrained Attention

Paper • 2309.08646 • Published Sep 15, 2023 • 12

Data Engineering for Scaling Language Models to 128K Context

Paper • 2402.10171 • Published Feb 15 • 21

Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23 • 70

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 111

commented 5 papers 8 months ago

Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1 • 22

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Paper • 2401.15077 • Published Jan 26 • 17

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Paper • 2401.01325 • Published Jan 2 • 26

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Paper • 2401.06951 • Published Jan 13 • 24

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26 • 67

New activity in cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser 9 months ago

Open Code

#3 opened 9 months ago by