To read... eventually Collection A collection of papers that I have read or plan to read, all in one place. Includes a wide range of topics. • 137 items • Updated 1 day ago • 3
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 3 days ago • 86
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 8 days ago • 68
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21 • 57
Transformers Can Navigate Mazes With Multi-Step Prediction Paper • 2412.05117 • Published 15 days ago • 5
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving Paper • 2407.00079 • Published Jun 24 • 5
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability Paper • 2411.19943 • Published 22 days ago • 55
Agent Skill Acquisition for Large Language Models via CycleQD Paper • 2410.14735 • Published Oct 16 • 2