No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 14 days ago
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published 12 days ago
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments Paper • 2408.10945 • Published Aug 20, 2024
PDFTriage: Question Answering over Long, Structured Documents Paper • 2309.08872 • Published Sep 16, 2023
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations Paper • 2412.13171 • Published 13 days ago
A Modern Self-Referential Weight Matrix That Learns to Modify Itself Paper • 2202.05780 • Published Feb 11, 2022