SELF: Language-Driven Self-Evolution for Large Language Model Paper • 2310.00533 • Published Oct 1, 2023 • 2
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length Paper • 2310.00576 • Published Oct 1, 2023 • 2
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity Paper • 2305.13169 • Published May 22, 2023 • 3
Transformers Can Achieve Length Generalization But Not Robustly Paper • 2402.09371 • Published Feb 14, 2024 • 13
Triple-Encoders: Representations That Fire Together, Wire Together Paper • 2402.12332 • Published Feb 19, 2024 • 2
Chain-of-Verification Reduces Hallucination in Large Language Models Paper • 2309.11495 • Published Sep 20, 2023 • 38
Contrastive Decoding Improves Reasoning in Large Language Models Paper • 2309.09117 • Published Sep 17, 2023 • 37
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws Paper • 2404.05405 • Published Apr 8, 2024 • 9
Neural Tangent Kernel: Convergence and Generalization in Neural Networks Paper • 1806.07572 • Published Jun 20, 2018 • 1
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models Paper • 2411.12580 • Published Nov 2024 • 2
Studying Large Language Model Generalization with Influence Functions Paper • 2308.03296 • Published Aug 7, 2023 • 12