Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms • arXiv:2406.02900 • Published Jun 5, 2024
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback • arXiv:2406.00888 • Published Jun 2, 2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • Published Feb 27, 2024
LiPO: Listwise Preference Optimization through Learning-to-Rank • arXiv:2402.01878 • Published Feb 2, 2024
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling • arXiv:2401.16380 • Published Jan 29, 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback • arXiv:2402.01391 • Published Feb 2, 2024
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research • arXiv:2402.00159 • Published Jan 31, 2024
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math • arXiv:2312.17120 • Published Dec 28, 2023