Are Emergent Abilities of Large Language Models a Mirage? Paper • 2304.15004 • Published Apr 28, 2023 • 6
Pantograph: A Machine-to-Machine Interaction Interface for Advanced Theorem Proving, High Level Reasoning, and Data Extraction in Lean 4 Paper • 2410.16429 • Published Oct 21 • 3
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment Paper • 2410.18194 • Published Oct 23 • 4
Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data Paper • 2306.13840 • Published Jun 24, 2023 • 11