RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques Paper • 2501.14492 • Published Jan 2025 • 29
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models Paper • 2410.07985 • Published Oct 10, 2024 • 29
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20, 2024 • 48
MathScale: Scaling Instruction Tuning for Mathematical Reasoning Paper • 2403.02884 • Published Mar 5, 2024 • 17