Abstract
Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI's o1 model demonstrated this capability but did not publicly share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance. First, we curate a small dataset, s1K, of 1,000 questions paired with reasoning traces, selected using three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute, either forcefully terminating the model's thinking process or lengthening it by appending "Wait" multiple times to the model's generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps. After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1 exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24). Further, scaling s1 with budget forcing allows extrapolating beyond its performance without test-time intervention: from 50% to 57% on AIME24. Our model, data, and code are open-source at https://github.com/simplescaling/s1.
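To make the budget-forcing mechanism concrete, here is a minimal sketch under stated assumptions: `generate` is a hypothetical inference callable `(prompt, stop, max_tokens) -> (text, stop_reason)` that any LM API with stop sequences could back (e.g., vLLM or transformers), the end-of-thinking delimiter and the whitespace token counter are stand-ins, and `num_waits` is an illustrative parameter. This is an illustration of the idea described in the abstract, not the authors' released implementation.

```python
END_OF_THINKING = "</think>"  # assumed delimiter closing the thinking block


def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer; a whitespace split is a rough proxy.
    return len(text.split())


def budget_forced_generate(generate, prompt: str, max_think_tokens: int,
                           num_waits: int = 2) -> tuple[str, str]:
    """Cap the thinking phase at a token budget, or extend it by appending
    "Wait" whenever the model tries to stop reasoning early."""
    thinking = ""
    budget = max_think_tokens
    waits_left = num_waits
    while budget > 0:
        text, stop_reason = generate(prompt + thinking,
                                     stop=[END_OF_THINKING],
                                     max_tokens=budget)
        thinking += text
        budget -= count_tokens(text)
        if stop_reason == END_OF_THINKING and waits_left > 0 and budget > 0:
            # Lengthening: suppress the stop and nudge the model to keep
            # reasoning, which often makes it double-check earlier steps.
            thinking += " Wait"
            waits_left -= 1
        else:
            # The model finished on its own, or the budget ran out
            # (forceful termination of the thinking process).
            break
    # Close the thinking block and let the model produce its final answer.
    answer, _ = generate(prompt + thinking + END_OF_THINKING,
                         stop=None, max_tokens=512)
    return thinking, answer
```

Raising `max_think_tokens` or `num_waits` spends more test-time compute per question, which is the knob the scaling results above vary.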