---
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
pipeline_tag: text-generation
library_name: transformers
---
# FastCuRL-1.5B-Preview

## FastCuRL Overview
We release FastCuRL-1.5B-Preview, a slow-thinking reasoning model that outperforms the previous state-of-the-art DeepScaleR-1.5B-Preview while using only 50% of its training steps. We apply a novel curriculum-guided iterative lengthening reinforcement learning approach to DeepSeek-R1-Distill-Qwen-1.5B and observe continuous performance improvements as training steps increase. To make our work easier to reproduce and to advance research progress, we open-source our code, model, and data.
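The curriculum-guided iterative lengthening idea can be pictured as a sequence of RL stages in which the maximum response length grows and each stage resumes from the previous stage's checkpoint. The sketch below is a minimal illustration under that assumption only; the stage lengths and the `train_one_stage` placeholder are hypothetical and do not come from the actual training code in the repository linked below.

```python
# Minimal illustration of curriculum-guided iterative lengthening (not the
# actual FastCuRL training code): RL training proceeds in stages, each stage
# raising the maximum response length and resuming from the previous
# checkpoint. The stage lengths and train_one_stage are hypothetical.

def train_one_stage(start_checkpoint: str, max_response_length: int) -> str:
    """Stand-in for one RL training stage; returns the new checkpoint name."""
    print(f"RL stage from {start_checkpoint!r}, max response length {max_response_length}")
    return f"{start_checkpoint}+rl@{max_response_length}"

checkpoint = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
for max_len in (8192, 16384, 24576):  # hypothetical lengthening schedule
    checkpoint = train_one_stage(checkpoint, max_response_length=max_len)

print("final checkpoint:", checkpoint)
```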
Code: https://github.com/nick7nlp/FastCuRL
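Since the metadata above declares `library_name: transformers` and `pipeline_tag: text-generation`, the model loads as a standard causal LM. The snippet below is a minimal usage sketch: `MODEL_ID` is a placeholder for the actual Hub repository ID, and the sampling settings are illustrative rather than an official recommendation.

```python
# Minimal inference sketch with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "FastCuRL-1.5B-Preview"  # placeholder; use the full <org>/<name> Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "If x + 3 = 7, what is x? Reason step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Illustrative sampling settings; long reasoning traces need a generous token budget.
output_ids = model.generate(
    input_ids, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```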
## Key Results
We report Pass@1 accuracy averaged over 16 samples for each problem; a short sketch of this protocol follows the table.
Model | AIME 2024 | MATH 500 | AMC 2023 | Minerva Math | OlympiadBench | Avg. |
---|---|---|---|---|---|---|
Qwen2.5-Math-7B-Instruct | 13.3 | 79.8 | 50.6 | 34.6 | 40.7 | 43.8 |
rStar-Math-7B | 26.7 | 78.4 | 47.5 | - | 47.1 | - |
Eurus-2-7B-PRIME | 26.7 | 79.2 | 57.8 | 38.6 | 42.1 | 48.9 |
Qwen2.5-7B-SimpleRL | 26.7 | 82.4 | 62.5 | 39.7 | 43.3 | 50.9 |
DeepSeek-R1-Distill-Qwen-1.5B | 28.8 | 82.8 | 62.9 | 26.5 | 43.3 | 48.9 |
Still-1.5B | 32.5 | 84.4 | 66.7 | 29.0 | 45.4 | 51.6 |
DeepScaleR-1.5B-Preview | 43.1 | 87.8 | 73.6 | 30.2 | 50.0 | 57.0 |
FastCuRL-1.5B-Preview | 43.1 | 88.0 | 74.2 | 31.6 | 50.4 | 57.5 |
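For concreteness, the reporting protocol above (Pass@1 averaged over 16 samples per problem) amounts to scoring each problem by the fraction of its 16 sampled responses that are correct and then averaging across problems. The sketch below assumes per-sample correctness flags from some answer checker, which is not shown.

```python
# Sketch of the Pass@1-averaged-over-16-samples protocol: each problem is
# scored by the fraction of its 16 samples judged correct, then scores are
# averaged across problems. Correctness flags below are made-up toy data.
from statistics import mean

def avg_pass_at_1(correct_flags_per_problem: list[list[bool]]) -> float:
    """correct_flags_per_problem[i] holds one boolean per sampled response."""
    return mean(mean(flags) for flags in correct_flags_per_problem)

flags = [
    [True] * 12 + [False] * 4,   # problem 1: 12/16 samples correct
    [True] * 4 + [False] * 12,   # problem 2: 4/16 samples correct
]
print(f"Pass@1 averaged over 16 samples: {avg_pass_at_1(flags):.3f}")  # 0.500
```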
## Training Data
Following DeepScaleR, our training dataset consists of 40,315 unique problem-answer pairs compiled from the following sources (see the sketch after this list):
- AIME problems (1984-2023)
- AMC problems (before 2023)
- Omni-MATH dataset
- Still dataset
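As a rough illustration of how these sources can be compiled into unique problem-answer pairs, the sketch below merges several files and deduplicates on the (problem, answer) pair. The file names and the JSONL schema are assumptions, not the repository's actual data layout.

```python
# Hypothetical compilation of a training set from several problem sources,
# keeping only unique (problem, answer) pairs. Paths and schema are assumed.
import json

source_files = [
    "aime_1984_2023.jsonl",  # hypothetical file names
    "amc_pre_2023.jsonl",
    "omni_math.jsonl",
    "still.jsonl",
]

seen: set[tuple[str, str]] = set()
dataset = []
for path in source_files:
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            key = (record["problem"].strip(), str(record["answer"]).strip())
            if key not in seen:  # deduplicate across sources
                seen.add(key)
                dataset.append({"problem": key[0], "answer": key[1]})

print(f"{len(dataset)} unique problem-answer pairs")
```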
## Acknowledgements
- Our training experiments are powered by our heavily modified fork of verl and deepscaler.
- Our model is trained on top of DeepSeek-R1-Distill-Qwen-1.5B.