---
license: mit
language:
  - en
base_model:
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
pipeline_tag: text-generation
library_name: transformers
---
# FastCuRL-1.5B-Preview

## FastCuRL Overview

We release FastCuRL-1.5B-Preview, a slow-thinking reasoning model that outperforms the previous SoTA DeepScaleR-1.5B-Preview while using only 50% of its training steps! We apply a novel curriculum-guided iterative lengthening reinforcement learning strategy to DeepSeek-R1-Distill-Qwen-1.5B and observe continuous performance improvements as training steps increase. To make our work easier to reproduce and to advance research progress, we open-source our code, model, and data.
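As a rough illustration of the idea only (not the released training recipe), a curriculum-guided iterative lengthening schedule can be thought of as staged increases in the maximum response length allowed during RL training. Every stage boundary and length in the sketch below is an assumption made for illustration.

```python
# Illustrative-only sketch of an "iterative lengthening" curriculum:
# training proceeds in stages, and each stage raises the maximum response
# length the policy may generate during RL rollouts.
# The concrete stage count, lengths, and step counts are assumptions,
# not the released FastCuRL training configuration.
CURRICULUM_STAGES = [
    {"stage": 1, "max_response_tokens": 8192},
    {"stage": 2, "max_response_tokens": 16384},
    {"stage": 3, "max_response_tokens": 24576},
]

def max_length_for_step(step: int, steps_per_stage: int = 500) -> int:
    """Pick the response-length budget for the current RL training step."""
    idx = min(step // steps_per_stage, len(CURRICULUM_STAGES) - 1)
    return CURRICULUM_STAGES[idx]["max_response_tokens"]
```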

Code: https://github.com/nick7nlp/FastCuRL
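The checkpoint can be loaded with the standard transformers text-generation workflow (the library named in the metadata above). The snippet below is a minimal sketch; the repository id (taken from this card's namespace) and the sampling settings are assumptions, not official recommendations.

```python
# Minimal sketch: load the checkpoint and generate a reasoning trace.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nickyang/FastCuRL-1.5B-Preview"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Solve: If x + 3 = 7, what is x? Please reason step by step."
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Slow-thinking models produce long traces, so allow a generous token budget.
outputs = model.generate(
    inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```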

## Key Results

We report Pass@1 accuracy averaged over 16 samples for each problem.

| Model | AIME 2024 | MATH 500 | AMC 2023 | Minerva Math | OlympiadBench | Avg. |
|---|---|---|---|---|---|---|
| Qwen2.5-Math-7B-Instruct | 13.3 | 79.8 | 50.6 | 34.6 | 40.7 | 43.8 |
| rStar-Math-7B | 26.7 | 78.4 | 47.5 | - | 47.1 | - |
| Eurus-2-7B-PRIME | 26.7 | 79.2 | 57.8 | 38.6 | 42.1 | 48.9 |
| Qwen2.5-7B-SimpleRL | 26.7 | 82.4 | 62.5 | 39.7 | 43.3 | 50.9 |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.8 | 82.8 | 62.9 | 26.5 | 43.3 | 48.9 |
| Still-1.5B | 32.5 | 84.4 | 66.7 | 29.0 | 45.4 | 51.6 |
| DeepScaleR-1.5B-Preview | 43.1 | 87.8 | 73.6 | 30.2 | 50.0 | 57.0 |
| FastCuRL-1.5B-Preview | 43.1 | 88.0 | 74.2 | 31.6 | 50.4 | 57.5 |
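For reference, a minimal sketch of the reported metric: Pass@1 here is the fraction of correct completions among the 16 samples drawn for each problem, averaged over all problems. The helper below assumes correctness has already been judged per sample.

```python
# Sketch of Pass@1 averaged over 16 samples per problem.
def pass_at_1(per_problem_results: list[list[bool]]) -> float:
    """per_problem_results[i] holds the correct/incorrect flags for the
    16 completions sampled for problem i."""
    per_problem_acc = [sum(s) / len(s) for s in per_problem_results]
    return sum(per_problem_acc) / len(per_problem_acc)

# Example with two problems, 16 samples each:
# pass_at_1([[True] * 12 + [False] * 4, [True] * 8 + [False] * 8])  # -> 0.625
```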

## Training Data

Following DeepScaleR, our training dataset consists of 40,315 unique problem-answer pairs compiled from:

- AIME problems (1984-2023)
- AMC problems (before 2023)
- Omni-MATH dataset
- Still dataset
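For illustration only, a record in such a problem-answer dataset might be organized as sketched below; the field names and file format are assumptions, not the released data layout.

```python
# Hypothetical shape of a single problem-answer training record.
import json

record = {
    "problem": "Find the remainder when 2^10 is divided by 7.",
    "answer": "2",     # final answer used for correctness checking
    "source": "AMC",   # one of the corpora listed above
}

with open("fastcurl_train_sample.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```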

## Acknowledgements