metadata

license: apache-2.0
language:
  - en
base_model:
  - Qwen/Qwen2.5-Math-7B

Simple Reinforcement Learning for Reasoning

This is a model checkpoint from the SimpleRL project. Qwen-2.5-Math-7B-SimpleRL is obtained by applying simple RL training to the base model, Qwen2.5-Math-7B, with an initial warmup stage.

Citation

If you find this blog or our code useful, we would appreciate it if you could cite our work:

@misc{zeng2025simplerl,
    title={7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient},
    author={Weihao Zeng and Yuzhen Huang and Wei Liu and Keqing He and Qian Liu and Zejun Ma and Junxian He},
    year={2025},
    howpublished={\url{https://hkust-nlp.notion.site/simplerl-reason}},
    note={Notion Blog}
}