license: apache-2.0
language:
  - en
base_model:
  - Qwen/Qwen2.5-Math-7B

Simple Reinforcement Learning for Reasoning

Notion

This is a model checkpoint from Project SimpleRL. Qwen-2.5-Math-7B-SimpleRL is trained with simple reinforcement learning (RL) starting from the base model, with an initial warmup stage.
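To try the checkpoint, a minimal loading sketch with the Hugging Face `transformers` library might look like the following. The repository id `hkust-nlp/Qwen-2.5-Math-7B-SimpleRL` and the prompt format in `build_prompt` are assumptions made for illustration and are not stated in this card; substitute the actual repository id and the prompt format used during training.

```python
# Assumption: this Hub repo id is a guess based on the model name; replace as needed.
MODEL_ID = "hkust-nlp/Qwen-2.5-Math-7B-SimpleRL"


def build_prompt(question: str) -> str:
    """Format a math question as a plain-text prompt (hypothetical format)."""
    return f"Question: {question.strip()}\nAnswer: Let's think step by step."


def generate_answer(question: str, model_id: str = MODEL_ID,
                    max_new_tokens: int = 512) -> str:
    """Load the checkpoint and generate a completion for one question."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" requires the `accelerate` package.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What is the sum of the first 10 positive integers?")` downloads the weights on first use, so expect a large download and a GPU with enough memory for a 7B model.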

Citation

If you find this blog or our code useful, we would appreciate it if you could cite our work:

@misc{zeng2025simplerl,
    title={7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient},
    author={Weihao Zeng and Yuzhen Huang and Wei Liu and Keqing He and Qian Liu and Zejun Ma and Junxian He},
    year={2025},
    howpublished={\url{https://hkust-nlp.notion.site/simplerl-reason}},
    note={Notion Blog}
}