PRIME-RL/Eurus-2-7B-PRIME
Through PRIME, we achieve substantial improvements on key reasoning benchmarks over the SFT version of the model: a 16.7% improvement on average, and over 20% on the AMC and AIME competitions. Our final model, Eurus-2-7B-PRIME, based on Qwen2.5-Math-7B-Base, surpasses its instruct version on five key reasoning benchmarks. The final results are presented below:
| Benchmark | Eurus-2-7B-PRIME | Eurus-2-7B-SFT | Qwen-2.5-Math-7B-Instruct | Llama-3.1-70B-Instruct | GPT-4o |
|---|---|---|---|---|---|
| AIME 2024 | 26.7 (+23.3) | 3.3 | 13.3 | 16.7 | 9.3 |
| MATH-500 | 79.2 (+14.1) | 65.1 | 79.8 | 64.6 | 76.4 |
| AMC | 57.8 (+27.7) | 30.1 | 50.6 | 30.1 | 45.8 |
| Minerva Math | 38.6 (+5.9) | 32.7 | 34.6 | 35.3 | 36.8 |
| OlympiadBench | 42.1 (+12.3) | 29.8 | 40.7 | 31.9 | 43.3 |
| Avg. | 48.9 (+16.7) | 32.2 | 43.8 | 35.7 | 43.3 |
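Eurus-2-7B-PRIME is a standard causal language model, so it can be run with the Hugging Face transformers generation API. The sketch below is illustrative only: the example question and generation settings are our assumptions, not a setup prescribed for this model.

```python
# Minimal inference sketch for Eurus-2-7B-PRIME. The prompt and decoding
# settings here are illustrative assumptions, not official recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PRIME-RL/Eurus-2-7B-PRIME"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A competition-style math question, the kind of input the model targets.
messages = [
    {"role": "user",
     "content": "What is the sum of the first 50 positive even integers?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding keeps the example deterministic; to reproduce the benchmark numbers above you would match the evaluation prompts and sampling setup from the PRIME report.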
We achieved these results with only 1/10 of the data and model resources compared with Qwen2.5-Math:
| | Eurus-2-7B-PRIME | Qwen2.5-Math-7B-Instruct |
|---|---|---|
| Base Model | Qwen2.5-Math-7B | Qwen2.5-Math-7B |
| SFT Data | 230K (open-source) | 2.5M (open-source and in-house) |
| RM Data | 0 | 618K (in-house) |
| RM | Eurus-2-7B-SFT | Qwen2.5-Math-RM (72B) |
| RL Data | 150K queries × 4 samples | 66K queries × 32 samples |
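Note the RM Data row: PRIME trains no separate reward model on reward labels. Per the two papers cited below, the implicit PRM scores each response token as a scaled log-likelihood ratio between the policy and a frozen reference model, r_t = beta * log(pi(y_t | y_<t) / pi_ref(y_t | y_<t)). The following is a minimal sketch of that computation under our own assumptions (function name, `beta` value, and token alignment are illustrative, not the reference implementation):

```python
# Sketch of the implicit process reward from "Free Process Rewards without
# Process Labels": each response token is scored by the log-likelihood ratio
# between the policy and a frozen reference model, scaled by beta.
# Both models are assumed to be transformers causal LMs sharing a tokenizer.
import torch
import torch.nn.functional as F

def implicit_process_rewards(policy, reference, input_ids, prompt_len, beta=0.05):
    """Token-level rewards r_t = beta * log(pi(y_t|y_<t) / pi_ref(y_t|y_<t))."""
    with torch.no_grad():
        pol_logits = policy(input_ids).logits
        ref_logits = reference(input_ids).logits
    # Logits at position i predict token i+1, so shift targets by one.
    targets = input_ids[:, 1:]
    pol_logp = F.log_softmax(pol_logits[:, :-1], dim=-1) \
        .gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    ref_logp = F.log_softmax(ref_logits[:, :-1], dim=-1) \
        .gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    rewards = beta * (pol_logp - ref_logp)
    # Keep only positions that score response tokens, not the prompt.
    return rewards[:, prompt_len - 1:]
```

Because these rewards come for free from the policy and reference log-probs, the RL stage needs no separately annotated reward-model data, which is what the 0 in the table reflects.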
If you find PRIME or ImplicitPRM helpful, please cite our work:
@misc{cui2024process,
  title={Process Reinforcement through Implicit Rewards},
  author={Ganqu Cui and Lifan Yuan and Zefan Wang and Hanbin Wang and Wendi Li and Bingxiang He and Yuchen Fan and Tianyu Yu and Qixin Xu and Weize Chen and Jiarui Yuan and Huayu Chen and Kaiyan Zhang and Xingtai Lv and Shuo Wang and Yuan Yao and Hao Peng and Yu Cheng and Zhiyuan Liu and Maosong Sun and Bowen Zhou and Ning Ding},
  year={2025},
  howpublished={\url{https://curvy-check-498.notion.site/Process-Reinforcement-through-Implicit-Rewards-15f4fcb9c42180f1b498cc9b2eaf896f}},
  note={Notion Blog}
}

@article{yuan2024implicitprm,
  title={Free Process Rewards without Process Labels},
  author={Lifan Yuan and Wendi Li and Huayu Chen and Ganqu Cui and Ning Ding and Kaiyan Zhang and Bowen Zhou and Zhiyuan Liu and Hao Peng},
  journal={arXiv preprint arXiv:2412.01981},
  year={2024}
}