PRIME-RL/Eurus-2-7B-PRIME
Through PRIME, we achieve substantial improvements on key reasoning benchmarks over the SFT version of the model: a 16.7% improvement on average, and over 20% on the AMC and AIME competitions. Our final model, Eurus-2-7B-PRIME, based on Qwen2.5-Math-7B-Base, surpasses its instruct version on five key reasoning benchmarks. The final results are presented below:
| Benchmark | Eurus-2-7B-PRIME | Eurus-2-7B-SFT | Qwen-2.5-Math-7B-Instruct | Llama-3.1-70B-Instruct | GPT-4o |
|---|---|---|---|---|---|
| AIME 2024 | 26.7 (+23.3) | 3.3 | 13.3 | 16.7 | 9.3 |
| MATH-500 | 79.2 (+14.1) | 65.1 | 79.8 | 64.6 | 76.4 |
| AMC | 57.8 (+27.7) | 30.1 | 50.6 | 30.1 | 45.8 |
| Minerva Math | 38.6 (+5.9) | 32.7 | 34.6 | 35.3 | 36.8 |
| OlympiadBench | 42.1 (+12.3) | 29.8 | 40.7 | 31.9 | 43.3 |
| Avg. | 48.9 (+16.7) | 32.2 | 43.8 | 35.7 | 43.3 |
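Eurus-2-7B-PRIME is a standard causal language model, so it can be run with the Hugging Face transformers generation API. The sketch below is illustrative only: the example question and generation settings are our assumptions, not a setup prescribed for this model.

```python
# Minimal inference sketch for Eurus-2-7B-PRIME. The prompt and decoding
# settings here are illustrative assumptions, not official recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PRIME-RL/Eurus-2-7B-PRIME"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A competition-style math question, the kind of input the model targets.
messages = [
    {"role": "user",
     "content": "What is the sum of the first 50 positive even integers?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding keeps the example deterministic; to reproduce the benchmark numbers above you would match the evaluation prompts and sampling setup from the PRIME report.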
We achieved these results with only 1/10 of the data and model resources compared with Qwen2.5-Math:
| | Eurus-2-7B-PRIME | Qwen2.5-Math-7B-Instruct |
|---|---|---|
| Base Model | Qwen2.5-Math-7B | Qwen2.5-Math-7B |
| SFT Data | 230K (open-source) | 2.5M (open-source and in-house) |
| RM Data | 0 | 618K (in-house) |
| RM | Eurus-2-7B-SFT | Qwen2.5-Math-RM (72B) |
| RL Data | 150K queries × 4 samples | 66K queries × 32 samples |
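Note the RM Data row: PRIME trains no separate reward model on reward labels. Per the two papers cited below, the implicit PRM scores each response token as a scaled log-likelihood ratio between the policy and a frozen reference model, r_t = beta * log(pi(y_t | y_<t) / pi_ref(y_t | y_<t)). The following is a minimal sketch of that computation under our own assumptions (function name, `beta` value, and token alignment are illustrative, not the reference implementation):

```python
# Sketch of the implicit process reward from "Free Process Rewards without
# Process Labels": each response token is scored by the log-likelihood ratio
# between the policy and a frozen reference model, scaled by beta.
# Both models are assumed to be transformers causal LMs sharing a tokenizer.
import torch
import torch.nn.functional as F

def implicit_process_rewards(policy, reference, input_ids, prompt_len, beta=0.05):
    """Token-level rewards r_t = beta * log(pi(y_t|y_<t) / pi_ref(y_t|y_<t))."""
    with torch.no_grad():
        pol_logits = policy(input_ids).logits
        ref_logits = reference(input_ids).logits
    # Logits at position i predict token i+1, so shift targets by one.
    targets = input_ids[:, 1:]
    pol_logp = F.log_softmax(pol_logits[:, :-1], dim=-1) \
        .gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    ref_logp = F.log_softmax(ref_logits[:, :-1], dim=-1) \
        .gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    rewards = beta * (pol_logp - ref_logp)
    # Keep only positions that score response tokens, not the prompt.
    return rewards[:, prompt_len - 1:]
```

Because these rewards come for free from the policy and reference log-probs, the RL stage needs no separately annotated reward-model data, which is what the 0 in the table reflects.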
If you find PRIME or ImplicitPRM helpful, please cite our work:
@misc{cui2024process,
  title={Process Reinforcement through Implicit Rewards},
  author={Ganqu Cui and Lifan Yuan and Zefan Wang and Hanbin Wang and Wendi Li and Bingxiang He and Yuchen Fan and Tianyu Yu and Qixin Xu and Weize Chen and Jiarui Yuan and Huayu Chen and Kaiyan Zhang and Xingtai Lv and Shuo Wang and Yuan Yao and Hao Peng and Yu Cheng and Zhiyuan Liu and Maosong Sun and Bowen Zhou and Ning Ding},
  year={2025},
  howpublished={\url{https://curvy-check-498.notion.site/Process-Reinforcement-through-Implicit-Rewards-15f4fcb9c42180f1b498cc9b2eaf896f}},
  note={Notion Blog}
}

@article{yuan2024implicitprm,
  title={Free Process Rewards without Process Labels},
  author={Lifan Yuan and Wendi Li and Huayu Chen and Ganqu Cui and Ning Ding and Kaiyan Zhang and Bowen Zhou and Zhiyuan Liu and Hao Peng},
  journal={arXiv preprint arXiv:2412.01981},
  year={2024}
}