A question about the effectiveness of Qwen2.5-Math-PRM-7B in reinforcement learning

by zsyyy - opened Jan 20

Hello, we are using Qwen2.5-Math-PRM-7B for PPO training, but we did not achieve the expected gains on math-en. With Skywork's PRM, however, we obtained good results. Could you explain how your PRM is meant to be used in reinforcement learning?
