Hello, we are using Qwen2.5-Math-PRM-7B for PPO training, but we did not achieve the expected gains on math-en. With Skywork's PRM, however, we obtained good results. Could you explain how your PRM is meant to be used in reinforcement learning?