A question about the effectiveness of Qwen2.5-Math-PRM-7B in reinforcement learning
#7 opened about 4 hours ago
by
zsyyy
If the response length exceeds 4096, is a sliding window used, or is it simply truncated?
#6 opened 3 days ago
by
ShelterW
Question about the step separator "\n\n"
1
#3 opened 5 days ago
by
pixas
Could you clarify whether the PRM800K deduplication was performed against the original 5,000-problem MATH test set or the MATH500 dataset?
3
#2 opened 6 days ago
by
masterLan
vLLM support
1
#1 opened 6 days ago
by
baohao