
Dyve: Thinking Fast and Slow for Dynamic Process Verification

Published on Feb 16 · Submitted by Jianyuan1 on Feb 18

Abstract

We present Dyve, a dynamic process verifier that enhances reasoning error detection in large language models by integrating fast and slow thinking, inspired by Kahneman's dual-process theory. Dyve adaptively applies immediate token-level confirmation (System 1) for straightforward steps and comprehensive analysis (System 2) for complex ones. Leveraging a novel step-wise consensus-filtered process supervision technique that combines Monte Carlo estimation with LLM-based evaluation, Dyve curates high-quality supervision signals from noisy data. Experimental results on ProcessBench and the MATH dataset confirm that Dyve significantly outperforms existing process-based verifiers and boosts performance in Best-of-N settings.

Community

Paper author · Paper submitter

Large language models (LLMs) excel at complex reasoning but struggle with reliable step-by-step verification. Current methods face a trade-off: rapid "System 1" binary checks lack depth, while thorough "System 2" analyses are slow and computationally costly. To bridge this gap, we present Dyve, a dynamic verifier that adaptively applies fast token-level confirmation (System 1) to simple steps and deep analysis (System 2) to complex ones, inspired by Kahneman's dual-process theory.
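At inference time, this fast/slow dispatch can be pictured as a thresholded escalation loop: a cheap check settles clear-cut steps, and only ambiguous ones trigger the expensive analysis. The sketch below is a minimal illustration, not the paper's implementation; `fast_check`, `slow_check`, and the two confidence thresholds are hypothetical stand-ins for the verifier LM's token-level score and chain-of-thought critique.

```python
# Minimal sketch of Dyve-style fast/slow dispatch. All names and
# thresholds here are illustrative, not the paper's implementation.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class StepVerdict:
    step_index: int
    correct: bool
    mode: str  # "system1" (fast check) or "system2" (slow analysis)

def verify_trace(
    steps: List[str],
    fast_check: Callable[[str], float],  # cheap pass: P(step is correct)
    slow_check: Callable[[str], bool],   # deliberate chain-of-thought critique
    confident: float = 0.9,
    doubtful: float = 0.1,
) -> List[StepVerdict]:
    """Confirm clear-cut steps with one cheap check; escalate ambiguous ones."""
    verdicts = []
    for i, step in enumerate(steps):
        p_ok = fast_check(step)
        if p_ok >= confident:
            verdicts.append(StepVerdict(i, True, "system1"))
        elif p_ok <= doubtful:
            verdicts.append(StepVerdict(i, False, "system1"))
        else:
            # Ambiguous: spend extra compute on a deliberate System 2 analysis.
            verdicts.append(StepVerdict(i, slow_check(step), "system2"))
        if not verdicts[-1].correct:
            break  # stop at the first detected error
    return verdicts

# Toy usage with stand-in checkers (a real setup would query a verifier LM):
steps = ["2 + 2 = 4", "so x = 3", "therefore x^2 = 6"]
print(verify_trace(
    steps,
    fast_check=lambda s: 0.5 if "therefore" in s else 0.95,
    slow_check=lambda s: "x^2 = 6" not in s,
))
```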

Dyve leverages a novel step-wise consensus-filtered supervision technique to train high-quality verifiers from noisy data. By generating diverse reasoning traces via Monte Carlo rollouts, filtering them with an LLM-as-a-Judge, and performing granular step-level error detection, Dyve distills 117K high-quality examples from 1.2M noisy samples. This approach ensures robust supervision while maintaining efficiency.
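One way to read the filtering recipe is as a two-vote agreement rule over each step: a Monte Carlo score from rollouts and an independent judge verdict, with disagreements discarded as noise. The sketch below is a hedged approximation; `rollout`, `llm_judge`, the rollout count, and the 0.5 threshold are hypothetical placeholders rather than the paper's exact recipe.

```python
# Hedged sketch of step-wise consensus filtering; `rollout` and
# `llm_judge` are hypothetical stand-ins for the paper's components.
from typing import Callable, List, Optional

def mc_step_score(
    prefix: List[str],                     # problem + steps up to the candidate
    rollout: Callable[[List[str]], bool],  # complete the prefix; correct answer?
    n_rollouts: int = 8,
) -> float:
    """Monte Carlo estimate: fraction of completions from this prefix
    that end at the correct final answer."""
    return sum(rollout(prefix) for _ in range(n_rollouts)) / n_rollouts

def consensus_label(
    prefix: List[str],
    rollout: Callable[[List[str]], bool],
    llm_judge: Callable[[List[str]], bool],  # independent step-level critique
    threshold: float = 0.5,
) -> Optional[bool]:
    """Keep a (step, label) pair only when the MC estimate and the
    LLM judge agree; disagreements are dropped as noisy supervision."""
    mc_ok = mc_step_score(prefix, rollout) >= threshold
    judge_ok = llm_judge(prefix)
    return mc_ok if mc_ok == judge_ok else None  # None = filtered out

# Toy usage: both votes agree, so the label is kept.
label = consensus_label(
    ["Q: what is 2 + 2?", "Step 1: 2 + 2 = 4"],
    rollout=lambda p: True,
    llm_judge=lambda p: True,
)
print(label)  # True; None would mean the example was filtered out
```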

Evaluated on ProcessBench and MATH, Dyve outperforms existing verifiers, achieving state-of-the-art F1 scores (e.g., 68.5 on GSM8K and 58.3 on MATH) and generalizing effectively to Olympiad-level problems. When integrated with proposer LLMs in Best-of-N settings, Dyve boosts accuracy to 95.5% (N=8), demonstrating superior synergy between dynamic verification and reasoning generation. Dyve also balances speed and precision, offering significantly faster inference than pure System 2 models.
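In the Best-of-N setting, the verifier's role reduces to scoring each sampled trace and keeping the argmax. A minimal sketch, assuming hypothetical `propose` and `score_solution` callables (e.g., the minimum step-level score as the trace score):

```python
# Illustrative Best-of-N selection; `propose` and `score_solution`
# are hypothetical stand-ins, not the paper's interfaces.
from typing import Callable, List

def best_of_n(
    problem: str,
    propose: Callable[[str], List[str]],           # one sampled solution, as steps
    score_solution: Callable[[List[str]], float],  # e.g. min over step-level scores
    n: int = 8,
) -> List[str]:
    """Sample N reasoning traces and return the one the verifier trusts most."""
    candidates = [propose(problem) for _ in range(n)]
    return max(candidates, key=score_solution)
```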

Our work advances reliable AI reasoning by ensuring systematic, step-wise validation. Code, data, and models are open-sourced to support further research in trustworthy LLM development.

paper: https://arxiv.org/pdf/2502.11157
code: https://github.com/staymylove/Dyve
model: Jianyuan1/deepseek-r1-14b-cot-math-reasoning-full
data: Jianyuan1/cot-data
