Multi-Turn Code Generation Through Single-Step Rewards
Abstract
We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple yet scalable approach, muCode, that solves multi-turn code generation using only single-step rewards. Our key insight is that code generation is a one-step recoverable MDP, where the correct code can be recovered from any intermediate code state in a single turn. muCode iteratively trains both a generator that proposes code solutions conditioned on multi-turn execution feedback and a verifier that scores the newly generated code. Experimental evaluations show that our approach achieves significant improvements over state-of-the-art baselines. We provide an analysis of the design choices for the reward model and policy, and show the efficacy of muCode at utilizing execution feedback. Our code is available at https://github.com/portal-cornell/muCode.
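To make the generator-verifier loop described above concrete, below is a minimal Python sketch of what such a multi-turn interaction might look like: at each turn the generator samples candidate programs conditioned on the problem and prior execution feedback, a learned verifier assigns each candidate a single-step score, and the highest-scoring candidate is executed to produce feedback for the next turn. All names here (Generator, Verifier, run_tests, multi_turn_generate) are hypothetical placeholders for illustration only, not the interface of the released muCode code; refer to the repository linked above for the actual implementation.

```python
# Hypothetical sketch of a multi-turn generate -> verify -> execute loop
# in the spirit of muCode. Interfaces are assumptions, not the authors' API.
from dataclasses import dataclass
from typing import Callable, Protocol


class Generator(Protocol):
    def sample(self, prompt: str, n: int) -> list[str]:
        """Sample n candidate programs conditioned on the prompt."""
        ...


class Verifier(Protocol):
    def score(self, prompt: str, code: str) -> float:
        """Return a scalar estimate of whether the code is correct (single-step reward)."""
        ...


@dataclass
class Turn:
    code: str
    feedback: str


def multi_turn_generate(problem: str,
                        generator: Generator,
                        verifier: Verifier,
                        run_tests: Callable[[str], tuple[bool, str]],
                        max_turns: int = 3,
                        n_candidates: int = 8) -> str:
    """Iteratively generate code, pick the verifier's best candidate,
    execute it, and feed the execution feedback back on the next turn."""
    history: list[Turn] = []
    best_code = ""
    for _ in range(max_turns):
        # Build the prompt from the problem plus all prior code/feedback turns.
        prompt = problem + "".join(
            f"\n\n# Previous attempt:\n{t.code}\n# Execution feedback:\n{t.feedback}"
            for t in history
        )
        # Best-of-n selection: the verifier scores each candidate in a single step.
        candidates = generator.sample(prompt, n=n_candidates)
        best_code = max(candidates, key=lambda c: verifier.score(prompt, c))
        # Execute against the available (public) tests to obtain feedback.
        passed, feedback = run_tests(best_code)
        if passed:
            return best_code
        history.append(Turn(code=best_code, feedback=feedback))
    return best_code
```

Because each candidate receives an immediate verifier score, the loop never needs a multi-turn credit-assignment signal: under the one-step recoverability assumption, a correct solution can in principle be reached from any intermediate attempt within a single turn.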
Community
We are excited to share our paper "Multi-Turn Code Generation Through Single-Step Rewards". Please find our project page at https://portal-cornell.github.io/muCode/
The following similar papers were recommended by the Semantic Scholar API:
- ACECODER: Acing Coder RL via Automated Test-Case Synthesis (2025)
- Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation (2025)
- RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation (2025)
- CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging (2025)
- VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data (2025)
- Process-Supervised Reinforcement Learning for Code Generation (2025)
- ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments (2025)