Multi-Turn Code Generation Through Single-Step Rewards Paper • 2502.20380 • Published 6 days ago • 28
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs Paper • 2503.01307 • Published 2 days ago • 23
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published 3 days ago • 48
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published 8 days ago • 62
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding Paper • 2502.19400 • Published 7 days ago • 41
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model Paper • 2502.13449 • Published 15 days ago • 42
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 13 days ago • 171
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 17 days ago • 139
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published 21 days ago • 54
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published 22 days ago • 46
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Paper • 2502.06772 • Published 23 days ago • 20
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 23 days ago • 141
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Paper • 2502.07374 • Published 22 days ago • 36
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published 26 days ago • 121