SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 108
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization Paper • 2412.12098 • Published Dec 16, 2024 • 4
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning Paper • 2412.09858 • Published Dec 13, 2024 • 1
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Paper • 2501.18585 • Published Jan 30 • 56
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles Paper • 2502.01081 • Published Feb 3 • 14
Improving Transformer World Models for Data-Efficient RL Paper • 2502.01591 • Published about 1 month ago • 9
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search Paper • 2502.02508 • Published about 1 month ago • 23
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking Paper • 2502.02339 • Published about 1 month ago • 22
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods Paper • 2502.01618 • Published about 1 month ago • 10
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation Paper • 2502.03860 • Published 28 days ago • 24
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models Paper • 2502.04404 • Published 28 days ago • 23
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 24 days ago • 141
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Paper • 2502.06772 • Published 24 days ago • 20
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Paper • 2502.07374 • Published 23 days ago • 36
Teaching Language Models to Critique via Reinforcement Learning Paper • 2502.03492 • Published 30 days ago • 24
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging Paper • 2502.09056 • Published 21 days ago • 30
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published 22 days ago • 54
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model Paper • 2502.11775 • Published 17 days ago • 8
Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published 16 days ago • 76
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? Paper • 2502.12215 • Published 17 days ago • 16
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Paper • 2502.14768 • Published 14 days ago • 44
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer Paper • 2502.15631 • Published 13 days ago • 8
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models Paper • 2502.16033 • Published 13 days ago • 15
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published 9 days ago • 64
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published 8 days ago • 56
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving Paper • 2502.20238 • Published 7 days ago • 23
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers Paper • 2502.20545 • Published 7 days ago • 18
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs Paper • 2503.01307 • Published 3 days ago • 24
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition Paper • 2503.00735 • Published 5 days ago • 14