Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1, 2024
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 6 days ago
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published 5 days ago
LLM Reasoning Papers Collection Papers to improve reasoning capabilities of LLMs • 18 items • Updated 4 days ago
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Paper • 2412.21199 • Published 14 days ago
Training Software Engineering Agents and Verifiers with SWE-Gym Paper • 2412.21139 • Published 14 days ago
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs Paper • 2412.21187 • Published 14 days ago
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published 17 days ago
Dynamic Scaling of Unit Tests for Code Reward Modeling Paper • 2501.01054 • Published 12 days ago
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published 12 days ago
SDPO: Segment-Level Direct Preference Optimization for Social Agents Paper • 2501.01821 • Published 11 days ago
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models Paper • 2501.03124 • Published 7 days ago
Test-time Computing: from System-1 Thinking to System-2 Thinking Paper • 2501.02497 • Published 9 days ago
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback Paper • 2501.03916 • Published 6 days ago
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published 10 days ago
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning Paper • 2501.03226 • Published 7 days ago
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery Paper • 2501.01540 • Published 11 days ago