TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published Dec 18, 2024 • 42
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024 • 68
Incorporating External Knowledge through Pre-training for Natural Language to Code Generation Paper • 2004.09015 • Published Apr 20, 2020 • 1
A Systematic Evaluation of Large Language Models of Code Paper • 2202.13169 • Published Feb 26, 2022 • 1
Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval Paper • 2201.12431 • Published Jan 28, 2022
MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages Paper • 2203.08388 • Published Mar 16, 2022
WebArena: A Realistic Web Environment for Building Autonomous Agents Paper • 2307.13854 • Published Jul 25, 2023 • 23