LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers Paper • 2310.15164 • Published Oct 23, 2023
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code Paper • 2403.07974 • Published Mar 12, 2024
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution Paper • 2401.03065 • Published Jan 5, 2024
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models Paper • 2306.15626 • Published Jun 27, 2023