arXiv:2409.09054

Evaluating the Performance of Large Language Models in Competitive Programming: A Multi-Year, Multi-Grade Analysis

Published on Aug 31

Abstract

This study explores the performance of large language models (LLMs) in solving competitive programming problems from the county-level round of the Romanian Informatics Olympiad. Romania, a leading nation in computer science competitions, provides an ideal environment for evaluating LLM capabilities due to its rich history and stringent competition standards. We collected and analyzed a dataset comprising 304 challenges from 2002 to 2023, focusing on solutions written by LLMs in C++ and Python. Our primary goal is to understand why LLMs perform well or poorly on different tasks. We evaluated various models, including closed-source models like GPT-4 and open-weight models such as CodeLlama and RoMistral, using a standardized process involving multiple attempts and feedback rounds. The analysis revealed significant variations in LLM performance across grades and problem types. Notably, GPT-4 showed strong performance, indicating its potential as an educational tool for middle school students. We also observed differences in code quality and style across the various LLMs.
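
The evaluation described above relies on multiple attempts with grader feedback between rounds. The sketch below illustrates one way such a loop could be structured in Python; the helper names generate_solution and run_tests, the cap of three attempts, and the scoring convention are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of an attempt-and-feedback evaluation loop.
# generate_solution() and run_tests() stand in for the model call and the
# competition grader; both names are assumptions, not the paper's actual API.

def evaluate_problem(problem_statement, test_cases,
                     generate_solution, run_tests, max_attempts=3):
    """Ask the model for a solution, grade it, and retry with feedback."""
    feedback = None
    best_score = 0.0
    for attempt in range(max_attempts):
        # The prompt contains the statement and, on retries, the grader feedback.
        code = generate_solution(problem_statement, feedback=feedback)
        # run_tests is assumed to return (fraction of tests passed, report text).
        score, report = run_tests(code, test_cases)
        best_score = max(best_score, score)
        if score == 1.0:
            break  # all tests passed, stop early
        feedback = report  # e.g. failing cases or compiler errors for the next round
    return best_score
```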
