We're thrilled to introduce our latest project: SE Arena! 🎉
SE Arena is an interactive platform designed to evaluate and compare software engineering chatbots powered by foundation models (FMs). With a transparent, open-source leaderboard, support for multi-round conversations, and head-to-head model comparisons, SE Arena is here to bring clarity to how FMs are evaluated on software engineering tasks.
Our team just dropped something cool! 🎉 We've published a new paper on arXiv that dives into foundation model leaderboards across different platforms. We analyzed the content, operational workflows, and common issues of these leaderboards. From this, we came up with two new concepts: Leaderboard Operations (LBOps) and leaderboard smells.