Open-LLM performances are plateauing, let’s make the leaderboard steep again 🏔
Update leaderboard for fair model evaluation