Open LLM Leaderboard 2
Track, rank and evaluate open LLMs and chatbots
Gathering benchmark spaces on the hub (beyond the Open LLM Leaderboard)
Note The 🤗 Open LLM Leaderboard aims to track, rank and evaluate open LLMs and chatbots. Submit a model for automated evaluation on the 🤗 GPU cluster on the "Submit" page!
Note Massive Text Embedding Benchmark (MTEB) Leaderboard.
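As a rough illustration of how scores behind this leaderboard are produced, the sketch below runs a single MTEB task against an embedding model with the mteb Python package; the model and task names are arbitrary examples, and the package's API may differ between versions.

```python
# Minimal sketch: scoring one embedding model on one MTEB task.
# Assumes `pip install mteb sentence-transformers`; the model and task
# names are arbitrary examples, not leaderboard requirements.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# The leaderboard aggregates scores over many tasks; this runs just one.
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")
print(results)
```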
Note This leaderboard is based on the following three benchmarks: Chatbot Arena - a crowdsourced, randomized battle platform, where we use 70K+ user votes to compute Elo ratings; MT-Bench - a set of challenging multi-turn questions, where we use GPT-4 to grade the model responses; MMLU (5-shot) - a test to measure a model's multitask accuracy on 57 tasks.
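The Elo ratings mentioned above come from pairwise human votes. The sketch below shows a generic online Elo update over a list of battle outcomes; it illustrates the idea only and is not the leaderboard's exact computation (which may use bootstrapping or a Bradley-Terry fit instead).

```python
# Generic online Elo update from pairwise battle outcomes (illustration only).
from collections import defaultdict

K = 4           # update step size
BASE = 1000.0   # starting rating for every model

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(ratings, model_a, model_b, outcome):
    """outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    e_a = expected_score(ratings[model_a], ratings[model_b])
    ratings[model_a] += K * (outcome - e_a)
    ratings[model_b] += K * ((1.0 - outcome) - (1.0 - e_a))

ratings = defaultdict(lambda: BASE)
# Toy votes: (model_a, model_b, outcome for model_a)
for a, b, result in [("model-x", "model-y", 1.0), ("model-y", "model-x", 0.5)]:
    update(ratings, a, b, result)
print(dict(ratings))
```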
Note The 🤗 LLM-Perf Leaderboard aims to benchmark the performance (latency, throughput & memory) of Large Language Models (LLMs) across different hardware, backends and optimizations, using Optimum-Benchmark and Optimum flavors. Anyone from the community can request a model or a hardware/backend/optimization configuration for automated benchmarking.
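The sketch below is not Optimum-Benchmark itself; it is a plain transformers/PyTorch timing loop that illustrates the three quantities the leaderboard reports (latency, throughput and peak memory). The model name is an arbitrary small example.

```python
# Plain PyTorch/transformers illustration of the metrics LLM-Perf reports:
# per-request latency, decoding throughput (tokens/s) and peak GPU memory.
# This is NOT Optimum-Benchmark; it only sketches what is being measured.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # arbitrary small example model
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

inputs = tokenizer("Benchmarking is fun because", return_tensors="pt").to(device)
if device == "cuda":
    torch.cuda.reset_peak_memory_stats()

start = time.perf_counter()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
latency = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"latency:    {latency:.3f} s")
print(f"throughput: {new_tokens / latency:.1f} tokens/s")
if device == "cuda":
    print(f"peak memory: {torch.cuda.max_memory_allocated() / 1e6:.0f} MB")
```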
Note Compare the performance of base multilingual code generation models on the HumanEval benchmark and MultiPL-E. We also measure throughput and provide information about the models. We only compare open pre-trained multilingual code models that people can use as base models for their own training.
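HumanEval-style results are usually reported as pass@k: for each problem, n completions are sampled, c of them pass the unit tests, and pass@k = 1 - C(n-c, k) / C(n, k). The sketch below implements this standard estimator in a numerically stable form; the sample counts are made-up numbers.

```python
# Unbiased pass@k estimator used with HumanEval-style benchmarks:
# n completions sampled per problem, c of them pass the unit tests,
# pass@k = 1 - C(n - c, k) / C(n, k), computed as a stable product.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:  # every size-k sample must contain a correct completion
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Made-up example: 200 samples per problem, 37 pass the tests.
print(f"pass@1  = {pass_at_k(200, 37, 1):.3f}")
print(f"pass@10 = {pass_at_k(200, 37, 10):.3f}")
```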
Note The 🤗 Open ASR Leaderboard ranks and evaluates speech recognition models on the Hugging Face Hub. We report the Average WER (⬇️) and RTF (⬇️); lower is better. Models are ranked by their Average WER, from lowest to highest.
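For reference, WER is the word-level edit distance between the reference transcript and the hypothesis divided by the number of reference words, and RTF is transcription time divided by audio duration. The sketch below computes both from scratch; the leaderboard itself uses its own evaluation harness.

```python
# The two metrics the Open ASR Leaderboard reports (lower is better for both):
#   WER = word-level edit distance / number of reference words
#   RTF = transcription time / audio duration
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def rtf(transcription_seconds: float, audio_seconds: float) -> float:
    return transcription_seconds / audio_seconds

print(wer("the cat sat on the mat", "the cat sat on mat"))   # 1 deletion / 6 words
print(rtf(transcription_seconds=2.5, audio_seconds=30.0))    # faster than real time
```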
Note The MT-Bench Browser (see Chatbot Arena above).
Persian LLM Leaderboard