Benchmarks - a hppdqdq Collection

hppdqdq's Collections

Benchmarks

updated Oct 26

🥇 MMLU Pro
More advanced and challenging multi-task evaluation
Running on CPU Upgrade · 171 likes

🎭 Stick To Your Role! Leaderboard
Running · 28 likes

📊 ZeroEval Leaderboard
Running · 46 likes

🥇 Decentralized Arena Leaderboard
Running · 23 likes

🥇 Open Medical-LLM Leaderboard
Running on CPU Upgrade · 311 likes

🏆 GPU Poor LLM Arena
Compact LLM Battle Arena: Frugal AI Face-Off!
Running · 129 likes

🌎 Open VLM Video Leaderboard
VLMEvalKit Eval Results in video understanding benchmark
Running · 92 likes

🏆 Open LLM Leaderboard
Track, rank and evaluate open LLMs and chatbots
Running on CPU Upgrade · 12k likes
