---
title: Leaderboard Test
emoji: 👁
colorFrom: gray
colorTo: indigo
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🏆 LLM-Safety-Leaderboard

A joint community effort to create one central leaderboard for LLMs. Contributions and corrections welcome!
We call a model "open" if it can be deployed locally and used for commercial purposes.

## Interactive Dashboard

https://llm-leaderboard.streamlit.app/
https://huggingface.co/spaces/ludwigstumpp/llm-leaderboard

## Leaderboard

|Model|Open?|Critical Personal Safety|Property & Living Security|Fundamental Rights|Welfare Protection|Average|LexSafeBench|
|------------|------------|------------|------------|------------|------------|------------|------------|
|[gpt-4o](https://openai.com/index/hello-gpt-4o/)|No|81.8|76.6|74.0|81.5|77.6| |
|[claude-3-5-sonnet-20241022](https://www.anthropic.com/news/claude-3-5-sonnet)|No|79.3|75.4|73.5|79.8|76.2| |
|[Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)|Yes|79.8|74.8|72.4|83.8|75.9| |
|[Mistral-Large-Instruct-2411](https://huggingface.co/models/Mistral-Large-Instruct-2411)|Yes|77.7|73.3|72.5|83.7|74.8| |
|[Llama-3.1-70B-Instruct](https://huggingface.co/models/Llama-3.1-70B-Instruct)|Yes|76.5|72.9|72.5|81.2|74.2| |
|[Meta-Llama-3-70B-Instruct](https://huggingface.co/models/Llama-3-70B-Instruct)|Yes|76.1|72.1|71.2|79.9|73.4| |
|[Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)|Yes|77.4|71.9|69.8|82.5|73.3| |
|[Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)|Yes|74.1|69.7|67.5|77.7|70.7| |
|[Mistral-Small-Instruct-2409](https://huggingface.co/models/Mistral-Small-Instruct-2409)|Yes|73.5|67.7|65.8|78.3|69.3| |
|[gemma-2-27b-it](https://huggingface.co/models/Gemma-2-27B-IT)|Yes|72.7|66.9|64.6|76.5|68.3| |
|[Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)|Yes|67.8|65.4|64.3|75.3|66.3| |
|[glm-4-9b-chat](https://huggingface.co/models/GLM-4-9B-Chat)|Yes|66.1|61.9|61.2|71.2|63.4| |
|[Meta-Llama-3-8B-Instruct](https://huggingface.co/models/Llama-3-8B-Instruct)|Yes|63.7|61.9|60.6|69.3|62.4| |
|[gemma-2-2b-it](https://huggingface.co/models/Gemma-2-2B-IT)|Yes|59.1|55.9|54.8|62.2|56.8| |
|[Llama-3.1-8B-Instruct](https://huggingface.co/models/Llama-3.1-8B-Instruct)|Yes|55.2|51.8|52.9|59.6|53.5| |
|[vicuna-7b-v1.5](https://huggingface.co/models/Vicuna-7B-V1.5)|Yes|41.4|40.7|38.8|44.6|40.5| |
|[vicuna-13b-v1.5](https://huggingface.co/models/Vicuna-13B-V1.5)|Yes|32.3|29.9|28.0|32.6|30.1| |

## Benchmarks

| Benchmark Name | Author | Link | Description |
| ----------------- | ---------------- | ---------------------------------------- | ---------------------------------------- |
| LexSafeBench | HKAIR-Lab | https://huggingface.co/HKAIR-Lab | "We introduce LexSafebench, an LLM benchmark featuring legal safety" |

## More Open LLMs

If you are interested in an overview of open LLMs for commercial use and fine-tuning, check out the [open-llms](https://github.com/eugeneyan/open-llms) repository.

## Sources

The results of this leaderboard are collected from the individual papers and published results of the model authors. For each reported value, the source is added as a link.

Special thanks to the following pages:

- [MosaicML - Model benchmarks](https://www.mosaicml.com/blog/mpt-7b)
- [lmsys.org - Chatbot Arena benchmarks](https://lmsys.org/blog/2023-05-03-arena/)
- [Papers With Code](https://paperswithcode.com/)
- [Stanford HELM](https://crfm.stanford.edu/helm/latest/)
- [HF Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

## Disclaimer

The information above may be wrong. If you want to use a published model commercially, please consult a lawyer.