---
title: Leaderboard Test
emoji: 👁
colorFrom: gray
colorTo: indigo
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false

---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


# 🏆 LLM-Safety-Leaderboard

A joint community effort to create one central leaderboard for LLMs. Contributions and corrections welcome! <br>
We call a model "open" if it can be deployed locally and used for commercial purposes.

## Interactive Dashboard

https://llm-leaderboard.streamlit.app/ <br>
https://huggingface.co/spaces/ludwigstumpp/llm-leaderboard

## Leaderboard

|Model|Open?|Critical Personal Safety|Property & Living Security|Fundamental Rights|Welfare Protection|Average|
|------------|------------|------------|------------|------------|------------|------------|
|[gpt-4o](https://openai.com/index/hello-gpt-4o/)|No|81.8|76.6|74.0|81.5|77.6|
|[claude-3-5-sonnet-20241022](https://www.anthropic.com/news/claude-3-5-sonnet)|No|79.3|75.4|73.5|79.8|76.2|
|[Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)|Yes|79.8|74.8|72.4|83.8|75.9|
|[Mistral-Large-Instruct-2411](https://huggingface.co/models/Mistral-Large-Instruct-2411)|Yes|77.7|73.3|72.5|83.7|74.8|
|[Llama-3.1-70B-Instruct](https://huggingface.co/models/Llama-3.1-70B-Instruct)|Yes|76.5|72.9|72.5|81.2|74.2|
|[Meta-Llama-3-70B-Instruct](https://huggingface.co/models/Llama-3-70B-Instruct)|Yes|76.1|72.1|71.2|79.9|73.4|
|[Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)|Yes|77.4|71.9|69.8|82.5|73.3|
|[Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)|Yes|74.1|69.7|67.5|77.7|70.7|
|[Mistral-Small-Instruct-2409](https://huggingface.co/models/Mistral-Small-Instruct-2409)|Yes|73.5|67.7|65.8|78.3|69.3|
|[gemma-2-27b-it](https://huggingface.co/models/Gemma-2-27B-IT)|Yes|72.7|66.9|64.6|76.5|68.3|
|[Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)|Yes|67.8|65.4|64.3|75.3|66.3|
|[glm-4-9b-chat](https://huggingface.co/models/GLM-4-9B-Chat)|Yes|66.1|61.9|61.2|71.2|63.4|
|[Meta-Llama-3-8B-Instruct](https://huggingface.co/models/Llama-3-8B-Instruct)|Yes|63.7|61.9|60.6|69.3|62.4|
|[gemma-2-2b-it](https://huggingface.co/models/Gemma-2-2B-IT)|Yes|59.1|55.9|54.8|62.2|56.8|
|[Llama-3.1-8B-Instruct](https://huggingface.co/models/Llama-3.1-8B-Instruct)|Yes|55.2|51.8|52.9|59.6|53.5|
|[vicuna-7b-v1.5](https://huggingface.co/models/Vicuna-7B-V1.5)|Yes|41.4|40.7|38.8|44.6|40.5|
|[vicuna-13b-v1.5](https://huggingface.co/models/Vicuna-13B-V1.5)|Yes|32.3|29.9|28.0|32.6|30.1|
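
To sort or filter these results outside the dashboard, the rows can be parsed with a few lines of Python. The following is a minimal sketch, not the code behind this Space: the names `COLUMNS`, `ROWS`, `parse_rows`, and `best_open_model` are made up for illustration, and only three rows are copied in to keep it short.

```python
import re

# Column names for the seven cells that each leaderboard row contains.
COLUMNS = [
    "Model", "Open?", "Critical Personal Safety", "Property & Living Security",
    "Fundamental Rights", "Welfare Protection", "Average",
]

# Three rows copied verbatim from the table above; paste more rows as needed.
ROWS = """\
|[gpt-4o](https://openai.com/index/hello-gpt-4o/)|No|81.8|76.6|74.0|81.5|77.6|
|[Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)|Yes|79.8|74.8|72.4|83.8|75.9|
|[vicuna-13b-v1.5](https://huggingface.co/models/Vicuna-13B-V1.5)|Yes|32.3|29.9|28.0|32.6|30.1|
"""

# Matches a Markdown link such as [text](url) and captures the link text.
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")


def parse_rows(markdown_rows):
    """Turn Markdown table rows into dicts with numeric scores."""
    parsed = []
    for line in markdown_rows.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        row = dict(zip(COLUMNS, cells))
        row["Model"] = MARKDOWN_LINK.sub(r"\1", row["Model"])  # keep link text only
        row["Open?"] = row["Open?"] == "Yes"
        for name in COLUMNS[2:]:
            row[name] = float(row[name])
        parsed.append(row)
    return parsed


def best_open_model(rows):
    """Return the open model with the highest Average score."""
    open_rows = [row for row in rows if row["Open?"]]
    return max(open_rows, key=lambda row: row["Average"])


leaderboard = parse_rows(ROWS)
print(best_open_model(leaderboard)["Model"])
```

With the three sample rows, this prints `Qwen2.5-72B-Instruct`, the highest-ranked open model among them.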

## Benchmarks

| Benchmark Name | Author | Link | Description |
| -------------- | ------ | ---- | ----------- |
| LexSafeBench | HKAIR-Lab | https://huggingface.co/HKAIR-Lab | "We introduce LexSafebench, an LLM benchmark featuring legal safety" |

## More Open LLMs

If you are interested in an overview of open LLMs for commercial use and fine-tuning, check out the [open-llms](https://github.com/eugeneyan/open-llms) repository.

## Sources

The results of this leaderboard are collected from the individual papers and published results of the model authors. For each reported value, the source is added as a link.

Special thanks to the following pages:
- [MosaicML - Model benchmarks](https://www.mosaicml.com/blog/mpt-7b)
- [lmsys.org - Chatbot Arena benchmarks](https://lmsys.org/blog/2023-05-03-arena/)
- [Papers With Code](https://paperswithcode.com/)
- [Stanford HELM](https://crfm.stanford.edu/helm/latest/)
- [HF Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

## Disclaimer

The information above may be wrong. If you want to use a published model commercially, please consult a lawyer.