Update app.py
app.py
CHANGED
@@ -45,7 +45,8 @@ Contribute your vote 🗳️ at [chat.lmsys.org](https://chat.lmsys.org)! Find m
 
 def make_full_leaderboard_md(elo_results):
     leaderboard_md = f"""
-
+Three benchmarks are displayed: **Arena Elo**, **MT-Bench** and **MMLU**.
+- [Chatbot Arena](https://chat.lmsys.org/?arena) - a crowdsourced, randomized battle platform. We use 200K+ user votes to compute Elo ratings.
 - [MT-Bench](https://arxiv.org/abs/2306.05685): a set of challenging multi-turn questions. We use GPT-4 to grade the model responses.
 - [MMLU](https://arxiv.org/abs/2009.03300) (5-shot): a test to measure a model's multitask accuracy on 57 tasks.
 
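
The added Chatbot Arena line says Elo ratings are computed from user votes. As a rough illustration of that idea, here is a minimal sketch of a standard online Elo update over pairwise battles. It is not taken from app.py: the function name `compute_elo`, the battle tuple layout, the winner labels, and the constants `k`, `scale`, `base`, and `init_rating` are all assumptions made for this example, and the leaderboard's actual procedure may differ.

```python
from collections import defaultdict


def compute_elo(battles, k=4, scale=400, base=10, init_rating=1000):
    """Hypothetical helper: run a sequential Elo update over battles.

    Each battle is a (model_a, model_b, winner) tuple, where winner is
    "model_a", "model_b", or anything else for a tie. This layout is an
    assumption for the sketch, not the leaderboard's data format.
    """
    ratings = defaultdict(lambda: float(init_rating))
    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a under the logistic Elo model.
        expected_a = 1.0 / (1.0 + base ** ((rb - ra) / scale))
        if winner == "model_a":
            score_a = 1.0
        elif winner == "model_b":
            score_a = 0.0
        else:
            score_a = 0.5  # tie
        # Zero-sum update: what model_a gains, model_b loses.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return dict(ratings)
```

Because each update is zero-sum, the total rating across all models stays constant. The result of a sequential update like this depends on the order of the battles, so it should be read as a sketch of the concept only.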