Update content.py
content.py CHANGED +1 -1
@@ -11,7 +11,7 @@ Here, you can compare models on tasks in the Czech language or submit your own m
 - Check out the **About** page for a brief overview of our evaluation protocol, win score mechanism, citation details, and future plans for this benchmark.
 - __How scoring works__:
 - On each task, we score every model using one of our metrics (Accuracy for multiple choice tasks, Word Perplexity for language modeling, AUROC for classification).
-- On each task
+- On each task for each model pair, we perform a _duel_: a statistical significance test (with a 5% alpha level) to determine if the model's improvement in the metric is significant.
 - For each task, the __Duel Win Score__ reflects the proportion of duels a model has won.
 - Category scores are calculated by averaging scores across all tasks within that category. When viewing a specific category (other than Overall), the "Average" column displays the Category Duel Win Scores.
 - The __Overall__ Duel Win Score is the average across all category scores. When selecting the Overall category, the "Average" column shows the Overall Duel Win Score.
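The scoring text in this hunk describes a three-level aggregation (duels per task, tasks per category, categories overall). The sketch below shows how that arithmetic could fit together; it is not the benchmark's actual code. The input format (`p_values[(a, b)]` as a precomputed one-sided p-value), the function names, and the toy numbers are all assumptions made for illustration, and the choice of significance test is left open because the text does not name one.

```python
from statistics import mean

ALPHA = 0.05  # the 5% alpha level mentioned in the description


def duel_win_scores(models, p_values, alpha=ALPHA):
    """Proportion of duels each model wins on a single task.

    Assumption: p_values[(a, b)] is the p-value of a one-sided test that
    model `a` improves on model `b`. A duel is won when p < alpha.
    """
    scores = {}
    for a in models:
        opponents = [b for b in models if b != a]
        wins = sum(p_values[(a, b)] < alpha for b in opponents)
        scores[a] = wins / len(opponents) if opponents else 0.0
    return scores


def average_scores(score_dicts):
    """Average per-model scores over a list of {model: score} dicts."""
    models = list(score_dicts[0])
    return {m: mean(d[m] for d in score_dicts) for m in models}


# Toy data (hypothetical): three models, two categories.
models = ["A", "B", "C"]
task1_p = {("A", "B"): 0.01, ("A", "C"): 0.02, ("B", "A"): 0.90,
           ("B", "C"): 0.03, ("C", "A"): 0.95, ("C", "B"): 0.80}
task2_p = {("A", "B"): 0.20, ("A", "C"): 0.01, ("B", "A"): 0.60,
           ("B", "C"): 0.04, ("C", "A"): 0.99, ("C", "B"): 0.70}
task3_p = {("A", "B"): 0.50, ("A", "C"): 0.50, ("B", "A"): 0.01,
           ("B", "C"): 0.01, ("C", "A"): 0.04, ("C", "B"): 0.90}

# Per-task Duel Win Scores.
task1 = duel_win_scores(models, task1_p)
task2 = duel_win_scores(models, task2_p)
task3 = duel_win_scores(models, task3_p)

# Category score = average over the tasks in that category.
category_one = average_scores([task1, task2])
category_two = average_scores([task3])

# Overall score = average over the category scores.
overall = average_scores([category_one, category_two])
print(overall)
```

Note that the overall score averages categories, not tasks, so a category with a single task carries the same weight as one with many.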