Update content.py
content.py CHANGED (+2 -0)
@@ -10,6 +10,8 @@ Here, you can compare models on tasks in the Czech language or submit your own m
 - Visit the **Submission** page to learn about how to submit your model.
 - Check out the **About** page for a brief overview of our evaluation protocol, win score mechanism, citation details, and future plans for this benchmark.
 - __How scoring works__:
+- On each task, we score every model using one of our metrics (Accuracy for multiple choice tasks, Word Perplexity for language modeling, AUROC for classification).
+- On each task, for each model pair, we evaluate a __duel__: a statistical significant test (with alpha 5%) that the model's improvement in metric is significant.
 - For each task, the __Duel Win Score__ reflects the proportion of duels a model has won.
 - Category scores are calculated by averaging scores across all tasks within that category. When viewing a specific category (other than Overall), the "Average" column displays the Category Duel Win Scores.
 - The __Overall__ Duel Win Score is the average across all category scores. When selecting the Overall category, the "Average" column shows the Overall Duel Win Score.
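The aggregation described in the scoring text can be sketched in a few lines of Python. This is only an illustration, not the benchmark's actual code. It assumes the duel outcomes (the results of the significance tests at alpha 5%) are already computed, and that every model duels every other model on each task. The function names, the task and category names, and the example outcomes are all hypothetical.

```python
from itertools import permutations
from statistics import mean


def make_duels(models, winners):
    """Hypothetical helper: mark each ordered (model, opponent) pair as won or not.

    `winners` is a set of (winner, loser) pairs, i.e. duels where the
    significance test (assumed to be run elsewhere) favoured the winner.
    """
    return {pair: pair in winners for pair in permutations(models, 2)}


def duel_win_scores(duels_by_task):
    """Per task: the proportion of a model's duels that it won."""
    scores = {}
    for task, duels in duels_by_task.items():
        outcomes = {}
        for (model, _opponent), won in duels.items():
            outcomes.setdefault(model, []).append(1.0 if won else 0.0)
        scores[task] = {m: mean(v) for m, v in outcomes.items()}
    return scores


def category_scores(task_scores, categories):
    """Per category: the average of task scores within that category."""
    result = {}
    for category, tasks in categories.items():
        models = task_scores[tasks[0]].keys()
        result[category] = {
            m: mean(task_scores[t][m] for t in tasks) for m in models
        }
    return result


def overall_scores(cat_scores):
    """Overall: the average across all category scores."""
    models = next(iter(cat_scores.values())).keys()
    return {m: mean(c[m] for c in cat_scores.values()) for m in models}


# Made-up example: three models, three tasks, two categories.
MODELS = ["A", "B", "C"]
DUELS = {
    "t1": make_duels(MODELS, {("A", "B"), ("A", "C"), ("B", "C")}),
    "t2": make_duels(MODELS, {("A", "B")}),
    "t3": make_duels(MODELS, {("B", "A"), ("B", "C")}),
}
CATEGORIES = {"NLI": ["t1", "t2"], "QA": ["t3"]}

TASK = duel_win_scores(DUELS)
CATEGORY = category_scores(TASK, CATEGORIES)
OVERALL = overall_scores(CATEGORY)
print(OVERALL)
```

In this made-up example, model A wins most duels in the two-task category but none in the single-task one. Because the Overall score averages category scores rather than task scores, the single-task category carries as much weight as the two-task one.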