mfajcik committed on
Commit
da8b87f
1 Parent(s): 73f436c

Update content.py

Files changed (1)
  1. content.py +2 -0
content.py CHANGED
@@ -10,6 +10,8 @@ Here, you can compare models on tasks in the Czech language or submit your own m
  - Visit the **Submission** page to learn about how to submit your model.
  - Check out the **About** page for a brief overview of our evaluation protocol, win score mechanism, citation details, and future plans for this benchmark.
  - __How scoring works__:
+ - On each task, we score every model using one of our metrics (Accuracy for multiple-choice tasks, Word Perplexity for language modeling, AUROC for classification).
+ - On each task, for each pair of models, we evaluate a __duel__: a statistical significance test (with alpha = 5%) of whether one model's improvement in the metric over the other is significant.
  - For each task, the __Duel Win Score__ reflects the proportion of duels a model has won.
  - Category scores are calculated by averaging scores across all tasks within that category. When viewing a specific category (other than Overall), the "Average" column displays the Category Duel Win Scores.
  - The __Overall__ Duel Win Score is the average across all category scores. When selecting the Overall category, the "Average" column shows the Overall Duel Win Score.
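
The aggregation described in the scoring bullets can be sketched as below. This is a minimal illustration, not the leaderboard's actual code: the function names, the category and task names, and the duel counts are all hypothetical, and the duel outcomes are taken as given rather than computed from a significance test.

```python
from statistics import mean


def duel_win_score(duels_won: int, duels_total: int) -> float:
    """Proportion of duels a model has won on a single task."""
    if duels_total <= 0:
        raise ValueError("a model needs at least one duel on the task")
    return duels_won / duels_total


def category_score(task_scores: dict[str, float]) -> float:
    """Average of the Duel Win Scores of all tasks in one category."""
    return mean(list(task_scores.values()))


def overall_score(category_scores: dict[str, float]) -> float:
    """Average across category scores (not across individual tasks)."""
    return mean(list(category_scores.values()))


# Hypothetical (duels won, duels played) counts for one model.
duels = {
    "category_a": {"task_1": (3, 4), "task_2": (1, 4)},
    "category_b": {"task_3": (4, 4)},
}

per_category = {
    category: category_score(
        {task: duel_win_score(won, total) for task, (won, total) in tasks.items()}
    )
    for category, tasks in duels.items()
}
overall = overall_score(per_category)

print(per_category)  # category_a: 0.5, category_b: 1.0
print(overall)       # 0.75
```

Because the Overall score averages categories rather than tasks, a category with a single task weighs as much as one with many. In this example, averaging the three tasks directly would give about 0.67 instead of 0.75.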