mfajcik committed on
Commit 7585b04 · verified · 1 Parent(s): 4071b16

Update content.py

Files changed (1)
  1. content.py +4 -4
content.py CHANGED
@@ -19,7 +19,7 @@ Here, you can compare models on tasks in the Czech language or submit your own m
  - On the submission page, __you can view your model's results on the leaderboard without publishing them__.
  - The first step is "pre-submission." After this is complete (significance tests may take up to 2 hours), you can choose to submit the results if you wish.
  - NEWS:
- - 19.02.2025: We added an performance-size plot under the Table for better overview! Scroll down to find out, which model works the best for it's size!
+ - 19.02.2025: We added a performance-size plot under the Table for better overview! Scroll down to find out, which model works the best for it's size!
  - 23.12.2024: We released [a preprint](http://arxiv.org/abs/2412.17933) detailing our work.
  - 7.11.2024: We acknowledge that one of the Qwen2.5 models correctly predicted our (& Bigbench's) canary string. This confirms the contamination, it was trained on benchmark data. Other [studies](https://arxiv.org/pdf/2409.01790) also suggest the contamination issues of the Qwen family.
  - 1.10.2024: Find out more about 🇨🇿 BenCzechMark in our [Huggingface blogpost](https://huggingface.co/blog/benczechmark)!
@@ -29,9 +29,9 @@ LEADERBOARD_TAB_TITLE_MARKDOWN = """
  """
 
  LEADERBOARD_TAB_BELLOW_PLOT_MARKDOWN = """
- Explanation:
- - the point symbol is determined by the type of model ('chat': 'circle', 'pretrained': 'triangle', 'ensemble': 'star')
- - the size of the symbol is larger according to the variance across categories
+ Legend:
+ - the point symbol is determined by the type of the model ('chat': 'circle', 'pretrained': 'triangle', 'ensemble': 'star')
+ - the size of the symbol is larger according to the max-centered variance across categories (our measure of prompt stability, see the [paper](http://arxiv.org/abs/2412.17933) for details).
  """
 
  SUBMISSION_TAB_TITLE_MARKDOWN = """
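The plot legend in this change describes a small visual encoding: marker symbol by model type, marker size growing with variance across categories. The following is a minimal Python sketch of that encoding under stated assumptions. The names `MARKER_BY_MODEL_TYPE`, `marker_for`, and `marker_size` and the `base`/`scale` constants are hypothetical and not taken from content.py. Plain population variance of per-category scores is used as a stand-in, because the max-centered variance is defined in the preprint and is not reproduced here.

```python
from statistics import pvariance

# Marker symbol per model type, as listed in the legend text.
MARKER_BY_MODEL_TYPE = {
    "chat": "circle",
    "pretrained": "triangle",
    "ensemble": "star",
}


def marker_for(model_type: str) -> str:
    """Return the legend's marker symbol for a model type (KeyError if unknown)."""
    return MARKER_BY_MODEL_TYPE[model_type]


def marker_size(category_scores, base: float = 8.0, scale: float = 40.0) -> float:
    """Hypothetical sizing rule: grows linearly with the population variance
    of the per-category scores.

    ASSUMPTION: plain population variance stands in for the leaderboard's
    max-centered variance, whose exact definition is given in the preprint.
    """
    return base + scale * pvariance(category_scores)


if __name__ == "__main__":
    print(marker_for("pretrained"), marker_size([0.0, 1.0]))
```

With these constants, a model whose category scores are all identical gets the base size, and more uneven scores produce a larger marker. This matches the monotonic "larger according to the variance" behaviour the legend describes, not its exact formula.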