clefourrier (HF staff) committed
Commit 5f4968c (parent: 5891795)

Update README.md

Files changed (1): README.md (+2, -48)
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  title: LeaderboardFinder
- emoji: 🐒
+ emoji: πŸ”Ž
  colorFrom: pink
  colorTo: gray
  sdk: gradio
@@ -9,50 +9,4 @@ app_file: app.py
  pinned: false
  ---

- If you want your leaderboard to appear, feel free to add relevant information in its metadata, and it will be displayed here.
-
- # Categories
-
- ## Submission type
- Arenas are not concerned by this category.
-
- `submission:automatic`: users can submit their models as such to the leaderboard, and evaluation is run automatically without human intervention
- `submission:semiautomatic`: the leaderboard requires the model owner to run evaluations on his side and submit the results
- `submission:manual`: the leaderboard requires the leaderboard owner to run evaluations for new submissions
- `submission:closed`: the leaderboard does not accept submissions at the moment
-
- ## Test set status
- Arenas are not concerned by this category.
-
- `test:public`: all the test sets used are public, the evaluations are completely reproducible
- `test:mix`: some test sets are public and some private
- `test:private`: all the test sets used are private, the evaluations are hard to game
- `test:rolling`: the test sets used change regularly through time and evaluation scores are refreshed
-
- ## Judges
- `judge:auto`: evaluations are run automatically, using an evaluation suite such as `lm_eval` or `lighteval`
- `judge:model`: evaluations are run using a model as a judge approach to rate answer
- `judge:humans`: evaluations are done by humans to rate answer - this is an arena
- `judge:vibe_check`: evaluations are done manually by one human
-
- ## Modalities
- Can be any (or several) of the following list:
- `modality:text`
- `modality:image`
- `modality:video`
- `modality:audio`
- A bit outside of usual modalities
- `modality:tools`: requires added tool usage - mostly for assistant models
- `modality:artefacts`: the leaderboard concerns itself with machine learning artefacts as themselves, for example, quality evaluation of text embeddings.
-
- ## Evaluation categories
- Can be any (or several) of the following list:
- `eval:generation`: the evaluation looks at generation capabilities specifically (can be image generation, text generation, ...)
- `eval:math`
- `eval:code`
- `eval:performance`: model performance (speed, energy consumption, ...)
- `eval:safety`: safety, toxicity, bias evaluations
-
- ## Language
- You can indicate the languages covered by your benchmark like so: `language:mylanguage`.
- At the moment, we do not support language codes, please use the language name in English.
+ If you want your leaderboard to appear, feel free to add relevant information in its metadata, and it will be displayed here (see the About tab).
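For reference, the categories documented in the removed text are meant to be declared by each leaderboard Space in its own README metadata. Below is a minimal, hypothetical sketch of such front matter, assuming the categories are listed under the Space card's `tags:` key; the title and the specific tag values are illustrative, not part of this commit:

```yaml
---
# Hypothetical leaderboard Space metadata (illustrative values only).
title: MyCodeLeaderboard   # assumed name, not from this commit
sdk: gradio
app_file: app.py
pinned: false
tags:                      # assumption: categories are exposed as Space tags
  - submission:automatic   # evaluations run without human intervention
  - test:private           # all test sets are private, hard to game
  - judge:auto             # scored by an automated evaluation suite
  - modality:text
  - eval:code
  - language:english       # language name in English, not a language code
---
```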