Spaces: Running on CPU Upgrade
Commit 992caee by Gregor Betz
1 parent: 0841987
description
- src/display/about.py +1 -1
src/display/about.py CHANGED
@@ -44,7 +44,7 @@ To assess the reasoning skill of a given `model`, we carry out the following ste
 3. `model` answers the test dataset problems _with the reasoning traces appended_ to the prompt, we record the resulting _CoT accuracy_.
 4. We compute the _accuracy gain Δ_ = _CoT accuracy_ — _baseline accuracy_ for the given `model`, `task`, and `regime`.
 
-Each `regime`
+Each `regime` yields a different _accuracy gain Δ_, and the leaderboard reports (for every `model`/`task`) the best Δ achieved by any regime. All models are evaluated against the same set of regimes.
 
 
 ## How is it different from other leaderboards?
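Steps 3 and 4, together with the per-regime aggregation that this commit documents, can be sketched in Python, the language of `src/display/about.py`. This is a minimal sketch under stated assumptions, not the leaderboard's implementation. The helper names `accuracy`, `accuracy_gain`, and `best_gain`, the regime names, and the toy answers are all hypothetical. Accuracy is assumed to be exact-match correctness averaged over the test problems.

```python
from typing import Dict, Sequence, Tuple


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Share of test problems answered correctly.

    Hypothetical helper: exact-match scoring is an assumption, not taken
    from the leaderboard code.
    """
    if not answers or len(predictions) != len(answers):
        raise ValueError("need one prediction per answer, and at least one answer")
    correct = sum(pred == gold for pred, gold in zip(predictions, answers))
    return correct / len(answers)


def accuracy_gain(cot_accuracy: float, baseline_accuracy: float) -> float:
    """Accuracy gain: CoT accuracy minus baseline accuracy (step 4)."""
    return cot_accuracy - baseline_accuracy


def best_gain(results: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """Return (regime, gain) for the regime with the largest accuracy gain.

    `results` (hypothetical structure) maps each regime name to
    (baseline_accuracy, cot_accuracy) for one fixed model and task.
    """
    if not results:
        raise ValueError("no regimes given")
    gains = {
        regime: accuracy_gain(cot_acc, base_acc)
        for regime, (base_acc, cot_acc) in results.items()
    }
    best = max(gains, key=gains.get)  # regime with the highest gain
    return best, gains[best]


if __name__ == "__main__":
    # Made-up toy data: four test problems with gold answers A-D.
    answers = ["A", "B", "C", "D"]
    baseline = accuracy(["A", "C", "C", "A"], answers)  # 2 of 4 correct
    results = {
        "regime_a": (baseline, accuracy(["A", "B", "C", "A"], answers)),  # 3 of 4
        "regime_b": (baseline, accuracy(["A", "C", "C", "A"], answers)),  # 2 of 4
    }
    print(best_gain(results))  # ('regime_a', 0.25)
```

Taking the maximum over regimes mirrors the sentence added in this commit. Because the gain is a plain difference, it is negative for any regime whose reasoning traces lower accuracy below the baseline.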