link logos
Gregor Betz committed • Commit b52a077 • Parent(s): b0d8fb5
assets/AI2_Logo_Square.png DELETED (binary file, 121 kB)
assets/logo_logikon_notext_withborder.png DELETED (binary file, 6.99 kB)
src/display/about.py CHANGED
@@ -20,8 +20,8 @@ class Tasks(Enum):
 #METRICS = list(set([task.value.metric for task in Tasks]))
 
 
-logo1_url = "
-logo2_url = "
+logo1_url = "https://github.com/logikon-ai/cot-eval/blob/main/assets/AI2_Logo_Square.png"
+logo2_url = "https://github.com/logikon-ai/cot-eval/blob/main/assets/logo_logikon_notext_withborder.png"
 LOGOS = f'<span style="display: flex; justify-content: flex-start;"><img src="{logo1_url}" alt="AI2" style="width: 40vw; min-width: 30px; max-width: 80px;"> <img src="{logo2_url}" alt="AI2" style="width: 40vw; min-width: 30px; max-width: 80px;"></span>'
 
 # Your leaderboard name
@@ -29,7 +29,7 @@ TITLE = f'<h1 align="center" id="space-title"> {LOGOS} Open CoT Leaderboa
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-The 
+The Open CoT Leaderboard tracks the reasoning skills of LLMs, measured as their ability to generate **effective chain-of-thought reasoning traces**.
 
 The leaderboard reports **accuracy gains** achieved by using [chain-of-thought](https://logikon.ai/docs/delib_prompting) (CoT), i.e.: _accuracy gain Δ_ = _accuracy with CoT_ − _accuracy w/o CoT_.
 
@@ -55,7 +55,7 @@ A notebook with detailed result exploration and visualization is available [here
 
 Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) or [YALL](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) do a great job in ranking models according to task performance.
 
-Unlike these leaderboards, the 
+Unlike these leaderboards, the Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
 
 ### 🤗 Open LLM Leaderboard
 * a. Can `model` solve `task`?
@@ -63,7 +63,7 @@ Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's abi
 * c. Measures `task` performance.
 * d. Covers broad spectrum of `tasks`.
 
-### 
+### Open CoT Leaderboard
 * a. Can `model` do CoT to improve in `task`?
 * b. Metric: relative accuracy gain.
 * c. Measures ability to reason (about `task`).
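For illustration, a minimal Python sketch of the accuracy-gain metric that INTRODUCTION_TEXT describes (Δ = accuracy with CoT − accuracy without CoT). The function name and the example numbers below are hypothetical and not part of the repository.

def accuracy_gain(acc_with_cot: float, acc_without_cot: float) -> float:
    """Accuracy gain Δ = accuracy with CoT minus accuracy without CoT."""
    return acc_with_cot - acc_without_cot

# Illustrative numbers only: a model at 62% accuracy with CoT and 55% without
# has an accuracy gain of 7 percentage points.
delta = accuracy_gain(0.62, 0.55)
print(f"accuracy gain Δ = {delta:.2f}")  # accuracy gain Δ = 0.07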