link logos
Gregor Betz committed • Commit b52a077 • Parent(s): b0d8fb5
assets/AI2_Logo_Square.png DELETED (binary file, 121 kB)
assets/logo_logikon_notext_withborder.png DELETED (binary file, 6.99 kB)
src/display/about.py CHANGED
@@ -20,8 +20,8 @@ class Tasks(Enum):
 #METRICS = list(set([task.value.metric for task in Tasks]))
 
 
-logo1_url = "
-logo2_url = "
+logo1_url = "https://github.com/logikon-ai/cot-eval/blob/main/assets/AI2_Logo_Square.png"
+logo2_url = "https://github.com/logikon-ai/cot-eval/blob/main/assets/logo_logikon_notext_withborder.png"
 LOGOS = f'<span style="display: flex; justify-content: flex-start;"><img src="{logo1_url}" alt="AI2" style="width: 40vw; min-width: 30px; max-width: 80px;"> <img src="{logo2_url}" alt="AI2" style="width: 40vw; min-width: 30px; max-width: 80px;"></span>'
 
 # Your leaderboard name
@@ -29,7 +29,7 @@ TITLE = f'<h1 align="center" id="space-title"> {LOGOS} Open CoT Leaderboa
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-The 
+The Open CoT Leaderboard tracks the reasoning skills of LLMs, measured as their ability to generate **effective chain-of-thought reasoning traces**.
 
 The leaderboard reports **accuracy gains** achieved by using [chain-of-thought](https://logikon.ai/docs/delib_prompting) (CoT), i.e.: _accuracy gain Δ_ = _accuracy with CoT_ − _accuracy w/o CoT_.
 
@@ -55,7 +55,7 @@ A notebook with detailed result exploration and visualization is available [here
 
 Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) or [YALL](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) do a great job in ranking models according to task performance.
 
-Unlike these leaderboards, the 
+Unlike these leaderboards, the Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
 
 ### 🤗 Open LLM Leaderboard
 * a. Can `model` solve `task`?
@@ -63,7 +63,7 @@ Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's abi
 * c. Measures `task` performance.
 * d. Covers broad spectrum of `tasks`.
 
-### 
+### Open CoT Leaderboard
 * a. Can `model` do CoT to improve in `task`?
 * b. Metric: relative accuracy gain.
 * c. Measures ability to reason (about `task`).
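For illustration, a minimal Python sketch of the accuracy-gain metric that INTRODUCTION_TEXT describes (Δ = accuracy with CoT − accuracy without CoT). The function name and the example numbers below are hypothetical and not part of the repository.

def accuracy_gain(acc_with_cot: float, acc_without_cot: float) -> float:
    """Accuracy gain Δ = accuracy with CoT minus accuracy without CoT."""
    return acc_with_cot - acc_without_cot

# Illustrative numbers only: a model at 62% accuracy with CoT and 55% without
# has an accuracy gain of 7 percentage points.
delta = accuracy_gain(0.62, 0.55)
print(f"accuracy gain Δ = {delta:.2f}")  # accuracy gain Δ = 0.07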