Gregor Betz committed
Commit b52a077 • 1 Parent(s): b0d8fb5

link logos

assets/AI2_Logo_Square.png DELETED
Binary file (121 kB)
 
assets/logo_logikon_notext_withborder.png DELETED
Binary file (6.99 kB)
 
src/display/about.py CHANGED
@@ -20,8 +20,8 @@ class Tasks(Enum):
 #METRICS = list(set([task.value.metric for task in Tasks]))
 
 
-logo1_url = "./assets/AI2_Logo_Square.png"
-logo2_url = "./assets/logo_logikon_notext_withborder.png"
+logo1_url = "https://github.com/logikon-ai/cot-eval/blob/main/assets/AI2_Logo_Square.png"
+logo2_url = "https://github.com/logikon-ai/cot-eval/blob/main/assets/logo_logikon_notext_withborder.png"
 LOGOS = f'<span style="display: flex; justify-content: flex-start;"><img src="{logo1_url}" alt="AI2" style="width: 40vw; min-width: 30px; max-width: 80px;"> <img src="{logo2_url}" alt="AI2" style="width: 40vw; min-width: 30px; max-width: 80px;"></span>'
 
 # Your leaderboard name
@@ -29,7 +29,7 @@ TITLE = f'<h1 align="center" id="space-title"> {LOGOS} &nbsp; Open CoT Leaderboa
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-The `/\/` Open CoT Leaderboard tracks the reasoning skills of LLMs, measured as their ability to generate **effective chain-of-thought reasoning traces**.
+The Open CoT Leaderboard tracks the reasoning skills of LLMs, measured as their ability to generate **effective chain-of-thought reasoning traces**.
 
 The leaderboard reports **accuracy gains** achieved by using [chain-of-thought](https://logikon.ai/docs/delib_prompting) (CoT), i.e.: _accuracy gain Δ_ = _accuracy with CoT_ — _accuracy w/o CoT_.
 
@@ -55,7 +55,7 @@ A notebook with detailed result exploration and visualization is available [here
 
 Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) or [YALL](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) do a great job in ranking models according to task performance.
 
-Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
+Unlike these leaderboards, the Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
 
 ### 🤗 Open LLM Leaderboard
 * a. Can `model` solve `task`?
@@ -63,7 +63,7 @@ Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's abi
 * c. Measures `task` performance.
 * d. Covers broad spectrum of `tasks`.
 
-### `/\/` Open CoT Leaderboard
+### Open CoT Leaderboard
 * a. Can `model` do CoT to improve in `task`?
 * b. Metric: relative accuracy gain.
 * c. Measures ability to reason (about `task`).
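
For concreteness, the accuracy-gain metric defined in the INTRODUCTION_TEXT above amounts to a plain difference of two accuracies. The following is a minimal Python sketch; the helper name and the example numbers are illustrative and not part of about.py or the leaderboard code.

# Minimal sketch (hypothetical helper): the accuracy gain Δ described above
# is the model's accuracy with CoT minus its accuracy without CoT.
def accuracy_gain(acc_with_cot: float, acc_without_cot: float) -> float:
    """Return Δ = accuracy with CoT - accuracy w/o CoT."""
    return acc_with_cot - acc_without_cot

# Example: 68% correct with CoT vs. 61% without yields a gain of 0.07.
print(round(accuracy_gain(0.68, 0.61), 2))  # 0.07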