Gregor Betz committed on
Commit
db734a2
1 Parent(s): 3b4e965

Link to notebook

Files changed (1)
  1. src/display/about.py +2 -1
src/display/about.py CHANGED
@@ -46,6 +46,7 @@ To assess the reasoning skill of a given `model`, we carry out the following ste
 
  Each `regime` yields a different _accuracy gain Δ_, and the leaderboard reports (for every `model`/`task`) the best Δ achieved by any regime. All models are evaluated against the same set of regimes.
 
+ A notebook with detailed result exploration and visualization is available [here](https://github.com/logikon-ai/cot-eval/blob/main/notebooks/CoT_Leaderboard_Results_Exploration.ipynb).
 
  ## How is it different from other leaderboards?
 
@@ -68,7 +69,7 @@ Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's abi
 
  ## Test dataset selection (`tasks`)
 
- The test dataset porblems in the CoT Leaderboard can be solved through clear thinking alone, no specific knowledge is required to do so. They are subsets of the [AGIEval benchmark](https://github.com/ruixiangcui/AGIEval) and re-published as [`logikon-bench`](https://huggingface.co/datasets/logikon/logikon-bench). The `logiqa` dataset has been newly translated from Chinese to English.
+ The test dataset problems in the CoT Leaderboard can be solved through clear thinking alone, no specific knowledge is required to do so. They are subsets of the [AGIEval benchmark](https://github.com/ruixiangcui/AGIEval) and re-published as [`logikon-bench`](https://huggingface.co/datasets/logikon/logikon-bench). The `logiqa` dataset has been newly translated from Chinese to English.
 
 
  ## Reproducibility