victormiller committed on
Commit 274ce6d · verified · 1 Parent(s): 8228132

Update README.md

Files changed (1)
  1. README.md +13 -0
README.md CHANGED
@@ -14,6 +14,19 @@ Evaluations include standard best practice benchmarks, medical, math, and coding
 
 <center><img src="k2_chat_table_of_tables.png" alt="k2 big eval table"/></center>
 
+## Open LLM Leaderboard
+| Evaluation | Score | Raw Score |
+| ----------- | ----------- | ----------- |
+| IFEval | 51.52 | 52 |
+| BBH | 33.79 | 54 |
+| Math Lvl 5 | 1.59 | 2 |
+| GPQA | 7.49 | 31 |
+| MUSR | 16.82 | 46 |
+| MMLU-PRO | 26.34 | 34 |
+| Average | 22.93 | 36.5 |
+
+
+
 ## Datasets and Mix
 
 | Subset | #Tokens | Avg. #Q | Avg. Query Len | Avg. #R | Avg. Reply Len |