huihui-ai committed on
Commit dddbc12
1 Parent(s): 0c136b3

Update README.md

Files changed (1)
  1. README.md +9 -1
README.md CHANGED
@@ -103,4 +103,12 @@ while True:
 
  ```
  ## Evaluations
- We will be submitting this model to the OpenLLM Leaderboard for a more conclusive benchmark - but here are our internal benchmarks using the main branch of lm evaluation harness:
+
+ The following data has been re-evaluated; each score is the average across runs for that test.
+ | Benchmark | SuperNova-Lite | Meta-Llama-3.1-8B-Instruct-abliterated | Llama-3.1-8B-Fusion-9010 | Llama-3.1-8B-Fusion-8020 | Llama-3.1-8B-Fusion-7030 | Llama-3.1-8B-Fusion-6040 | Llama-3.1-8B-Fusion-5050 |
+ |-------------|----------------|----------------------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|
+ | IF_Eval | 82.09 | 76.29 | 82.44 | 82.93 | **83.10** | 82.94 | 82.03 |
+ | MMLU Pro | **35.87** | 33.10 | 35.65 | 35.32 | 34.91 | 34.50 | 33.96 |
+ | TruthfulQA | **64.35** | 53.25 | 62.67 | 61.04 | 59.09 | 57.80 | 56.75 |
+ | BBH | **49.48** | 44.87 | 48.86 | 48.47 | 48.30 | 48.19 | 47.93 |
+ | GPQA | 31.98 | 29.50 | 32.25 | 32.38 | **32.61** | 31.14 | 30.60 |
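For context on the "average across runs" wording in the added line, here is a minimal sketch of that averaging step, assuming each benchmark was scored over several independent evaluation runs; every per-run score below is a hypothetical placeholder, not the actual data behind the table:

```python
# Minimal sketch of the averaging described in the diff, assuming each
# benchmark was re-run several times. All per-run scores here are
# hypothetical placeholders, not the raw numbers behind the README table.
from statistics import mean

# Hypothetical per-run scores for one model (e.g. Llama-3.1-8B-Fusion-9010).
runs = {
    "IF_Eval":    [82.51, 82.39, 82.42],
    "MMLU Pro":   [35.71, 35.62, 35.62],
    "TruthfulQA": [62.70, 62.61, 62.70],
    "BBH":        [48.90, 48.81, 48.87],
    "GPQA":       [32.31, 32.20, 32.24],
}

# Each reported figure is the mean of its runs, rounded to two decimals.
for benchmark, scores in runs.items():
    print(f"{benchmark}: {mean(scores):.2f}")
```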