nm-research committed on
Commit b50cd3a · verified · 1 Parent(s): 133efb6

Update README.md

Files changed (1)
  1. README.md +13 -0
README.md CHANGED
@@ -206,6 +206,19 @@ evalplus.evaluate \
  | **Average Score** | **70.30** | **70.26** |
  | **Recovery** | **100.00** | **99.95** |
 
+ #### OpenLLM Leaderboard V2 evaluation scores
+
+ | Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-quantized.w8a8 |
+ |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
+ | IFEval (Inst Level Strict Acc, 0-shot)| 74.01 | 73.50 |
+ | BBH (Acc-Norm, 3-shot) | 53.19 | 52.59 |
+ | Math-Hard (Exact-Match, 4-shot) | 14.77 | 15.73 |
+ | GPQA (Acc-Norm, 0-shot) | 31.76 | 30.62 |
+ | MUSR (Acc-Norm, 0-shot) | 46.01 | 44.30 |
+ | MMLU-Pro (Acc, 5-shot) | 35.81 | 35.41 |
+ | **Average Score** | **42.61** | **42.03** |
+ | **Recovery** | **100.00** | **98.64** |
+
  #### HumanEval pass@1 scores
  | Metric | ibm-granite/granite-3.1-8b-instruct | neuralmagic-ent/granite-3.1-8b-instruct-quantized.w8a8 |
  |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|