cicdatopea committed on
Commit feaca7b · verified · 1 Parent(s): f086880

Update README.md

Files changed (1)
  1. README.md +17 -16
README.md CHANGED
@@ -186,8 +186,7 @@ without assuming the other birds fly away, then after
 
  ### Evaluate the model
 
- will update later
- <!-- pip3 install lm-eval==0.4.7
  we found lm-eval is very unstable for this model. Please set `add_bos_token=True `to align with the origin model. **Please use autogptq format**
 
  ```bash
@@ -195,20 +194,22 @@ lm-eval --model hf --model_args pretrained=OPEA/DeepSeek-R1-Distill-Llama-70B-in
  ```
  | Metric | BF16 | INT4 |
  | :------------------------ | :---------------------- | :--------------- |
- | avg | 0.6647 | 0.6639|
- | leaderboard_mmlu_pro | - | - |
- | mmlu | 0.7964 | 0.7928 |
- | lambada_openai | 0.6649 | 0.6718 |
- | hellaswag | 0.6292 | 0.6223 |
- | winogrande | 0.7482 | 0.7482 |
- | piqa | 0.8058 | 0.7982 |
- | truthfulqa_mc1 | 0.3831 | 0.3905 |
- | openbookqa | 0.3520 | 0.3520 |
- | boolq | 0.8963 | 0.8972 |
- | arc_easy | 0.8207 | 0.8194 |
- | arc_challenge | 0.5503 | 0.5469 |
- | leaderboard_ifeval | - | - |
- | gsm8k | - | - | -->
 
 
  ### Evaluate the model
 
+ pip3 install lm-eval==0.4.7
  we found lm-eval is very unstable for this model. Please set `add_bos_token=True `to align with the origin model. **Please use autogptq format**
 
  ```bash
 
  ```
  | Metric | BF16 | INT4 |
  | :------------------------ | :---------------------- | :--------------- |
+ | avg | 0.6636 | 0.6678 |
+ |----------------------|--------|--------|
+ | leaderboard_mmlu_pro | 0.4913 | 0.4780 |
+ | mmlu | 0.7752 | 0.7791 |
+ | lambada_openai | 0.6977 | 0.6996 |
+ | hellaswag | 0.6408 | 0.6438 |
+ | winogrande | 0.7530 | 0.7782 |
+ | piqa | 0.8112 | 0.8194 |
+ | truthfulqa_mc1 | 0.3709 | 0.3721 |
+ | openbookqa | 0.3380 | 0.3600 |
+ | boolq | 0.8847 | 0.8917 |
+ | arc_easy | 0.8131 | 0.8106 |
+ | arc_challenge | 0.5512 | 0.5239 |
+ | leaderboard_ifeval | 0.4421 | 0.4208 |
+ | gsm8k | 0.9295 | 0.9265 |
+
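The README's lm-eval command is cut off in this diff view, so the exact invocation behind the table is not visible here. As a hedged sketch only, the `add_bos_token=True` setting the README asks for can also be passed through lm-eval's Python entry point `lm_eval.simple_evaluate` (lm-eval 0.4.x), using the same `hf` backend and `model_args` string format as the CLI. The helper names `build_model_args` and `run_eval`, the model path, the task list, and the batch size are all hypothetical placeholders, not the settings used to produce the numbers above.

```python
from __future__ import annotations


def build_model_args(model_path: str, add_bos_token: bool = True) -> str:
    """Build the comma-separated key=value string that lm-eval's "hf" backend expects.

    Hypothetical helper; model_path is a placeholder, not the README's model id.
    """
    # add_bos_token=True makes the backend prepend the BOS token, which the
    # README asks for to align with the original model.
    return f"pretrained={model_path},add_bos_token={add_bos_token}"


def run_eval(model_path: str, tasks: list[str], batch_size: int = 16) -> dict:
    """Run lm-eval on `tasks` and return the per-task metrics dict (hypothetical helper)."""
    # Imported lazily so build_model_args can be used without lm-eval installed.
    import lm_eval

    output = lm_eval.simple_evaluate(
        model="hf",
        model_args=build_model_args(model_path),
        tasks=tasks,
        batch_size=batch_size,  # placeholder value, not taken from the README
    )
    # simple_evaluate returns a dict whose "results" key maps task -> metrics.
    return output["results"]
```

For example, `run_eval("path/to/model", ["piqa", "winogrande"])` would evaluate two of the tasks from the table; the path is a stand-in, and the README's own command remains the reference for the actual model repo and options.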