Lin-K76 committed
Commit 53de35d · verified · 1 Parent(s): 9f5dfe3

Update README.md

Files changed (1): README.md (+10, −10)
README.md CHANGED
@@ -25,7 +25,7 @@ language:
 - **Model Developers:** Neural Magic
 
 Quantized version of [Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct).
-It achieves an average score of 82.00 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 82.74.
+It achieves an average score of 83.14 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 83.61.
 
 ### Model Optimizations
 
@@ -163,7 +163,7 @@ oneshot(
 ## Evaluation
 
 The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) leaderboard tasks (version 1) with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and the [vLLM](https://docs.vllm.ai/en/stable/) engine, using the following command.
-A modified version of ARC-C was used for evaluations, in line with Llama 3.1's prompting.
+A modified version of ARC-C and GSM8k-cot was used for evaluations, in line with Llama 3.1's prompting. It can be accessed on the [Neural Magic fork of the lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct).
 ```
 lm_eval \
 --model vllm \
@@ -197,7 +197,7 @@ lm_eval \
 </td>
 </tr>
 <tr>
-<td>ARC Challenge (25-shot)
+<td>ARC Challenge (0-shot)
 </td>
 <td>95.05
 </td>
@@ -207,13 +207,13 @@ lm_eval \
 </td>
 </tr>
 <tr>
-<td>GSM-8K (5-shot, strict-match)
+<td>GSM-8K-cot (8-shot, strict-match)
 </td>
-<td>87.95
+<td>93.18
 </td>
-<td>86.50
+<td>93.33
 </td>
-<td>98.35%
+<td>100.1%
 </td>
 </tr>
 <tr>
@@ -249,11 +249,11 @@ lm_eval \
 <tr>
 <td><strong>Average</strong>
 </td>
-<td><strong>82.74</strong>
+<td><strong>83.61</strong>
 </td>
-<td><strong>82.00</strong>
+<td><strong>83.14</strong>
 </td>
-<td><strong>99.10%</strong>
+<td><strong>99.43%</strong>
 </td>
 </tr>
 </table>
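
The `oneshot(` context in the second hunk header comes from the card's creation section, which builds this checkpoint with [llm-compressor](https://github.com/vllm-project/llm-compressor). A minimal sketch of that one-shot flow, assuming the recipe shape common to Neural Magic's quantized-model cards; the quantization scheme, ignore list, and save path below are illustrative assumptions, not values taken from this diff:

```python
# Illustrative sketch only: the actual recipe lives in the README section this
# diff elides. The FP8_DYNAMIC scheme, ignore list, and save path are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantize every Linear layer, keeping lm_head in higher precision
# (a common choice for preserving generation quality).
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# FP8_DYNAMIC computes activation scales on the fly, so oneshot() runs data-free here.
oneshot(model=model, recipe=recipe)

model.save_pretrained("Meta-Llama-3.1-70B-Instruct-quantized", save_compressed=True)
tokenizer.save_pretrained("Meta-Llama-3.1-70B-Instruct-quantized")
```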
 
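In the results table, the third column is the recovery of the quantized model relative to the unquantized baseline. Assuming recovery = quantized score / unquantized score (which matches the card's figures up to rounding), the updated numbers can be spot-checked:

```python
# Recovery = quantized score as a percentage of the unquantized score.
# Inputs are the rounded scores from the updated table, so the last digit can
# differ from the card (e.g., 100.16% here vs. the reported 100.1%), which
# suggests the card computes recovery from unrounded values.
def recovery(unquantized: float, quantized: float) -> float:
    return 100.0 * quantized / unquantized

print(f"GSM-8K-cot: {recovery(93.18, 93.33):.2f}%")  # ~100.16% (table reports 100.1%)
print(f"Average:    {recovery(83.61, 83.14):.2f}%")  # ~99.44% (table reports 99.43%)
```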