alexmarques committed
Commit 6b753ea
1 Parent(s): 7d1f72d

Update README.md

Files changed (1):
  1. README.md +23 -22
README.md CHANGED
@@ -32,7 +32,7 @@ base_model: meta-llama/Meta-Llama-3.1-405B-Instruct
 - **Model Developers:** Neural Magic
 
 Quantized version of [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct).
-It achieves an average score of 86.47 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 86.63.
+It achieves scores within 1% of those of the unquantized model for MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande, and TruthfulQA.
 
 ### Model Optimizations
 
@@ -130,8 +130,9 @@ model.save_pretrained("Meta-Llama-3.1-405B-Instruct-quantized.w4a16")
 
 The model was evaluated on MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande and TruthfulQA.
 Evaluation was conducted using the Neural Magic fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch llama_3.1_instruct) and the [vLLM](https://docs.vllm.ai/en/stable/) engine.
-This version of the lm-evaluation-harness includes versions of ARC-Challenge, GSM-8K, and MMLU that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-8B-Instruct-evals).
+This version of the lm-evaluation-harness includes versions of ARC-Challenge, GSM-8K, and MMLU that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-405B-Instruct-evals).
 
+**Note:** Results have been updated after Meta modified the chat template.
 
 ### Accuracy
 
@@ -144,27 +145,27 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
 </td>
 <td><strong>Meta-Llama-3.1-405B-Instruct-quantized.w4a16 (this model)</strong>
 </td>
-<td><strong>Recovery (this model) </strong>
+<td><strong>Recovery</strong>
 </td>
 </tr>
 <tr>
 <td>MMLU (5-shot)
 </td>
-<td>86.25
+<td>87.38
 </td>
-<td>86.17
+<td>87.22
 </td>
-<td>99.90%
+<td>99.8%
 </td>
 </tr>
 <tr>
 <td>ARC Challenge (0-shot)
 </td>
-<td>96.93
+<td>94.97
 </td>
-<td>95.3
+<td>95.31
 </td>
-<td>98.31%
+<td>100.4%
 </td>
 </tr>
 <tr>
@@ -172,9 +173,9 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
 </td>
 <td>96.44
 </td>
-<td>96.05
+<td>96.29
 </td>
-<td>99.59%
+<td>99.8%
 </td>
 </tr>
 <tr>
@@ -184,7 +185,7 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
 </td>
 <td>88.27
 </td>
-<td>99.93%
+<td>99.9%
 </td>
 </tr>
 <tr>
@@ -192,9 +193,9 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
 </td>
 <td>87.21
 </td>
-<td>87.76
+<td>87.37
 </td>
-<td>100.63%
+<td>100.2%
 </td>
 </tr>
 <tr>
@@ -202,19 +203,19 @@ This version of the lm-evaluation-harness includes versions of ARC-Challenge, GS
 </td>
 <td>64.64
 </td>
-<td>65.27
+<td>65.26
 </td>
-<td>100.97%
+<td>101.0%
 </td>
 </tr>
 <tr>
 <td><strong>Average</strong>
 </td>
-<td><strong>86.63</strong>
+<td><strong>86.75</strong>
 </td>
-<td><strong>86.47</strong>
+<td><strong>86.76</strong>
 </td>
-<td><strong>99.81%</strong>
+<td><strong>100.0%</strong>
 </td>
 </tr>
 </table>
@@ -227,7 +228,7 @@ The results were obtained using the following commands:
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w4a16",dtype=auto,add_bos_token=True,max_model_len=4096,max_gen_toks=10,tensor_parallel_size=8 \
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w4a16",dtype=auto,max_model_len=4096,max_gen_toks=10,tensor_parallel_size=8 \
   --tasks mmlu_llama_3.1_instruct \
   --apply_chat_template \
   --fewshot_as_multiturn \
@@ -239,7 +240,7 @@ lm_eval \
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w4a16",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w4a16",dtype=auto,max_model_len=4096,tensor_parallel_size=8 \
   --tasks arc_challenge_llama_3.1_instruct \
   --apply_chat_template \
   --num_fewshot 0 \
@@ -250,7 +251,7 @@ lm_eval \
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w4a16",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w4a16",dtype=auto,max_model_len=4096,tensor_parallel_size=8 \
   --tasks gsm8k_cot_llama_3.1_instruct \
   --apply_chat_template \
   --fewshot_as_multiturn \
 
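As a quick sanity check on the updated table, the Recovery column is simply the quantized model's score expressed as a percentage of the unquantized baseline's score. Below is a minimal sketch of that arithmetic in Python, using only the rows whose labels are visible in this diff; the `recovery` helper is illustrative and not part of the repository:

```python
# Recovery = 100 * quantized score / unquantized score, shown to one decimal.

def recovery(quantized: float, baseline: float) -> float:
    """Percentage of the unquantized baseline retained by the quantized model."""
    return 100.0 * quantized / baseline

# (unquantized, quantized.w4a16) pairs copied from the updated table above
rows = {
    "MMLU (5-shot)": (87.38, 87.22),
    "ARC Challenge (0-shot)": (94.97, 95.31),
    "Average": (86.75, 86.76),
}

for name, (baseline, quantized) in rows.items():
    print(f"{name}: {recovery(quantized, baseline):.1f}%")

# Expected output, matching the Recovery column:
#   MMLU (5-shot): 99.8%
#   ARC Challenge (0-shot): 100.4%
#   Average: 100.0%
```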