lvkaokao committed on
Commit • 4121a2d
Parent(s): 0b39ee8
update.

src/display/about.py CHANGED (+19 −9)
@@ -40,22 +40,32 @@ We chose these benchmarks as they test a variety of reasoning and general knowledge
 
 ## REPRODUCIBILITY
 To reproduce our results, here are the commands you can run, using [v0.4.2](https://github.com/EleutherAI/lm-evaluation-harness/tree/v0.4.2) of the Eleuther AI Harness:
-
-
+```
+python main.py --model=hf-causal-experimental
+--model_args="pretrained=<your_model>,use_accelerate=True,revision=<your_model_revision>"
+--tasks=<task_list>
+--num_fewshot=<n_few_shot>
+--batch_size=1
+--output_path=<output_path>
+```
 
 ```
-python main.py --model=hf-causal-experimental
---model_args="pretrained=<your_model>,use_accelerate=True,revision=<your_model_revision>"
---tasks=<task_list>
---num_fewshot=<n_few_shot>
---batch_size=1
+python main.py --model=hf-causal-experimental
+--model_args="pretrained=<your_model>,use_accelerate=True,revision=<your_model_revision>"
+--tasks=<task_list>
+--num_fewshot=<n_few_shot>
+--batch_size=1
 --output_path=<output_path>
 
 ```
 
-**Note:**
+**Note:**
+- We run `llama.cpp` series models on Xeon CPU and others on NVIDIA GPU.
+- If model parameters > 7B, we use `--batch_size 4`. If model parameters < 7B, we use `--batch_size 2`. And we set `--batch_size 1` for llama.cpp. You can expect results to vary slightly for different batch sizes because of padding.
+
+
 
-The tasks and few-shot parameters are:
+### The tasks and few-shot parameters are:
 - ARC-C: 0-shot, *arc_challenge* (`acc`)
 - ARC-E: 0-shot, *arc_easy* (`acc`)
 - HellaSwag: 0-shot, *hellaswag* (`acc`)
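The batch-size rule and the command template described in the diff can be sketched as a small helper. This is a minimal illustration, not code from the repository: the function names (`pick_batch_size`, `build_command`) and the example model/revision/output values are hypothetical.

```python
# Hypothetical helpers illustrating the note above. Assumptions are labeled:
# pick_batch_size / build_command and the example arguments are illustrative only.

def pick_batch_size(n_params_billion: float, backend: str = "hf") -> int:
    """Batch size per the note: llama.cpp -> 1, models > 7B -> 4, otherwise 2."""
    if backend == "llama.cpp":
        return 1
    return 4 if n_params_billion > 7 else 2

def build_command(model: str, revision: str, task: str,
                  n_fewshot: int, batch_size: int, output_path: str) -> str:
    """Fill in the lm-evaluation-harness v0.4.2 command template from the diff."""
    return (
        "python main.py --model=hf-causal-experimental "
        f'--model_args="pretrained={model},use_accelerate=True,revision={revision}" '
        f"--tasks={task} --num_fewshot={n_fewshot} "
        f"--batch_size={batch_size} --output_path={output_path}"
    )

# Example: ARC-C is 0-shot on task arc_challenge (from the task list above);
# a 13B model on GPU gets batch size 4 under the note's rule.
cmd = build_command("my-org/my-model", "main", "arc_challenge",
                    n_fewshot=0, batch_size=pick_batch_size(13),
                    output_path="results.json")
print(cmd)
```

Results can still vary slightly across batch sizes because of padding, so the chosen batch size is worth recording alongside the scores.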