lvkaokao committed on
Commit • 4121a2d
Parent(s): 0b39ee8
update.

src/display/about.py CHANGED (+19 −9)
@@ -40,22 +40,32 @@ We chose these benchmarks as they test a variety of reasoning and general knowledge
 
 ## REPRODUCIBILITY
 To reproduce our results, here are the commands you can run, using [v0.4.2](https://github.com/EleutherAI/lm-evaluation-harness/tree/v0.4.2) of the Eleuther AI Harness:
-
-
+```
+python main.py --model=hf-causal-experimental
+--model_args="pretrained=<your_model>,use_accelerate=True,revision=<your_model_revision>"
+--tasks=<task_list>
+--num_fewshot=<n_few_shot>
+--batch_size=1
+--output_path=<output_path>
+```
 
 ```
-python main.py --model=hf-causal-experimental
---model_args="pretrained=<your_model>,use_accelerate=True,revision=<your_model_revision>"
---tasks=<task_list>
---num_fewshot=<n_few_shot>
---batch_size=1
+python main.py --model=hf-causal-experimental
+--model_args="pretrained=<your_model>,use_accelerate=True,revision=<your_model_revision>"
+--tasks=<task_list>
+--num_fewshot=<n_few_shot>
+--batch_size=1
 --output_path=<output_path>
 
 ```
 
-**Note:**
+**Note:**
+- We run `llama.cpp` series models on Xeon CPU and others on NVIDIA GPU.
+- If model parameters > 7B, we use `--batch_size 4`. If model parameters < 7B, we use `--batch_size 2`. And we set `--batch_size 1` for llama.cpp. You can expect results to vary slightly for different batch sizes because of padding.
+
+
 
-The tasks and few-shot parameters are:
+### The tasks and few-shot parameters are:
 - ARC-C: 0-shot, *arc_challenge* (`acc`)
 - ARC-E: 0-shot, *arc_easy* (`acc`)
 - HellaSwag: 0-shot, *hellaswag* (`acc`)
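The batch-size rule and the command template described in the diff can be sketched as a small helper. This is a minimal illustration, not code from the repository: the function names (`pick_batch_size`, `build_command`) and the example model/revision/output values are hypothetical.

```python
# Hypothetical helpers illustrating the note above. Assumptions are labeled:
# pick_batch_size / build_command and the example arguments are illustrative only.

def pick_batch_size(n_params_billion: float, backend: str = "hf") -> int:
    """Batch size per the note: llama.cpp -> 1, models > 7B -> 4, otherwise 2."""
    if backend == "llama.cpp":
        return 1
    return 4 if n_params_billion > 7 else 2

def build_command(model: str, revision: str, task: str,
                  n_fewshot: int, batch_size: int, output_path: str) -> str:
    """Fill in the lm-evaluation-harness v0.4.2 command template from the diff."""
    return (
        "python main.py --model=hf-causal-experimental "
        f'--model_args="pretrained={model},use_accelerate=True,revision={revision}" '
        f"--tasks={task} --num_fewshot={n_fewshot} "
        f"--batch_size={batch_size} --output_path={output_path}"
    )

# Example: ARC-C is 0-shot on task arc_challenge (from the task list above);
# a 13B model on GPU gets batch size 4 under the note's rule.
cmd = build_command("my-org/my-model", "main", "arc_challenge",
                    n_fewshot=0, batch_size=pick_batch_size(13),
                    output_path="results.json")
print(cmd)
```

Results can still vary slightly across batch sizes because of padding, so the chosen batch size is worth recording alongside the scores.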