Commit 9cbf014 (1 parent: a008a91), committed by pminervini
update
src/display/about.py CHANGED (+3 -5)
@@ -54,11 +54,9 @@ As large language models (LLMs) get better at creating believable texts, address
 # Reproducibility
 To reproduce our results, here is the commands you can run, using [this script](https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard/blob/main/backend-cli.py): python backend-cli.py.
 
-Alternatively, if you're interested in evaluating a specific task with a particular model, you can use [
-`python main.py --model=hf-causal-experimental --model_args="pretrained=<your_model>,revision=<your_model_revision>"`
-` --tasks=<task_list> --num_fewshot=<n_few_shot> --batch_size=
-
-The total batch size we get for models which fit on one A100 node is 8 (8 GPUs * 1). If you don't use parallelism, adapt your batch size to fit. You can expect results to vary slightly for different batch sizes because of padding.
+Alternatively, if you're interested in evaluating a specific task with a particular model, you can use the [EleutherAI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/):
+`python main.py --model=hf-causal-experimental --model_args="pretrained=<your_model>,parallelize=True,revision=<your_model_revision>"`
+` --tasks=<task_list> --num_fewshot=<n_few_shot> --batch_size=auto --output_path=<output_path>` (Note that you may need to add tasks from [here](https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard/tree/main/src/backend/tasks) to [this folder](https://github.com/EleutherAI/lm-evaluation-harness/tree/b281b0921b636bc36ad05c0b0b0763bd6dd43463/lm_eval/tasks))
 
 The tasks and few shots parameters are:
 
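Reviewer note: the added lines spread one harness invocation across two backtick spans, so it can help to see the full argument list in one place. The following is only a sketch of how that command might be assembled; the helper name `build_eval_command`, the example model and task names, and the comma-joined format for `<task_list>` are assumptions for illustration and are not part of the commit. The flags themselves are copied from the added lines, and nothing is executed.

```python
import shlex


def build_eval_command(model, revision, tasks, num_fewshot, output_path):
    """Assemble the argument list for the command quoted in the added lines.

    Hypothetical helper: only the flag names come from the commit text.
    """
    # parallelize=True is the model argument this commit introduces.
    model_args = f"pretrained={model},parallelize=True,revision={revision}"
    return [
        "python",
        "main.py",
        "--model=hf-causal-experimental",
        f"--model_args={model_args}",
        # Assumption: <task_list> is a comma-separated list of task names.
        f"--tasks={','.join(tasks)}",
        f"--num_fewshot={num_fewshot}",
        # The commit replaces the truncated batch size value with "auto".
        "--batch_size=auto",
        f"--output_path={output_path}",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute a real model, revision, and tasks.
    cmd = build_eval_command(
        "my-org/my-model", "main", ["task_a", "task_b"], 0, "results.json"
    )
    # Print a shell-quoted form for inspection instead of running it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems around `--model_args`, whose value itself contains commas and equals signs.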