Commit 51b654e (pminervini)
Parent(s): 855cd65

update
Files changed: src/display/about.py (+1 -1)

--- a/src/display/about.py
+++ b/src/display/about.py
@@ -56,7 +56,7 @@ LLM_BENCHMARKS_DETAILS = f"""
 # Reproducibility
 To reproduce our results, here is the commands you can run, using [this script](https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard/blob/main/backend-cli.py): python backend-cli.py.
 
-Alternatively, if you're interested in evaluating a specific task with a particular model, you can use [
+Alternatively, if you're interested in evaluating a specific task with a particular model, you can use the [EleutherAI LLM Evaluation Harness library](https://github.com/EleutherAI/lm-evaluation-harness/) as follows:
 `python main.py --model=hf-causal-experimental --model_args="pretrained=<your_model>,revision=<your_model_revision>"`
 ` --tasks=<task_list> --num_fewshot=<n_few_shot> --batch_size=1 --output_path=<output_path>` (Note that you may need to add tasks from [here](https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard/tree/main/src/backend/tasks) to [this folder](https://github.com/EleutherAI/lm-evaluation-harness/tree/b281b0921b636bc36ad05c0b0b0763bd6dd43463/lm_eval/tasks))
 
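
Reviewer note: the command quoted in the changed text is split across two backtick spans, so it can be easy to mis-assemble. The sketch below shows one way to build the full argument list in Python. The helper name `build_harness_command` and the example values are hypothetical, and joining the task list with commas is an assumption about how `<task_list>` is written; only flags that appear in the text above are used, and nothing is executed.

```python
import shlex


def build_harness_command(model, revision, tasks, num_fewshot, output_path):
    """Assemble the harness invocation quoted in about.py as an argv list.

    Hypothetical helper for illustration: it only formats the flags that
    appear in the diff and does not run the evaluation.
    """
    model_args = f"pretrained={model},revision={revision}"
    return [
        "python", "main.py",
        "--model=hf-causal-experimental",
        f"--model_args={model_args}",
        # Assumption: <task_list> is a comma-separated list of task names.
        f"--tasks={','.join(tasks)}",
        f"--num_fewshot={num_fewshot}",
        "--batch_size=1",
        f"--output_path={output_path}",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute a real model, revision, and task names.
    cmd = build_harness_command(
        "my-org/my-model", "main", ["task_a", "task_b"], 0, "results.json"
    )
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` from a checkout of the harness would avoid shell quoting problems around `--model_args`, though the tasks mentioned in the note may need to be copied into the harness first.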