Are instruction models evaluated with chat template?
#1
by alexrs - opened
In the Hugging Face lm-evaluation-harness fork it is possible to specify the --apply_chat_template
and --fewshot_as_multiturn
options for instruction models (https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#reproducibility). That does not seem to be the case for this leaderboard according to its reproducibility instructions, and when I try it anyway (the flag does exist in the code -- https://github.com/mohamedalhajjar/lm-evaluation-harness-multilingual/blob/64286c9b9a270f9b72a9c4ba05e014b8284108da/lm_eval/__main__.py#L172), I get the following error:
[rank6]: Traceback (most recent call last):
[rank6]: File "/opt/conda/envs/openllm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank6]: return _run_code(code, main_globals, None,
[rank6]: File "/opt/conda/envs/openllm/lib/python3.10/runpy.py", line 86, in _run_code
[rank6]: exec(code, run_globals)
[rank6]: File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/__main__.py", line 461, in <module>
[rank6]: cli_evaluate()
[rank6]: File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/__main__.py", line 382, in cli_evaluate
[rank6]: results = evaluator.simple_evaluate(
[rank6]: File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/utils.py", line 397, in _wrapper
[rank6]: return fn(*args, **kwargs)
[rank6]: File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/evaluator.py", line 288, in simple_evaluate
[rank6]: evaluation_tracker.general_config_tracker.log_experiment_args(
[rank6]: File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/loggers/evaluation_tracker.py", line 97, in log_experiment_args
[rank6]: self.chat_template_sha = hash_string(chat_template) if chat_template else None
[rank6]: File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/utils.py", line 36, in hash_string
[rank6]: return hashlib.sha256(string.encode("utf-8")).hexdigest()
[rank6]: AttributeError: 'dict' object has no attribute 'encode'
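The traceback suggests that the tokenizer's chat template reaches `hash_string` as a dict rather than a string (some tokenizers define multiple named templates as a mapping), while `hash_string` calls `.encode("utf-8")` and so only accepts `str`. A minimal sketch reproducing the failure, plus one possible guard; the `hash_chat_template` helper and the JSON-serialization choice are illustrative assumptions, not the harness's actual fix:

```python
import hashlib
import json

def hash_string(string: str) -> str:
    # Mirrors lm_eval.utils.hash_string: assumes a str, so a dict crashes here.
    return hashlib.sha256(string.encode("utf-8")).hexdigest()

# A chat template stored as a mapping of named templates (hypothetical example).
chat_template = {"default": "{% for message in messages %}...{% endfor %}"}

try:
    hash_string(chat_template)
except AttributeError as e:
    # Reproduces the error from the traceback:
    print(e)  # 'dict' object has no attribute 'encode'

# One possible guard: serialize non-string templates before hashing.
# (Illustrative only -- the upstream harness may resolve this differently.)
def hash_chat_template(template):
    if isinstance(template, dict):
        template = json.dumps(template, sort_keys=True)
    return hash_string(template) if template else None

print(hash_chat_template(chat_template))
```

Deterministic serialization (`sort_keys=True`) matters here because the hash is logged for reproducibility, so the same template dict should always produce the same digest.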
Thank you for raising this. Could you please add it to the GitHub repo so it can be fixed? Thanks!
malhajar changed discussion status to closed