LM-Eval-Harness Evaluation
#5 opened by GuanCL
Hi, I tested the activation-sparse models with lm-eval:
lm_eval --model hf \
--model_args pretrained=${MODEL},parallelize=True,trust_remote_code=True \
--tasks arc_easy,arc_challenge,boolq,hellaswag,openbookqa,lambada_openai,mmlu,piqa,truthfulqa_mc1,winogrande \
--batch_size auto
The results look poor for the prosparse-llama models (the sparse MiniCPM results look reasonable). Are there any insights?
The reported metric is 'acc', 0-shot.
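For reference, here is a minimal sketch of the same 0-shot run through lm-eval's Python API (lm_eval.simple_evaluate), in case it helps reproduce the numbers; the model path is a placeholder, and the exact metric key names in the results dict may vary between lm-eval versions.

import lm_eval

MODEL = "path/to/prosparse-llama"  # placeholder, substitute the actual checkpoint

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=f"pretrained={MODEL},parallelize=True,trust_remote_code=True",
    tasks=[
        "arc_easy", "arc_challenge", "boolq", "hellaswag", "openbookqa",
        "lambada_openai", "mmlu", "piqa", "truthfulqa_mc1", "winogrande",
    ],
    num_fewshot=0,
    batch_size="auto",
)

# Print whatever accuracy-style metrics each task reports
for task, metrics in results["results"].items():
    print(task, {k: v for k, v in metrics.items() if "acc" in k})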