LM-Eval-Harness Evaluation

#5 opened by GuanCL

Hi, I tested the activation-sparse models with lm-eval:

lm_eval --model hf \
    --model_args pretrained=${MODEL},parallelize=True,trust_remote_code=True \
    --tasks arc_easy,arc_challenge,boolq,hellaswag,openbookqa,lambada_openai,mmlu,piqa,truthfulqa_mc1,winogrande \
    --batch_size auto
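
For reference, roughly the same run expressed through the harness's Python API (a sketch only; `MODEL` is a placeholder for the checkpoint path, and the `"acc,none"` key is just what recent harness versions seem to use for the metric name):

```python
import lm_eval

# Rough Python-API equivalent of the CLI invocation above (sketch only;
# "MODEL" is a placeholder for the ProSparse-LLaMA / sparse MiniCPM checkpoint).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=MODEL,parallelize=True,trust_remote_code=True",
    tasks=[
        "arc_easy", "arc_challenge", "boolq", "hellaswag", "openbookqa",
        "lambada_openai", "mmlu", "piqa", "truthfulqa_mc1", "winogrande",
    ],
    num_fewshot=0,
    batch_size="auto",
)

# Print the 0-shot acc per task; fall back to the plain "acc" key
# in case the installed harness version names metrics differently.
for task, metrics in results["results"].items():
    acc = metrics.get("acc,none", metrics.get("acc"))
    print(f"{task}: acc = {acc}")
```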

(screenshot of the evaluation results attached)

The results look poor for the ProSparse-LLaMA models (the sparse MiniCPM numbers look reasonable). Any insight into what might be going wrong?

The reported metric is 'acc', 0-shot.
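
In case it helps narrow things down, a minimal load-and-generate check along these lines (a sketch; `MODEL` is a placeholder and bfloat16 is an assumption about the intended dtype) can show whether the checkpoint produces coherent text at all when loaded with `trust_remote_code`, i.e. whether the low scores come from the model itself or from how it gets loaded:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sanity check (sketch): load the checkpoint the same way lm-eval does
# and generate a short completion. "MODEL" is a placeholder, and
# torch.bfloat16 is an assumption rather than the documented dtype.
tokenizer = AutoTokenizer.from_pretrained("MODEL", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "MODEL",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```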
