LM-Eval-Harness Evaluation
#5 opened by GuanCL
Hi, I tested the activation-sparse models with lm-eval:
lm_eval --model hf \
--model_args pretrained=${MODEL},parallelize=True,trust_remote_code=True \
--tasks arc_easy,arc_challenge,boolq,hellaswag,openbookqa,lambada_openai,mmlu,piqa,truthfulqa_mc1,winogrande \
--batch_size auto
The results look poor for the prosparse-llama models (the sparse MiniCPM results look reasonable). Are there any insights?
The reported metric is 'acc', 0-shot.
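For reference, here is a minimal sketch of the same 0-shot run through lm-eval's Python API (lm_eval.simple_evaluate), in case it helps reproduce the numbers; the model path is a placeholder, and the exact metric key names in the results dict may vary between lm-eval versions.

import lm_eval

MODEL = "path/to/prosparse-llama"  # placeholder, substitute the actual checkpoint

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=f"pretrained={MODEL},parallelize=True,trust_remote_code=True",
    tasks=[
        "arc_easy", "arc_challenge", "boolq", "hellaswag", "openbookqa",
        "lambada_openai", "mmlu", "piqa", "truthfulqa_mc1", "winogrande",
    ],
    num_fewshot=0,
    batch_size="auto",
)

# Print whatever accuracy-style metrics each task reports
for task, metrics in results["results"].items():
    print(task, {k: v for k, v in metrics.items() if "acc" in k})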