pythia-70m-sft-hh / README.md

Wandb runs: https://wandb.ai/eleutherai/pythia-rlhf/runs/s0qdwbg6?workspace=user-yongzx

Evaluation results:

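| Task          | Version | Filter | Metric   |  Value |   | Stderr |
|---------------|---------|--------|----------|-------:|---|-------:|
| arc_challenge | Yaml    | none   | acc      | 0.1758 | ± | 0.0111 |
|               |         | none   | acc_norm | 0.2176 | ± | 0.0121 |
| arc_easy      | Yaml    | none   | acc      | 0.3742 | ± | 0.0099 |
|               |         | none   | acc_norm | 0.3565 | ± | 0.0098 |
| logiqa        | Yaml    | none   | acc      | 0.2058 | ± | 0.0159 |
|               |         | none   | acc_norm | 0.2412 | ± | 0.0168 |
| piqa          | Yaml    | none   | acc      | 0.5958 | ± | 0.0114 |
|               |         | none   | acc_norm | 0.5941 | ± | 0.0115 |
| sciq          | Yaml    | none   | acc      | 0.5930 | ± | 0.0155 |
|               |         | none   | acc_norm | 0.5720 | ± | 0.0157 |
| winogrande    | Yaml    | none   | acc      | 0.5154 | ± | 0.0140 |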
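
To read the `Value ± Stderr` columns as rough uncertainty ranges, the sketch below turns each `acc` row into an approximate 95% interval. It assumes the reported stderr is a standard error of the mean and that a normal approximation (z = 1.96) is acceptable. Neither assumption is stated in this README, and the helper name `confidence_interval` is illustrative only.

```python
# Value and Stderr copied from the `acc` rows of the evaluation results.
results = {
    "arc_challenge": (0.1758, 0.0111),
    "arc_easy": (0.3742, 0.0099),
    "logiqa": (0.2058, 0.0159),
    "piqa": (0.5958, 0.0114),
    "sciq": (0.5930, 0.0155),
    "winogrande": (0.5154, 0.0140),
}

# Assumption: normal approximation, so a 95% interval is value +/- 1.96 * stderr.
Z_95 = 1.96


def confidence_interval(value, stderr, z=Z_95):
    """Return (low, high) bounds of an approximate confidence interval."""
    return (value - z * stderr, value + z * stderr)


for task, (value, stderr) in results.items():
    low, high = confidence_interval(value, stderr)
    print(f"{task:<14} acc = {value:.4f}  approx. 95% CI = [{low:.4f}, {high:.4f}]")
```

Under these assumptions, the winogrande interval spans 0.5, the accuracy expected from guessing between its two answer options.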