Wandb runs: https://wandb.ai/eleutherai/pythia-rlhf/runs/s0qdwbg6?workspace=user-yongzx

Evaluation results:

|    Task     |Version|Filter| Metric |Value |   |Stderr|
|-------------|-------|------|--------|-----:|---|-----:|
|arc_challenge|Yaml   |none  |acc     |0.1758|±  |0.0111|
|             |       |none  |acc_norm|0.2176|±  |0.0121|
|arc_easy     |Yaml   |none  |acc     |0.3742|±  |0.0099|
|             |       |none  |acc_norm|0.3565|±  |0.0098|
|logiqa       |Yaml   |none  |acc     |0.2058|±  |0.0159|
|             |       |none  |acc_norm|0.2412|±  |0.0168|
|piqa         |Yaml   |none  |acc     |0.5958|±  |0.0114|
|             |       |none  |acc_norm|0.5941|±  |0.0115|
|sciq         |Yaml   |none  |acc     |0.5930|±  |0.0155|
|             |       |none  |acc_norm|0.5720|±  |0.0157|
|winogrande   |Yaml   |none  |acc     |0.5154|±  |0.0140|