task,metric,value,err,version
anli_r1,acc,0.327,0.014842213153411244,0
anli_r2,acc,0.327,0.014842213153411245,0
anli_r3,acc,0.32166666666666666,0.013490095282989521,0
arc_challenge,acc,0.295221843003413,0.013329750293382318,0
arc_challenge,acc_norm,0.32849829351535836,0.013724978465537378,0
arc_easy,acc,0.6452020202020202,0.009817629113069696,0
arc_easy,acc_norm,0.6388888888888888,0.00985601342581124,0
boolq,acc,0.618960244648318,0.00849393752443934,1
cb,acc,0.5178571428571429,0.06737697508644647,1
cb,f1,0.32323232323232315,,1
copa,acc,0.77,0.04229525846816505,0
hellaswag,acc,0.45210117506472813,0.004966832553245033,0
hellaswag,acc_norm,0.5958972316271659,0.004897146690596266,0
piqa,acc,0.7399347116430903,0.010234893249061301,0
piqa,acc_norm,0.7453754080522307,0.0101644322370605,0
rte,acc,0.48736462093862815,0.030086851767188564,0
sciq,acc,0.922,0.008484573530118585,0
sciq,acc_norm,0.924,0.008384169266796387,0
storycloze_2016,acc,0.6974879743452699,0.010622307774396942,0
winogrande,acc,0.5714285714285714,0.013908353814606696,0