task,metric,value,err,version
arc_challenge,acc,0.20051194539249148,0.01170031805049937,0
arc_challenge,acc_norm,0.24658703071672355,0.012595726268790127,0
arc_easy,acc,0.4911616161616162,0.010258180468004828,0
arc_easy,acc_norm,0.4297138047138047,0.01015790800576368,0
boolq,acc,0.5651376146788991,0.008670528471841561,1
hellaswag,acc,0.3211511651065525,0.0046596447333095365,0
hellaswag,acc_norm,0.36755626369249156,0.004811543077792706,0
sciq,acc,0.746,0.013772206565168543,0
sciq,acc_norm,0.662,0.01496596071022449,0