
Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 51.36 |
| ARC (25-shot)       | 63.57 |
| HellaSwag (10-shot) | 83.51 |
| MMLU (5-shot)       | 59.82 |
| TruthfulQA (0-shot) | 55.96 |
| Winogrande (5-shot) | 76.16 |
| GSM8K (5-shot)      | 8.42  |
| DROP (3-shot)       | 12.09 |
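
The Avg. row appears to be the unweighted arithmetic mean of the seven benchmark scores listed above. The sketch below checks that under this assumption. The `scores` dictionary and the variable names are illustrative and are not part of the leaderboard's own tooling.

```python
# Benchmark scores copied from the table above.
scores = {
    "ARC (25-shot)": 63.57,
    "HellaSwag (10-shot)": 83.51,
    "MMLU (5-shot)": 59.82,
    "TruthfulQA (0-shot)": 55.96,
    "Winogrande (5-shot)": 76.16,
    "GSM8K (5-shot)": 8.42,
    "DROP (3-shot)": 12.09,
}

# Assumption: "Avg." is the plain mean over all seven benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints "Avg. 51.36"
```

The low GSM8K and DROP scores pull the average well below the other five benchmarks, which all score above 55.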