
We release the long instruction-following dataset, LongAlpaca-12k, and the corresponding models: LongAlpaca-7B, LongAlpaca-13B, and LongAlpaca-70B.

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric               | Value |
|----------------------|-------|
| Avg.                 | 30.42 |
| ARC (25-shot)        | 26.54 |
| HellaSwag (10-shot)  | 26.1  |
| MMLU (5-shot)        | 23.12 |
| TruthfulQA (0-shot)  | 49.16 |
| Winogrande (5-shot)  | 64.33 |
| GSM8K (5-shot)       | 0.0   |
| DROP (3-shot)        | 23.71 |
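The "Avg." row is simply the arithmetic mean of the seven benchmark scores above, as a quick sketch confirms:

```python
# Scores reported in the table above; "Avg." is their unweighted mean.
scores = {
    "ARC (25-shot)": 26.54,
    "HellaSwag (10-shot)": 26.1,
    "MMLU (5-shot)": 23.12,
    "TruthfulQA (0-shot)": 49.16,
    "Winogrande (5-shot)": 64.33,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 23.71,
}

avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 30.42
```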