---
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
tags:
- llama2
- llama
---
Similar to llama2-22b, but with BLOCK_DIAGONAL=false in the merge and twice the fine-tuning tokens.
Again, this model is not intended for direct use; it is meant as a base for further tuning and merging.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_chargoddard__llama2-22b-blocktriangular).
| Metric | Value |
|-----------------------|---------------------------|
| Avg. | 46.86 |
| ARC (25-shot) | 58.28 |
| HellaSwag (10-shot) | 82.69 |
| MMLU (5-shot) | 54.53 |
| TruthfulQA (0-shot) | 39.23 |
| Winogrande (5-shot) | 75.93 |
| GSM8K (5-shot) | 11.22 |
| DROP (3-shot) | 6.17 |
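
The `Avg.` row appears to be the unweighted mean of the seven benchmark scores listed in the table. The short sketch below checks that arithmetic; the `scores` dictionary and the `mean_score` helper are illustrative names, not part of any leaderboard tooling, and the values are copied by hand from the table.

```python
# Scores copied from the leaderboard table above (percentages).
scores = {
    "ARC (25-shot)": 58.28,
    "HellaSwag (10-shot)": 82.69,
    "MMLU (5-shot)": 54.53,
    "TruthfulQA (0-shot)": 39.23,
    "Winogrande (5-shot)": 75.93,
    "GSM8K (5-shot)": 11.22,
    "DROP (3-shot)": 6.17,
}


def mean_score(values):
    """Return the unweighted arithmetic mean of the given scores."""
    values = list(values)
    return sum(values) / len(values)


# Assumption: every benchmark carries equal weight in the average.
average = mean_score(scores.values())
print(f"Avg. = {average:.2f}")  # prints "Avg. = 46.86"
```

The result matches the reported 46.86, so the low GSM8K and DROP scores pull the average well below the other five benchmarks.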