ehartford leaderboard-pr-bot committed on
Commit
07a2f3f
1 Parent(s): c673387

Adding Evaluation Results (#7)

- Adding Evaluation Results (86c03ac60589425fe099116ca7b073af01657670)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -55,4 +55,17 @@ What is the best way to train a dolphin to obey me? Please answer step by step.
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/xnz5M1lYd4oGVATSDRkQ-.png)
 
- [Buy me a coffee](https://www.buymeacoffee.com/ehartford)
+ [Buy me a coffee](https://www.buymeacoffee.com/ehartford)
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ehartford__dolphin-2.0-mistral-7b)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 55.85 |
+ | ARC (25-shot) | 59.22 |
+ | HellaSwag (10-shot) | 80.26 |
+ | MMLU (5-shot) | 56.9 |
+ | TruthfulQA (0-shot) | 61.09 |
+ | Winogrande (5-shot) | 75.37 |
+ | GSM8K (5-shot) | 18.65 |
+ | DROP (3-shot) | 39.49 |
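
The `Avg.` row can be sanity-checked against the seven benchmark rows. The sketch below assumes `Avg.` is the unweighted arithmetic mean of those rows, rounded to two decimals; the commit does not state this, but the numbers are consistent with it. The names `scores` and `leaderboard_average` are hypothetical and only serve the illustration.

```python
# Scores copied from the table added to README.md in this commit.
scores = {
    "ARC (25-shot)": 59.22,
    "HellaSwag (10-shot)": 80.26,
    "MMLU (5-shot)": 56.9,
    "TruthfulQA (0-shot)": 61.09,
    "Winogrande (5-shot)": 75.37,
    "GSM8K (5-shot)": 18.65,
    "DROP (3-shot)": 39.49,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (assumed convention)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(average)  # 55.85, matching the Avg. row
```

Because the mean is unweighted, the low GSM8K score (18.65) pulls the average well below the HellaSwag and Winogrande scores.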