Adding Evaluation Results

#3
Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -6,4 +6,17 @@ This model was the result of a 50/50 average weight merge between Airoboros-33B-
  After prolonged testing, we concluded that while this merge is highly flexible and capable of many different tasks, it has too much variation in how it answers to be reliable.
  Because of this, the model relies on some luck to get good results, and is therefore not recommended for people seeking a consistent experience, or for people sensitive to anticipation-based addictions.
 
- If you would like an improved version of this model that is more stable, check out my Airochronos-33B merge.
+ If you would like an improved version of this model that is more stable, check out my Airochronos-33B merge.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Henk717__chronoboros-33B)
+
+ | Metric               | Value |
+ |----------------------|-------|
+ | Avg.                 | 51.45 |
+ | ARC (25-shot)        | 63.91 |
+ | HellaSwag (10-shot)  | 85.0  |
+ | MMLU (5-shot)        | 59.44 |
+ | TruthfulQA (0-shot)  | 49.83 |
+ | Winogrande (5-shot)  | 80.35 |
+ | GSM8K (5-shot)       | 15.01 |
+ | DROP (3-shot)        | 6.62  |
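
For readers who want more than the summary table added above, the linked details dataset can be pulled with the `datasets` library. This is a minimal sketch, not part of the PR itself; the per-task config names are discovered at runtime rather than hard-coded, since they are not listed in this card.

```python
# Minimal sketch: fetch the detailed leaderboard results linked above.
from datasets import get_dataset_config_names, load_dataset

repo = "open-llm-leaderboard/details_Henk717__chronoboros-33B"

# One config per evaluated task/run; inspect before choosing one.
configs = get_dataset_config_names(repo)
print(configs)

# Loading the first config returns a DatasetDict of its available splits.
details = load_dataset(repo, configs[0])
print(details)
```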
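The card's context above describes the model as a 50/50 average weight merge of Airoboros-33B and Chronos-33B. A merge of that kind can be sketched as an element-wise mean over matching parameter tensors; the snippet below is an illustration only, not the exact recipe used, and the repo IDs are hypothetical placeholders (the source checkpoint names are abbreviated in the card).

```python
# Illustrative sketch of a 50/50 average weight merge, assuming both
# checkpoints share an identical architecture and parameter names.
# Note: two 33B models in fp16 require substantial RAM.
import torch
from transformers import AutoModelForCausalLM

AIROBOROS = "example-org/Airoboros-33B"  # hypothetical repo ID
CHRONOS = "example-org/Chronos-33B"      # hypothetical repo ID

a = AutoModelForCausalLM.from_pretrained(AIROBOROS, torch_dtype=torch.float16)
b = AutoModelForCausalLM.from_pretrained(CHRONOS, torch_dtype=torch.float16)

sd_a = a.state_dict()
sd_b = b.state_dict()

# Element-wise mean of every matching tensor, computed in fp32 for accuracy,
# then cast back to the original dtype.
merged = {
    name: (sd_a[name].float() + sd_b[name].float()).mul_(0.5).to(sd_a[name].dtype)
    for name in sd_a
}

a.load_state_dict(merged)
a.save_pretrained("chronoboros-33B")
```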