Update README.md
README.md CHANGED
@@ -61,7 +61,7 @@ The model was trained with compute provided by [HessianAI](https://hessian.ai/)
 ### Huggingface Leaderboard
 
 This model is still an early alpha and we can't guarantee that there isn't any contamination.
-
+The following are the scores from our own evaluation.
 
 | Metric | Value |
 |-----------------------|-------|
@@ -73,6 +73,12 @@ However, the average of **71.24** would earn the #3 spot on the HF leaderboard a
 | GSM8k (5-shot)        | 63.68 |
 | **Avg.**              | **71.24** |
 
+The model is now also officially ranked #6 overall on the Open LLM Leaderboard, and is the second-strongest Llama-2-70b-based model (ranking only behind TigerBot 70b):
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/62e3b6ab0c2a907c388e4965/0ZIBCnO08tX44ilGcl8Wb.png)
+(Screenshot from December 5, 2023)
+
+
 We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard.
 
 ### FastEval