Update README.md
README.md CHANGED
@@ -61,7 +61,7 @@ The model was trained with compute provided by [HessianAI](https://hessian.ai/)
 ### Huggingface Leaderboard
 
 This model is still an early alpha and we can't guarantee that there isn't any contamination.
-
+The following are the scores from our own evaluation.
 
 | Metric | Value |
 |-----------------------|-------|
@@ -73,6 +73,12 @@ However, the average of **71.24** would earn the #3 spot on the HF leaderboard a
 | GSM8k (5-shot)        | 63.68 |
 | **Avg.**              | **71.24** |
 
+The model is now also officially ranked #6 overall on the Open LLM Leaderboard, and is the second-strongest Llama-2-70b-based model (ranking only behind TigerBot 70b):
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/62e3b6ab0c2a907c388e4965/0ZIBCnO08tX44ilGcl8Wb.png)
+(Screenshot from December 5, 2023)
+
+
 We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard.
 
 ### FastEval