ChuckMcSneed committed
Commit 798b7fe
1 Parent(s): eef189e

Update README.md

Files changed (1)
  1. README.md +13 -1
README.md CHANGED
@@ -75,6 +75,8 @@ Then I SLERP-merged it with cognitivecomputations/dolphin-2.2-70b (Needed to bri
 | P | 5.25 |
 | Total | 19.75 |
 
+Absurdly high. That's what happens when you optimize the merges for a benchmark.
+
 ### Open LLM leaderboard
 [Leaderboard on Huggingface](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 |Model |Average|ARC |HellaSwag|MMLU |TruthfulQA|Winogrande|GSM8K|
@@ -82,4 +84,14 @@ Then I SLERP-merged it with cognitivecomputations/dolphin-2.2-70b (Needed to bri
 |ChuckMcSneed/Gembo-v1-70b |70.51 |71.25|86.98 |70.85|63.25 |80.51 |50.19|
 |ChuckMcSneed/SMaxxxer-v1-70b |72.23 |70.65|88.02 |70.55|60.7 |82.87 |60.58|
 
-Looks like adding a shitton of RP stuff decreased HellaSwag, WinoGrande and GSM8K, but increased TruthfulQA, MMLU and ARC. Interesting. To be honest, I'm a bit surprised that it didn't do that much worse.
+Looks like adding a shitton of RP stuff decreased HellaSwag, WinoGrande and GSM8K, but increased TruthfulQA, MMLU and ARC. Interesting. To be honest, I'm a bit surprised that it didn't do that much worse.
+
+### WolframRavenwolf
+Benchmark by [@wolfram](https://huggingface.co/wolfram)
+
+Artefact2/Gembo-v1-70b-GGUF GGUF Q5_K_M, 4K context, Alpaca format:
+- ✅ Gave correct answers to all 18/18 multiple choice questions! Just the questions, no previous information, gave correct answers: 16/18
+- ✅ Consistently acknowledged all data input with "OK".
+- ➖ Did NOT follow instructions to answer with just a single letter or more than just a single letter.
+
+This shows that this model can be used for real-world use cases as an assistant.
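The hunk headers above quote the README's note that the model was SLERP-merged with cognitivecomputations/dolphin-2.2-70b. For readers unfamiliar with the operation, the sketch below shows spherical linear interpolation applied to a pair of weight tensors, which is the core of a SLERP merge; the function name, the flatten-then-interpolate approach, and the fixed t value are illustrative assumptions, not the actual recipe or interpolation factors used for Gembo (merge tools such as mergekit typically apply this per tensor with per-layer factors).

```python
import torch

def slerp_tensors(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two same-shaped weight tensors.

    Illustrative sketch only, not the exact merge recipe used for this model.
    """
    a32, b32 = a.flatten().float(), b.flatten().float()
    a_dir = a32 / (a32.norm() + eps)           # unit direction of tensor a
    b_dir = b32 / (b32.norm() + eps)           # unit direction of tensor b
    dot = torch.clamp(torch.dot(a_dir, b_dir), -1.0, 1.0)
    omega = torch.arccos(dot)                  # angle between the two tensors
    if omega < eps:                            # nearly parallel: fall back to plain lerp
        merged = (1.0 - t) * a32 + t * b32
    else:
        so = torch.sin(omega)
        merged = (torch.sin((1.0 - t) * omega) / so) * a32 + (torch.sin(t * omega) / so) * b32
    return merged.reshape(a.shape).to(a.dtype)

# Example: blend two toy "weight" tensors halfway between the endpoints.
w_a = torch.randn(4, 4)
w_b = torch.randn(4, 4)
w_merged = slerp_tensors(0.5, w_a, w_b)
```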