ChuckMcSneed committed
Commit 195a19c
1 Parent(s): cb2b48a

Update README.md

Files changed (1):
  1. README.md +10 -9
README.md CHANGED
@@ -180,14 +180,6 @@ Then I SLERP-merged it with cognitivecomputations/dolphin-2.2-70b (Needed to bri
 
 Absurdly high. That's what happens when you optimize the merges for a benchmark.
 
-### Open LLM leaderboard
-[Leaderboard on Huggingface](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
-|Model                           |Average|ARC  |HellaSwag|MMLU |TruthfulQA|Winogrande|GSM8K|
-|--------------------------------|-------|-----|---------|-----|----------|----------|-----|
-|ChuckMcSneed/Gembo-v1-70b       |70.51  |71.25|86.98    |70.85|63.25     |80.51     |50.19|
-|ChuckMcSneed/SMaxxxer-v1-70b    |72.23  |70.65|88.02    |70.55|60.7      |82.87     |60.58|
-
-Looks like adding a shitton of RP stuff decreased HellaSwag, WinoGrande and GSM8K, but increased TruthfulQA, MMLU and ARC. Interesting. To be honest, I'm a bit surprised that it didn't do that much worse.
 
 ### WolframRavenwolf
 Benchmark by [@wolfram](https://huggingface.co/wolfram)
@@ -198,7 +190,16 @@ Artefact2/Gembo-v1-70b-GGUF GGUF Q5_K_M, 4K context, Alpaca format:
 - ➖ Did NOT follow instructions to answer with just a single letter or more than just a single letter.
 
 This shows that this model can be used for real world use cases as an assistant.
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+
+### [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+[Leaderboard on Huggingface](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+|Model                           |Average|ARC  |HellaSwag|MMLU |TruthfulQA|Winogrande|GSM8K|
+|--------------------------------|-------|-----|---------|-----|----------|----------|-----|
+|ChuckMcSneed/Gembo-v1-70b       |70.51  |71.25|86.98    |70.85|63.25     |80.51     |50.19|
+|ChuckMcSneed/SMaxxxer-v1-70b    |72.23  |70.65|88.02    |70.55|60.7      |82.87     |60.58|
+
+Looks like adding a shitton of RP stuff decreased HellaSwag, WinoGrande and GSM8K, but increased TruthfulQA, MMLU and ARC. Interesting. To be honest, I'm a bit surprised that it didn't do that much worse.
+
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ChuckMcSneed__Gembo-v1-70b)
 
 | Metric |Value|
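
A minimal sketch of where the Average column in the moved table comes from, assuming the Open LLM Leaderboard average is a plain unweighted mean of the six benchmark scores (an assumption; the leaderboard's own rounding may differ in the last digit):

```python
# Sanity check on the Average column of the leaderboard table moved in this commit.
# Assumption: the Open LLM Leaderboard average is the unweighted mean of the six
# benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K).
scores = {
    "ChuckMcSneed/Gembo-v1-70b":    [71.25, 86.98, 70.85, 63.25, 80.51, 50.19],
    "ChuckMcSneed/SMaxxxer-v1-70b": [70.65, 88.02, 70.55, 60.70, 82.87, 60.58],
}

for model, vals in scores.items():
    avg = sum(vals) / len(vals)
    # Prints ~70.505 and ~72.228, matching the table's 70.51 and 72.23 after rounding.
    print(f"{model}: {avg:.3f}")
```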