bleysg committed
Commit 5be510b
1 Parent(s): 54e9364

Update README.md

Files changed (1): README.md +1 -38
README.md CHANGED
@@ -28,7 +28,7 @@ This model is being released as a demonstration of the performance of our new cu

  This new dataset release provides an efficient means of reaching performance on-par with using larger slices of our data, while only including ~500k GPT-4 completions.

- HF Leaderboard evals place this model as TBD
+ HF Leaderboard evals place this model at near parity with our recent [MistralOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) release, which was the #1 model at the time of its release.

  Codename: "*MistralSlimOrca*"
 
@@ -43,14 +43,6 @@ or check the OpenAccess AI Collective Discord for more information about Axolotl
  https://discord.gg/5y8STgB3P3


- # Quantized Models
-
- Quantized versions of this model are generously made available by [TheBloke](https://huggingface.co/TheBloke).
-
- - AWQ: https://huggingface.co/TheBloke/Mistral-7B-SlimOrca-AWQ
- - GPTQ: https://huggingface.co/TheBloke/Mistral-7B-SlimOrca-GPTQ
- - GGUF: https://huggingface.co/TheBloke/Mistral-7B-SlimOrca-GGUF
-

  # Prompt Template
 
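As a usage note for the quantized releases listed in the removed section above: below is a minimal sketch of running the GGUF variant through the llama-cpp-python bindings. The quant filename and the ChatML prompt wrapper are assumptions (based on TheBloke's usual packaging and the ChatML format used by the sibling MistralOrca release), not details this commit specifies.

```python
# Minimal sketch: run a GGUF quant of Mistral-7B-SlimOrca via llama-cpp-python.
# The filename below is hypothetical -- pick a real one from
# https://huggingface.co/TheBloke/Mistral-7B-SlimOrca-GGUF
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-slimorca.Q4_K_M.gguf",  # assumed local filename
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

# ChatML-style prompt, assumed to match the README's "Prompt Template" section.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nExplain GGUF in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
out = llm(prompt, max_tokens=128, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```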
@@ -125,35 +117,6 @@ This is also **98.6%** of *`Llama2-70b-chat`*'s performance!
  We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard.


- ## AGIEval Performance
-
- We compare our results to the base Mistral-7B model (using LM Evaluation Harness).
-
- We find **tbd** of the base model's performance on AGI Eval, averaging **tbd**.
- As well, we significantly improve upon the official `mistralai/Mistral-7B-Instruct-v0.1` finetuning, achieving **tbd** of their performance.
-
- ![AGIEval Performance](https://huggingface.co/Open-Orca/Mistral-7B-SlimOrca/resolve/main/Images/MistralSlimOrca7BAGIEval.png "AGIEval Performance")
-
- ## BigBench-Hard Performance
-
- We find **tbd** of the base model's performance on BigBench-Hard, averaging **tbd**.
-
- ![BigBench-Hard Performance](https://huggingface.co/Open-Orca/Mistral-7B-SlimOrca/resolve/main/Images/MistralSlimOrca7BBigBenchHard.png "BigBench-Hard Performance")
-
- ## GPT4ALL Leaderboard Performance
-
- We ... averaging **tbd**.
-
- ![GPT4ALL Performance](https://huggingface.co/Open-Orca/Mistral-7B-SlimOrca/resolve/main/Images/MistralSlimOrca7BGPT4ALL.png "GPT4ALL Performance")
-
- ## MT-Bench Performance
-
- MT-Bench uses GPT-4 as a judge of model response quality, across a wide range of challenges.
- We find our performance is *on-par with `Llama2-70b-chat`*, averaging **6.86**.
-
- ![MT-Bench Performance](https://huggingface.co/Open-Orca/Mistral-7B-SlimOrca/resolve/main/Images/MistralSlimOrca7BMTBENCH.png "MT-Bench Performance")
-
-

  # Dataset
  We used a curated, filtered selection of most of the GPT-4 augmented data from our OpenOrca dataset, which aims to reproduce the Orca Research Paper dataset.
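To make the evaluation-harness reference above actionable, here is a minimal sketch of scoring this model with lm-evaluation-harness's Python API. The `simple_evaluate` entry point and task names follow recent harness releases (v0.4+); the leaderboard pins a specific harness version, so treat any numbers this produces as indicative only.

```python
# Minimal sketch: benchmark Open-Orca/Mistral-7B-SlimOrca with
# EleutherAI's lm-evaluation-harness (API as in harness v0.4+).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # HuggingFace transformers backend
    model_args="pretrained=Open-Orca/Mistral-7B-SlimOrca,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag"],  # leaderboard-style tasks
    batch_size=8,
)
# Per-task metrics (accuracy, normalized accuracy, etc.)
print(results["results"])
```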
 