iansotnek committed on
Commit
92b5342
1 Parent(s): dc8f7f5

Update README.md

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -96,17 +96,17 @@ We present the results from various model benchmarks on the EleutherAI LLM Evalu
  Model results are sorted by mean score, ascending, to provide an ordering. These metrics serve to further show that none of the DLite models are
  state of the art, but rather further show that chat-like behaviors in LLMs can be trained almost independent of model size.
 
- | model | openbookqa | arc_easy | winogrande | hellaswag | arc_challenge | piqa | boolq |
- |:--------------|-------------:|-----------:|-------------:|------------:|----------------:|---------:|---------:|
- | gpt2 | 0.164 | 0.438131 | 0.51618 | 0.289185 | 0.190273 | 0.628945 | 0.487156 |
- | dlite-v2-124m | 0.174 | 0.44697 | 0.502762 | 0.291974 | 0.192833 | 0.631665 | 0.520183 |
- | dlite-v1-124m | 0.17 | 0.462542 | 0.494081 | 0.293268 | 0.223549 | 0.622416 | 0.502446 |
- | gpt2-medium | 0.186 | 0.490741 | 0.531176 | 0.333101 | 0.215017 | 0.676279 | 0.585933 |
- | dlite-v2-355m | 0.206 | 0.493687 | 0.524073 | 0.334993 | 0.226109 | 0.670838 | 0.582263 |
- | dlite-v1-355m | 0.216 | 0.507576 | 0.496448 | 0.338478 | 0.234642 | 0.664309 | 0.600306 |
- | gpt2-large | 0.194 | 0.531566 | 0.553275 | 0.363971 | 0.216724 | 0.703482 | 0.604893 |
- | dlite-774m-v2 | 0.212 | 0.539562 | 0.5588 | 0.365565 | 0.234642 | 0.700218 | 0.60367 |
- | dlite-774m-v1 | 0.218 | 0.545875 | 0.562747 | 0.375124 | 0.250853 | 0.698041 | 0.614985 |
- | gpt2-xl | 0.224 | 0.582912 | 0.583268 | 0.400418 | 0.25 | 0.708379 | 0.617737 |
- | dlite-v1-1.5b | 0.226 | 0.588384 | 0.584846 | 0.401414 | 0.268771 | 0.708379 | 0.624159 |
- | dlite-v2-1.5b | 0.226 | 0.59596 | 0.581689 | 0.40719 | 0.273891 | 0.705114 | 0.630887 |
+ | Model | arc_challenge | arc_easy | boolq | hellaswag | openbookqa | piqa | winogrande |
+ |:--------------|----------------:|-----------:|---------:|------------:|-------------:|---------:|-------------:|
+ | dlite-v2-124m | 0.199659 | 0.447811 | 0.494801 | 0.291675 | 0.156 | 0.620239 | 0.487766 |
+ | gpt2 | 0.190273 | 0.438131 | 0.487156 | 0.289185 | 0.164 | 0.628945 | 0.51618 |
+ | dlite-v1-124m | 0.223549 | 0.462542 | 0.502446 | 0.293268 | 0.17 | 0.622416 | 0.494081 |
+ | gpt2-medium | 0.215017 | 0.490741 | 0.585933 | 0.333101 | 0.186 | 0.676279 | 0.531176 |
+ | dlite-v2-355m | 0.251706 | 0.486111 | 0.547401 | 0.344354 | 0.216 | 0.671926 | 0.52723 |
+ | dlite-v1-355m | 0.234642 | 0.507576 | 0.600306 | 0.338478 | 0.216 | 0.664309 | 0.496448 |
+ | gpt2-large | 0.216724 | 0.531566 | 0.604893 | 0.363971 | 0.194 | 0.703482 | 0.553275 |
+ | dlite-v1-774m | 0.250853 | 0.545875 | 0.614985 | 0.375124 | 0.218 | 0.698041 | 0.562747 |
+ | dlite-v2-774m | 0.269625 | 0.52904 | 0.613761 | 0.395937 | 0.256 | 0.691513 | 0.566693 |
+ | gpt2-xl | 0.25 | 0.582912 | 0.617737 | 0.400418 | 0.224 | 0.708379 | 0.583268 |
+ | dlite-v1-1_5b | 0.268771 | 0.588384 | 0.624159 | 0.401414 | 0.226 | 0.708379 | 0.584846 |
+ | dlite-v2-1_5b | 0.289249 | 0.565657 | 0.601223 | 0.434077 | 0.272 | 0.703482 | 0.588003 |
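
The README text says the rows are sorted by mean score, ascending, but does not say how the mean is computed. The sketch below assumes an unweighted arithmetic mean over the seven benchmark columns and applies it to the three smallest models from the updated table. The names `scores`, `mean_score`, and `ranked` are illustrative and do not come from the repository.

```python
# Scores copied from the updated table for three models.
# Column order: arc_challenge, arc_easy, boolq, hellaswag, openbookqa, piqa, winogrande.
scores = {
    "gpt2": [0.190273, 0.438131, 0.487156, 0.289185, 0.164, 0.628945, 0.51618],
    "dlite-v1-124m": [0.223549, 0.462542, 0.502446, 0.293268, 0.17, 0.622416, 0.494081],
    "dlite-v2-124m": [0.199659, 0.447811, 0.494801, 0.291675, 0.156, 0.620239, 0.487766],
}


def mean_score(values):
    """Unweighted arithmetic mean across benchmarks (an assumption, see above)."""
    return sum(values) / len(values)


# Sort model names by mean score, lowest first, as the README describes.
ranked = sorted(scores, key=lambda name: mean_score(scores[name]))

for name in ranked:
    print(f"{name:<15} {mean_score(scores[name]):.4f}")
```

Under this assumption the three models come out in the same order as the first three rows of the updated table: `dlite-v2-124m`, `gpt2`, `dlite-v1-124m`.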