---
license: apache-2.0
datasets:
- cerebras/SlimPajama-627B
- togethercomputer/RedPajama-Data-1T
language:
- en

---

A modified GPT-2 model with only 25 million non-embedding parameters that outbenches GPT-2 (124M), Pythia-70m/160m, and Cerebras-111M. It uses ScaledSinusoidal position embeddings, embedding layernorm, and no biases, and was trained at home on 2x A6000 on only 8 billion tokens of the SlimPajama dataset. (In the graphic at the bottom it is mislabeled as cramp-41m.)
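
Since the repo ships custom modeling code, loading it requires `trust_remote_code=True`. A minimal usage sketch; the repo id below is a placeholder, so substitute the actual repository id:

```python
# Minimal loading/generation sketch. The repo id is a placeholder --
# replace it with the actual repository id for cramp-25m.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/cramp-25m"  # placeholder repo id

# trust_remote_code=True is needed because the model uses custom modeling code
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For reference, "ScaledSinusoidal" means fixed sinusoidal position embeddings multiplied by a single learnable scale. A rough sketch of the idea, not necessarily this repo's exact implementation:

```python
import math
import torch
import torch.nn as nn

class ScaledSinusoidal(nn.Module):
    """Sinusoidal position embeddings with one learnable scale (sketch)."""
    def __init__(self, dim: int, max_len: int = 2048):
        super().__init__()
        pe = torch.zeros(max_len, dim)
        position = torch.arange(max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)
        # single learnable scalar; init value here is an assumption
        self.scale = nn.Parameter(torch.tensor(1.0 / math.sqrt(dim)))

    def forward(self, x):  # x: (batch, seq_len, dim)
        return x + self.scale * self.pe[: x.size(1)]
```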


**OLD BENCHMARK**
| model | avg | arc | hellaswag | mmlu | truthfulqa |
| --- | --- | --- | --- | --- | --- |
| cramp-25m | 30.57 | 21.76 | 27.35 | 25.53 | 47.66 |
| gpt2 (125m) | 30.06 | 22.1 | 31.6 | 25.86 | 40.67 | 
| pythia 70m deduped | 30.25 | 21.08 | 27.17 | 25.26 | 47.51 |
| pythia 70m | 30.46 | 21.59 | 27.29 | 25.9 | 47.06 |
| pythia 160m deduped | 31.16 | 24.06 | 30.34 | 24.95 | 44.34 |
| pythia 160m | 30.58 | 22.78 | 30.34 | 24.95 | 44.26 |

**NEW BENCHMARK**

|    Tasks     |Version|Filter|n-shot| Metric |Value |   |Stderr|
|--------------|------:|------|-----:|--------|-----:|---|-----:|
|arc_challenge |      1|none  |    25|acc     |0.1724|±  |0.0110|
|              |       |none  |    25|acc_norm|0.2031|±  |0.0118|
|truthfulqa_mc2|      2|none  |     0|acc     |0.4767|±  |0.0156|
|hellaswag     |      1|none  |    10|acc     |0.2687|±  |0.0044|
|              |       |none  |    10|acc_norm|0.2773|±  |0.0045|
|winogrande    |      1|none  |     5|acc     |0.5028|±  |0.0141|
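
These tables are in the output format of EleutherAI's lm-evaluation-harness. A rough reproduction sketch using the harness's Python API (assuming lm-eval v0.4+ and the placeholder repo id from above; run one call per n-shot setting):

```python
# Rough reproduction sketch with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Assumes the v0.4+ Python API and a placeholder
# repo id. Each task above uses its own n-shot count, so run one
# simple_evaluate call per setting (here: 25-shot ARC-Challenge).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-username/cramp-25m,trust_remote_code=True",
    tasks=["arc_challenge"],
    num_fewshot=25,
)
print(results["results"]["arc_challenge"])
```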

**MMLU**

|               Tasks               |Version|Filter|n-shot|Metric|Value |   |Stderr|
|-----------------------------------|------:|------|-----:|------|-----:|---|-----:|
|world_religions                    |      0|none  |     5|acc   |0.1813|±  |0.0295|
|virology                           |      0|none  |     5|acc   |0.1928|±  |0.0307|
|us_foreign_policy                  |      0|none  |     5|acc   |0.2900|±  |0.0456|
|sociology                          |      0|none  |     5|acc   |0.2438|±  |0.0304|
|security_studies                   |      0|none  |     5|acc   |0.2367|±  |0.0272|
|public_relations                   |      0|none  |     5|acc   |0.2455|±  |0.0412|
|professional_psychology            |      0|none  |     5|acc   |0.2271|±  |0.0169|
|professional_medicine              |      0|none  |     5|acc   |0.4375|±  |0.0301|
|professional_law                   |      0|none  |     5|acc   |0.2490|±  |0.0110|
|professional_accounting            |      0|none  |     5|acc   |0.2589|±  |0.0261|
|prehistory                         |      0|none  |     5|acc   |0.2963|±  |0.0254|
|philosophy                         |      0|none  |     5|acc   |0.2315|±  |0.0240|
|nutrition                          |      0|none  |     5|acc   |0.2222|±  |0.0238|
|moral_scenarios                    |      0|none  |     5|acc   |0.2313|±  |0.0141|
|moral_disputes                     |      0|none  |     5|acc   |0.2168|±  |0.0222|
|miscellaneous                      |      0|none  |     5|acc   |0.2708|±  |0.0159|
|medical_genetics                   |      0|none  |     5|acc   |0.3000|±  |0.0461|
|marketing                          |      0|none  |     5|acc   |0.1923|±  |0.0258|
|management                         |      0|none  |     5|acc   |0.1942|±  |0.0392|
|machine_learning                   |      0|none  |     5|acc   |0.2054|±  |0.0383|
|logical_fallacies                  |      0|none  |     5|acc   |0.2393|±  |0.0335|
|jurisprudence                      |      0|none  |     5|acc   |0.2130|±  |0.0396|
|international_law                  |      0|none  |     5|acc   |0.2562|±  |0.0398|
|human_sexuality                    |      0|none  |     5|acc   |0.2366|±  |0.0373|
|human_aging                        |      0|none  |     5|acc   |0.2063|±  |0.0272|
|high_school_world_history          |      0|none  |     5|acc   |0.2700|±  |0.0289|
|high_school_us_history             |      0|none  |     5|acc   |0.2206|±  |0.0291|
|high_school_statistics             |      0|none  |     5|acc   |0.4722|±  |0.0340|
|high_school_psychology             |      0|none  |     5|acc   |0.2257|±  |0.0179|
|high_school_physics                |      0|none  |     5|acc   |0.2384|±  |0.0348|
|high_school_microeconomics         |      0|none  |     5|acc   |0.3403|±  |0.0308|
|high_school_mathematics            |      0|none  |     5|acc   |0.2630|±  |0.0268|
|high_school_macroeconomics         |      0|none  |     5|acc   |0.2051|±  |0.0205|
|high_school_government_and_politics|      0|none  |     5|acc   |0.2280|±  |0.0303|
|high_school_geography              |      0|none  |     5|acc   |0.3535|±  |0.0341|
|high_school_european_history       |      0|none  |     5|acc   |0.2909|±  |0.0355|
|high_school_computer_science       |      0|none  |     5|acc   |0.2400|±  |0.0429|
|high_school_chemistry              |      0|none  |     5|acc   |0.2759|±  |0.0314|
|high_school_biology                |      0|none  |     5|acc   |0.3161|±  |0.0265|
|global_facts                       |      0|none  |     5|acc   |0.2000|±  |0.0402|
|formal_logic                       |      0|none  |     5|acc   |0.1825|±  |0.0346|
|elementary_mathematics             |      0|none  |     5|acc   |0.2566|±  |0.0225|
|electrical_engineering             |      0|none  |     5|acc   |0.2414|±  |0.0357|
|econometrics                       |      0|none  |     5|acc   |0.2544|±  |0.0410|
|conceptual_physics                 |      0|none  |     5|acc   |0.2809|±  |0.0294|
|computer_security                  |      0|none  |     5|acc   |0.2000|±  |0.0402|
|college_physics                    |      0|none  |     5|acc   |0.3431|±  |0.0472|
|college_medicine                   |      0|none  |     5|acc   |0.2197|±  |0.0316|
|college_mathematics                |      0|none  |     5|acc   |0.3100|±  |0.0465|
|college_computer_science           |      0|none  |     5|acc   |0.3100|±  |0.0465|
|college_chemistry                  |      0|none  |     5|acc   |0.3400|±  |0.0476|
|college_biology                    |      0|none  |     5|acc   |0.2083|±  |0.0340|
|clinical_knowledge                 |      0|none  |     5|acc   |0.2189|±  |0.0254|
|business_ethics                    |      0|none  |     5|acc   |0.2000|±  |0.0402|
|astronomy                          |      0|none  |     5|acc   |0.2237|±  |0.0339|
|anatomy                            |      0|none  |     5|acc   |0.3333|±  |0.0407|
|abstract_algebra                   |      0|none  |     5|acc   |0.2200|±  |0.0416|

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6079949388160e14e4e2e499/NzTdlxtBDp4drBRZgJiXt.png)