Tags: Text Generation, Transformers, PyTorch, English, gpt2a, custom_code

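A modified GPT-2 model with only 25 million non-embedding parameters that benchmarks competitively with GPT-2 (124M), Pythia-70M/160M, and Cerebras-111M. On the old-benchmark average below, it beats GPT-2 and both Pythia-70M variants and roughly matches the non-deduped Pythia-160M. It uses ScaledSinusoidal position embeddings, a LayerNorm on the embeddings, and no bias terms. It was trained at home on two NVIDIA RTX A6000 GPUs using only 8 billion tokens of the SlimPajama dataset. (In the accompanying graphic, the model is mislabeled as cramp-41m.)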
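The architecture notes above can be illustrated with a small, dependency-free sketch of an embedding block. The block adds scaled sinusoidal position embeddings to token embeddings, then applies a LayerNorm that has a gain but no bias. The position encoding follows the ScaledSinusoidalEmbedding in the x-transformers library: sin and cos halves multiplied by a single scale initialized to dim^-0.5. Several details are assumptions and may not match cramp-25m's custom code: the frequencies, the initialization, and the order (add positions, then normalize). The `embed` helper is hypothetical, and in the real model the scale and gain are learned parameters rather than plain floats.

```python
import math
from typing import Optional


def scaled_sinusoidal(seq_len: int, dim: int, scale: Optional[float] = None,
                      theta: float = 10000.0) -> list:
    """Sinusoidal position table (sin half, then cos half) times one scalar scale.

    Assumption: mirrors x-transformers' ScaledSinusoidalEmbedding, where the
    scale is a learnable scalar initialised to dim ** -0.5. Here it is a float.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    if scale is None:
        scale = dim ** -0.5
    half = dim // 2
    inv_freq = [theta ** (-i / half) for i in range(half)]
    table = []
    for pos in range(seq_len):
        angles = [pos * f for f in inv_freq]
        row = [math.sin(a) for a in angles] + [math.cos(a) for a in angles]
        table.append([scale * v for v in row])
    return table


def layer_norm_no_bias(x: list, gain: list, eps: float = 1e-5) -> list:
    """LayerNorm with a per-feature gain and no bias term."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std for v, g in zip(x, gain)]


def embed(token_ids: list, token_table: list, gain: list) -> list:
    """Hypothetical helper: token embedding + scaled sinusoidal position,
    followed by the embedding LayerNorm (ordering is an assumption)."""
    dim = len(token_table[0])
    positions = scaled_sinusoidal(len(token_ids), dim)
    out = []
    for pos, tok in enumerate(token_ids):
        summed = [t + p for t, p in zip(token_table[tok], positions[pos])]
        out.append(layer_norm_no_bias(summed, gain))
    return out
```

For example, `scaled_sinusoidal(1, 4)` returns `[[0.0, 0.0, 0.5, 0.5]]`: position 0 gives sin 0 = 0 and cos 0 = 1, each scaled by 4^-0.5 = 0.5. A single learned scale keeps the fixed sinusoids from dominating small token embeddings at initialization, and the embedding LayerNorm then normalizes their sum.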

OLD BENCHMARK

Model                 Avg    ARC    HellaSwag  MMLU   TruthfulQA
cramp-25m             30.57  21.76  27.35      25.53  47.66
gpt2 (124m)           30.06  22.1   31.6       25.86  40.67
pythia-70m-deduped    30.25  21.08  27.17      25.26  47.51
pythia-70m            30.46  21.59  27.29      25.9   47.06
pythia-160m-deduped   31.16  24.06  30.34      24.95  44.34
pythia-160m           30.58  22.78  30.34      24.95  44.26
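The Avg column appears to be the unweighted mean of the four benchmark scores. The card does not say so, so treat that as an assumption. The short check below recomputes the column from the table. Under that assumption it reproduces five of the six rows to within rounding. For pythia-160m-deduped, the listed scores average about 30.92 rather than 31.16. Its HellaSwag and MMLU values are also identical to the non-deduped row, so that row may contain a copy error.

```python
# Scores copied from the OLD BENCHMARK table: (ARC, HellaSwag, MMLU, TruthfulQA).
SCORES = {
    "cramp-25m": (21.76, 27.35, 25.53, 47.66),
    "gpt2": (22.1, 31.6, 25.86, 40.67),
    "pythia-70m-deduped": (21.08, 27.17, 25.26, 47.51),
    "pythia-70m": (21.59, 27.29, 25.9, 47.06),
    "pythia-160m-deduped": (24.06, 30.34, 24.95, 44.34),
    "pythia-160m": (22.78, 30.34, 24.95, 44.26),
}

# The "Avg" column as printed in the table.
REPORTED_AVG = {
    "cramp-25m": 30.57,
    "gpt2": 30.06,
    "pythia-70m-deduped": 30.25,
    "pythia-70m": 30.46,
    "pythia-160m-deduped": 31.16,
    "pythia-160m": 30.58,
}


def unweighted_mean(values):
    """Assumption: Avg is the plain mean of the four benchmark scores."""
    return sum(values) / len(values)


def mismatches(tolerance=0.01):
    """Models whose recomputed mean differs from the printed Avg by more than tolerance."""
    return [
        name
        for name, scores in SCORES.items()
        if abs(unweighted_mean(scores) - REPORTED_AVG[name]) > tolerance
    ]


if __name__ == "__main__":
    for name, scores in SCORES.items():
        print(f"{name:20s} recomputed={unweighted_mean(scores):.4f} printed={REPORTED_AVG[name]:.2f}")
    print("mismatches:", mismatches())
```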

NEW BENCHMARK

Task            Version  Filter  n-shot  Metric    Value   Stderr
arc_challenge   1        none    25      acc       0.1724  ± 0.0110
arc_challenge   1        none    25      acc_norm  0.2031  ± 0.0118
truthfulqa_mc2  2        none    0       acc       0.4767  ± 0.0156
hellaswag       1        none    10      acc       0.2687  ± 0.0044
hellaswag       1        none    10      acc_norm  0.2773  ± 0.0045
winogrande      1        none    5       acc       0.5028  ± 0.0141
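The Value and Stderr columns appear to follow the lm-evaluation-harness output layout, where Stderr is the standard error of the mean score. Under a normal approximation, a 95% confidence interval is Value ± 1.96 × Stderr, which helps judge whether a score is distinguishable from random guessing. The chance levels below are supplied here and do not come from the card: 0.5 for WinoGrande's two options and 0.25 for HellaSwag's four candidate endings.

```python
# 95% two-sided critical value of the standard normal distribution.
Z_95 = 1.96


def ci95(value, stderr, z=Z_95):
    """Normal-approximation confidence interval: value +/- z * stderr."""
    half_width = z * stderr
    return value - half_width, value + half_width


# (task, metric, value, stderr, chance level). Values are copied from the table;
# chance levels are an assumption based on the number of answer options per item.
ROWS = [
    ("winogrande", "acc", 0.5028, 0.0141, 0.5),
    ("hellaswag", "acc_norm", 0.2773, 0.0045, 0.25),
]

for task, metric, value, stderr, chance in ROWS:
    lo, hi = ci95(value, stderr)
    verdict = "contains" if lo <= chance <= hi else "excludes"
    print(f"{task} {metric}: 95% CI [{lo:.4f}, {hi:.4f}] {verdict} chance level {chance}")
```

With these numbers, the WinoGrande interval (about 0.4752 to 0.5304) contains 0.5. The HellaSwag acc_norm interval (about 0.2685 to 0.2861) lies above 0.25.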

MMLU

Task                                 Version  Filter  n-shot  Metric  Value   Stderr
world_religions                      0        none    5       acc     0.1813  ± 0.0295
virology                             0        none    5       acc     0.1928  ± 0.0307
us_foreign_policy                    0        none    5       acc     0.2900  ± 0.0456
sociology                            0        none    5       acc     0.2438  ± 0.0304
security_studies                     0        none    5       acc     0.2367  ± 0.0272
public_relations                     0        none    5       acc     0.2455  ± 0.0412
professional_psychology              0        none    5       acc     0.2271  ± 0.0169
professional_medicine                0        none    5       acc     0.4375  ± 0.0301
professional_law                     0        none    5       acc     0.2490  ± 0.0110
professional_accounting              0        none    5       acc     0.2589  ± 0.0261
prehistory                           0        none    5       acc     0.2963  ± 0.0254
philosophy                           0        none    5       acc     0.2315  ± 0.0240
nutrition                            0        none    5       acc     0.2222  ± 0.0238
moral_scenarios                      0        none    5       acc     0.2313  ± 0.0141
moral_disputes                       0        none    5       acc     0.2168  ± 0.0222
miscellaneous                        0        none    5       acc     0.2708  ± 0.0159
medical_genetics                     0        none    5       acc     0.3000  ± 0.0461
marketing                            0        none    5       acc     0.1923  ± 0.0258
management                           0        none    5       acc     0.1942  ± 0.0392
machine_learning                     0        none    5       acc     0.2054  ± 0.0383
logical_fallacies                    0        none    5       acc     0.2393  ± 0.0335
jurisprudence                        0        none    5       acc     0.2130  ± 0.0396
international_law                    0        none    5       acc     0.2562  ± 0.0398
human_sexuality                      0        none    5       acc     0.2366  ± 0.0373
human_aging                          0        none    5       acc     0.2063  ± 0.0272
high_school_world_history            0        none    5       acc     0.2700  ± 0.0289
high_school_us_history               0        none    5       acc     0.2206  ± 0.0291
high_school_statistics               0        none    5       acc     0.4722  ± 0.0340
high_school_psychology               0        none    5       acc     0.2257  ± 0.0179
high_school_physics                  0        none    5       acc     0.2384  ± 0.0348
high_school_microeconomics           0        none    5       acc     0.3403  ± 0.0308
high_school_mathematics              0        none    5       acc     0.2630  ± 0.0268
high_school_macroeconomics           0        none    5       acc     0.2051  ± 0.0205
high_school_government_and_politics  0        none    5       acc     0.2280  ± 0.0303
high_school_geography                0        none    5       acc     0.3535  ± 0.0341
high_school_european_history         0        none    5       acc     0.2909  ± 0.0355
high_school_computer_science         0        none    5       acc     0.2400  ± 0.0429
high_school_chemistry                0        none    5       acc     0.2759  ± 0.0314
high_school_biology                  0        none    5       acc     0.3161  ± 0.0265
global_facts                         0        none    5       acc     0.2000  ± 0.0402
formal_logic                         0        none    5       acc     0.1825  ± 0.0346
elementary_mathematics               0        none    5       acc     0.2566  ± 0.0225
electrical_engineering               0        none    5       acc     0.2414  ± 0.0357
econometrics                         0        none    5       acc     0.2544  ± 0.0410
conceptual_physics                   0        none    5       acc     0.2809  ± 0.0294
computer_security                    0        none    5       acc     0.2000  ± 0.0402
college_physics                      0        none    5       acc     0.3431  ± 0.0472
college_medicine                     0        none    5       acc     0.2197  ± 0.0316
college_mathematics                  0        none    5       acc     0.3100  ± 0.0465
college_computer_science             0        none    5       acc     0.3100  ± 0.0465
college_chemistry                    0        none    5       acc     0.3400  ± 0.0476
college_biology                      0        none    5       acc     0.2083  ± 0.0340
clinical_knowledge                   0        none    5       acc     0.2189  ± 0.0254
business_ethics                      0        none    5       acc     0.2000  ± 0.0402
astronomy                            0        none    5       acc     0.2237  ± 0.0339
anatomy                              0        none    5       acc     0.3333  ± 0.0407
abstract_algebra                     0        none    5       acc     0.2200  ± 0.0416
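One simple way to summarize the per-subject rows is a macro-average, which is the unweighted mean of the subject accuracies. This is an illustrative choice. The card does not say how the MMLU figure in the old benchmark table (25.53) was aggregated, and some evaluations instead weight subjects by question count. The sketch parses rows in the whitespace-separated layout used above. For brevity, SAMPLE holds only three rows copied from the table; pasting in the full table gives the 57-subject macro-average.

```python
from statistics import mean


def parse_rows(text):
    """Parse harness-style rows: task version filter n-shot metric value ± stderr."""
    rows = {}
    for line in text.strip().splitlines():
        parts = line.split()
        # Header lines and anything malformed lack the 8-field "value ± stderr" shape.
        if len(parts) != 8 or parts[6] != "±":
            continue
        task, _version, _filter, _nshot, metric, value, _pm, stderr = parts
        rows[task] = (metric, float(value), float(stderr))
    return rows


SAMPLE = """
Task Version Filter n-shot Metric Value Stderr
world_religions 0 none 5 acc 0.1813 ± 0.0295
virology 0 none 5 acc 0.1928 ± 0.0307
us_foreign_policy 0 none 5 acc 0.2900 ± 0.0456
"""

rows = parse_rows(SAMPLE)
# Macro-average: every subject counts equally, regardless of its question count.
macro = mean(value for _metric, value, _stderr in rows.values())
print(f"{len(rows)} subjects, macro-average acc = {macro:.4f}")
```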


INFERENCE

This model repo contains custom code, which the serverless Inference API does not yet support.
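The model is defined by custom code in its repo, as the custom_code and gpt2a tags indicate, so loading it locally with Hugging Face transformers needs `trust_remote_code=True`. The sketch below is hedged and rests on three assumptions that the card does not confirm. It uses the repo id crumbly/cramp-25m named on this card. It assumes the repo ships tokenizer files that AutoTokenizer can load. It also assumes the custom model class supports the standard `generate()` method.

```python
def generate(prompt: str, repo_id: str = "crumbly/cramp-25m", max_new_tokens: int = 40) -> str:
    """Greedy generation with cramp-25m (sketch; assumptions noted inline)."""
    # Imported here so this module can be loaded without transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repo includes tokenizer files usable by AutoTokenizer.
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # trust_remote_code=True lets transformers run the modeling code shipped in the repo.
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    # Assumption: the custom model class supports the standard generate() API.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

The first call downloads the weights and code from the Hub. Because `trust_remote_code=True` executes that code, it is worth reading the repo's modeling files first.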

Dataset used to train crumbly/cramp-25m: SlimPajama