---
license: apache-2.0
datasets:
  - cerebras/SlimPajama-627B
  - togethercomputer/RedPajama-Data-1T
language:
  - en
---

Open LLM Leaderboard Average Score: 0.3057

This is just above base gpt2 on average only because its TruthfulQA score pulls the average up: it trails gpt2 (125m) on ARC, HellaSwag and MMLU, but its TruthfulQA score is higher than that of any base gpt2 model. It is also just under pythia-160m on average score (30.57 vs. 30.58, a gap of 0.01 points) and …

| model | avg | arc | hellaswag | mmlu | truthfulqa |
|---|---|---|---|---|---|
| cramp-41m | 30.57 | 21.76 | 27.35 | 25.53 | 47.66 |
| gpt2 (125m) | 30.06 | 22.1 | 31.6 | 25.86 | 40.67 |
| pythia 70m deduped | 30.25 | 21.08 | 27.17 | 25.26 | 47.51 |
| pythia 70m | 30.46 | 21.59 | 27.29 | 25.9 | 47.06 |
| pythia 160m deduped | 31.16 | 24.06 | 30.34 | 24.95 | 44.34 |
| pythia 160m | 30.58 | 22.78 | 30.34 | 24.95 | 44.26 |

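A quick way to check the claims above is to recompute the averages, assuming the leaderboard average is the unweighted mean of the four benchmark columns (the cramp-41m and gpt2 rows are consistent with this). The sketch below does that and also prints the per-benchmark gap between cramp-41m and gpt2 (125m); the dictionary layout and function names are illustrative only.

```python
# Scores copied from the comparison table above (percent).
SCORES = {
    "cramp-41m": {"arc": 21.76, "hellaswag": 27.35, "mmlu": 25.53, "truthfulqa": 47.66},
    "gpt2 (125m)": {"arc": 22.1, "hellaswag": 31.6, "mmlu": 25.86, "truthfulqa": 40.67},
}


def leaderboard_average(scores: dict[str, float]) -> float:
    """Unweighted mean of the benchmark scores (assumed leaderboard formula)."""
    return sum(scores.values()) / len(scores)


def deltas(model: str, baseline: str) -> dict[str, float]:
    """Per-benchmark difference (model - baseline) in percentage points."""
    return {
        task: SCORES[model][task] - SCORES[baseline][task]
        for task in SCORES[model]
    }


if __name__ == "__main__":
    for name, row in SCORES.items():
        print(f"{name}: average = {leaderboard_average(row):.4f}")
    for task, diff in deltas("cramp-41m", "gpt2 (125m)").items():
        print(f"{task}: {diff:+.2f}")
```

Only the TruthfulQA difference comes out positive (about +6.99 points), which is what lifts the cramp-41m average above gpt2's.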

|Task|Version|Metric|Value| |Stderr|
|---|---:|---|---:|---|---:|
|arc_challenge|0|acc|0.1741|±|0.0111|
| | |acc_norm|0.2176|±|0.0121|


|Task|Version|Metric|Value| |Stderr|
|---|---:|---|---:|---|---:|
|hellaswag|0|acc|0.2698|±|0.0044|
| | |acc_norm|0.2735|±|0.0044|


|Task|Version|Metric|Value| |Stderr|
|---|---:|---|---:|---|---:|
|truthfulqa_mc|1|mc1|0.2803|±|0.0157|
| | |mc2|0.4766|±|0.0156|

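The Stderr column can be read as the standard error of a mean accuracy over 0/1 outcomes. As a rough sketch, assuming the sample-variance form sqrt(p(1 - p)/(n - 1)) and assuming split sizes of 1172 questions for the ARC-Challenge test set and 10042 for the HellaSwag validation set (neither size is stated in this card), the reported ARC and HellaSwag stderr values are reproduced to four decimals. A normal-approximation 95% interval then follows as value ± 1.96 × stderr. `ASSUMED_N` and the function names are hypothetical.

```python
import math

# Assumed number of scored examples per task (not stated in this card).
ASSUMED_N = {"arc_challenge": 1172, "hellaswag": 10042}


def accuracy_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 outcomes with accuracy p.

    The sample variance of those outcomes is n*p*(1-p)/(n-1); dividing by n
    and taking the square root gives sqrt(p*(1-p)/(n-1)).
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


def normal_ci95(p: float, se: float) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval around p."""
    return p - 1.96 * se, p + 1.96 * se


if __name__ == "__main__":
    # (task, metric, value) copied from the tables above.
    for task, metric, value in [
        ("arc_challenge", "acc", 0.1741),
        ("arc_challenge", "acc_norm", 0.2176),
        ("hellaswag", "acc", 0.2698),
        ("hellaswag", "acc_norm", 0.2735),
    ]:
        se = accuracy_stderr(value, ASSUMED_N[task])
        lo, hi = normal_ci95(value, se)
        print(f"{task} {metric}: stderr ~ {se:.4f}, 95% CI ~ [{lo:.4f}, {hi:.4f}]")
```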

|Task|Version|Metric|Value| |Stderr|
|---|---:|---|---:|---|---:|
|hendrycksTest-abstract_algebra|1|acc|0.2200|±|0.0416|
| | |acc_norm|0.2200|±|0.0416|
|hendrycksTest-anatomy|1|acc|0.3333|±|0.0407|
| | |acc_norm|0.3333|±|0.0407|
|hendrycksTest-astronomy|1|acc|0.2237|±|0.0339|
| | |acc_norm|0.2237|±|0.0339|
|hendrycksTest-business_ethics|1|acc|0.2000|±|0.0402|
| | |acc_norm|0.2000|±|0.0402|
|hendrycksTest-clinical_knowledge|1|acc|0.2189|±|0.0254|
| | |acc_norm|0.2189|±|0.0254|
|hendrycksTest-college_biology|1|acc|0.2083|±|0.0340|
| | |acc_norm|0.2083|±|0.0340|
|hendrycksTest-college_chemistry|1|acc|0.3400|±|0.0476|
| | |acc_norm|0.3400|±|0.0476|
|hendrycksTest-college_computer_science|1|acc|0.3100|±|0.0465|
| | |acc_norm|0.3100|±|0.0465|
|hendrycksTest-college_mathematics|1|acc|0.3100|±|0.0465|
| | |acc_norm|0.3100|±|0.0465|
|hendrycksTest-college_medicine|1|acc|0.2197|±|0.0316|
| | |acc_norm|0.2197|±|0.0316|
|hendrycksTest-college_physics|1|acc|0.3431|±|0.0472|
| | |acc_norm|0.3431|±|0.0472|
|hendrycksTest-computer_security|1|acc|0.2000|±|0.0402|
| | |acc_norm|0.2000|±|0.0402|
|hendrycksTest-conceptual_physics|1|acc|0.2809|±|0.0294|
| | |acc_norm|0.2809|±|0.0294|
|hendrycksTest-econometrics|1|acc|0.2544|±|0.0410|
| | |acc_norm|0.2544|±|0.0410|
|hendrycksTest-electrical_engineering|1|acc|0.2414|±|0.0357|
| | |acc_norm|0.2414|±|0.0357|
|hendrycksTest-elementary_mathematics|1|acc|0.2566|±|0.0225|
| | |acc_norm|0.2566|±|0.0225|
|hendrycksTest-formal_logic|1|acc|0.1825|±|0.0346|
| | |acc_norm|0.1825|±|0.0346|
|hendrycksTest-global_facts|1|acc|0.2000|±|0.0402|
| | |acc_norm|0.2000|±|0.0402|
|hendrycksTest-high_school_biology|1|acc|0.3161|±|0.0265|
| | |acc_norm|0.3161|±|0.0265|
|hendrycksTest-high_school_chemistry|1|acc|0.2759|±|0.0314|
| | |acc_norm|0.2759|±|0.0314|
|hendrycksTest-high_school_computer_science|1|acc|0.2400|±|0.0429|
| | |acc_norm|0.2400|±|0.0429|
|hendrycksTest-high_school_european_history|1|acc|0.2909|±|0.0355|
| | |acc_norm|0.2909|±|0.0355|
|hendrycksTest-high_school_geography|1|acc|0.3535|±|0.0341|
| | |acc_norm|0.3535|±|0.0341|
|hendrycksTest-high_school_government_and_politics|1|acc|0.2280|±|0.0303|
| | |acc_norm|0.2280|±|0.0303|
|hendrycksTest-high_school_macroeconomics|1|acc|0.2051|±|0.0205|
| | |acc_norm|0.2051|±|0.0205|
|hendrycksTest-high_school_mathematics|1|acc|0.2630|±|0.0268|
| | |acc_norm|0.2630|±|0.0268|
|hendrycksTest-high_school_microeconomics|1|acc|0.3403|±|0.0308|
| | |acc_norm|0.3403|±|0.0308|
|hendrycksTest-high_school_physics|1|acc|0.2384|±|0.0348|
| | |acc_norm|0.2384|±|0.0348|
|hendrycksTest-high_school_psychology|1|acc|0.2257|±|0.0179|
| | |acc_norm|0.2257|±|0.0179|
|hendrycksTest-high_school_statistics|1|acc|0.4722|±|0.0340|
| | |acc_norm|0.4722|±|0.0340|
|hendrycksTest-high_school_us_history|1|acc|0.2206|±|0.0291|
| | |acc_norm|0.2206|±|0.0291|
|hendrycksTest-high_school_world_history|1|acc|0.2658|±|0.0288|
| | |acc_norm|0.2658|±|0.0288|
|hendrycksTest-human_aging|1|acc|0.2063|±|0.0272|
| | |acc_norm|0.2063|±|0.0272|
|hendrycksTest-human_sexuality|1|acc|0.2366|±|0.0373|
| | |acc_norm|0.2366|±|0.0373|
|hendrycksTest-international_law|1|acc|0.2562|±|0.0398|
| | |acc_norm|0.2562|±|0.0398|
|hendrycksTest-jurisprudence|1|acc|0.2130|±|0.0396|
| | |acc_norm|0.2130|±|0.0396|
|hendrycksTest-logical_fallacies|1|acc|0.2393|±|0.0335|
| | |acc_norm|0.2393|±|0.0335|
|hendrycksTest-machine_learning|1|acc|0.2054|±|0.0383|
| | |acc_norm|0.2054|±|0.0383|
|hendrycksTest-management|1|acc|0.1942|±|0.0392|
| | |acc_norm|0.1942|±|0.0392|
|hendrycksTest-marketing|1|acc|0.1923|±|0.0258|
| | |acc_norm|0.1923|±|0.0258|
|hendrycksTest-medical_genetics|1|acc|0.3000|±|0.0461|
| | |acc_norm|0.3000|±|0.0461|
|hendrycksTest-miscellaneous|1|acc|0.2708|±|0.0159|
| | |acc_norm|0.2708|±|0.0159|
|hendrycksTest-moral_disputes|1|acc|0.2168|±|0.0222|
| | |acc_norm|0.2168|±|0.0222|
|hendrycksTest-moral_scenarios|1|acc|0.2313|±|0.0141|
| | |acc_norm|0.2313|±|0.0141|
|hendrycksTest-nutrition|1|acc|0.2222|±|0.0238|
| | |acc_norm|0.2222|±|0.0238|
|hendrycksTest-philosophy|1|acc|0.2315|±|0.0240|
| | |acc_norm|0.2315|±|0.0240|
|hendrycksTest-prehistory|1|acc|0.2963|±|0.0254|
| | |acc_norm|0.2963|±|0.0254|
|hendrycksTest-professional_accounting|1|acc|0.2589|±|0.0261|
| | |acc_norm|0.2589|±|0.0261|
|hendrycksTest-professional_law|1|acc|0.2490|±|0.0110|
| | |acc_norm|0.2490|±|0.0110|
|hendrycksTest-professional_medicine|1|acc|0.4375|±|0.0301|
| | |acc_norm|0.4375|±|0.0301|
|hendrycksTest-professional_psychology|1|acc|0.2271|±|0.0169|
| | |acc_norm|0.2271|±|0.0169|
|hendrycksTest-public_relations|1|acc|0.2455|±|0.0412|
| | |acc_norm|0.2455|±|0.0412|
|hendrycksTest-security_studies|1|acc|0.2367|±|0.0272|
| | |acc_norm|0.2367|±|0.0272|
|hendrycksTest-sociology|1|acc|0.2438|±|0.0304|
| | |acc_norm|0.2438|±|0.0304|
|hendrycksTest-us_foreign_policy|1|acc|0.2900|±|0.0456|
| | |acc_norm|0.2900|±|0.0456|
|hendrycksTest-virology|1|acc|0.1928|±|0.0307|
| | |acc_norm|0.1928|±|0.0307|
|hendrycksTest-world_religions|1|acc|0.1813|±|0.0295|
| | |acc_norm|0.1813|±|0.0295|

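Average MMLU accuracy (unweighted mean of the 57 hendrycksTest subtask `acc` values above): 0.2553175438596491, which rounds to the 25.53 listed for cramp-41m in the comparison table.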
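The MMLU figure is an unweighted mean: each of the 57 subtasks counts equally, regardless of how many questions it contains. The minimal sketch below reproduces it from the `acc` column. It also recombines the result with ARC `acc_norm`, HellaSwag `acc_norm` and TruthfulQA `mc2`, which are the metrics whose values match the comparison table (an inference, not something this card states). That gives about 0.30575, within 0.0001 of the headline 0.3057. The constant and function names are hypothetical.

```python
from statistics import fmean

# hendrycksTest (MMLU) subtask acc values copied from the table above,
# in the same order (abstract_algebra ... world_religions), 57 in total.
MMLU_ACC = [
    0.2200, 0.3333, 0.2237, 0.2000, 0.2189, 0.2083, 0.3400, 0.3100,
    0.3100, 0.2197, 0.3431, 0.2000, 0.2809, 0.2544, 0.2414, 0.2566,
    0.1825, 0.2000, 0.3161, 0.2759, 0.2400, 0.2909, 0.3535, 0.2280,
    0.2051, 0.2630, 0.3403, 0.2384, 0.2257, 0.4722, 0.2206, 0.2658,
    0.2063, 0.2366, 0.2562, 0.2130, 0.2393, 0.2054, 0.1942, 0.1923,
    0.3000, 0.2708, 0.2168, 0.2313, 0.2222, 0.2315, 0.2963, 0.2589,
    0.2490, 0.4375, 0.2271, 0.2455, 0.2367, 0.2438, 0.2900, 0.1928,
    0.1813,
]

# Assumed leaderboard metrics, inferred from the matching table values.
ARC_ACC_NORM = 0.2176
HELLASWAG_ACC_NORM = 0.2735
TRUTHFULQA_MC2 = 0.4766


def mmlu_average(accs: list[float]) -> float:
    """Unweighted mean over subtasks: every subtask counts equally."""
    return fmean(accs)


def leaderboard_average(mmlu: float) -> float:
    """Unweighted mean of the four benchmark scores (fractions, not percent)."""
    return fmean([ARC_ACC_NORM, HELLASWAG_ACC_NORM, mmlu, TRUTHFULQA_MC2])


if __name__ == "__main__":
    assert len(MMLU_ACC) == 57
    mmlu = mmlu_average(MMLU_ACC)
    print(f"MMLU average: {mmlu:.6f}")
    print(f"Leaderboard average: {leaderboard_average(mmlu):.5f}")
```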