Update README.md
Browse files
README.md
CHANGED
@@ -8,10 +8,7 @@ language:
|
|
8 |
|
9 |
---
|
10 |
|
11 |
-
|
12 |
-
### Open LLM Leaderboard Average Score: 0.3057
|
13 |
-
|
14 |
-
This is just above base GPT-2 only because its TruthfulQA score brings the average up — it has a higher TruthfulQA score than any base GPT-2 model. It is also just under pythia-160m in average score (by 0.01%).
|
15 |
|
16 |
| model | avg | arc | hellaswag | mmlu | truthfulqa |
|
17 |
| --- | --- | --- | --- | --- | --- |
|
@@ -23,139 +20,5 @@ This is just above base gpt2 only because of truthfulqa score bringing the avera
|
|
23 |
| pythia 160m | 30.58 | 22.78 | 30.34 | 24.95 | 44.26 |
|
24 |
|
25 |
|
|
|
26 |
|
27 |
-
| Task |Version| Metric |Value | |Stderr|
|
28 |
-
|-------------|------:|--------|-----:|---|-----:|
|
29 |
-
|arc_challenge| 0|acc |0.1741|± |0.0111|
|
30 |
-
| | |acc_norm|**0.2176**|± |0.0121|
|
31 |
-
|
32 |
-
| Task |Version| Metric |Value | |Stderr|
|
33 |
-
|---------|------:|--------|-----:|---|-----:|
|
34 |
-
|hellaswag| 0|acc |0.2698|± |0.0044|
|
35 |
-
| | |acc_norm|**0.2735**|± |0.0044|
|
36 |
-
|
37 |
-
|
38 |
-
| Task |Version|Metric|Value | |Stderr|
|
39 |
-
|-------------|------:|------|-----:|---|-----:|
|
40 |
-
|truthfulqa_mc| 1|mc1 |0.2803|± |0.0157|
|
41 |
-
| | |mc2 |**0.4766**|± |0.0156|
|
42 |
-
|
43 |
-
|
44 |
-
| Task |Version| Metric |Value | |Stderr|
|
45 |
-
|-------------------------------------------------|------:|--------|-----:|---|-----:|
|
46 |
-
|hendrycksTest-abstract_algebra | 1|acc |0.2200|± |0.0416|
|
47 |
-
| | |acc_norm|0.2200|± |0.0416|
|
48 |
-
|hendrycksTest-anatomy | 1|acc |0.3333|± |0.0407|
|
49 |
-
| | |acc_norm|0.3333|± |0.0407|
|
50 |
-
|hendrycksTest-astronomy | 1|acc |0.2237|± |0.0339|
|
51 |
-
| | |acc_norm|0.2237|± |0.0339|
|
52 |
-
|hendrycksTest-business_ethics | 1|acc |0.2000|± |0.0402|
|
53 |
-
| | |acc_norm|0.2000|± |0.0402|
|
54 |
-
|hendrycksTest-clinical_knowledge | 1|acc |0.2189|± |0.0254|
|
55 |
-
| | |acc_norm|0.2189|± |0.0254|
|
56 |
-
|hendrycksTest-college_biology | 1|acc |0.2083|± |0.0340|
|
57 |
-
| | |acc_norm|0.2083|± |0.0340|
|
58 |
-
|hendrycksTest-college_chemistry | 1|acc |0.3400|± |0.0476|
|
59 |
-
| | |acc_norm|0.3400|± |0.0476|
|
60 |
-
|hendrycksTest-college_computer_science | 1|acc |0.3100|± |0.0465|
|
61 |
-
| | |acc_norm|0.3100|± |0.0465|
|
62 |
-
|hendrycksTest-college_mathematics | 1|acc |0.3100|± |0.0465|
|
63 |
-
| | |acc_norm|0.3100|± |0.0465|
|
64 |
-
|hendrycksTest-college_medicine | 1|acc |0.2197|± |0.0316|
|
65 |
-
| | |acc_norm|0.2197|± |0.0316|
|
66 |
-
|hendrycksTest-college_physics | 1|acc |0.3431|± |0.0472|
|
67 |
-
| | |acc_norm|0.3431|± |0.0472|
|
68 |
-
|hendrycksTest-computer_security | 1|acc |0.2000|± |0.0402|
|
69 |
-
| | |acc_norm|0.2000|± |0.0402|
|
70 |
-
|hendrycksTest-conceptual_physics | 1|acc |0.2809|± |0.0294|
|
71 |
-
| | |acc_norm|0.2809|± |0.0294|
|
72 |
-
|hendrycksTest-econometrics | 1|acc |0.2544|± |0.0410|
|
73 |
-
| | |acc_norm|0.2544|± |0.0410|
|
74 |
-
|hendrycksTest-electrical_engineering | 1|acc |0.2414|± |0.0357|
|
75 |
-
| | |acc_norm|0.2414|± |0.0357|
|
76 |
-
|hendrycksTest-elementary_mathematics | 1|acc |0.2566|± |0.0225|
|
77 |
-
| | |acc_norm|0.2566|± |0.0225|
|
78 |
-
|hendrycksTest-formal_logic | 1|acc |0.1825|± |0.0346|
|
79 |
-
| | |acc_norm|0.1825|± |0.0346|
|
80 |
-
|hendrycksTest-global_facts | 1|acc |0.2000|± |0.0402|
|
81 |
-
| | |acc_norm|0.2000|± |0.0402|
|
82 |
-
|hendrycksTest-high_school_biology | 1|acc |0.3161|± |0.0265|
|
83 |
-
| | |acc_norm|0.3161|± |0.0265|
|
84 |
-
|hendrycksTest-high_school_chemistry | 1|acc |0.2759|± |0.0314|
|
85 |
-
| | |acc_norm|0.2759|± |0.0314|
|
86 |
-
|hendrycksTest-high_school_computer_science | 1|acc |0.2400|± |0.0429|
|
87 |
-
| | |acc_norm|0.2400|± |0.0429|
|
88 |
-
|hendrycksTest-high_school_european_history | 1|acc |0.2909|± |0.0355|
|
89 |
-
| | |acc_norm|0.2909|± |0.0355|
|
90 |
-
|hendrycksTest-high_school_geography | 1|acc |0.3535|± |0.0341|
|
91 |
-
| | |acc_norm|0.3535|± |0.0341|
|
92 |
-
|hendrycksTest-high_school_government_and_politics| 1|acc |0.2280|± |0.0303|
|
93 |
-
| | |acc_norm|0.2280|± |0.0303|
|
94 |
-
|hendrycksTest-high_school_macroeconomics | 1|acc |0.2051|± |0.0205|
|
95 |
-
| | |acc_norm|0.2051|± |0.0205|
|
96 |
-
|hendrycksTest-high_school_mathematics | 1|acc |0.2630|± |0.0268|
|
97 |
-
| | |acc_norm|0.2630|± |0.0268|
|
98 |
-
|hendrycksTest-high_school_microeconomics | 1|acc |0.3403|± |0.0308|
|
99 |
-
| | |acc_norm|0.3403|± |0.0308|
|
100 |
-
|hendrycksTest-high_school_physics | 1|acc |0.2384|± |0.0348|
|
101 |
-
| | |acc_norm|0.2384|± |0.0348|
|
102 |
-
|hendrycksTest-high_school_psychology | 1|acc |0.2257|± |0.0179|
|
103 |
-
| | |acc_norm|0.2257|± |0.0179|
|
104 |
-
|hendrycksTest-high_school_statistics | 1|acc |0.4722|± |0.0340|
|
105 |
-
| | |acc_norm|0.4722|± |0.0340|
|
106 |
-
|hendrycksTest-high_school_us_history | 1|acc |0.2206|± |0.0291|
|
107 |
-
| | |acc_norm|0.2206|± |0.0291|
|
108 |
-
|hendrycksTest-high_school_world_history | 1|acc |0.2658|± |0.0288|
|
109 |
-
| | |acc_norm|0.2658|± |0.0288|
|
110 |
-
|hendrycksTest-human_aging | 1|acc |0.2063|± |0.0272|
|
111 |
-
| | |acc_norm|0.2063|± |0.0272|
|
112 |
-
|hendrycksTest-human_sexuality | 1|acc |0.2366|± |0.0373|
|
113 |
-
| | |acc_norm|0.2366|± |0.0373|
|
114 |
-
|hendrycksTest-international_law | 1|acc |0.2562|± |0.0398|
|
115 |
-
| | |acc_norm|0.2562|± |0.0398|
|
116 |
-
|hendrycksTest-jurisprudence | 1|acc |0.2130|± |0.0396|
|
117 |
-
| | |acc_norm|0.2130|± |0.0396|
|
118 |
-
|hendrycksTest-logical_fallacies | 1|acc |0.2393|± |0.0335|
|
119 |
-
| | |acc_norm|0.2393|± |0.0335|
|
120 |
-
|hendrycksTest-machine_learning | 1|acc |0.2054|± |0.0383|
|
121 |
-
| | |acc_norm|0.2054|± |0.0383|
|
122 |
-
|hendrycksTest-management | 1|acc |0.1942|± |0.0392|
|
123 |
-
| | |acc_norm|0.1942|± |0.0392|
|
124 |
-
|hendrycksTest-marketing | 1|acc |0.1923|± |0.0258|
|
125 |
-
| | |acc_norm|0.1923|± |0.0258|
|
126 |
-
|hendrycksTest-medical_genetics | 1|acc |0.3000|± |0.0461|
|
127 |
-
| | |acc_norm|0.3000|± |0.0461|
|
128 |
-
|hendrycksTest-miscellaneous | 1|acc |0.2708|± |0.0159|
|
129 |
-
| | |acc_norm|0.2708|± |0.0159|
|
130 |
-
|hendrycksTest-moral_disputes | 1|acc |0.2168|± |0.0222|
|
131 |
-
| | |acc_norm|0.2168|± |0.0222|
|
132 |
-
|hendrycksTest-moral_scenarios | 1|acc |0.2313|± |0.0141|
|
133 |
-
| | |acc_norm|0.2313|± |0.0141|
|
134 |
-
|hendrycksTest-nutrition | 1|acc |0.2222|± |0.0238|
|
135 |
-
| | |acc_norm|0.2222|± |0.0238|
|
136 |
-
|hendrycksTest-philosophy | 1|acc |0.2315|± |0.0240|
|
137 |
-
| | |acc_norm|0.2315|± |0.0240|
|
138 |
-
|hendrycksTest-prehistory | 1|acc |0.2963|± |0.0254|
|
139 |
-
| | |acc_norm|0.2963|± |0.0254|
|
140 |
-
|hendrycksTest-professional_accounting | 1|acc |0.2589|± |0.0261|
|
141 |
-
| | |acc_norm|0.2589|± |0.0261|
|
142 |
-
|hendrycksTest-professional_law | 1|acc |0.2490|± |0.0110|
|
143 |
-
| | |acc_norm|0.2490|± |0.0110|
|
144 |
-
|hendrycksTest-professional_medicine | 1|acc |0.4375|± |0.0301|
|
145 |
-
| | |acc_norm|0.4375|± |0.0301|
|
146 |
-
|hendrycksTest-professional_psychology | 1|acc |0.2271|± |0.0169|
|
147 |
-
| | |acc_norm|0.2271|± |0.0169|
|
148 |
-
|hendrycksTest-public_relations | 1|acc |0.2455|± |0.0412|
|
149 |
-
| | |acc_norm|0.2455|± |0.0412|
|
150 |
-
|hendrycksTest-security_studies | 1|acc |0.2367|± |0.0272|
|
151 |
-
| | |acc_norm|0.2367|± |0.0272|
|
152 |
-
|hendrycksTest-sociology | 1|acc |0.2438|± |0.0304|
|
153 |
-
| | |acc_norm|0.2438|± |0.0304|
|
154 |
-
|hendrycksTest-us_foreign_policy | 1|acc |0.2900|± |0.0456|
|
155 |
-
| | |acc_norm|0.2900|± |0.0456|
|
156 |
-
|hendrycksTest-virology | 1|acc |0.1928|± |0.0307|
|
157 |
-
| | |acc_norm|0.1928|± |0.0307|
|
158 |
-
|hendrycksTest-world_religions | 1|acc |0.1813|± |0.0295|
|
159 |
-
| | |acc_norm|0.1813|± |0.0295|
|
160 |
-
|
161 |
-
Average MMLU accuracy across the hendrycksTest subtasks is 0.2553 (25.53%) — though this figure should be double-checked.
|
|
|
8 |
|
9 |
---
|
10 |
|
11 |
+
A modified GPT-2 model with only 25 million non-embedding parameters that outbenches GPT-2 (124m), Pythia-70m/160m, and Cerebras-111m. It uses ScaledSinusoidal position embeddings, embedding layernorm, and no biases, and was trained on only 8 billion tokens of the SlimPajama dataset at home on 2×A6000 GPUs.
|
|
|
|
|
|
|
12 |
|
13 |
| model | avg | arc | hellaswag | mmlu | truthfulqa |
|
14 |
| --- | --- | --- | --- | --- | --- |
|
|
|
20 |
| pythia 160m | 30.58 | 22.78 | 30.34 | 24.95 | 44.26 |
|
21 |
|
22 |
|
23 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6079949388160e14e4e2e499/NzTdlxtBDp4drBRZgJiXt.png)
|
24 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|