---
datasets:
- cerebras/SlimPajama-627B
language:
- en
tags:
- llama
---

A roughly 200M-parameter model (I think the parameter count in the graphic below is wrong, but the benchmark values are correct) with the token embedding and language-modelling head of Llama2-70b attached, using linear transformations from Llama2-70b's 8192-dimensional space down to this model's 1024-dimensional space.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6079949388160e14e4e2e499/PhqViTuOrE7s65WyVRpNX.png)

| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge|Yaml |none | 25|acc |0.1775|± |0.0112|
| | |none | 25|acc_norm|0.2133|± |0.0120|
|truthfulqa_mc2|Yaml |none | 0|acc |0.4457|± |0.0152|
|winogrande|Yaml |none | 5|acc |0.5154|± | 0.014|
|hellaswag|Yaml |none | 10|acc |0.2832|± |0.0045|
| | |none | 10|acc_norm|0.3024|± |0.0046|

### MMLU (avg accuracy: 26.17%)

| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr|
|-----------------------------------|-------|------|-----:|------|-----:|---|-----:|
|abstract_algebra |Yaml |none | 5|acc |0.2200|± |0.0416|
|anatomy |Yaml |none | 5|acc |0.2222|± |0.0359|
|astronomy |Yaml |none | 5|acc |0.1776|± |0.0311|
|business_ethics |Yaml |none | 5|acc |0.2300|± |0.0423|
|clinical_knowledge |Yaml |none | 5|acc |0.2415|± |0.0263|
|college_biology |Yaml |none | 5|acc |0.3194|± |0.0390|
|college_chemistry |Yaml |none | 5|acc |0.2000|± |0.0402|
|college_computer_science |Yaml |none | 5|acc |0.2800|± |0.0451|
|college_mathematics |Yaml |none | 5|acc |0.2800|± |0.0451|
|college_medicine |Yaml |none | 5|acc |0.2254|± |0.0319|
|college_physics |Yaml |none | 5|acc |0.2157|± |0.0409|
|computer_security |Yaml |none | 5|acc |0.2200|± |0.0416|
|conceptual_physics |Yaml |none | 5|acc |0.2553|± |0.0285|
|econometrics |Yaml |none | 5|acc |0.2368|± |0.0400|
|electrical_engineering |Yaml |none | 5|acc |0.2345|± |0.0353|
|elementary_mathematics |Yaml |none | 5|acc |0.2646|± |0.0227|
|formal_logic |Yaml |none | 5|acc |0.2302|± |0.0376|
|global_facts |Yaml |none | 5|acc |0.1700|± |0.0378|
|high_school_biology |Yaml |none | 5|acc |0.2903|± |0.0258|
|high_school_chemistry |Yaml |none | 5|acc |0.2611|± |0.0309|
|high_school_computer_science |Yaml |none | 5|acc |0.2300|± |0.0423|
|high_school_european_history |Yaml |none | 5|acc |0.2788|± |0.0350|
|high_school_geography |Yaml |none | 5|acc |0.3081|± |0.0329|
|high_school_government_and_politics|Yaml |none | 5|acc |0.3731|± |0.0349|
|high_school_macroeconomics |Yaml |none | 5|acc |0.2923|± |0.0231|
|high_school_mathematics |Yaml |none | 5|acc |0.2630|± |0.0268|
|high_school_microeconomics |Yaml |none | 5|acc |0.3403|± |0.0308|
|high_school_physics |Yaml |none | 5|acc |0.2715|± |0.0363|
|high_school_psychology |Yaml |none | 5|acc |0.2881|± |0.0194|
|high_school_statistics |Yaml |none | 5|acc |0.4722|± |0.0340|
|high_school_us_history |Yaml |none | 5|acc |0.3529|± |0.0335|
|high_school_world_history |Yaml |none | 5|acc |0.2532|± |0.0283|
|human_aging |Yaml |none | 5|acc |0.2108|± |0.0274|
|human_sexuality |Yaml |none | 5|acc |0.2672|± |0.0388|
|international_law |Yaml |none | 5|acc |0.2479|± |0.0394|
|jurisprudence |Yaml |none | 5|acc |0.2500|± |0.0419|
|logical_fallacies |Yaml |none | 5|acc |0.2393|± |0.0335|
|machine_learning |Yaml |none | 5|acc |0.2946|± |0.0433|
|management |Yaml |none | 5|acc |0.1650|± |0.0368|
|marketing |Yaml |none | 5|acc |0.1923|± |0.0258|
|medical_genetics |Yaml |none | 5|acc |0.3000|± |0.0461|
|miscellaneous |Yaml |none | 5|acc |0.2720|± |0.0159|
|moral_disputes |Yaml |none | 5|acc |0.1936|± |0.0213|
|moral_scenarios |Yaml |none | 5|acc |0.2380|± |0.0142|
|nutrition |Yaml |none | 5|acc |0.2484|± |0.0247|
|philosophy |Yaml |none | 5|acc |0.2283|± |0.0238|
|prehistory |Yaml |none | 5|acc |0.2346|± |0.0236|
|professional_accounting |Yaml |none | 5|acc |0.2589|± |0.0261|
|professional_law |Yaml |none | 5|acc |0.2445|± |0.0110|
|professional_medicine |Yaml |none | 5|acc |0.4485|± |0.0302|
|professional_psychology |Yaml |none | 5|acc |0.2614|± |0.0178|
|public_relations |Yaml |none | 5|acc |0.2364|± |0.0407|
|security_studies |Yaml |none | 5|acc |0.4000|± |0.0314|
|sociology |Yaml |none | 5|acc |0.3035|± |0.0325|
|us_foreign_policy |Yaml |none | 5|acc |0.2800|± |0.0451|
|virology |Yaml |none | 5|acc |0.2048|± |0.0314|
|world_religions |Yaml |none | 5|acc |0.1988|± |0.0306|
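The 26.17% in the MMLU heading can be reproduced as an unweighted (macro) mean of the 57 per-subject accuracies listed in the MMLU table, so the average appears to give every subject equal weight rather than every question. A minimal check in plain Python; the list is simply the Value column copied in row order, and `macro_average` is a name chosen here for illustration:

```python
# Per-subject 5-shot MMLU accuracies, copied from the MMLU table in row order.
MMLU_ACC = [
    0.2200, 0.2222, 0.1776, 0.2300, 0.2415, 0.3194, 0.2000, 0.2800, 0.2800, 0.2254,
    0.2157, 0.2200, 0.2553, 0.2368, 0.2345, 0.2646, 0.2302, 0.1700, 0.2903, 0.2611,
    0.2300, 0.2788, 0.3081, 0.3731, 0.2923, 0.2630, 0.3403, 0.2715, 0.2881, 0.4722,
    0.3529, 0.2532, 0.2108, 0.2672, 0.2479, 0.2500, 0.2393, 0.2946, 0.1650, 0.1923,
    0.3000, 0.2720, 0.1936, 0.2380, 0.2484, 0.2283, 0.2346, 0.2589, 0.2445, 0.4485,
    0.2614, 0.2364, 0.4000, 0.3035, 0.2800, 0.2048, 0.1988,
]


def macro_average(scores):
    """Unweighted mean: every subject counts once, whatever its question count."""
    return sum(scores) / len(scores)


print(f"{macro_average(MMLU_ACC) * 100:.2f}%")  # prints 26.17%
```

A question-weighted (micro) average would need each subject's question count, which this card does not list, and could differ because the subjects vary a lot in size (compare the small Stderr of professional_law with that of abstract_algebra).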
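The Stderr column of the 0/1-scored rows is consistent with the standard error of the mean over per-item scores, i.e. sample standard deviation divided by sqrt(n), which for a binary score simplifies to sqrt(p(1 - p) / (n - 1)). The sketch below assumes that formula (the table layout looks like lm-evaluation-harness output) and assumes the usual evaluation split sizes: 1172 ARC-Challenge test questions, 1267 WinoGrande validation items, 10042 HellaSwag validation items and 100 MMLU abstract_algebra test questions. `binary_stderr` is a hypothetical helper, not a harness function.

```python
import math


def binary_stderr(acc: float, n: int) -> float:
    """Standard error of the mean for n items scored 0/1 with mean accuracy acc.

    Sample variance (dividing by n - 1) over sqrt(n) simplifies to
    sqrt(acc * (1 - acc) / (n - 1)).
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# (task, metric, reported accuracy, ASSUMED number of evaluated items)
ROWS = [
    ("arc_challenge", "acc", 0.1775, 1172),     # ARC-Challenge test split
    ("arc_challenge", "acc_norm", 0.2133, 1172),
    ("winogrande", "acc", 0.5154, 1267),        # WinoGrande validation split
    ("hellaswag", "acc", 0.2832, 10042),        # HellaSwag validation split
    ("hellaswag", "acc_norm", 0.3024, 10042),
    ("abstract_algebra", "acc", 0.2200, 100),   # MMLU abstract_algebra test split
]

for task, metric, acc, n in ROWS:
    se = binary_stderr(acc, n)
    # A rough 95% interval under a normal approximation is acc +/- 1.96 * stderr.
    lo, hi = acc - 1.96 * se, acc + 1.96 * se
    print(f"{task:<17} {metric:<9} stderr={se:.4f}  95% CI=[{lo:.4f}, {hi:.4f}]")
```

Each recomputed value rounds to the reported Stderr. TruthfulQA mc2 is left out because its per-item score is a probability mass rather than 0/1, so this formula does not apply to it.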
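The setup described at the top of this card, with Llama2-70b's token embedding and language-modelling head reused through linear maps from its 8192-dimensional space down to this model's 1024-dimensional space, can be sketched as below. This is a toy illustration under stated assumptions, not the model's actual code: sizes are shrunk (`VOCAB`, `BIG_D` and `SMALL_D` stand in for the real vocabulary, 8192 and 1024), weights are random stand-ins, the names `proj_in` and `proj_out` are hypothetical, and the wiring (one down-projection for the embedding side, one folded into the head weights) is a guess because the card does not spell it out. The small transformer body between the two ends is omitted.

```python
import random

# Toy sizes (assumption); the card's real sizes are 8192 (Llama2-70b) and 1024.
VOCAB, BIG_D, SMALL_D = 10, 8, 4


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# Stand-ins for the large model's frozen weights.
big_embed = random_matrix(VOCAB, BIG_D, rng)  # token embedding: VOCAB x BIG_D
big_head = random_matrix(VOCAB, BIG_D, rng)   # LM head: VOCAB x BIG_D
# Hypothetical linear maps from the BIG_D space down to SMALL_D.
proj_in = random_matrix(BIG_D, SMALL_D, rng)
proj_out = random_matrix(BIG_D, SMALL_D, rng)


def embed(token_ids):
    """Look up big-model embeddings, then project them down to SMALL_D."""
    rows = [big_embed[t] for t in token_ids]
    return matmul(rows, proj_in)  # len(token_ids) x SMALL_D


def logits(hidden):
    """Score SMALL_D hidden states with the big head projected down to SMALL_D."""
    small_head = matmul(big_head, proj_out)          # VOCAB x SMALL_D
    head_t = [list(col) for col in zip(*small_head)]  # SMALL_D x VOCAB
    return matmul(hidden, head_t)                    # n x VOCAB


# The small transformer body would run between embed() and logits().
h = embed([1, 2, 3])
out = logits(h)
print(len(h), len(h[0]), len(out), len(out[0]))  # prints 3 4 3 10
```

Folding `proj_out` into the head like this gives the same logits as up-projecting each SMALL_D hidden state into the BIG_D space and applying the original head unchanged, since both are the same product of linear maps.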