
Model Card for nano-phi-192M-v0.1

This model continues the effort behind kenhktsui/nano-phi-115M-v0.1.
The model is not aligned.

Major differences:

How to use

To use the model, you will need transformers version >= 4.37.2.

pip install "transformers>=4.37.2"

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kenhktsui/nano-phi-192M-v0.1")
pipe("I am a machine learning researcher. I work on", max_new_tokens=50, repetition_penalty=10.0)

Some metrics

  • Model
    • hidden_size: 768
    • num_key_value_heads: 8 (grouped query attention)
    • num_attention_heads: 24
    • num_hidden_layers: 6
    • context length: 1024
    • total params: 192M
  • Training
    • global steps: 36,000
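Given the configuration above, the grouped-query-attention geometry works out as follows. This is a quick sanity check derived from the listed numbers, not code from the released model:

```python
# Config values listed in the model card above
hidden_size = 768
num_attention_heads = 24
num_key_value_heads = 8

# The hidden size is split evenly across the query heads.
head_dim = hidden_size // num_attention_heads
print(head_dim)  # 32

# With grouped query attention, several query heads share one KV head,
# shrinking the KV cache relative to full multi-head attention.
queries_per_kv = num_attention_heads // num_key_value_heads
print(queries_per_kv)  # 3
```

So each of the 8 KV heads serves 3 query heads, making the KV cache 3x smaller than it would be with standard multi-head attention at the same width.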

Open LLM Leaderboard Evaluation Results

| Metric | kenhktsui/nano-phi-192M-v0.1 | kenhktsui/nano-phi-115M-v0.1 | microsoft/phi-2 (reproduced) |
|---|---|---|---|
| Avg. | 29.24 | 28.68 | 61.53 |
| ARC (25-shot) | 24.15 | 21.93 | 61.52 |
| HellaSwag (10-shot) | 29.99 | 27.87 | 75.13 |
| MMLU (5-shot) | 25.46 | 25.30 | 58.23 |
| TruthfulQA (0-shot) | 44.30 | 46.01 | 44.46 |
| Winogrande (5-shot) | 51.54 | 50.99 | 74.51 |
| GSM8K (5-shot) | 0.0 | 0.0 | 55.34 |
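As a quick consistency check, the reported average is the plain mean of the six benchmark scores for this model:

```python
# Open LLM Leaderboard scores for nano-phi-192M-v0.1, taken from the table above
scores = {
    "ARC (25-shot)": 24.15,
    "HellaSwag (10-shot)": 29.99,
    "MMLU (5-shot)": 25.46,
    "TruthfulQA (0-shot)": 44.30,
    "Winogrande (5-shot)": 51.54,
    "GSM8K (5-shot)": 0.0,
}

avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 29.24
```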

Details:

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| arc_easy | 0 | acc | 0.4596 | ± 0.0102 |
| | | acc_norm | 0.4070 | ± 0.0101 |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 25, batch_size: 8

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| arc_challenge | 0 | acc | 0.1911 | ± 0.0115 |
| | | acc_norm | 0.2415 | ± 0.0125 |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 10, batch_size: 8

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| hellaswag | 0 | acc | 0.2833 | ± 0.0045 |
| | | acc_norm | 0.2999 | ± 0.0046 |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 0.2583 | ± 0.0153 |
| | | mc2 | 0.4430 | ± 0.0152 |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8

| Task | Version | acc | acc_norm | Stderr |
|---|---|---|---|---|
| hendrycksTest-abstract_algebra | 1 | 0.2200 | 0.2200 | ± 0.0416 |
| hendrycksTest-anatomy | 1 | 0.2593 | 0.2593 | ± 0.0379 |
| hendrycksTest-astronomy | 1 | 0.1711 | 0.1711 | ± 0.0306 |
| hendrycksTest-business_ethics | 1 | 0.2400 | 0.2400 | ± 0.0429 |
| hendrycksTest-clinical_knowledge | 1 | 0.2566 | 0.2566 | ± 0.0269 |
| hendrycksTest-college_biology | 1 | 0.2639 | 0.2639 | ± 0.0369 |
| hendrycksTest-college_chemistry | 1 | 0.1800 | 0.1800 | ± 0.0386 |
| hendrycksTest-college_computer_science | 1 | 0.3300 | 0.3300 | ± 0.0473 |
| hendrycksTest-college_mathematics | 1 | 0.3000 | 0.3000 | ± 0.0461 |
| hendrycksTest-college_medicine | 1 | 0.2023 | 0.2023 | ± 0.0306 |
| hendrycksTest-college_physics | 1 | 0.2843 | 0.2843 | ± 0.0449 |
| hendrycksTest-computer_security | 1 | 0.2200 | 0.2200 | ± 0.0416 |
| hendrycksTest-conceptual_physics | 1 | 0.2511 | 0.2511 | ± 0.0283 |
| hendrycksTest-econometrics | 1 | 0.2807 | 0.2807 | ± 0.0423 |
| hendrycksTest-electrical_engineering | 1 | 0.2897 | 0.2897 | ± 0.0378 |
| hendrycksTest-elementary_mathematics | 1 | 0.2804 | 0.2804 | ± 0.0231 |
| hendrycksTest-formal_logic | 1 | 0.2143 | 0.2143 | ± 0.0367 |
| hendrycksTest-global_facts | 1 | 0.1700 | 0.1700 | ± 0.0378 |
| hendrycksTest-high_school_biology | 1 | 0.3226 | 0.3226 | ± 0.0266 |
| hendrycksTest-high_school_chemistry | 1 | 0.2759 | 0.2759 | ± 0.0314 |
| hendrycksTest-high_school_computer_science | 1 | 0.2700 | 0.2700 | ± 0.0446 |
| hendrycksTest-high_school_european_history | 1 | 0.2606 | 0.2606 | ± 0.0343 |
| hendrycksTest-high_school_geography | 1 | 0.3081 | 0.3081 | ± 0.0329 |
| hendrycksTest-high_school_government_and_politics | 1 | 0.3627 | 0.3627 | ± 0.0347 |
| hendrycksTest-high_school_macroeconomics | 1 | 0.2641 | 0.2641 | ± 0.0224 |
| hendrycksTest-high_school_mathematics | 1 | 0.2630 | 0.2630 | ± 0.0268 |
| hendrycksTest-high_school_microeconomics | 1 | 0.3403 | 0.3403 | ± 0.0308 |
| hendrycksTest-high_school_physics | 1 | 0.3113 | 0.3113 | ± 0.0378 |
| hendrycksTest-high_school_psychology | 1 | 0.2716 | 0.2716 | ± 0.0191 |
| hendrycksTest-high_school_statistics | 1 | 0.4491 | 0.4491 | ± 0.0339 |
| hendrycksTest-high_school_us_history | 1 | 0.2402 | 0.2402 | ± 0.0300 |
| hendrycksTest-high_school_world_history | 1 | 0.2363 | 0.2363 | ± 0.0277 |
| hendrycksTest-human_aging | 1 | 0.2197 | 0.2197 | ± 0.0278 |
| hendrycksTest-human_sexuality | 1 | 0.2824 | 0.2824 | ± 0.0395 |
| hendrycksTest-international_law | 1 | 0.2479 | 0.2479 | ± 0.0394 |
| hendrycksTest-jurisprudence | 1 | 0.2037 | 0.2037 | ± 0.0389 |
| hendrycksTest-logical_fallacies | 1 | 0.2393 | 0.2393 | ± 0.0335 |
| hendrycksTest-machine_learning | 1 | 0.1875 | 0.1875 | ± 0.0370 |
| hendrycksTest-management | 1 | 0.2039 | 0.2039 | ± 0.0399 |
| hendrycksTest-marketing | 1 | 0.1795 | 0.1795 | ± 0.0251 |
| hendrycksTest-medical_genetics | 1 | 0.3000 | 0.3000 | ± 0.0461 |
| hendrycksTest-miscellaneous | 1 | 0.2644 | 0.2644 | ± 0.0158 |
| hendrycksTest-moral_disputes | 1 | 0.2225 | 0.2225 | ± 0.0224 |
| hendrycksTest-moral_scenarios | 1 | 0.2726 | 0.2726 | ± 0.0149 |
| hendrycksTest-nutrition | 1 | 0.2353 | 0.2353 | ± 0.0243 |
| hendrycksTest-philosophy | 1 | 0.2283 | 0.2283 | ± 0.0238 |
| hendrycksTest-prehistory | 1 | 0.2099 | 0.2099 | ± 0.0227 |
| hendrycksTest-professional_accounting | 1 | 0.2411 | 0.2411 | ± 0.0255 |
| hendrycksTest-professional_law | 1 | 0.2458 | 0.2458 | ± 0.0110 |
| hendrycksTest-professional_medicine | 1 | 0.3897 | 0.3897 | ± 0.0296 |
| hendrycksTest-professional_psychology | 1 | 0.2141 | 0.2141 | ± 0.0166 |
| hendrycksTest-public_relations | 1 | 0.1818 | 0.1818 | ± 0.0369 |
| hendrycksTest-security_studies | 1 | 0.2490 | 0.2490 | ± 0.0277 |
| hendrycksTest-sociology | 1 | 0.2537 | 0.2537 | ± 0.0308 |
| hendrycksTest-us_foreign_policy | 1 | 0.2900 | 0.2900 | ± 0.0456 |
| hendrycksTest-virology | 1 | 0.1807 | 0.1807 | ± 0.0300 |
| hendrycksTest-world_religions | 1 | 0.1813 | 0.1813 | ± 0.0295 |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| winogrande | 0 | acc | 0.5154 | ± 0.014 |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| gsm8k | 0 | acc | 0 | ± 0 |