distily_bench_obj_cross_v2.10_gpt2

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.
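The student loads like any other causal LM checkpoint. A minimal usage sketch with transformers, assuming the repo id lapp0/distily_bench_obj_cross_v2.10_gpt2 (the student shares gpt2's architecture and tokenizer):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the distilled student; it shares gpt2's architecture and tokenizer.
repo_id = "lapp0/distily_bench_obj_cross_v2.10_gpt2"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```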

It achieves the following results on the evaluation set:

  • eval_enwikippl: 401.6902
  • eval_frwikippl: 385.9396
  • eval_zhwikippl: 137.9653
  • eval_tinystoriesppl: 881.4292
  • eval_loss: 0.7112
  • eval_runtime: 21.2483
  • eval_samples_per_second: 47.063
  • eval_steps_per_second: 11.766
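The *ppl metrics are perplexities on held-out enwiki, frwiki, zhwiki, and TinyStories text; eval_loss is the distillation objective on the evaluation set, and eval_runtime is in seconds. The exact evaluation harness is internal to Distily, but perplexity is conventionally computed as exp of the mean per-token negative log-likelihood, roughly as in this sketch (an illustration, not Distily's code):

```python
import torch

def perplexity(model, tokenizer, text, device="cpu"):
    # Perplexity = exp(mean per-token negative log-likelihood).
    model = model.to(device).eval()
    enc = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss  # mean NLL over tokens
    return torch.exp(loss).item()
```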

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)) (see the sketch after this list)
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 1
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
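Per the distillation_objective above, only the logits component is active (weight 1, KL loss); the hidden-state and attention components have weight 0, so no layer mapping or projection is applied. A minimal PyTorch sketch of such a logits-level KL loss (the reduction and the absence of a temperature are assumptions, not Distily internals):

```python
import torch.nn.functional as F

def logits_kl_loss(student_logits, teacher_logits):
    # KL(teacher || student) over the vocabulary, averaged per token.
    # Logits have shape (batch, seq_len, vocab); flatten to (tokens, vocab).
    s = F.log_softmax(student_logits, dim=-1).flatten(0, 1)
    t = F.log_softmax(teacher_logits, dim=-1).flatten(0, 1)
    return F.kl_div(s, t, log_target=True, reduction="batchmean")
```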

Resource Usage

Peak GPU Memory: 3.9285 GB
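How this figure was captured is not stated; with PyTorch, peak allocated memory is typically read back as in this sketch (an assumed method, not necessarily Distily's):

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the training loop ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```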

Eval-Phase Metrics

step epoch enwikippl frwikippl loss runtime samples_per_second steps_per_second tinystoriesppl zhwikippl
teacher eval — 270.2348 76.8142 — — — — 671.1238 22.8030
0 0 120078.375 1867851235328.0 18.7920 21.2125 47.142 11.786 72.8770 4013754155008.0
5000 0.0505 621.5149 991.7020 1.3528 21.2177 47.13 11.783 980.0922 399.9691
10000 0.1010 574.4407 664.8521 1.1590 21.2225 47.12 11.78 1036.6780 493.8460
15000 0.1515 543.0890 635.0353 1.0360 21.2351 47.092 11.773 1033.2988 145.9157
20000 0.2020 509.8121 599.6746 0.9759 21.2099 47.148 11.787 985.1690 251.1274
25000 0.2525 448.2854 486.9003 0.8334 21.2284 47.107 11.777 923.3450 171.9567
30000 0.3030 420.2149 441.8981 0.7741 21.2742 47.005 11.751 893.9037 129.4944
35000 0.3535 417.6187 442.7548 0.7695 21.5924 46.313 11.578 884.2755 140.6411
40000 0.4040 419.8570 418.2776 0.7678 21.23 47.103 11.776 893.9774 162.6632
45000 0.4545 420.1905 413.8966 0.7576 21.2355 47.091 11.773 905.9177 154.8089
50000 0.5051 420.9561 426.7430 0.7544 21.2196 47.126 11.782 906.1800 147.5501
55000 0.5556 417.3034 409.1867 0.7509 21.2021 47.165 11.791 902.3304 143.7327
60000 0.6061 418.3230 413.0230 0.7525 21.2367 47.088 11.772 894.0145 156.6996
65000 0.6566 404.0308 404.5305 0.7221 21.2003 47.169 11.792 878.4468 136.2006
70000 0.7071 406.0154 392.1317 0.7194 21.2119 47.143 11.786 891.9106 137.0481
75000 0.7576 400.8665 383.9604 0.7188 21.2118 47.144 11.786 871.7914 140.4630
80000 0.8081 402.5625 387.4647 0.7168 21.2234 47.118 11.779 882.3771 141.0827
85000 0.8586 399.3479 385.9124 0.7123 21.2047 47.159 11.79 875.1130 140.0700
90000 0.9091 401.2549 386.7830 0.7117 21.2316 47.1 11.775 881.0649 138.5555
95000 0.9596 401.4725 386.1842 0.7112 21.2217 47.122 11.78 880.2640 138.0389
99000 1.0 401.6902 385.9396 0.7112 21.2483 47.063 11.766 881.4292 137.9653

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0