---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.13_gpt2
  results: []
---

# distily_bench_obj_cross_v2.13_gpt2

This student model is distilled from the teacher model gpt2 using the dataset (unspecified).

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

- eval_enwikippl: 2144.0
- eval_frwikippl: 8704.0
- eval_zhwikippl: 148480.0
- eval_tinystoriesppl: 1648.0
- eval_loss: 3.2443
- eval_runtime: 12.9651
- eval_samples_per_second: 46.278
- eval_steps_per_second: 11.569
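For reference, a perplexity metric is the exponential of a mean cross-entropy (negative log-likelihood) loss. A minimal sketch of the relationship (note that the per-dataset `*ppl` metrics above are derived from separate per-dataset losses, so `eval_loss` itself does not reproduce them):

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity is exp(mean negative log-likelihood per token)."""
    return math.exp(mean_nll)

# The eval_loss of 3.2443 corresponds to a perplexity of about 25.6
# under this definition.
print(round(perplexity(3.2443), 1))  # → 25.6
```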

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=1.0, loss_fn=kl, layer_mapper=uniform_cons, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))`
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.5
- num_epochs: 1.0
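The core of the `loss_fn=kl` objective above is a KL divergence between the teacher's and the student's output distributions. The following is a minimal pure-Python sketch of that logits term only, not Distily's actual implementation (the configured objective also applies an equally weighted KL loss to hidden states via the `uniform_cons` layer mapper):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary distribution:
    zero when the student matches the teacher, positive otherwise."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits give zero loss; diverging logits a positive one.
print(kl_distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0
```

In practice this term is computed per token position over the full GPT-2 vocabulary and averaged over the batch; the sketch shows a single position with a toy three-entry vocabulary.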

## Resource Usage

Peak GPU Memory: 8.0905 GB

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1821066133504.0 | 158329674399744.0 | 25.4650 | 12.9053 | 46.493 | 11.623 | 12079595520.0 | 98956046499840.0 |
| 750 | 0.1010 | 2144.0 | 8704.0 | 3.2443 | 12.9651 | 46.278 | 11.569 | 1648.0 | 148480.0 |
| 1500 | 0.2020 | 776.0 | 4800.0 | 2.2942 | 12.9521 | 46.325 | 11.581 | 568.0 | 4736.0 |
| 2250 | 0.3030 | 448.0 | 2832.0 | 1.9334 | 12.9674 | 46.27 | 11.568 | 358.0 | 592.0 |
| 3000 | 0.4040 | 318.0 | 1424.0 | 1.6643 | 13.0049 | 46.136 | 11.534 | 256.0 | 308.0 |
| 3750 | 0.5051 | 249.0 | 912.0 | 1.4761 | 12.9665 | 46.273 | 11.568 | 198.0 | 474.0 |
| 4500 | 0.6061 | 188.0 | 684.0 | 1.2711 | 12.9804 | 46.224 | 11.556 | 152.0 | 354.0 |
| 5250 | 0.7071 | 147.0 | 560.0 | 1.1017 | 12.9809 | 46.222 | 11.555 | 116.0 | 218.0 |
| 6000 | 0.8081 | 134.0 | 490.0 | 1.0242 | 12.9725 | 46.252 | 11.563 | 105.5 | 186.0 |
| 6750 | 0.9091 | 125.5 | 464.0 | 0.9844 | 12.9741 | 46.246 | 11.561 | 99.0 | 175.0 |
| 7425 | 1.0 | 124.0 | 462.0 | 0.9768 | 12.938 | 46.375 | 11.594 | 97.5 | 165.0 |

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0