---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.13_gpt2
  results: []
---

# distily_bench_obj_cross_v2.13_gpt2

This student model was distilled from the teacher model gpt2 using the dataset (unspecified).

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

- eval_enwikippl: 1272.0
- eval_frwikippl: 5952.0
- eval_zhwikippl: 45056.0
- eval_tinystoriesppl: 816.0
- eval_loss: 2.4585
- eval_runtime: 13.0264
- eval_samples_per_second: 46.06
- eval_steps_per_second: 11.515
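The `eval_*ppl` metrics are perplexities on held-out text from the respective corpora. As a general reminder, perplexity is the exponential of the mean per-token cross-entropy in nats; note that `eval_loss` here is the distillation objective (KL plus hidden-state MSE), so it does not exponentiate to the perplexities above. A minimal sketch of the relation:

```python
import math

def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy_nats)

# Round trip: a mean cross-entropy of ln(112) nats corresponds to ppl 112,
# the final enwikippl reached by this student.
print(round(perplexity(math.log(112.0)), 1))  # → 112.0
```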

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=1.0, loss_fn=mse, layer_mapper=uniform+last, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))`
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.5
- num_epochs: 1.0
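The `distillation_objective` above combines a KL term on the logits (weight 1) with an MSE term on hidden states routed through a `uniform+last` layer mapper (weight 1.0); the attention component has weight 0 and is inactive. A minimal pure-Python sketch of that combination follows — the exact mapper semantics and the per-layer averaging are assumptions for illustration, not Distily's implementation:

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_loss(teacher_logits, student_logits):
    """KL(teacher || student) for a single token position."""
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(tp) * (tp - sp) for tp, sp in zip(t_logp, s_logp))

def mse_loss(a, b):
    """Mean squared error between two hidden-state vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def uniform_last_mapper(n_student, n_teacher):
    """Assumed 'uniform+last': space student layers evenly over teacher
    layers, pinning the last student layer to the last teacher layer."""
    if n_student == 1:
        return [n_teacher - 1]
    return [round(i * (n_teacher - 1) / (n_student - 1)) for i in range(n_student)]

def distillation_loss(t_logits, s_logits, t_hidden, s_hidden,
                      logits_weight=1.0, hs_weight=1.0):
    """Weighted sum of the logits KL term and the mapped hidden-state MSE term."""
    mapping = uniform_last_mapper(len(s_hidden), len(t_hidden))
    hs = sum(mse_loss(s_hidden[i], t_hidden[j])
             for i, j in enumerate(mapping)) / len(mapping)
    return logits_weight * kl_loss(t_logits, s_logits) + hs_weight * hs
```

With identical teacher and student outputs both terms vanish, so the loss is zero; during training the KL term pulls the student's next-token distribution toward the teacher's while the MSE term aligns intermediate representations.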

## Resource Usage

Peak GPU Memory: 8.0905 GB

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1821066133504.0 | 158329674399744.0 | 19.3254 | 13.0759 | 45.886 | 11.471 | 12079595520.0 | 98956046499840.0 |
| 750 | 0.1010 | 1272.0 | 5952.0 | 2.4585 | 13.0264 | 46.06 | 11.515 | 816.0 | 45056.0 |
| 1500 | 0.2020 | 508.0 | 3248.0 | 1.7942 | 12.914 | 46.461 | 11.615 | 352.0 | 736.0 |
| 2250 | 0.3030 | 338.0 | 1624.0 | 1.5662 | 12.8953 | 46.528 | 11.632 | 276.0 | 306.0 |
| 3000 | 0.4040 | 241.0 | 916.0 | 1.3553 | 12.9278 | 46.412 | 11.603 | 209.0 | 186.0 |
| 3750 | 0.5051 | 200.0 | 628.0 | 1.2057 | 12.8882 | 46.554 | 11.639 | 169.0 | 174.0 |
| 4500 | 0.6061 | 158.0 | 512.0 | 1.0599 | 12.8875 | 46.557 | 11.639 | 129.0 | 171.0 |
| 5250 | 0.7071 | 128.0 | 460.0 | 0.9376 | 12.9282 | 46.41 | 11.603 | 106.5 | 133.0 |
| 6000 | 0.8081 | 118.0 | 416.0 | 0.8788 | 12.8993 | 46.514 | 11.629 | 96.5 | 130.0 |
| 6750 | 0.9091 | 112.5 | 400.0 | 0.8508 | 13.0134 | 46.106 | 11.527 | 92.5 | 126.5 |
| 7425 | 1.0 | 112.0 | 396.0 | 0.8455 | 12.9043 | 46.496 | 11.624 | 91.0 | 126.0 |

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0