---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.13_gpt2
    results: []
---

distily_bench_obj_cross_v2.13_gpt2

This student model was distilled from the teacher model gpt2; the training dataset is not specified in this card.

The Distily library was used for this distillation.
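
If the model is hosted on the Hugging Face Hub, it can be used like any other causal language model. A minimal usage sketch, assuming the repository id lapp0/distily_bench_obj_cross_v2.13_gpt2 (inferred from this card's title; adjust if the model lives elsewhere):

```python
from transformers import pipeline

# The repo id below is an assumption inferred from the card title; replace if needed.
generator = pipeline("text-generation", model="lapp0/distily_bench_obj_cross_v2.13_gpt2")
print(generator("Once upon a time", max_new_tokens=20)[0]["generated_text"])
```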

It achieves the following results on the evaluation set (a sketch of how the perplexity metrics are computed follows the list):

  • eval_enwikippl: 1272.0
  • eval_frwikippl: 5952.0
  • eval_zhwikippl: 44800.0
  • eval_tinystoriesppl: 816.0
  • eval_loss: 2.4588
  • eval_runtime: 12.986
  • eval_samples_per_second: 46.204
  • eval_steps_per_second: 11.551
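
The eval_*ppl values are perplexities on different evaluation corpora (the prefixes suggest English, French, and Chinese Wikipedia plus TinyStories, though the exact evaluation data is not described here). A minimal sketch of how such a perplexity is typically computed for a causal language model, again assuming the repository id below and using a single example sentence:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The repo id is an assumption inferred from the card title; replace if needed.
model_id = "lapp0/distily_bench_obj_cross_v2.13_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the mean token-level cross-entropy.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")
```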

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=1.0, loss_fn=mse, layer_mapper=last, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)) (see the loss sketch after this list)
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.5 (see the scheduler sketch after this list)
  • num_epochs: 1.0
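
The distillation_objective above combines a KL-divergence loss on the student and teacher logits (weight 1) with an MSE loss on their final hidden states (weight 1.0); the attention component has weight 0 and is therefore unused. A minimal sketch of such a combined loss, assuming Hugging Face-style model outputs with output_hidden_states=True (illustrative only, not Distily's actual implementation):

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, logits_weight=1.0, hs_weight=1.0):
    """KL on logits plus MSE on the last hidden state; the attention term (weight 0) is omitted."""
    # KL divergence between the teacher and student next-token distributions.
    kl = F.kl_div(
        F.log_softmax(student_out.logits, dim=-1),
        F.softmax(teacher_out.logits, dim=-1),
        reduction="batchmean",
    )
    # MSE between the final hidden states (layer_mapper=last pairs the last layers).
    mse = F.mse_loss(student_out.hidden_states[-1], teacher_out.hidden_states[-1])
    return logits_weight * kl + hs_weight * mse
```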

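The linear schedule with lr_scheduler_warmup_ratio: 0.5 ramps the learning rate up over the first half of training, then decays it linearly to zero. A sketch using transformers' scheduler helper, with a placeholder optimizer configured as above and the 7425 total steps taken from the metrics table below (not the exact training script):

```python
import torch
from transformers import get_linear_schedule_with_warmup

# Placeholder parameter/optimizer, configured with the hyperparameters above.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=1e-4, betas=(0.9, 0.999), eps=1e-8)

total_steps = 7425                     # final step reported in the metrics table
warmup_steps = int(0.5 * total_steps)  # lr_scheduler_warmup_ratio: 0.5

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)
# During training, scheduler.step() is called after each optimizer.step().
```
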
Resource Usage

Peak GPU Memory: 8.0905 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1821066133504.0 | 158329674399744.0 | 19.3254 | 12.8793 | 46.586 | 11.647 | 12079595520.0 | 98956046499840.0 |
| 750 | 0.1010 | 1272.0 | 5952.0 | 2.4588 | 12.986 | 46.204 | 11.551 | 816.0 | 44800.0 |
| 1500 | 0.2020 | 506.0 | 3248.0 | 1.7942 | 13.1654 | 45.574 | 11.393 | 350.0 | 736.0 |
| 2250 | 0.3030 | 338.0 | 1632.0 | 1.5655 | 13.13 | 45.697 | 11.424 | 278.0 | 306.0 |
| 3000 | 0.4040 | 241.0 | 916.0 | 1.3548 | 13.0817 | 45.865 | 11.466 | 209.0 | 188.0 |
| 3750 | 0.5051 | 199.0 | 624.0 | 1.2057 | 13.1347 | 45.68 | 11.42 | 169.0 | 174.0 |
| 4500 | 0.6061 | 160.0 | 510.0 | 1.0606 | 13.0674 | 45.916 | 11.479 | 129.0 | 171.0 |
| 5250 | 0.7071 | 128.0 | 468.0 | 0.9370 | 13.0344 | 46.032 | 11.508 | 106.5 | 132.0 |
| 6000 | 0.8081 | 118.0 | 418.0 | 0.8787 | 13.3306 | 45.009 | 11.252 | 97.0 | 131.0 |
| 6750 | 0.9091 | 112.0 | 400.0 | 0.8501 | 13.0719 | 45.9 | 11.475 | 93.0 | 129.0 |
| 7425 | 1.0 | 112.0 | 396.0 | 0.8454 | 13.0273 | 46.057 | 11.514 | 91.5 | 128.0 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0