---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.13_gpt2
    results: []
---

distily_bench_obj_cross_v2.13_gpt2

This student model was distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 1280.0
  • eval_frwikippl: 5952.0
  • eval_zhwikippl: 45056.0
  • eval_tinystoriesppl: 816.0
  • eval_loss: 2.4587
  • eval_runtime: 12.9968
  • eval_samples_per_second: 46.165
  • eval_steps_per_second: 11.541
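
The perplexity metrics above (eval_enwikippl, eval_frwikippl, eval_zhwikippl, eval_tinystoriesppl) are exponentiated language-modeling losses on the corresponding corpora. As a minimal sketch of how such a figure can be reproduced with the Transformers API, the snippet below loads the student checkpoint and computes perplexity on a short text; the repository id is an assumption inferred from the card title, not something stated in the card itself.

```python
# Minimal sketch: compute perplexity for the distilled student on a text sample.
# The repo id below is assumed from the card title; adjust it to the actual
# location of the checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross_v2.13_gpt2"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean token-level cross-entropy;
    # perplexity is its exponential.
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"loss={outputs.loss.item():.4f}  ppl={torch.exp(outputs.loss).item():.2f}")
```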

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(
      logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None),
      hs_loss_component=LossComponent(label=hs, weight=1.0, loss_fn=mse, layer_mapper=uniform_cons, projector=None),
      attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)
    ) (see the sketch after this list)
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.5
  • num_epochs: 1.0
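
The distillation_objective above combines a KL-divergence loss on the student and teacher logits (weight 1) with an MSE loss on hidden states (weight 1.0) mapped between layers, while the attention-map loss is disabled (weight 0). The sketch below is an illustration only: it is not Distily's implementation, and reading uniform_cons as a uniform mapping of student layers onto teacher layers is an assumption.

```python
# Hedged sketch of the configured objective: KL on logits plus MSE on hidden
# states with a uniform layer mapping, and no attention loss. Illustrative
# only; not Distily's code.
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, hs_weight=1.0):
    # Both outputs are expected to come from forward passes run with
    # output_hidden_states=True.

    # KL divergence between teacher and student token distributions,
    # summed over positions and averaged over the batch.
    s_logp = F.log_softmax(student_out.logits, dim=-1)
    t_prob = F.softmax(teacher_out.logits, dim=-1)
    logits_loss = F.kl_div(s_logp, t_prob, reduction="batchmean")

    # Hidden-state MSE. uniform_cons is assumed to mean a uniform mapping of
    # student layers onto teacher layers; with a same-depth student and
    # teacher (both gpt2) this reduces to layer i -> layer i.
    s_hs = student_out.hidden_states
    t_hs = teacher_out.hidden_states
    idx = torch.linspace(0, len(t_hs) - 1, steps=len(s_hs)).round().long().tolist()
    hs_loss = sum(F.mse_loss(s, t_hs[i]) for s, i in zip(s_hs, idx)) / len(s_hs)

    return logits_loss + hs_weight * hs_loss
```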

Resource Usage

Peak GPU Memory: 8.0905 GB
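
A common way to obtain a peak-memory figure like this in a PyTorch run (shown here as an assumption, not taken from Distily's code) is via the built-in CUDA memory statistics:

```python
# Sketch: report peak GPU memory after a training run using PyTorch's CUDA
# statistics. Assumes decimal gigabytes; not taken from Distily.
import torch

torch.cuda.reset_peak_memory_stats()
# ... training loop runs here ...
peak_bytes = torch.cuda.max_memory_allocated()
print(f"Peak GPU Memory: {peak_bytes / 1e9:.4f} GB")
```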

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1821066133504.0 | 158329674399744.0 | 19.3254 | 13.0242 | 46.068 | 11.517 | 12079595520.0 | 98956046499840.0 |
| 750 | 0.1010 | 1280.0 | 5952.0 | 2.4587 | 12.9968 | 46.165 | 11.541 | 816.0 | 45056.0 |
| 1500 | 0.2020 | 506.0 | 3248.0 | 1.7942 | 13.0259 | 46.062 | 11.515 | 352.0 | 736.0 |
| 2250 | 0.3030 | 338.0 | 1624.0 | 1.5654 | 13.2237 | 45.373 | 11.343 | 278.0 | 306.0 |
| 3000 | 0.4040 | 241.0 | 920.0 | 1.3552 | 13.084 | 45.857 | 11.464 | 210.0 | 186.0 |
| 3750 | 0.5051 | 200.0 | 624.0 | 1.2059 | 13.1502 | 45.627 | 11.407 | 169.0 | 175.0 |
| 4500 | 0.6061 | 159.0 | 510.0 | 1.0607 | 12.9758 | 46.24 | 11.56 | 128.0 | 173.0 |
| 5250 | 0.7071 | 128.0 | 464.0 | 0.9374 | 13.0443 | 45.997 | 11.499 | 106.0 | 130.0 |
| 6000 | 0.8081 | 117.5 | 414.0 | 0.8786 | 12.9003 | 46.51 | 11.628 | 96.0 | 129.0 |
| 6750 | 0.9091 | 112.5 | 398.0 | 0.8508 | 12.8837 | 46.571 | 11.643 | 93.0 | 125.0 |
| 7425 | 1.0 | 112.0 | 396.0 | 0.8456 | 12.9485 | 46.337 | 11.584 | 91.5 | 125.0 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0