---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.15_gpt2
  results: []
---

# distily_bench_obj_cross_v2.15_gpt2

This student model was distilled from the teacher model gpt2. The training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

- eval_enwikippl: 5056.0
- eval_frwikippl: 3696.0
- eval_zhwikippl: 29312.0
- eval_tinystoriesppl: 4672.0
- eval_loss: 1.3042
- eval_runtime: 16.773
- eval_samples_per_second: 59.62
- eval_steps_per_second: 7.452
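For reference, each `*ppl` metric above is a perplexity on the named held-out corpus, i.e. the exponential of the mean per-token negative log-likelihood in nats. Note that `eval_loss` here comes from the distillation objective rather than a plain language-modeling loss, so it is not simply the log of these perplexities. A minimal sketch of the relationship:

```python
import math

def perplexity(nll_per_token):
    # Perplexity = exp(mean negative log-likelihood per token, in nats).
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Hypothetical example: a model that assigns probability 1/5056 to every
# token would score a perplexity of 5056 (cf. eval_enwikippl above).
print(perplexity([math.log(5056.0)] * 3))
```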

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))`
- train_embeddings: True
- learning_rate: 0.0004
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 1.0
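The configured objective puts all of its weight on a forward-KL loss over the student and teacher logits (the hidden-state and attention components have weight 0). As a rough, library-agnostic sketch of what such a loss computes — this is not Distily's actual implementation, and the function names are illustrative:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one logit vector.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_logits_loss(student_logits, teacher_logits):
    # Forward KL(teacher || student) for a single vocabulary
    # distribution: sum_v p_t(v) * (log p_t(v) - log p_s(v)).
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))

# The loss is exactly zero when student and teacher agree,
# and strictly positive otherwise.
print(kl_logits_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(kl_logits_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In training, this per-position loss would be averaged over all token positions in the batch; minimizing it pushes the student's predicted token distribution toward the teacher's.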

## Resource Usage

Peak GPU Memory: 7.9368 GB

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 2336462209024.0 | 122045790683136.0 | 24.1200 | 16.7622 | 59.658 | 7.457 | 4429185024.0 | 25975962206208.0 |
| 1000 | 0.0808 | 3312.0 | 3184.0 | 1.1229 | 16.7657 | 59.646 | 7.456 | 2720.0 | 2992.0 |
| 2000 | 0.1616 | 5248.0 | 4048.0 | 1.2528 | 16.7825 | 59.586 | 7.448 | 5408.0 | 9984.0 |
| 3000 | 0.2424 | 5600.0 | 3744.0 | 1.2812 | 16.7695 | 59.632 | 7.454 | 5312.0 | 23680.0 |
| 4000 | 0.3232 | 5408.0 | 3920.0 | 1.2832 | 16.8694 | 59.279 | 7.41 | 5440.0 | 33280.0 |
| 5000 | 0.4040 | 5376.0 | 3952.0 | 1.2841 | 16.7361 | 59.751 | 7.469 | 5408.0 | 27008.0 |
| 6000 | 0.4848 | 5344.0 | 3680.0 | 1.2770 | 16.7635 | 59.653 | 7.457 | 5440.0 | 29312.0 |
| 7000 | 0.5657 | 5024.0 | 3760.0 | 1.2800 | 16.7492 | 59.704 | 7.463 | 5184.0 | 39936.0 |
| 8000 | 0.6465 | 4992.0 | 3712.0 | 1.2922 | 16.7445 | 59.721 | 7.465 | 5088.0 | 26752.0 |
| 9000 | 0.7273 | 5056.0 | 3696.0 | 1.3042 | 16.773 | 59.62 | 7.452 | 4672.0 | 29312.0 |
| 10000 | 0.8081 | 5824.0 | 3648.0 | 1.3192 | 16.7669 | 59.641 | 7.455 | 5312.0 | 24448.0 |
| 11000 | 0.8889 | 5568.0 | 3872.0 | 1.3215 | 16.8215 | 59.448 | 7.431 | 5504.0 | 40704.0 |
| 12000 | 0.9697 | 5440.0 | 3792.0 | 1.3263 | 16.7825 | 59.586 | 7.448 | 5120.0 | 72704.0 |
| 12375 | 1.0 | 5696.0 | 3936.0 | 1.3389 | 16.7852 | 59.576 | 7.447 | 5696.0 | 40192.0 |

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0