---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.15_gpt2
  results: []
---

# distily_bench_obj_cross_v2.15_gpt2

This student model was distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 1784.0
  • eval_frwikippl: 9792.0
  • eval_zhwikippl: 72192.0
  • eval_tinystoriesppl: 1448.0
  • eval_loss: 2.5122
  • eval_runtime: 17.041
  • eval_samples_per_second: 58.682
  • eval_steps_per_second: 7.335
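
The per-corpus metrics (`eval_enwikippl`, `eval_frwikippl`, etc.) are perplexities, which relate to loss as the exponential of the mean per-token negative log-likelihood. A minimal sketch of that relation (the function name is illustrative, not part of Distily's API; note that `eval_loss` here is the distillation objective, not a pure cross-entropy, so it does not exponentiate to these perplexities directly):

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# A model averaging ~2.5 nats of loss per token:
print(perplexity([2.4, 2.5, 2.6]))  # ≈ 12.18
```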

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=1.0, loss_fn=mse, layer_mapper=last, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0004
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 1.0
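
The `distillation_objective` above combines a KL-divergence loss on the logits (weight 1) with an MSE loss on the last hidden state (weight 1.0); the attention component has weight 0 and is inactive. A minimal pure-Python sketch of such a combined objective for a single token position (function names and signatures are illustrative, not Distily's actual API):

```python
import math

def softmax(logits):
    """Numerically stable softmax over one token's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(teacher_logits, student_logits):
    """Forward KL(teacher || student) between the two output distributions."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mse(a, b):
    """Mean squared error between two hidden-state vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(teacher_logits, student_logits,
                      teacher_hs, student_hs,
                      logits_weight=1.0, hs_weight=1.0):
    """Weighted sum of KL on logits and MSE on the last hidden state,
    mirroring the objective above (the attn component, weight 0, is omitted)."""
    return (logits_weight * kl_div(teacher_logits, student_logits)
            + hs_weight * mse(teacher_hs, student_hs))
```

When student and teacher agree exactly, both components are zero, so the total loss is zero; any mismatch in either the output distribution or the hidden state increases it.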

## Resource Usage

Peak GPU Memory: 8.0892 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 2336462209024.0 | 122045790683136.0 | 22.4230 | 17.051 | 58.648 | 7.331 | 4429185024.0 | 25975962206208.0 |
| 1000 | 0.0808 | 588.0 | 3680.0 | 1.8545 | 17.0585 | 58.622 | 7.328 | 612.0 | 2880.0 |
| 2000 | 0.1616 | 988.0 | 5600.0 | 2.1657 | 17.0179 | 58.762 | 7.345 | 816.0 | 3200.0 |
| 3000 | 0.2424 | 1744.0 | 8640.0 | 2.5064 | 17.083 | 58.538 | 7.317 | 1544.0 | 46080.0 |
| 4000 | 0.3232 | 1864.0 | 8896.0 | 2.5506 | 17.0876 | 58.522 | 7.315 | 1544.0 | 63744.0 |
| 5000 | 0.4040 | 1728.0 | 8832.0 | 2.4970 | 17.0783 | 58.554 | 7.319 | 1520.0 | 51200.0 |
| 6000 | 0.4848 | 1936.0 | 9216.0 | 2.5779 | 17.0361 | 58.699 | 7.337 | 1688.0 | 63744.0 |
| 7000 | 0.5657 | 2224.0 | 9792.0 | 2.6441 | 17.0202 | 58.754 | 7.344 | 1832.0 | 82944.0 |
| 8000 | 0.6465 | 1936.0 | 8832.0 | 2.5707 | 17.057 | 58.627 | 7.328 | 1808.0 | 115200.0 |
| 9000 | 0.7273 | 1784.0 | 9792.0 | 2.5122 | 17.041 | 58.682 | 7.335 | 1448.0 | 72192.0 |
| 10000 | 0.8081 | 2064.0 | 9664.0 | 2.5934 | 17.147 | 58.319 | 7.29 | 1552.0 | 91648.0 |
| 11000 | 0.8889 | 2064.0 | 10240.0 | 2.6004 | 17.0431 | 58.675 | 7.334 | 1720.0 | 80896.0 |
| 12000 | 0.9697 | 2064.0 | 11008.0 | 2.6036 | 17.1142 | 58.431 | 7.304 | 1800.0 | 68608.0 |
| 12375 | 1.0 | 1992.0 | 9280.0 | 2.5963 | 17.0677 | 58.59 | 7.324 | 1752.0 | 70144.0 |

## Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0