---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.15_gpt2
  results: []
---

distily_bench_obj_cross_v2.15_gpt2

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 2192.0
  • eval_frwikippl: 11200.0
  • eval_zhwikippl: 93184.0
  • eval_tinystoriesppl: 1808.0
  • eval_loss: 2.6293
  • eval_runtime: 16.9228
  • eval_samples_per_second: 59.092
  • eval_steps_per_second: 7.386
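The per-dataset metrics above (eval_enwikippl, eval_frwikippl, eval_zhwikippl, eval_tinystoriesppl) are perplexities, which relate to the mean per-token negative log-likelihood by exponentiation. Note that eval_loss here is the distillation (KL) objective rather than a plain NLL, so it does not convert directly. A minimal sketch of the perplexity/NLL relationship:

```python
import math

# Perplexity <-> mean negative log-likelihood (natural log):
#   ppl = exp(nll)    and    nll = log(ppl)
def perplexity(mean_nll: float) -> float:
    return math.exp(mean_nll)

def mean_nll(ppl: float) -> float:
    return math.log(ppl)

# An eval_enwikippl of 2192.0 corresponds to a mean NLL of
# about 7.69 nats per token on that corpus.
print(round(mean_nll(2192.0), 2))  # 7.69
```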

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0004
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 1.0
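Per the distillation_objective above, only the logits component contributes to the loss (weight 1, loss_fn=kl); the hidden-state and attention components have weight 0. A minimal plain-Python sketch of a forward-KL logits loss for a single token position (Distily's actual implementation operates on batched PyTorch tensors and may differ in detail):

```python
import math

def softmax(logits):
    # Numerically stable softmax over one vocabulary distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_logits_loss(teacher_logits, student_logits):
    """Forward KL divergence KL(teacher || student) between the
    teacher's and student's next-token distributions."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits give zero loss; diverging logits give a positive loss.
print(kl_logits_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(kl_logits_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]) > 0)  # True
```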

Resource Usage

Peak GPU Memory: 7.9368 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 2473901162496.0 | 170424302305280.0 | 20.7680 | 16.794 | 59.545 | 7.443 | 4060086272.0 | 71468255805440.0 |
| 1000 | 0.0808 | 688.0 | 3728.0 | 1.9530 | 16.821 | 59.449 | 7.431 | 652.0 | 2784.0 |
| 2000 | 0.1616 | 1728.0 | 8256.0 | 2.4948 | 16.7878 | 59.567 | 7.446 | 1384.0 | 35584.0 |
| 3000 | 0.2424 | 2040.0 | 10112.0 | 2.6087 | 16.7522 | 59.694 | 7.462 | 1720.0 | 64256.0 |
| 4000 | 0.3232 | 2160.0 | 9280.0 | 2.6353 | 16.796 | 59.538 | 7.442 | 1816.0 | 57088.0 |
| 5000 | 0.4040 | 1904.0 | 9088.0 | 2.5782 | 16.8206 | 59.451 | 7.431 | 1848.0 | 61440.0 |
| 6000 | 0.4848 | 1840.0 | 8960.0 | 2.5344 | 16.7618 | 59.659 | 7.457 | 1592.0 | 69120.0 |
| 7000 | 0.5657 | 1808.0 | 8512.0 | 2.5269 | 16.7913 | 59.555 | 7.444 | 1648.0 | 60672.0 |
| 8000 | 0.6465 | 2096.0 | 8960.0 | 2.6404 | 16.8233 | 59.442 | 7.43 | 1928.0 | 137216.0 |
| 9000 | 0.7273 | 2192.0 | 11200.0 | 2.6293 | 16.9228 | 59.092 | 7.386 | 1808.0 | 93184.0 |
| 10000 | 0.8081 | 1944.0 | 9984.0 | 2.5759 | 16.857 | 59.323 | 7.415 | 1568.0 | 80896.0 |
| 11000 | 0.8889 | 1736.0 | 9344.0 | 2.5147 | 16.8438 | 59.369 | 7.421 | 1488.0 | 48640.0 |
| 12000 | 0.9697 | 2224.0 | 11840.0 | 2.6633 | 16.7839 | 59.581 | 7.448 | 1968.0 | 98816.0 |
| 12375 | 1.0 | 2432.0 | 11072.0 | 2.7197 | 16.7952 | 59.541 | 7.443 | 2176.0 | 109568.0 |
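The epoch column is consistent with a single pass over roughly 99,000 training samples: 12375 optimizer steps at a train_batch_size of 8 (assuming no gradient accumulation, which the hyperparameters above do not mention). A quick check:

```python
TOTAL_STEPS = 12375   # final step in the table, reached at epoch 1.0
BATCH_SIZE = 8        # train_batch_size from the hyperparameters

def epoch_at(step: int) -> float:
    # Fraction of the single training epoch completed at a given step.
    return step / TOTAL_STEPS

print(round(epoch_at(9000), 4))   # 0.7273, matching the table
print(TOTAL_STEPS * BATCH_SIZE)   # 99000 samples per epoch (assumed)
```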

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0