---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.13_gpt2
  results: []
---

# distily_bench_obj_cross_v2.13_gpt2

This student model was distilled from the teacher model gpt2 using an unspecified dataset.

The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

It achieves the following results on the evaluation set:

- eval_enwikippl: 1376.0
- eval_frwikippl: 5856.0
- eval_zhwikippl: 111104.0
- eval_tinystoriesppl: 964.0
- eval_loss: 3.1072
- eval_runtime: 12.9331
- eval_samples_per_second: 46.392
- eval_steps_per_second: 11.598

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=1.0, loss_fn=cos, layer_mapper=last, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))`
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.5
- num_epochs: 1.0
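The distillation objective above combines a KL divergence on the logits (weight 1) with a cosine distance on the last hidden state (weight 1.0); the attention component has weight 0 and is inactive. A minimal PyTorch sketch of such a combined loss (function name and toy shapes are hypothetical, and this is not the actual Distily implementation):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hs, teacher_hs,
                      logits_weight=1.0, hs_weight=1.0):
    """Sketch: KL on token distributions plus cosine distance on the
    last-layer hidden states (layer_mapper=last in the config above)."""
    # KL divergence between student and teacher token distributions
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.log_softmax(teacher_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    # Cosine distance (1 - similarity) over the hidden dimension
    cos = (1.0 - F.cosine_similarity(student_hs, teacher_hs, dim=-1)).mean()
    return logits_weight * kl + hs_weight * cos

# Toy tensors: batch=2, seq=4, vocab=8, hidden=16
torch.manual_seed(0)
s_logits, t_logits = torch.randn(2, 4, 8), torch.randn(2, 4, 8)
s_hs, t_hs = torch.randn(2, 4, 16), torch.randn(2, 4, 16)
loss = distillation_loss(s_logits, t_logits, s_hs, t_hs)
```

Both components are non-negative, so the combined loss is as well; the weights match the `weight=` fields in the objective string above.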

## Resource Usage

Peak GPU Memory: 8.0905 GB

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1821066133504.0 | 158329674399744.0 | 20.2008 | 12.9107 | 46.473 | 11.618 | 12079595520.0 | 98956046499840.0 |
| 750 | 0.1010 | 1376.0 | 5856.0 | 3.1072 | 12.9331 | 46.392 | 11.598 | 964.0 | 111104.0 |
| 1500 | 0.2020 | 580.0 | 3600.0 | 2.2189 | 12.9266 | 46.416 | 11.604 | 438.0 | 1020.0 |
| 2250 | 0.3030 | 376.0 | 1904.0 | 1.9249 | 12.9295 | 46.405 | 11.601 | 312.0 | 374.0 |
| 3000 | 0.4040 | 268.0 | 1080.0 | 1.6655 | 12.9091 | 46.479 | 11.62 | 238.0 | 208.0 |
| 3750 | 0.5051 | 211.0 | 732.0 | 1.4810 | 12.9336 | 46.391 | 11.598 | 172.0 | 217.0 |
| 4500 | 0.6061 | 167.0 | 580.0 | 1.2985 | 12.9202 | 46.439 | 11.61 | 143.0 | 146.0 |
| 5250 | 0.7071 | 135.0 | 486.0 | 1.1339 | 12.9225 | 46.431 | 11.608 | 112.5 | 133.0 |
| 6000 | 0.8081 | 124.5 | 452.0 | 1.0647 | 12.9107 | 46.473 | 11.618 | 101.5 | 125.5 |
| 6750 | 0.9091 | 118.5 | 436.0 | 1.0324 | 12.9153 | 46.456 | 11.614 | 97.5 | 120.0 |
| 7425 | 1.0 | 117.5 | 432.0 | 1.0264 | 13.1576 | 45.601 | 11.4 | 95.5 | 118.5 |

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0