---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_activation_loss_b
    results: []
---

# distily_bench_gpt2_activation_loss_b

This student model was distilled from the teacher model gpt2. The training dataset is unspecified.

The Distily library was used for this distillation.
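The student can be used like any Hugging Face causal language model. A minimal sketch, assuming the checkpoint is published under the (hypothetical) repo id used below:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id, assumed to match the model name above.
repo_id = "lapp0/distily_bench_gpt2_activation_loss_b"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Knowledge distillation is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```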

It achieves the following results on the evaluation set (see the perplexity sketch after this list):

- eval_enwikippl: 210.2820
- eval_frwikippl: 1274.1346
- eval_zhwikippl: 583.2827
- eval_loss: 1.2965
- eval_runtime: 17.2526
- eval_samples_per_second: 57.962
- eval_steps_per_second: 7.245
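The enwikippl, frwikippl, and zhwikippl metrics are perplexities on English, French, and Chinese Wikipedia text respectively. The exact evaluation corpus and chunking are not specified here, so the figures above will not reproduce exactly, but token-level perplexity can be measured along these lines (a minimal sketch; the repo id is hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    # Perplexity = exp(mean token-level cross-entropy). The model's built-in
    # loss shifts the labels internally, so passing input_ids as labels is enough.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

repo_id = "lapp0/distily_bench_gpt2_activation_loss_b"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id).eval()
print(perplexity(model, tokenizer, "The quick brown fox jumps over the lazy dog."))
```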

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=2.0, loss_fn=mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)) (see the loss sketch after this list)
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
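For illustration, the objective above combines a KL-divergence loss on the logits (weight 1) with an MSE loss on the hidden states (weight 2.0); the attention component has weight 0 and is inactive. The following is a minimal sketch of that combination, not Distily's actual implementation, whose reductions and KL direction may differ:

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, logits_weight=1.0, hs_weight=2.0):
    # Logits component (weight 1): KL divergence between the teacher's and the
    # student's next-token distributions.
    s_logp = F.log_softmax(student_out.logits, dim=-1)
    t_prob = F.softmax(teacher_out.logits, dim=-1)
    logits_loss = F.kl_div(s_logp, t_prob, reduction="batchmean")

    # Hidden-state component (weight 2.0): MSE between corresponding layers.
    # layer_mapper=None / projector=None means layers are compared one-to-one.
    hs_loss = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.hidden_states, teacher_out.hidden_states)
    ) / len(student_out.hidden_states)

    # The attention component has weight 0, so it is omitted here.
    return logits_weight * logits_loss + hs_weight * hs_loss
```

Both forward passes need output_hidden_states=True, and the teacher's pass should run under torch.no_grad(). The one-to-one zip assumes the student and teacher share depth and hidden size, as they do when both use the gpt2 architecture.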

## Resource Usage

Peak GPU Memory: 8.0904 GB

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 58037.3203 | 58017.0117 | 6.0237 | 17.2607 | 57.935 | 7.242 | 56038.0625 |
| 1000 | 0.0808 | 715.0994 | 4658.6846 | 2.0131 | 17.1734 | 58.23 | 7.279 | 16350.8623 |
| 2000 | 0.1616 | 508.9246 | 3343.2109 | 1.8201 | 17.2004 | 58.138 | 7.267 | 3102.6990 |
| 3000 | 0.2424 | 419.7101 | 2552.4004 | 1.7020 | 17.1441 | 58.329 | 7.291 | 1042.4126 |
| 4000 | 0.3232 | 361.0421 | 2336.7490 | 1.6177 | 17.0616 | 58.611 | 7.326 | 911.8621 |
| 5000 | 0.4040 | 313.2633 | 1815.2219 | 1.5316 | 17.1786 | 58.212 | 7.276 | 863.9713 |
| 6000 | 0.4848 | 281.3860 | 1725.1301 | 1.4597 | 17.3168 | 57.747 | 7.218 | 705.6341 |
| 7000 | 0.5657 | 253.9131 | 1485.6165 | 1.3999 | 17.1434 | 58.332 | 7.291 | 605.2624 |
| 8000 | 0.6465 | 229.4073 | 1427.2965 | 1.3455 | 17.134 | 58.363 | 7.295 | 629.6656 |
| 9000 | 0.7273 | 210.2820 | 1274.1346 | 1.2965 | 17.2526 | 57.962 | 7.245 | 583.2827 |
| 10000 | 0.8081 | 194.6313 | 1199.3423 | 1.2490 | 17.1679 | 58.248 | 7.281 | 677.5621 |
| 11000 | 0.8889 | 180.3274 | 1160.25 | 1.1980 | 17.1591 | 58.278 | 7.285 | 758.1945 |
| 12000 | 0.9697 | 164.7045 | 1005.8066 | 1.1583 | 17.1824 | 58.199 | 7.275 | 600.1918 |
| 12375 | 1.0 | 161.0243 | 969.7354 | 1.1403 | 17.1939 | 58.16 | 7.27 | 632.9536 |

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- PyTorch 2.3.0
- Datasets 2.21.0