---
base_model: gpt2
library_name: distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_gpt2_batch_size
  results: []
---

# distily_bench_gpt2_batch_size

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
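The card records the objective only as a `LegacyObjective` repr (see the hyperparameters below), so the exact loss is not documented here. As a hedged illustration only, a typical logit-distillation objective minimizes a temperature-scaled KL divergence between teacher and student output distributions; the sketch below is a generic PyTorch version of that idea, not Distily's actual implementation:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft-target KL loss for logit distillation (illustrative sketch).

    Both tensors have shape (..., vocab_size). The T^2 factor keeps
    gradient magnitudes comparable across temperatures (Hinton et al.).
    """
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * (t * t)
```

When student and teacher logits agree exactly, this loss is zero; it grows as the student's predictive distribution diverges from the teacher's.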

It achieves the following results on the evaluation set:

- eval_enwikippl: 685.3206
- eval_frwikippl: 4166.5459
- eval_zhwikippl: 10016.8096
- eval_loss: 7038.3359
- eval_runtime: 21.4586
- eval_samples_per_second: 46.601
- eval_steps_per_second: 11.65
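The throughput figures above are internally consistent: runtime times samples_per_second recovers an evaluation set of about 1,000 samples, and runtime times steps_per_second recovers about 250 steps, which matches an eval batch size of 4. A quick arithmetic check:

```python
# Figures copied from the evaluation results above.
eval_runtime = 21.4586          # seconds
samples_per_second = 46.601
steps_per_second = 11.65
eval_batch_size = 4

n_samples = samples_per_second * eval_runtime   # ~1000 eval samples
n_steps = steps_per_second * eval_runtime       # ~250 eval steps

# 250 steps * batch size 4 = 1000 samples, consistent with eval_batch_size.
print(round(n_samples), round(n_steps))
```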

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: <distily.objectives.LegacyObjective object at 0x7f7d9010c220>
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
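The optimizer and scheduler lines above map directly onto a standard PyTorch setup. A minimal sketch, in which the `Linear` module is merely a stand-in for the actual student model:

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the distilled student model

# Adam with the card's hyperparameters: lr=4e-05, betas=(0.9, 0.999), eps=1e-08.
optimizer = torch.optim.Adam(
    model.parameters(), lr=4e-5, betas=(0.9, 0.999), eps=1e-8
)

# A "constant" schedule keeps the learning rate fixed for the whole run;
# a LambdaLR with multiplier 1.0 expresses that.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 1.0)
```

Because the schedule is constant, the learning rate stays at 4e-05 from the first step through the end of the single training epoch.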

## Resource Usage

- Peak GPU Memory: 15.7299 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2385 | 57.2728 | | | | | 18.1772 |
| 0 | 0 | 55339.3672 | 57682.5742 | 331776.0 | 21.5939 | 46.309 | 11.577 | 57080.2930 |
| 500 | 0.0404 | 2539.2073 | 10701.1621 | 12591.8076 | 21.7416 | 45.995 | 11.499 | 52355.7031 |
| 1000 | 0.0808 | 1909.6178 | 6915.1094 | 10639.6162 | 21.6609 | 46.166 | 11.542 | 28701.2539 |
| 1500 | 0.1212 | 1563.3774 | 6141.7974 | 9791.8721 | 21.4972 | 46.518 | 11.629 | 22309.7324 |
| 2000 | 0.1616 | 1355.0397 | 6279.7275 | 9359.6162 | 21.5918 | 46.314 | 11.578 | 21985.8770 |
| 2500 | 0.2020 | 1227.5216 | 5683.0625 | 9057.0879 | 21.5112 | 46.487 | 11.622 | 18872.3887 |
| 3000 | 0.2424 | 1126.7594 | 5939.9272 | 8646.5283 | 21.3909 | 46.749 | 11.687 | 24154.6621 |
| 3500 | 0.2828 | 1050.4360 | 5464.5869 | 8420.1602 | 21.4858 | 46.542 | 11.636 | 17743.2598 |
| 4000 | 0.3232 | 995.7695 | 5204.1860 | 8302.2080 | 21.671 | 46.145 | 11.536 | 16421.9824 |
| 4500 | 0.3636 | 939.8518 | 4812.9336 | 8087.6162 | 21.6884 | 46.108 | 11.527 | 18503.0742 |
| 5000 | 0.4040 | 894.0764 | 5064.1064 | 7913.1841 | 21.5518 | 46.4 | 11.6 | 18140.9707 |
| 5500 | 0.4444 | 854.5688 | 4709.2144 | 7854.3359 | 21.4663 | 46.585 | 11.646 | 13195.6348 |
| 6000 | 0.4848 | 815.7767 | 4654.2524 | 7642.8481 | 21.5227 | 46.463 | 11.616 | 14954.4814 |
| 6500 | 0.5253 | 795.4309 | 4827.8882 | 7615.8398 | 21.7578 | 45.961 | 11.49 | 16576.2129 |
| 7000 | 0.5657 | 769.8770 | 4643.7627 | 7491.7759 | 21.5078 | 46.495 | 11.624 | 16412.1191 |
| 7500 | 0.6061 | 755.6475 | 4527.0581 | 7376.8638 | 21.6221 | 46.249 | 11.562 | 15845.7139 |
| 8000 | 0.6465 | 728.3958 | 4527.8569 | 7300.3838 | 21.828 | 45.813 | 11.453 | 17439.0645 |
| 8500 | 0.6869 | 725.3337 | 4182.8813 | 7177.5039 | 21.5139 | 46.482 | 11.62 | 14905.6289 |
| 9000 | 0.7273 | 693.5788 | 4218.2700 | 7109.6958 | 21.5247 | 46.458 | 11.615 | 13303.5693 |
| 9500 | 0.7677 | 685.3206 | 4166.5459 | 7038.3359 | 21.4586 | 46.601 | 11.65 | 10016.8096 |
| 10000 | 0.8081 | 681.6716 | 4036.2834 | 7004.8638 | 21.5314 | 46.444 | 11.611 | 8738.9414 |
| 10500 | 0.8485 | 656.0428 | 4098.0786 | 6922.6558 | 21.6176 | 46.259 | 11.565 | 9690.5127 |
| 11000 | 0.8889 | 655.4447 | 4287.2363 | 6842.6240 | 21.4458 | 46.629 | 11.657 | 13403.4229 |
| 11500 | 0.9293 | 635.5220 | 4204.7627 | 6807.0400 | 21.4021 | 46.724 | 11.681 | 10938.9814 |
| 12000 | 0.9697 | 633.3297 | 4234.2144 | 6815.6162 | 21.5151 | 46.479 | 11.62 | 10046.2852 |
| 12375 | 1.0 | 633.1452 | 4051.2502 | 6749.3442 | 21.4393 | 46.643 | 11.661 | 10211.2959 |

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0